Python: discard gibberish from a Unicode string
I'm processing strings in Python, and I've encountered a string that mixes gibberish characters with text in several languages. For example:
u'أضعت طريقي! أضعت طري asd bla bla ���� bla bla □□□□ more english , 123123 numbers'
What I want is to keep only the real languages (such as Arabic, Chinese, English, and Hebrew), numbers, and other UTF-8 characters (or preferably AL32UTF8). For the example above, the result should be u'أضعت طريقي! أضعت طري asd bla bla bla bla more english , 123123 numbers'.
I have tried many combinations of Python's encode, decode, and unicode functions to get rid of the gibberish, but have failed time after time.
I would appreciate any help in this matter :)
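To make the goal concrete, here is a minimal sketch of one possible direction, not a confirmed solution. It assumes that "gibberish" means characters such as U+FFFD (the replacement character) and U+25A1 (the white square), both of which fall in the Unicode "symbol" category, and that "real" text means letters, combining marks, numbers, punctuation, and whitespace. The function name `strip_gibberish` and the `KEEP_MAJOR_CLASSES` tuple are made up for illustration; only `unicodedata.category` and `re.sub` come from the standard library.

```python
import re
import unicodedata

# Assumption: characters in these major Unicode classes count as "real" text:
# L = letters (Arabic, Chinese, English, Hebrew, ...), M = combining marks,
# N = numbers, P = punctuation. Symbols (S*) and control/unassigned (C*)
# characters are dropped, which removes U+FFFD and U+25A1 among others.
KEEP_MAJOR_CLASSES = ("L", "M", "N", "P")


def strip_gibberish(text):
    """Return text with symbol/control characters removed (hypothetical helper)."""
    kept = []
    for ch in text:
        # unicodedata.category returns a two-letter code such as "Lo" or "So";
        # the first letter is the major class.
        if ch.isspace() or unicodedata.category(ch)[0] in KEEP_MAJOR_CLASSES:
            kept.append(ch)
    # Removing characters can leave runs of spaces behind, so collapse them.
    return re.sub(r"\s+", " ", "".join(kept)).strip()


print(strip_gibberish(u"bla \ufffd\ufffd bla \u25a1\u25a1 123"))  # bla bla 123
```

Note that this also drops legitimate symbols such as `$` or `+`, and it cannot detect mojibake that happens to consist of valid letters, so the set of kept classes would need tuning for real data.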