Python: discard gibberish from a Unicode string


I'm processing strings in Python, and I encountered a string that contains gibberish characters mixed with text in several languages. For example:

u'أضعت طريقي! أضعت طري asd bla bla ���� bla bla □□□□ more english , 123123 numbers' 

What I want is to keep the real languages (such as Arabic, Chinese, English, and Hebrew), numbers, and other UTF-8 characters (or preferably AL32UTF8). For the above example, the result should be u'أضعت طريقي! أضعت طري asd bla bla bla bla more english , 123123 numbers'.

I have tried many combinations of Python's encode, decode, and unicode functions in order to get rid of the gibberish, but have failed time after time.

I would appreciate any help in this matter :)
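
One possible direction, offered as a rough sketch rather than a definitive answer: the replacement character (U+FFFD) and the white square (U+25A1) are themselves valid Unicode code points, so encode/decode round-trips will not remove them. Filtering by Unicode general category might work instead. The function name `discard_gibberish` and the set of kept categories below are assumptions made for illustration; the sketch treats letters, combining marks, numbers, punctuation, and spaces as "real" text and drops symbols and control characters.

```python
import re
import unicodedata

# Assumption: keep letters (L), combining marks (M), numbers (N),
# punctuation (P) and separators (Z); drop symbols (S) and
# control/unassigned characters (C). Adjust to taste.
KEEP_MAJOR_CATEGORIES = frozenset("LMNPZ")


def discard_gibberish(text):
    """Hypothetical helper: drop characters outside the kept categories."""
    kept = "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] in KEEP_MAJOR_CATEGORIES
    )
    # Removing characters can leave doubled spaces behind; collapse them.
    return re.sub(r" {2,}", " ", kept).strip()


# U+FFFD is the replacement character, U+25A1 is the white square.
sample = (u"asd bla bla \ufffd\ufffd\ufffd\ufffd bla bla "
          u"\u25a1\u25a1\u25a1\u25a1 more english , 123123 numbers")
print(discard_gibberish(sample))
# prints: asd bla bla bla bla more english , 123123 numbers
```

Note that this choice also drops legitimate symbols such as `+` or currency signs, and tabs and newlines (which are control characters), so the kept categories may need widening depending on the data.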

