Python: discard gibberish from a Unicode string


I'm processing strings in Python, and I've encountered a string that contains gibberish characters as well as text in other languages. For example:

u'أضعت طريقي! أضعت طري asd bla bla ���� bla bla □□□□ more english , 123123 numbers' 

What I want is to keep the real languages (such as Arabic, Chinese, English, and Hebrew), numbers, and other UTF-8 characters (or, preferably, AL32UTF8). For the example above, the result should be u'أضعت طريقي! أضعت طري asd bla bla bla bla more english , 123123 numbers'.
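One way to express "keep real text, drop gibberish" is to filter by Unicode general category with the standard `unicodedata` module. The sketch below assumes that "gibberish" means anything outside the Letter (L*), Mark (M*), Number (N*), Punctuation (P*) and Separator (Z*) categories; both U+FFFD REPLACEMENT CHARACTER and U+25A1 WHITE SQUARE fall in "So" (Symbol, other) and are therefore dropped. The names `strip_gibberish` and `KEEP_PREFIXES` are hypothetical, and the code targets Python 3.

```python
import unicodedata

# Assumption: keep letters (L*), combining marks (M*, needed for Arabic/Hebrew
# diacritics), numbers (N*), punctuation (P*) and separators (Z*).
# Everything else (symbols, control/format chars, private use, unassigned)
# is treated as gibberish.
KEEP_PREFIXES = ("L", "M", "N", "P", "Z")


def strip_gibberish(text):
    """Drop characters whose Unicode category is not in KEEP_PREFIXES."""
    kept = "".join(
        ch for ch in text
        if unicodedata.category(ch).startswith(KEEP_PREFIXES)
    )
    # Removing runs of symbols leaves double spaces behind; collapse all
    # whitespace runs to a single space and trim the ends.
    return " ".join(kept.split())


arabic = u"أضعت طريقي! أضعت طري"
sample = (arabic + u" asd bla bla \ufffd\ufffd\ufffd\ufffd bla bla "
          u"\u25a1\u25a1\u25a1\u25a1 more english , 123123 numbers ")
print(strip_gibberish(sample))
```

Under these assumptions the sample comes out as the desired string from the question. Note the trade-off: math and currency symbols such as `+` and `$` (categories Sm and Sc) and emoji (So) are dropped as well, so if you need them, add those categories or whitelist specific characters.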

I have tried many combinations of Python's encode, decode, and unicode functions in order to get rid of the gibberish, but have failed time after time.
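A likely reason encode/decode round-trips don't help: once the text is a decoded string, U+FFFD and the squares are ordinary, valid code points, so encoding and decoding again preserves them. The `errors=` argument of `bytes.decode` only matters at the moment raw bytes are decoded, and only for bytes that are actually invalid; squares that are valid characters your font cannot display will survive any decode setting. The byte string below is a hypothetical input, not taken from the question.

```python
# Hypothetical raw input: ASCII text with two bytes (0xFF, 0xFE) that are
# never valid in UTF-8.
raw = b"bla bla \xff\xfe bla bla"

# errors="replace" turns each invalid byte into U+FFFD (the diamond with "?").
replaced = raw.decode("utf-8", errors="replace")

# errors="ignore" silently drops the invalid bytes while decoding.
ignored = raw.decode("utf-8", errors="ignore")

# After decoding, U+FFFD is a normal character: a round trip keeps it.
round_trip = replaced.encode("utf-8").decode("utf-8")

print(ascii(replaced))         # 'bla bla \ufffd\ufffd bla bla'
print(ascii(ignored))          # 'bla bla  bla bla'
print(round_trip == replaced)  # True
```

So `errors="ignore"` is worth trying only if you control the step where bytes (from a file, socket, or database) are first decoded; for an already-decoded string, a character-level filter is needed instead.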

I'd appreciate any help in this matter :)

