ruby - How do I remove non UTF-8 characters from a String?


I need to remove non-UTF-8 characters from a string. Here is a snapshot of the text.

[screenshot]

This is how it looks when I open the string in Notepad++ and set the encoding to UTF-8:

[screenshot]

I think the ACK and FF are the non-UTF-8 characters.

I tried str.scrub and str.encode. Neither of them seems to work. scrub returns the same result, and encode results in an error.

We have a few problems.

The biggest is that a Ruby String stores arbitrary bytes along with a supposed encoding, with no guarantee that the bytes are valid in that encoding and no obvious reason for that encoding to have been chosen. (I might be biased as a heavy user of Python 3. We would never speak of "changing a string from one encoding to another".)

Fortunately, the editor did not eat your post, but it's hard to see that. I'm guessing that you decoded the string as Windows-1252 in order to display it, which only obscures the issue.

Here's your string of bytes as I see it:
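To make the "bytes plus a label" point concrete, here is a minimal sketch. The helper `relabel` is a hypothetical name, not part of Ruby, and the euro-sign bytes are just an illustrative input.

```ruby
# Hypothetical helper: copy a string and change only its encoding label.
def relabel(str, encoding)
  str.dup.force_encoding(encoding)
end

raw = "\xE2\x82\xAC".b   # the three bytes that encode the euro sign in UTF-8

as_utf8   = relabel(raw, Encoding::UTF_8)
as_latin1 = relabel(raw, Encoding::ISO_8859_1)

p as_utf8.bytes == as_latin1.bytes   # true: same bytes under both labels
p as_utf8.length                     # 1 (one character)
p as_latin1.length                   # 3 (three characters)

# The label is not validated: a lone 0xFF byte can be tagged as UTF-8.
p relabel("\xFF".b, Encoding::UTF_8).valid_encoding?   # false
```

`force_encoding` never touches the bytes, which is why a string can carry a UTF-8 label and still not be valid UTF-8.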

>> s = "\x06-~$A\xA7ruG\xF9\"\x9A\f\xB6/K".b
=> "\x06-~$A\xA7ruG\xF9\"\x9A\f\xB6/K"
>> s.bytes
=> [6, 45, 126, 36, 65, 167, 114, 117, 71, 249, 34, 154, 12, 182, 47, 75]

And it does contain bytes that are not valid UTF-8.

>> s.encoding
=> #<Encoding:ASCII-8BIT>
>> String::new(s).force_encoding(Encoding::UTF_8).valid_encoding?
=> false

We can ask to decode this as UTF-8 and insert � where we encounter bytes that are not valid UTF-8:
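The same check can be wrapped in a small script. `valid_utf8?` is a hypothetical helper name. The last two lines offer one possible explanation for scrub "returning the same result"; that is an assumption, since the question does not show which encoding label the string carried.

```ruby
# Hypothetical helper: would these bytes be valid if labelled as UTF-8?
def valid_utf8?(str)
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

s = "\x06-~$A\xA7ruG\xF9\"\x9A\f\xB6/K".b

p valid_utf8?(s)                 # false
p valid_utf8?("plain ASCII".b)   # true

# In this particular string every byte above 0x7F is a stray byte,
# so listing them shows what breaks UTF-8 decoding here.
p s.bytes.select { |b| b > 0x7F }.map { |b| format('0x%02X', b) }
# ["0xA7", "0xF9", "0x9A", "0xB6"]

# Assumption: if the string is labelled binary, scrub has nothing to fix ...
p s.scrub == s                                                # true
# ... while a UTF-8-labelled copy gets its four stray bytes replaced.
p s.dup.force_encoding(Encoding::UTF_8).scrub.count("\uFFFD") # 4
```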

>> s.encode('utf-8', 'binary', :undef => :replace)
=> "\u0006-~$A�ruG�\"�\f�/K"
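If the goal is to drop the offending bytes rather than mark them, a sketch along these lines may help. Both helper names are hypothetical. The first reuses the encode call above with an empty replacement; the second swaps in a different technique, labelling the bytes as UTF-8 and calling scrub, which matters when the input also contains valid multi-byte characters.

```ruby
# Hypothetical helper: treat the input as binary and drop every byte above 0x7F.
def drop_non_ascii(bytes)
  bytes.b.encode('UTF-8', 'binary', undef: :replace, replace: '')
end

# Hypothetical helper: label as UTF-8 and drop only the bytes that are invalid there.
def drop_invalid_utf8(bytes)
  bytes.b.force_encoding(Encoding::UTF_8).scrub('')
end

s = "\x06-~$A\xA7ruG\xF9\"\x9A\f\xB6/K".b

# For this string both approaches give the same result.
p drop_non_ascii(s) == drop_invalid_utf8(s)   # true

# They differ once valid multi-byte UTF-8 is present.
mixed = "caf\xC3\xA9 \xFF".b    # "café" in UTF-8 followed by a stray 0xFF
p drop_non_ascii(mixed)         # the é is dropped along with 0xFF
p drop_invalid_utf8(mixed)      # the é is kept, only 0xFF is dropped
```

Note that ACK (0x06) and FF (0x0C) survive both helpers: they are ASCII control characters and therefore perfectly valid UTF-8.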
