Ruby - Changing odd (very high) ASCII chars in scraped HTML


I am processing HTML and had some odd characters (which give a line feed when I use the print command), so I did the following:

d.each_char do |c|; puts c + " " + c.ord.to_s; end
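A slightly more targeted version of that diagnostic loop, as a sketch: it reports only the non-ASCII characters, together with their Unicode code point in U+XXXX notation, which makes them easier to look up. The helper name `report_non_ascii` and the sample string are hypothetical.

```ruby
# Hypothetical helper: describe every non-ASCII character in a string.
def report_non_ascii(str)
  str.each_char.reject(&:ascii_only?).map do |c|
    # For a UTF-8 string, String#ord returns the Unicode code point
    format("%s U+%04X (%d)", c, c.ord, c.ord)
  end
end

d = "cell\u25ACvalue"
puts report_non_ascii(d)   # prints: ▬ U+25AC (9644)
```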

I found a character with an ord of 9644. It seems to be the Unicode black rectangle (U+25AC). There is an extended ASCII character 219 (█ in code page 437) that looks similar, so I wanted to map it to that code. I tried:

d = d.gsub( 9644.chr, 219.chr)  

This gave me the error "Exception: RangeError: 9644 out of char range".
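The error comes from Integer#chr: without an encoding argument (and with Encoding.default_internal unset, which is the usual default) it only builds single-byte characters, 0 to 255. A minimal sketch of the difference; passing Encoding::UTF_8 is one way to get the real character:

```ruby
# Without an encoding, Integer#chr only covers 0..255, so 9644 raises RangeError.
begin
  9644.chr
rescue RangeError => e
  puts "without an encoding: #{e.message}"   # prints: without an encoding: 9644 out of char range
end

# With an explicit encoding the same integer becomes a one-character UTF-8 string.
black_rect = 9644.chr(Encoding::UTF_8)
puts black_rect == "\u25AC"   # prints: true
puts black_rect.encoding      # prints: UTF-8
```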

Is there a way I can do this (i.e. change ord 9644 to ord 219)?

Alternatively, I could change all characters above ASCII 255 to '?', if I knew how to do this.

Regards, Ben

How about:

d.chars.map(&:ord).map { |int| int == 9644 ? 219 : int }.map(&:chr).join 
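If the goal is only to swap one character for a similar-looking one while keeping the text as UTF-8, String#gsub (or String#tr) with Unicode escapes avoids the round trip through integers. This is an alternative, not the answer's method; using U+2588 FULL BLOCK as the replacement is an assumption, based on it being the glyph code page 437 shows for byte 219.

```ruby
d = "top\u25ACbottom"

# Swap U+25AC BLACK RECTANGLE for U+2588 FULL BLOCK; the string stays UTF-8
fixed = d.gsub("\u25AC", "\u2588")
puts fixed             # prints: top█bottom
puts fixed.encoding    # prints: UTF-8

# For one-character-to-one-character mappings, String#tr works as well
puts d.tr("\u25AC", "\u2588") == fixed   # prints: true
```

If you really need the single byte 219 (for example for a console using code page 437), transcoding the fixed string with `fixed.encode("IBM437")` should produce it, since that code page maps U+2588 to 219.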

If you want to replace all high values with ? instead, use:

high_limit = 255 # the largest value Integer#chr can handle without an encoding argument
d.chars.map(&:ord).map { |int| int > high_limit ? '?'.ord : int }.map(&:chr).join
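For the "replace everything above 255 with '?'" variant, a regular expression over code points, or String#encode with a replacement character, does the job without calling Integer#chr at all. A sketch; the sample string is made up, and the target encoding ISO-8859-1 is an assumption, chosen because it covers exactly code points 0 to 255.

```ruby
d = "caf\u00E9 \u25AC \u4E2D"

# Option 1: keep UTF-8, replace every character above U+00FF with "?"
replaced = d.gsub(/[^\u0000-\u00FF]/, "?")
puts replaced   # prints: café ? ?

# Option 2: transcode to ISO-8859-1 (code points 0..255); characters it
# cannot represent become "?" through the :undef and :replace options
latin1 = d.encode("ISO-8859-1", undef: :replace, replace: "?")
p latin1.bytes  # prints: [99, 97, 102, 233, 32, 63, 32, 63]
```

Option 1 is probably the better fit if the text is printed to a UTF-8 terminal; option 2 gives one byte per character, which is closer to what the integer-based version produces.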

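The #chars method breaks the string into an array of single characters, and map(&:ord) converts that into an array of integers representing the Unicode code points. The map with the conditional block replaces each occurrence of 9644 with 219 and leaves the other integers intact. Finally, map(&:chr) maps the integers back to single characters, and join combines the resulting array of characters into a string.

Note that Integer#chr without an encoding argument only accepts values from 0 to 255 by default, so the first snippet still raises the same RangeError if the string contains any other character above 255. Characters from 128 to 255 also come back as raw single bytes, so the joined result is no longer a UTF-8 string.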
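Each stage of that chain, traced on a short sample string (the sample is made up for illustration):

```ruby
d = "x\u25ACy"

chars  = d.chars                                     # ["x", "▬", "y"]
codes  = chars.map(&:ord)                            # [120, 9644, 121]
mapped = codes.map { |int| int == 9644 ? 219 : int } # [120, 219, 121]
back   = mapped.map(&:chr)                           # ["x", "\xDB", "y"] (219.chr is a raw byte)
result = back.join

p result.bytes   # prints: [120, 219, 121]
```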

Note that this code creates a few arrays along the way, each equal in length to the original string, so using it on large strings may consume a fair amount of memory. If that is the case, you might consider replacing map with map! to modify the array in place.
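The in-place variant could look like the sketch below. #chars still allocates one array, but each map! then rewrites that same array instead of allocating a new one. The method name `swap_black_rectangle` is hypothetical.

```ruby
# Hypothetical helper: the same mapping as above, mutating a single array in place.
def swap_black_rectangle(str)
  codes = str.chars                            # the only intermediate array
  codes.map!(&:ord)                            # characters -> integer code points
  codes.map! { |int| int == 9644 ? 219 : int } # 9644 -> 219, everything else unchanged
  codes.map!(&:chr)                            # integers -> strings (still limited to 0..255)
  codes.join
end

p swap_black_rectangle("a\u25ACb").bytes   # prints: [97, 219, 98]
```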

