Ruby - Changing odd (very high) ASCII chars in scraped HTML
I am processing some HTML and it had some odd characters (which give a line feed when I use the print command), so I did the following:
d.each_char do |c|; puts c + " " + c.ord.to_s; end
I found a character with an ord of 9644. It seems to be the Unicode black rectangle. There is an ASCII 219 that looks similar, so I wanted to map it to that ASCII code. I tried:
d = d.gsub( 9644.chr, 219.chr)
This gave me the error "exception: rangeerror: 9644 out of char range".
Is there a way I can do this (i.e. change ord 9644 to ord 219)?
Alternatively, can I change all characters above ASCII 255 to '?'? If so, please let me know how to do this.
Regards, Ben
How about:
d.chars.map(&:ord).map { |int| int == 9644 ? 219 : int }.map(&:chr).join
If you want to replace all high values with ?, use this instead:
high_limit = 999 # use whatever integer the `#chr` method can handle
d.chars.map(&:ord).map { |int| int > high_limit ? 255 : int }.map(&:chr).join
The #chars method breaks the string into an array of single characters, and map(&:ord) converts it into an array of integers representing the characters' Unicode code points. The map with the conditional block replaces each occurrence of 9644 with 219 and leaves the other integers intact. We then map the integers back to single characters with map(&:chr), and join the resulting array of characters into a string.
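As a hedged alternative to the chain above: the RangeError in the question comes from calling chr without an encoding, and Integer#chr accepts an encoding argument that avoids it. The sketch below assumes the string is valid UTF-8; the helper name swap_codepoint and the sample string are made up for illustration and are not from the original answer.

```ruby
# Hypothetical helper: replace every occurrence of one code point with another.
# Assumes `str` is a valid UTF-8 string.
def swap_codepoint(str, from, to)
  # Passing an encoding lets #chr handle code points above 255.
  str.gsub(from.chr(Encoding::UTF_8), to.chr(Encoding::UTF_8))
end

d = "ab\u25ACcd"                   # made-up sample containing U+25AC (ord 9644)
puts swap_codepoint(d, 9644, 219)
```

Note that in Unicode, code point 219 is the letter Û. The solid block shown at position 219 in code page 437 is U+2588 (ord 9608), so that may be the better target if the goal is a similar-looking glyph.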
Note that this code creates a few arrays along the way, each equal in size to the length of the original string, so using it on large strings may consume a fair amount of memory. If that is the case, you might consider replacing map with map! to modify the array in place.
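If memory is a concern, another option is to skip the intermediate arrays altogether and let gsub scan the string once with a character class. This is only a sketch, not part of the original answer. It assumes the string is valid UTF-8, and the helper name mask_high_chars and the sample string are hypothetical.

```ruby
# Hypothetical helper: replace every character above code point 255 with "?".
# Assumes `str` is valid UTF-8; the class matches anything outside U+0000..U+00FF.
def mask_high_chars(str, replacement = "?")
  str.gsub(/[^\u0000-\u00FF]/, replacement)
end

d = "caf\u00E9 \u25AC ok"   # made-up sample: é (233) is kept, U+25AC (9644) is replaced
puts mask_high_chars(d)
```

This also covers the second part of the question (changing everything above 255 to '?') without calling chr on any integer.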