Issues reading a string with weird characters in SQL Server 2008 and previous versions


Here is my issue: today I loaded info from text files into a SQL Server table. In one field, I realized that the string value is a little bit weird.

When I query the table from the SQL Server 2008 (or previous) client, I get this result set:

[Screenshot: the result set in the SQL Server 2008 results grid]

Even if I run Query 1 below, the result set is empty:

select replace(ltrim(rtrim(cust_po)), '  ', ' ') from dbo.test_char where cust_po like '%076929%';

The weird thing is that in SQL Server 2016 the table looks normal when I open it, although the query doesn't return any results there either:

[Screenshot: the table opened in SQL Server 2016]

Following a couple of instructions from @SolomonRutzky, I ran the query below to get the result as VARBINARY:

[Screenshot: the VARBINARY query and its result]

The new result is: 0x300037003600390032003900bc05bc05bc05bc05bc05bc05bc05bc05bc05

How can I get a VARCHAR without these weird characters, spaces, or whatever they are?

What is going on?

Now that the VARBINARY representation of the data has been posted, the issue is clearer.

As you can see, the 2 lines with the "odd" characters are:

0x300037003600390032003900bc05bc05bc05bc05bc05bc05bc05bc05bc05 

This column is NVARCHAR, which means the encoding is UTF-16 Little Endian. Being UTF-16 means that the data is made up of 2-byte blocks (each character is either 1 or 2 of these 2-byte blocks). Being Little Endian means that the bytes in each 2-byte block are stored in reverse order. So the first character is 3000 (which is 2 bytes), and that equates to Code Point U+0030, the number 0. The next character is 3700, which is Code Point U+0037, the number 7. And so on, down through 3900 for the final 9 in the value.
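To double-check this decoding outside of SQL Server, here is a minimal Python sketch. Python is used purely for illustration. `RAW_HEX` is the VARBINARY value from the question with the `0x` prefix removed. The sketch reads the bytes as UTF-16 Little Endian code units:

```python
# Minimal sketch: decode the VARBINARY value from the question as UTF-16LE,
# the encoding SQL Server uses for NVARCHAR data.
RAW_HEX = "300037003600390032003900bc05bc05bc05bc05bc05bc05bc05bc05bc05"

data = bytes.fromhex(RAW_HEX)

# Each 2-byte block is one UTF-16 code unit; "little" means the low-order
# byte comes first, so b"\x30\x00" becomes 0x0030 (the digit 0).
code_units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

# The same bytes decoded directly as text.
text = data.decode("utf-16-le")

for unit in code_units:
    print(f"U+{unit:04X}")
```

The first six code points are U+0030, U+0037, U+0036, U+0039, U+0032 and U+0039 ("076929"). They are followed by nine copies of U+05BC.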

But then there are 9 sets of BC05, which is Code Point U+05BC, Hebrew Point Dagesh or Mapiq. This is where it gets interesting, because there are 3 separate things going on here:

  1. What is displayed is a function of the font being used, and not all fonts handle all characters correctly, or even have mappings for all characters. This is why you see a series of circles on the left in the results grid, and a single dot on the far left in the row editor. That dot is hard to see if you don't know it's there, but look closely at the row editor image: the top 2 lines both have a tiny dot to the left of the 0.
  2. Hebrew is a right-to-left language. This explains why the circles in the results grid are on the left, even though the characters come after the numbers in the data.
  3. This particular character is a combining character, which means it is supposed to attach to the character that precedes it. Since different fonts display characters differently, this accounts for the variation you see between the results grid and the row editor. The results grid font does not seem to handle this character correctly, while the row editor font does. But even when it is handled correctly, the combining character does not show multiples, so you see a single dot on the far left instead of 9 dots (all 9 are there, one on top of the other).
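The character's identity and its combining nature can be confirmed with Python's standard `unicodedata` module. The snippet below is a small sketch for illustration only. `DAGESH` is just a local name chosen here.

```python
import unicodedata

DAGESH = "\u05bc"  # the character behind every BC05 block (U+05BC)

# Official Unicode name of the character.
print(unicodedata.name(DAGESH))      # HEBREW POINT DAGESH OR MAPIQ

# General category "Mn" = Mark, nonspacing: it is drawn on or around
# another character instead of taking up its own position.
print(unicodedata.category(DAGESH))  # Mn

# A non-zero canonical combining class means it combines with the
# preceding character; an ordinary digit has class 0.
print(unicodedata.combining(DAGESH) > 0)  # True
print(unicodedata.combining("9"))         # 0
```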

To see this in action, run the following query:

select n'4   11' + replicate(nchar(0x05bc), 10) + n'  88  f '; 

returns:

4 11ּּּּּּּּּּ 88 f

Notice the single dot on the left side of the second 1. There is only a single dot, even though the string has 10 of them (due to the REPLICATE). At least, that is what my browser shows; I captured it in the following image:

[Image: the query output as rendered in the browser]

Yet you see the following in the results grid:

[Screenshot: the same output in the results grid]

Notice how the dot is to the left of the first 1, not the second one. Also notice how the 11 is placed between the 88 and the f.

And if you copy and paste from the results grid into the query editor, you see:

[Screenshot: the value pasted into the query editor]

Notice that there are several red dots.
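Whatever each tool draws, the string itself does not change. The quick Python check below rebuilds the string from the query above, with the spacing as shown, and confirms that all ten marks are still stored right after "11". Python is only a stand-in here for illustration.

```python
# Rebuild the string from the query above: "4   11", ten U+05BC marks, "  88  f ".
DAGESH = "\u05bc"
s = "4   11" + DAGESH * 10 + "  88  f "

print(len(s))           # 24: nothing was dropped
print(s.count(DAGESH))  # 10: all ten marks are still there
print(s.index(DAGESH))  # 6: stored right after "11", whatever the display shows
```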

Why doesn't WHERE cust_po LIKE '%076929%' return any rows?

This is because the string comparison is doing what it should: applying linguistic rules. What matters is not the order of the bytes, but how the string is rendered and viewed from a human reading perspective. Since this particular character is a combining character, it does not come after the preceding character; it is part of it. So the second 9 in the 076929 value isn't a 9 anymore; it is a 9 + Dagesh.
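The "9 + Dagesh" idea can be sketched by attaching every combining mark to the character before it. `group_clusters` is a hypothetical helper written for this illustration. It is a simplified model that does not implement the full Unicode text segmentation rules (UAX #29), and it is not how SQL Server compares strings internally.

```python
import unicodedata


def group_clusters(s: str) -> list[str]:
    """Hypothetical helper: attach each combining mark to the preceding character.

    Simplified model only; it does NOT implement full UAX #29 segmentation.
    """
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch   # combining mark: part of the previous character
        else:
            clusters.append(ch)  # base character: starts a new cluster
    return clusters


value = "076929" + "\u05bc" * 9
clusters = group_clusters(value)
print(len(clusters))        # 6
print(clusters[-1] == "9")  # False: the last "9" is really "9" + nine dageshes
```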

select 1 where n'123' + replicate(nchar(0x05bc), 5) like n'%123' + replicate(nchar(0x05bc), 4) + n'%';
-- no rows returned

select 2 where n'123' + replicate(nchar(0x05bc), 5) like n'%123' + replicate(nchar(0x05bc), 5) + n'%';
-- 2

select 3 where n'9' + nchar(0x05bc) = n'9';
-- no rows returned
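As a rough model of the behavior in the queries above, the Python sketch below compares a plain code-point substring search with a search that only matches whole "character + combining marks" clusters. It is an approximation for illustration, not how SQL Server implements linguistic comparison. `group_clusters`, `naive_contains` and `cluster_contains` are hypothetical helper names.

```python
import unicodedata


def group_clusters(s):
    # Hypothetical helper: attach each combining mark to the preceding character.
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def naive_contains(haystack, needle):
    # Plain code-point substring search (no linguistic rules).
    return needle in haystack


def cluster_contains(haystack, needle):
    # Match only whole clusters, so "3" + 4 marks cannot match inside "3" + 5 marks.
    h, n = group_clusters(haystack), group_clusters(needle)
    return any(h[i:i + len(n)] == n for i in range(len(h) - len(n) + 1))


D = "\u05bc"
data = "123" + D * 5
print(naive_contains(data, "123" + D * 4))    # True
print(cluster_contains(data, "123" + D * 4))  # False, like "select 1"
print(cluster_contains(data, "123" + D * 5))  # True, like "select 2"
print(cluster_contains("076929" + D * 9, "076929"))  # False, like the question's LIKE
```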

How do I get rid of these characters?

Assuming that the U+05BC character is the only issue in the data, you can do a simple REPLACE. However, when I tried this, REPLACE didn't find a match. When that happens, you need to use a binary collation, as shown here:

select replace(n'4   11' + replicate(nchar(0x05bc), 5) + n'  88  f ',
               nchar(0x05bc) collate latin1_general_100_bin2,
               n'~');
-- 4   11~~~~~  88  f
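Outside SQL Server, Python's `str.replace` compares code points exactly, which is similar in spirit to the binary-collation REPLACE above. The sketch below strips the mark from the value in the question. `strip_dagesh` and `strip_combining_marks` are hypothetical helper names. The second helper is an assumed, broader cleanup that drops every combining mark.

```python
import unicodedata

DAGESH = "\u05bc"


def strip_dagesh(s):
    # Exact code-point replacement of U+05BC only.
    return s.replace(DAGESH, "")


def strip_combining_marks(s):
    # Broader (assumed) cleanup: drop every character with a non-zero combining class.
    return "".join(ch for ch in s if not unicodedata.combining(ch))


value = "076929" + DAGESH * 9
print(strip_dagesh(value))           # 076929
print(strip_combining_marks(value))  # 076929
print(("4   11" + DAGESH * 5 + "  88  f ").replace(DAGESH, "~"))  # 4   11~~~~~  88  f
```

The broader variant also removes legitimate accents stored as separate combining marks. Targeting U+05BC alone is the safer choice when it is the only problem character.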
