c - Unicode escape sequences vs hexadecimal values -


To encode Unicode/UTF-8 characters in my program, I've been using \uXXXX escape sequences, such as:

wchar_t superscript_4 = L'\u2074';  // U+2074 superscript 4 '⁴'
wchar_t subscript_4   = L'\u2084';  // U+2084 subscript 4 '₄'

However, initializing with plain hexadecimal values should work just as well, since Unicode code points are defined in hexadecimal.

wchar_t superscript_4 = 0x2074;
wchar_t subscript_4   = 0x2084;

Will the second example encode the characters properly? Could it run into wide-char issues, segmentation faults, or incorrectly stored character values? If so, why? If not, why not?

You could just as well initialize them with hex constants, the same way you can initialize normal chars with numeric constants, e.g. char c = 67;. It works the same way: it assigns whatever char or wchar_t has that int value. In the example you give, and assuming a Unicode execution environment (not quite guaranteed, but highly probable), that's the subscript or superscript 4; in my example it's a capital C.
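
For illustration, here is a minimal sketch of that equivalence (the variable names are mine, and it assumes a Unicode execution environment where wchar_t values are Unicode code points):

#include <wchar.h>

int main(void) {
    char    c1 = 67;         /* numeric constant              */
    char    c2 = 'C';        /* character constant, also 67   */

    wchar_t s1 = 0x2074;     /* hex constant                  */
    wchar_t s2 = L'\u2074';  /* universal-character escape    */

    /* Both pairs hold identical values, so the program exits with 0. */
    return (c1 == c2 && s1 == s2) ? 0 : 1;
}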

In particular, for regular chars, character constants like 'C' technically have type int anyway, so you are already assigning int values to chars. For wchar_ts, the constants have type wchar_t, and their integral value is the same value you'd get from calling mbtowc. So, assuming you're working in a Unicode environment, the hex constants are equivalent to the Unicode escapes.
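
A quick way to see both points, again assuming a Unicode environment (this little program is only illustrative):

#include <stdio.h>

int main(void) {
    /* In C, an unprefixed character constant has type int, not char. */
    printf("sizeof 'C' = %zu, sizeof(int) = %zu\n", sizeof 'C', sizeof(int));

    /* A wide character constant has type wchar_t; in a Unicode
       environment its value is the code point itself.             */
    printf("L'\\u2084' == 0x2084 ? %d\n", L'\u2084' == 0x2084);
    return 0;
}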

You usually don't want to do this, though; using character literals makes your intention clearer. That's especially true if you can use non-ASCII characters in your source code, in which case you can write the code as:

wchar_t superscript_4 = L'⁴';
wchar_t subscript_4   = L'₄';

Also note that for many purposes it's better to use char16_t or char32_t, because wchar_t can have different widths on different platforms; it might be cleaner still to stick with UTF-8 in plain char strings until you have a specific need to switch to something else.
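
A sketch of those alternatives (this assumes a C11 compiler, since char16_t, char32_t, and u8 string literals come from C11; the variable names are mine):

#include <uchar.h>   /* C11: char16_t, char32_t */

int main(void) {
    char32_t superscript_4 = U'\u2074';   /* always one 32-bit code unit */
    char16_t subscript_4   = u'\u2084';   /* always one 16-bit code unit */

    /* UTF-8 alternative: code point U+2084 becomes the three bytes
       0xE2 0x82 0x84 in an ordinary narrow string.                  */
    const char *subscript_4_utf8 = u8"\u2084";

    (void)superscript_4; (void)subscript_4; (void)subscript_4_utf8;
    return 0;
}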

