C - Unicode escape sequences vs hexadecimal values
To encode Unicode/UTF-8 characters in my program, I've been using \uXXXX escape sequences, such as:
    wchar_t superscript_4 = L'\u2074'; // U+2074 superscript 4 '⁴'
    wchar_t subscript_4 = L'\u2084'; // U+2084 subscript 4 '₄'

However, using hexadecimal should work fine too, since Unicode code points are written in hexadecimal:
    wchar_t superscript_4 = 0x2074;
    wchar_t subscript_4 = 0x2084;

Will the second example encode the characters properly? Will I run into wide-char issues, segmentation faults, or incorrectly stored character values? If so, why? If not, why not?
You could initialize them with hex constants, just as you can initialize normal chars with numeric constants, e.g. char c = 67;. It works the same way: it assigns whatever char or wchar_t has the value of that int. In the example you give, assuming a Unicode execution environment (not quite guaranteed, but highly probable), it's a subscript or superscript 4; in my example it's a capital C.
In particular, for regular chars, character constants such as 'C' technically have type int, so you are always assigning int values to chars anyway. With wchar_t, the constants have type wchar_t, and their integral value is the same value you'd get by calling mbtowc. So, assuming you're working in a Unicode environment, the hex constants are equivalent to the Unicode escapes.
You usually don't want to do this, though; using character literals makes it clearer what your intention is. This is especially true if you use non-ASCII characters in your source code, in which case you can make your code be
    wchar_t superscript_4 = L'⁴';
    wchar_t subscript_4 = L'₄';

Also note that for many purposes it's better to use char16_t or char32_t, because wchar_t can have different widths on different platforms; it might also be cleaner to use UTF-8 until you have a specific need to switch to something else.