C - Unicode escape sequences vs hexadecimal values
To encode Unicode/UTF-8 characters in my program, I've been using \uXXXX escape sequences, such as:
wchar_t superscript_4 = L'\u2074'; // U+2074 superscript 4 '⁴'
wchar_t subscript_4 = L'\u2084';   // U+2084 subscript 4 '₄'
However, using hexadecimal values should work just as well, since Unicode code points are given in hexadecimal:
wchar_t superscript_4 = 0x2074;
wchar_t subscript_4 = 0x2084;
Will the second example encode the characters properly? Will I run into wide-char issues, segmentation faults, or incorrectly stored character values? If so, why? If not, why not?
You can initialize them with hex constants, just as you can initialize normal chars with numeric constants, e.g. char c = 67; works the same way: it assigns whatever char or wchar_t has the value of that int. In the example you give, assuming a Unicode execution environment (not quite guaranteed, but highly probable), it's a subscript or superscript 4; in my example it's a capital C (in ASCII).
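A minimal sketch of the equivalence described above, assuming a platform where wchar_t holds Unicode code points (GCC and Clang on Linux and macOS, and MSVC for Basic Multilingual Plane characters such as these) and an ASCII-compatible execution character set. The variable names are illustrative only.

```c
#include <wchar.h>  /* wchar_t */

/* The same code points written two ways. Where wchar_t stores Unicode
   code points, each pair holds identical values. */
static const wchar_t superscript_4_hex = 0x2074;     /* U+2074 as a hex integer constant */
static const wchar_t superscript_4_ucn = L'\u2074';  /* U+2074 as a universal character name */
static const wchar_t subscript_4_hex   = 0x2084;     /* U+2084 as a hex integer constant */
static const wchar_t subscript_4_ucn   = L'\u2084';  /* U+2084 as a universal character name */

/* Narrow chars work the same way: 67 is 'C' in ASCII. */
static const char c_num = 67;
static const char c_lit = 'C';
```

Nothing here can crash or corrupt memory; the only question is whether the integer you pick is the code point you meant, which the escape form makes easier to read.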
In particular, with regular chars, character constants like 'C' technically have type int, so you are already assigning int values to chars. With wchar_t, the constants have type wchar_t, and their integral value is the same value you'd get by calling mbtowc on the corresponding multibyte character. So, assuming you're working in a Unicode environment, hex constants are equivalent to Unicode escapes.
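Both claims in that paragraph can be checked with a small sketch: in C (unlike C++), sizeof 'C' equals sizeof(int), and mbtowc applied to the UTF-8 bytes of '⁴' (E2 81 B4) yields the same value as L'\u2074'. The helper names are hypothetical, and the locale names "C.UTF-8" and "en_US.UTF-8" are assumptions; which UTF-8 locales exist depends on the system, so the helper reports failure rather than guessing.

```c
#include <locale.h>  /* setlocale, LC_CTYPE */
#include <stdlib.h>  /* mbtowc */
#include <wchar.h>   /* wchar_t */

/* In C, a plain character constant such as 'C' has type int. */
static int narrow_constant_is_int(void)
{
    return sizeof 'C' == sizeof(int);
}

/* Decode one multibyte character with mbtowc under a UTF-8 locale.
   Returns 0 and stores the wide character in *out on success, or -1 if
   no assumed UTF-8 locale is available or the bytes are invalid. */
static int decode_utf8_char(const char *bytes, size_t len, wchar_t *out)
{
    /* Locale names are platform-specific; these two are assumptions. */
    if (setlocale(LC_CTYPE, "C.UTF-8") == NULL &&
        setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
        return -1;
    mbtowc(NULL, NULL, 0);  /* reset any conversion state */
    return mbtowc(out, bytes, len) > 0 ? 0 : -1;
}
```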
You usually don't want to do this, though; using character literals makes your intention clearer. This is especially true if you use non-ASCII characters in your source code, in which case you can write:
wchar_t superscript_4 = L'⁴';
wchar_t subscript_4 = L'₄';
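Also note that for many purposes it's better to use char16_t or char32_t, because wchar_t can have different widths on different platforms (for example, 16 bits on Windows and 32 bits on Linux); it might be cleaner still to use UTF-8 until you have a specific need to switch to something else.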