Escaped Characters Outside the Basic Multilingual Plane (BMP) in Prolog
For reference, I'm using SWI-Prolog v7.4.2 on Windows 10, 64-bit.
Entering the following code in the REPL:
write("\U0001D7F6"). % Mathematical monospace digit 0
gives me this error in the output:
ERROR: Syntax error: Illegal character code
ERROR: write("
ERROR: ** here **
ERROR: \U0001D7F6") .
I know for a fact that U+1D7F6 is a valid Unicode character, so what's up?
SWI-Prolog internally uses the C type wchar_t to represent Unicode characters. On Windows these are 16 bits wide and are intended to hold UTF-16 encoded strings. SWI-Prolog, however, treats wchar_t strings as plain arrays of code points, so on Windows it effectively supports only UCS-2 (code points U+0000..U+FFFF).
On non-Windows systems, wchar_t is 32 bits wide, so the complete Unicode range is supported.
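To make the 16-bit limit concrete, here is a minimal C sketch (not SWI-Prolog's actual code) showing that U+1D7F6 does not fit in one 16-bit unit and how UTF-16 would split it into a surrogate pair. The function names fits_in_16_bits and to_surrogate_pair are made up for this illustration, and fixed-width types are used instead of wchar_t so the result does not depend on the platform.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the code point fits in a single 16-bit unit (i.e. lies in the BMP). */
int fits_in_16_bits(uint32_t cp)
{
    return cp <= 0xFFFF;
}

/* Splits a supplementary code point (U+10000..U+10FFFF) into a UTF-16
 * surrogate pair. Hypothetical helper, for illustration only. */
void to_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);

    uint32_t v = cp - 0x10000;                  /* 20-bit offset above the BMP */
    *high = (uint16_t)(0xD800 + (v >> 10));     /* top 10 bits    */
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));   /* bottom 10 bits */
}
```

Calling to_surrogate_pair(0x1D7F6, &hi, &lo) gives hi = 0xD835 and lo = 0xDFF6. In other words, a UTF-16 string needs two 16-bit elements for this one character, which is exactly what an "array of code points" model cannot express.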
This is not a trivial thing to fix. Handling wchar_t as UTF-16 loses the nice property that each element of the array is exactly one code point, while using our own 32-bit type means we cannot use the C library's wide character functions and would have to reimplement them in SWI-Prolog. That is not only work: replacing them with pure C versions also loses the optimization typically present in modern C runtime libraries.
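As a rough illustration of the lost property, the sketch below counts code points in a UTF-16 buffer. With one code point per element the answer would simply be the array length, but with UTF-16 every such operation has to scan for surrogate pairs. The name utf16_code_point_count is an assumption made for this example and is not part of SWI-Prolog or the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the code points in a UTF-16 buffer of `len` 16-bit units.
 * An unpaired surrogate is counted as one code point. */
size_t utf16_code_point_count(const uint16_t *s, size_t len)
{
    assert(s != NULL || len == 0);

    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        /* A high surrogate followed by a low surrogate forms one code point. */
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            i++;  /* skip the low surrogate */
        }
        count++;
    }
    return count;
}
```

For the three-unit buffer {0x0041, 0xD835, 0xDFF6} ("A" followed by U+1D7F6) this returns 2, not 3. Indexing, substring extraction and character classification would all need similar care.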