Mail Archives: djgpp-workers/2005/05/21/12:07:08
Hello.
ams AT ludd DOT ltu DOT se wrote:
> According to Eli Zaretskii:
>
>>>From: <ams AT ludd DOT ltu DOT se>
>>>Date: Sat, 14 May 2005 05:00:39 +0200 (CEST)
>>>
>>>Let's say we decide to encode Unicode in wchar_t, which is the only
>>>sane choice today.
>>
>>What exactly do you mean by ``encode Unicode in wchar_t''? Do you
>>mean we will store Unicode codepoints in there? If so, it's a mistake
>>to call this ``an encoding'', since encoding means you transform
>>Unicode codepoints to some other form, like UTF-8 or cp1250.
>
>
> I don't know the terminology. Or find it confusing. I mean we put
> Unicode values in it, just like we put ASCII values in the type char.
>
> I wrote the previous mail, thinking that wchar_t was int. Now I've
> looked and found it to be unsigned short.
>
> That's one thing that has to change.
>
>
>>Alternatively, perhaps you meant UTF-16 or some such, which is indeed
>>an encoding. But then it's not fixed-size, which is generally
>>inappropriate for wchar_t.
>
>
> No. I mean Unicode encoding, which defines the range 0-0x10ffff.
> Are you telling me that that isn't an encoding?
>
> In your terminology, perhaps I want to say, "let's use Unicode
> codepoints".
>
> But logically (to me) _that_ _is_ a certain encoding. Weird
> terminology.
[snip]
You're confusing the codepoint, which is the number assigned to a
character, symbol, etc., with how you represent it. The codepoints are
abstract. What you call "Unicode encoding" is UTF-32, which maps each
codepoint in the range 0-0x10ffff to a 32-bit integer. That may not seem
like an encoding, but it is, because the encoded data has an endianness.
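To make the endianness point concrete, here is a rough sketch in C of
writing one codepoint out as UTF-32 in either byte order. The function
name utf32_encode and its big_endian flag are made up for illustration;
they are not part of DJGPP or any existing library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store codepoint cp as 4 UTF-32 bytes in out.
   big_endian != 0 gives UTF-32BE, big_endian == 0 gives UTF-32LE.
   Returns 4 on success, or 0 if cp is outside 0-0x10ffff or is a
   surrogate (0xd800-0xdfff), which is not a valid character. */
size_t utf32_encode(uint32_t cp, int big_endian, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (big_endian) {
        /* Most significant byte first. */
        out[0] = (unsigned char)(cp >> 24);
        out[1] = (unsigned char)((cp >> 16) & 0xFF);
        out[2] = (unsigned char)((cp >> 8) & 0xFF);
        out[3] = (unsigned char)(cp & 0xFF);
    } else {
        /* Least significant byte first. */
        out[0] = (unsigned char)(cp & 0xFF);
        out[1] = (unsigned char)((cp >> 8) & 0xFF);
        out[2] = (unsigned char)((cp >> 16) & 0xFF);
        out[3] = (unsigned char)(cp >> 24);
    }
    return 4;
}
```

The same codepoint, say 0x20ac, comes out as 00 00 20 ac in one byte
order and ac 20 00 00 in the other. The codepoint did not change; only
its representation did.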
UTF-8 encodes each codepoint into 1 to 4 bytes, depending on its value.
(The original definition allowed up to 6 bytes, but the range 0-0x10ffff
needs at most 4.) The ASCII codepoints happen to be representable using
a single byte in UTF-8.
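As a minimal sketch of how that variable length falls out, the C
function below encodes one codepoint as UTF-8. It assumes the range
0-0x10ffff, for which at most 4 bytes are needed. The name utf8_encode
is hypothetical, not an existing libc function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode codepoint cp as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate (0xd800-0xdfff) or lies above 0x10ffff. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {
        /* ASCII range: a single byte, identical to the ASCII value. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, 'A' (0x41) stays the single byte 41, while the euro sign
(0x20ac) becomes the three bytes e2 82 ac.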
The Unicode FAQ is pretty helpful:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
Specifically:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
Bye, Rich =]
--
Richard Dawe [ http://homepages.nildram.co.uk/~phekda/richdawe/ ]
"You can't evaluate a man by logic alone."
-- McCoy, "I, Mudd", Star Trek