Mail Archives: djgpp/2003/08/29/08:45:04
Eli Zaretskii wrote:
>
> Wide characters are one representation of non-ASCII characters.
> Another representation, which should also be supported by the library,
> is the multibyte representation, whereby every character is
> represented as a series of 8-bit bytes. (Many libraries choose UTF-8
> as their multibyte representation.) The is* macros should support the
> multibyte representation in a manner equivalent to what the isw*
> macros do with the wide characters. That is, if you pass a wide
> representation of a character CH to iswprint and the multibyte
> representation of the same character to isprint, you should get the
> same result (I think).
No; is*() and to*() work only with "plain" one-`char'-is-one-character
data. (And with the special value EOF, of course.) In fact, since the
argument to any of these is the value of a single character, there's no
way they could see the second and subsequent characters of a multibyte
encoding.
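To make the contrast concrete, here is a minimal C sketch; the function names count_printable_mb and count_printable_bytes are hypothetical, not part of any library. The first decodes the string one multibyte character at a time with the standard mbrtowc() and classifies each resulting wide character with iswprint(). The second feeds each byte to isprint(), which is all the narrow macros can ever see. Both depend on the LC_CTYPE locale, which a program would normally select with setlocale(LC_CTYPE, "").

```c
#include <ctype.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count printable characters in a multibyte string
   by decoding each character with mbrtowc() and classifying the result
   with iswprint().  Returns -1 on an invalid or truncated sequence. */
static int count_printable_mb(const char *s)
{
    mbstate_t st;
    size_t len = strlen(s);
    int count = 0;

    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;              /* invalid or truncated sequence */
        if (n == 0)
            break;                  /* reached a null character */
        if (iswprint((wint_t)wc))
            count++;
        s += n;
        len -= n;
    }
    return count;
}

/* Hypothetical helper: byte-at-a-time classification.  isprint() gets one
   byte per call, so each byte of a multibyte character is judged alone. */
static int count_printable_bytes(const char *s)
{
    int count = 0;
    for (; *s != '\0'; s++)
        if (isprint((unsigned char)*s))   /* cast avoids UB for negative char */
            count++;
    return count;
}
```

In the default "C" locale the two counts agree for ASCII text; under a multibyte locale such as UTF-8 they can differ, because isprint() judges each byte of a multibyte character in isolation.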
The point remains, though, that introducing wide character and
multibyte support involves more than merely implementing the functions
with `w' in their names. For example, the *printf() family must be
made aware of multibyte encodings (searching the format string for the
single character '%' does *not* suffice), and in C99 a FILE* stream
can have either wide- or narrow-character orientation.
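Both points can be sketched with standard C99 calls. find_percent_mb() is a hypothetical helper, not how any particular printf implementation is written: it walks a format string one multibyte character at a time with mbrtowc() and stops only at a character whose wide value is L'%'. A bare strchr(fmt, '%') would instead stop at any 0x25 byte, which in a state-dependent encoding such as ISO-2022-JP can be the first half of a two-byte character. orientation_name(), also hypothetical, queries a stream's orientation with fwide(fp, 0), which reports the orientation without changing it.

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return a pointer to the start of the first
   multibyte character in FMT that decodes to L'%', or NULL if there is
   none.  Decoding with mbrtowc() tracks the shift state, so a 0x25 byte
   inside a longer character is not mistaken for '%'.
   This sketch gives up (returns NULL) on an invalid sequence. */
static const char *find_percent_mb(const char *fmt)
{
    mbstate_t st;
    size_t left = strlen(fmt);

    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, fmt, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return NULL;            /* invalid or truncated sequence */
        if (n == 0)
            break;                  /* reached a null character */
        if (wc == L'%')
            return fmt;
        fmt += n;
        left -= n;
    }
    return NULL;
}

/* Hypothetical helper: describe a stream's C99 orientation.  A mode
   argument of 0 makes fwide() only query, never change, the orientation. */
static const char *orientation_name(FILE *fp)
{
    int mode = fwide(fp, 0);
    if (mode > 0)
        return "wide";
    if (mode < 0)
        return "byte";
    return "unoriented";
}
```

Once a stream has an orientation it keeps it: after a byte function such as fputs() has been applied, fwide(fp, 0) reports a negative value, and applying wide functions such as fputws() to that stream is undefined.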
--
Eric Sosman
esosman AT acm DOT org