On Feb 2 12:29, Bruno Haible wrote:
> Hello Eric,
>
> > ... POSIX requires that 1 wchar_t corresponds to 1 character
> > ...
>
> > > What consequences does this have?
> > >
> > > 1) All code that uses the functions from <wctype.h> (wide character
> > > classification and mapping) or wcwidth() malfunctions on strings that
> > > contains Unicode characters outside the BMP, i.e. outside the range
> > > U+0000..U+FFFF.
> >
> > Not necessarily. Such code falls outside of POSIX, but it may still be
> > a well-behaved extension if given sane behavior for how to deal with
> > surrogates.
>
> No. Code that uses <wctype.h> and wcwidth() is written precisely according
> to POSIX. The problem is that this code cannot work correctly when wchar_t[]
> is in UTF-16 encoding. There simply is no way to define these functions
> in a reasonable way for surrogates.
>
> For example:
>   U+1031E = 0xD800 0xDF1E is a letter (iswalpha should be true)
>   U+10320 = 0xD800 0xDF20 is not a letter (iswalpha should be false)
>   U+1D31E = 0xD834 0xDF1E is not a letter (iswalpha should be false)
>   U+1D320 = 0xD834 0xDF20 is not a letter (iswalpha should be false)
>   U+1D71E = 0xD835 0xDF1E is a letter (iswalpha should be true)
>   U+1D720 = 0xD835 0xDF20 is a letter (iswalpha should be true)
> There is no way that a system can provide this information through a
> function 'iswalpha' that takes a single wchar_t argument.

iswalpha takes wint_t, not wchar_t.  Since sizeof (wint_t) is 4 bytes, the
function can return the correct value, provided that the application
converts the UTF-16 surrogate pair to UTF-32 before calling iswalpha.

> We agree that it is a bug. And it is caused by
>   - the fact that Cygwin's wchar_t[] encoding is UTF-16, and
>   - there is no way to define the <wctype.h> POSIX functions sanely in this
>     setting, and

See above.
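To illustrate the conversion meant above, here is a minimal sketch of how
an application might combine a UTF-16 surrogate pair into one UTF-32 value
and pass that to iswalpha.  The helper names (utf16_pair_to_utf32,
utf16_decode, utf16_iswalpha) are made up for this example and are not part
of any library.  The sketch assumes wint_t is at least 32 bits wide, and
whether iswalpha actually classifies a given value correctly still depends
on the C library and the current locale.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helpers, for illustration only. */

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a high and a low surrogate into one UTF-32 code point.
   The caller must make sure hi and lo really are surrogates. */
static uint32_t utf16_pair_to_utf32(uint16_t hi, uint16_t lo)
{
    return 0x10000u
           + ((uint32_t)(hi - 0xD800u) << 10)
           + (uint32_t)(lo - 0xDC00u);
}

/* Decode the character starting at s[0] (len >= 1 code units available).
   Stores the number of code units used (1 or 2) in *consumed.
   An unpaired surrogate is returned unchanged with *consumed == 1;
   dealing with it is left to the caller. */
static uint32_t utf16_decode(const uint16_t *s, size_t len, size_t *consumed)
{
    if (len >= 2 && is_high_surrogate(s[0]) && is_low_surrogate(s[1])) {
        *consumed = 2;
        return utf16_pair_to_utf32(s[0], s[1]);
    }
    *consumed = 1;
    return s[0];
}

/* Classify the character starting at s[0] by handing the full
   UTF-32 value to iswalpha as a wint_t. */
static int utf16_iswalpha(const uint16_t *s, size_t len, size_t *consumed)
{
    uint32_t cp = utf16_decode(s, len, consumed);
    return iswalpha((wint_t)cp);
}
```

With the examples quoted above, utf16_pair_to_utf32 (0xD800, 0xDF1E) yields
0x1031E and utf16_pair_to_utf32 (0xD835, 0xDF20) yields 0x1D720, so the six
code points reach iswalpha as six distinct wint_t values.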

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple