Mail Archives: cygwin/2011/01/29/13:12:52
On Jan 29 10:21, Eric Blake wrote:
> On 01/29/2011 09:01 AM, Corinna Vinschen wrote:
> >> So, using UTF-16 surrogate encodings for characters outside the basic
> >> plane violates POSIX, but it's the best we can do for those characters.
> >
> > Right, and we discussed this already on this list. Or the developer
> > list, I don't remember. Maybe we should have stick to the base plane
> > and only use UCS-2 to be more POSIX compatible.
>
> The burden is on the application, not on cygwin. If the application
> wants POSIX behavior, then they obey __STDC_ISO_10646__ and use ONLY
> characters from the basic plane (no surrogates), at which point their
> use of wchar_t fits the POSIX definition (one wchar_t per character).
> The moment they pass a surrogate, they are no longer honoring the
> restriction documented by __STDC_ISO_10646__ so they are no longer under
> the rules of POSIX, and then cygwin can do whatever it wants (and in
Erm... hang on. __STDC_ISO_10646__ and the POSIX requirement are two
different beasts. I still think that __STDC_ISO_10646__ does not
restrict a 2 byte wchar_t to UCS-2. Per the definition UTF-16 is a
valid coded representation of characters from ISO/IEC 10646.
So, to say it with your words, the moment applications pass a surrogate,
they are no longer under the rules of POSIX, but they still honor the
restriction documented by __STDC_ISO_10646__.
However, *usually* an application shouldn't really notice that a
surrogate has been used, at least as long as they only manipulate entire
strings.
> this case, QoI demands that we honor surrogates to the best of our
> ability for full UTF-16 support, and you can have multi-wchar_t
> characters just as you already have multi-byte UTF-8 char characters).
> In other words, cygwin IS being POSIX-compliant by advertising only the
> Unicode 4.0 character set in the __STDC_ISO_10646__, while still
> supporting Unicode 5.2 (should we upgrade to Unicode 6.0?) as an
> extension when you no longer care about POSIX.
>
> > However, the POSIX definition doesn't contradict what I said about the
> > definition of __STDC_ISO_10646__ as far as I'm concerned.
>
> Yep - I think we're in violent agreement :)
Hmm, I'm not quite sure, see above.
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
- Raw text -