Mail Archives: cygwin/2011/01/29/13:12:52

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Sat, 29 Jan 2011 19:12:31 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com, bug-gnu-libiconv AT gnu DOT org
Subject: Re: Bug in libiconv?
Message-ID: <20110129181231.GC1057@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com, bug-gnu-libiconv AT gnu DOT org
Mail-Followup-To: cygwin AT cygwin DOT com, bug-gnu-libiconv AT gnu DOT org
References: <201101282312 DOT 50298 DOT bruno AT clisp DOT org> <20110129123014 DOT GA8671 AT calimero DOT vinschen DOT de> <4D442DDA DOT 4050807 AT redhat DOT com> <20110129160157 DOT GA1057 AT calimero DOT vinschen DOT de> <4D444CAC DOT 2010300 AT redhat DOT com>
MIME-Version: 1.0
In-Reply-To: <4D444CAC.2010300@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Jan 29 10:21, Eric Blake wrote:
> On 01/29/2011 09:01 AM, Corinna Vinschen wrote:
> >> So, using UTF-16 surrogate encodings for characters outside the basic
> >> plane violates POSIX, but it's the best we can do for those characters.
> > 
> > Right, and we discussed this already on this list.  Or the developer
> > list, I don't remember.  Maybe we should have stuck to the base plane
> > and only used UCS-2 to be more POSIX-compatible.
> 
> The burden is on the application, not on cygwin.  If the application
> wants POSIX behavior, then they obey __STDC_ISO_10646__ and use ONLY
> characters from the basic plane (no surrogates), at which point their
> use of wchar_t fits the POSIX definition (one wchar_t per character).
> The moment they pass a surrogate, they are no longer honoring the
> restriction documented by __STDC_ISO_10646__ so they are no longer under
> the rules of POSIX, and then cygwin can do whatever it wants (and in

Erm... hang on.  __STDC_ISO_10646__ and the POSIX requirement are two
different beasts.  I still think that __STDC_ISO_10646__ does not
restrict a 2-byte wchar_t to UCS-2.  Per the definition, UTF-16 is a
valid coded representation of characters from ISO/IEC 10646.
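For illustration, this is how UTF-16 represents a character outside the basic plane, such as U+1F600: as two 16-bit code units, a high surrogate followed by a low surrogate. The sketch below is not Cygwin or libiconv code. It assumes uint16_t as a stand-in for Cygwin's 2-byte wchar_t (on Linux wchar_t is 4 bytes), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of Cygwin or libiconv.
   uint16_t stands in for Cygwin's 16-bit wchar_t. */

int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Encode one ISO/IEC 10646 code point as UTF-16.
   Returns the number of code units written (1 or 2), or 0 if cp is not
   a valid scalar value (a surrogate value or beyond U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                      /* surrogate values are not characters */
    if (cp <= 0xFFFF) {                /* basic plane: a single code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                      /* outside the UTF-16 range */
    cp -= 0x10000;                     /* leaves a 20-bit value */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* bottom 10 bits */
    return 2;
}

/* Recombine a high/low surrogate pair into the code point it encodes. */
uint32_t utf16_decode_pair(uint16_t hi, uint16_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}
```

With this scheme U+1F600 becomes the pair 0xD83D 0xDE00, while a basic-plane character such as U+0041 stays a single code unit.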

So, to put it in your words: the moment applications pass a surrogate,
they are no longer under the rules of POSIX, but they still honor the
restriction documented by __STDC_ISO_10646__.
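An application that wants to know which side of that line a given string falls on could scan it for surrogate code units: if there are none, every 16-bit unit is exactly one character, matching the one-wchar_t-per-character model discussed above. The sketch below again assumes uint16_t in place of a 2-byte wchar_t, and the helper names are hypothetical. It also reports __STDC_ISO_10646__, which, when defined, is an integer constant of the form yyyymmL naming the ISO/IEC 10646 edition that the implementation's wchar_t values follow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if no element of s[0..n) is a surrogate
   code unit (0xD800-0xDFFF), i.e. each unit is exactly one character;
   returns 0 otherwise. */
int utf16_is_single_unit_only(const uint16_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] >= 0xD800 && s[i] <= 0xDFFF)
            return 0;
    return 1;
}

/* Hypothetical helper: the ISO/IEC 10646 edition (yyyymmL) the C
   implementation claims for wchar_t, or 0 if __STDC_ISO_10646__ is
   not defined. */
long iso10646_edition(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0;
#endif
}
```

Note that the check is about the string, not the platform: the same program can pass basic-plane text and stay within the POSIX model, then handle a surrogate pair and leave it.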

However, *usually* an application shouldn't really notice that a
surrogate has been used, at least as long as it only manipulates entire
strings.
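A small sketch of where that "usually" ends: copying or concatenating an entire UTF-16 buffer carries a surrogate pair along intact, but code working at code-unit granularity can notice it, for instance when counting characters or when truncating a string leaves a lone high surrogate behind. The helper names are hypothetical, and uint16_t again stands in for Cygwin's 2-byte wchar_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count characters (code points) in n UTF-16 code
   units.  A well-formed high+low surrogate pair counts as one character;
   an unpaired surrogate is counted as one (ill-formed) unit. */
size_t utf16_count_chars(const uint16_t *s, size_t n)
{
    size_t chars = 0, i = 0;
    while (i < n) {
        int pair = s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < n
                   && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        i += pair ? 2 : 1;     /* step over both halves of a pair */
        chars++;
    }
    return chars;
}

/* Hypothetical helper: return a length <= max that does not cut a
   surrogate pair in half.  Truncating blindly at max code units could
   leave a lone high surrogate at the end of the result. */
size_t utf16_safe_truncate(const uint16_t *s, size_t n, size_t max)
{
    if (max >= n)
        return n;
    if (max > 0 && s[max - 1] >= 0xD800 && s[max - 1] <= 0xDBFF
        && s[max] >= 0xDC00 && s[max] <= 0xDFFF)
        return max - 1;        /* drop the high half of the split pair */
    return max;
}
```

For the buffer { 'A', 0xD83D, 0xDE00, 'B' }, i.e. A, U+1F600, B, utf16_count_chars() reports 3 characters in 4 code units, and utf16_safe_truncate(s, 4, 2) returns 1 rather than splitting the pair.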

> this case, QoI demands that we honor surrogates to the best of our
> ability for full UTF-16 support, and you can have multi-wchar_t
> characters just as you already have multi-byte UTF-8 char characters).
> In other words, cygwin IS being POSIX-compliant by advertising only the
> Unicode 4.0 character set in the __STDC_ISO_10646__ macro, while still
> supporting Unicode 5.2 (should we upgrade to Unicode 6.0?) as an
> extension when you no longer care about POSIX.
> 
> > However, the POSIX definition doesn't contradict what I said about the
> > definition of __STDC_ISO_10646__ as far as I'm concerned.
> 
> Yep - I think we're in violent agreement :)

Hmm, I'm not quite sure, see above.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

  Copyright © 2019   by DJ Delorie     Updated Jul 2019