Mail Archives: cygwin/2011/01/29/12:22:01

Message-ID: <4D444CAC.2010300@redhat.com>
Date: Sat, 29 Jan 2011 10:21:48 -0700
From: Eric Blake <eblake AT redhat DOT com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: Bug in libiconv?
References: <201101282312 DOT 50298 DOT bruno AT clisp DOT org> <20110129123014 DOT GA8671 AT calimero DOT vinschen DOT de> <4D442DDA DOT 4050807 AT redhat DOT com> <20110129160157 DOT GA1057 AT calimero DOT vinschen DOT de>
In-Reply-To: <20110129160157.GA1057@calimero.vinschen.de>
OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On 01/29/2011 09:01 AM, Corinna Vinschen wrote:
>> So, using UTF-16 surrogate encodings for characters outside the basic
>> plane violates POSIX, but it's the best we can do for those characters.
>
> Right, and we discussed this already on this list.  Or the developer
> list, I don't remember.  Maybe we should have stuck to the base plane
> and only used UCS-2 to be more POSIX compatible.

The burden is on the application, not on cygwin.  If the application
wants POSIX behavior, then it obeys __STDC_ISO_10646__ and uses ONLY
characters from the basic plane (no surrogates), at which point its
use of wchar_t fits the POSIX definition (one wchar_t per character).
The moment it passes a surrogate, it is no longer honoring the
restriction documented by __STDC_ISO_10646__, so it is no longer under
the rules of POSIX, and cygwin can do whatever it wants.  In this case,
QoI (quality of implementation) demands that we honor surrogates to the
best of our ability for full UTF-16 support, so you can have
multi-wchar_t characters just as you already have multi-byte UTF-8
characters.  In other words, cygwin IS being POSIX-compliant by
advertising only the Unicode 4.0 character set in __STDC_ISO_10646__,
while still supporting Unicode 5.2 (should we upgrade to Unicode 6.0?)
as an extension when you no longer care about POSIX.
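
To make the "multi-wchar_t character" point concrete, here is a rough
sketch of how a code point outside the basic plane becomes two 16-bit
units, and how a caller would have to count characters rather than
units.  The names utf16_encode and utf16_count_chars are hypothetical
helpers invented for illustration, not cygwin or libiconv functions,
and uint16_t stands in for a 16-bit wchar_t so the sketch behaves the
same on platforms where wchar_t is wider.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a cygwin or libiconv API): encode one Unicode
 * code point as UTF-16 code units.  Returns the number of units written
 * (1 or 2), or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                      /* out of range, or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;         /* basic plane: one unit per character */
        return 1;
    }
    cp -= 0x10000;                     /* 20 significant bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low (trail) surrogate */
    return 2;
}

/* Hypothetical helper: count characters in a buffer of UTF-16 units,
 * treating a well-formed surrogate pair as a single character.  An
 * unpaired surrogate is counted as one (malformed) character. */
size_t utf16_count_chars(const uint16_t *s, size_t n)
{
    size_t chars = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int is_high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        if (is_high && i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;                       /* skip the low half of the pair */
        chars++;
    }
    return chars;
}
```

An application that stays within the basic plane only ever sees the
one-unit branch, so "one wchar_t per character" holds for it; the
two-unit branch is the extension territory described above.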

> However, the POSIX definition doesn't contradict what I said about the
> definition of __STDC_ISO_10646__ as far as I'm concerned.

Yep - I think we're in violent agreement :)

-- 
Eric Blake   eblake AT redhat DOT com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


