Message-ID: <4D444CAC.2010300@redhat.com>
Date: Sat, 29 Jan 2011 10:21:48 -0700
From: Eric Blake
To: cygwin AT cygwin DOT com
Subject: Re: Bug in libiconv?

On 01/29/2011 09:01 AM, Corinna Vinschen wrote:
>> So, using UTF-16 surrogate encodings for characters outside the basic
>> plane violates POSIX, but it's the best we can do for those characters.
>
> Right, and we discussed this already on this list.  Or the developer
> list, I don't remember.  Maybe we should have stick to the base plane
> and only use UCS-2 to be more POSIX compatible.

The burden is on the application, not on cygwin.
If the application wants POSIX behavior, then it obeys
__STDC_ISO_10646__ and uses ONLY characters from the basic plane (no
surrogates), at which point its use of wchar_t fits the POSIX
definition (one wchar_t per character).  The moment it passes a
surrogate, it is no longer honoring the restriction documented by
__STDC_ISO_10646__, so it is no longer under the rules of POSIX, and
cygwin can do whatever it wants.  In this case, QoI (quality of
implementation) demands that we honor surrogates to the best of our
ability for full UTF-16 support, so you can have multi-wchar_t
characters just as you already have multi-byte UTF-8 characters.

In other words, cygwin IS being POSIX-compliant by advertising only the
Unicode 4.0 character set in __STDC_ISO_10646__, while still supporting
Unicode 5.2 (should we upgrade to Unicode 6.0?) as an extension when you
no longer care about POSIX.

> However, the POSIX definition doesn't contradict what I said about the
> definition of __STDC_ISO_10646__ as far as I'm concerned.

Yep - I think we're in violent agreement :)

-- 
Eric Blake   eblake AT redhat DOT com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
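
For the archives, here is a rough sketch of what "the burden is on the
application" can look like in code.  It is only an illustration, not
anything taken from cygwin or newlib: the helper names
(wcs_is_single_unit, wcs_count_chars, advertised_iso10646) are made up
for this example, and it assumes the usual UTF-16 surrogate ranges
(D800-DBFF for the high half, DC00-DFFF for the low half).  An
application that wants the one-wchar_t-per-character model can check
that it never handles surrogates; one that wants the extension has to
count a surrogate pair as a single character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* UTF-16 surrogate ranges: high (lead) D800-DBFF, low (trail) DC00-DFFF. */
static bool is_high_surrogate(wchar_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static bool is_low_surrogate(wchar_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

/* Hypothetical helper: value advertised by __STDC_ISO_10646__
   (a yyyymmL date), or 0 if the implementation does not define it. */
long advertised_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* Hypothetical helper: true if the string contains no surrogate code
   units, i.e. it stays inside the "one wchar_t per character" model. */
bool wcs_is_single_unit(const wchar_t *s)
{
    assert(s != NULL);
    for (; *s != L'\0'; s++)
        if (is_high_surrogate(*s) || is_low_surrogate(*s))
            return false;
    return true;
}

/* Hypothetical helper: count characters, treating a well-formed
   high+low surrogate pair as one character.  An unpaired surrogate is
   counted as one character on its own. */
size_t wcs_count_chars(const wchar_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (*s != L'\0') {
        /* s[1] is at worst the terminating L'\0', so this never reads
           past the end of the string. */
        if (is_high_surrogate(s[0]) && is_low_surrogate(s[1]))
            s += 2;
        else
            s += 1;
        n++;
    }
    return n;
}
```

With a string such as { L'a', 0xD83D, 0xDE00, L'b', 0 } (the pair
D83D DE00 encodes U+1F600), wcslen reports 4 units while
wcs_count_chars reports 3 characters, and wcs_is_single_unit returns
false.  That last result is the signal that the application has stepped
outside the basic plane.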