Date: Fri, 30 Oct 2015 13:03:20 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Bug in collation functions?
Message-ID: <20151030120320.GO5319@calimero.vinschen.de>
In-Reply-To: <56329BE8.808@cornell.edu>

Hi Ken,

On Oct 29 18:21, Ken Brown wrote:
> On 10/29/2015 5:49 PM, Ken Brown wrote:
> >On 10/29/2015 2:42 PM, Ken Brown wrote:
> >>On 10/29/2015 12:51 PM, Eric Blake wrote:
> >>>Careful. POSIX is proposing some wording that say that normal locales
> >>>should always implement a fallback of last resort (and that locales that
> >>>do not do so should have a special name including '@', to make it
> >>>obvious). It is not standardized yet, but worth thinking about.
> >>>
> >>>http://austingroupbugs.net/view.php?id=938
> >>>http://austingroupbugs.net/view.php?id=963
> >>>
> >>>The intent of that wording is that if ignoring punctuation could cause
> >>>two strings to otherwise compare equal, the fallback of a total ordering
> >>>on all characters means that the final result of strcoll() will not be 0
> >>>unless the two strings are identical.
> >>
> >>In that case, I think Cygwin should start by using NORM_IGNORESYMBOLS in
> >>non-POSIX locales, with the goal of eventually moving toward emulating
> >>glibc. I don't know what fallback glibc uses or how hard it would be to
> >>implement this on Cygwin.
> >
> >I withdraw this suggestion. I took a look at the glibc code, and I
> >don't see any reasonable way for Cygwin to emulate it precisely. On the
> >other hand, I have an idea for a simple fallback. I'll play with it a
> >little and then submit a patch.
>
> The fallback I had in mind is to return the shorter string if they have
> different lengths and otherwise to revert to wcscmp. Using this, both
> Cygwin and Linux give the following comparisons:
>
> "11" > "1.1" in POSIX locale
> "11" < "1.1" in en_US.UTF-8 locale
> "11" > "1 2" in POSIX locale
> "11" < "1.2" in en_US.UTF-8 locale
> "1 1" < "1.1" in POSIX locale
> "1 1" < "1.1" in en_US.UTF-8 locale
>
> If this seems reasonable, I'll test it more extensively and then submit a
> patch.

I had a longer look into this suggestion and the below code and it took
me some time to find out what bugged me with it:

What about str/wcsxfrm?

Per POSIX, calling strcmp on the result of strxfrm is equivalent to
calling strcoll (analogue with wcs*). If you extend *coll to perform an
extra check on the length, you will have cases in which the above rule
fails. You can't perform the length test on the result of *xfrm and
expect the same result as in *coll.

In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
have to do this anyway if we add this flag in *coll), the resulting
transformed strings created from the input strings "11" and "1.1" would
be identical, so a length test on the xfrm string is not meaningful at
all.

The bottom line is, afaics, we must make sure that CompareStringW and
LCMapStringW are called the same way, and their result/output has to be
returned to the caller.
Performing an extra check in *coll which can't be reliably performed in
*xfrm is not feasible.

Does that make sense?


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat