Date: Fri, 30 Oct 2015 20:14:40 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Bug in collation functions?
Message-ID: <20151030191440.GP5319@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
In-Reply-To: <56337996.2000400@cornell.edu>

On Oct 30 10:07, Ken Brown wrote:
> Hi Corinna,
>
> On 10/30/2015 8:03 AM, Corinna Vinschen wrote:
> >On Oct 29 18:21, Ken Brown wrote:
> >>The fallback I had in mind is to return the shorter string if they have
> >>different lengths and otherwise to revert to wcscmp.
> >
> >I had a longer look into this suggestion and the below code and it took
> >me some time to find out what bugged me with it:
> >
> >What about str/wcsxfrm?
> >
> >Per POSIX, calling strcmp on the result of strxfrm is equivalent to
> >calling strcoll (analogue with wcs*).  If you extend *coll to perform an
> >extra check on the length, you will have cases in which the above rule
> >fails.  You can't perform the length test on the result of *xfrm and
> >expect the same result as in *coll.
> >
> >In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
> >have to do this anyway if we add this flag in *coll), the resulting
> >transformed strings created from the input strings "11" and "1.1" would
> >be identical, so a length test on the xfrm string is not meaningful at
> >all.
> >
> >The bottom line is, afaics, we must make sure that CompareStringW and
> >LCMapStringW are called the same way, and their result/output has to be
> >returned to the caller.  Performing an extra check in *coll which can't
> >be reliably performed in *xfrm is not feasible.
> >
> >Does that make sense?
>
> Yes, I see the problem, and I don't see a good way around it.  So I think we
> probably have to leave things as they are and live with the fact that we
> can't do comparisons that ignore whitespace and punctuation.
>
> The alternative of allowing str/wcscoll to return 0 on unequal strings
> doesn't seem feasible in view of Eric's comments.
>
> What about the other issue I raised:  Should setlocale return null to
> indicate an error if it's given an invalid locale name like en_DE.UTF-8?

Huh.  Interesting.  You're running Windows 10, right?  After some digging
it turns out there's a bug in W10.  LocaleNameToLCID() does *not* fail and
return with an error if it doesn't know a locale.  That would be too
simple, I guess.  Rather, it returns the value LOCALE_CUSTOM_UNSPECIFIED,
0x1000.  So all unknown locales are now treated as custom locales.  Duh!

I fear the answer when trying to report this.  Probably it's a feature...

I applied a patch to work around this feature.
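
To illustrate the kind of check such a workaround needs, here is a minimal
sketch.  It is not the patch that was applied.  The helper name
lcid_is_known and the SKETCH_ constant are made up for this example; the
value 0x1000 is the one quoted above, and the constant is redefined locally
only so the snippet builds without the Windows headers.

```c
/* Value of LOCALE_CUSTOM_UNSPECIFIED as quoted in the message; redefined
   under a sketch-only name so this compiles without <windows.h>. */
#define SKETCH_LOCALE_CUSTOM_UNSPECIFIED 0x1000UL

/* Hypothetical helper, not Cygwin's actual code: decide whether an LCID
   obtained from a name lookup such as LocaleNameToLCID() refers to a
   locale the system really knows. */
static int
lcid_is_known (unsigned long lcid)
{
  if (lcid == 0)
    return 0;   /* the lookup reported failure */
  if (lcid == SKETCH_LOCALE_CUSTOM_UNSPECIFIED)
    return 0;   /* unknown name reported as an unspecified custom locale */
  return 1;     /* any other value is taken as a known locale */
}
```

A caller on the setlocale path could reject the locale name whenever this
helper returns 0, which is one way to get the NULL return Ken asked about
for a name like en_DE.UTF-8.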
Thanks for the testcase, btw :)

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
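
The POSIX rule discussed in the quoted part (strcmp on strxfrm output must
order strings the same way strcoll does) can be spot-checked with a small
test helper.  This is only a sketch in portable C: the names sign, xfrm_dup
and coll_matches_xfrm are made up for this example, and it checks one pair
of strings in whatever LC_COLLATE locale is currently set.

```c
#include <locale.h>   /* setlocale(), for callers selecting LC_COLLATE */
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int
sign (int v)
{
  return (v > 0) - (v < 0);
}

/* Hypothetical helper: return a malloc'ed strxfrm() image of s,
   or NULL if the allocation fails. */
static char *
xfrm_dup (const char *s)
{
  size_t need = strxfrm (NULL, s, 0) + 1;   /* size query, n == 0 */
  char *buf = malloc (need);

  if (buf)
    strxfrm (buf, s, need);
  return buf;
}

/* Return 1 if strcoll (a, b) and strcmp on the transformed strings agree
   in sign, 0 if they disagree, -1 if memory could not be allocated. */
static int
coll_matches_xfrm (const char *a, const char *b)
{
  char *xa = xfrm_dup (a);
  char *xb = xfrm_dup (b);
  int ok = -1;

  if (xa && xb)
    ok = sign (strcoll (a, b)) == sign (strcmp (xa, xb));
  free (xa);
  free (xb);
  return ok;
}
```

Calling setlocale (LC_COLLATE, "") and then coll_matches_xfrm ("11", "1.1")
exercises the pair from the discussion.  An extra length check added to
*coll alone would show up here as a 0 result.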