Mail Archives: cygwin/2015/10/30/15:14:57

Date: Fri, 30 Oct 2015 20:14:40 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Bug in collation functions?


On Oct 30 10:07, Ken Brown wrote:
> Hi Corinna,
>
> On 10/30/2015 8:03 AM, Corinna Vinschen wrote:
> >On Oct 29 18:21, Ken Brown wrote:
> >>The fallback I had in mind is to return the shorter string if they have
> >>different lengths and otherwise to revert to wcscmp.
> >
> >I had a longer look into this suggestion and the below code and it took
> >me some time to find out what bugged me with it:
> >
> >What about str/wcsxfrm?
> >
> >Per POSIX, calling strcmp on the result of strxfrm is equivalent to
> >calling strcoll (analogously for wcs*).  If you extend *coll to perform an
> >extra check on the length, you will have cases in which the above rule
> >fails.  You can't perform the length test on the result of *xfrm and
> >expect the same result as in *coll.
> >
> >In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
> >have to do this anyway if we add this flag in *coll), the resulting
> >transformed strings created from the input strings "11" and "1.1" would
> >be identical, so a length test on the xfrm string is not meaningful at
> >all.
> >
> >The bottom line is, afaics, we must make sure that CompareStringW and
> >LCMapStringW are called the same way, and their result/output has to be
> >returned to the caller.  Performing an extra check in *coll which can't
> >be reliably performed in *xfrm is not feasible.
> >
> >Does that make sense?
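
The conflict described in the quoted text can be sketched with a toy
model.  This is only an illustration under stated assumptions: toy_xfrm,
toy_coll and toy_coll_with_fallback are hypothetical names, the
transform merely drops punctuation and whitespace to imitate a
symbol-ignoring mapping (it does not call LCMapStringW or
CompareStringW), and strcmp stands in for wcscmp to keep it short.

```c
#include <ctype.h>
#include <string.h>

/* Toy stand-in for a symbol-ignoring transform (hypothetical; this is NOT
   LCMapStringW): copy src into dst, skipping punctuation and whitespace.
   dst must have room for strlen(src) + 1 bytes. */
static void toy_xfrm(char *dst, const char *src)
{
    for (; *src; src++)
        if (!ispunct((unsigned char)*src) && !isspace((unsigned char)*src))
            *dst++ = *src;
    *dst = '\0';
}

/* Toy collation consistent with toy_xfrm: compare the keys only. */
static int toy_coll(const char *a, const char *b)
{
    char ka[64], kb[64];   /* sketch only: inputs assumed shorter than 64 */
    toy_xfrm(ka, a);
    toy_xfrm(kb, b);
    return strcmp(ka, kb);
}

/* The proposed fallback: on a tie, order the shorter string first,
   otherwise revert to a plain strcmp of the original strings. */
static int toy_coll_with_fallback(const char *a, const char *b)
{
    int r = toy_coll(a, b);
    size_t la = strlen(a), lb = strlen(b);
    if (r != 0)
        return r;
    if (la != lb)
        return la < lb ? -1 : 1;
    return strcmp(a, b);
}
```

In this model "11" and "1.1" produce the same key, so strcmp on the
transformed strings reports equality while toy_coll_with_fallback
reports "11" < "1.1".  The length information is gone from the keys, so
the xfrm side has no way to reproduce the tie-break.
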
>
> Yes, I see the problem, and I don't see a good way around it.  So I think we
> probably have to leave things as they are and live with the fact that we
> can't do comparisons that ignore whitespace and punctuation.
>
> The alternative of allowing str/wcscoll to return 0 on unequal strings
> doesn't seem feasible in view of Eric's comments.
>
> What about the other issue I raised: Should setlocale return null to
> indicate an error if it's given an invalid locale name like en_DE.UTF-8?
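
For reference, ISO C specifies that setlocale returns a null pointer
when the requested locale cannot be honored, so a caller would
typically probe a name as in the sketch below.  locale_accepted is a
hypothetical helper name, and whether a given name such as en_DE.UTF-8
is rejected depends on the platform's C library.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if setlocale accepts `name` for
   LC_COLLATE, 0 if it returns NULL.  On success the collation locale
   stays switched to `name`; this sketch does not restore the old one. */
static int locale_accepted(const char *name)
{
    return setlocale(LC_COLLATE, name) != NULL;
}
```

A call such as locale_accepted("en_DE.UTF-8") is the kind of check the
question is about; only the "C" locale is guaranteed to be accepted
everywhere.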

Huh.  Interesting.  You're running Windows 10, right?  After some digging
it turns out there's a bug in W10.  LocaleNameToLCID() does *not* fail
and return an error if it doesn't know a locale.  That would be too
simple I guess.  Rather, it returns the value LOCALE_CUSTOM_UNSPECIFIED,
0x1000.  So all unknown locales are now treated as custom locales.  Duh!
I fear the answer when trying to report this.  Probably it's a feature...

I applied a patch to work around this feature.
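
A rough sketch of the kind of guard such a workaround needs, not the
actual patch: lcid_means_unknown and SKETCH_LOCALE_CUSTOM_UNSPECIFIED
are hypothetical names, the constant is redefined locally so the
fragment builds without <windows.h>, and the assumption is that both 0
(the documented failure value of LocaleNameToLCID) and 0x1000 should be
treated as "locale not known".

```c
#include <stdint.h>

/* Value quoted in the message: LOCALE_CUSTOM_UNSPECIFIED is 0x1000.
   Defined under a local, hypothetical name for this sketch. */
#define SKETCH_LOCALE_CUSTOM_UNSPECIFIED 0x1000u

/* Hypothetical helper: decide whether an LCID handed back by a lookup
   such as LocaleNameToLCID() should be treated as "unknown locale".
   0 is the documented failure value; 0x1000 is what the message reports
   Windows 10 returning for names it does not know. */
static int lcid_means_unknown(uint32_t lcid)
{
    return lcid == 0 || lcid == SKETCH_LOCALE_CUSTOM_UNSPECIFIED;
}
```

With a check like this, a caller could reject the name and let
setlocale return NULL instead of silently accepting it.  The trade-off
is that a genuinely unspecified custom locale would be rejected too.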


Thanks for the testcase, btw :)


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
