Mail Archives: cygwin/2015/11/02/06:14:21

X-Recipient: archive-cygwin AT delorie DOT com
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-5.4 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY autolearn=no version=3.3.2
X-HELO: calimero.vinschen.de
Date: Mon, 2 Nov 2015 12:13:58 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Bug in collation functions?
Message-ID: <20151102111358.GZ5319@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
References: <20151029153516 DOT GJ5319 AT calimero DOT vinschen DOT de> <56323F2E DOT 4030807 AT cornell DOT edu> <56324598 DOT 9060604 AT cornell DOT edu> <56324E82 DOT 7000402 AT redhat DOT com> <563268A4 DOT 6000005 AT cornell DOT edu> <56329462 DOT 2090206 AT cornell DOT edu> <56329BE8 DOT 808 AT cornell DOT edu> <20151030120320 DOT GO5319 AT calimero DOT vinschen DOT de> <56337996 DOT 2000400 AT cornell DOT edu> <5634F6BA DOT 7070301 AT cornell DOT edu>
MIME-Version: 1.0
In-Reply-To: <5634F6BA.7070301@cornell.edu>
User-Agent: Mutt/1.5.23 (2014-03-12)

--ZPRTZ4IozCHrMUKG
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Oct 31 13:13, Ken Brown wrote:
> On 10/30/2015 10:07 AM, Ken Brown wrote:
> >Hi Corinna,
> >
> >On 10/30/2015 8:03 AM, Corinna Vinschen wrote:
> >>On Oct 29 18:21, Ken Brown wrote:
> >>>The fallback I had in mind is to return the shorter string if they have
> >>>different lengths and otherwise to revert to wcscmp.
> >>
> >>I had a longer look into this suggestion and the below code and it took
> >>me some time to find out what bugged me with it:
> >>
> >>What about str/wcsxfrm?
> >>
> >>Per POSIX, calling strcmp on the result of strxfrm is equivalent to
> >>calling strcoll (analogously for wcs*).  If you extend *coll to perform an
> >>extra check on the length, you will have cases in which the above rule
> >>fails.  You can't perform the length test on the result of *xfrm and
> >>expect the same result as in *coll.
> >>
> >>In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
> >>have to do this anyway if we add this flag in *coll), the resulting
> >>transformed strings created from the input strings "11" and "1.1" would
> >>be identical, so a length test on the xfrm string is not meaningful at
> >>all.
> >>
> >>The bottom line is, afaics, we must make sure that CompareStringW and
> >>LCMapStringW are called the same way, and their result/output has to be
> >>returned to the caller.  Performing an extra check in *coll which can't
> >>be reliably performed in *xfrm is not feasible.
> >>
> >>Does that make sense?
> >
> >Yes, I see the problem, and I don't see a good way around it.  So I
> >think we probably have to leave things as they are and live with the
> >fact that we can't do comparisons that ignore whitespace and punctuation.
> >
> >The alternative of allowing str/wcscoll to return 0 on unequal strings
> >doesn't seem feasible in view of Eric's comments.
>
> I have one other idea.  What would you think of defining a function
> cygwin_strcoll that's like strcoll but with an extra bool parameter
> 'ignoresymbols'?  If ignoresymbols = false, this would be the same as
> strcoll.  If ignoresymbols = true, this would use NORM_IGNORESYMBOLS with
> the fallback I suggested.
>
> That way applications that prefer to be more glibc-compatible and don't need
> strxfrm could do something like
>
>   #define strcoll(A,B) cygwin_strcoll ((A), (B), true)
>=20
> If you think this is reasonable, I'll submit a patch.  If not, no problem.

No, I don't think this is feasible.  Given Eric's comments, can an
application ever expect that strcoll behaves exactly as on Linux?  For
portability reasons, it has to expect different results on different
platforms.  Only if the result is POSIXly incorrect does it make sense to
fix the behaviour, IMHO.
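
As a side note, the POSIX rule quoted above (strcmp on the output of
strxfrm has to order two strings the same way strcoll does) can be
checked with a small test along these lines.  This is only a sketch:
sign, xfrm_cmp and coll_matches_xfrm are made-up helper names, and it
uses nothing but standard C library calls, nothing Cygwin-specific.

```c
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Compare a and b through their strxfrm keys.
   Returns -1, 0 or 1, or -2 if memory allocation fails. */
static int xfrm_cmp(const char *a, const char *b)
{
    /* With a size of 0, strxfrm only reports the key length needed. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ka = malloc(na);
    char *kb = malloc(nb);
    int r = -2;

    if (ka && kb) {
        strxfrm(ka, a, na);
        strxfrm(kb, b, nb);
        r = sign(strcmp(ka, kb));
    }
    free(ka);
    free(kb);
    return r;
}

/* Returns 1 if strcoll and strcmp-on-strxfrm agree in sign for (a, b),
   0 otherwise (including allocation failure). */
static int coll_matches_xfrm(const char *a, const char *b)
{
    return sign(strcoll(a, b)) == xfrm_cmp(a, b);
}
```

Call setlocale (LC_COLLATE, "") before coll_matches_xfrm to test the
user's locale.  Without that call the check runs in the "C" locale,
where strcoll behaves like strcmp.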

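To make the *coll/*xfrm mismatch from earlier in the thread concrete,
here is a toy model.  It is a sketch under loud assumptions: "ignoring
symbols" is approximated by skipping whatever ispunct and isspace report,
which is not how CompareStringW or LCMapStringW work internally, and
skip_symbols, toy_key_cmp and toy_coll are hypothetical names.  The
fallback reads "return the shorter string" as "sort the shorter string
first", which is also an assumption.

```c
#include <ctype.h>
#include <string.h>

/* Advance past punctuation and whitespace (toy "ignore symbols" rule). */
static const char *skip_symbols(const char *s)
{
    while (*s && (ispunct((unsigned char)*s) || isspace((unsigned char)*s)))
        s++;
    return s;
}

/* Toy counterpart of strcmp on transformed strings: compares a and b
   with all symbols dropped, so only the remaining bytes matter. */
static int toy_key_cmp(const char *a, const char *b)
{
    for (;;) {
        a = skip_symbols(a);
        b = skip_symbols(b);
        if (*a != *b || *a == '\0')
            return (unsigned char)*a - (unsigned char)*b;
        a++;
        b++;
    }
}

/* Toy counterpart of *coll with the proposed fallback: if the
   symbol-free comparison is a tie, sort the shorter string first,
   and if the lengths match too, fall back to strcmp. */
static int toy_coll(const char *a, const char *b)
{
    int r = toy_key_cmp(a, b);
    size_t la, lb;

    if (r != 0)
        return r;
    la = strlen(a);
    lb = strlen(b);
    if (la != lb)
        return la < lb ? -1 : 1;
    return strcmp(a, b);
}
```

For "11" and "1.1", toy_key_cmp returns 0 while toy_coll returns a
negative value.  The two disagree because the tie-breaker looks at the
original strings, which the transformed keys no longer carry.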

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

--ZPRTZ4IozCHrMUKG
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJWN0V2AAoJEPU2Bp2uRE+gJsUP/0eNYlGi6b3sc1K9nBqx8HUT
P4X3r6sqQGxDqNi5xGDa2ff57sSWaPt2rsGfJwR1rtEC/tjjPqMT4nBAVeDjMeyU
6JMRcECKHG4U55JMrp0ZxE9EY8Xt+6KScgAztxqa3KRFrkY2v+u82FDV/h1Xn/Kl
8T3o3u5mgBhmkjiXODsMVRpw1EOy0+lxpDxa3ko60kWSnWlde9WwU5w9NpJC6QMs
QTprnU/kjdd47crlglDr2qfPTBDUtIP06VrtqyEBQQfbqqEdW9wT2grP4nKLluXV
8mAHq8HGITV7fxszWSoIpMMrgONKj9l/gtUVsywhi3MVixqxYXZ0fHlWKu7cFlH3
a6nEijZ/faBRBH3I4xuBjLvegaVwWCBR9AIGVYd4y1PJ14p/Qq4uIpxcl6S1gdOZ
tLDYNOux+Fo1qkQ2FocnT+fyG1wTMP1Kx+8XdS6lwxjZem10DOHcgbEiZ58IrvPA
JnCr13x5ED+Hx1BfTy6f7tEpYki/5pWNPZehTK7oAd7QSwll3eAosK7bOZXsC72I
SqykTQjrE5BSy9VvHeuiWqcYuaIA8AcVpUzCtMFwN3Zijx/qA0RX/YLzgV839UhU
B4c4SwopJvVSV9adciVztgwEccfhij5Mbt3SvLQJj2PlWU6MZJpfHbOgf498cRWx
W47+u+Wl91V/CJaduwSX
=xFVZ
-----END PGP SIGNATURE-----

--ZPRTZ4IozCHrMUKG--