Mail Archives: cygwin/2015/10/29/11:35:37

X-Recipient: archive-cygwin AT delorie DOT com
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-5.4 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY autolearn=no version=3.3.2
X-HELO: calimero.vinschen.de
Date: Thu, 29 Oct 2015 16:35:16 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Bug in collation functions?
Message-ID: <20151029153516.GJ5319@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
References: <563148AF DOT 1000502 AT cornell DOT edu> <5631996D DOT 7040908 AT redhat DOT com> <20151029075050 DOT GE5319 AT calimero DOT vinschen DOT de> <20151029083057 DOT GH5319 AT calimero DOT vinschen DOT de> <56321815 DOT 7000203 AT cornell DOT edu>
MIME-Version: 1.0
In-Reply-To: <56321815.7000203@cornell.edu>
User-Agent: Mutt/1.5.23 (2014-03-12)

On Oct 29 08:59, Ken Brown wrote:
> On 10/29/2015 4:30 AM, Corinna Vinschen wrote:
> >On Oct 29 08:50, Corinna Vinschen wrote:
> >>On Oct 28 21:58, Eric Blake wrote:
> >>>On 10/28/2015 04:14 PM, Ken Brown wrote:
> >>>>It's my understanding that collation is supposed to take whitespace and
> >>>>punctuation into account in the POSIX locale but not in other locales.
> >>>
> >>>Not quite right. It is up to the locale definition whether whitespace
> >>>affects collation.  But you are correct that in the POSIX locale,
> >>>whitespace must not be ignored in collation.
> >>>
> >>>>This doesn't seem to be the case on Cygwin.  Here's a test case using
> >>>>wcscoll, but the same problem occurs with strcoll.
> >>>
> >>>That's because the locale definitions are different in cygwin than they
> >>>are in glibc.  But it is not a bug in Cygwin; POSIX allows for different
> >>>systems to have different locale definitions while still using the same
> >>>locale name like en_US.UTF-8.
> >>
> >>Btw, strcoll and wcscoll in Cygwin are implemented using the Windows
> >>function CompareStringW with the LCID set to the locale matching the
> >>POSIX locale setting.  I'm rather glad I didn't have to implement this
> >>by myself... :}
> >
> >OTOH, CompareString has a couple of flags to control its behaviour, see
> >https://msdn.microsoft.com/en-us/library/windows/desktop/dd317761%28v=vs.85%29.aspx
> >
> >Right now Cygwin calls CompareStringW with dwCmpFlags set to 0, but there
> >are flags like NORM_IGNORENONSPACE, NORM_IGNORESYMBOLS.  I'm open to a
> >discussion how to change the settings to more closely resemble the rules
> >on Linux.
> >
> >E.g.  wcscoll simply calls wcscmp rather than CompareStringW for the
> >C/POSIX locale anyway.  So, would it make sense to set the flags to
> >NORM_IGNORESYMBOLS in other locales?
>
> I think so.  That's what the native Windows build of emacs does in this
> situation.
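
For reference, the call discussed in the quoted text could look roughly
like the sketch below.  This is not Cygwin's actual code.  The helper name
compare_en_us, the fixed en-US LCID and the -2 error value are assumptions
made for illustration, and outside Windows/Cygwin it only falls back to
wcscoll so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#if defined(_WIN32) || defined(__CYGWIN__)
#include <windows.h>
#endif

/* Hypothetical helper, not Cygwin's implementation.
   Returns -1, 0 or 1 for a < b, a == b, a > b, or -2 on failure. */
int compare_en_us(const wchar_t *a, const wchar_t *b, int ignore_symbols)
{
    assert(a != NULL && b != NULL);
#if defined(_WIN32) || defined(__CYGWIN__)
    /* en-US with the default sort order, picked only as an example. */
    LCID lcid = MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US),
                         SORT_DEFAULT);
    /* 0 is what the thread says Cygwin passes today. */
    DWORD flags = ignore_symbols ? NORM_IGNORESYMBOLS : 0;
    /* A length of -1 tells CompareStringW the strings are NUL-terminated. */
    int r = CompareStringW(lcid, flags, a, -1, b, -1);
    if (r == 0)
        return -2;              /* failure, details via GetLastError() */
    /* CSTR_LESS_THAN is 1, CSTR_EQUAL is 2, CSTR_GREATER_THAN is 3. */
    return r - CSTR_EQUAL;
#else
    /* No CompareStringW available: use wcscoll in the current locale.
       The ignore_symbols request has no effect on this path. */
    int r = wcscoll(a, b);
    (void)ignore_symbols;
    return (r > 0) - (r < 0);
#endif
}
```

Calling compare_en_us(L"11", L"1.1", 1) and compare_en_us(L"11", L"1 2", 1)
on a Windows or Cygwin box is one way to try the flag on the strings used
in this thread.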

Is that all it's doing?  I'm asking because using NORM_IGNORESYMBOLS
does not exactly resemble the behaviour on Linux on my W10 box:

    "11" > "1.1" in POSIX locale
!!! "11" > "1.1" in en_US.UTF-8 locale
    "11" > "1 2" in POSIX locale
    "11" < "1 2" in en_US.UTF-8 locale

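Comparisons like the ones listed above can be reproduced with a small
helper along these lines.  It is only a sketch: coll_relation is a
hypothetical name, and whether a locale such as en_US.UTF-8 can be
selected at all depends on the system, so the helper reports '?' in that
case instead of guessing.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns '<', '=' or '>' for a versus b under the
   collation rules of the named locale, or '?' if the locale cannot be
   selected on this system. */
char coll_relation(const char *locale, const wchar_t *a, const wchar_t *b)
{
    char saved[256];
    const char *cur;
    int r;

    assert(locale != NULL && a != NULL && b != NULL);

    /* Query the current LC_COLLATE setting and keep a copy, because a
       later setlocale call may overwrite the returned string. */
    cur = setlocale(LC_COLLATE, NULL);
    strncpy(saved, cur != NULL ? cur : "C", sizeof saved - 1);
    saved[sizeof saved - 1] = '\0';

    if (setlocale(LC_COLLATE, locale) == NULL)
        return '?';

    r = wcscoll(a, b);

    /* Restore whatever was active before the call. */
    setlocale(LC_COLLATE, saved);

    return r < 0 ? '<' : (r > 0 ? '>' : '=');
}
```

For example, coll_relation("POSIX", L"11", L"1.1") and
coll_relation("en_US.UTF-8", L"11", L"1.1") give the two relations to
compare between Cygwin and Linux.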

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

