Mail Archives: cygwin/2015/10/29/11:52:06

X-Recipient: archive-cygwin AT delorie DOT com
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Subject: Re: Bug in collation functions?
To: cygwin AT cygwin DOT com
References: <563148AF DOT 1000502 AT cornell DOT edu> <5631996D DOT 7040908 AT redhat DOT com> <20151029075050 DOT GE5319 AT calimero DOT vinschen DOT de> <20151029083057 DOT GH5319 AT calimero DOT vinschen DOT de> <56321815 DOT 7000203 AT cornell DOT edu> <20151029153516 DOT GJ5319 AT calimero DOT vinschen DOT de>
From: Eric Blake <eblake AT redhat DOT com>
Openpgp: url=http://people.redhat.com/eblake/eblake.gpg
Message-ID: <56324089.2090702@redhat.com>
Date: Thu, 29 Oct 2015 09:51:37 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <20151029153516.GJ5319@calimero.vinschen.de>
X-IsSubscribed: yes

--hfdbkeeam81pIsOEvvsSWISvu2CcS9N4R
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 10/29/2015 09:35 AM, Corinna Vinschen wrote:

>>> Right now Cygwin calls CompareStringW with dwCmpFlags set to 0, but there
>>> are flags like NORM_IGNORENONSPACE, NORM_IGNORESYMBOLS.  I'm open to a
>>> discussion how to change the settings to more closely resemble the rules
>>> on Linux.
>>>
>>> E.g.  wcscoll simply calls wcscmp rather than CompareStringW for the
>>> C/POSIX locale anyway.  So, would it make sense to set the flags to
>>> NORM_IGNORESYMBOLS in other locales?
>>
>> I think so.  That's what the native Windows build of emacs does in this
>> situation.
>
> Is that all it's doing?  I'm asking because using NORM_IGNORESYMBOLS
> does not exactly resemble the behaviour on Linux on my W10 box:
>
>     "11" > "1.1" in POSIX locale
> !!! "11" > "1.1" in en_US.UTF-8 locale
>     "11" > "1 2" in POSIX locale
>     "11" < "1 2" in en_US.UTF-8 locale
>

I'm not sure that blindly enabling the flags for all locales makes sense,
though.  I haven't audited the glibc locales to know for sure, but my
impression is that it is up to the locale author whether whitespace
affects collation.  The author of glibc's en_US.UTF-8 may have chosen to
ignore it, but I can't rule out that some other glibc locales still
treat whitespace as significant.
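
For anyone who wants to check a particular locale, here is a minimal
sketch in C that runs wcscoll() under a named locale and reports the
sign of the result.  The helper name collate_in_locale is hypothetical,
and whether a locale such as en_US.UTF-8 is installed depends on the
system, so the helper reports failure instead of guessing.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of Cygwin or glibc: compare a and b with
   wcscoll() under the named LC_COLLATE locale.  Returns 0 and stores
   -1, 0 or 1 in *sign, or returns -1 if the locale cannot be selected. */
int
collate_in_locale (const char *name, const wchar_t *a, const wchar_t *b,
                   int *sign)
{
  /* setlocale() may reuse its return buffer, so keep a private copy. */
  const char *cur = setlocale (LC_COLLATE, NULL);
  char *saved = NULL;
  int r;

  if (cur)
    {
      saved = malloc (strlen (cur) + 1);
      if (!saved)
        return -1;
      strcpy (saved, cur);
    }

  if (!setlocale (LC_COLLATE, name))
    {
      free (saved);
      return -1;
    }

  r = wcscoll (a, b);
  *sign = (r > 0) - (r < 0);

  /* Restore whatever collation locale the caller had. */
  if (saved)
    setlocale (LC_COLLATE, saved);
  free (saved);
  return 0;
}
```

Calling collate_in_locale ("C", L"11", L"1 2", &sign) should give a
positive sign, matching the POSIX rows quoted above; passing another
locale name shows what that locale's author decided about whitespace.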

POSIX has a notion of writing your own locale definition - and glibc
definitely supports that (although I haven't personally tried doing it),
where you can set your OWN collation rules while inheriting the bulk of
the work from an existing locale.  So in glibc, it is possible to have
a locale similar to en_US.UTF-8 but where whitespace IS significant in
collation.  I know cygwin isn't there yet (we expose the Windows locale,
but do not let you define your own).

This seems like the sort of thing where maybe we'd want support for
user-defined locales, compiled into a binary format, and then cygwin
opens the binary locale definition for deciding which flags to use
according to the locale being used.  But that sounds like a LOT of work,
for a questionable amount of gain.
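
As a rough illustration of "deciding which flags to use according to
the locale", here is a minimal sketch in C.  Everything in it is an
assumption for discussion: the function collate_flags_for_locale, the
table, and the choice of NORM_IGNORESYMBOLS for en_US.UTF-8 are all
hypothetical, and a small in-memory table stands in for the compiled
binary locale definition.  It is not how Cygwin works today.

```c
#include <stddef.h>
#include <string.h>

/* Value as defined in the Windows SDK headers; repeated here only so
   the sketch compiles without <windows.h>. */
#ifndef NORM_IGNORESYMBOLS
#define NORM_IGNORESYMBOLS 0x00000004
#endif

/* Hypothetical per-locale entry: which dwCmpFlags to hand to
   CompareStringW for a given locale name. */
struct collate_flags
{
  const char *locale;
  unsigned long flags;
};

/* Illustrative content only; a real table would come from the
   locale definition, not from hard-coded guesses. */
static const struct collate_flags flag_table[] = {
  { "en_US.UTF-8", NORM_IGNORESYMBOLS },
};

unsigned long
collate_flags_for_locale (const char *name)
{
  size_t i;

  /* C/POSIX never reaches CompareStringW (wcscoll uses wcscmp there),
     so no flags apply. */
  if (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0)
    return 0;

  for (i = 0; i < sizeof flag_table / sizeof flag_table[0]; i++)
    if (strcmp (name, flag_table[i].locale) == 0)
      return flag_table[i].flags;

  /* Unknown locale: keep the current behaviour, dwCmpFlags == 0. */
  return 0;
}
```

The point of the table is that a locale absent from it keeps the
existing flags of 0, so nothing changes for locales nobody has looked
at yet.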

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


--hfdbkeeam81pIsOEvvsSWISvu2CcS9N4R--
