Date: Thu, 29 Oct 2015 16:35:16 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Bug in collation functions?
Message-ID: <20151029153516.GJ5319@calimero.vinschen.de>
In-Reply-To: <56321815.7000203@cornell.edu>

On Oct 29 08:59, Ken Brown wrote:
> On 10/29/2015 4:30 AM, Corinna Vinschen wrote:
> >On Oct 29 08:50, Corinna Vinschen wrote:
> >>On Oct 28 21:58, Eric Blake wrote:
> >>>On 10/28/2015 04:14 PM, Ken Brown wrote:
> >>>>It's my understanding that collation is supposed to take whitespace and
> >>>>punctuation into account in the POSIX locale but not in other locales.
> >>>
> >>>Not quite right. It is up to the locale definition whether whitespace
> >>>affects collation. But you are correct that in the POSIX locale,
> >>>whitespace must not be ignored in collation.
> >>>
> >>>>This doesn't seem to be the case on Cygwin. Here's a test case using
> >>>>wcscoll, but the same problem occurs with strcoll.
> >>>
> >>>That's because the locale definitions are different in cygwin than they
> >>>are in glibc. But it is not a bug in Cygwin; POSIX allows for different
> >>>systems to have different locale definitions while still using the same
> >>>locale name like en_US.UTF-8.
> >>
> >>Btw, strcoll and wcscoll in Cygwin are implemented using the Windows
> >>function CompareStringW with the LCID set to the locale matching the
> >>POSIX locale setting. I'm rather glad I didn't have to implement this
> >>by myself... :}
> >
> >OTOH, CompareString has a couple of flags to control its behaviour, see
> >https://msdn.microsoft.com/en-us/library/windows/desktop/dd317761%28v=vs.85%29.aspx
> >
> >Right now Cygwin calls CompareStringW with dwCmpFlags set to 0, but there
> >are flags like NORM_IGNORENONSPACE, NORM_IGNORESYMBOLS. I'm open to a
> >discussion how to change the settings to more closely resemble the rules
> >on Linux.
> >
> >E.g. wcscoll simply calls wcscmp rather than CompareStringW for the
> >C/POSIX locale anyway. So, would it make sense to set the flags to
> >NORM_IGNORESYMBOLS in other locales?
>
> I think so. That's what the native Windows build of emacs does in this
> situation.

Is that all it's doing? I'm asking because using NORM_IGNORESYMBOLS
does not exactly resemble the behaviour on Linux on my W10 box:

      "11" > "1.1" in POSIX locale
  !!! "11" > "1.1" in en_US.UTF-8 locale
      "11" > "1 2" in POSIX locale
      "11" < "1 2" in en_US.UTF-8 locale


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
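
A minimal sketch for re-running the four comparisons listed above on both Cygwin and Linux. This is not the test case from the thread (that one used wcscoll from C). It assumes that Python's locale.strcoll defers to the platform C library's collation for the active LC_COLLATE locale, and the helper name compare_in_locale is made up for illustration. Results outside the C/POSIX locale depend on the platform's locale definitions, which is exactly the point under discussion, so no expected output is given for en_US.UTF-8.

```python
import locale


def sign(n):
    """Collapse a strcoll-style result to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def compare_in_locale(name, a, b):
    """Compare a and b under LC_COLLATE=name.

    Hypothetical helper: returns -1, 0 or 1, or None when the
    requested locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sign(locale.strcoll(a, b))
    finally:
        # Always restore the previous collation locale.
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    symbols = {-1: "<", 0: "==", 1: ">", None: "?"}
    for name in ("C", "en_US.UTF-8"):
        for a, b in (("11", "1.1"), ("11", "1 2")):
            result = compare_in_locale(name, a, b)
            print(f'"{a}" {symbols[result]} "{b}" in {name} locale')
```

A "?" in the output means the locale could not be selected. Running the script on Cygwin and on a glibc system side by side shows where the two collation implementations disagree.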