Subject: Re: Bug in collation functions?
To: cygwin AT cygwin DOT com
From: Ken Brown
Message-ID: <56324598.9060604@cornell.edu>
Date: Thu, 29 Oct 2015 12:13:12 -0400
In-Reply-To: <56323F2E.4030807@cornell.edu>

On 10/29/2015 11:45 AM, Ken Brown wrote:
> On 10/29/2015 11:35 AM, Corinna Vinschen wrote:
>> On Oct 29 08:59, Ken Brown wrote:
>>> On 10/29/2015 4:30 AM, Corinna Vinschen wrote:
>>>> On Oct 29 08:50, Corinna Vinschen wrote:
>>>>> On Oct 28 21:58, Eric Blake wrote:
>>>>>> On 10/28/2015 04:14 PM, Ken Brown wrote:
>>>>>>> It's my understanding that collation is supposed to take
>>>>>>> whitespace and punctuation into account in the POSIX locale
>>>>>>> but not in other locales.
>>>>>>
>>>>>> Not quite right. It is up to the locale definition whether whitespace
>>>>>> affects collation. But you are correct that in the POSIX locale,
>>>>>> whitespace must not be ignored in collation.
>>>>>>
>>>>>>> This doesn't seem to be the case on Cygwin. Here's a test case
>>>>>>> using wcscoll, but the same problem occurs with strcoll.
>>>>>>
>>>>>> That's because the locale definitions are different in cygwin than
>>>>>> they are in glibc. But it is not a bug in Cygwin; POSIX allows for
>>>>>> different systems to have different locale definitions while still
>>>>>> using the same locale name like en_US.UTF-8.
>>>>>
>>>>> Btw, strcoll and wcscoll in Cygwin are implemented using the Windows
>>>>> function CompareStringW with the LCID set to the locale matching the
>>>>> POSIX locale setting. I'm rather glad I didn't have to implement this
>>>>> by myself... :}
>>>>
>>>> OTOH, CompareString has a couple of flags to control its behaviour, see
>>>> https://msdn.microsoft.com/en-us/library/windows/desktop/dd317761%28v=vs.85%29.aspx
>>>>
>>>> Right now Cygwin calls CompareStringW with dwCmpFlags set to 0, but
>>>> there are flags like NORM_IGNORENONSPACE, NORM_IGNORESYMBOLS. I'm open
>>>> to a discussion how to change the settings to more closely resemble
>>>> the rules on Linux.
>>>>
>>>> E.g. wcscoll simply calls wcscmp rather than CompareStringW for the
>>>> C/POSIX locale anyway. So, would it make sense to set the flags to
>>>> NORM_IGNORESYMBOLS in other locales?
>>>
>>> I think so. That's what the native Windows build of emacs does in this
>>> situation.
>>
>> Is that all it's doing? I'm asking because using NORM_IGNORESYMBOLS
>> does not exactly resemble the behaviour on Linux on my W10 box:
>>
>> "11" > "1.1" in POSIX locale
>> !!! "11" > "1.1" in en_US.UTF-8 locale
>> "11" > "1 2" in POSIX locale
>> "11" < "1 2" in en_US.UTF-8 locale
>
> I just noticed that myself and was going to ask about that difference. I
> don't see anything else that emacs is doing on native Windows. But in
> the test I referred to above, the locale is set to "enu_USA" in the
> native Windows build. Does that explain the discrepancy? If not, I can
> ask on the emacs-devel list whether the test passes on Windows.

Never mind. My test case was flawed: it didn't check for the
possibility that wcscoll might return 0. Here's a revised definition of
the "compare" function:

void
compare (const wchar_t *a, const wchar_t *b, const char *loc)
{
  setlocale (LC_COLLATE, loc);
  int res = wcscoll (a, b);
  /* Report equality explicitly instead of folding it into '<' or '>'.  */
  char c = res < 0 ? '<' : res > 0 ? '>' : '=';
  printf ("\"%ls\" %c \"%ls\" in %s locale\n", a, c, b, loc);
}

With this change (and the use of NORM_IGNORESYMBOLS) the test returns
the following on Cygwin:

$ ./wcscoll_test
"11" > "1.1" in POSIX locale
"11" = "1.1" in en_US.UTF-8 locale
"11" > "1 2" in POSIX locale
"11" < "1 2" in en_US.UTF-8 locale

It still differs from Linux, but it's good enough to make the emacs test
pass. Moreover, this behavior actually seems more reasonable to me than
the Linux behavior. After all, if you're ignoring punctuation, how can
you decide which of "11" or "1.1" comes first?

Ken
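
For readers who want to reproduce the comparison, here is a minimal sketch of what a complete test could look like. The strings and locale names are taken from the output shown in this message. The helper relation, the driver run_comparisons, and the '?' result for a locale that setlocale rejects are assumptions added for illustration; they are not part of the original test program, which is not included in this message.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: returns '<', '>' or '=' for a versus b under the
   collation rules of locale loc, or '?' if setlocale rejects loc.  */
char
relation (const wchar_t *a, const wchar_t *b, const char *loc)
{
  assert (a != NULL && b != NULL && loc != NULL);
  if (setlocale (LC_COLLATE, loc) == NULL)
    return '?';
  int res = wcscoll (a, b);
  return res < 0 ? '<' : res > 0 ? '>' : '=';
}

/* Same output format as the "compare" function in this message.  */
void
compare (const wchar_t *a, const wchar_t *b, const char *loc)
{
  printf ("\"%ls\" %c \"%ls\" in %s locale\n", a, relation (a, b, loc), b, loc);
}

/* Hypothetical driver: the four comparisons seen in the output.  */
void
run_comparisons (void)
{
  compare (L"11", L"1.1", "POSIX");
  compare (L"11", L"1.1", "en_US.UTF-8");
  compare (L"11", L"1 2", "POSIX");
  compare (L"11", L"1 2", "en_US.UTF-8");
}
```

Calling run_comparisons() from main gives a standalone program. The en_US.UTF-8 lines depend on the platform's locale definitions, which is the point of this thread, so only the POSIX results should be expected to agree across systems.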