Subject: | Re: Bug in collation functions? |
To: | cygwin AT cygwin DOT com |
References: | <20151029075050 DOT GE5319 AT calimero DOT vinschen DOT de> <20151029083057 DOT GH5319 AT calimero DOT vinschen DOT de> <56321815 DOT 7000203 AT cornell DOT edu> <20151029153516 DOT GJ5319 AT calimero DOT vinschen DOT de> <56323F2E DOT 4030807 AT cornell DOT edu> <56324598 DOT 9060604 AT cornell DOT edu> <56324E82 DOT 7000402 AT redhat DOT com> <563268A4 DOT 6000005 AT cornell DOT edu> <56329462 DOT 2090206 AT cornell DOT edu> <56329BE8 DOT 808 AT cornell DOT edu> <20151030120320 DOT GO5319 AT calimero DOT vinschen DOT de> |
From: | Ken Brown <kbrown AT cornell DOT edu> |
Message-ID: | <56337996.2000400@cornell.edu> |
Date: | Fri, 30 Oct 2015 10:07:18 -0400 |
In-Reply-To: | <20151030120320.GO5319@calimero.vinschen.de> |
Hi Corinna,

On 10/30/2015 8:03 AM, Corinna Vinschen wrote:
> On Oct 29 18:21, Ken Brown wrote:
>> The fallback I had in mind is to return the shorter string if they have
>> different lengths and otherwise to revert to wcscmp.
>
> I had a longer look into this suggestion and the below code and it took
> me some time to find out what bugged me with it:
>
> What about str/wcsxfrm?
>
> Per POSIX, calling strcmp on the result of strxfrm is equivalent to
> calling strcoll (analogue with wcs*). If you extend *coll to perform an
> extra check on the length, you will have cases in which the above rule
> fails. You can't perform the length test on the result of *xfrm and
> expect the same result as in *coll.
>
> In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
> have to do this anyway if we add this flag in *coll), the resulting
> transformed strings created from the input strings "11" and "1.1" would
> be identical, so a length test on the xfrm string is not meaningful at
> all.
>
> The bottom line is, afaics, we must make sure that CompareStringW and
> LCMapStringW are called the same way, and their result/output has to be
> returned to the caller. Performing an extra check in *coll which can't
> be reliably performed in *xfrm is not feasible.
>
> Does that make sense?

Yes, I see the problem, and I don't see a good way around it. So I think
we probably have to leave things as they are and live with the fact that
we can't do comparisons that ignore whitespace and punctuation. The
alternative of allowing str/wcscoll to return 0 on unequal strings
doesn't seem feasible in view of Eric's comments.

What about the other issue I raised: Should setlocale return null to
indicate an error if it's given an invalid locale name like en_DE.UTF-8?

Ken
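
For readers following the thread, the POSIX rule quoted above (strcmp on the output of strxfrm must agree with strcoll) can be checked with a small sketch like the one below. This is not code from the thread or from Cygwin; the helper names sign, xfrm_compare and coll_matches_xfrm are made up for illustration, and the sketch only uses the standard C functions strcoll, strxfrm and strcmp.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1 so two results can be compared. */
int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Hypothetical helper: transform a and b with strxfrm, then compare the
   transformed strings with strcmp. Returns -1, 0 or 1, or -2 if memory
   allocation fails. */
int xfrm_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* With a zero-length buffer, strxfrm only reports the needed length. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ta = malloc(na);
    char *tb = malloc(nb);
    int result = -2;

    if (ta != NULL && tb != NULL) {
        strxfrm(ta, a, na);
        strxfrm(tb, b, nb);
        result = sign(strcmp(ta, tb));
    }
    free(ta);
    free(tb);
    return result;
}

/* Hypothetical helper: returns 1 if strcoll(a, b) and strcmp on the
   strxfrm output agree in sign for the current LC_COLLATE locale. */
int coll_matches_xfrm(const char *a, const char *b)
{
    return sign(strcoll(a, b)) == xfrm_compare(a, b);
}
```

A caller would select a locale with setlocale(LC_COLLATE, ...) and then call coll_matches_xfrm("11", "1.1"). An extra length test added only to *coll would make this check fail for any pair whose transformed strings are identical, which is the case described above for NORM_IGNORESYMBOLS.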