Subject: Re: Bug in collation functions?
To: cygwin AT cygwin DOT com
From: Ken Brown
Message-ID: <5634F6BA.7070301@cornell.edu>
Date: Sat, 31 Oct 2015 13:13:30 -0400

On 10/30/2015 10:07 AM, Ken Brown wrote:
> Hi Corinna,
>
> On 10/30/2015 8:03 AM, Corinna Vinschen wrote:
>> On Oct 29 18:21, Ken Brown wrote:
>>> The fallback I had in mind is to return the shorter string if they have
>>> different lengths and otherwise to revert to wcscmp.
>>
>> I had a longer look into this suggestion and the below code and it took
>> me some time to find out what bugged me with it:
>>
>> What about str/wcsxfrm?
>>
>> Per POSIX, calling strcmp on the result of strxfrm is equivalent to
>> calling strcoll (analogue with wcs*). If you extend *coll to perform an
>> extra check on the length, you will have cases in which the above rule
>> fails. You can't perform the length test on the result of *xfrm and
>> expect the same result as in *coll.
>>
>> In fact, when calling LCMapStringW with NORM_IGNORESYMBOLS (you would
>> have to do this anyway if we add this flag in *coll), the resulting
>> transformed strings created from the input strings "11" and "1.1" would
>> be identical, so a length test on the xfrm string is not meaningful at
>> all.
>>
>> The bottom line is, afaics, we must make sure that CompareStringW and
>> LCMapStringW are called the same way, and their result/output has to be
>> returned to the caller. Performing an extra check in *coll which can't
>> be reliably performed in *xfrm is not feasible.
>>
>> Does that make sense?
>
> Yes, I see the problem, and I don't see a good way around it. So I
> think we probably have to leave things as they are and live with the
> fact that we can't do comparisons that ignore whitespace and punctuation.
>
> The alternative of allowing str/wcscoll to return 0 on unequal strings
> doesn't seem feasible in view of Eric's comments.

I have one other idea. What would you think of defining a function
cygwin_strcoll that's like strcoll but takes an extra bool parameter
'ignoresymbols'? If ignoresymbols is false, this would behave exactly
like strcoll. If ignoresymbols is true, it would use NORM_IGNORESYMBOLS
with the fallback I suggested. That way applications that prefer to be
more glibc-compatible and don't need strxfrm could do something like

  #define strcoll(A,B) cygwin_strcoll ((A), (B), true)

If you think this is reasonable, I'll submit a patch. If not, no problem.

Ken
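The rule Corinna cites (strcmp on strxfrm output must agree in sign with strcoll on the originals) can be checked mechanically. The following is a minimal sketch in portable C using only the standard strxfrm, strcoll and strcmp; the helper names sign, xfrm_dup and xfrm_agrees_with_coll are hypothetical and not part of Cygwin.

```c
#include <stdlib.h>
#include <string.h>

/* Map any int to -1, 0 or 1 so strcmp/strcoll results can be compared. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Return a malloc'd copy of the strxfrm key for s, or NULL on failure.
   strxfrm(NULL, s, 0) only reports the key length (without the NUL). */
static char *xfrm_dup(const char *s)
{
    size_t n = strxfrm(NULL, s, 0);
    char *key = malloc(n + 1);
    if (key != NULL)
        strxfrm(key, s, n + 1);
    return key;
}

/* Returns 1 if strcmp on the transformed keys has the same sign as
   strcoll on the original strings, as the C standard and POSIX require. */
static int xfrm_agrees_with_coll(const char *a, const char *b)
{
    char *ka = xfrm_dup(a);
    char *kb = xfrm_dup(b);
    int ok = ka != NULL && kb != NULL
             && sign(strcmp(ka, kb)) == sign(strcoll(a, b));
    free(ka);
    free(kb);
    return ok;
}
```

After `setlocale(LC_COLLATE, "")`, a conforming strcoll/strxfrm pair should make xfrm_agrees_with_coll return 1 for any two valid strings. An extra length check added only on the strcoll side is exactly the kind of change that can make it return 0.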
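The "11" versus "1.1" objection can be made concrete with a toy model. Everything below is hypothetical. isalnum() stands in for NORM_IGNORESYMBOLS, whereas the real flag is applied by CompareStringW/LCMapStringW to UTF-16 text using Windows' own notion of symbols. toy_coll mimics the shape of the proposed cygwin_strcoll, including the length fallback, and toy_xfrm is a matching symbol-stripping transform. Neither is a Cygwin API.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy "ignore symbols" comparison: only alphanumeric bytes are compared.
   This only models the idea; it is not how NORM_IGNORESYMBOLS works. */
static int cmp_ignoring_symbols(const char *a, const char *b)
{
    for (;;) {
        while (*a != '\0' && !isalnum((unsigned char)*a))
            a++;
        while (*b != '\0' && !isalnum((unsigned char)*b))
            b++;
        if (*a != *b || *a == '\0')
            return (unsigned char)*a - (unsigned char)*b;
        a++;
        b++;
    }
}

/* Toy coll with the proposed fallback: on a symbol-blind tie the shorter
   string sorts first, and equal lengths fall back to plain strcmp. */
static int toy_coll(const char *a, const char *b, bool ignoresymbols)
{
    if (!ignoresymbols)
        return strcmp(a, b);
    int r = cmp_ignoring_symbols(a, b);
    if (r != 0)
        return r;
    size_t la = strlen(a), lb = strlen(b);
    if (la != lb)
        return la < lb ? -1 : 1;
    return strcmp(a, b);
}

/* Matching toy xfrm: the key is src with symbols removed. Like strxfrm,
   it writes at most n bytes (including the NUL) and returns the full
   key length. */
static size_t toy_xfrm(char *dst, const char *src, size_t n)
{
    size_t len = 0;
    for (; *src != '\0'; src++) {
        if (!isalnum((unsigned char)*src))
            continue;
        if (len + 1 < n)
            dst[len] = *src;
        len++;
    }
    if (n > 0)
        dst[len < n ? len : n - 1] = '\0';
    return len;
}
```

Here toy_coll("11", "1.1", true) is negative because of the length tiebreak. The toy_xfrm keys for both strings are "11", however, and strcmp reports them equal. That mismatch is why the fallback cannot live in strcoll itself. It is tolerable only for callers that never use strxfrm, which is the case the proposed `#define` targets.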