From: cgf AT cygnus DOT com (Christopher Faylor)
Subject: Re: strcasecmp revisited
30 Nov 1998 12:55:39 -0800
Message-ID: <19981130152146.A16484.cygnus.cygwin32.developers@cygnus.com>
References: <19981130132257 DOT B15656 AT cygnus DOT com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: Larry Hall , cygwin32-developers AT cygnus DOT com

On Mon, Nov 30, 1998 at 02:42:43PM -0500, Larry Hall wrote:
>At 01:22 PM 11/30/98 -0500, Christopher Faylor wrote:
>>Someone on the gnu-win32 mailing list noted that strcasecmp does not return
>>"the right thing" when comparing "a" to "_". Instead of returning the
>>difference between "_" and "a", it returns the difference between "_" and "A".
>>
>>I was just about to check in a fix for this behavior when it occurred to
>>me that I should check the Single UNIX Specification to see how they say
>>this should be handled. Here's what they say:
>>
>>    The strcasecmp() function compares, while ignoring differences in case,
>>    the string pointed to by s1 to the string pointed to by s2. The
>>    strncasecmp() function compares, while ignoring differences in case, not
>>    more than n bytes from the string pointed to by s1 to the string pointed
>>    to by s2.
>>
>>    In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower
>>    conversions, then a byte comparison. The results are unspecified in
>>    other locales.
>>
>>The newlib strcasecmp does a toupper on the string and ignores locales
>>but, except for that, it seems to be complying with the spirit of the
>>above paragraphs.
>>
>>My change detected the case where a non-alpha was being compared to an
>>alpha and avoided doing a toupper in that case. I'm wondering if this
>>is the correct thing to do given the above description?
>>
>>Does anybody have any opinions?
>
>Here's mine. The excerpt you've given states that there is a specific way
>to do the conversion for POSIX locales and an unspecified way to do it for
>other locales. If newlib ignores the locale, that means to me that it does
>the same kind of comparison regardless of the locale. If so, in order to
>be compliant with the statement above, it would need to do one of 2 things:
>
>  1. Perform a POSIX locale-style comparison for all locales.

That is almost what is happening now except for the tolower case.

Hmm. I just checked this behavior on Linux and it appears that I was
just misinterpreting what was going on here. Linux apparently does
compare a non-alpha character to the lowercased character. That's all
it does. So, my change was actually wrong.

>  2. Perform a POSIX locale-style comparison for POSIX locales and any other
>     kind of comparison for any other locale.

Eventually it would be nice to be fully POSIX compliant and to understand
locales.

>If you still agree with my reasoning, it seems to me that what you've done
>already fits the latter half of my statement (2) above. It doesn't address
>the first half nor does it address statement (1). To me, it seems like (1)
>is the preferable and easiest thing to do from an implementational
>perspective, although filling out the implementation for (2) would leverage
>what's already there and shouldn't be much harder.

The "right thing" seems to be to change the occurrences of toupper in
strcasecmp to tolower. That should make the function slightly more
compliant.
--
cgf AT cygnus DOT com
http://www.cygnus.com/
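For illustration, the difference between folding with toupper and folding with tolower can be sketched as below. This is not the newlib source: the function names sketch_casecmp_lower and sketch_casecmp_upper are hypothetical, and the sketch assumes an ASCII character set and the default "C"/POSIX locale, where 'A' (65) < '_' (95) < 'a' (97).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: fold both strings to lower case, then compare
 * bytes, as the quoted specification describes for the POSIX locale. */
int sketch_casecmp_lower(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    int c1, c2;

    assert(s1 != NULL && s2 != NULL);

    do {
        /* Casting through unsigned char keeps the argument in the
         * range that tolower() is defined for. */
        c1 = tolower(*p1++);
        c2 = tolower(*p2++);
    } while (c1 != '\0' && c1 == c2);

    return c1 - c2;
}

/* Hypothetical sketch of the toupper-based folding the message says
 * newlib used at the time; identical except for the folding direction. */
int sketch_casecmp_upper(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    int c1, c2;

    assert(s1 != NULL && s2 != NULL);

    do {
        c1 = toupper(*p1++);
        c2 = toupper(*p2++);
    } while (c1 != '\0' && c1 == c2);

    return c1 - c2;
}
```

Under those assumptions, sketch_casecmp_lower("a", "_") is positive ('a' minus '_'), while sketch_casecmp_upper("a", "_") is negative ('A' minus '_'). Strings that differ only in case compare equal either way; the folding direction matters only for characters such as '_' that sit between the upper-case and lower-case letters.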