From: Brian Inglis
Newsgroups: comp.os.msdos.djgpp
Subject: Re: fnmatch("\\\\", "\\", 0) == 1 ???
Organization: Systematic Software
References: <456dad03$0$486$cc7c7865 AT news DOT luth DOT se> <200611291634 DOT kATGYcbw010800 AT envy DOT delorie DOT com>
Date: Sat, 02 Dec 2006 08:33:03 GMT
To: djgpp AT delorie DOT com
Reply-To: djgpp AT delorie DOT com

On Fri, 1 Dec 2006 03:14:01 -0800 in comp.os.msdos.djgpp, "Alexei A.
Frounze" wrote:

>Alexei A. Frounze wrote:
>> Alexei A. Frounze wrote:
>>> Brian Inglis wrote:
>>>> On Wed, 29 Nov 2006 11:34:38 -0500 in comp.os.msdos.djgpp, DJ
>>>> Delorie wrote:
>>>>
>>>>>
>>>>>> This function indicates if STRING matches the PATTERN. ..."
>>>>>>
>>>>>> So DJGPP says that "\" doesn't match "\\" while Linux says it
>>>>>> does. Well, I say DJGPP is right as the pattern says there should be
>>>>>> two backslashes and you only provide one.
>>>>>
>>>>> Except that PATTERN is a regex influenced by FNM_NOESCAPE and
>>>>> FNM_PATHNAME, and STRING isn't. So a pattern of "\\" is a single
>>>>> escaped backslash, whereas a string of "\" is a single backslash.
>>>>> They should match.
>>>>
>>>>      switch ((c = *pattern++))
>>>>      {
>>>>      ...
>>>>      ...
>>>>      ...
>>>>      case '\\':
>>>> /*+++ pattern already post-incremented to point to next char */
>>>>        if (!(flags & FNM_NOESCAPE) && pattern[1] && strchr("*?[\\", pattern[1]))
>>>> /*+++ should be:
>>>>        if (!(flags & FNM_NOESCAPE) && strchr("*?[\\", *pattern))
>>>>  *+++ as end of input pattern will match end char in escapes string
>>>>  */   {
>>>> /*+++ end of input pattern might be clearer with ! or == '\0' */
>>>>          if ((c = *pattern++) == 0)
>>>>          {
>>>>            c = '\\';
>>>>            --pattern;
>>>>          }
>>>>          if (c != *string++)
>>>>            return FNM_NOMATCH;
>>>>          break;
>>>>        }
>>>
>>> I don't think the above is enough. There's another problem. With the
>>> above code you'd never see (c = *pattern++) == 0. My bet is that the
>>> intent was to treat the slash in the last character of pattern as an
>>> ordinary character. That would explain the {c = '\\'; --pattern;}
>>> thing along with the fallthrough behavior. But the code is broken in
>>> this place. Dunno if it was tested against the single unix spec or
>>> just a little bit to see that it seems to work (in some basic cases).
>>
>> One more thing to consider, closing bracket as first character in the
>> list/range inside the bracket expression:
>> fnmatch("[]]", "]", 0) must return 0, doesn't
>> fnmatch("[!]]", "]", 0) must return 1, does (by luck)
>> fnmatch("[!]]", "a", 0) must return 0, doesn't
>>
>> And one more, which seems to be partially wrong even in that same RH9
>> linux distro:
>> fnmatch("\ab\c","abc",0) must return 0, does in linux, doesn't in DJGPP
>> fnmatch("\[abc\]","[abc]",0) must return 0, doesn't in both linux and DJGPP
>> -- the spec doesn't make an exception for the escaped opening
>> bracket, \[ when describes the escaping option. Dunno, maybe it would
>> be wise not to escape the bracket, but other characters should be
>> made escapable w/o a problem.
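For comparison, here is a minimal sketch in C of escape and bracket handling written directly from the SUSv3 text quoted below. It is not DJGPP's code and not a drop-in replacement: sketch_fnmatch() and match_bracket() are hypothetical names, only FNM_NOMATCH and FNM_NOESCAPE are taken from <fnmatch.h>, and FNM_PATHNAME, FNM_PERIOD and [:class:] expressions are deliberately left out. It assumes that an unescaped backslash escapes any following character, that a lone trailing backslash is an ordinary character, and that a ']' (or '-') first in a bracket list, after an optional '!', is a literal member.

```c
#include <fnmatch.h>   /* FNM_NOMATCH, FNM_NOESCAPE (POSIX) */

/* Hypothetical helper: match ch against the bracket expression whose body
 * starts at p (just after the '[').  Returns 1 if ch is in the set, 0 if
 * not, and stores the position after the closing ']' in *end.  Returns -1
 * if there is no closing ']', so the caller can treat '[' as ordinary.
 * Simplifications in this sketch: no [:class:] expressions, and a
 * backslash inside brackets is just a literal character. */
static int match_bracket(const char *p, unsigned char ch, const char **end)
{
    int negate = 0, found = 0;

    if (*p == '!') {                 /* "[!...]" is a non-matching list */
        negate = 1;
        p++;
    }
    /* do/while: the first character is taken as a member even if it is
     * ']', which gives "[]...]" and "[!]...]" their literal ']'. */
    do {
        if (*p == '\0')
            return -1;               /* no closing ']' */
        unsigned char lo = (unsigned char)*p, hi = lo;
        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            hi = (unsigned char)p[2];    /* range "lo-hi" */
            p += 3;
        } else {
            p++;                     /* single char; '-' first or last is literal */
        }
        if (lo <= ch && ch <= hi)
            found = 1;
    } while (*p != ']');
    *end = p + 1;
    return found != negate;
}

/* Hypothetical matcher: returns 0 on a match and FNM_NOMATCH otherwise,
 * like fnmatch().  Only FNM_NOESCAPE is honoured in this sketch. */
int sketch_fnmatch(const char *p, const char *s, int flags)
{
    for (;;) {
        char c = *p++;

        switch (c) {
        case '\0':                   /* end of pattern: string must end too */
            return *s == '\0' ? 0 : FNM_NOMATCH;
        case '?':                    /* any single character */
            if (*s == '\0')
                return FNM_NOMATCH;
            s++;
            break;
        case '*':                    /* any run of characters, even empty */
            while (*p == '*')
                p++;
            for (;;) {               /* try the rest at every suffix of s */
                if (sketch_fnmatch(p, s, flags) == 0)
                    return 0;
                if (*s == '\0')
                    return FNM_NOMATCH;
                s++;
            }
        case '[': {
            const char *end;
            int r;

            if (*s == '\0')
                return FNM_NOMATCH;
            r = match_bracket(p, (unsigned char)*s, &end);
            if (r < 0) {             /* unterminated: '[' is ordinary */
                if (*s != '[')
                    return FNM_NOMATCH;
            } else if (r == 0) {
                return FNM_NOMATCH;
            } else {
                p = end;
            }
            s++;
            break;
        }
        case '\\':
            /* Unless FNM_NOESCAPE is set, "\x" matches a literal x for ANY
             * x, not only for *?[\.  A lone trailing backslash is taken
             * literally (the intent guessed at in the quote above). */
            if (!(flags & FNM_NOESCAPE) && *p != '\0')
                c = *p++;
            /* fall through */
        default:                     /* ordinary character */
            if (c != *s)
                return FNM_NOMATCH;
            s++;
            break;
        }
    }
}
```

With this sketch the cases from this thread come out as the posters expect. In C source the backslashes must be doubled, so sketch_fnmatch("*\\a", "a", 0) and sketch_fnmatch("[]]", "]", 0) return 0, while sketch_fnmatch("[!]]", "]", 0) and sketch_fnmatch("\\[a]", "a", 0) return FNM_NOMATCH. The recursive '*' handling keeps the sketch short but can take exponential time on patterns with many stars.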
>
>Actually, I was wrong about linux failing fnmatch("\[abc\]","[abc]",0)==0 --
>I didn't put quotation marks around arguments to fnmatch that were passed to
>it from the command line and therefore fnmatch wasn't comparing the same
>thing (shell stripped some stuff). So, the above two things are only wrong
>in DJGPP.
>
>> Alex
>> P.S. all the details obtained from fnmatch()'s description in The
>> Single Unix Specification V3 2004 issue 6.

N.B. details from SUSV3 http://unix.org

"2.13.3 Patterns Used for Filename Expansion

The rules described so far in Patterns Matching a Single Character and
Patterns Matching Multiple Characters are qualified by the following rules
that apply when pattern matching notation is used for filename expansion:

 1. The slash character in a pathname shall be explicitly matched by using
    one or more slashes in the pattern; it shall neither be matched by the
    asterisk or question-mark special characters nor by a bracket
    expression. Slashes in the pattern shall be identified before bracket
    expressions; thus, a slash cannot be included in a pattern bracket
    expression used for filename expansion. If a slash character is found
    following an unescaped open square bracket character before a
    corresponding closing square bracket is found, the open bracket shall
    be treated as an ordinary character. For example, the pattern "a[b/c]d"
    does not match such pathnames as abd or a/d. It only matches a pathname
    of literally a[b/c]d.

 2. If a filename begins with a period ( '.' ), the period shall be
    explicitly matched by using a period as the first character of the
    pattern or immediately following a slash character.
    The leading period shall not be matched by:

     * The asterisk or question-mark special characters

     * A bracket expression containing a non-matching list, such as
       "[!a]", a range expression, such as "[%-0]", or a character class
       expression, such as "[[:punct:]]"

    It is unspecified whether an explicit period in a bracket expression
    matching list, such as "[.abc]", can match a leading period in a
    filename."

"The flags argument shall modify the interpretation of pattern and string.
It is the bitwise-inclusive OR of zero or more of the flags defined in .
If the FNM_PATHNAME flag is set in flags, then a slash character ( '/' )
in string shall be explicitly matched by a slash in pattern; it shall not
be matched by either the asterisk or question-mark special characters, nor
by a bracket expression. If the FNM_PATHNAME flag is not set, the slash
character shall be treated as an ordinary character.

If FNM_NOESCAPE is not set in flags, a backslash character ( '\' ) in
pattern followed by any other character shall match that second character
in string. In particular, "\\" shall match a backslash in string. If
FNM_NOESCAPE is set, a backslash character shall be treated as an ordinary
character.

If FNM_PERIOD is set in flags, then a leading period ( '.' ) in string
shall match a period in pattern; as described by rule 2 in the Shell and
Utilities volume of IEEE Std 1003.1-2001, Section 2.13.3, Patterns Used
for Filename Expansion where the location of "leading" is indicated by the
value of FNM_PATHNAME:

 * If FNM_PATHNAME is set, a period is "leading" if it is the first
   character in string or if it immediately follows a slash.

 * If FNM_PATHNAME is not set, a period is "leading" only if it is the
   first character of string.

If FNM_PERIOD is not set, then no special restrictions are placed on
matching a period."

>> P.P.S.
>> of course there's a ton of what DJGPP's fnmatch() doesn't
>> support, but the above things are pretty basic and it would be nice
>> to have them handled properly, unless I overlook some major
>> DOS-related issue for which it would be desirable to deviate from the
>> spec.
>
>Another couple of examples revealing incorrect behavior of fnmatch() in
>DJGPP:
>fnmatch("*\a", "a", 0) must return 0, doesn't
>fnmatch("\[a]", "[a]", 0) must return 0, doesn't
>fnmatch("\[a]", "a", 0) must return 1, does
>
>So, as I understand it, the fnmatch() code flaws are:
>1. rangematch() doesn't allow for the following two patterns: []...] and
>[!]...] where ] is a valid char in the range

ISTM that the [-...] and [!-...] cases, where a leading - is treated as a
literal character, aren't handled either.

>2. asterisk handling doesn't distinguish in the following the various
>options for c and what follows c:
>      else if (isslash(c) && flags & FNM_PATHNAME)
>      {
>        if ((string = find_slash(string)) == NULL)
>          return FNM_NOMATCH;
>        break;
>      }
>a) c=='/' // forward slash
>b) c=='\', (flags & FNM_NOESCAPE)!=0 // back slash
>c) c=='\', (flags & FNM_NOESCAPE)==0, isslash(pattern[1])==1 // any escaped slash
>3. '\\' handling is completely broken (wrong indices and logic). If escaping
>is on, it must fall through the case to default if '\\' is followed by
>anything from "\\?*[" to make sure those aren't interpreted as special chars
>again but instead are interpreted as ordinary chars. For all other chars
>(including '\0') it's better to break out from the case/switch to treat
>those chars as ordinary by the other existing cases.

Multiple consecutive slashes in the pattern should also match a single
slash, as rule 1 above requires ("one or more slashes").

Leading period handling (FNM_PERIOD) does not seem to be dealt with
either, nor is the DOS "_" equivalent: perhaps that should be handled
similarly to slash and backslash, with a leading "_" treated the same as a
leading period only when both FNM_NOESCAPE and FNM_PERIOD are specified.

-- 
Thanks.
Take care, Brian Inglis    Calgary, Alberta, Canada

Brian DOT Inglis AT CSi DOT com    (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
    fake address           use address above to reply