Mail Archives: djgpp/2006/12/02/03:47:31

X-Authentication-Warning: delorie.com: mail set sender to djgpp-bounces using -f
X-Trace-PostClient-IP: 68.147.232.190
From: Brian Inglis <Brian DOT Inglis AT SystematicSW DOT Invalid>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: fnmatch("\\\\", "\\", 0) == 1 ???
Organization: Systematic Software
Message-ID: <b9c2n2lgp236c6fu7qlpru9qk71b0ht9mr@4ax.com>
References: <iKydndAovO98z_DYnZ2dnUVZ_sadnZ2d AT comcast DOT com> <LO6dnbePOLnKzvDYnZ2dnUVZ_sGdnZ2d AT comcast DOT com> <456dad03$0$486$cc7c7865 AT news DOT luth DOT se> <200611291634 DOT kATGYcbw010800 AT envy DOT delorie DOT com> <n6qsm2l2pk3svjtu7aqu68mmhdmvldprk5 AT 4ax DOT com> <Sb6dnSX2_Plh5_PYnZ2dnUVZ_rSdnZ2d AT comcast DOT com> <IZWdndsG8_TvCfPYnZ2dnUVZ_rGdnZ2d AT comcast DOT com> <S6idnQbIIM9ck-3YnZ2dnUVZ_uSdnZ2d AT comcast DOT com>
X-Newsreader: Forte Agent 1.93/32.576 English (American)
MIME-Version: 1.0
Lines: 192
Date: Sat, 02 Dec 2006 08:33:03 GMT
NNTP-Posting-Host: 64.59.135.176
X-Complaints-To: abuse AT shaw DOT ca
X-Trace: pd7urf1no 1165048383 64.59.135.176 (Sat, 02 Dec 2006 01:33:03 MST)
NNTP-Posting-Date: Sat, 02 Dec 2006 01:33:03 MST
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

On Fri, 1 Dec 2006 03:14:01 -0800 in comp.os.msdos.djgpp, "Alexei A.
Frounze" <alexfru AT chat DOT ru> wrote:

>Alexei A. Frounze wrote:
>> Alexei A. Frounze wrote:
>>> Brian Inglis wrote:
>>>> On Wed, 29 Nov 2006 11:34:38 -0500 in comp.os.msdos.djgpp, DJ
>>>> Delorie <dj AT delorie DOT com> wrote:
>>>>
>>>>>
>>>>>> This function indicates if STRING matches the PATTERN. ..."
>>>>>>
>>>>>> So DJGPP says that "\" doesn't match "\\" while Linux says it
>>>>>> does. Well, I say DJGPP is right as the pattern says there should be
>>>>>> two backslashes and you only provide one.
>>>>>
>>>>> Except that PATTERN is a regex influenced by FNM_NOESCAPE and
>>>>> FNM_PATHNAME, and STRING isn't.  So a pattern of "\\" is a single
>>>>> escaped backslash, whereas a string of "\" is a single backslash.
>>>>> They should match.
>>>>
>>>>    switch ((c = *pattern++))
>>>>    {
>>>> ...
>>>> ...
>>>> ...
>>>>    case '\\':
>>>> /*+++ pattern already post-incremented to point to next char */
>>>>      if (!(flags & FNM_NOESCAPE) && pattern[1] && strchr("*?[\\", pattern[1]))
>>>> /*+++ should be:
>>>>      if (!(flags & FNM_NOESCAPE) && strchr("*?[\\", *pattern))
>>>> *+++ as end of input pattern will match end char in escapes string
>>>>      */
>>>>      {
>>>> /*+++ end of input pattern might be clearer with ! or == '\0' */
>>>>        if ((c = *pattern++) == 0)
>>>>        {
>>>>          c = '\\';
>>>>          --pattern;
>>>>        }
>>>>        if (c != *string++)
>>>>          return FNM_NOMATCH;
>>>>        break;
>>>>      }
>>>
>>> I don't think the above is enough. There's another problem. With the
>>> above code you'd never see (c = *pattern++) == 0. My bet is that the
>>> intent was to treat the slash in the last character of pattern as an
>>> ordinary character. That would explain the {c = '\\'; --pattern;}
>>> thing along with the fallthrough behavior. But the code is broken in
>>> this place. Dunno if it was tested against the single unix spec or
>>> just a little bit to see that it seems to work (in some basic cases).
>>
>> One more thing to consider, closing bracket as first character in the
>> list/range inside the bracket expression:
>> fnmatch("[]]", "]", 0) must return 0, doesn't
>> fnmatch("[!]]", "]", 0) must return 1, does (by luck)
>> fnmatch("[!]]", "a", 0) must return 0, doesn't
>>
>> And one more, which seems to be partially wrong even in that same RH9
>> linux distro:
>> fnmatch("\ab\c","abc",0) must return 0, does in linux, doesn't in DJGPP
>> fnmatch("\[abc\]","[abc]",0) must return 0, doesn't in both linux and
>> DJGPP -- the spec doesn't make an exception for the escaped opening
>> bracket, \[, when it describes the escaping option. Dunno, maybe it would
>> be wise not to escape the bracket, but other characters should be
>> made escapable w/o a problem.
>
>Actually, I was wrong about linux failing fnmatch("\[abc\]","[abc]",0)==0 --  
>I didn't put quotation marks around arguments to fnmatch that were passed to 
>it from the command line and therefore fnmatch wasn't comparing the same 
>thing (shell stripped some stuff). So, the above two things are only wrong 
>in DJGPP.
>
>> Alex
>> P.S. all the details obtained from fnmatch()'s description in The
>> Single Unix Specification V3 2004 issue 6.

N.B. details from SUSV3 http://unix.org

"2.13.3 Patterns Used for Filename Expansion

The rules described so far in Patterns Matching a Single Character and
Patterns Matching Multiple Characters are qualified by the following
rules that apply when pattern matching notation is used for filename
expansion:

1. The slash character in a pathname shall be explicitly matched by
using one or more slashes in the pattern; it shall neither be matched
by the asterisk or question-mark special characters nor by a bracket
expression. Slashes in the pattern shall be identified before bracket
expressions; thus, a slash cannot be included in a pattern bracket
expression used for filename expansion. If a slash character is found
following an unescaped open square bracket character before a
corresponding closing square bracket is found, the open bracket shall
be treated as an ordinary character. For example, the pattern
"a[b/c]d" does not match such pathnames as abd or a/d. It only matches
a pathname of literally a[b/c]d.

2. If a filename begins with a period ( '.' ), the period shall be
explicitly matched by using a period as the first character of the
pattern or immediately following a slash character. The leading period
shall not be matched by:
   * The asterisk or question-mark special characters
   * A bracket expression containing a non-matching list, such as
"[!a]", a range expression, such as "[%-0]", or a character class
expression, such as "[[:punct:]]"
   It is unspecified whether an explicit period in a bracket
expression matching list, such as "[.abc]", can match a leading period
in a filename."

"The flags argument shall modify the interpretation of pattern and
string. It is the bitwise-inclusive OR of zero or more of the flags
defined in <fnmatch.h>. If the FNM_PATHNAME flag is set in flags, then
a slash character ( '/' ) in string shall be explicitly matched by a
slash in pattern; it shall not be matched by either the asterisk or
question-mark special characters, nor by a bracket expression. If the
FNM_PATHNAME flag is not set, the slash character shall be treated as
an ordinary character.

If FNM_NOESCAPE is not set in flags, a backslash character ( '\' ) in
pattern followed by any other character shall match that second
character in string. In particular, "\\" shall match a backslash in
string. If FNM_NOESCAPE is set, a backslash character shall be treated
as an ordinary character.

If FNM_PERIOD is set in flags, then a leading period ( '.' ) in string
shall match a period in pattern; as described by rule 2 in the Shell
and Utilities volume of IEEE Std 1003.1-2001, Section 2.13.3, Patterns
Used for Filename Expansion where the location of "leading" is
indicated by the value of FNM_PATHNAME:

 * If FNM_PATHNAME is set, a period is "leading" if it is the first
character in string or if it immediately follows a slash.
 * If FNM_PATHNAME is not set, a period is "leading" only if it is the
first character of string.

If FNM_PERIOD is not set, then no special restrictions are placed on
matching a period."
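
A minimal sketch of the FNM_NOESCAPE paragraph quoted above, restricted to
literal characters and backslash escapes (no '*', '?' or '[' handling). The
names sk_match_literal, SK_NOESCAPE and SK_NOMATCH are hypothetical and exist
only for this illustration; this is not the DJGPP source. Treating a trailing
lone backslash as an ordinary character is an assumption, since the quoted
text does not cover that case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for FNM_NOESCAPE and FNM_NOMATCH, sketch only. */
#define SK_NOESCAPE 1
#define SK_NOMATCH  1

/* Match string against a pattern made of literal characters and
   backslash escapes only. Returns 0 on match, SK_NOMATCH otherwise. */
int sk_match_literal(const char *pattern, const char *string, int flags)
{
    assert(pattern != NULL && string != NULL);
    while (*pattern != '\0') {
        char c = *pattern++;
        if (c == '\\' && !(flags & SK_NOESCAPE) && *pattern != '\0') {
            /* Backslash followed by any other character:
               the second character is what must appear in string. */
            c = *pattern++;
        }
        /* Otherwise c is ordinary. That includes a backslash under
           SK_NOESCAPE and (by assumption) a trailing lone backslash. */
        if (c != *string++)
            return SK_NOMATCH;
    }
    /* Pattern exhausted: match only if string is exhausted too. */
    return *string == '\0' ? 0 : SK_NOMATCH;
}
```

With this sketch, the call from the subject line, written as C source
sk_match_literal("\\\\", "\\", 0), compares an escaped backslash in the
pattern against a single backslash in the string.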

>> P.P.S. of course there's a ton of what DJGPP's fnmatch() doesn't
>> support, but the above things are pretty basic and it would be nice
>> to have them handled properly, unless I overlook some major
>> DOS-related issue for which it would be desirable to deviate from the
>> spec.
>
>Another couple of examples revealing incorrect behavior of fnmatch() in 
>DJGPP:
>fnmatch("*\a", "a", 0) must return 0, doesn't
>fnmatch("\[a]", "[a]", 0) must return 0, doesn't
>fnmatch("\[a]", "a", 0) must return 1, does
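
The cases listed so far in this thread can be collected into a small
regression table and run against whatever fnmatch() the C library provides.
This is only a sketch: the struct and function names are hypothetical, the
patterns are written with C string escaping, and it assumes that "must
return 1" in the posts means "must return a nonzero no-match result".

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <stdio.h>

/* One expectation: should pattern match string with flags == 0? */
struct fn_case {
    const char *pattern;
    const char *string;
    int want_match;
};

/* Cases taken from the posts in this thread. */
static const struct fn_case cases[] = {
    { "\\\\",      "\\",    1 },
    { "[]]",       "]",     1 },
    { "[!]]",      "]",     0 },
    { "[!]]",      "a",     1 },
    { "\\ab\\c",   "abc",   1 },
    { "\\[abc\\]", "[abc]", 1 },
    { "*\\a",      "a",     1 },
    { "\\[a]",     "[a]",   1 },
    { "\\[a]",     "a",     0 },
};

/* Run every case through the library fnmatch(), print each
   disagreement, and return the number of disagreements. */
int count_fnmatch_failures(void)
{
    size_t i;
    int failures = 0;

    for (i = 0; i < sizeof cases / sizeof cases[0]; ++i) {
        int matched;
        assert(cases[i].pattern != NULL && cases[i].string != NULL);
        matched = (fnmatch(cases[i].pattern, cases[i].string, 0) == 0);
        if (matched != cases[i].want_match) {
            printf("FAIL: pattern <%s> string <%s>: wanted %s\n",
                   cases[i].pattern, cases[i].string,
                   cases[i].want_match ? "match" : "no match");
            ++failures;
        }
    }
    return failures;
}
```

Calling count_fnmatch_failures() from main() and returning its result as the
exit status makes it easy to compare two C libraries on the same table.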
>
>So, as I understand it, the fnmatch() code flaws are:
>1. rangematch() doesn't allow for the following two patterns: []...] and 
>[!]...] where ] is a valid char in the range

ISTM that the [- and [!- cases where - is treated as a literal
character aren't handled either. 
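
A rough sketch of bracket-list matching that covers both points: a ']' right
after '[' or '[!' is a list member, and a '-' that is first or last in the
list is a literal. The function sk_bracket_matches is hypothetical and is not
DJGPP's rangematch(); it ignores character classes, FNM_PATHNAME and escapes
inside the brackets, and it compares range endpoints by plain character code.

```c
#include <assert.h>
#include <stddef.h>

/* list points just past the opening '['. Returns 1 if test is matched
   by the bracket expression, 0 if not, -1 if there is no closing ']'. */
int sk_bracket_matches(const char *list, char test)
{
    const char *p = list;
    int negate, ok = 0, first = 1;

    assert(list != NULL);
    negate = (*p == '!');
    if (negate)
        ++p;

    for (;; first = 0) {
        char lo = *p++;
        if (lo == '\0')
            return -1;              /* unterminated bracket expression */
        if (lo == ']' && !first)
            break;                  /* closing bracket */
        /* '-' forms a range only when another list member follows it;
           a '-' directly before the closing ']' stays literal. */
        if (p[0] == '-' && p[1] != ']' && p[1] != '\0') {
            char hi = p[1];
            p += 2;
            if ((unsigned char)lo <= (unsigned char)test &&
                (unsigned char)test <= (unsigned char)hi)
                ok = 1;
        } else if (lo == test) {
            ok = 1;
        }
    }
    return ok != negate;
}
```

For the pattern "[!]]" the caller would pass "!]]" as list, so the first ']'
is consumed as a member and the second one closes the expression.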

>2. asterisk handling in the following code doesn't distinguish between
>the various options for c and what follows c:
>      else if (isslash(c) && flags & FNM_PATHNAME)
>      {
>        if ((string = find_slash(string)) == NULL)
>          return FNM_NOMATCH;
>        break;
>      }
>a) c=='/' // forward slash
>b) c=='\', (flags & FNM_NOESCAPE)!=0 // back slash
>c) c=='\', (flags & FNM_NOESCAPE)==0, isslash(pattern[1])==1 // any escaped slash
>3. '\\' handling is completely broken (wrong indices and logic). If escaping 
>is on, it must fall through the case to default if '\\' is followed by 
>anything from "\\?*[" to make sure those aren't interpreted as special chars 
>again but instead are interpreted as ordinary chars. For all other chars 
>(including '\0') it's better to break out from the case/switch to treat 
>those chars as ordinary by the other existing cases.

Multiple slashes should also be treated the same as a single slash. 

Leading period handling does not seem to be dealt with either, nor the
DOS _ equivalent: perhaps this should be handled similarly to slash and
backslash, only treated the same as period when FNM_NOESCAPE and
FNM_PERIOD are specified. 
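
For the plain period case, the SUSv3 definition of "leading" quoted earlier
could be sketched roughly as below. The helper name sk_is_leading_period is
hypothetical; FNM_PERIOD and FNM_PATHNAME are the standard <fnmatch.h> flags.
The DOS _ equivalent is deliberately left out, since how it should behave is
still an open question here.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Returns 1 if string[pos] is a "leading" period that must be matched
   explicitly by a period in the pattern, 0 otherwise. */
int sk_is_leading_period(const char *string, size_t pos, int flags)
{
    assert(string != NULL);
    /* Without FNM_PERIOD no special restrictions apply to periods. */
    if (!(flags & FNM_PERIOD) || string[pos] != '.')
        return 0;
    /* The first character of string is always a leading position. */
    if (pos == 0)
        return 1;
    /* With FNM_PATHNAME, a period right after a slash is leading too. */
    return (flags & FNM_PATHNAME) && string[pos - 1] == '/';
}
```

A matcher would consult this before letting '*', '?' or a bracket expression
consume the character at that position.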

-- 
Thanks. Take care, Brian Inglis 	Calgary, Alberta, Canada

Brian DOT Inglis AT CSi DOT com 	(Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
    fake address		use address above to reply
