Mail Archives: djgpp/2006/12/02/03:47:31
On Fri, 1 Dec 2006 03:14:01 -0800 in comp.os.msdos.djgpp, "Alexei A.
Frounze" <alexfru AT chat DOT ru> wrote:
>Alexei A. Frounze wrote:
>> Alexei A. Frounze wrote:
>>> Brian Inglis wrote:
>>>> On Wed, 29 Nov 2006 11:34:38 -0500 in comp.os.msdos.djgpp, DJ
>>>> Delorie <dj AT delorie DOT com> wrote:
>>>>
>>>>>
>>>>>> This function indicates if STRING matches the PATTERN. ..."
>>>>>>
>>>>>> So DJGPP says that "\" doesn't match "\\" while Linux says it
>>>>>> does. Well, I say DJGPP is right as the pattern says there should
>>>>>> be two backslashes and you only provide one.
>>>>>
>>>>> Except that PATTERN is a regex influenced by FNM_NOESCAPE and
>>>>> FNM_PATHNAME, and STRING isn't. So a pattern of "\\" is a single
>>>>> escaped backslash, whereas a string of "\" is a single backslash.
>>>>> They should match.
>>>>
>>>> switch ((c = *pattern++))
>>>> {
>>>> ...
>>>> ...
>>>> ...
>>>> case '\\':
>>>> /*+++ pattern already post-incremented to point to next char */
>>>> if (!(flags & FNM_NOESCAPE) && pattern[1] && strchr("*?[\\",
>>>> pattern[1]))
>>>> /*+++ should be:
>>>> if (!(flags & FNM_NOESCAPE) && strchr("*?[\\", *pattern))
>>>> *+++ as end of input pattern will match end char in escapes string */
>>>> {
>>>> /*+++ end of input pattern might be clearer with ! or == '\0' */
>>>> if ((c = *pattern++) == 0)
>>>> {
>>>> c = '\\';
>>>> --pattern;
>>>> }
>>>> if (c != *string++)
>>>> return FNM_NOMATCH;
>>>> break;
>>>> }
>>>
>>> I don't think the above is enough. There's another problem. With the
>>> above code you'd never see (c = *pattern++) == 0. My bet is that the
>>> intent was to treat the slash in the last character of pattern as an
>>> ordinary character. That would explain the {c = '\\'; --pattern;}
>>> thing along with the fallthrough behavior. But the code is broken in
>>> this place. Dunno if it was tested against the single unix spec or
>>> just a little bit to see that it seems to work (in some basic cases).
>>
>> One more thing to consider, closing bracket as first character in the
>> list/range inside the bracket expression:
>> fnmatch("[]]", "]", 0) must return 0, doesn't
>> fnmatch("[!]]", "]", 0) must return 1, does (by luck)
>> fnmatch("[!]]", "a", 0) must return 0, doesn't
>>
>> And one more, which seems to be partially wrong even in that same RH9
>> linux distro:
>> fnmatch("\ab\c","abc",0) must return 0, does in linux, doesn't in DJGPP
>> fnmatch("\[abc\]","[abc]",0) must return 0, doesn't in both
>> linux and DJGPP -- the spec doesn't make an exception for the escaped
>> opening bracket, \[, when it describes the escaping option. Dunno,
>> maybe it would be wise not to escape the bracket, but other characters
>> should be made escapable w/o a problem.
>
>Actually, I was wrong about linux failing fnmatch("\[abc\]","[abc]",0)==0 --
>I didn't put quotation marks around arguments to fnmatch that were passed to
>it from the command line and therefore fnmatch wasn't comparing the same
>thing (shell stripped some stuff). So, the above two things are only wrong
>in DJGPP.
>
>> Alex
>> P.S. all the details obtained from fnmatch()'s description in The
>> Single Unix Specification V3 2004 issue 6.
N.B. details from SUSV3 http://unix.org
"2.13.3 Patterns Used for Filename Expansion
The rules described so far in Patterns Matching a Single Character and
Patterns Matching Multiple Characters are qualified by the following
rules that apply when pattern matching notation is used for filename
expansion:
1. The slash character in a pathname shall be explicitly matched by
using one or more slashes in the pattern; it shall neither be matched
by the asterisk or question-mark special characters nor by a bracket
expression. Slashes in the pattern shall be identified before bracket
expressions; thus, a slash cannot be included in a pattern bracket
expression used for filename expansion. If a slash character is found
following an unescaped open square bracket character before a
corresponding closing square bracket is found, the open bracket shall
be treated as an ordinary character. For example, the pattern
"a[b/c]d" does not match such pathnames as abd or a/d. It only matches
a pathname of literally a[b/c]d.
2. If a filename begins with a period ( '.' ), the period shall be
explicitly matched by using a period as the first character of the
pattern or immediately following a slash character. The leading period
shall not be matched by:
* The asterisk or question-mark special characters
* A bracket expression containing a non-matching list, such as
"[!a]", a range expression, such as "[%-0]", or a character class
expression, such as "[[:punct:]]"
It is unspecified whether an explicit period in a bracket
expression matching list, such as "[.abc]", can match a leading period
in a filename."
"The flags argument shall modify the interpretation of pattern and
string. It is the bitwise-inclusive OR of zero or more of the flags
defined in <fnmatch.h>. If the FNM_PATHNAME flag is set in flags, then
a slash character ( '/' ) in string shall be explicitly matched by a
slash in pattern; it shall not be matched by either the asterisk or
question-mark special characters, nor by a bracket expression. If the
FNM_PATHNAME flag is not set, the slash character shall be treated as
an ordinary character.
If FNM_NOESCAPE is not set in flags, a backslash character ( '\' ) in
pattern followed by any other character shall match that second
character in string. In particular, "\\" shall match a backslash in
string. If FNM_NOESCAPE is set, a backslash character shall be treated
as an ordinary character.
If FNM_PERIOD is set in flags, then a leading period ( '.' ) in string
shall match a period in pattern; as described by rule 2 in the Shell
and Utilities volume of IEEE Std 1003.1-2001, Section 2.13.3, Patterns
Used for Filename Expansion where the location of "leading" is
indicated by the value of FNM_PATHNAME:
* If FNM_PATHNAME is set, a period is "leading" if it is the first
character in string or if it immediately follows a slash.
* If FNM_PATHNAME is not set, a period is "leading" only if it is the
first character of string.
If FNM_PERIOD is not set, then no special restrictions are placed on
matching a period."
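
For anyone who wants to check the flag behaviour quoted above against their own libc, here is a minimal sketch. It assumes a POSIX <fnmatch.h>; matches() is a hypothetical wrapper introduced only for illustration, and the expected results in the comments are what the quoted spec text requires, not what DJGPP's current fnmatch() returns.

```c
#include <fnmatch.h>

/* Hypothetical wrapper: 1 if the system fnmatch() reports a match, else 0. */
int matches(const char *pattern, const char *string, int flags)
{
    return fnmatch(pattern, string, flags) == 0;
}

/*
 * Note the C string quoting: "\\\\" is the two-character pattern \\ and
 * "\\" is the one-character string \ .
 *
 * What the quoted spec text requires:
 *   matches("\\\\", "\\", 0)              -> 1  escaped backslash
 *   matches("\\\\", "\\", FNM_NOESCAPE)   -> 0  pattern is two ordinary backslashes
 *   matches("*", "a/b", 0)                -> 1  slash is an ordinary character
 *   matches("*", "a/b", FNM_PATHNAME)     -> 0  slash needs an explicit slash
 *   matches("*", ".profile", FNM_PERIOD)  -> 0  leading period needs an explicit period
 *   matches(".*", ".profile", FNM_PERIOD) -> 1
 */
```

Calling matches() with the argument sets listed in the comment from a small main() and printing the results is enough to compare two platforms side by side.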
>> P.P.S. of course there's a ton of what DJGPP's fnmatch() doesn't
>> support, but the above things are pretty basic and it would be nice
>> to have them handled properly, unless I overlook some major
>> DOS-related issue for which it would be desirable to deviate from the
>> spec.
>
>Another couple of examples revealing incorrect behavior of fnmatch() in
>DJGPP:
>fnmatch("*\a", "a", 0) must return 0, doesn't
>fnmatch("\[a]", "[a]", 0) must return 0, doesn't
>fnmatch("\[a]", "a", 0) must return 1, does
>
>So, as I understand it, the fnmatch() code flaws are:
>1. rangematch() doesn't allow for the following two patterns: []...] and
>[!]...] where ] is a valid char in the range
ISTM that the [- and [!- cases where - is treated as a literal
character aren't handled either.
>2. asterisk handling doesn't distinguish in the following the various
>options for c and what follows c:
> else if (isslash(c) && flags & FNM_PATHNAME)
> {
> if ((string = find_slash(string)) == NULL)
> return FNM_NOMATCH;
> break;
> }
>a) c=='/' // forward slash
>b) c=='\', (flags & FNM_NOESCAPE)!=0 // back slash
>c) c=='\', (flags & FNM_NOESCAPE)==0, isslash(pattern[1])==1 // any escaped
>slash
>3. '\\' handling is completely broken (wrong indices and logic). If escaping
>is on, it must fall through the case to default if '\\' is followed by
>anything from "\\?*[" to make sure those aren't interpreted as special chars
>again but instead are interpreted as ordinary chars. For all other chars
>(including '\0') it's better to break out from the case/switch to treat
>those chars as ordinary by the other existing cases.
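
The escape handling described in point 3 can be sketched roughly as below. This is not DJGPP's code and not a proposed patch: sk_match(), SK_MATCH and SK_NOMATCH are hypothetical names, and the sketch covers only ordinary characters, '?', '*' and backslash escapes (no bracket expressions, no FNM_PATHNAME or FNM_PERIOD). It assumes, as guessed earlier in the thread, that a backslash at the very end of the pattern is meant to be an ordinary character. Unlike point 3 it consumes the character after the backslash unconditionally, which gives the same result for the examples discussed here.

```c
#define SK_MATCH   0
#define SK_NOMATCH 1

/* Hypothetical sketch: literals, '?', '*' and backslash escapes only.
   noescape plays the role of FNM_NOESCAPE. */
int sk_match(const char *p, const char *s, int noescape)
{
    for (;;) {
        char c = *p++;
        switch (c) {
        case '\0':
            return *s == '\0' ? SK_MATCH : SK_NOMATCH;
        case '?':
            if (*s == '\0')
                return SK_NOMATCH;
            s++;
            break;
        case '*':
            while (*p == '*')          /* collapse runs of '*' */
                p++;
            if (*p == '\0')
                return SK_MATCH;
            for (;; s++) {             /* try the rest at every position */
                if (sk_match(p, s, noescape) == SK_MATCH)
                    return SK_MATCH;
                if (*s == '\0')
                    return SK_NOMATCH;
            }
        case '\\':
            if (!noescape && *p != '\0')
                c = *p++;              /* escaped char is now ordinary */
            /* otherwise (trailing backslash or noescape) c stays '\\' */
            /* fall through */
        default:
            if (c != *s)
                return SK_NOMATCH;
            s++;
            break;
        }
    }
}
```

With this, sk_match("*\\a", "a", 0) and sk_match("\\ab\\c", "abc", 0) both report a match, and sk_match("\\*", "abc", 0) does not, because the escaped asterisk is compared as an ordinary character.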
Multiple slashes should also be treated the same as a single slash.
Leading period handling does not seem to be dealt with either, nor the
DOS _ equivalent: perhaps this should be handled similar to slash and
backslash, only treated the same as period when FNM_NOESCAPE and
FNM_PERIOD are specified.
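
For the bracket cases from point 1 and the [- and [!- cases noted above, a rough sketch of the parsing rule follows. The names sk_bracket() and sk_in_bracket() are hypothetical and this is not the existing rangematch(); it assumes plain characters and ranges only (no character classes, no escapes inside brackets, no FNM_PATHNAME slash rule). The idea is that ']' is a list member when it is the first character after '[' or '[!', and '-' is a list member when it is first or last.

```c
#include <stddef.h>

/* Hypothetical sketch. p points just past '['. Returns a pointer just past
   the closing ']' and sets *matched, or NULL if there is no closing ']'
   (the caller would then treat '[' as an ordinary character). */
const char *sk_bracket(const char *p, char ch, int *matched)
{
    unsigned char uc = (unsigned char)ch;
    int negate = 0, hit = 0, first = 1;

    if (*p == '!') {                   /* non-matching list */
        negate = 1;
        p++;
    }
    for (;; first = 0) {
        unsigned char lo = (unsigned char)*p, hi;

        if (lo == '\0')
            return NULL;               /* unterminated expression */
        if (lo == ']' && !first)
            break;                     /* closing bracket */
        p++;
        hi = lo;
        /* "x-y" is a range unless '-' is the last char before ']' */
        if (p[0] == '-' && p[1] != ']' && p[1] != '\0') {
            hi = (unsigned char)p[1];
            p += 2;
        }
        if (lo <= uc && uc <= hi)
            hit = 1;
    }
    *matched = negate ? !hit : hit;
    return p + 1;
}

/* Convenience wrapper: 1 = ch matches expr, 0 = no match, -1 = malformed. */
int sk_in_bracket(const char *expr, char ch)
{
    int matched = 0;

    if (expr[0] != '[' || sk_bracket(expr + 1, ch, &matched) == NULL)
        return -1;
    return matched;
}
```

Under these assumptions sk_in_bracket("[]]", ']') is 1, sk_in_bracket("[!]]", ']') is 0, sk_in_bracket("[!]]", 'a') is 1, and sk_in_bracket("[-a]", '-') is 1, which lines up with the fnmatch() results the thread says the spec requires.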
--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
Brian DOT Inglis AT CSi DOT com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply