Mail Archives: djgpp-workers/2001/10/14/14:06:35

Date: Sun, 14 Oct 2001 20:02:49 +0200
From: "Eli Zaretskii" <eliz AT is DOT elta DOT co DOT il>
Sender: halo1 AT zahav DOT net DOT il
To: sandmann AT clio DOT rice DOT edu
Message-Id: <7263-Sun14Oct2001200248+0200-eliz@is.elta.co.il>
X-Mailer: Emacs 20.6 (via feedmail 8.3.emacs20_6 I) and Blat ver 1.8.9
CC: djgpp-workers AT delorie DOT com
In-reply-to: <10110140648.AA14621@clio.rice.edu> (sandmann@clio.rice.edu)
Subject: Re: W2K/XP fncase
References: <10110140648 DOT AA14621 AT clio DOT rice DOT edu>
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

> From: sandmann AT clio DOT rice DOT edu (Charles Sandmann)
> Date: Sun, 14 Oct 2001 01:48:28 -0500 (CDT)
> 
> Item 2.  fncase=n handling is default with DJGPP.  This behavior is very
>          poorly defined on Windows 9x systems since it relies on an
>          interrupt whose behavior no one can completely explain.

Actually, the intended behavior is well defined, and does not rely on
the interrupt.  It just uses the interrupt as a means to achieve a
certain goal.  Sorry, I thought that the goal was clear; let me
explain.

What we really want is to be case-preserving and case-sensitive.
Plain and simple.  The reason for this is that most of our software
comes from Unix, and thus is case-sensitive on the application level
(i.e. the application code assumes case-sensitive filesystem, compares
file names with strcmp, etc.).

But we cannot be 100% case-sensitive, because that breaks on non-LFN
platforms, where all file names come in UPPER case.  You cannot even
run the simplest Makefile there without downcasing file names, because
Make looks for foo.c, but all it sees is FOO.C.
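
To make the mismatch concrete, here is a minimal sketch in portable C.
Both helpers (find_exact and find_downcased) are hypothetical names
invented for this illustration, not libc functions: the first compares
names the way Unix-born tools do, the second downcases what the "OS"
returned before comparing.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: exact, case-sensitive lookup of `wanted` in a
   directory listing, the way code ported from Unix does it.
   Returns the index of the match, or -1 if there is none. */
int find_exact(const char *wanted, const char *const listing[], int count)
{
    for (int i = 0; i < count; i++)
        if (strcmp(wanted, listing[i]) == 0)
            return i;
    return -1;
}

/* Hypothetical helper: the same lookup, but every listing entry is
   downcased first, imitating a library that downcases names it gets
   from a non-LFN system. */
int find_downcased(const char *wanted, const char *const listing[], int count)
{
    char buf[260];

    for (int i = 0; i < count; i++) {
        size_t len = strlen(listing[i]);
        if (len >= sizeof buf)
            continue;                   /* too long for this sketch */
        for (size_t j = 0; j <= len; j++)   /* copies the NUL as well */
            buf[j] = (char)tolower((unsigned char)listing[i][j]);
        if (strcmp(wanted, buf) == 0)
            return i;
    }
    return -1;
}
```

With a listing of { "MAKEFILE", "FOO.C" }, find_exact("foo.c", ...)
finds nothing, while find_downcased("foo.c", ...) finds the second
entry.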

Similar problems happen with files on Windows which were copied from
DOS: in its infinite wisdom, Windows doesn't create an LFN directory
entry for such files, but instead copies only the UPPER-case DOS entry
and that's it.

So we relax the requirement a bit by downcasing file names ``where it
makes sense''.  And there be dragons...

(We also provide the FNCASE=y option for those who want to be closer
to Unix by being strictly case-sensitive, and don't mind the
responsibility for downcasing any DOS file names manually.)

Now, there are all kinds of subtle cases and subcases here, and as
usual, I don't remember all of them, after all these years; sorry.

One thing I _do_ remember is that we cannot simply always downcase DOS
names, because then you cannot say "cp FAQ /foo/bar/" and get an
upper-case `FAQ' file in the target directory.  That would be a
misfeature.  Or imagine a Makefile which uses $(wildcard *.S) or some
such: if we downcase the .S extension, GCC will do the wrong thing.
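
The .S case can be sketched as follows.  GCC documents that a .S file
is assembler source which must be run through the preprocessor, while
a .s file is assembled as-is.  The enum and the function name
classify_by_suffix are made up for this illustration; this is not how
GCC itself is coded.

```c
#include <string.h>

/* Illustration only: how a case-sensitive tool tells the two
   assembler suffixes apart. */
enum asm_kind { NOT_ASM, ASM_PLAIN, ASM_NEEDS_CPP };

enum asm_kind classify_by_suffix(const char *name)
{
    const char *dot = strrchr(name, '.');

    if (dot == NULL)
        return NOT_ASM;
    if (strcmp(dot, ".S") == 0)
        return ASM_NEEDS_CPP;   /* preprocess first, then assemble */
    if (strcmp(dot, ".s") == 0)
        return ASM_PLAIN;       /* assemble without preprocessing */
    return NOT_ASM;
}
```

If the library turns START.S into start.s behind the Makefile's back,
the file moves from ASM_NEEDS_CPP to ASM_PLAIN and its #include and
#define lines are never processed.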

The best way to deal with this would be to find out whether the file
in question has an LFN entry in its directory (in addition to the 8+3
entry, which every file has), and only downcase the name if it
doesn't.  Alas, Windows doesn't provide any way to get that info, even
on FAT filesystems, where the 8+3 alias is actually stored in the
directory, let alone on NTFS.  And even if there were a way to do
that, it's not enough: we need to DTRT with files which don't yet
exist, as in "gcc -o foo.S"!

>          Predicting beforehand if a file you see will be converted to
>          lower case is hit and miss.

Actually, on Windows 9X, it's very easy to predict whether the file
will be downcased or not; in that respect, _lfn_gen_short_fname does
its job quite well on W9X.  It's W2K and XP that introduced the
problem.

> I am trying to make the case that we should remove usage of 
> _lfn_gen_short_fname from the 7 places in our libc and replace it with
> simpler code

Out of these 7 places, one (lstat.c) uses _lfn_gen_short_fname for a
different purpose: to know how many bytes a directory entry takes (to
fake a ``size'' of a directory).  This place needs to be discussed
separately.

It is also possible that `glob' needs a slightly different handling,
since it works on file names or parts thereof typed by the user, in
addition to what findfirst/findnext return.  _fixpath also works on
names which could partially come from the user.  readdir and getcwd
only work on file names which the OS returns.

> > > For example, if lfn=n we should always lower case
> > > the names (a very simple test) instead of needing to generate a 
> > > string we strcmp with, throw away and then duplicate this behavior.
> > 
> > That would preclude a possibility to see file names on DOS in their
> > original UPPER case; for example, try "djecho [A-Z]*" on plain DOS.
> > IIRC, some package (Groff?) depends on that for its build procedure.
> 
> You would still have fncase and friends in the library to control
> it, but if lfn=n you don't need to create a string and compare it just
> to decide that yes you need to lower case the characters.  

My point was that we cannot just decide to always downcase on DOS; see
the details above.  I agree that we don't need to create a string and
compare it, but any other solution should not downcase unconditionally.
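
For readers who have not looked at the code, the generate-and-compare
pattern being discussed is roughly the following.  This is a sketch,
not a copy of the libc sources: the short-name generator is passed in
as a function pointer so the example compiles off DOS, its signature
(long name in, caller-supplied buffer out, buffer returned) is an
assumption, and stub_gen_short is a crude stand-in that only upcases
and truncates.  It does NOT reproduce what function 71A8h does.

```c
#include <ctype.h>
#include <string.h>

/* Assumed shape of a short-name generator. */
typedef char *(*short_name_fn)(const char *long_name, char *short_name);

/* Crude stand-in for testing: upcase and cut at 12 characters.
   The real generator asks the OS instead. */
char *stub_gen_short(const char *long_name, char *short_name)
{
    size_t i;

    for (i = 0; i < 12 && long_name[i] != '\0'; i++)
        short_name[i] = (char)toupper((unsigned char)long_name[i]);
    short_name[i] = '\0';
    return short_name;
}

/* Downcase `name` in place, but only if the generated short name is
   identical to it, i.e. the name already looks like a plain DOS name. */
void downcase_if_dos_name(char *name, short_name_fn gen_short)
{
    char shortbuf[13];          /* 8 + '.' + 3 + NUL */

    if (strcmp(gen_short(name, shortbuf), name) == 0)
        for (char *p = name; *p != '\0'; p++)
            if (*p >= 'A' && *p <= 'Z')
                *p = (char)(*p + ('a' - 'A'));
}
```

With the stand-in generator, "FOO.C" becomes "foo.c", while "Makefile"
is left alone because its generated short name differs from it.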

> > This issue is full of hidden gotchas and unintended consequences,
> > because Microsoft's implementation of case-preservation is
> > semi-broken, haphazard, and sometimes downright nonsensical.  
> 
> So why should we base fncase=n behavior on it?

By ``this issue'' I meant the whole question of whether and when to
downcase.  I did not mean function 71A8h on which the solution was
based: that one works quite reliably on Windows 9X.  The semi-broken
handling of letter-case in file names on Windows is what makes our
decisions tricky and prone to unintended consequences.

> I'm looking for a low risk way to make Win2K and XP behave the same
> as Win9x family products.

Same here.

> If there is no way, then maybe fncase=y should be default if lfn is
> enabled for all platforms.

You mean, including Windows 9X?  We could consider making this change
now, but how do we check it won't get users in trouble?  Windows 9X
and ME are still the most popular systems.

> > So maybe the code I wrote is wasteful.  I understand that it might
> > bug you to see a function which issues an RM interrupt, and whose
> > output is used inefficiently, or even not used at all.  But it works;
> > it was proven by two years of intensive use; and it certainly isn't a
> > bottleneck in any real-life application.
> 
> You're taking this part too personally, stop it ! :-(  This should not
> be any personal criticism of the code, or the motives, just a 
> retrospective to say that we are in a mess now and need to get
> out of it.

I wasn't being personal (took a long, deep breath before writing that ;-), 
it's just that you mentioned the performance issue several times, in a
way that seemed to indicate that it's an important factor in this
discussion.  If this isn't important, then let's simply put this
aspect aside for a moment, okay?  After all, if the function were
working on W2K/XP, we wouldn't even consider rewriting this code, right?

> > Therefore, my suggestion is: let's make a local change in
> > _lfn_gen_short_fname so that it calls 71A8h with DH=1 on W2K and XP.
> > (We should see that this doesn't break NT with the LFN TSR.)  The file
> > names which come bogus as the result are very rare, and when they do
> > happen all that we'll see is that the file name is not downcased when
> > it should have been--not a big deal IMHO.
> 
> DH=1 breaks for essentially any non-Alpha character, so is a very
> poor effort at fixing anything.

I agree; I wasn't aware of that when I wrote that suggestion.

> We have these discussions so that I may learn - more about the pain and 
> scars in dealing with these issues in the past.  If I had all the answers
> I'd have already cvs commit'ed it.  If you are not comfortable with what
> I do here, I'm sure I will be missing something that will come back and
> haunt everyone in the future.

I'm not comfortable because I'm afraid to break things.  The
letter-case handling is very fragile, so the more localized the change
and more predictable its effect, the less risk we run.  Changes that
touch all platforms and modify the resulting string (the one returned
by _lfn_gen_short_fname) in non-trivial ways are something whose
effects I have no way of knowing in advance.

> At this point I'm probably most interested in how fncase=n should behave
> on lfn systems.

I hope this helps explain it: in a nutshell, we want to be as
case-sensitive as possible.

> Should leading periods prevent downcase?  Should special
> characters (such as +, space) prevent downcase?

Yes and yes, since these names cannot be DOS names.

> Any 8-bit chars prevent downcase?

Not necessarily: DOS also allows 8-bit characters in file names.

> Should it only be for 8.3 type filenames?

Ideally, we should only downcase names of files which don't have an
LFN entry.  But I'm afraid that this criterion doesn't have a
practical solution.

Failing that, any file name that is valid on DOS should be downcased,
unless we have a clear hint from the user that she doesn't want that,
for example when we know that the upper-case part was actually typed
by the user rather than returned by the OS.
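
The answers above could be turned into a test along these lines.  It
is only a sketch under stated assumptions: the function name
looks_like_dos_name is invented here, the list of "LFN-only"
characters (+ , ; = [ ] and space) is the usual set allowed in long
names but not in 8+3 names, and a name containing lower-case ASCII
letters is assumed to have come from an LFN entry.  It does not ask
the OS anything and knows nothing about who typed the name; the caller
would still have to skip names that came from the user.

```c
#include <string.h>

/* Return 1 if `name` could be a plain DOS 8+3 name as DOS itself
   would report it, 0 otherwise. */
int looks_like_dos_name(const char *name)
{
    static const char lfn_only[] = "+,;=[] ";
    size_t base_len = 0, ext_len = 0;
    int seen_dot = 0;

    /* Empty names and leading periods cannot be DOS names. */
    if (name[0] == '\0' || name[0] == '.')
        return 0;

    for (const char *p = name; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;

        if (c == '.') {
            if (seen_dot)
                return 0;       /* more than one period */
            seen_dot = 1;
            continue;
        }
        if (c >= 'a' && c <= 'z')
            return 0;           /* lower case: assume an LFN entry */
        if (strchr(lfn_only, c) != NULL)
            return 0;           /* valid only in long names */
        /* 8-bit characters fall through: DOS allows them. */
        if (seen_dot)
            ext_len++;
        else
            base_len++;
    }
    return base_len <= 8 && ext_len <= 3;
}
```

Under these assumptions FOO.C and README.TXT qualify for downcasing,
while .emacs, FOO+BAR.C, "MY FILE.TXT", Makefile and LONGFILENAME.TXT
do not.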

> What percentage of files on a disk being different via ls -R in case with
> different algorithms would be acceptable?

I cannot answer this question.  I do know that with the current code
from djlsr203.zip, the number of cases where I had to deal with this
issue manually was very small, even though I routinely shuffle files
between plain DOS and Windows 9X.  In those cases where I had a
problem, setting FNCASE=y for the duration of the offending command
solved it.

Hope this helps, and thanks again for working on this.
