Date: Sun, 14 Oct 2001 20:02:49 +0200
From: "Eli Zaretskii"
Sender: halo1 AT zahav DOT net DOT il
To: sandmann AT clio DOT rice DOT edu
Message-Id: <7263-Sun14Oct2001200248+0200-eliz@is.elta.co.il>
X-Mailer: Emacs 20.6 (via feedmail 8.3.emacs20_6 I) and Blat ver 1.8.9
CC: djgpp-workers AT delorie DOT com
In-reply-to: <10110140648.AA14621@clio.rice.edu> (sandmann@clio.rice.edu)
Subject: Re: W2K/XP fncase
References: <10110140648 DOT AA14621 AT clio DOT rice DOT edu>
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

> From: sandmann AT clio DOT rice DOT edu (Charles Sandmann)
> Date: Sun, 14 Oct 2001 01:48:28 -0500 (CDT)
>
> Item 2. fncase=n handling is default with DJGPP. This behavior is very
> poorly defined on Windows 9x systems since it relies on an
> interrupt than no one can completely explain the behavior.

Actually, the intended behavior is well defined, and does not rely on
the interrupt. It just uses the interrupt as a means to achieve a
certain goal. Sorry, I thought that the goal was clear; let me
explain.

What we really want is to be case-preserving and case-sensitive.
Plain and simple. The reason for this is that most of our software
comes from Unix, and thus is case-sensitive on the application level
(i.e. the application code assumes a case-sensitive filesystem,
compares file names with strcmp, etc.).

But we cannot be 100% case-sensitive, because that breaks on non-LFN
platforms, where all file names come in UPPER case. You cannot even
run the simplest Makefile there without downcasing file names, because
Make looks for foo.c, but all it sees is FOO.C. Similar problems
happen with files on Windows which were copied from DOS: in its
infinite wisdom, Windows doesn't create an LFN directory entry for
such files, but instead copies only the UPPER-case DOS entry and
that's it.
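
To make the Make example concrete, here is a minimal C sketch (not
code from our libc) of why a strcmp-based comparison misses FOO.C
until the name is downcased. The helper names downcase_name and
names_match are hypothetical, invented only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: lower-case a file name in place and return it. */
static char *downcase_name(char *name)
{
    char *p;

    assert(name != NULL);
    for (p = name; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
    return name;
}

/* Hypothetical helper: case-sensitive match, the way Unix-born code
   such as Make compares the name it wants with the name the OS gave. */
static int names_match(const char *wanted, const char *from_os)
{
    assert(wanted != NULL && from_os != NULL);
    return strcmp(wanted, from_os) == 0;
}
```

With these helpers, names_match("foo.c", "FOO.C") fails, while the
same comparison succeeds once downcase_name has been applied to the
name that came from the OS.
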

So we relax the requirement a bit by downcasing file names ``where it
makes sense''. And there be dragons...

(We also provide the FNCASE=y option for those who want to be closer
to Unix by being strictly case-sensitive, and don't mind the
responsibility for downcasing any DOS file names manually.)

Now, there are all kinds of subtle cases and subcases here, and as
usual, I don't remember all of them, after all these years; sorry.

One thing I _do_ remember is that we cannot simply always downcase DOS
names, because then you cannot say "cp FAQ /foo/bar/" and get an
upper-case `FAQ' file in the target directory. That would be a
misfeature. Or imagine a Makefile which uses $(wildcard *.S) or some
such: if we downcase the .S extension, GCC will do the wrong thing
(it runs the preprocessor on .S files, but not on .s files).

The best way to deal with this would be to find out whether the file
in question has an LFN entry in its directory (in addition to the 8+3
entry, which every file has), and only downcase the name if it
doesn't. Alas, Windows doesn't provide any way to get that info, even
on FAT filesystems, where the 8+3 alias is actually stored in the
directory, let alone on NTFS. And even if there were a way to do
that, it's not enough: we need to DTRT with files which don't yet
exist, as in "gcc -o foo.S"!

> Predicting before hand if a file you see will be converted to
> lower case is hit and miss.

Actually, on Windows 9X, it's very easy to predict whether the file
will be downcased or not; in that respect, _lfn_gen_short_fname does
its job quite well on W9X. It's W2K and XP that introduced the
problem.

> I am trying to make the case that we should remove usage of
> _lfn_gen_short_name from the 7 places in our libc and replace it with
> simpler code

Out of these 7 places, one (lstat.c) uses _lfn_gen_short_fname for a
different purpose: to know how many bytes a directory entry takes (to
fake a ``size'' of a directory). This place needs to be discussed
separately.

It is also possible that `glob' needs a slightly different handling,
since it works on file names or parts thereof typed by the user, in
addition to what findfirst/findnext return. _fixpath also works on
names which could partially come from the user. readdir and getcwd
only work on file names which the OS returns.

> > > For example, if lfn=n we should always lower case
> > > the names (a very simple test) instead of needing to generate a
> > > string we strcmp with, throw away and then duplicate this behavior.
> > >
> > That would preclude a possibility to see file names on DOS in their
> > original UPPER case; for example, try "djecho [A-Z]*" on plain DOS.
> > IIRC, some package (Groff?) depends on that for its build procedure.
> >
> You would still have fncase and friends in the library to control
> it, but if lfn=n you don't need to create a string and compare it just
> to decide that yes you need to lower case the characters.

My point was that we cannot just decide to always downcase on DOS; see
the details above. I agree that we don't need to create a string and
compare it, but any other solution should not downcase
unconditionally.

> > This issue is full of hidden gotchas and unintended consequences,
> > because Microsoft's implementation of case-preservation is
> > semi-broken, haphazard, and sometimes downright nonsensical.
> >
> So why should we base fncase=n behavior on it?

By ``this issue'' I meant the whole question of whether and when to
downcase. I did not mean function 71A8h on which the solution was
based: that one works quite reliably on Windows 9X. The semi-broken
handling of letter-case in file names on Windows is what makes our
decisions tricky and prone to unintended consequences.

> I'm looking for a low risk way to make Win2K and XP behave the same
> as Win9x family products.

Same here.

> If there is no way, then maybe fncase=y should be default if lfn is
> enabled for all platforms.

You mean, including Windows 9X?
We could consider making this change now, but how do we check it won't
get users in trouble? Windows 9X and ME are still the most popular
systems.

> > So maybe the code I wrote is wasteful. I understand that it might
> > bug you to see a function which issues an RM interrupt, and whose
> > output is used inefficiently, or even not used at all. But it works;
> > it was proven by two years of intensive use; and it certainly isn't a
> > bottleneck in any real-life application.
> >
> You're taking this part too personally, stop it ! :-( This should not
> be any personal criticism of the code, or the motives, just a
> retrospective to say that we are in a mess with now and need to get
> out of it.

I wasn't being personal (took a long, deep breath before writing that
;-), it's just that you mentioned the performance issue several times,
in a way that seemed to indicate that it's an important factor in this
discussion. If this isn't important, then let's simply put this
aspect aside for a moment, okay? After all, if the function were
working on W2K/XP, we wouldn't even consider rewriting this code,
right?

> > Therefore, my suggestion is: let's make a local change in
> > _lfn_gen_short_name so that it calls 71A8h with DH=1 on W2K and XP.
> > (We should see that this doesn't break NT with the LFN TSR.) The file
> > names which come bogus as the result are very rare, and when they do
> > happen all that we'll see is that the file name is not downcased when
> > it should have been--not a big deal IMHO.
> >
> DH=1 breaks for essentially any non-Alpha character, so is a very
> poor effort at fixing anything.

I agree; I wasn't aware of that when I wrote that suggestion.

> We have these discussions so that I may learn - more about the pain and
> scars in dealing with these issues in the past. If I had all the answers
> I'd have already cvs commit'ed it.
> If you are not comfortable with what
> I do here, I'm sure I will be missing something that will come back and
> haunt everyone in the future.

I'm not comfortable because I'm afraid to break things. The
letter-case handling is very fragile, so the more localized the change
and the more predictable its effect, the less risk we run. Changes
that touch all platforms and modify the resulting string (the one
returned by _lfn_gen_short_fname) in non-trivial ways are something
whose effects I have no way of knowing in advance.

> At this point I'm probably most interested in how fncase=n should behave
> on lfn systems.

I hope I helped clarify that: in a nutshell, we want to be as
case-sensitive as possible.

> Should leading periods prevent downcase? Should special
> characters (such as +, space) prevent downcase?

Yes and yes, since these names cannot be DOS names.

> Any 8-bit chars prevent downcase?

Not necessarily: DOS also allows 8-bit characters in file names.

> Should it only be for 8.3 type filenames?

Ideally, we should only downcase names of files which don't have an
LFN entry. But I'm afraid that this criterion doesn't have a
practical solution. Failing that, any file name that is valid on DOS
should be downcased, unless we have a clear hint from the user that
she doesn't want that, for example when we know that the upper-case
part was actually typed by the user rather than returned by the OS.

> What percentage of files on a disk being different via ls -R in case with
> different algorithms would be acceptable?

I cannot answer this question. I do know that with the current code
from djlsr203.zip, the number of cases where I had to deal with this
issue manually was very small, even though I routinely shuffle files
between plain DOS and Windows 9X. In those cases where I had a
problem, setting FNCASE=y for the duration of the offending command
solved it.

Hope this helps, and thanks again for working on this.
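
The answers above could be sketched in C roughly as follows. This is
only an illustration under stated assumptions, not the libc code and
not a proposed patch: the function names looks_like_dos_name and
fncase_downcase are hypothetical, the list of punctuation allowed in
8+3 names is the usual DOS set as I understand it, and treating any
lower-case letter as a sign that the name did not come from an 8+3
entry is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical: return 1 if `name` could be a plain DOS 8+3 name as
   the OS reports it (upper case, one optional period, no LFN-only
   characters), 0 otherwise. */
static int looks_like_dos_name(const char *name)
{
    /* Assumed set of punctuation that is legal in 8+3 names. */
    static const char dos_punct[] = "!#$%&'()-@^_`{}~";
    size_t base_len = 0, ext_len = 0;
    int seen_dot = 0;
    const unsigned char *p;

    assert(name != NULL);
    if (name[0] == '\0' || name[0] == '.')
        return 0;               /* empty name or leading period */

    for (p = (const unsigned char *)name; *p != '\0'; p++) {
        if (*p == '.') {
            if (seen_dot)
                return 0;       /* second period: LFN-only */
            seen_dot = 1;
            continue;
        }
        if (islower(*p))
            return 0;           /* assumption: mixed case means LFN */
        /* 7-bit characters must be alphanumeric or allowed punctuation;
           this rejects '+', space and the like.  8-bit characters are
           let through, since DOS allows them. */
        if (*p < 0x80 && !isalnum(*p) && strchr(dos_punct, *p) == NULL)
            return 0;
        if (seen_dot)
            ext_len++;
        else
            base_len++;
    }
    return base_len <= 8 && ext_len <= 3;
}

/* Hypothetical: downcase `name` in place only if it looks like a DOS
   name.  Returns 1 if the name was treated as a DOS name, else 0. */
static int fncase_downcase(char *name)
{
    char *p;

    if (!looks_like_dos_name(name))
        return 0;
    for (p = name; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
    return 1;
}
```

Note that the sketch knows nothing about where the name came from, so
it would still downcase a `FAQ' typed by the user; a caller would
have to skip it for such names, and for FNCASE=y.
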