To: djgpp-announce AT sun DOT soe DOT clarkson DOT edu
Subject: stat()/fstat() for DJGPP, v.02
Date: Sun, 30 Oct 94 08:53:56 +0200
From: "Eli Zaretskii"

I've uploaded to omnigate a corrected version of the stat() and fstat()
library functions for DJGPP. The first version had several bugs which
would crash your machine in DPMI mode. Many thanks to Dieter Buerssner,
who corrected most of the bugs and provided enough information to find
and fix the rest of them.

The contents of the README file follow:

Unix-compatible stat() and fstat() for DJGPP, version 02
========================================================

This package includes new versions of the stat() and fstat() functions
for DJGPP, and several support functions, some of which have a useful
purpose on their own. See the file MANUAL.DOC for an explanation of the
functions.

I've also included the GNU ls program from the latest release 3.9 of
GNU Fileutils, compiled and linked with my stat(). This can be used as
a vehicle to test this implementation under various environments. Try
`ls -li' if you want to see all the mode bits and inode numbers.

Also included are two test programs, STAT.EXE and FSTAT.EXE, which are
stat() and fstat() compiled with TEST #define'd. These accept a list of
files and call stat() or fstat() on each of them, printing most of the
fields of struct stat. FSTAT.EXE also prints the info for the 5 handles
opened automatically by the start-up code, before calling fstat() on
the files named on the command line. Try `fstat < file1 > file2 *.*' to
see how fstat() gets correct info from redirected standard streams.

All of the above were compiled with DJGPP 1.12maint2. If you want to
use them with prior versions, you will have to recompile them. There is
a makefile in each subdirectory which should make this very easy.
However, note that the library time routines in versions prior to
1.12maint1 will sometimes return an incorrect time stamp unless you
define a complete POSIX TZ string in the environment.

The functions were tested and worked OK under MS-DOS 3.3, 4.01, 5.0 and
6.20 (with and without DoubleSpace), with networked drives under XFS
1.76, Tsoft NFS 0.24Beta and Novell Netware 3.22, and with CD-ROM
drives under MSCDEX 2.23. They were also tested in the Windows 3.1 DOS
Box and under Quarterdeck's QDPMI DPMI server with DOS 5.0 and 6.20. A
pre-release of this second version was tested under Novell DOS 7.0
(many thanks to Dieter Buerssner for this, and for several bugs he
found and suggested solutions for).

Feel free to use the functions. If you find time to test them in
environments other than those listed above, I'd appreciate it if you
told me how they performed. If they ever fail for you in any set-up,
*please* let me know so I can fix them. I'm especially interested in
DOS clones and emulations such as DR-DOS, Novell DOS, DV/X, OS/2 DOS
Box and the like. This implementation relies heavily on undocumented
DOS features, some of which are guaranteed to fail under some of these
clones, but I've built in fall-back solutions which should still make
the functions work at least as well as any other implementation.
However, I couldn't test these fully, as I don't have access to every
possible environment.

The files stat.c and fstat.c, which do most of the actual work, are
heavily commented, as they use many obscure or undocumented DOS
features. If anyone is interested enough to go through the source and
has comments on it, I will be glad to hear from you.

This implementation has an explicit goal: to minimize the pains of
porting Unix-born programs to DOS by being as Unix-compatible as
possible. It has the following features not usually present in
DOS-based C libraries:

  1. stat() doesn't fail for root directories.

  2. Both stat() and fstat() return the starting cluster number of the
     file as its inode number. If that number is unavailable, it is
     ``invented'' using the same name hashing technique as coded by
     Eric Backus in the existing DJGPP library. This inode invention
     is necessary for files on networked drives and for empty files on
     local drives.

  3. Both functions set mode bits for all three groups (user, group,
     other). Only the user gets WRITE access to a file (unless it has
     the Read-only, Hidden or System bit set).

  4. stat() sets the EXECUTE bit for directories, as happens under
     Unix, and sets their WRITE access bit for the ``user'' group
     unless they have one of the Read-only, System or Hidden bits set.

  5. Directory size is not reported as zero by stat(). For those
     directories which DOS reports as zero-sized, the number of used
     directory entries (sans the ``.'' and ``..'' pseudo-entries)
     multiplied by the entry size is returned instead. (Some network
     redirectors do report a valid size for a directory.)

  6. fstat() correctly reports access mode bits and device code
     (st_dev).

  7. Both functions report EXECUTE bits based on the file's extension
     and the two-byte magic number present at the beginning of the
     file.

  8. Character devices (such as CON, LPT1, AUX and others) are treated
     by both functions as if they were on a special drive called "@:"
     (st_dev = -1). The ``character special'' bit is set for these
     devices.

  9. stat() assumes "d:" (where `d' is a drive letter) to mean "d:.",
     i.e., the CURRENT directory on that drive, and thus doesn't fail
     for this argument.

  10. stat() accepts pathnames with redundant trailing slashes.

The functions in their current version are known to have these
(hopefully, minor) deficiencies:

  1. fstat() can't get some of the info under Novell Netware version
     3.x and below. Specifically, the WRITE access bit, the drive
     letter, and the file's name and extension are unavailable by file
     handle alone. Until somebody tells me how to get at this info,
     this problem will cause the mode bits returned for these files to
     be imprecise, and a different inode number will be returned each
     time you fstat() the same file. Novell 4.x is said to use the DOS
     Network Redirector interface, which doesn't take over DOS as
     completely as the Novell Shell, so the above should not apply to
     Novell 4.x (but I haven't had an opportunity to test it yet).

  2. Real cluster numbers are 16-bit unsigned numbers, but st_ino is
     declared short in DJGPP's headers as of 1.12. In this
     environment, invented inode numbers start from 65535 and get
     decremented for each new invented number. This could result in
     an invented number which is identical to the cluster number of a
     real file. Every disk I've seen has at least 80 cluster numbers
     near 64K which are never assigned to clusters (i.e., the highest
     actual cluster number is around 65450), but that might not be
     enough. Note that this could only be a problem for an empty vs. a
     non-empty file on a local drive, because otherwise st_dev will be
     different for two such files. Hopefully, in some future DJGPP
     version, the st_ino field of struct stat will be widened to 32
     bits, and this problem will go away, as invented inode numbers
     will then start at 65536 and go up.

  3. I don't know how to obtain the time fields for root directories,
     so they are set to the beginning of DOS time (1-Jan-80).
     Suggestions, anyone?

  4. The technique of inventing inode numbers causes a different inode
     for the same file to be seen by different programs, and could
     also assign a file a different inode on different runs of the
     same program. This is unpleasant, but I don't know if it can be
     repaired easily.

  5. fstat() can at best only see the filename part without the full
     directory path. For files which are empty, or which belong to
     (non-Novell) networked drives, this means there is a slight
     possibility that two files with the same name in different
     directories will get the same inode number. With this
     implementation of fstat(), this can only happen if the handle
     belonging to one of the two files is closed after it was
     fstat()'ed, and then an open() call for the second file reuses
     the same entry in the System File Table as was used by the first
     file. As long as both files are open, this will NEVER happen. It
     seems improbable to me that a program will be interested in a
     file after it was close()'d, but still, this could be a cause of
     obscure bugs in ported programs. I know of only two alternatives
     which avoid this altogether: either invent a new inode on every
     call to fstat(), with the consequences described in (1) above, or
     trap every call to open() and close() to maintain a table of open
     files with their full names (this would require trapping every
     possible way of opening a file, e.g. with direct calls to the
     DPMI server). Both alternatives look less attractive to me, but I
     would like to hear your comments.

Eli Zaretskii
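
Unix-born programs mostly care about st_dev and st_ino because that
pair is how they decide whether two names refer to the same file. The
usual test can be sketched as below. This is only an illustration, not
part of the package: same_file() is a hypothetical helper name, and the
sketch assumes nothing beyond the standard stat() interface. Given the
deficiencies listed above, a match (or mismatch) that rests on invented
inode numbers should be treated with some caution.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of the package.
   Returns 1 if both paths name the same file, 0 if they name
   different files, and -1 if either stat() call fails. */
int same_file(const char *path_a, const char *path_b)
{
    struct stat sa, sb;

    if (stat(path_a, &sa) != 0 || stat(path_b, &sb) != 0)
        return -1;

    /* Two names refer to one file only when both the device
       and the inode number agree. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

A call such as same_file("foo.c", "./foo.c") would be expected to
return 1 for a non-empty file on a local drive, where the inode number
is the file's starting cluster.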