Date: Wed, 6 Oct 1999 16:15:49 +0200 (IST)
From: Eli Zaretskii
X-Sender: eliz AT is
To: Charles Sandmann
cc: djgpp AT delorie DOT com
Subject: Re: Reading directories, readdir/stat too slow
In-Reply-To: <9910051657.AA15777@clio.rice.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: djgpp AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On Tue, 5 Oct 1999, Charles Sandmann wrote:

> > Won't this break if the file changes on disk between the call to
> > readdir and the call to stat?
>
> I used a single file element cache, and invalidated it on any other
> dpmi_int call.

If you invalidate the cache on any __dpmi_int, I'm afraid you will not
gain much. After all, I'd expect that a program that walks a directory
actually *does* something with the files it finds, and most
file-oriented operations call DOS through __dpmi_int. Heck, stat itself
calls __dpmi_int quite a few times before it gets to calling findfirst
(which is where the cache gets used).

> I wasn't trying to protect against all the possible cases. But I did
> need a 50X speedup. It worked on all code except unix ports that assumed
> a unix file system (inodes, etc).

Right. So these solutions are probably good for highly-specialized
versions of stat that are closely coupled with specific needs of a
single application. They cannot be implemented in libc.a.

> A lot of the stuff which causes readdir/stat to be slow isn't needed
> for the typical application

I think this depends on the definition of ``the typical application''.
My idea seems to be different from yours in this case ;-). Perhaps a
DOS-based program which doesn't expect much from the filesystem
information indeed doesn't need 90% of what stat generates, but
programs of Unix origin do need most of it.
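
To make the single-entry cache idea quoted above concrete, here is a
minimal sketch in portable C. It is an illustration only, not the
implementation being discussed: the names entry_info, cache_store,
cache_invalidate, cached_stat and slow_lookup are all hypothetical, and
slow_lookup merely stands in for the findfirst-based path. The sketch
assumes the directory reader stores each entry as it is read and that
any other DOS call invalidates it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of what the directory scan already knows. */
struct entry_info {
    char name[260];
    long size;
    long mtime;
};

static struct entry_info cache;  /* the single cached entry */
static int cache_valid = 0;      /* 0 = nothing usable in the cache */
int slow_lookups = 0;            /* counts how often the slow path runs */

/* Called by the (hypothetical) readdir wrapper after reading an entry. */
void cache_store(const char *name, long size, long mtime)
{
    assert(name != NULL);
    snprintf(cache.name, sizeof cache.name, "%s", name);
    cache.size = size;
    cache.mtime = mtime;
    cache_valid = 1;
}

/* Called by any other operation that might change the filesystem. */
void cache_invalidate(void)
{
    cache_valid = 0;
}

/* Stand-in for the expensive lookup; a real one would ask DOS. */
static void slow_lookup(const char *name, struct entry_info *out)
{
    slow_lookups++;
    snprintf(out->name, sizeof out->name, "%s", name);
    out->size = -1;   /* placeholder values only */
    out->mtime = -1;
}

/* stat-like query: answer from the cache on a hit, else fall back.
   strcmp is a simplification; DOS file names are not case-sensitive. */
void cached_stat(const char *name, struct entry_info *out)
{
    assert(name != NULL && out != NULL);
    if (cache_valid && strcmp(cache.name, name) == 0) {
        *out = cache;
        return;
    }
    slow_lookup(name, out);
}
```

If every __dpmi_int call ends up in cache_invalidate(), the hit path in
cached_stat() is rarely taken, which is the concern raised above.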

I will try to add more bits to _djstat_flags, so that, for example,
calls to mktime could be replaced with a simple computation that
assumes GMT. Lamentably, most of the time in stat is spent calling
findfirst (once you turn off the other heavy code with _djstat_flags),
and that cannot be worked around in a general-purpose version.

> - if you look at the minimal calls to do
> readdir/stat for portable structure entries (no inodes, etc) you get
> this from the single findfirst/findnext call - and thus almost identical
> speed.

One problem is that findfirst is about 10 times more expensive than
findnext. Since stat cannot call the latter, you lose by a factor of 10
before you even begin optimizing.

Let's face it: the Unix filesystem has one system call to read a
directory, and another to get the file info, whereas DOS/Windows
implement both in a single call. These two models are just too
incompatible to allow portable programs to run at native speed.
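
For reference, the portable Unix-style pattern under discussion looks
roughly like the sketch below: one readdir call per entry, followed by a
separate stat call on the same name. The function name
count_regular_files and the fixed path buffer size are assumptions made
for this illustration. Per the discussion above, on DJGPP each stat in
such a loop ends up in findfirst, which is where the time goes.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Count the regular files in dir and sum their sizes, using the
   two-call pattern: readdir for the name, stat for the details.
   Returns the file count, or -1 if the directory cannot be opened. */
long count_regular_files(const char *dir, long long *total_bytes)
{
    DIR *d;
    struct dirent *de;
    struct stat st;
    char path[4096];
    long count = 0;

    assert(dir != NULL && total_bytes != NULL);
    *total_bytes = 0;

    d = opendir(dir);
    if (d == NULL)
        return -1;

    while ((de = readdir(d)) != NULL) {
        int n = snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;              /* skip names that do not fit */
        if (stat(path, &st) != 0)  /* second call, per entry */
            continue;
        if (S_ISREG(st.st_mode)) {
            count++;
            *total_bytes += (long long)st.st_size;
        }
    }
    closedir(d);
    return count;
}
```

A program that needs only the name, size and time stamp could get them
from a single findfirst/findnext scan instead, at the cost of no longer
being portable to Unix.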