Mail Archives: djgpp/1999/10/04/23:56:59

From: "Charles Sandmann" <sandmann AT clio DOT rice DOT edu>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Reading directories, readdir/stat too slow
Date: Mon, 4 Oct 1999 21:47:11
Organization: Aspen Technology, Inc.
Lines: 46
Message-ID: <37f9205f.sandmann@clio.rice.edu>
References: <37f307e1 DOT 967161774 AT news DOT xmission DOT com>
NNTP-Posting-Host: dcloan.aco.aspentech.com
X-NewsEditor: ED-1.5.8
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

> For the last little while, I've been avoiding findfirst/findnext when
> I need to read directories, preferring the opendir functions instead
> because they make certain things a lot simpler.
> 
> Unfortunately, I find that opendir/readdir, when combined with a stat
> call (to get file mode/size/etc) is a *lot* slower than findfirst,
> taking on the order of 70-100 times longer to perform the same work.
> When running against tens of thousands of files in hundreds of
> directories, it is a significant problem.
> 
> Is there any way to speed things up?

Yes, but it requires replacing the opendir/readdir implementation in
the standard library.  I also have some portable code, and I was
distressed by how slow the libc code was (I saw a slowdown of around a
factor of 50).  If you trace through the opendir(), readdir(), and
stat() code you will find that many of the same calls are made several
times.

My solution was to cache the information from the findfirst/findnext calls
(which contain the date/time and size) so that the stat() call immediately
finds this information in the cache.  My implementation also made no
attempt to be as compatible as the current version, so the amount of code
was much smaller.  I also replaced the DOS-time to Unix-time conversion
with a hack, since this was a time sink too.
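
To illustrate the kind of time-conversion shortcut described above, here
is a minimal sketch.  It is not the original code: the function name
dos_to_unix_time is made up, it assumes the usual packed DOS date and
time words that a findfirst-style call reports, and it treats the
timestamp as UTC (no timezone or DST handling), which is what makes it
cheap compared with going through mktime().

```c
#include <time.h>

/* Days elapsed before the start of each month in a non-leap year. */
static const unsigned short days_before_month[12] =
    { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 };

/*
 * Hypothetical helper: convert a packed DOS date/time pair to seconds
 * since 1970-01-01, treating the DOS timestamp as UTC (assumption).
 *   dos_date: bits 15-9 = year - 1980, bits 8-5 = month, bits 4-0 = day
 *   dos_time: bits 15-11 = hour, bits 10-5 = minute, bits 4-0 = seconds/2
 */
time_t dos_to_unix_time(unsigned short dos_date, unsigned short dos_time)
{
    unsigned year  = 1980u + (dos_date >> 9);      /* 1980..2107 */
    unsigned month = (dos_date >> 5) & 0x0fu;
    unsigned day   = dos_date & 0x1fu;
    unsigned hour  = dos_time >> 11;
    unsigned min   = (dos_time >> 5) & 0x3fu;
    unsigned sec   = (dos_time & 0x1fu) * 2u;
    unsigned long days;

    /* Clamp garbage fields rather than index out of bounds. */
    if (month < 1 || month > 12)
        month = 1;
    if (day < 1)
        day = 1;

    /* Whole years since 1970, plus one day per leap year before 'year'. */
    days = (year - 1970u) * 365UL + (year - 1969u) / 4u;
    if (year > 2100u)
        days -= 1;                  /* 2100 is not a leap year */

    days += days_before_month[month - 1] + (day - 1);
    if (month > 2 && year % 4u == 0 && year != 2100u)
        days += 1;                  /* past Feb 29 in a leap year */

    /* Note: dates near the top of the range overflow a 32-bit time_t. */
    return (time_t)days * 86400 + hour * 3600u + min * 60u + sec;
}
```

Because it uses only shifts, a table lookup and a few multiplies, a
function like this can be called once per directory entry without
showing up in a profile.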

At this point I don't know where the source is - I only have a V2.0
early-beta binary library, which doesn't work with V2.01+.  But I do
remember that I sat down with an early libc implementation and yanked
everything I didn't need, then cached anything that might be useful a few
calls later or that called dpmi_int, and I got within about 30% of the
speed of findfirst.
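
The caching idea can be sketched in portable C roughly as follows.  This
is only an illustration under stated assumptions, not the lost
implementation: the names cached_entry, cache_store and cache_lookup and
the table size are all hypothetical.  The intent is that a replacement
readdir() would call cache_store() with the fields it already got from
the directory scan, and a replacement stat() would try cache_lookup()
before asking DOS again.

```c
#include <string.h>

#define CACHE_SLOTS 64          /* hypothetical size; tune to taste */

/* What a directory scan already knows about each file. */
struct cached_entry {
    char           name[260];   /* full path as the caller will stat() it */
    unsigned long  size;
    unsigned short dos_date;    /* packed DOS date word */
    unsigned short dos_time;    /* packed DOS time word */
    unsigned char  attrib;      /* DOS attribute byte */
    int            valid;
};

static struct cached_entry cache[CACHE_SLOTS];
static unsigned next_slot;      /* ring buffer: oldest entry is overwritten */

/* Called from the directory-reading side for every entry it returns. */
void cache_store(const char *path, unsigned long size,
                 unsigned short dos_date, unsigned short dos_time,
                 unsigned char attrib)
{
    struct cached_entry *e = &cache[next_slot];

    next_slot = (next_slot + 1) % CACHE_SLOTS;
    strncpy(e->name, path, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
    e->size     = size;
    e->dos_date = dos_date;
    e->dos_time = dos_time;
    e->attrib   = attrib;
    e->valid    = 1;
}

/*
 * Called from the stat() side.  Returns NULL on a miss, in which case
 * the caller falls back to the slow path.  Searches newest-first, since
 * stat() usually follows readdir() immediately.  The comparison is
 * exact; a real version would need to normalize case and slashes.
 */
const struct cached_entry *cache_lookup(const char *path)
{
    unsigned i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        unsigned slot = (next_slot + CACHE_SLOTS - 1 - i) % CACHE_SLOTS;
        const struct cached_entry *e = &cache[slot];

        if (e->valid && strcmp(e->name, path) == 0)
            return e;
    }
    return NULL;
}
```

A small fixed-size ring is enough here because the typical pattern is
readdir() followed right away by stat() on the same name; anything that
misses simply takes the ordinary slow path.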

But that still wasn't as fast as directory opens with FCBs (pre DOS 5), or
direct disk access ... but that's another story.

You have the source available - I strongly recommend going through it and
writing your own implementation for your program.  You may be very happy
with what you end up with and learn a lot about the internals as a bonus.

Do not, I repeat, DO NOT fall into the trap of coding for findfirst directly.
Write portable code (or as close to it as you can) and you will find your
code is reusable, has a much longer life, and you will be more productive
in the long run.

The problem is that much of the GNU code isn't portable: it incorrectly
assumes Unix, and DJGPP tries to fix all the bad GNU code out there by
adding crutches in the libc.
