Mail Archives: djgpp/1999/09/30/18:56:21

From: skb AT xmission DOT removethis DOT com (Scott Brown)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Reading directories, readdir/stat too slow
Date: Thu, 30 Sep 1999 22:34:47 GMT
Organization: (none)
Lines: 56
Message-ID: <37f3c4e1.1015552570@news.xmission.com>
References: <37f307e1 DOT 967161774 AT news DOT xmission DOT com> <Pine DOT SUN DOT 3 DOT 91 DOT 990930160802 DOT 21365J-100000 AT is>
NNTP-Posting-Host: slc1140.modem.xmission.com
X-Trace: news.xmission.com 938730851 1266 166.70.8.124 (30 Sep 1999 22:34:11 GMT)
X-Complaints-To: abuse AT xmission DOT com
NNTP-Posting-Date: 30 Sep 1999 22:34:11 GMT
X-Newsreader: Forte Free Agent 1.11/32.235
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

On Thu, 30 Sep 1999 16:17:21 +0200, Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
wrote:

>
>On Thu, 30 Sep 1999, Scott Brown wrote:
>
>> Unfortunately, I find that opendir/readdir, when combined with a stat
>> call (to get file mode/size/etc) is a *lot* slower than findfirst,
>> taking on the order of 70-100 times longer to perform the same work.
>> When running against tens of thousands of files in hundreds of
>> directories, it is a significant problem.
>
>stat is expensive (it isn't easy to get all that info on DOS; you won't 
>believe how closely DOS guards some of its dirty secrets ;-).  But 100 
>times slower seems to be too much; I suspect your system is not set up in 
>an optimal way.  See section 3.9 of the FAQ; in particular, make sure you 
>have a disk cache installed.

Well, I'm running under Windows 95, which has a built-in disk cache,
and my hardware is plenty powerful for this kind of work (K6-200
w/64 MB).  Statistics from sysmon are limited, but they show that my
disk cache hasn't dropped below 2.5 MB in the last little while.

>A 10-fold slow-down when using stat is 
>something I would expect, but not 100-fold.

It seemed pretty ridiculous to me as well.  I put together a simple
timed test and ran it against a set of about 30,000 files in 350
directories; I ran each test twice and took the second result, to give
the OS a fair chance to load the cache.

The findfirst test finished in 1.98 seconds, while the readdir/stat
test finished in a whopping 133.96 seconds.  I believe my test code is
fair, but second opinions are welcome; grab a copy here:
ftp://ftp.xmission.com/pub/users/s/skb/pub/dirtest.c
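
For readers who cannot fetch that file, here is a minimal sketch of what
the readdir/stat half of such a timed test could look like.  It is not
the contents of dirtest.c: the names walk_readdir_stat and time_walk are
hypothetical, the 1024-byte path buffer is an arbitrary assumption, and
the findfirst half is not shown.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: walk `dir` recursively with opendir/readdir and
   call stat() on every entry.  Returns the number of regular files seen
   and adds their sizes to *total_bytes.  stat() follows symlinks, so on
   systems that have them a link back to a parent directory would loop;
   that cannot happen on a plain DOS filesystem. */
long walk_readdir_stat(const char *dir, unsigned long *total_bytes)
{
    char path[1024];                /* assumed maximum path length */
    struct dirent *de;
    struct stat st;
    long count = 0;
    DIR *d;

    assert(dir != NULL && total_bytes != NULL);

    d = opendir(dir);
    if (d == NULL)
        return 0;                   /* unreadable directory: count nothing */

    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (strlen(dir) + strlen(de->d_name) + 2 > sizeof path)
            continue;               /* would overflow the buffer: skip */
        sprintf(path, "%s/%s", dir, de->d_name);

        if (stat(path, &st) != 0)   /* the expensive call under test */
            continue;
        if (S_ISDIR(st.st_mode)) {
            count += walk_readdir_stat(path, total_bytes);
        } else if (S_ISREG(st.st_mode)) {
            count++;
            *total_bytes += (unsigned long)st.st_size;
        }
    }
    closedir(d);
    return count;
}

/* Hypothetical timing wrapper: returns the clock() time spent in the
   walk, in seconds.  On POSIX systems clock() measures CPU time rather
   than wall-clock time, so a different timer may be preferable there. */
double time_walk(const char *dir, long *files)
{
    unsigned long bytes = 0;
    clock_t start = clock();

    *files = walk_readdir_stat(dir, &bytes);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_walk twice on the same tree and keeping the second figure
mirrors the "run twice, take the second result" approach described
above.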

>Read the docs ;-).

That always helps...

>No, seriously: the documentation of _djstat_flags in libc.info describes 
>several flags that can be set to disable computing some expensive 
>members of struct stat for which you don't have any use.  For the 
>fastest operation, you should disable all features but those which your 
>application needs.  Doing so is known to speed up stat tremendously.

I only need the mode, and the size and timestamp for files.  After I
R'd TFM, I tried setting *all* of the _STAT... bits, but the results
were disappointing; certainly not what I'd term a "tremendous"
improvement.  Instead of taking 133.96 seconds to finish, the test
took only 125.44 seconds.
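
For reference, setting all of the bits might look like the sketch below.
The wrapper name stat_cheap is hypothetical, and the flag names are
given as I understand them from DJGPP's <sys/stat.h>; please check them
against the _djstat_flags entry in libc.info.  The DJGPP-specific part
is guarded so the file still compiles elsewhere, where it falls back to
a plain stat call.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical wrapper: call stat() with every optional (expensive)
   computation switched off, then restore the caller's flag settings.
   In DJGPP, a SET bit in _djstat_flags means "skip this work". */
int stat_cheap(const char *path, struct stat *st)
{
    assert(path != NULL && st != NULL);
#ifdef __DJGPP__
    {
        unsigned short saved = _djstat_flags;
        int rc;

        /* Flag names assumed from DJGPP's <sys/stat.h>.  With the two
           _STAT_EXEC_* bits set, the execute bits in st_mode are not
           computed, so drop those if the execute bits matter. */
        _djstat_flags = _STAT_INODE | _STAT_EXEC_EXT | _STAT_EXEC_MAGIC
                      | _STAT_DIRSIZE | _STAT_ROOT_TIME | _STAT_WRITEBIT;
        rc = stat(path, st);
        _djstat_flags = saved;      /* leave global state as we found it */
        return rc;
    }
#else
    return stat(path, st);          /* no such flags outside DJGPP */
#endif
}
```

In a traversal loop, it would be cheaper to set the flags once before
the walk rather than saving and restoring them around every call; the
wrapper form is used here only to keep the example self-contained.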

My test program is fairly representative of the kind of directory
traversal code I use in my applications.  Could there be something
else that I am overlooking?
