From: "Charles Sandmann"
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Reading directories, readdir/stat too slow
Date: Mon, 4 Oct 1999 21:47:11
Organization: Aspen Technology, Inc.
Lines: 46
Message-ID: <37f9205f.sandmann@clio.rice.edu>
References: <37f307e1 DOT 967161774 AT news DOT xmission DOT com>
NNTP-Posting-Host: dcloan.aco.aspentech.com
X-NewsEditor: ED-1.5.8
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

> For the last little while, I've been avoiding findfirst/findnext when
> I need to read directories, preferring the opendir functions instead
> because they make certain things a lot simpler.
>
> Unfortunately, I find that opendir/readdir, when combined with a stat
> call (to get file mode/size/etc) is a *lot* slower than findfirst,
> taking on the order of 70-100 times longer to perform the same work.
> When running against tens of thousands of files in hundreds of
> directories, it is a significant problem.
>
> Is there any way to speed things up?

Yes, but it requires replacing the opendir/readdir implementation in the
standard library. I also have some code which is portable and was
distressed by the slow speed of the libc code. (I saw a factor of around
50 slowdown).

If you trace through the opendir(), readdir(), and stat() code you will
find that a lot of the calls are made several times. My solution was to
cache the information from the findfirst/findnext calls (which contains
date/time and size) and then the stat() call immediately finds this
information in the cache. My implementation also made no attempt to be
as compatible as the current version, so the amount of code was much
smaller. I also replaced dos-time to unixy time with a hack since this
was a time sink too.

At this point I don't know where the source is - only a V2.0 early-beta
binary library which doesn't work with V2.01+.
But I do remember that I sat down with an early libc implementation and
yanked everything I didn't need, then cached anything that might be
useful a few calls later or that called dpmi_int, and I got within about
30% of the speed of findfirst. But that still wasn't as fast as
directory opens with FCBs (pre DOS 5), or direct disk access ... but
that's another story.

You have the source available - I strongly recommend going through it
and writing your own implementation for your program. You may be very
happy with what you end up with and learn a lot about the internals as a
bonus.

Do not, I repeat, DO NOT fall into the trap of coding for findfirst
directly. Write portable (or at least as close as you can) and you will
find your code is reusable, has a much longer life, and you will be more
productive in the long run. The problem is much of the GNU code isn't
portable - it incorrectly assumes unix and DJGPP tries to fix all the
bad GNU code out there by adding crutches in the libc.
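
For anyone who wants a starting point, here is a rough sketch of the
caching idea described above. It is not the lost original code. All the
names (cache_add, cache_find, dos_to_unix, CACHE_SLOTS, NAME_LEN) are
made up for illustration, and the sketch assumes your readdir()
replacement calls cache_add() with the size, attribute, and packed
date/time words it gets back from each findfirst/findnext result. The
time conversion uses the standard DOS packed date/time layout and
ignores time zones, which is an assumption you may need to change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 256          /* hypothetical capacity */
#define NAME_LEN    260          /* hypothetical name limit */

struct cache_entry {
    char          name[NAME_LEN];
    unsigned long size;
    long          mtime;         /* seconds since 1970, no time zone applied */
    int           attrib;
};

static struct cache_entry cache[CACHE_SLOTS];
static size_t cache_used;

/* Days from January 1 to the first of each month in a non-leap year. */
static const int days_before_month[12] =
    { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 };

/* Convert DOS packed date/time words to seconds since 1970 with plain
   arithmetic instead of mktime().  Returns -1 for an invalid month or
   day.  Like a 32-bit time_t, a 32-bit long overflows after 2038. */
long dos_to_unix(unsigned date, unsigned time)
{
    int year  = 1980 + (int)((date >> 9) & 0x7f);
    int month = (int)((date >> 5) & 0x0f);
    int day   = (int)(date & 0x1f);
    int leap  = (year % 4 == 0 && year != 2100);
    long days;

    if (month < 1 || month > 12 || day < 1)
        return -1;

    days = 3652L                         /* 1970-01-01 to 1980-01-01 */
         + (year - 1980) * 365L
         + (year - 1980 + 3) / 4         /* leap days in earlier years */
         - (year > 2100 ? 1 : 0)         /* 2100 is not a leap year */
         + days_before_month[month - 1]
         + (leap && month > 2 ? 1 : 0)
         + (day - 1);

    return days * 86400L
         + (long)((time >> 11) & 0x1f) * 3600L
         + (long)((time >> 5) & 0x3f) * 60L
         + (long)(time & 0x1f) * 2L;     /* DOS stores seconds / 2 */
}

/* Forget everything, e.g. when a new directory is opened. */
void cache_clear(void)
{
    cache_used = 0;
}

/* Record one entry as the directory scan returns it.
   Returns 0 on success, -1 if the cache is full or the name is too long. */
int cache_add(const char *name, unsigned long size,
              unsigned date, unsigned time, int attrib)
{
    struct cache_entry *e;

    assert(name != NULL);
    if (cache_used == CACHE_SLOTS || strlen(name) >= NAME_LEN)
        return -1;
    e = &cache[cache_used++];
    strcpy(e->name, name);
    e->size   = size;
    e->mtime  = dos_to_unix(date, time); /* convert once, at scan time */
    e->attrib = attrib;
    return 0;
}

/* stat()-style lookup.  Searches newest first, because the usual
   pattern is readdir() followed right away by stat() on the same name.
   Exact-case match only, which is a simplification for DOS names. */
const struct cache_entry *cache_find(const char *name)
{
    size_t i = cache_used;

    assert(name != NULL);
    while (i-- > 0)
        if (strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;                         /* miss: fall back to real stat() */
}
```

A stat() replacement would try cache_find() first and only do the
expensive DOS calls on a miss. Whether a fixed-size array is enough, or
whether you want a hash keyed on the full path, depends on how your
program walks its directories.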