Date: Thu, 22 Feb 1996 08:48:26 +0200 (IST)
From: Eli Zaretskii
To: "Michael A. Phelps"
Cc: "'DJGPP'"
Subject: Re: Increased file reading times with number of files
In-Reply-To: <01BB0073.925B0BE0@hf2rules.res.jhu.edu.res.jhu.edu>
Message-Id:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 21 Feb 1996, Michael A. Phelps wrote:

> longer at the end than at the beginning.  To test this, I wrote a simple
> test program that creates 676 dummy files with one line of text in them,
> and then reads them back in, and records the time required to read each
> 60 files.  As I had suspected, the time increases for files toward the
> end of the directory.  Curiously, however, when I disabled the portion of
> the code that does the actual reading, and merely timed the
> findfirst()/findnext() routine, there was no change in the time required.
> This makes me feel as though fopen() may have to perform a
> sequential search through the directory.  Is this true?  Is there any way
> to get around this?  I am running Windows 95 (although DOS 6.00 provided
> similar results), and compiled the program using the -O4 switch from

Yes, that's true.  To open a file, DOS reads every directory in its
pathname, beginning at the root, until it finds the directory entry of
the next subdirectory in the pathname; it then opens and reads that
subdirectory, and so on, until it arrives at the file.  How else could
it find the file, given that the starting cluster of every file and
directory is recorded in its parent directory?  DOS has no way of
knowing that you mean to open a bunch of files in the same directory,
has it?

This is a basic feature of the FAT disk structure, and Win95 cannot do
anything to remedy it.  And 676 directory entries take up 676*32 =
21632 bytes, which means the directory is split between 2 and 11
clusters (depending on your disk size), so the poor thing also has to
read the FAT from time to time.

On the other hand, findfirst/findnext with a wildcard just reads the
entries of the same directory sequentially, without rereading the
entire thing every time, so it should be much faster.

Using a large disk cache might help your situation a bit, though.
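
If it helps, here is a rough sketch (not from the original test program;
the directory name c:/tmp/test is just a placeholder) of the kind of
measurement described above.  It times fopen() on every entry of a
directory against a bare findfirst()/findnext() scan of the same
directory, using the Borland-style <dir.h> interface that DJGPP
provides; the fopen() pass should show the per-file directory search,
the scan pass should not.

    /* Timing sketch: per-file fopen() vs. a single findfirst/findnext
       scan of one directory.  Assumes the files live in c:/tmp/test. */
    #include <stdio.h>
    #include <time.h>
    #include <dir.h>

    int main(void)
    {
      struct ffblk f;
      char path[FILENAME_MAX];
      clock_t t0, t1;
      int done, n = 0;

      /* Pass 1: fopen() each file -- DOS re-searches the directory for
         every name before it can open the file.  */
      t0 = clock();
      for (done = findfirst("c:/tmp/test/*.*", &f, 0);
           !done;
           done = findnext(&f))
        {
          FILE *fp;
          sprintf(path, "c:/tmp/test/%s", f.ff_name);
          fp = fopen(path, "r");
          if (fp)
            {
              fclose(fp);
              n++;
            }
        }
      t1 = clock();
      printf("fopen() of %d files: %.2f s\n",
             n, (double)(t1 - t0) / CLOCKS_PER_SEC);

      /* Pass 2: just walk the directory entries sequentially, without
         opening anything.  */
      n = 0;
      t0 = clock();
      for (done = findfirst("c:/tmp/test/*.*", &f, 0);
           !done;
           done = findnext(&f))
        n++;
      t1 = clock();
      printf("findfirst/findnext over %d entries: %.2f s\n",
             n, (double)(t1 - t0) / CLOCKS_PER_SEC);
      return 0;
    }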