Date: Mon, 27 Jan 1997 15:31:15 -0500 (EST)
From: "Mike A. Harris"
Reply-To: "Mike A. Harris"
To: Chip Turner
cc: opendos AT mail DOT tacoma DOT net
Subject: Re: [opendos] Re: OpenDOS to be released next week!
In-Reply-To: <3.0.32.19970122151541.006a1a10@ctrvax.vanderbilt.edu>
Message-ID:
Organization: Your mom.
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-opendos AT mail DOT tacoma DOT net
Precedence: bulk

On Wed, 22 Jan 1997, Chip Turner wrote:

> >On Wed, 22 Jan 1997, Mark Habersack wrote:
>
> >32k clusters!! Ouch!! I wrote a program to analyze disk wastage on a
> >"cluster size" basis. With 32k clusters, and the disk being FULL of
> >files, you'll find that you have 30% disk wastage. Yes, that's right,
> >I said *THIRTY* percent. This means a 1 gig disk that is full is
> >wasting 300 megs. Scary eh?
>
> Okay, the accuracy gland in the back of my head has started pumping out
> that icky fluid, so I'm forced to reply and maybe clear up a few things
> for the audience.

Before I read the rest of your post, let me say one additional thing:
the figures I quote are based on the disk having many files on it.
Each file wastes 1/2 of a cluster on average, so if you multiply 1/2 of
a cluster by the number of files on the disk, you'll get an amazingly
close approximation of the cluster wastage.  This is usually VERY
accurate.  (There is a small C sketch of this arithmetic further down.)

> The amount of disk space wasted is more dependent on the number of
> files than just the storage needs of those files. This is because each
> file takes up Floor(s/n)+1 clusters, where s is the size of the file in
> bytes and n is the size of the cluster in bytes (the floor function
> takes any number to the largest integer less than or equal to it, so
> Floor(1.765)==1, Floor(2)==2, and Floor(-1.5)==-2). So a 1 byte file on
> a system that has 32k clusters takes up 1 cluster or 32k, the exact
> same as a 29k file would. The remainder of the cluster is packed with
> garbage (ie, whatever was in the cluster before this file was put into
> it).

Agreed.  Every file on the system takes at least one cluster, so each
file will waste at least part of a cluster (1/2 on average, as stated
above).  The smaller the cluster size, the less the wastage when there
is a large number of files (as is usual on today's computer systems).

> The FAT file system is a table-based file system, which basically
> means there is one large table that keeps track of pointers for the
> clusters in a file. The FAT basically is a set of pairs (p,c) organized
> in one big chunk on disk. p stands for a pointer in the FAT to the next
> cluster of a file and c stands for the pointer to the actual cluster
> that the data is located in. So basically we have a bunch of linked
> lists in the FAT whose data members are pointers to clusters on disk.
> This means there's one spot on your disk where a crazy disk write ends
> up ruining your entire allocation scheme, so a second copy of the FAT
> is kept (physically near the 1st copy, I believe, but don't quote me).
> There is one entry in the FAT for every

The FATs are stored one after the other, although this may or may not
be possible to change with proprietary software.

[snip]

> Unix i-node file systems work on a different system; instead of being
> based on the idea of a table of linked lists, they work in terms of
> indexes.

[Unix filesystem snip]

Yes, the UNIX filesystems are MUCH more efficient, both in terms of
speed and size.  My Linux partitions waste virtually no space at all.
Also, they are much more reliable.
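Here is the little C sketch of the cluster arithmetic I promised above.
It is only an illustration -- the function names and the 19000-file
figure (roughly what a full 1 gig drive of ~40k files works out to) are
mine, not from any real measurement:

#include <stdio.h>

/* Whole clusters occupied by a file of 'size' bytes when clusters are
   'cluster' bytes: round up to the next cluster.  (The Floor(s/n)+1
   formula quoted above overcounts by one cluster whenever a file is an
   exact multiple of the cluster size; rounding up handles both cases.) */
static unsigned long clusters_used(unsigned long size, unsigned long cluster)
{
    return (size + cluster - 1) / cluster;
}

/* Slack for one file: space allocated minus bytes actually used. */
static unsigned long slack(unsigned long size, unsigned long cluster)
{
    return clusters_used(size, cluster) * cluster - size;
}

int main(void)
{
    /* A 1 byte file and a 29k file each burn one whole 32k cluster. */
    printf("1 byte file wastes %lu bytes\n", slack(1, 32768));
    printf("29k file wastes    %lu bytes\n", slack(29696, 32768));

    /* The half-a-cluster-per-file rule of thumb: a full 1 gig drive
       holding about 19000 files wastes roughly 19000 * 16k -- the
       "300 megs" figure from the top of this thread. */
    printf("estimated wastage: %lu megs\n",
           19000UL * (32768UL / 2) / (1024UL * 1024UL));
    return 0;
}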
OS/2's HPFS is also a very good file system.  I believe that the second
extended filesystem is based on HPFS.  I also believe that NTFS is in
the same circle.

> FAT is more efficient space-wise if you have bunches of huge files;
> inodes are more efficient if you like speed for big file accesses and
> don't want to waste as much space at the end of clusters. It's
> incorrect to say that every full 1 gig hard drive is wasting 30%; you
> must take into account the sizes of the individual files.

This is very true; however, I base my results upon USUAL cases.  On
*ALL* of the drives that I've studied, I've found this wastage to be
very consistent.  This is a real-world situation where most users don't
know or care about the technicalities of their filesystems, and a 1 gig
C: partition is much more appealing than 12 drive letters.  Also, most
drives come already formatted to their maximum capacity with only one
partition.  You will find very few users who have a 1 gig drive with
only files bigger than 5 megs residing on it.

The AVERAGE file size on any given computer is less than 40k.  This can
be easily checked with a simple computer program (there is a sketch of
one below).  It is also this fact which is a major player in all the
UNIX filesystems: they assume that most files are less than 40k, so
most files on the system can be accessed directly, without inode
indirection.  (Files over 40k use indirect access.)

> >waste no more than 3% on average. I've done extensive testing on a
> >great many computers to get these results too.
>
> Yes, in general, it is a Nasty Problem, but not always. Also, it is
> incorrect to say that 'on average' no more than 3% is wasted unless
> this is a result of some kind of analysis; otherwise you should say
> 'I'm xx.xx%

It *IS* a result of some very in-depth analysis.  I've tested over 200
drives (read that: partitions).  Whenever I'm at a new computer, I run
my test program on all of its drives and generate a report file.  Most
users have their drives formatted to max capacity.  Those that have
partitioned drives either bought them pre-partitioned or partitioned
them for some other reason.  I find that very few users know anything
about the wastage of space inherent in the FAT file system, which
increases with disk size.

> certain that on average no more than 3% is wasted' because statistics
> are easy to manipulate and can lead to false conclusions even if that
> wasn't the intention (and I'm sure that wasn't your intention; this
> post is by no means a flame).

Nor is it taken as a flame.  The way I take your post is that you agree
with me, but feel that I could have given more information, and you
wanted to correct any possible ambiguity.  That is fine with me.  I've
also tried to be more specific in this post.
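The idea behind such a test program is trivial, by the way.  Here is a
rough sketch -- POSIX this time rather than my old DOS version, one
directory only, no recursion; treat it as an illustration, not the
actual program:

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <sys/stat.h>

/* Scan one directory and report file count, average file size, and
   total slack for a given cluster size.
   Usage: slack <dir> <cluster-bytes> */
int main(int argc, char **argv)
{
    DIR *dir;
    struct dirent *ent;
    struct stat st;
    char path[1024];
    unsigned long cluster, nfiles = 0;
    unsigned long long total = 0, wasted = 0;

    if (argc != 3 || (cluster = strtoul(argv[2], NULL, 10)) == 0) {
        fprintf(stderr, "usage: %s <dir> <cluster-bytes>\n", argv[0]);
        return 1;
    }
    if ((dir = opendir(argv[1])) == NULL) {
        perror(argv[1]);
        return 1;
    }
    while ((ent = readdir(dir)) != NULL) {
        snprintf(path, sizeof path, "%s/%s", argv[1], ent->d_name);
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;               /* skip directories, devices, etc. */
        nfiles++;
        total += (unsigned long long)st.st_size;
        /* bytes allocated (clusters rounded up) minus bytes used */
        wasted += ((st.st_size + cluster - 1) / cluster) * cluster
                  - st.st_size;
    }
    closedir(dir);
    if (nfiles == 0) {
        printf("no files\n");
        return 0;
    }
    printf("%lu files, average size %llu bytes\n", nfiles, total / nfiles);
    printf("total slack at %lu byte clusters: %llu bytes\n",
           cluster, wasted);
    return 0;
}

Point it at a directory with the drive's cluster size (for example,
./slack . 32768) and compare the slack total against the half-cluster
per file estimate.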
> >At the end of the listing you will see:
>
> Heh at the end of my listing I get an out of memory error... 9,000
> files on a 511meg partition is apparently just too much for 4DOS to
> handle. =(

Hmmm.  I don't have that problem when I run it.  How much memory do you
have?  I've got 11356 files on one drive (345M), and it would be
strange to get an error like that.  What version of 4DOS are you using?
I've used older Norton Commanders and Xtrees, and they choke on drives
bigger than 64M or so.  Perhaps this is the problem.

> Never fear, one of the utilities that came with an old version of
> Norton Utilities (fs for those who are interested) tells me roughly
> 13% is slack with 8k clusters. Hmm... only 7% slack under the Win95
> tree [and admitting to running Windows 95 is sure to get me flamed on
> THIS list... ;-) ]

13% on a 511M partition?  That is in line with my testing.  No flames
for '95: everyone has their own computing needs, even if I don't agree
with them.  :o)

> >Only 1 meg wasted. Less than 1%!!
>
> On one of my other partitions (880meg) I have 1% slack, too (16k
> clusters). Of course, that 600 meg file helps... ;-) I once had
> everything in 4k

Yes, I was going to say, "You must have a bunch of VERY large files!"
Again, this is just the *EXCEPTION*; most cases will tie closely to my
figures.  Sure, it is possible for someone to get different figures if
they want to try to prove me wrong, but my intention is to inform
others of the space problems with the FAT file system, not to get into
a silly flame war.  (I know that is not your intention either; I'm just
posting that to save us from getting flame mail.)

> clusters, but when I added a 2 gig drive, it became annoying to have
> 12 drive letters, so I did a bit of consolidation.

Yes, that is a big problem.  I personally have drive letters up to R:
(a lot of subst'd drives); my actual partitions go to I:.  I've got
some 4k cluster partitions, some 2k, and one 8k.  When I repartition,
I'm doing ALL 4k clusters.  The solution to the drive letter problem is
the DOS JOIN command, which is semi-equivalent to the MOUNT command in
other OS's.  Unfortunately it was dropped from DOS a while ago.  Anyone
have the source?  Caldera?

> On a related note, I've noticed some, ah, very prominent
> Anti-Microsoft sentiments among some members of the list. Admittedly,
> MS has had some very questionable (ie, illegal) business practices,
> and their proprietary OS's cause stagnation in innovation and limit
> 3rd parties from competing, but often times some things get attributed
> to MS and Bill that aren't really their fault.

Again, that is the exception to the rule.  :o)

> For instance, the 640k barrier. Technical details omitted, suffice it
> to say that the original 8088/8086 processors could only address 1 meg
> total. This had to include memory for programs but also ROM for the
> BIOS and video card (not much else; this was before there were many
> different expansion boards). This had to be placed somewhere in the
> 1meg, so 10 segments were chosen for user programs and 6 segments for
> the BIOS and video ram/bios. 10 segments of 64k results in the
> infamous 640k barrier. Gates may have said 'Nobody will ever need more
> than 640k,' but he didn't create the limit. (Nor was this statement
> necessarily short sighted, imho; PC's didn't really have much to offer
> initially [no programs, basically] and I doubt anyone saw the
> architecture propagating for 15+ years.)

Agreed.  MS didn't MAKE the Intel architecture, but the OS could have
been coded more portably so that programs would have room to breathe in
future architectures.  Look at UNIX, for example.

> Nor did Microsoft conspire to limit hard drive space. They just gave
> in to consumer pressure -- hard drives were growing, but DOS was
> limited to 32meg partitions. So they extended the barrier a few times
> but remember, they were essentially extending a file system designed
> for 160k floppy disks.

Agreed again.  However, they never even tried to create a FS that
solved the problems inherent in FAT; they just kept kludging the
existing FS.  They seem to be really good at "kludging" their products
instead of actually "improving" them.  Microkludge?  :o)
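Incidentally, the reason the cluster sizes balloon like this is pure
16-bit arithmetic: a FAT16 partition can only have about 65525 data
clusters (the usual quoted ceiling -- the exact boundaries vary a
little between DOS versions), so the cluster size has to double every
time the partition outgrows it.  A quick sketch:

#include <stdio.h>

#define FAT16_MAX_CLUSTERS 65525UL   /* approximate FAT16 ceiling */

int main(void)
{
    unsigned long megs[] = { 32, 127, 511, 1024, 2047 };
    size_t i;

    for (i = 0; i < sizeof megs / sizeof megs[0]; i++) {
        unsigned long kbytes  = megs[i] * 1024;  /* partition size in k */
        unsigned long cluster = 1;               /* cluster size in k */

        /* smallest power-of-two cluster keeping the count in range */
        while (kbytes / cluster > FAT16_MAX_CLUSTERS)
            cluster *= 2;
        printf("%4lu meg partition -> %2luk clusters\n", megs[i], cluster);
    }
    return 0;
}

The 1024 meg line comes out to 32k clusters -- exactly the situation
that started this thread.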
> PCs were around before hard drives became standard in desktop
> computers. Perhaps it wasn't a great choice to use the same file
> system for floppies as hard drives, but memory was a major constraint,
> as was compatibility; I doubt there were any other acceptable choices.
> Adding subdirectories to the FAT system (done in DOS 2.0 if I have my
> version numbers right) screwed up enough programs; perhaps they didn't
> want to have it happen again.

A transparent FS layer would have solved this problem.  Again, look at
UNIX.  MS-DOS was created out of CP/M and UNIX, and it would have been
nice if they had taken some of the more advanced UNIX concepts and
included them in DOS.  Granted, however, this may have been a little
too much for an 8088 with 16k of memory at the time.

> This all boils down to DOS being an operating system designed for the
> 8086/8088 with 16k of memory and storage being a removable 160k floppy
> disk. The PC became tremendously useful (Visicalc and Lotus did great
> things for the platform), though, and consumers demanded more. Getting
> more memory or disk space while maintaining compatibility is like
> squeezing blood from a gun -- you'll probably just shoot yourself in
> the foot.

Yeah, I agree with all you've said here, but I still think that they
could have done a much better job and broken the OS barriers earlier.
The 386 was out in '85 or so, and it is just NOW that we're really
getting to actually USE the damned thing!!!  I'm talking about
Linux/OS2/NT and DOS extenders.  These things could all have easily
been around 10 years earlier.

> Anyway, I've rambled long enough; I hope this helps clear up a few
> things for some people as well as help everyone appreciate the
> technical difficulties Caldera has overcome!

Oh, even though we haven't actually tried them yet, I sure appreciate
the efforts of Caldera!  I'm also sure that we will see a better FS in
DOS very soon too.  With no help from M$ either!  :o)

TTYL

Mike A. Harris  -  Computer Consultant    http://www3.sympatico.ca/mharris
My dynamic address: http://www3.sympatico.ca/mharris/ip-address.html
mailto:mharris AT sympatico DOT ca
mailto:mharris AT blackwidow DOT saultc DOT on DOT ca
Coast to Coast AM with Art Bell: The #1 Late Night talk radio program.