Message-Id: <3.0.32.19970122151541.006a1a10@ctrvax.vanderbilt.edu>
Date: Wed, 22 Jan 1997 15:15:44 -0600
To: opendos AT mail DOT tacoma DOT net
From: Chip Turner
Subject: Re: [opendos] Re: OpenDOS to be released next week!
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-opendos AT mail DOT tacoma DOT net
Precedence: bulk

>On Wed, 22 Jan 1997, Mark Habersack wrote:
>32k clusters!! Ouch!! I wrote a program to analyze disk wastage on a
>"cluster size" basis. With 32k clusters, and the disk being FULL of
>files, you'll find that you have 30% disk wastage. Yes, thats right,
>I said *THIRTY* percent. This means a 1 gig disk that is full is
>wasting 300 megs. Scary eh?

Okay, the accuracy gland in the back of my head has started pumping out that icky fluid, so I'm forced to reply and maybe clear up a few things for the audience.

The amount of disk space wasted depends more on the number of files than on the total storage those files need. Each file occupies Ceiling(s/n) clusters, where s is the size of the file in bytes and n is the size of a cluster in bytes; that works out to Floor(s/n)+1 clusters whenever s isn't an exact multiple of n. (The floor function takes any number to the largest integer less than or equal to it, so Floor(1.765)==1, Floor(2)==2, and Floor(-1.5)==-2.) So a 1 byte file on a system with 32k clusters takes up 1 cluster, or 32k -- exactly the same as a 29k file would. The remainder of the cluster is packed with garbage (ie, whatever was in the cluster before the file was put into it).

The FAT file system is a table-based file system, which basically means there is one large table that keeps track of pointers to the clusters in a file. The FAT is one big array on disk with an entry for every cluster; the entry for a given cluster holds the number of the next cluster in that file's chain (or an end-of-chain marker). So basically we have a bunch of linked lists living inside the FAT, whose links are pointers to clusters on disk. This means there's one spot on your disk where a crazy disk write ends up ruining your entire allocation scheme, so a second copy of the FAT is kept (physically near the 1st copy, I believe, but don't quote me). Since there is one FAT entry per cluster, lots of small clusters mean a large FAT, while fewer, larger clusters (ie, a large cluster size) mean a small FAT.

Unix i-node file systems work on a different system; instead of being based on the idea of a table of linked lists, they work in terms of indexes. Each file has an i-node that holds several pointers to blocks on the disk. Some of these pointers point directly to data (direct pointers), some to blocks of pointers that point to data (indirect blocks), and some to blocks of pointers to other blocks of pointers that point to data (double indirect blocks). Maybe a bit more complicated, but it is much more efficient both in terms of speed and in terms of storage space for large numbers of files. The number of blocks needed to store a file is larger than the number of data blocks in the file, but not by much (and always a small percentage if the file is larger than, oh, about the size of a few clusters). FAT is more efficient space-wise if you have bunches of huge files; inodes are more efficient if you like speed for big file accesses and don't want to waste as much space at the end of clusters.
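To put numbers on the cluster math above, here is a minimal sketch (in Python, my choice; nothing like it appears in the original post) of the per-file calculation and of the kind of slack tally a wastage-analysis program would perform. The function names and the sample file sizes are just illustrations.

K = 1024

def clusters_needed(file_size, cluster_size):
    """Whole clusters a file occupies: Ceiling(file_size / cluster_size)."""
    return -(-file_size // cluster_size)      # integer ceiling division

def slack(file_size, cluster_size):
    """Bytes wasted at the end of the file's last cluster."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size

# The examples from the post: with 32k clusters, a 1 byte file and a
# 29k file each occupy one 32k cluster.
print(clusters_needed(1, 32 * K), slack(1, 32 * K))            # 1 32767
print(clusters_needed(29 * K, 32 * K), slack(29 * K, 32 * K))  # 1 3072

# Overall wastage depends on the mix of file sizes, not just the total:
# many tiny files waste a huge fraction, a few big files waste almost nothing.
sizes = [512] * 10000          # ten thousand small files (a made-up mix)
used = sum(clusters_needed(s, 32 * K) * 32 * K for s in sizes)
print(sum(slack(s, 32 * K) for s in sizes) / used)   # ~0.98, ie ~98% slack

Swap in a realistic distribution of file sizes and cluster sizes and you get figures all over the map -- which is exactly why no single wastage percentage fits every full disk.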
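And since the post contrasts FAT's table of linked lists with the Unix i-node scheme, here is a second hedged sketch: a toy in-memory FAT chain (not the actual FAT16/FAT32 on-disk format) and a rough count of the pointer-block overhead an i-node layout adds. The direct/indirect pointer counts are assumptions for illustration, not any particular filesystem's real numbers.

END_OF_CHAIN = -1

# FAT side: one entry per cluster, each entry naming the next cluster in
# the same file -- a linked list stored inside one big table.
fat = {5: 9, 9: 12, 12: END_OF_CHAIN}   # a 3-cluster file starting at cluster 5

def cluster_chain(first_cluster):
    """Follow the FAT from a file's first cluster to its end-of-chain marker."""
    chain, cluster = [], first_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

# i-node side: count data blocks plus the pointer blocks needed to reach
# them via direct, single-indirect, and double-indirect pointers.
def inode_blocks(file_size, block_size=4096, n_direct=12, ptr_size=4):
    """Return (data blocks, pointer-block overhead) for a file of file_size bytes.

    n_direct direct pointers, one single-indirect and one double-indirect
    pointer; these counts are assumptions for illustration only.
    """
    ptrs_per_block = block_size // ptr_size
    data = -(-file_size // block_size)            # ceiling division
    overhead = 0
    beyond_direct = max(0, data - n_direct)
    if beyond_direct:                              # need the single-indirect block
        overhead += 1
    beyond_single = max(0, beyond_direct - ptrs_per_block)
    if beyond_single:                              # need the double-indirect block...
        overhead += 1 + -(-beyond_single // ptrs_per_block)   # ...plus its children
    return data, overhead

print(cluster_chain(5))                  # [5, 9, 12]
print(inode_blocks(100 * 1024 * 1024))   # (25600, 26)

Running the last line, a 100 meg file needs a couple dozen pointer blocks on top of its ~25,600 data blocks -- the "larger, but not by much" point from above.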
It's incorrect to say that every full 1 gig hard drive is wasting 30%; you must take into account the sizes of the individual files.

>waste no more than 3% on average. I've done extensive testing on a
>great many computers to get these results too.

Yes, in general, it is a Nasty Problem, but not always. Also, it is incorrect to say that 'on average' no more than 3% is wasted unless this is the result of some kind of analysis; otherwise you should say 'I'm xx.xx% certain that on average no more than 3% is wasted', because statistics are easy to manipulate and can lead to false conclusions even when that wasn't the intention (and I'm sure that wasn't your intention; this post is by no means a flame).

>At the end of the listing you will see:

Heh, at the end of my listing I get an out of memory error... 9,000 files on a 511meg partition is apparently just too much for 4DOS to handle. =( Never fear, one of the utilities that came with an old version of Norton Utilities (fs for those who are interested) tells me roughly 13% is slack with 8k clusters. Hmm... only 7% slack under the Win95 tree [and admitting to running Windows 95 is sure to get me flamed on THIS list... ;-) ]

>Only 1 meg wasted. Less than 1%!!

On one of my other partitions (880meg) I have 1% slack, too (16k clusters). Of course, that 600 meg file helps... ;-) I once had everything in 4k clusters, but when I added a 2 gig drive, it became annoying to have 12 drive letters, so I did a bit of consolidation.

On a related note, I've noticed some, ah, very prominent Anti-Microsoft sentiments among some members of the list. Admittedly, MS has had some very questionable (ie, illegal) business practices, and their proprietary OS's cause stagnation in innovation and keep 3rd parties from competing, but oftentimes things get attributed to MS and Bill that aren't really their fault.

For instance, the 640k barrier. Technical details omitted, suffice it to say that the original 8088/8086 processors could only address 1 meg total. That had to hold memory for programs but also ROM for the BIOS and the video card (not much else; this was before there were many different expansion boards). All of this had to be placed somewhere in the 1 meg, so 10 segments were chosen for user programs and 6 segments for the BIOS and video ram/bios. 10 segments of 64k give the infamous 640k barrier. Gates may have said 'Nobody will ever need more than 640k,' but he didn't create the limit. (Nor was this statement necessarily short-sighted, imho; PCs didn't really have much to offer initially [no programs, basically], and I doubt anyone saw the architecture propagating for 15+ years.)

Nor did Microsoft conspire to limit hard drive space. They just gave in to consumer pressure -- hard drives were growing, but DOS was limited to 32meg partitions. So they extended the barrier a few times, but remember, they were essentially extending a file system designed for 160k floppy disks. PCs were around before hard drives became standard in desktop computers. Perhaps it wasn't a great choice to use the same file system for floppies as for hard drives, but memory was a major constraint, as was compatibility; I doubt there were any other acceptable choices. Adding subdirectories to the FAT system (done in DOS 2.0 if I have my version numbers right) screwed up enough programs; perhaps they didn't want to have that happen again. This all boils down to DOS being an operating system designed for the 8086/8088 with 16k of memory and storage being a removable 160k floppy disk.
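For anyone who wants the 640k arithmetic spelled out, here is a tiny sketch of the real-mode address space split described above. It is nothing but arithmetic; the 10/6 split mirrors the segment division mentioned in the post.

SEGMENT = 64 * 1024                 # one 64k segment, as in the post's 10-segment split

address_space = 16 * SEGMENT        # 8088/8086: 20 address lines -> 1 meg total
conventional  = 10 * SEGMENT        # segments 0-9: RAM for DOS and user programs
reserved      = 6 * SEGMENT         # segments A-F: video RAM, adapter ROMs, BIOS

print(address_space)                # 1048576 bytes (1 meg)
print(conventional)                 # 655360 bytes -- the 640k barrier
print(hex(conventional))            # 0xa0000, where the reserved area begins
assert conventional + reserved == address_space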
The PC became tremendously useful (VisiCalc and Lotus did great things for the platform), though, and consumers demanded more. Getting more memory or disk space while maintaining compatibility is like squeezing blood from a gun -- you'll probably just shoot yourself in the foot.

Anyway, I've rambled long enough; I hope this helps clear up a few things for some people, as well as helps everyone appreciate the technical difficulties Caldera has overcome! (I constantly wear flame-retardant underwear, but before fingering the flame throwers, bear in mind I'm no Microsoft lover and am only presenting the other side of the coin...)

Chip

--
Chip Turner -- turnerjh AT ctrvax DOT vanderbilt DOT edu
http://cswww.vuse.vanderbilt.edu/~turnerj1
"A man who is good ought not to calculate the chance of living or
dying; he ought only to consider whether in doing anything he is
doing right or wrong -- acting the part of a good man or of a bad."
  Plato, The Apology