Mail Archives: opendos/1997/01/22/16:38:37

Message-Id: <3.0.32.19970122151541.006a1a10@ctrvax.vanderbilt.edu>
Date: Wed, 22 Jan 1997 15:15:44 -0600
To: opendos AT mail DOT tacoma DOT net
From: Chip Turner <turnerjh AT ctrvax DOT Vanderbilt DOT Edu>
Subject: Re: [opendos] Re: OpenDOS to be released next week!
Mime-Version: 1.0
Sender: owner-opendos AT mail DOT tacoma DOT net

>On Wed, 22 Jan 1997, Mark Habersack wrote:

>32k clusters!!  Ouch!! I wrote a program to analyze disk wastage on a
>"cluster size" basis.  With 32k clusters, and the disk being FULL of
>files, you'll find that you have 30% disk wastage.  Yes, thats right,
>I said *THIRTY* percent.  This means a 1 gig disk that is full is
>wasting 300 megs.  Scary eh?

Okay, the accuracy gland in the back of my head has started pumping out
that icky fluid, so I'm forced to reply and maybe clear up a few things for
the audience.

The amount of disk space wasted depends more on the number of files than
on the total storage needs of those files.  This is because each non-empty
file takes up Floor((s-1)/n)+1 clusters, where s is the size of the file in
bytes and n is the size of the cluster in bytes (the floor function takes a
number to the largest integer less than or equal to it, so Floor(1.765)==1,
Floor(2)==2, and Floor(-1.5)==-2).  So a 1 byte file on a system that has
32k clusters takes up 1 cluster or 32k, the exact same as a 29k file would.
 The remainder of the cluster is packed with garbage (ie, whatever was in
the cluster before this file was put into it).
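
To make the arithmetic concrete, here is a minimal Python sketch (it is not
the analysis program quoted above, and the function names are made up for
illustration).  It rounds up to whole clusters, so a file that exactly
fills its last cluster is not charged an extra one, and it assumes a
zero-length file occupies no clusters.

```python
CLUSTER = 32 * 1024  # 32k clusters, as in the quoted example


def clusters_needed(size_bytes, cluster_bytes=CLUSTER):
    """Whole clusters occupied by a file's data (integer round-up)."""
    return (size_bytes + cluster_bytes - 1) // cluster_bytes


def slack_bytes(size_bytes, cluster_bytes=CLUSTER):
    """Bytes allocated to the file but not used by it."""
    return clusters_needed(size_bytes, cluster_bytes) * cluster_bytes - size_bytes


def total_slack(sizes, cluster_bytes=CLUSTER):
    """Slack summed over a list of file sizes."""
    return sum(slack_bytes(s, cluster_bytes) for s in sizes)


# The same 10 meg of data stored two different ways.
one_big = [10 * 1024 * 1024]        # a single 10 meg file
many_small = [1024] * (10 * 1024)   # 10,240 files of 1k each

print(clusters_needed(1), clusters_needed(29 * 1024))  # 1 1
print(total_slack(one_big))                            # 0
print(total_slack(many_small))                         # 325058560
```

Same amount of data, wildly different slack: the single big file wastes
nothing, while the pile of 1k files wastes 31k per file.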

The FAT file system is a table-based file system, which basically means
there is one large table that keeps track of pointers for the clusters in a
file.  The FAT is an array of entries organized in one big chunk on disk,
one entry per cluster.  The entry for a cluster holds the number of the
next cluster of the same file (or an end-of-file mark), and the file's
directory entry holds the number of its first cluster.  So basically we
have a bunch of linked lists in the FAT, one per file, whose links are
cluster numbers.  This means there's one spot on your disk where a crazy
disk write ends up ruining your entire allocation scheme, so a second copy
of the FAT is kept (physically near the 1st copy, I believe, but don't
quote me).  There is one entry in the FAT for every cluster on the disk, so
a large number of clusters (ie, small cluster size) means a larger FAT,
while a small number of clusters (ie, large cluster size) results in a
small FAT.
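
A toy model may help here.  The Python sketch below assumes the usual
layout in which the position of an entry in the table is the cluster number
and the value stored there is the number of the next cluster in the same
file.  The END_OF_CHAIN and FREE markers, the helper names, and the tiny
16-entry table are all invented for illustration; a real FAT uses reserved
bit patterns and lives on disk.

```python
END_OF_CHAIN = -1  # hypothetical marker; a real FAT uses reserved values
FREE = 0           # hypothetical marker for an unallocated cluster


def make_fat(entries):
    """An empty table with one entry per cluster."""
    return [FREE] * entries


def link(fat, chain):
    """Record a file occupying the given clusters, in order."""
    for current, following in zip(chain, chain[1:]):
        fat[current] = following
    fat[chain[-1]] = END_OF_CHAIN


def file_clusters(fat, first_cluster):
    """Follow the linked list starting at a file's first cluster."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def fat_entries(disk_bytes, cluster_bytes):
    """One entry per cluster: smaller clusters mean a bigger table."""
    return disk_bytes // cluster_bytes


fat = make_fat(16)
link(fat, [2, 5, 3])   # a fragmented three-cluster file
link(fat, [4, 9])      # a second file

print(file_clusters(fat, 2))              # [2, 5, 3]
print(fat_entries(2**30, 32 * 1024))      # 32768 entries on a 1 gig disk
print(fat_entries(2**30, 4 * 1024))       # 262144 entries on the same disk
```

Note that reading a file's Nth cluster means walking N links from the
start, which is the speed cost of the linked-list design.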

Unix i-node file systems work on a different system; instead of being based
on the idea of a table of linked lists, they work in terms of indexes.
Each directory entry (ie, file) has associated with it several pointers to
other blocks on the disk.  Some of these pointers point directly to data
(direct pointers), some to other blocks of pointers that point to data
(indirect blocks), and some to other blocks of pointers to other blocks of
pointers that point to data (double indirect blocks).  Maybe a bit more
complicated, but it is much more efficient both in terms of speed and in
terms of storage space for large numbers of files.  The number of clusters
needed to store a file is larger than the number of data clusters in the
file, because the indirect blocks take up space too, but not by much (and
always a small percentage if the file is larger than, oh, about the size of
a few clusters).
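
As a rough illustration of that overhead, the Python sketch below counts
the extra pointer blocks a file needs under the direct / indirect / double
indirect scheme described above.  The parameters (12 direct pointers and
1024 pointers per block) are assumptions picked for the example, not the
layout of any particular Unix file system, and the function name is made
up.

```python
def index_blocks(data_blocks, direct=12, per_block=1024):
    """Pointer blocks needed beyond the data blocks themselves."""
    remaining = max(0, data_blocks - direct)
    if remaining == 0:
        return 0  # everything fits in the direct pointers

    # Single indirect: one block of pointers straight to data.
    extra = 1
    remaining -= min(remaining, per_block)
    if remaining == 0:
        return extra

    # Double indirect: one top-level block, plus one second-level
    # pointer block for every per_block data blocks.
    covered = min(remaining, per_block * per_block)
    extra += 1 + (covered + per_block - 1) // per_block
    remaining -= covered
    if remaining:
        raise ValueError("file too large for this sketch")
    return extra


for n in (12, 13, 1036, 1037, 100_000):
    print(n, index_blocks(n))
```

With these assumed numbers a 100,000-block file needs 99 pointer blocks,
roughly a tenth of a percent on top of the data.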

FAT is more efficient space-wise if you have bunches of huge files; inodes
are more efficient if you like speed for big file accesses and don't want
to waste as much space at the end of clusters.  It's incorrect to say that
every full 1 gig hard drive is wasting 30%; you must take into account the
sizes of the individual files.

>waste no more than 3% on average.  I've done extensive testing on a
>great many computers to get these results too.

Yes, in general, it is a Nasty Problem, but not always.  Also, it is
incorrect to say that 'on average' no more than 3% is wasted unless this is
a result of some kind of analysis; otherwise you should say 'I'm xx.xx%
certain that on average no more than 3% is wasted' because statistics are
easy to manipulate and can lead to false conclusions even if that wasn't
the intention (and I'm sure that wasn't your intention; this post is by no
means a flame).

>At the end of the listing you will see:

Heh at the end of my listing I get an out of memory error...  9,000 files
on a 511meg partition is apparently just too much for 4DOS to handle. =(
Never fear, one of the utilities that came with an old version of Norton
Utilities (fs for those who are interested) tells me roughly 13% is slack
with 8k clusters.  Hmm... only 7% slack under the Win95 tree [and admitting
to running Windows 95 is sure to get me flamed on THIS list... ;-) ]

>Only 1 meg wasted.  Less than 1%!!

On one of my other partitions (880meg) I have 1% slack, too (16k clusters).
 Of course, that 600 meg file helps... ;-)  I once had everything in 4k
clusters, but when I added a 2 gig drive, it became annoying to have 12
drive letters, so I did a bit of consolidation.

On a related note, I've noticed some, ah, very prominent Anti-Microsoft
sentiments among some members of the list.  Admittedly, MS has had some
very questionable (ie, illegal) business practices, and their proprietary
OS's cause stagnation in innovation and limit 3rd parties from competing,
but oftentimes some things get attributed to MS and Bill that aren't
really their fault.

For instance, the 640k barrier.  Technical details omitted, suffice it to
say that the original 8088/8086 processors could only address 1 meg total.
This had to include memory for programs but also ROM for the BIOS and video
card (not much else; this was before there were many different expansion
boards).  This had to be placed somewhere in the 1meg, so 10 segments were
chosen for user programs and 6 segments for the BIOS and video ram/bios.
10 segments of 64k results in the infamous 640k barrier.  Gates may have
said 'Nobody will ever need more than 640k,' but he didn't create the
limit.  (Nor was this statement necessarily short sighted, imho; PC's
didn't really have much to offer initially [no programs, basically] and I
doubt anyone saw the architecture propagating for 15+ years).
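
For anyone who wants to check the 640k arithmetic, here is a throwaway
Python sketch.  The 20-bit address width is the 8088/8086's; the variable
names are just labels for this example.

```python
SEGMENT = 64 * 1024          # one 64k segment
ADDRESS_BITS = 20            # width of the 8088/8086 address bus

total = 2 ** ADDRESS_BITS    # 1,048,576 bytes = 1 meg
segments = total // SEGMENT  # 16 segments of 64k in that 1 meg

user = 10 * SEGMENT          # left for user programs
reserved = 6 * SEGMENT       # BIOS and video ram/bios

print(segments)              # 16
print(user // 1024)          # 640
print(reserved // 1024)      # 384
```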

Nor did Microsoft conspire to limit hard drive space.  They just gave in to
consumer pressure -- hard drives were growing, but DOS was limited to 32meg
partitions.  So they extended the barrier a few times but remember, they
were essentially extending a file system designed for 160k floppy disks.
PCs were around before hard drives became standard in desktop computers.
Perhaps it wasn't a great choice to use the same file system for floppies
as hard drives, but memory was a major constraint as was compatibility; I
doubt there were any other acceptable choices.  Adding subdirectories to
the FAT system (done in DOS 2.0 if I have my version numbers right) screwed
up enough programs; perhaps they didn't want to have it happen again.

This all boils down to DOS being an operating system designed for the
8086/8088 with 16k of memory and storage being a removable 160k floppy
disk.  The PC became tremendously useful (Visicalc and Lotus did great
things for the platform), though, and consumers demanded more.  Getting
more memory or disk space while maintaining compatibility is like squeezing
blood from a gun -- you'll probably just shoot yourself in the foot.

Anyway, I've rambled long enough; I hope this helps clear up a few things
for some people as well as help everyone appreciate the technical
difficulties Caldera has overcome!

(I constantly wear flame-retardant underwear, but before fingering the
flame throwers, bear in mind I'm no Microsoft lover and am only presenting
the other side of the coin...)

Chip


--
Chip Turner -- turnerjh AT ctrvax DOT vanderbilt DOT edu
               http://cswww.vuse.vanderbilt.edu/~turnerj1

               "A man who is good ought not to calculate the chance 
               of living or dying; he ought only to consider 
               whether in doing anything he is doing right or
               wrong -- acting the part of a good man or of a bad."
                              Plato, The Apology
