Mail Archives: opendos/1997/04/30/17:54:07
A while ago I volunteered that I have been thinking about attributed file
systems for some time. Here are some follow-up notions for general discussion
(presented in no particular order):
A file system presents a single hierarchical name space for storing
files (in the case of DOS, there are multiple hierarchies, with a drive
letter rooting each name space, and mapping to an individual disk
partition). UNIX file systems lack the fragmentation of drive letters,
allowing all the partitions to be mounted into a single tree, and also
provide for "links" between branches of the tree, allowing for more than
a single path to an individual file. One use of an attributed file system
would be to isolate name components from a fixed hierarchy, and allow for
1) query-style searches and operations on the file system, and
2) a flexible (dynamic) hierarchical presentation of the file system.
As I describe the types of attributes that might be interesting, hopefully
I can provide examples of these two alternate browsing mechanisms and how
they might be used.
Some attributes that I think it would be interesting to have include:
- for a directory, aggregate disk space of all my descendants
- owner, group, access control list, etc.
- package I belong to
- security signature
- file that created me
- file type (in MIME? or some other standard form)
- icon
- etc.
This list is by no means exhaustive, but gives an example of the types
of things that could be kept. Some of these attributes only make sense
if you have an appropriate file system monitor installed which can track
the information (like the directory disk space attribute). Every time
a file's size changed, the change could be rippled up the path to
all enclosing directories, and the disk space count could always be up
to date. This would allow trivial (and very rapid) display of disk
usage, and make disk quotas much easier. Another example where a monitor
would be handy would be for "file that created me". Often (not always),
this information is readily available, and could be used to find documents
of a particular type (or to assign a file type to the documents).
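A minimal sketch of the ripple-up idea, in Python. The DiskUsageMonitor class and its method names are hypothetical. It assumes the monitor is told about every file size change, and it keeps the aggregate as an in-memory table rather than as a stored attribute:

```python
from collections import defaultdict
from pathlib import PurePosixPath


class DiskUsageMonitor:
    """Hypothetical monitor maintaining a per-directory 'disk space' attribute."""

    def __init__(self):
        # directory -> total bytes of all descendant files
        self.usage = defaultdict(int)
        # file -> last known size, so each change can be applied as a delta
        self.sizes = {}

    def file_resized(self, path, new_size):
        p = PurePosixPath(path)
        delta = new_size - self.sizes.get(p, 0)
        self.sizes[p] = new_size
        # Ripple the delta up through every enclosing directory, root included.
        for parent in p.parents:
            self.usage[parent] += delta

    def file_deleted(self, path):
        # A deletion is a resize to zero followed by forgetting the file.
        self.file_resized(path, 0)
        del self.sizes[PurePosixPath(path)]

    def disk_usage(self, directory):
        # Constant-time lookup; no directory scan is needed.
        return self.usage.get(PurePosixPath(directory), 0)
```

Each update costs one addition per path component, which is why the directory totals can stay current without rescanning the disk.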
While at Novell, I worked on the Personal NetWare project, and we did
Access Control Lists (what users can access a file with what rights) by
implementing a shadow file system, where the ACLs were maintained outside
of the regular file system. It was a big pain keeping the ACLs synchronized
with the "real" file system (and in some cases, such as when changes occurred
without the PNW server loaded, we had to punt and just accept that things
could get out of synch). Based upon those experiences, I would prefer
a simple, but integrated system, with essentially a "shadow" directory
entry per file in the system. This would be similar to "resource forks"
on the Macintosh. I would also recommend storing simple ASCII (name, value)
pairs as strings in the attributes "fork", with some simple convention for
binary data (such as an icon). OS/2 HPFS went too far with its "as many resource
forks as you like" approach. The additional directory entry would double
the number of directory entries required for the file system, and would cause
some overhead in regular scanning. It would essentially point to a block
of data in its own file (an unnamed file?), and would not be returned by
DOS on directory scans. It could be marked with a special character, but in
all other respects it would look like a normal directory entry, so that
disk-processing programs would not be confused.
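One possible encoding for the attributes "fork", sketched in Python. The line format is an assumption chosen for illustration: each text attribute is stored as "name=value" on its own line, and binary data such as an icon is stored as "name:b64=" followed by base64, so the whole fork stays plain ASCII:

```python
import base64

# Hypothetical convention: "name=value" for text, "name:b64=<base64>" for binary.
BINARY_SUFFIX = ":b64"


def encode_attributes(attrs):
    """Serialize {name: str | bytes} into the ASCII text of an attribute fork."""
    lines = []
    for name, value in attrs.items():
        # Names may not contain the separator characters used by the format.
        if not name or any(c in name for c in "=:\n"):
            raise ValueError(f"invalid attribute name: {name!r}")
        if isinstance(value, bytes):
            encoded = base64.b64encode(value).decode("ascii")
            lines.append(f"{name}{BINARY_SUFFIX}={encoded}")
        else:
            if "\n" in value or not value.isascii():
                raise ValueError(f"value of {name!r} must be single-line ASCII")
            lines.append(f"{name}={value}")
    return "".join(line + "\n" for line in lines)


def decode_attributes(text):
    """Parse attribute-fork text back into {name: str | bytes}."""
    attrs = {}
    for line in text.split("\n"):
        if not line:
            continue
        # Split on the first '=' only; values may themselves contain '='.
        key, _, value = line.partition("=")
        if key.endswith(BINARY_SUFFIX):
            attrs[key[: -len(BINARY_SUFFIX)]] = base64.b64decode(value)
        else:
            attrs[key] = value
    return attrs
```

Because names cannot contain "=", splitting on the first "=" is unambiguous, and a plain text editor can still be used to inspect or repair a fork.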
It sure would be nice to be able to index the file system by any
particular attribute, so that a different hierarchical view could be
synthesized on demand. For example, I'd like to be able to generate
a tree of directories where each directory at the root was the month,
then each subdirectory was a day, and the files would appear in the
appropriate subdirectory based on their last modification time (or
creation time). This would make finding a file based upon the last
time I edited it (or created it) easier. Or I'd like to see a tree where
the top directories were the packages installed on the system, and the
files were listed each under its owning (or creating) package.
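The month/day view can be synthesized without any index at all, as in this Python sketch. The function names are hypothetical; the "YYYY-MM" month label and the use of UTC are assumptions made so the result does not depend on the local time zone:

```python
import os
from collections import defaultdict
from datetime import datetime, timezone


def view_by_mtime(files):
    """Group (path, mtime) pairs into a {month: {day: [paths]}} tree.

    mtime is a POSIX timestamp such as os.stat(path).st_mtime.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for path, mtime in files:
        t = datetime.fromtimestamp(mtime, tz=timezone.utc)
        tree[t.strftime("%Y-%m")][t.strftime("%d")].append(path)
    # Convert to plain dicts so the synthesized "directories" print cleanly.
    return {month: dict(days) for month, days in tree.items()}


def scan(root):
    """Yield (path, mtime) for every file below root."""
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            yield path, os.path.getmtime(path)
```

Calling view_by_mtime(scan("/home/tim")) walks the whole tree on every request, which is exactly the cost an index would remove.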
Actually, you don't have to keep an
index to sort and search the file system this way, but having an
index would make it fast enough to present these views interactively.
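An index of this kind can be as simple as an inverted map from (attribute, value) to the set of files carrying it. The Python sketch below is hypothetical and in-memory only; it is updated whenever an attribute changes, so a "package" view becomes a lookup instead of a full scan:

```python
from collections import defaultdict


class AttributeIndex:
    """Hypothetical index: attribute name -> value -> set of paths."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))
        # path -> {name: value}; needed to remove a file's stale index entry
        self._attrs = {}

    def set_attr(self, path, name, value):
        old = self._attrs.setdefault(path, {}).get(name)
        if old is not None:
            self._index[name][old].discard(path)
        self._attrs[path][name] = value
        self._index[name][value].add(path)

    def lookup(self, name, value):
        # Return a copy so callers cannot corrupt the index.
        return set(self._index.get(name, {}).get(value, ()))

    def view(self, name):
        """Synthesize one 'directory' per distinct value of the attribute."""
        return {v: sorted(paths) for v, paths in self._index.get(name, {}).items() if paths}
```

For example, view("package") gives the tree with one top-level entry per installed package, each listing its files.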
Unix users will no doubt claim that the "find"
command can do these types of operations, and they would be mostly right.
However, some of the browse methods described here would require fairly
complex combinations (pipelines) of find, sort, cut, etc.
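With the attributes stored as name/value pairs, the filtering, sorting, and truncation that would otherwise be spread across a pipeline can be one query call. This Python sketch is hypothetical; it assumes the attributes of each file have already been read into a dictionary and that every matching file carries the attribute being sorted on:

```python
def query(files, where, order_by=None, descending=False, limit=None):
    """Select files whose attributes match every (name, value) in `where`.

    files: {path: {attribute name: value}}
    order_by: optional attribute to sort on (must be present on every hit)
    limit: optional maximum number of results
    """
    hits = [
        path
        for path, attrs in files.items()
        if all(attrs.get(name) == value for name, value in where.items())
    ]
    if order_by is not None:
        hits.sort(key=lambda path: files[path][order_by], reverse=descending)
    # A limit of None keeps every hit.
    return hits[:limit]
```

For instance, query(files, {"mime-type": "text/plain"}, order_by="size", descending=True, limit=2) returns the two largest plain-text files.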
Eventually you end up with a system similar to a database, except with
no fixed schema on the information kept about a file. Conventions would
have to be used to keep things sane between the monitors which created and
maintained the attributes and the utilities which sorted on, searched by, and
utilized the attributes. Also, with a generic name, value pair syntax,
a generic utility could be created to hand-manipulate an attribute, if
necessary or desirable.
One of the things I have noted in my career is that some problems are
really hard by themselves, but when presented with an enabling technology
become almost trivial. (For example, Novell worked for years and
finally achieved a secure authentication mechanism without an encrypted
communications channel. Netscape did the encrypted communications channel
first, and secure authentication for them was trivial.) I believe that this
principle applies to an attributed file system. I have seen remarks go by
on this list about implementing a system where descriptions could be kept
about a file (I guess 4DOS allows this). Also, people have been discussing
how to keep ACLs for files. With an attributed file system in place, many
thorny issues involved in developing such systems would be solved, and the
work could concentrate on those issues unique to that particular problem.
Thanks for listening.
Tim Bird
P.S. I heard the source code general availability was announced today.
I have seen the source code license, and it is IMHO very good. Much looser
than the original binary license. Redistribution (non-commercial) of both
source and binaries is allowed.