From: "Tim Bird"
Message-Id: <9704301546.ZM13817@caldera.com>
Date: Wed, 30 Apr 1997 15:46:13 -0600
To: opendos AT delorie DOT com
Subject: A few FS notions
Cc: opendos-dev AT delorie DOT com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Precedence: bulk

A while ago I volunteered that I have been thinking about attributed file systems for some time. Here are some followup notions for general discussion (presented in no particular order):

A file system presents a single hierarchical name space for storing files (in the case of DOS, there are multiple hierarchies, with a drive letter rooting each name space, and mapping to an individual disk partition). UNIX file systems lack the fragmentation of drive letters, allowing all the partitions to be mounted into a single tree, and also provide for "links" between branches of the tree, allowing for more than a single path to an individual file.

One use of an attributed file system would be to isolate name components from a fixed hierarchy, and allow for 1) query-style searches and operations on the file system, and 2) a flexible (dynamic) hierarchical presentation of the file system. As I describe the types of attributes that might be interesting, hopefully I can provide examples of these two alternate browsing mechanisms and how they might be used.

Some attributes that I think it would be interesting to have include:
 - for a directory, aggregate disk space of all my descendants
 - owner, group, access control list, etc.
 - package I belong to
 - security signature
 - file that created me
 - file type (in MIME? or some other standard form)
 - icon
 - etc.

This list is by no means exhaustive, but gives an example of the types of things that could be kept.
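To make the two browsing mechanisms a little more concrete, here is a minimal Python sketch. It assumes attributes are kept as plain name/value strings per file. The file names, the attribute values, and the helpers `query` and `view_by` are all made up for illustration; none of this is an existing DOS interface.

```python
from collections import defaultdict

# Hypothetical attribute store: path -> {attribute name: value}.
# Names and values are plain strings, and there is no fixed schema:
# a file simply lacks the attributes nobody has recorded for it.
ATTRS = {
    r"C:\DOS\EDIT.EXE":    {"package": "OpenDOS", "type": "application/octet-stream"},
    r"C:\DOS\README.TXT":  {"package": "OpenDOS", "type": "text/plain"},
    r"C:\WP\LETTER.TXT":   {"package": "WordProc", "type": "text/plain", "owner": "tim"},
    r"C:\TEMP\SCRATCH.DAT": {"type": "application/octet-stream"},
}


def query(attrs, **wanted):
    """Query-style search: paths whose attributes match every name=value given."""
    return sorted(
        path
        for path, file_attrs in attrs.items()
        if all(file_attrs.get(name) == value for name, value in wanted.items())
    )


def view_by(attrs, name, missing="(none)"):
    """Synthesize a one-level tree: attribute value -> sorted list of paths."""
    tree = defaultdict(list)
    for path, file_attrs in attrs.items():
        tree[file_attrs.get(name, missing)].append(path)
    return {key: sorted(paths) for key, paths in sorted(tree.items())}


if __name__ == "__main__":
    # 1) query-style search
    print(query(ATTRS, type="text/plain"))
    # 2) dynamic hierarchical presentation, here grouped by owning package
    for package, paths in view_by(ATTRS, "package").items():
        print(package)
        for path in paths:
            print("   ", path)
```

Both functions scan every file, which is fine for a toy but is exactly the cost an index on the attribute would avoid.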
Some of these attributes only make sense if you have an appropriate file system monitor installed which can track the information (like the directory disk space attribute). Every time a file size was modified, the change could be rippled up the path to all enclosing directories, and the disk space count could always be up to date. This would allow trivial (and very rapid) display of disk usage, and make disk quotas much easier.

Another example where a monitor would be handy would be for "file that created me". Often (not always), this information is readily available, and could be used to find documents of a particular type (or to assign a file type to the documents).

While at Novell, I worked on the Personal NetWare project, and we did Access Control Lists (which users can access a file, and with what rights) by implementing a shadow file system, where the ACLs were maintained outside of the regular file system. It was a big pain keeping the ACLs synchronized with the "real" file system (and in some cases, like where changes occurred without the PNW server loaded, we had to punt and just accept that things could get out of sync).

Based upon those experiences, I would prefer a simple but integrated system, with essentially a "shadow" directory entry per file in the system. This would be similar to "resource forks" on the Macintosh. I would also recommend simple ASCII (name, value) pairs, stored as strings in the attributes "fork" (with some simple convention for binary data, like an icon). OS/2 HPFS went too far with its "as many resource forks as you like" approach.

The additional directory entry would double the number of directory entries required for the file system, and would cause some overhead in regular scanning. It would point essentially to a block of data in its own file (an unnamed file?), and would not be returned by DOS on directory scans.
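The (name, value) attribute fork suggested above could be as simple as one "name=value" line per attribute. The Python sketch below is only one possible convention, not an existing format: the "=" separator, the "base64:" prefix used to mark binary values, and the functions `dump_fork` and `load_fork` are all assumptions made for illustration.

```python
import base64

# Assumed marker for binary values (e.g. an icon); not an existing convention.
BINARY_PREFIX = "base64:"


def dump_fork(attrs):
    """Serialize {name: str or bytes} to ASCII text, one 'name=value' line each."""
    lines = []
    for name, value in sorted(attrs.items()):
        if not name or not name.isascii() or "=" in name or "\n" in name:
            raise ValueError(f"bad attribute name: {name!r}")
        if isinstance(value, bytes):
            # Binary data is carried as base64 text behind the marker prefix.
            value = BINARY_PREFIX + base64.b64encode(value).decode("ascii")
        elif (not value.isascii() or "\n" in value
              or value.startswith(BINARY_PREFIX)):
            raise ValueError(f"value of {name!r} must be stored as bytes")
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


def load_fork(text):
    """Parse text produced by dump_fork back into {name: str or bytes}."""
    attrs = {}
    for line in text.split("\n"):
        if not line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, _, value = line.partition("=")
        if value.startswith(BINARY_PREFIX):
            value = base64.b64decode(value[len(BINARY_PREFIX):])
        attrs[name] = value
    return attrs


if __name__ == "__main__":
    fork = dump_fork({"package": "OpenDOS", "type": "text/plain",
                      "icon": b"\x00\x01\xff"})
    print(fork, end="")
    print(load_fork(fork))
```

Keeping the fork as readable text means a generic utility (or a plain editor) can inspect and repair attributes by hand, at the cost of some space for binary values.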
The shadow entry could have a special char in it, but in all other respects look like a normal directory entry, so that disk processing programs are not confused.

It sure would be nice to be able to index the file system by any particular attribute, so that a different hierarchical view could be synthesized on demand. For example, I'd like to be able to generate a tree of directories where each directory at the root was the month, then each subdirectory was a day, and the files would appear in the appropriate subdirectory based on their last modification time (or creation time). This would make finding a file based upon the last time I edited it (or created it) easier. Or I'd like to see a tree where the top directories were the packages installed on the system, and the files were listed each under its owning (or creating) package.

Actually, you don't have to keep an index to sort and search the file system this way, but having an index would make it fast enough to present these views interactively.

Unix users will no doubt claim that the "find" command can do these types of operations, and they would be mostly right. Some of the browse methods described here would, however, take fairly complex combinations (piping) of find, sort, cut, etc.

Eventually you end up with a system similar to a database, except with no fixed schema on the information kept about a file. Conventions would have to be used to keep things sane between the monitors which created and maintained the attributes and the utilities which sorted on, searched by, and utilized the attributes. Also, with a generic (name, value) pair syntax, a generic utility could be created to hand-manipulate an attribute, if necessary or desirable.

One of the things I have noted in my career is that some problems are really hard by themselves, but when presented with an enabling technology become almost trivial.
(For example, Novell worked for years and finally achieved a secure authentication mechanism without an encrypted communications channel. Netscape did the encrypted communications channel first, and secure authentication for them was trivial.) I believe that this principle applies to an attributed file system.

I have seen remarks go by on this list about implementing a system where descriptions could be kept about a file (I guess 4DOS allows this). Also, people have been discussing how to keep ACLs for files. With an attributed file system in place, many thorny issues involved in developing such systems would be solved, and the work could concentrate on those issues unique to that particular problem.

Thanks for listening.

Tim Bird

P.S. I heard the source code general availability was announced today. I have seen the source code license, and it is IMHO very good. Much looser than the original binary license. Redistribution (non-commercial) of both source and binaries is allowed.