Mail Archives: cygwin/1996/10/30/01:13:32

From: jqb AT netcom DOT com (Jim Balter)
Subject: Re: Text file format (off-topic, was Re: using cat on binary files (
Date: 30 Oct 1996 01:13:32 -0800
Sender: daemon AT cygnus DOT com
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <199610300455.UAA09829.cygnus.gnu-win32@netcom23.netcom.com>
Mime-Version: 1.0
Original-To: kerr AT wizard DOT net (Shane Kerr)
Original-Cc: jqb AT netcom DOT com, gnu-win32 AT cygnus DOT com
In-Reply-To: <199610300258.VAA04111@wizard.wizard.net> from "Shane Kerr" at Oct 29, 96 09:58:08 pm
X-Mailer: ELM [version 2.4 PL23]
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Shane Kerr wrote:
> 
> I know this is probably pretty far off of the GNU-WIN32 topic, but...
> 
> > Maybe we can file a class action suit for a few billion against the
> > turkey who unleashed on the world a system with such fundamentally
> > bad design decisions as a two-character EOL indicator and an in-band
> > EOF indicator.
> 
> You have to understand where the Win32 file system came from: MS-DOS.
> Then you have to understand where the MS-DOS file system came from: 
> CP/M.  In CP/M, there was no system information describing the size 
> of a file - only the number of blocks that it used.  So an in-band 
> EOF indicator was needed. 

An EOF indicator was needed because the file system didn't maintain a byte
count.  Was it necessary to omit a byte count?  No; keeping only a block count
and not a byte count was a bad design decision, given the hundreds of file
systems in existence at the time that kept byte counts.  Many really bad
systems that kept only sector or word counts had been distributed by
individual computer vendors, and those systems had been recognized as
mistakes and abandoned in favor of better technology.  The CP/M file system
was designed by amateurs without an appreciation for the state of the art.
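The following minimal Python sketch makes the byte-count point concrete. It rests on several assumptions. Files are stored in 128-byte records. Ctrl-Z (0x1A) serves as both padding and the EOF marker, which is how CP/M text files are commonly described. All function names are hypothetical.

A file system that stores only a record count forces the reader to scan for the in-band marker. Any data that happens to contain 0x1A is then silently truncated. A stored byte count avoids both problems.

```python
RECORD_SIZE = 128  # assumed CP/M-style record size in bytes
CPM_EOF = 0x1A     # Ctrl-Z, the in-band end-of-text marker


def store_cpm_style(text: bytes) -> tuple[int, bytes]:
    """Store text the way a record-count-only file system must.

    Only the number of records is known afterwards, so any unused bytes
    in the last record are filled with the EOF marker.
    """
    remainder = len(text) % RECORD_SIZE
    padding = RECORD_SIZE - remainder if remainder else 0
    stored = text + bytes([CPM_EOF]) * padding
    return len(stored) // RECORD_SIZE, stored


def read_cpm_style(stored: bytes) -> bytes:
    """Recover the text by scanning for the first in-band EOF marker.

    If the text exactly filled its last record there is no marker,
    and the whole stored area is the text.
    """
    end = stored.find(bytes([CPM_EOF]))
    return stored if end == -1 else stored[:end]


def store_with_byte_count(text: bytes) -> tuple[int, bytes]:
    """A file system that records the exact length needs no marker at all."""
    return len(text), text
```

For example, storing `b"ab\x1acd"` with `store_cpm_style` and reading it back with `read_cpm_style` yields only `b"ab"`. The byte-count version keeps all five bytes.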

> As for the two-character EOL, it _does_ more accurately represent
> what's happening when you dump a text file to a line printer or a
> terminal.  At the end of each line, you want to go down a line, for
> which you use a newline character, then you want to move the print
> head back to the left-hand side, for which you use a carriage return 
> (return the carriage to the left).

So files contain these two characters because someone was too lazy to add a
couple of lines of code to drive the printer or terminal properly.  What if
the file system developer's terminal had been a stroke vector device?  By
this reasoning, we might have ended up with characters stored as strokes.
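The "couple of lines of code" can be sketched as an output filter. The file keeps a single NL per line, and the driver for a device that needs a carriage return emits CR NL at output time. This is a hypothetical Python illustration, not any particular driver. POSIX terminal drivers offer a comparable mapping through the `ONLCR` output flag.

```python
def to_device(text: str) -> bytes:
    """Send newline-only text to a device that needs CR NL at line ends.

    The stored text uses a single separator (NL); the carriage return is
    added here, at output time, because only the device needs it.
    Assumes ASCII text.
    """
    out = bytearray()
    for ch in text.encode("ascii"):
        if ch == 0x0A:        # NL in the file...
            out += b"\r\n"    # ...becomes CR NL on the wire
        else:
            out.append(ch)
    return bytes(out)
```

For example, `to_device("a\nb\n")` returns `b"a\r\nb\r\n"`. The file itself never needs to know what kind of device will display it.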

> When you think of it like this, 

Thinking like this is a failure to think abstractly.  It's amateurish, and
it's bad design.

> it's UNIX (newline only) and Macs (carriage return only) that have a 
> bogus text file format, not CP/M, MS-DOS, and Win32.

Since all that is *needed* is a single line separator character, a single line
separator character is not bogus.  Since NL is the ANSI "newline" character,
it makes pretty good sense, although RS (record separator) might have made
more sense.  "carriage return" is harder to justify.

> The true evil is the different standards, not that any particular 
> standard is really that bad.

The CRNL EOL creates problems because the meaning of a lone CR or NL is not
defined, and because it makes character-at-a-time processing unnecessarily
difficult.
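The following small sketch shows the character-at-a-time cost, assuming a reader that sees one character at a time. The function names are hypothetical.

- **Single separator:** each NL ends a line and no state is needed.
- **CR NL:** a CR must be held until the next character arrives. Because a lone CR or lone NL is undefined, the reader also has to pick a policy for them. The policy below (count only a CR immediately followed by NL) is an arbitrary assumption.

```python
from collections.abc import Iterable


def count_lines_nl(chars: Iterable[str]) -> int:
    """Single-separator text: every NL ends a line, no state is needed."""
    return sum(1 for ch in chars if ch == "\n")


def count_lines_crlf(chars: Iterable[str]) -> int:
    """CR NL text: a CR must be remembered until the next character arrives.

    A lone CR or lone NL has no defined meaning, so this sketch picks an
    arbitrary policy (an assumption): count only a CR immediately
    followed by NL, and ignore lone CRs and lone NLs.
    """
    lines = 0
    pending_cr = False  # one character of state carried between reads
    for ch in chars:
        if pending_cr and ch == "\n":
            lines += 1
        pending_cr = ch == "\r"
    return lines
```

Both functions accept any iterable of characters, such as `iter("a\r\nb\r\n")`. Only the CR NL version needs to carry state from one character to the next.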

> Yes, it sucks that we have to deal with 
> it now. But, as my boss says, it's easy to criticize, and hard to 
> create.

I've been designing and creating systems software for over 30 years.
Critical analysis makes good design.
-- 
<J Q B>
-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".



  Copyright © 2019   by DJ Delorie     Updated Jul 2019