Mail Archives: cygwin/1996/10/31/20:49:45

From: jqb AT netcom DOT com (Jim Balter)
Subject: Re: using cat on binary files (CTRL-Z trauma)
Date: 31 Oct 1996 20:49:45 -0800
Sender: daemon AT cygnus DOT com
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <199610312247.OAA06452.cygnus.gnu-win32@netcom23.netcom.com>
Mime-Version: 1.0
Original-To: jjf AT dsbc DOT icl DOT co DOT uk (J.J.Farrell)
Original-Cc: dj AT delorie DOT com, jqb AT netcom DOT com, gnu-win32 AT cygnus DOT com
In-Reply-To: <199610312042.14110.0@dsbc.icl.co.uk> from "J.J.Farrell" at Oct 31, 96 08:42:10 pm
X-Mailer: ELM [version 2.4 PL23]
Original-Sender: owner-gnu-win32 AT cygnus DOT com

J.J.Farrell wrote:
> 
> > From: DJ Delorie <dj AT delorie DOT com>
> > 
> > Since Unix doesn't store the right control characters in the file, it
> > must allow the user to specify how newlines are to be converted for
> > their terminals.  Need a CR?  Use "stty onlcr".  Don't support lower
> > case?  Use "stty olcuc".  Return key generates CR instead of LF?  Use
> > "stty icrnl".

This doesn't seem to me to work as an argument that stty is a consequence of
unix file structure.  Terminals that don't support lower case would require a
mapping even in DOS; this has nothing to do with file structure.  And onlcr
and icrnl are the default mappings, which are rather modern, fine-grained
versions of the older cooked (abstract) vs. raw (concrete) notions.  The
reason you couldn't log on if there were no stty in the library is that
getty historically starts out in raw mode so that it can detect the
baud rate of the terminal, lowercase support, and so on; it could just as well
start out in cooked mode and not try to make these distinctions.

The return key on dumb terminals always generates CR, not LF; no one ever
turns off just icrnl, which was added in the SysIII version of termio for
completeness.  Even in DOS the enter key has to be mapped, and if DOS
supported dumb terminals whose enter key generated an LF, the exact same
considerations would apply.  By the logic of all this talk about the "right"
characters, the Mac does it right, because it stores CR, which is what the
enter key sends.  Input isn't symmetric with output.  unix responds to this by
dealing in line abstractions and pushing device details into device drivers.
Perhaps some of this is a result of files in DOS historically being viewed
more as text to be entered on a terminal and dumped out to a terminal, whereas
in unix they are manipulable objects that are operated on by filters.  After
all, that distinction is reflected in the discussion about cat being
"intended" to dump text files on terminals.  That is most certainly not
the paradigmatic view in unix.
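As an illustration of the filter view, a minimal cat-like copy loop in C treats every byte alike, so a CTRL-Z (0x1A) passes through like any other byte. copy_stream() is a hypothetical name. On DOS-derived C libraries the streams would have to be opened in binary mode ("rb"/"wb") for this to hold, since their text mode treats CTRL-Z as end-of-file on input.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy every byte from `in` to `out` unchanged, the way
   a unix filter such as cat does.  No byte value (0x1A included) is special.
   Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    char buf[4096];
    size_t n;

    assert(in != NULL && out != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        if (fwrite(buf, 1, n, out) != n)
            return -1;                 /* short write */
    return ferror(in) ? -1 : 0;        /* loop ended on error or on EOF */
}
```

Calling copy_stream(stdin, stdout) gives the core of cat; the program never needs to know whether the bytes are "text".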

> > Personally, I think they should have added a separate NL control code,
> > that meant CR/LF, but they didn't.
> 
> I'm not sure that there is a concept of 'the right control characters'
> in UNIX. UNIX has chosen to use a particular control character to
> indicate the end of a 'line', and it is up to programs (in the widest
> sense) to recognise it and interpret it appropriately. With this
> philosophy, it is sensible to use a single character rather than two
> since it uses less space. It doesn't matter what that character is,
> and they chose to use linefeed. This was perhaps an unfortunate choice
> since it is related to what needs to be sent to terminals and printers
> at the end of each line, and sometimes causes confusion (as now).

It confuses people because it makes the abstraction look too much like the
concrete device, and thinking abstractly is difficult in the first place.
Actually, a good argument can be made that the line terminator should have been
NUL, further simplifying processing and making the file abstraction a
list of strings, strings that could even contain LF characters.  The one
problem with this is that a file not ending with a NUL would have required
some out-of-band indicator (e.g., a special return value) at the 'gets' level,
although it could be argued that programs that need to be sensitive to that
could always be coded in terms of 'getc'.
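A sketch of the NUL-terminated alternative, in C. next_record() is a hypothetical helper that works with any terminator byte, '\n' or '\0'. It strips the terminator, and its `terminated` output is exactly the kind of out-of-band indicator needed for a final record that lacks one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: find the next record in buf[*pos..len) ended by `term`.
   On success, points *rec at the record, sets *rec_len (terminator excluded)
   and *terminated (0 only for a final record with no terminator), advances
   *pos past the record, and returns 1.  Returns 0 when no data remains. */
int next_record(const char *buf, size_t len, size_t *pos, char term,
                const char **rec, size_t *rec_len, int *terminated)
{
    assert(pos != NULL && rec != NULL && rec_len != NULL && terminated != NULL);
    size_t i = *pos;
    if (i >= len)
        return 0;
    while (i < len && buf[i] != term)
        i++;
    *rec = buf + *pos;
    *rec_len = i - *pos;
    *terminated = (i < len);          /* the out-of-band indicator */
    *pos = *terminated ? i + 1 : i;   /* consume the terminator, if any */
    return 1;
}
```

With term set to '\0', a record may contain LF characters freely. For stdio streams, POSIX.1-2008 getdelim() reads up to an arbitrary delimiter byte in a similar spirit.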

> It's best to think of it as an arbitrary character used to indicate
> the end of a line; once programs have used it to find the end of the
> line, they logically discard it and do whatever end-of-line processing
> is needed, such as sending CR/LF to the terminal.

In unix, the terminal driver already echoes the return key as CR/LF (in
cooked mode), and programs indicate the end-of-line abstraction on output with
a single character, just as with input.
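The asymmetry between input and output can be simulated in a few lines of C. This is a sketch of what the icrnl and onlcr mappings do to the byte stream, not the terminal driver's actual code; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Input side (like icrnl): the Return key sends CR, and the program is
   handed LF instead.  Rewrites buf in place. */
void map_input_icrnl(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\r')
            buf[i] = '\n';
}

/* Output side (like onlcr): each LF a program writes goes to the device as
   CR LF.  `out` must have room for 2 * len bytes.  Returns bytes written. */
size_t map_output_onlcr(const char *in, size_t len, char *out)
{
    size_t n = 0;
    assert((in != NULL && out != NULL) || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\n')
            out[n++] = '\r';   /* carriage return first */
        out[n++] = in[i];      /* then the byte itself */
    }
    return n;
}
```

The input mapping replaces one byte with another, while the output mapping expands one byte into two; either way the program itself only ever deals with a single LF per line.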

-- 
<J Q B>


  Copyright © 2019   by DJ Delorie     Updated Jul 2019