From: jqb AT netcom DOT com (Jim Balter)
Subject: Re: using cat on binary files (CTRL-Z trauma)
Date: 31 Oct 1996 20:49:45 -0800
Sender: daemon AT cygnus DOT com
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <199610312247.OAA06452.cygnus.gnu-win32@netcom23.netcom.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Original-To: jjf AT dsbc DOT icl DOT co DOT uk (J.J.Farrell)
Original-Cc: dj AT delorie DOT com, jqb AT netcom DOT com, gnu-win32 AT cygnus DOT com
In-Reply-To: <199610312042.14110.0@dsbc.icl.co.uk> from "J.J.Farrell" at Oct 31, 96 08:42:10 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Length: 3923
Original-Sender: owner-gnu-win32 AT cygnus DOT com

J.J.Farrell wrote:
>
> > From: DJ Delorie
> >
> > Since Unix doesn't store the right control characters in the file, it
> > must allow the user to specify how newlines are to be converted for
> > their terminals. Need a CR? Use "stty onlcr". Don't support lower
> > case? Use "stty olcuc". Return key generates CR instead of LF? Use
> > "stty icrnl".

This doesn't seem to me to work as an argument for stty being a
consequence of unix file structure. Terminals that don't support lower
case would require a mapping even in DOS; this has nothing to do with
file structure. And onlcr and icrnl are the default mappings, which are
rather modern fine-grained versions of the older cooked (abstract) vs.
raw (concrete) notions. The reason that you couldn't log on if there
were no stty in the library is that getty historically starts out in
raw mode so that it can detect the baud rate of the terminal, lowercase
support, and so on; it could just as well start out in cooked mode and
not try to make these distinctions. The return key on dumb terminals
always generates CR, not LF; no one ever turns off just icrnl, which
was added in the SysIII version of termio for completeness.
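
To make the two default mappings concrete, here is a minimal sketch that imitates them on plain byte strings. It does not touch a real terminal driver; the function names apply_onlcr and apply_icrnl are made up for illustration, and the real flags are set through stty or termios rather than called like this.

```python
def apply_onlcr(data: bytes) -> bytes:
    """Output side: expand every LF into CR LF, roughly what onlcr asks of the driver."""
    return data.replace(b"\n", b"\r\n")


def apply_icrnl(data: bytes) -> bytes:
    """Input side: turn every CR typed at the keyboard into LF, roughly what icrnl does."""
    return data.replace(b"\r", b"\n")


# The return key sends CR; the program reading the line sees LF.
typed = b"ls\r"
seen_by_program = apply_icrnl(typed)

# The program writes a single LF; the terminal receives CR LF.
written = b"file.txt\n"
sent_to_terminal = apply_onlcr(written)
```

The point of the sketch is only that both directions are translations applied at the device boundary, so the program itself deals in a single end-of-line character either way.
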
Even in DOS the enter key has to be mapped, and if DOS supported dumb
terminals whose enter key generated an LF, the exact same
considerations would apply. By the logic of all this talk about the
"right" characters, the Mac does it right, because it stores CR, which
is the representation of the enter key. Input isn't symmetric with
output. Unix responds to this by dealing in line abstractions and
pushing device details into device drivers. Perhaps some of this is a
result of files in DOS being historically viewed more as text to be
entered on a terminal and dumped out to a terminal, whereas in unix
they are manipulable objects that are operated on by filters. After
all, that distinction is reflected in the discussion about cat being
"intended" to dump text files on terminals. That most certainly is not
the paradigmatic view in unix.

> > Personally, I think they should have added a separate NL control code,
> > that meant CR/LF, but they didn't.
>
> I'm not sure that there is a concept of 'the right control characters'
> in UNIX. UNIX has chosen to use a particular control character to
> indicate the end of a 'line', and it is up to programs (in the widest
> sense) to recognise it and interpret it appropriately. With this
> philosophy, it is sensible to use a single character rather than two
> since it uses less space. It doesn't matter what that character is,
> and they chose to use linefeed. This was perhaps an unfortunate choice
> since it is related to what needs to be sent to terminals and printers
> at the end of each line, and sometimes causes confusion (as now).

It confuses people because it makes the abstraction look too much like
the concrete device, and thinking abstractly is difficult in the first
place. Actually, a good argument can be made that the line terminator
should have been NUL, even further simplifying processing and making
the file abstraction a list of strings, strings that could even contain
LF characters.
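
As a rough illustration of the NUL-terminated idea, the sketch below treats a file's bytes as a list of NUL-terminated strings. The helper name split_records and its second return value are assumptions of this sketch, not anything unix provides; the flag is simply one way to report a final string that has no terminator.

```python
from typing import List, Tuple


def split_records(data: bytes) -> Tuple[List[bytes], bool]:
    """Split data into NUL-terminated strings.

    Returns the strings plus a flag that is False when the last string
    was not followed by a NUL.
    """
    parts = data.split(b"\0")
    tail = parts.pop()           # bytes after the last NUL; empty if properly terminated
    terminated = tail == b""
    if not terminated:
        parts.append(tail)       # keep the unterminated tail as a final string
    return parts, terminated


# A string may itself contain LF, since LF is no longer special.
records, ok = split_records(b"first\nstill first\0second\0")
```
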
The one problem with this is that a file not ending with a NUL would
have required some out-of-band indicator (e.g., a special return value)
at the 'gets' level, although it could be argued that programs that
need to be sensitive to that could always be coded in terms of 'getc'.

> It's best to think of it as an arbitrary character used to indicate
> the end of a line; once programs have used it to find the end of the
> line, they logically discard it and do whatever end-of-line processing
> is needed, such as sending CR/LF to the terminal.

In unix, the terminal driver already echoed the return key as CR/LF (if
in cooked mode), and programs indicate the end-of-line abstraction on
output with a single character, just as with input.

--
-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".