Mail Archives: cygwin/1997/01/31/04:33:28

From: jepler AT inetnebr DOT com (Jeff Epler)
Subject: Re: ASCII and BINARY files. Why?
Date: 31 Jan 1997 04:33:28 -0800
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <Mutt.19970130231214.jepler.cygnus.gnu-win32@craie.inetnebr.com>
References: <c=US%a=_%p=Amulet._Inc.%l=JAGUAR-970130155732Z-7670 AT jaguar DOT amulet DOT com> <32F12CC1 DOT 7867 AT netcom DOT com>
Mime-Version: 1.0
Original-To: gnu-win32 AT cygnus DOT com ('gnu-win32 AT cygnus DOT com')
X-Mailer: Mutt 0.55
X-Operating-System: Linux craie 2.0.28 #20 Tue Jan 28 22:23:48 CST 1997 i586
In-Reply-To: <32F12CC1.7867@netcom.com>; from Jim Balter on Jan 30, 1997 15:20:33 -0800
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Jim Balter writes:
> Fran Litterio wrote:
> > 
> > Jim Balter wrote:
> > 
> > >unix deals with byte streams, and there are many tools for
> > >manipulating them, rather than having systems that think
> > >they know what they are doing deleting every byte after a ^Z
> > >and destroying valuable work.
> > 
> > Yes.  I am now completely convinced that gnu-win32 should switch to an
> > all-binary-all-the-time scheme.  read() should not convert CRNL to NL
> > (nor write() do the reverse).  cat should not have implicit knowledge of
> > what a ^Z means (i.e., nothing under UNIX).  The gnu-win32 DLL should
> > probably even be made to recognize a ^D typed on the keyboard (not
> > coming down a pipe) as end-of-file.
> 
> It would be nice if it can be done, but since this is only a matter of what
> humans type, it does not break anything (other than possibly some
> existing documentation) to require people to type ^Z instead of ^D.  Of
> course, if it is done, it must be done right: ^D's should *only* be
> looked at when coming from a keyboard, nowhere else, and they should cause
> a read() to return exactly as the Enter key does, but without returning the
> ^D, so that it is possible to enter terminator-less lines from the
> keyboard (e.g., abc^D reads as abc with no newline at the end).
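On POSIX systems the terminal driver already behaves the way Jim describes. The minimal C sketch below assumes a POSIX host with pseudo-terminal support; the helper names `open_pty_pair` and `type_then_eof` are made up for illustration. It "types" some text followed by the terminal's EOF character into a pseudo-terminal and performs one read() on the terminal side.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Open a pseudo-terminal pair. Returns the master fd and stores the
   slave (terminal-side) fd in *slave_fd, or returns -1 on failure. */
static int open_pty_pair(int *slave_fd)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    char *name;
    if (grantpt(master) < 0 || unlockpt(master) < 0 ||
        (name = ptsname(master)) == NULL ||
        (*slave_fd = open(name, O_RDWR | O_NOCTTY)) < 0) {
        close(master);
        return -1;
    }
    return master;
}

/* "Type" text followed by the terminal's EOF character, then do one
   read() on the terminal side. In canonical mode the EOF character ends
   the read like Enter would, but is itself discarded. Returns the byte
   count from read() (0 means end-of-file), or -1 on error. */
static ssize_t type_then_eof(const char *text, char *buf, size_t cap)
{
    int slave;
    int master = open_pty_pair(&slave);
    if (master < 0)
        return -1;

    ssize_t n = -1;
    struct termios t;
    if (tcgetattr(slave, &t) == 0) {
        t.c_lflag |= ICANON;                /* line-at-a-time input, where VEOF applies */
        t.c_lflag &= ~(tcflag_t)ECHO;       /* nothing reads the echo, so turn it off */
        char eof_char = (char)t.c_cc[VEOF]; /* normally ^D (0x04) */
        if (tcsetattr(slave, TCSANOW, &t) == 0 &&
            write(master, text, strlen(text)) == (ssize_t)strlen(text) &&
            write(master, &eof_char, 1) == 1) {
            n = read(slave, buf, cap - 1);
            if (n >= 0)
                buf[n] = '\0';
        }
    }
    close(slave);
    close(master);
    return n;
}
```

Calling `type_then_eof("abc", buf, sizeof buf)` should return 3 with "abc" (no newline) in `buf`. An EOF character typed at the start of a line makes read() return 0, which is how a program such as cat notices end-of-file from the keyboard.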

The unix world seems to do it like this:
..  Input from a non-tty should never treat _any_ character as EOF; only
   the physical end-of-file is noticed.
..  Any character seems to be assignable as the EOF character using the stty
   command.  ^D is the usual setting, but on Linux even 'q' can be the eof
   character.

   (Bash on Linux won't pay any attention to 'q' as the eof character on
   the command line, but cat understood the setting.  Setting eof to ^]
   had an effect both in bash and in cat.)
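The stty eof setting corresponds to the VEOF entry of the termios `c_cc` array. Below is a rough C sketch of what `stty eof q` amounts to, exercised on a pseudo-terminal. It assumes a POSIX host, and `set_eof_char` and `demo_eof_char` are hypothetical names.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Roughly what `stty eof C` does to the terminal open as fd:
   change only the VEOF control character. */
static int set_eof_char(int fd, unsigned char c)
{
    struct termios t;
    if (tcgetattr(fd, &t) < 0)
        return -1;
    t.c_cc[VEOF] = c;
    return tcsetattr(fd, TCSANOW, &t);
}

/* Demo on a fresh pseudo-terminal: make eof_char the EOF character, type
   `input`, and return what one canonical-mode read() sees (-1 on error). */
static ssize_t demo_eof_char(unsigned char eof_char, const char *input,
                             char *buf, size_t cap)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    char *name;
    int slave = -1;
    ssize_t n = -1;
    struct termios t;
    if (grantpt(master) == 0 && unlockpt(master) == 0 &&
        (name = ptsname(master)) != NULL &&
        (slave = open(name, O_RDWR | O_NOCTTY)) >= 0 &&
        tcgetattr(slave, &t) == 0) {
        t.c_lflag |= ICANON;          /* VEOF only matters in canonical mode */
        t.c_lflag &= ~(tcflag_t)ECHO; /* nothing reads the echo */
        if (tcsetattr(slave, TCSANOW, &t) == 0 &&
            set_eof_char(slave, eof_char) == 0 &&
            write(master, input, strlen(input)) == (ssize_t)strlen(input)) {
            n = read(slave, buf, cap - 1);
            if (n >= 0)
                buf[n] = '\0';
        }
    }
    if (slave >= 0)
        close(slave);
    close(master);
    return n;
}
```

With 'q' as the EOF character, typing "xyzq" should deliver "xyz". Meanwhile ^D becomes an ordinary byte and is passed to the program like any other character.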

Once stty is more capable on gnu-win32, we could allow users to choose
between ^D, ^Z or anything else they wanted.  On input from a non-tty,
there is no EOF character.
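On a file or pipe, by contrast, no byte is special: the reader stops only when the writer closes its end. The small POSIX sketch below uses a hypothetical helper name. It pushes data containing ^D and ^Z through a pipe and reads it back.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Write n bytes into a pipe, close the write end, and read everything
   back. Returns the number of bytes received, or -1 on error. Payloads
   this small fit in the pipe buffer, so one process can do both ends. */
static ssize_t pipe_roundtrip(const char *data, size_t n, char *out, size_t cap)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    if (write(fds[1], data, n) != (ssize_t)n) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);   /* the reader's only end-of-file signal */

    size_t total = 0;
    ssize_t r = 0;
    while (total < cap && (r = read(fds[0], out + total, cap - total)) > 0)
        total += (size_t)r;
    close(fds[0]);
    return r < 0 ? -1 : (ssize_t)total;
}
```

All eleven bytes of "abc^Ddef^Zghi" come back unchanged, including the ^D and the ^Z.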

I'm not ready to tackle the other, larger issues of binary vs. text files,
but I think that ^Z == EOF isn't a biggie, unless someone replies with a
good reason to treat ^Z as EOF when taking input from anything but a tty.
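If a library did honor ^Z at all, one way to keep it tty-only is to gate it on isatty(). The helper below is hypothetical and is not existing gnu-win32 code. A caller would pass `isatty(fd)` as the first argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CTRL_Z '\x1a'

/* Hypothetical policy: of the n bytes a read() returned, how many should
   the program see? From a terminal, ^Z acts as end-of-file and cuts the
   input at the first ^Z; from a file or pipe it is just another byte. */
static size_t visible_len(int from_tty, const char *buf, size_t n)
{
    if (!from_tty)
        return n;
    const char *z = memchr(buf, CTRL_Z, n);
    return z != NULL ? (size_t)(z - buf) : n;
}
```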

Uh-oh, I'm going to state my position on BvsT (binary vs. text) anyway.

Here's what we should do:  Have a list of questions the library asks itself
to determine if the file should have text translation performed on it.
It tries the steps in order until it comes to a conclusion about the
particular file:

1. If the filename begins with bin: or ends with :bin, no (the bin: or :bin
   is stripped from the name)
2. If the filename begins with txt: or ends with :txt, yes (the txt: or :txt
   is stripped from the name)
3. If 'b' is specified in fopen's second (mode) parameter, or O_BINARY is
   passed to open, no
4. If 't' is specified in fopen's mode parameter, or O_TRANS (O_TXT?)
   is passed to open, yes (isn't this currently undefined by the relevant
   standards, so we could define it any way we wished?)
5. If the filesystem is mounted -b, no
6. If the filesystem is mounted -t, yes
7. If the filename matches a glob in $BIN, no
	(default = *.com;*.exe;*.sys;*.dll;*.o;*.a;*.bin;*.tar;*.gz;*.tgz)
8. If the filename matches a glob in $TXT, yes
	(default = *.txt;*.doc;README;read.me;readme.1st)
9. Otherwise, no
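The nine steps can be sketched in C roughly as follows. This sketch assumes POSIX fnmatch() for the globs and matches only against the last path component. `want_text_mode`, `enum mount_mode`, the parameter layout, and the ';'-separated list format are illustrative assumptions. A caller would pass getenv("BIN") and getenv("TXT"), and NULL falls back to the defaults listed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEFAULT_BIN "*.com;*.exe;*.sys;*.dll;*.o;*.a;*.bin;*.tar;*.gz;*.tgz"
#define DEFAULT_TXT "*.txt;*.doc;README;read.me;readme.1st"

/* Hypothetical: how the filesystem holding the file was mounted. */
enum mount_mode { MOUNT_DEFAULT, MOUNT_BINARY, MOUNT_TEXT };

/* Steps 1-2: find a bin:/txt: prefix or :bin/:txt suffix, copy the name
   without it into out, and return 'b', 't', or 0 if untagged. */
static int strip_tag(const char *path, char *out, size_t cap)
{
    static const char *const tags[] = { "bin", "txt" };
    size_t len = strlen(path);
    for (int i = 0; i < 2 && len > 4; i++) {
        if (strncmp(path, tags[i], 3) == 0 && path[3] == ':') {
            snprintf(out, cap, "%s", path + 4);
            return tags[i][0];
        }
        if (path[len - 4] == ':' && strcmp(path + len - 3, tags[i]) == 0) {
            snprintf(out, cap, "%.*s", (int)(len - 4), path);
            return tags[i][0];
        }
    }
    snprintf(out, cap, "%s", path);
    return 0;
}

/* Steps 7-8: does name match any pattern in a ';'-separated glob list?
   Patterns longer than the scratch buffer are skipped. */
static bool matches_glob_list(const char *name, const char *list)
{
    char pat[256];
    while (*list != '\0') {
        size_t n = strcspn(list, ";");
        if (n > 0 && n < sizeof pat) {
            memcpy(pat, list, n);
            pat[n] = '\0';
            if (fnmatch(pat, name, 0) == 0)
                return true;
        }
        list += n;
        if (*list == ';')
            list++;
    }
    return false;
}

/* Decide whether text (CRLF) translation applies, trying the steps in
   order. fopen_mode is NULL for open(); o_binary/o_text say whether
   O_BINARY or a text flag was passed to open(). bin_globs/txt_globs are
   $BIN/$TXT, or NULL for the defaults. The untagged name lands in name. */
static bool want_text_mode(const char *path, const char *fopen_mode,
                           bool o_binary, bool o_text, enum mount_mode mount,
                           const char *bin_globs, const char *txt_globs,
                           char *name, size_t cap)
{
    int tag = strip_tag(path, name, cap);
    if (tag == 'b') return false;                                          /* 1 */
    if (tag == 't') return true;                                           /* 2 */
    if (o_binary || (fopen_mode && strchr(fopen_mode, 'b'))) return false; /* 3 */
    if (o_text || (fopen_mode && strchr(fopen_mode, 't'))) return true;    /* 4 */
    if (mount == MOUNT_BINARY) return false;                               /* 5 */
    if (mount == MOUNT_TEXT) return true;                                  /* 6 */

    const char *slash = strrchr(name, '/');
    const char *base = slash ? slash + 1 : name;   /* globs see the last component */
    if (matches_glob_list(base, bin_globs ? bin_globs : DEFAULT_BIN)) return false; /* 7 */
    if (matches_glob_list(base, txt_globs ? txt_globs : DEFAULT_TXT)) return true;  /* 8 */
    return false;                                                          /* 9 */
}
```

If "*" is passed as the $TXT list, anything not caught by $BIN defaults to text. Matching here is case-sensitive, so a DOS-oriented version would probably want to fold case.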

One can quibble about the order, and whether #4 should be there at all.
Environment variable names can also be debated.  However, the scheme has
the following features:
	. If the user knows better than the application, the user can
	always have his way, using bin: or txt:
	
	. The user can get an ultimate default of text mode by including *
	as one of the globs in $TXT, so don't bug me that my decision for
	#9 is wrong

	. Extension-based CRLF translation is already available as an option
	in the Linux FAT filesystem; I find it pleasant to have most of the
	time.

Perhaps there should be a heuristic somewhere in here that says 'if
<CR><LF> appears in the first block of the file, open in text mode' (step
8.5?), but I don't want to get bitten if I'm unlucky enough to get output
like that from, say, a nifty new compressor whose extension I didn't add to
$BIN yet.
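A sketch of that step-8.5 heuristic follows. The 512-byte block size and the function name are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SNIFF_BLOCK 512   /* assumed size of "the first block" */

/* Hypothetical step 8.5: guess text mode if a CR immediately followed by
   an LF appears anywhere in the first block of the file. */
static bool looks_like_crlf_text(const unsigned char *buf, size_t n)
{
    if (n > SNIFF_BLOCK)
        n = SNIFF_BLOCK;
    for (size_t i = 0; i + 1 < n; i++)
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            return true;
    return false;
}
```

The false-positive worry is real. The 8-byte PNG file signature (89 50 4E 47 0D 0A 1A 0A) deliberately contains a CR LF pair, so this check would flag a PNG image as text.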

What currently happens when we fseek/lseek on a file not opened in
binary mode?
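Part of the difficulty is that once CR LF pairs are collapsed, offsets in the program's view no longer equal byte offsets in the file. ISO C sidesteps this by allowing fseek() on a text stream only to offset zero, or to a value previously returned by ftell() on the same stream. The small sketch below shows the translation and the offset mapping it forces; both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Text-mode read translation: collapse each CR LF pair to LF, in place.
   A lone CR is kept. Returns the new length. */
static size_t crlf_to_lf(char *buf, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\r' && i + 1 < n && buf[i + 1] == '\n')
            continue;            /* drop the CR; the LF is copied next */
        buf[out++] = buf[i];
    }
    return out;
}

/* Map an offset in the translated (program's) view back to the byte
   offset in the raw file: each CR LF pair is one character but two bytes. */
static size_t logical_to_physical(const char *raw, size_t raw_len, size_t logical)
{
    size_t phys = 0, seen = 0;
    while (seen < logical && phys < raw_len) {
        if (raw[phys] == '\r' && phys + 1 < raw_len && raw[phys + 1] == '\n')
            phys++;              /* skip the CR that translation removes */
        phys++;
        seen++;
    }
    return phys;
}
```

For a file containing "ab\r\ncd", the 'c' is character 3 in the translated view but byte 4 on disk. As a result, an lseek() computed by counting translated characters lands one byte early for every line ending that precedes the target.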

Jeff
-- 
\/ jepler AT inetnebr DOT com http://incolor.inetnebr.com/jepler/ (0|1(01*0)*1)+
-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".

  Copyright © 2019   by DJ Delorie     Updated Jul 2019