From: fjh AT cs DOT mu DOT OZ DOT AU (Fergus Henderson)
Subject: Re: ASCII and BINARY files. Why?
Date: 2 Feb 1997 02:22:08 -0800
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <199702020520.QAA30142.cygnus.gnu-win32@mundook.cs.mu.OZ.AU>
Content-Type: text
Original-To: jqb AT netcom DOT com (Jim Balter)
Original-Cc: gnu-win32 AT cygnus DOT com (gnu-win32)
In-Reply-To: <32EFD915.673E@netcom.com> from "Jim Balter" at Jan 29, 97 03:11:17 pm
X-Mailer: ELM [version 2.4 PL24]
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Jim Balter wrote:
>
> I can guarantee you that the text/binary split will *never* stop
> being a major headache.

This is probably true no matter what we do.

> The fact that cat throws away characters
> from files and stops dead at ^Z makes any hope of building robust
> systems on top of this thing hopeless.

I think it would be a good idea to

	- change the text->binary and binary->text translations so that
	  text->binary->text or binary->text->binary translations leave
	  the original intact

	- not treat ^Z in text files as EOF
	  (^Z at the console should be EOF iff `stty eof ^Z'.)

> One solution would be to do away with the text/binary
> split and fix any program that cannot handle CR's within
> lines.  I'm not talking about throwing them away in filters,
> as with the current situation, but rather make sure that programs
> that *parse* lines can handle arbitrary whitespace.

There are many programs that treat "\n" as different from other
whitespace.  Your suggestion amounts to requiring applications to
check for "\r\n" and treat it as equivalent to "\n", even if the file
was not opened in binary mode.  Philosophically, this seems like a bit
of a step backwards from the ANSI C approach of making it the
implementation's responsibility, not the program's.

Making these changes might help interoperability in other contexts (e.g.
when using network file systems shared by both DOS and Linux), so I
guess it is arguable that they're a good idea anyway.  But I think
there would also be pragmatic problems: `grep fopen' etc. is going to
have far fewer hits than `grep '\n''.

For an example of the difference, in the C preprocessor,

	#define foo \
	bar

with the first line ending in "\n" is different from the same text
with the first line ending in "\r\n", since in the latter case the
backslash no longer immediately precedes the newline.  Now, in this
particular case, it is implementation-defined what constitutes the end
of a line, and so the GNU C preprocessor could define the end of a
line as either "\r\n" or "\n".  However, the ANSI standard requires
that the implementation document this choice, and so if this change
were made, the documentation would need to be changed.

> This would all
> be POSIX compatible and viewable as bug fixes, and thus quite possibly
> mergeable back into the GNU sources.

I don't agree that it would be viewable as bug fixes.  Strictly
speaking, the documentation of all these sources would have to be
changed to reflect the new behaviour.

Note that if this approach were taken, and the changes were merged
back into the GNU sources, then it would affect the behaviour of the
other versions (e.g. the Linux version), not just the gnu-win32
versions.  Still, even though they're not bug fixes, such changes
might be mergeable back into the GNU sources as enhancements.

> There might be a few exceptions
> where the lines are defined as exactly the bytes up to a NL,

Why do you think there would only be "a few" exceptions like this?
I think that cases like this are very common.

So, I think the problem with your suggestion is that even though these
changes might well be worthy enhancements, the sheer number of changes
required would be overwhelming.

--
Fergus Henderson              |  "I have always known that the pursuit
WWW:                          |   of excellence is a lethal habit"
PGP: finger fjh AT 128 DOT 250 DOT 37 DOT 3  |  -- the last words of T. S. Garp.

-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".