Message-ID: <3600096D.7CEF24DD@vlsi.com>
Date: Wed, 16 Sep 1998 11:54:37 -0700
From: Charles Marslett
MIME-Version: 1.0
To: Eli Zaretskii
CC: djgpp-workers AT delorie DOT com
Subject: Re: auto-binary-mode?
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: bulk

Eli Zaretskii wrote:
>
> On Tue, 15 Sep 1998, Charles Marslett wrote:
>
> > But I found that looking for at least 3 CR/LF pairs in the first
> > 512 bytes of the file worked pretty well (PC file format, of
> > course), and it worked better if you relaxed the rule when lots of
> > backspaces showed up (I think I counted backspaces, and when the
> > counter hit 100 I counted that as a CR/LF pair or some such thing).
> > If the CR/LF counter was 0, 1 or 2 I had a binary file; more than
> > that indicated a text file (I actually used assembly with scan
> > instructions, so there really wasn't a counter as such -- just
> > where the program counter was).
>
> I think you are mixing two different issues: Unix- vs. DOS-style
> text files, and binary vs. text files.  They are NOT the same, and
> thus using the approach you suggest would introduce subtle bugs and
> misfeatures into innocent programs like GCC, Gawk, Sed, etc.
>
> A file that has CR/LF pairs can be a binary file (e.g., an
> executable image with the text of multi-line messages inside it),
> but it is still a binary file.  OTOH, a text file can have
> Unix-style LF-only lines, and it should still be treated as a text
> file (e.g., the ^Z character at its end should still be stripped).

Well, I was thinking of the issue as being only between Unix
(binary-ish) text files and DOS text files.  The whole problem arises
because on most Unix systems one need not distinguish between text
and binary files.

If a file is a Unix-style text file with LF-only lines, then it
should, IMHO, never have a ^Z at the end, and it should be processed
by the OS-ish part of the system as binary (ignoring behavior tied to
the particular API call, such as end-of-line detection in gets()).
That is, an application would never have been written with an "r" or
"w" fopen() call if it were important to distinguish between text and
binary I/O on the system the program was written for (most likely a
Unix).

> GNU Emacs originally failed to distinguish between these two issues,
> which caused several headaches when Emacs 20 began to automatically
> detect and convert CR/LF to LF and back.  Guessing the EOL format is
> okay in text files, but reading binary files should be done with no
> guesswork and no conversions at all.  Since text files can be
> reliably read in text mode without any guessing at all, guessing
> isn't really needed.

I disagree.  The problem was, and is, that GNU Emacs has no inherent
way of specifying whether a file is text or binary -- exactly the
problem addressed by "rb" or "rt", and exactly the problem you and
others pointed out with installing autodetection in the library.  No
program that does an fopen() with "r" or "w" can possibly have a
concept of text and binary files.

As an unrelated side issue: what difference is there between a text
file and a binary file in the Microsoft world, except for the
processing of the ^Z and CR/LF characters?  (And, of course, the side
effects of that processing that leak into ftell(), fseek() and other
functions that depend on the number of characters read from or
written to the file.)  Are there systems that need text
identification other than for handling the differences in end-of-line
and end-of-file parsing?

--Charles
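
P.S.  For concreteness, the detection heuristic described above boils
down to something like the following in C.  This is a sketch only:
the original was written in assembly with scan instructions, and the
function name and exact backspace handling here are illustrative
reconstructions, not the original code.

#include <stdio.h>

/* Guess whether FILENAME is a DOS-style text file by counting CR/LF
   pairs in the first 512 bytes.  Returns 1 for "text", 0 for
   "binary".  The backspace rule relaxes the count for files full of
   ^H overstrikes: every 100 backspaces count as one CR/LF pair.  */
static int looks_like_dos_text(const char *filename)
{
  unsigned char buf[512];
  size_t n, i;
  int crlf = 0, backspaces = 0;
  FILE *fp = fopen(filename, "rb");   /* must see the raw bytes */

  if (fp == NULL)
    return 0;
  n = fread(buf, 1, sizeof buf, fp);
  fclose(fp);

  for (i = 0; i + 1 < n; i++)
    {
      if (buf[i] == '\r' && buf[i + 1] == '\n')
        crlf++;
      else if (buf[i] == '\b' && ++backspaces == 100)
        {
          crlf++;
          backspaces = 0;
        }
    }

  /* 0, 1 or 2 CR/LF pairs => binary; 3 or more => text.  */
  return crlf >= 3;
}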
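
P.P.S.  And to make the "what difference is there" question concrete:
on a DOS-style C library (DJGPP's, for instance) the same file read
in "r" and "rb" modes can deliver a different number of characters,
because text mode collapses each CR/LF to '\n' and stops at ^Z; that
size difference is exactly what leaks into ftell() and fseek().  A
small demo follows; the file name is hypothetical, and on Unix the
two counts would be identical:

#include <stdio.h>

/* Count the characters delivered when FILENAME is opened in MODE.
   Returns -1 if the file cannot be opened.  */
static long count_chars(const char *filename, const char *mode)
{
  FILE *fp = fopen(filename, mode);
  long count = 0;

  if (fp == NULL)
    return -1;
  while (fgetc(fp) != EOF)
    count++;
  fclose(fp);
  return count;
}

int main(void)
{
  const char *name = "sample.txt";  /* hypothetical test file */

  /* On DOS, text mode usually reports fewer characters: CR/LF pairs
     shrink to one '\n' each, and a trailing ^Z ends the file early. */
  printf("text mode (\"r\"):    %ld\n", count_chars(name, "r"));
  printf("binary mode (\"rb\"): %ld\n", count_chars(name, "rb"));
  return 0;
}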