Mail Archives: djgpp-workers/1998/09/16/05:26:44
On Tue, 15 Sep 1998, Charles Marslett wrote:
> But I found that looking for at least 3 CR/LF pairs in the
> first 512 bytes of the file worked pretty well (PC file format, of course)
> and it worked better if you relaxed the rule when lots of backspaces showed
> up (I think I counted backspaces and when the counter hit 100 I counted
> that as a CR/LF pair or some such thing). If the CR/LF counter was 0, 1
> or 2 I had a binary file, more than that indicated a text file (I actually
> used assembly with scan instructions, so there really wasn't a counter as
> such -- just where the program counter was).
I think you are mixing two different issues: the Unix- vs DOS-style
text files and the binary vs text files. They are NOT the same, and
thus using the approach you suggest would introduce subtle bugs and
misfeatures into innocent programs like GCC, Gawk, Sed, etc.
A file that has CR/LF pairs can be a binary file (e.g., an executable
image with text of multi-line messages inside it), but it is still a
binary file. OTOH, a text file can have Unix-style LF-only lines, and
it should still be treated as a text file (e.g., the ^Z character at its
end should still be stripped).
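To make the difference concrete, here is a minimal sketch of the quoted
rule, assuming it amounts to "3 or more CR/LF pairs in the first 512
bytes means text". The backspace relaxation is left out because its exact
form isn't stated, and the names count_crlf_pairs and looks_like_text are
made up for this illustration; this is not the original assembly code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count CR/LF pairs in the first 512 bytes of buf. */
int count_crlf_pairs(const unsigned char *buf, size_t len)
{
    size_t limit = len < 512 ? len : 512;
    size_t i;
    int pairs = 0;

    assert(buf != NULL);
    for (i = 0; i + 1 < limit; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            pairs++;
            i++;    /* skip the LF that belongs to this pair */
        }
    }
    return pairs;
}

/* The quoted rule as assumed here: 0, 1 or 2 pairs means binary,
   3 or more means text.  Returns nonzero for "text". */
int looks_like_text(const unsigned char *buf, size_t len)
{
    return count_crlf_pairs(buf, len) >= 3;
}
```

Feed this a buffer that starts like an executable image (NUL bytes plus a
three-line usage message with CR/LF line ends) and it answers "text";
feed it a Unix-style file with LF-only lines and it answers "binary".
Both answers are wrong, which is the problem with treating the EOL format
as a test for text vs binary.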
GNU Emacs originally failed to distinguish between these two issues,
which caused several headaches when Emacs 20 began to automatically
detect and convert CR/LF to LF and back. Guessing the EOL format is
okay in text files, but reading binary files should be done with no
guesswork and no conversions at all. Since text files can be reliably
read in text mode without any guessing at all, such guessing isn't really
needed.
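As a sketch of what "no guesswork and no conversions" looks like from the
program's side, assuming plain ISO C stdio: the caller states the mode
explicitly and the library never inspects the contents. The helper name
read_binary is made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read up to cap bytes from path exactly as stored.
   Returns the number of bytes read, or 0 if the file cannot be opened. */
size_t read_binary(const char *path, unsigned char *buf, size_t cap)
{
    FILE *fp;
    size_t n;

    assert(path != NULL && buf != NULL);
    fp = fopen(path, "rb");     /* "b": binary mode, bytes are not translated */
    if (fp == NULL)
        return 0;
    n = fread(buf, 1, cap, fp);
    fclose(fp);
    return n;
}
```

A program that wants text opens the file with "r" instead and leaves the
CR/LF and ^Z handling to the library's text mode. On Unix-like systems
the "b" flag makes no difference; on DOS it is what keeps CR, LF and ^Z
bytes intact.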