Mail Archives: djgpp/2002/04/29/18:17:29
xeon wrote:
>
> Hi,
>
> I'm wondering how to determine whether a file is a text file or a binary
> file, programmatically. I'm thinking about reading 4 bytes from the
> file and testing whether they're in the range of usual text ([a-z],
> [A-Z], etc.). The 4 bytes are read from the following locations: 1st
> byte, last byte, and 2 randomly selected offsets inside the file. Is
> this enough?
Not really. The fundamental problem is in formulating precise
definitions of "text file" and "binary file": try to do so and you'll
quickly discover the kinds of trouble you'll get into.
For example, is a file containing "abc\n" a text file of one
three-letter newline-terminated line, or is it a binary file
storing the number 0x6162630a == 1633837834? Or 'tother way
round, if you find a byte with the high bit set, are you looking
at a binary file or at a text file containing the character "ß"?
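
To make the first half of that example concrete, here is a minimal
sketch. The helper name pack_be32 is made up for illustration, and it
assumes an ASCII execution character set and that the four bytes are
read most significant byte first:

```c
#include <stdint.h>

/* Hypothetical helper: pack four bytes into a 32-bit value, most
   significant byte first (big-endian), whatever the host's own
   byte order happens to be. */
uint32_t pack_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

Given the array { 'a', 'b', 'c', '\n' }, pack_be32 yields 0x6162630a.
The bytes are identical either way; only the reader's interpretation
decides whether they are a line of text or a number.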
That said, you can make a guess of sorts, although you'll never
be 100% accurate. Take a look at the source of the "file" program
for some ideas.
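
As a rough illustration of such a guess, here is a sketch. It is not
taken from the "file" source and is far cruder than what that program
does. The function names, the 4096-byte sample size and the 5%
threshold are all assumptions chosen for the example. It treats any
NUL byte as a sign of binary data, tolerates a few stray control
characters, and deliberately accepts bytes with the high bit set,
since those may be characters such as "ß":

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: guess whether a buffer "looks like" text.
   Returns 1 for "probably text", 0 for "probably binary". */
int looks_like_text(const unsigned char *buf, size_t len)
{
    size_t i, suspicious = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c == 0)
            return 0;       /* NUL bytes are rare in text files */
        if (c < 0x20 && c != '\t' && c != '\n' && c != '\r' && c != '\f')
            suspicious++;   /* control character other than whitespace */
    }
    /* Assumed threshold: allow at most 5% odd control characters. */
    return suspicious * 20 <= len;
}

/* Hypothetical helper: apply the guess to the first block of a file.
   Returns 1 (text), 0 (binary), or -1 if the file cannot be opened. */
int guess_file_is_text(const char *path)
{
    unsigned char buf[4096];
    size_t n;
    FILE *fp = fopen(path, "rb");   /* "rb": no newline translation */

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return looks_like_text(buf, n);
}
```

Examining a whole leading block rather than four scattered bytes
lowers the odds of a wrong guess, but it remains a guess: a binary
file whose first block happens to contain only printable bytes will
still be reported as text.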
--
Eric Sosman
esosman AT acm DOT org