> I was thinking that if a file had any characters whose ASCII code was
> < ' ' or >= DEL before the first \n, then the file would be considered
> binary. Otherwise, the file would be text. You could apply this heuristic
> to both input and output.

It's hard to apply it to output, because you don't know what the program is going to write out. At least on input, you can read a big chunk (buffering does this anyway) and run some tests on the existing data.

InfoZip has an automatic converter; we should see what heuristics it uses, since it has a lot more history than we do.
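
For illustration only, here is a minimal sketch of the quoted heuristic applied to an input buffer. The name looks_binary is made up for this example and is not an existing library function. One assumption goes beyond the quoted rule: tab and carriage return are exempted, since both are below ' ' and a DOS text line ending in \r\n would otherwise always be flagged as binary.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of any existing library): scan the buffer
 * up to the first '\n' and report whether it looks binary.
 * Returns 1 for "binary", 0 for "text".
 */
int looks_binary(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == '\n')
            break;              /* only the first line is examined */

        /* Assumption: tab and CR are tolerated in text files. */
        if (c == '\t' || c == '\r')
            continue;

        /* The quoted rule: anything below ' ' or at/above DEL (0x7F). */
        if (c < ' ' || c >= 0x7F)
            return 1;
    }
    return 0;                   /* nothing suspicious before the first '\n' */
}
```

On input, a caller could pass the first chunk it has already buffered (for example, whatever a single fread returned) and choose the translation mode from the result. There is no equivalent for output, because the data does not exist yet when the mode has to be chosen.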
Copyright © 2019 by DJ Delorie | Updated Jul 2019 |