Mail Archives: djgpp/2002/02/18/08:31:58
On 18 Feb 2002, Thomas Mueller wrote:
> > A file might have parts in Latin-1 and other parts not in Latin-1.
>
> Eli Zaretskii:
>
> > Only if it's a garbled file. A file can only be encoded one way,
> > anything else is random 8-bit bytes.
>
> Suppose two or more email messages are concatenated in one file? Suddenly some
> parts are in Latin-1 and others not, and there may even be some Korean and
> Chinese spam mixed in.
That's what I call a garbled file. In such a file, when you see a byte
with a code of, say, 181 decimal, how do you interpret it? 181 means one
thing in Latin-1 (µ), something different in cp437 (a box-drawing
character), and something else again in cp850 (Á). Unless the file has
some meta-information in it saying which part is encoded how, there is
no way you could display the text correctly.
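The ambiguity is easy to see by decoding one byte under each code page. The following is a minimal Python sketch. It assumes only the standard `latin-1`, `cp437` and `cp850` codecs, and the helper `interpretations` is a hypothetical name used for illustration. At 161, cp437 and cp850 happen to agree (both give í) while Latin-1 gives ¡. At 181, all three disagree.

```python
import unicodedata

# The three 8-bit code pages discussed above; all ship with Python.
CODECS = ("latin-1", "cp437", "cp850")

def interpretations(value: int) -> dict:
    """Map each code page to the character a single byte value decodes to."""
    raw = bytes([value])
    return {codec: raw.decode(codec) for codec in CODECS}

for value in (161, 181):
    for codec, char in interpretations(value).items():
        # Print the code point and Unicode name so the output stays ASCII-safe.
        print(f"{value} as {codec:7}: U+{ord(char):04X} {unicodedata.name(char)}")
```

Nothing in the byte itself says which of these rows applies. That information has to come from outside the byte stream.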
> I think the HELLO file contains several different character sets in the same
> file with (extended?) ANSI escape sequences to switch between them.
Yes, but you need to encode the file with those escape sequences to be
able to mix different languages in one file. ISO-2022, the encoding used
for the HELLO file, can do that, as well as Unicode-based encodings such
as UTF-8. Latin-1 and friends cannot.
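The contrast between a single-byte charset and encodings built for mixing scripts can be sketched in Python with standard codecs only. The mixed sample string and the `encodes_as` helper are hypothetical illustrations, and the HELLO file itself is not reproduced here. The `iso2022_jp` codec stands in for the ISO-2022 family, because it shows escape sequences embedded in the byte stream.

```python
# Hypothetical sample: German (Latin-1), Korean and Chinese in one string.
SAMPLE = "Grüße, 안녕하세요, 你好"

def encodes_as(text: str, codec: str) -> bool:
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

# UTF-8 covers all of Unicode, so the mixed sample round-trips unchanged.
utf8 = SAMPLE.encode("utf-8")
assert utf8.decode("utf-8") == SAMPLE

# Latin-1 has only 256 code points, so the Korean and Chinese parts fail.
print("latin-1 can encode sample:", encodes_as(SAMPLE, "latin-1"))
print("utf-8 can encode sample:  ", encodes_as(SAMPLE, "utf-8"))

# ISO-2022-JP switches character sets with escape sequences inside the
# bytes: ESC $ B selects JIS X 0208, and ESC ( B switches back to ASCII.
jp = "HELLO 日本語".encode("iso2022_jp")
print(jp)
print("contains ESC $ B:", b"\x1b$B" in jp)
print("ends with ESC ( B:", jp.endswith(b"\x1b(B"))
```

The escape sequences act as the in-band meta-information that a plain Latin-1 file lacks.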