X-Authentication-Warning: delorie.com: mailnull set sender to djgpp-bounces using -f
Date: Mon, 18 Feb 2002 15:29:40 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
X-Sender: eliz AT is
To: Thomas Mueller <tmueller AT bluegrass DOT net>
cc: djgpp AT delorie DOT com
Subject: Re: GNU Emacs DOS (DJGPP) port converts upper-ASCII characters to ASCII 127
In-Reply-To: <a4qpm6$2bgmk$3@ID-49635.news.dfncis.de>
Message-ID: <Pine.SUN.3.91.1020218152515.5449A-100000@is>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: djgpp AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk


On 18 Feb 2002, Thomas Mueller wrote:

> > A file might have parts in Latin-1 and other parts not in Latin-1.
> 
> Eli Zaretskii:
> 
> > Only if it's a garbled file.  A file can only be encoded one way,
> > anything else is random 8-bit bytes.
> 
> Suppose two or more email messages are concatenated in one file?  Suddenly some
> parts are in Latin-1 and others not, and there may even be some Korean and
> Chinese spam mixed in.

That's what I call a garbled file.  In such a file, when you see a byte 
with a code of, say, 161 decimal--how do you interpret it?  161 means one 
thing in Latin-1, but something different in DOS codepages such as cp437 
and cp850.  Unless the file has some meta-information in it, saying how 
each part is encoded, there is no way you could display the text 
correctly.
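
To make the ambiguity concrete, here is a small Python sketch that 
decodes the same byte under each of the three charsets.  The helper name 
interpretations() is made up for this illustration; the codec names are 
the ones Python's standard library ships.  It prints code points rather 
than glyphs, so the output stays plain ASCII.

```python
# Hypothetical helper: decode one byte value under several single-byte
# charsets and return the character each charset assigns to it.
def interpretations(byte_value, codecs=("latin-1", "cp437", "cp850")):
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in codecs}


if __name__ == "__main__":
    # 161 is the byte from the discussion; 155 is an extra sample byte
    # (an assumption of this sketch) on which cp437 and cp850 disagree.
    for value in (161, 155):
        for name, char in interpretations(value).items():
            print("byte %d in %-7s -> U+%04X" % (value, name, ord(char)))
```

For byte 161, Latin-1 yields an inverted exclamation mark, while cp437 
and cp850 happen to agree on i-acute; a byte such as 155 separates the 
two DOS codepages as well.  Nothing in the byte itself says which reading 
was intended.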

> I think the HELLO file contains several different character sets in the same
> file with (extended?) ANSI escape sequences to switch between them.

Yes, but you need to encode the file with those escape sequences to be 
able to mix different languages in one file.  ISO-2022, the encoding used 
for the HELLO file, can do that, as well as Unicode-based encodings such 
as UTF-8.  Latin-1 and friends cannot.
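
A rough Python sketch of the difference follows.  It uses Python's 
iso2022_jp codec as a stand-in; that is one restricted profile of 
ISO-2022 and not necessarily the exact variant Emacs uses for the HELLO 
file.  The sample string and the helper try_encode() are made up for the 
illustration.

```python
ESC = b"\x1b"

# English text followed by three Japanese characters, written with
# escapes so that this source stays plain ASCII.
text = "Hello, \u65e5\u672c\u8a9e"


def try_encode(s, codec):
    """Return s encoded with codec, or None if the codec cannot represent it."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError:
        return None


iso = try_encode(text, "iso2022_jp")    # switches charsets with escape sequences
utf8 = try_encode(text, "utf-8")        # one encoding covers every character
latin1 = try_encode(text, "latin-1")    # has no way to represent the Japanese part

if __name__ == "__main__":
    print("iso2022_jp:", iso)
    print("utf-8:     ", utf8)
    print("latin-1:   ", latin1)
```

The ISO-2022 result carries the meta-information inside the file: an 
escape sequence announces the switch to the Japanese charset and another 
one switches back to ASCII.  UTF-8 needs no switching, and Latin-1 simply 
fails on the text.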