Date: Mon, 18 Feb 2002 15:29:40 +0200 (IST)
From: Eli Zaretskii
To: Thomas Mueller
cc: djgpp AT delorie DOT com
Subject: Re: GNU Emacs DOS (DJGPP) port converts upper-ASCII characters to ASCII 127
Reply-To: djgpp AT delorie DOT com

On 18 Feb 2002, Thomas Mueller wrote:

> > A file might have parts in Latin-1 and other parts not in Latin-1.
>
> Eli Zaretskii:
>
> > Only if it's a garbled file.  A file can only be encoded one way,
> > anything else is random 8-bit bytes.
>
> Suppose two or more email messages are concatenated in one file?  Suddenly some
> parts are in Latin-1 and others not, and there may even be some Korean and
> Chinese spam mixed in.

That's what I call a garbled file.  In such a file, when you see a byte
with a code of, say, 161 decimal--how do you interpret it?  161 means
one thing in Latin-1 but something different in cp437 and cp850, and
other codes differ between those two code pages as well.  Unless the
file has some meta-information in it, saying how each part is encoded,
there is no way you could display the text correctly.

> I think the HELLO file contains several different character sets in the same
> file with (extended?) ANSI escape sequences to switch between them.

Yes, but you need to encode the file with those escape sequences to be
able to mix different languages in one file.  ISO-2022, the encoding
used for the HELLO file, can do that, as can Unicode-based encodings
such as UTF-8.  Latin-1 and friends cannot.
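
To make the point about ambiguous bytes concrete, here is a minimal Python
sketch.  It is only an illustration: the helper names (interpretations,
describe, encode_with_escapes) and the sample byte 169 are made up for this
example and are not part of the original discussion.  It decodes one byte
under Latin-1, cp437 and cp850, and then encodes a mixed ASCII/Japanese string
with Python's iso2022_jp codec, which is assumed here as a stand-in for the
ISO-2022 family in general (the Emacs HELLO file uses its own ISO-2022
variant, not necessarily this one).

```python
import unicodedata

# Codecs compared in this sketch; all three ship with CPython.
CODECS = ("latin-1", "cp437", "cp850")


def interpretations(byte_value):
    """Decode a single byte under each codec and return {codec: character}."""
    raw = bytes([byte_value])
    return {codec: raw.decode(codec) for codec in CODECS}


def describe(byte_value):
    """Return ASCII-only lines naming the character each codec yields."""
    lines = []
    for codec, char in interpretations(byte_value).items():
        name = unicodedata.name(char, "<unnamed>")
        lines.append(f"{byte_value} in {codec}: U+{ord(char):04X} {name}")
    return lines


def encode_with_escapes(text):
    """Encode text as ISO-2022-JP, which switches character sets in-band."""
    return text.encode("iso2022_jp")


if __name__ == "__main__":
    for value in (161, 169):
        print("\n".join(describe(value)))

    mixed = "Hello, こんにちは"
    encoded = encode_with_escapes(mixed)
    # The repr shows the ESC sequences (\x1b...) that announce each switch.
    print(encoded)
    print(encoded.decode("iso2022_jp") == mixed)
```

Byte 161 comes out as an inverted exclamation mark in Latin-1 and as an
accented small i in both DOS code pages, while 169 is different in all three.
Nothing in the byte itself says which reading is intended.  The ISO-2022-JP
output, by contrast, carries its own switching information, so the decoder can
recover the original text from a 7-bit byte stream.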