X-Authentication-Warning: delorie.com: mailnull set sender to djgpp-bounces using -f
From: "Thomas Mueller"
Newsgroups: comp.os.msdos.djgpp
Subject: Re: GNU Emacs DOS (DJGPP) port converts upper-ASCII characters to ASCII 127
Date: 20 Feb 2002 17:32:52 GMT
Lines: 30
Message-ID:
References: <5567-Sat16Feb2002190140+0200-eliz AT is DOT elta DOT co DOT il>
NNTP-Posting-Host: dial3-131.bluegrass.net (208.147.34.131)
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: fu-berlin.de 1014226372 4022932 208.147.34.131 (16 [49635])
X-Mailer: NOS-BOX 2.05
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

from my earlier post (> >) and Eli Zaretskii (>):

> > Suppose two or more email messages are concatenated in one file?  Suddenly some
> > parts are in Latin-1 and others not, and there may even be some Korean and
> > Chinese spam mixed in.

> That's what I call a garbled file.  In such a file, when you see a byte
> with a code of, say, 161 decimal--how do you interpret it?  161 means one
> thing in Latin-1, but something different in cp437, and something else in
> cp850.  Unless the file has some meta-information in it, saying how
> which part is encoded, there is no way you could display the text
> correctly.

Still, a few big files look preferable to a lot of small files.  Charset can be
determined, ideally, from individual message headers, but that part is often
missing, especially in Usenet.  Some Unix-based mail programs can help sort
into separate categories, one category to a file or directory (I don't really
like the word "folder").

> > I think the HELLO file contains several different character sets in the same
> > file with (extended?) ANSI escape sequences to switch between them.

> Yes, but you need to encode the file with those escape sequences to be
> able to mix different languages in one file.  ISO-2022, the encoding used
> for the HELLO file, can do that, as well as Unicode-based encodings such
> as UTF-8.  Latin-1 and friends cannot.

Anybody think they can persuade Korean and Chinese spammers to adopt ISO-2022?
I laugh as I type this.
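
For anyone who wants to see the ambiguity discussed above on their own
machine, here is a minimal sketch in Python (not something from the thread).
It decodes a single byte under the three charsets named in the quoted text.
The function name `interpretations` and the extra byte value 155 are my own
choices for illustration; 155 is included because it is a value where cp437
and cp850 give different characters from each other.

```python
# The same byte value decoded under three single-byte charsets.
# Byte 161 is the example from the quoted text; byte 155 is an extra
# value (an assumption of this sketch) where cp437 and cp850 disagree.
CHARSETS = ("latin-1", "cp437", "cp850")


def interpretations(byte_value):
    """Return {charset: character} for one byte value in the range 0-255."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in CHARSETS}


if __name__ == "__main__":
    for value in (161, 155):
        # repr() of the dict keeps control characters visible as escapes
        print(value, interpretations(value))
```

Without a header or escape sequence saying which charset applies, nothing in
the byte itself tells a program which of these characters was meant.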