Mail Archives: djgpp/2002/02/20/12:46:32.1

X-Authentication-Warning: delorie.com: mailnull set sender to djgpp-bounces using -f
From: "Thomas Mueller" <tmueller AT bluegrass DOT net>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: GNU Emacs DOS (DJGPP) port converts upper-ASCII characters to ASCII 127
Date: 20 Feb 2002 17:32:52 GMT
Lines: 30
Message-ID: <a50mk3$3qokk$5@ID-49635.news.dfncis.de>
References: <Pine DOT SUN DOT 3 DOT 91 DOT 1020214133143 DOT 29251B-100000 AT is> <a4lnda$15r9i$2 AT ID-49635 DOT news DOT dfncis DOT de> <5567-Sat16Feb2002190140+0200-eliz AT is DOT elta DOT co DOT il> <a4nbj8$4da$1 AT samba DOT rahul DOT net> <a4qpm6$2bgmk$3 AT ID-49635 DOT news DOT dfncis DOT de>
NNTP-Posting-Host: dial3-131.bluegrass.net (208.147.34.131)
Mime-Version: 1.0
X-Trace: fu-berlin.de 1014226372 4022932 208.147.34.131 (16 [49635])
X-Mailer: NOS-BOX 2.05
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

from my earlier post >> and Eli Zaretskii > :

> > Suppose two or more email messages are concatenated in one file?  Suddenly some
> > parts are in Latin-1 and others not, and there may even be some Korean and
> > Chinese spam mixed in.

> That's what I call a garbled file.  In such a file, when you see a byte
> with a code of, say, 161 decimal--how do you interpret it?  161 means one
> thing in Latin-1, but something different in cp437, and something else in
> cp850.  Unless the file has some meta-information in it, saying how
> each part is encoded, there is no way you could display the text
> correctly.
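
To make the byte-161 point concrete, here is a small Python sketch (Python is
not part of the original discussion, and the helper name `interpretations` is
made up for illustration). It decodes one byte value under the three encodings
named above. Going by Python's codec tables, cp437 and cp850 happen to agree on
byte 161 and diverge elsewhere, for example at byte 155, but the ambiguity
against Latin-1 is the same either way.

```python
# Decode a single byte value under several legacy single-byte encodings.
# "interpretations" is a hypothetical helper name used only for this sketch.

def interpretations(value, codecs=("latin-1", "cp437", "cp850")):
    """Return {codec: character} for one byte value (0-255)."""
    raw = bytes([value])
    return {codec: raw.decode(codec) for codec in codecs}


if __name__ == "__main__":
    for value in (161, 155):
        print(value, interpretations(value))
    # 161: latin-1 gives an inverted exclamation mark, both DOS code pages give i-acute
    # 155: cp437 gives a cent sign, cp850 gives o-slash, latin-1 gives a control code
```

Without outside information saying which table applies, nothing in the byte
itself tells a program which of these readings was intended.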

Still, a few big files look preferable to a lot of small files.  Ideally, the
charset can be determined from each message's headers, but that header is often
missing, especially on Usenet.  Some Unix-based mail programs can help sort
messages into separate categories, one category to a file or directory (I don't
really like the word "folder").
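
As a rough illustration of reading the charset from individual message headers,
the Python sketch below walks a file of concatenated messages and decodes each
body with the charset its own Content-Type header declares. Several things here
are assumptions, not something from this thread: the messages are mbox-style
(each starts with a "From " line), every message is single-part, and Latin-1 is
used as a guess when the header is missing. The function names are
hypothetical.

```python
from email import message_from_bytes


def split_messages(raw):
    """Split mbox-style bytes on lines starting with b'From '.

    Simplified: a real mbox reader must also handle '>From ' quoting
    inside message bodies.
    """
    chunks, current = [], []
    for line in raw.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            chunks.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append(b"".join(current))
    return chunks


def decode_bodies(raw, fallback="latin-1"):
    """Return a list of (charset_used, body_text), one per message."""
    results = []
    for chunk in split_messages(raw):
        msg = message_from_bytes(chunk)
        # None when the header is missing, which is the common Usenet case.
        charset = msg.get_content_charset() or fallback
        body = msg.get_payload(decode=True) or b""  # None for multipart
        try:
            text = body.decode(charset, errors="replace")
        except LookupError:
            # The header names a charset Python does not know: guess.
            charset = fallback
            text = body.decode(charset, errors="replace")
        results.append((charset, text))
    return results
```

The fallback is only a guess. When the header is absent the result may well be
wrong, which is exactly the problem described above.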

> > I think the HELLO file contains several different character sets in the same
> > file with (extended?) ANSI escape sequences to switch between them.

> Yes, but you need to encode the file with those escape sequences to be
> able to mix different languages in one file.  ISO-2022, the encoding used
> for the HELLO file, can do that, as well as Unicode-based encodings such
> as UTF-8.  Latin-1 and friends cannot.
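
The difference can be seen in a short Python sketch (again an illustration
added here, not something from the thread). It assumes Python's standard
`iso2022_jp_2` codec as one concrete ISO-2022 variant. The actual encoding of
the HELLO file may differ. The sample string and the helper name `can_encode`
are made up.

```python
# Mixed-language text: ASCII, Japanese, and Korean in one string.
SAMPLE = "Hello \u65e5\u672c\u8a9e \ud55c\uad6d\uc5b4"


def can_encode(text, codec):
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    iso = SAMPLE.encode("iso2022_jp_2")
    # ESC (0x1b) bytes mark the points where the encoding switches
    # from one character set to another.
    print(iso)
    print("escape bytes:", iso.count(b"\x1b"))
    print("round trip ok:", iso.decode("iso2022_jp_2") == SAMPLE)

    for codec in ("iso2022_jp_2", "utf-8", "latin-1"):
        print(codec, can_encode(SAMPLE, codec))
```

The escape bytes in the ISO-2022 output are the in-band switching information.
Latin-1 has no such mechanism, so the same string cannot be encoded in it at
all.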

Anybody think they can persuade Korean and Chinese spammers to adopt ISO-2022?
I laugh as I type this.
