Mail Archives: opendos/2001/03/08/22:54:59

Message-ID: <01FD6EC775C6D4119CDF0090273F74A4021FC5@emwatent02.meters.com.au>
From: "da Silva, Joe" <Joe DOT daSilva AT emailmetering DOT com>
To: "'opendos AT delorie DOT com'" <opendos AT delorie DOT com>
Subject: RE: Text file format .ASC ? (#2.1)
Date: Fri, 9 Mar 2001 12:59:39 +1100
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
X-MIME-Autoconverted: from quoted-printable to 8bit by delorie.com id WAA29850
Reply-To: opendos AT delorie DOT com

Thanks, Matthias.

Yes, I had been looking at information about S-JIS and other
codings, at "ftp://ftp.ifcss.org/pub/software/info/cjk-codes/",
which I found very confusing, and none of which seemed to fit
the file data.

When you say "the first range is ... 40h-7Eh", do you mean
these codings don't have the Roman characters, etc. in the
usual place (à la ASCII)? In other words, if they support
Roman characters at all, they only have a two byte coding for
them?

Arkady has now sent a message saying this is Cyrillic text,
so I'm making progress!   :-)

Incidentally, I'm not too worried about finding a suitable
DBCS front-end ... I'm sure this must exist or else a
program with native support for this character set must
exist - the hardest part was identifying the language and
character encoding.

As for "being able to read XYZ [language]" ... it's not necessary
for _me_ to be able to read this, as long as I can find someone
else who can!  <g>

My final option is perhaps to translate the file to Unicode,
then I think (don't know for sure, just yet) I can use Systran
via "babel.altavista.com" or "world.altavista.com", to translate
this to English ...
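
A minimal sketch of the "translate the file to Unicode" step, in Python. The thread does not say which Cyrillic code page the file uses, so the shortlist in CANDIDATES is an assumption, and the sample bytes are made up for illustration. The idea is only to decode the same bytes under several candidates and let a reader of the language pick the one that looks right.

```python
# Candidate single-byte Cyrillic encodings.
# This shortlist is an assumption; the thread does not name the code page.
CANDIDATES = ["cp866", "koi8_r", "cp1251", "cp855", "iso8859_5"]


def decode_candidates(data):
    """Decode the same bytes under each candidate encoding.

    Returns a dict mapping encoding name to the decoded Unicode string.
    Encodings that cannot decode the data are skipped.
    """
    results = {}
    for name in CANDIDATES:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # Some code pages leave certain byte values undefined.
            continue
    return results


# Illustrative sample only: a Russian word encoded in DOS code page 866.
sample = "Привет".encode("cp866")

for name, text in decode_candidates(sample).items():
    print(f"{name:10} {text}")
```

Once the right candidate is known, text.encode("utf-8") gives a Unicode file that can be pasted into a translation service.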

Regards,
Joe.

> -----Original Message-----
> From:	Matthias Paul [SMTP:Matthias DOT Paul AT post DOT rwth-aachen DOT de]
> Sent:	Thursday, 8 March 2001 17:19
> To:	opendos AT delorie DOT com
> Subject:	Re: Text file format .ASC ? (#2)
> 
> On 2001-03-08, Joe da Silva wrote:
> 
> > That's the problem here - all those other text formats I've found
> > seem to retain the first 128 characters and do strange things
> > with the upper 128 codes. This one doesn't - it seems to use
> > just the upper case letters and other characters below about
> > 96 ($60), which to me suggests some non-Roman language,
> > in which the Roman letters are of secondary importance ...
> 
> Just a guess, but could it be that your file is encoded in one
> of these DBCS Code Pages (like Shift-JIS) as used in Asia,
> so that it could be a mixed representation of one-byte and
> two-byte characters? If the first byte is within one of usually
> two ranges it opens a window into a set of 256 characters
> which are addressed by the following byte. Each 1st byte
> within these ranges opens a different window, so you can
> have thousands of characters in one codepage, and still have
> short representations for US-ASCII (which, however, is
> normally used only for Western names and similar stuff, so
> it would make sense that you can still see some strings that
> look familiar like "PCnnn").
> Usually the first range is located *somewhere* between
> 40h..7Eh and the second range between 80h..FCh, but the
> actual count of ranges, their location, and extent depend
> on the Country and Code Page settings of the system (under
> DOS defined by the DBCS strings in COUNTRY.SYS).
> Unfortunately, there would be plenty of DBCS Code Pages
> to try... However, without a DBCS frontend you won't be able
> to display such a file. But even if you loaded such drivers,
> if you don't read Japanese, Chinese, Korean, or the like,
> you won't be able to understand the contents, anyway...
> 
> Well, not exactly what I would call .ASC, but
> who knows... Do you know how old this file is?
> Where did you get it from originally?
> 
>  Matthias
> 
> BTW. Most of these DBCS Code Pages also contain
> sets of Roman, Greek, and other characters used in
> Western languages.
> 
> ------------------------------------------------------------
> Matthias Paul, Ubierstrasse 28, D-50321 Bruehl, Germany
> <Matthias DOT Paul AT post DOT rwth-aachen DOT de> <mpaul AT drdos DOT org>
> http://www.uni-bonn.de/~uzs180/mpdokeng.html
> ------------------------------------------------------------
> My homepage has moved, please update your pointers.
