Mail Archives: opendos/2001/03/08/22:54:59
Thanks, Matthias.
Yes, I had been looking at information about S-JIS and other
codings, at "ftp://ftp.ifcss.org/pub/software/info/cjk-codes/",
which I found very confusing, and which didn't seem to fit the
file data.
When you say "the first range is ... 40h-7Eh", do you mean
these codings don't have the Roman characters, etc. in the
usual place (a la ASCII)? In other words, if they support
Roman characters at all, they only have a two byte coding for
them?
Arkady has now sent a message saying this is Cyrillic text,
so I'm making progress! :-)
Incidentally, I'm not too worried about finding a suitable
DBCS front-end ... I'm sure this must exist or else a
program with native support for this character set must
exist - the hardest part was identifying the language and
character encoding.
As for "being able to read XYZ [language]" ... it's not necessary
for _me_ to be able to read this, as long as I can find someone
else that can! <g>
My final option is perhaps to translate the file to Unicode,
then I think (don't know for sure, just yet) I can use Systran
via "babel.altavista.com" or "world.altavista.com", to translate
this to English ...
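
A minimal sketch of that conversion step, assuming the text really is in
one of the common 8-bit Cyrillic code pages. The candidate list below is
a guess, not something established for this file, and the function names
are made up for illustration:

```python
# Candidate Cyrillic encodings to try; which one (if any) fits is a guess.
CANDIDATES = ["cp866", "koi8_r", "cp1251", "iso8859_5"]


def to_unicode(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given code page.

    Bytes that are undefined in that code page become U+FFFD.
    """
    return raw.decode(encoding, errors="replace")


def preview_all(raw: bytes, width: int = 40) -> dict:
    """Return the first `width` decoded characters under each candidate."""
    return {enc: to_unicode(raw, enc)[:width] for enc in CANDIDATES}


def convert_file(src: str, dst: str, encoding: str) -> None:
    """Re-encode a file from the given code page to UTF-8."""
    with open(src, "rb") as f:
        text = to_unicode(f.read(), encoding)
    # newline="" keeps the original line endings untouched.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

The idea would be to look at the previews, have someone who reads the
language pick the one that makes sense, and only then convert the whole
file.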
Regards,
Joe.
> -----Original Message-----
> From: Matthias Paul [SMTP:Matthias DOT Paul AT post DOT rwth-aachen DOT de]
> Sent: Thursday, 8 March 2001 17:19
> To: opendos AT delorie DOT com
> Subject: Re: Text file format .ASC ? (#2)
>
> On 2001-03-08, Joe da Silva wrote:
>
> > That's the problem here - all those other text formats I've found
> > seem to retain the first 128 characters and do strange things
> > with the upper 128 codes. This one doesn't - it seems to use
> > just the upper case letters and other characters below about
> > 96 ($60), which to me suggests some non-Roman language,
> > in which the Roman letters are of secondary importance ...
>
> Just a guess, but could it be that your file is encoded in one
> of these DBCS Code Pages (like Shift-JIS) as used in Asia,
> so that it could be a mixed representation of one-byte and
> two-byte characters? If the first byte is within one of usually
> two ranges it opens a window into a set of 256 characters
> which are addressed by the following byte. Each 1st byte
> within these ranges opens a different window, so you can
> have thousands of characters in one codepage, and still have
> short representations for US-ASCII (which, however, is
> normally used only for Western names and similar stuff, so
> it would make sense that you can still see some strings that
> look familiar like "PCnnn").
> Usually the first range is located *somewhere* between
> 40h..7Eh and the second range between 80h..FCh, but the
> actual count of ranges, their location, and extent depend
> on the Country and Code Page settings of the system (under
> DOS defined by the DBCS strings in COUNTRY.SYS).
> Unfortunately, there would be plenty of DBCS Code Pages
> to try... However, without a DBCS frontend you won't be able
> to display such a file. But even if you would load such drivers,
> if you don't read Japanese, Chinese, Korean, or the like,
> you won't be able to understand the contents, anyway...
>
> Well, not exactly what I would call .ASC, but
> who knows... Do you know how old this file is?
> Where did you get it from originally?
>
> Matthias
>
> BTW. Most of these DBCS Code Pages also contain
> sets of Roman, Greek, and other characters used in
> Western languages.
>
> ------------------------------------------------------------
> Matthias Paul, Ubierstrasse 28, D-50321 Bruehl, Germany
> <Matthias DOT Paul AT post DOT rwth-aachen DOT de> <mpaul AT drdos DOT org>
> http://www.uni-bonn.de/~uzs180/mpdokeng.html
> ------------------------------------------------------------
> My homepage has moved, please update your pointers.
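
The lead-byte scheme described in the quoted message can be sketched
roughly as follows. The ranges used here (81h..9Fh and E0h..FCh) are the
lead-byte ranges of Shift-JIS / code page 932, taken only as an assumed
example; other DBCS code pages use different tables, and the function
names are hypothetical:

```python
# Lead-byte ranges for Shift-JIS / code page 932, used here only as an
# example. Other DBCS code pages define different ranges.
LEAD_RANGES = [(0x81, 0x9F), (0xE0, 0xFC)]


def is_lead_byte(b: int, ranges=LEAD_RANGES) -> bool:
    """True if byte value b falls inside one of the lead-byte ranges."""
    return any(lo <= b <= hi for lo, hi in ranges)


def split_dbcs(raw: bytes, ranges=LEAD_RANGES) -> list:
    """Split raw bytes into 1-byte and 2-byte character units."""
    units = []
    i = 0
    while i < len(raw):
        # A lead byte takes the following byte with it, if there is one.
        if is_lead_byte(raw[i], ranges) and i + 1 < len(raw):
            units.append(raw[i:i + 2])
            i += 2
        else:
            # Anything else (including plain ASCII) stands alone.
            units.append(raw[i:i + 1])
            i += 1
    return units
```

With these ranges, an ASCII string such as "PC" stays as two single-byte
units, while a byte in a lead range is grouped with the byte after it.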