Message-ID: <01FD6EC775C6D4119CDF0090273F74A4021FC5@emwatent02.meters.com.au>
From: "da Silva, Joe"
To: "'opendos AT delorie DOT com'"
Subject: RE: Text file format .ASC ? (#2.1)
Date: Fri, 9 Mar 2001 12:59:39 +1100
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by delorie.com id WAA29850
Reply-To: opendos AT delorie DOT com

Thanks, Matthias.

Yes, I had been looking at information about S-JIS and other codings
at "ftp://ftp.ifcss.org/pub/software/info/cjk-codes/", which I found
very confusing and which didn't seem to fit the file data.

When you say "the first range is ... 40h-7Eh", do you mean these
codings don't have the Roman characters, etc. in the usual place
(à la ASCII)? In other words, if they support Roman characters at
all, do they only have a two-byte coding for them?

Arkady has now sent a message saying this is Cyrillic text, so I'm
making progress! :-)

Incidentally, I'm not too worried about finding a suitable DBCS
front-end ... I'm sure one must exist, or else a program with native
support for this character set; the hardest part was identifying the
language and character encoding.

As for "being able to read XYZ [language]" ... it's not necessary for
_me_ to be able to read this, as long as I can find someone else who
can!

My final option is perhaps to convert the file to Unicode; then I
think (I don't know for sure just yet) I can use Systran via
"babel.altavista.com" or "world.altavista.com" to translate it into
English ...

Regards,
Joe.

> -----Original Message-----
> From: Matthias Paul [SMTP:Matthias DOT Paul AT post DOT rwth-aachen DOT de]
> Sent: Thursday, 8 March 2001 17:19
> To: opendos AT delorie DOT com
> Subject: Re: Text file format .ASC ? (#2)
>
> On 2001-03-08, Joe da Silva wrote:
>
> > That's the problem here - all those other text formats I've found
> > seem to retain the first 128 characters and do strange things
> > with the upper 128 codes. This one doesn't - it seems to use
> > just the upper case letters and other characters below about
> > 96 ($60), which to me suggests some non-Roman language,
> > in which the Roman letters are of secondary importance ...
>
> Just a guess, but could it be that your file is encoded in one
> of these DBCS Code Pages (like Shift-JIS) as used in Asia,
> so that it could be a mixed representation of one-byte and
> two-byte characters? If the first byte falls within one of
> (usually two) ranges, it opens a window into a set of 256
> characters which are addressed by the following byte. Each
> first byte within these ranges opens a different window, so you
> can have thousands of characters in one code page and still have
> short representations for US-ASCII (which, however, is
> normally used only for Western names and similar stuff, so
> it would make sense that you can still see some strings that
> look familiar, like "PCnnn").
> Usually the first range is located *somewhere* between
> 40h..7Eh and the second range between 80h..FCh, but the
> actual count of ranges, their location, and extent depend
> on the Country and Code Page settings of the system (under
> DOS, defined by the DBCS strings in COUNTRY.SYS).
> Unfortunately, there would be plenty of DBCS Code Pages
> to try... However, without a DBCS frontend you won't be able
> to display such a file. But even if you loaded such drivers,
> if you don't read Japanese, Chinese, Korean, or the like,
> you won't be able to understand the contents anyway...
>
> Well, not exactly what I would call .ASC, but
> who knows... Do you know how old this file is?
> Where did you get it from originally?
>
> Matthias
>
> BTW.
> Most of these DBCS Code Pages also contain
> sets of Roman, Greek, and other characters used in
> Western languages.
>
> ------------------------------------------------------------
> Matthias Paul, Ubierstrasse 28, D-50321 Bruehl, Germany
>
> http://www.uni-bonn.de/~uzs180/mpdokeng.html
> ------------------------------------------------------------
> My homepage has moved, please update your pointers.
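Matthias's description of a first byte "opening a window" of 256 characters can be sketched as a small tokenizer that keeps each lead byte together with the byte that follows it. The ranges below are an assumption: to my knowledge, in code page 932 (Shift-JIS) the lead bytes are 81h..9Fh and E0h..FCh, while 40h..7Eh and 80h..FCh are the ranges of the *second* byte, so single-byte US-ASCII keeps its usual place. Other DBCS code pages use other lead-byte ranges (under DOS, taken from COUNTRY.SYS). The names `CP932_LEAD_RANGES`, `is_lead_byte`, and `split_dbcs` are hypothetical.

```python
from typing import List, Sequence, Tuple

# ASSUMPTION: lead-byte ranges of code page 932 (Shift-JIS). Other DBCS code
# pages need their own table (under DOS, the DBCS strings in COUNTRY.SYS).
CP932_LEAD_RANGES: Sequence[Tuple[int, int]] = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(b: int, ranges: Sequence[Tuple[int, int]]) -> bool:
    """True if byte b starts a two-byte character (opens a 256-entry window)."""
    return any(lo <= b <= hi for lo, hi in ranges)


def split_dbcs(data: bytes,
               ranges: Sequence[Tuple[int, int]] = CP932_LEAD_RANGES) -> List[bytes]:
    """Split data into per-character byte strings, keeping lead+trail pairs together."""
    chars: List[bytes] = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i], ranges) and i + 1 < len(data):
            # Lead byte selects the window, the next byte selects the cell in it.
            chars.append(data[i:i + 2])
            i += 2
        else:
            # Single-byte character (e.g. US-ASCII), or a lone lead byte at the end.
            chars.append(data[i:i + 1])
            i += 1
    return chars
```

For example, `split_dbcs("PC9801 日本".encode("cp932"))` yields seven one-byte characters followed by `b"\x93\xfa"` and `b"\x96\x7b"`. The final trail byte 7Bh is also ASCII `{`, which is why a DBCS file cannot be read correctly by scanning single bytes.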
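Joe's observation that the file uses mostly upper-case letters and other codes below about 60h can be checked mechanically by counting bytes per range. This is a minimal sketch; the bucket boundaries and the names `BUCKETS` and `byte_profile` are hypothetical choices for illustration, not a standard classification. As far as I know, 8-bit Cyrillic code pages such as CP866 or KOI8-R, and the lead bytes of DBCS code pages such as Shift-JIS, all sit at 80h and above, so an (almost) empty "80-FF" bucket would point towards some 7-bit scheme.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical buckets chosen for illustration only.
BUCKETS: Tuple[Tuple[str, int, int], ...] = (
    ("00-1F", 0x00, 0x1F),  # control codes (CR, LF, ...)
    ("20-3F", 0x20, 0x3F),  # space, digits, punctuation
    ("40-5F", 0x40, 0x5F),  # '@', upper-case A..Z, [ \ ] ^ _
    ("60-7F", 0x60, 0x7F),  # '`', lower-case a..z, { | } ~ DEL
    ("80-FF", 0x80, 0xFF),  # upper half of the code page
)


def byte_profile(data: bytes) -> Dict[str, int]:
    """Count how many bytes of data fall into each bucket."""
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return {name: sum(counts[b] for b in range(lo, hi + 1))
            for name, lo, hi in BUCKETS}
```

To profile a file, read it in binary mode, e.g. `byte_profile(open("FILE.ASC", "rb").read())` (the file name is a placeholder).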
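Once the code page is known, Joe's plan of converting the file to Unicode before running it through a machine translator comes down to one decode and one encode. In this sketch the default codec `"cp866"` (DOS Cyrillic) is only a placeholder assumption, since the thread does not establish which Cyrillic encoding the file uses; `to_utf8` and `convert_file` are hypothetical helper names.

```python
from pathlib import Path


def to_utf8(data: bytes, codec: str = "cp866") -> bytes:
    """Re-encode bytes from the given code page into UTF-8.

    ASSUMPTION: "cp866" is a placeholder default, not the file's known encoding.
    """
    # With the default errors="strict", decode() raises UnicodeDecodeError
    # if the codec has no mapping for some byte value.
    return data.decode(codec).encode("utf-8")


def convert_file(src: Path, dst: Path, codec: str = "cp866") -> None:
    """Write a UTF-8 copy of src to dst (hypothetical helper)."""
    dst.write_bytes(to_utf8(src.read_bytes(), codec))
```

Trying a few candidate codecs (for example `"cp866"`, `"koi8_r"`, `"cp1251"`, all standard Python codec names) and checking which output reads as plausible Russian is a cheap way to narrow the choice.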