Mail Archives: opendos/2001/03/12/03:36:14

Message-ID: <01FD6EC775C6D4119CDF0090273F74A4021FD0@emwatent02.meters.com.au>
From: "da Silva, Joe" <Joe DOT daSilva AT emailmetering DOT com>
To: "'opendos AT delorie DOT com'" <opendos AT delorie DOT com>
Subject: RE: Text file format .ASC ? (#2.2)
Date: Mon, 12 Mar 2001 19:34:37 +1100
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
X-MIME-Autoconverted: from quoted-printable to 8bit by delorie.com id DAA31825
Reply-To: opendos AT delorie DOT com

Thanks for the extra info., Matthias.

BTW, IBM seems to be doing some open-source work on Unicode
support for "various" platforms. The following should be quite useful ... :

http://oss.software.ibm.com/developerworks/opensource/icu/project/charset/index.html

Joe.

> -----Original Message-----
> From:	Matthias Paul [SMTP:Matthias DOT Paul AT post DOT rwth-aachen DOT de]
> Sent:	Saturday, 10 March 2001 11:17
> To:	opendos AT delorie DOT com
> Subject:	Re: Text file format .ASC ? (#2.1)
> 
> On 2001-03-09, Joe da Silva asked:
> 
> >When you say "the first range is ... 40h-7Eh", do you mean
> >these codings don't have the Roman characters, etc. in the
> >usual place (à la ASCII)? In other words, if they support
> >Roman characters at all, they only have a two-byte coding for
> >them?
> 
> Well, Arkady has solved the mystery already, but FWIW
> I still want to answer your question:
> 
> Those ranges were meant as maximum extents. According to
> William Spencer Hall (Novell) in his article "Internationalizing
> Windows Software" (from "Microsoft Windows 3.1 Developer's
> Workshop", Microsoft Press, 1993, ISBN 1-55615-480-1), which
> also gives a very good general description of I18N issues for DOS
> at both the developer and the user level, some Code Pages might
> actually have a lead-byte window below 128.
> For *common* DBCS Code Pages the range for the Lead Byte is above
> 127, so the 7-bit ASCII part is not changed for them (although my own
> experience is that sometimes they have non-ASCII characters in the
> non-alphabetic and non-numeric Code Points). Here are a few examples:
> 
>  Codepage  Lead Byte Range               Trail Byte Range
> 
>  932       81h..9Fh, E0h..FCh            40h..7Eh, 80h..FCh
>  936       A1h..A9h, B0h..F7h            A1h..FEh
>  949       A1h..ACh, B0h..C8h, CAh..FDh  A1h..FEh
>  950       A1h..C6h, C9h..F9h            40h..7Eh, 80h..FEh
> 
> (from Nadine Kano's "Developing International Software for
> Windows 95 and Windows NT", Microsoft Press, 1995,
> ISBN 1-55615-840-8, superseding "International Handbook",
> MS, 1991, and "Developing International Software for
> Microsoft Windows", MS, 1995).
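
A minimal Python sketch of how the lead-byte ranges in the table above can be used to split a DBCS byte string into characters. It assumes the table's ranges are exactly the lead bytes of each code page (they are the published maximum extents, and real implementations may differ); `LEAD_BYTE_RANGES`, `is_lead_byte` and `split_dbcs` are hypothetical names for illustration only. It also shows why trail bytes in the 40h..7Eh range matter: in code page 932 the character 表 encodes as 95h 5Ch, so a naive byte search for the backslash (5Ch) also matches in the middle of that character.

```python
from typing import List

# Hypothetical table: lead-byte ranges copied from the quoted table
# (Kano, 1995). Inclusive (low, high) pairs; assumed complete.
LEAD_BYTE_RANGES = {
    932: [(0x81, 0x9F), (0xE0, 0xFC)],
    936: [(0xA1, 0xA9), (0xB0, 0xF7)],
    949: [(0xA1, 0xAC), (0xB0, 0xC8), (0xCA, 0xFD)],
    950: [(0xA1, 0xC6), (0xC9, 0xF9)],
}


def is_lead_byte(codepage: int, byte: int) -> bool:
    """Return True if `byte` lies in one of the code page's lead-byte ranges."""
    return any(lo <= byte <= hi for lo, hi in LEAD_BYTE_RANGES[codepage])


def split_dbcs(codepage: int, data: bytes) -> List[bytes]:
    """Split a DBCS byte string into 1- or 2-byte characters.

    A lead byte always takes the next byte as its trail byte; this
    sketch does not check the trail byte against the trail-byte range.
    A lead byte at the very end of the data is kept as a single byte.
    """
    chars = []
    i = 0
    while i < len(data):
        if is_lead_byte(codepage, data[i]) and i + 1 < len(data):
            chars.append(data[i:i + 2])
            i += 2
        else:
            chars.append(data[i:i + 1])
            i += 1
    return chars


# Usage: a DOS-style path containing 表 (95h 5Ch in code page 932).
data = "C:\\表\\a.txt".encode("cp932")
print(data.count(b"\\"))                   # 3: includes the trail byte of 表
print(split_dbcs(932, data).count(b"\\"))  # 2: only the real separators
```

Scanning from the start of the string is the key design choice here: in a DBCS, a byte's meaning depends on the bytes before it, so you cannot safely step backwards or search for single bytes without first walking the string from a known character boundary.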
> 
> But even if the above-mentioned Code Pages leave 7-bit ASCII
> unchanged, they usually duplicate the Roman letters as double-byte
> characters. Like most of the other double-byte characters, these
> alternative characters are displayed at double width by the front end.
> Some DBCS Code Pages also contain Greek and other characters.
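
A short Python illustration of the duplicated Roman letters, using Python's built-in `cp932`, `cp936`, `cp949` and `cp950` codecs. These codecs follow Microsoft's mapping tables, which may differ in detail from the DOS-era tables cited above, so treat this as a sketch rather than a reference. ASCII "A" stays a single byte in all four code pages, while U+FF21 FULLWIDTH LATIN CAPITAL LETTER A encodes as a two-byte character (for example 82h 60h in code page 932), and Unicode classifies it as full-width ("F") rather than narrow ("Na"). The helper name `encode_all` is hypothetical.

```python
import unicodedata

# Python codec names for the four code pages in the quoted table.
CODEPAGES = ("cp932", "cp936", "cp949", "cp950")


def encode_all(ch):
    """Return {codec name: encoded bytes} for a single character."""
    return {cp: ch.encode(cp) for cp in CODEPAGES}


for ch in ("A", "\uff21"):  # ASCII 'A' and FULLWIDTH LATIN CAPITAL LETTER A
    width = unicodedata.east_asian_width(ch)  # 'Na' = narrow, 'F' = full-width
    hex_bytes = {cp: b.hex() for cp, b in encode_all(ch).items()}
    print(f"U+{ord(ch):04X} width={width} {hex_bytes}")
```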
> 
> The most complete and highly recommendable reference on the topic
> I have seen so far is "CJKV Information Processing: Chinese,
> Japanese, Korean & Vietnamese Computing" by Ken Lunde,
> O'Reilly & Associates, 1999, ISBN 1-56592-224-7 (superseding
> his "Understanding Japanese Information Processing", ORA, 1993).
> It contains a long list of DBCS, TBCS, and MBCS Code Pages
> (with glyphs!) and associated standards.
> For those interested, another, more formal, source of documentation is
> "Character Data Representation Architecture Reference and Registry
> (CDRA) level 2 + Extension papers", IBM, 1995, SC09-2190-00
> (superseding SC09-1391-00 and SC09-1391-01), containing the
> largest list of Code Pages I have ever seen (unfortunately I am missing
> the enclosed CD). From my own research into NLS issues, though, I can
> say it is still far from complete. IBM does not seem to have
> a more recent publication on the subject available at the moment,
> but I have heard that they already have updated internal drafts, so
> it seems it's just a matter of time...
> 
>  Matthias
> 
> ------------------------------------------------------------
> Matthias Paul, Ubierstrasse 28, D-50321 Bruehl, Germany
> <Matthias DOT Paul AT post DOT rwth-aachen DOT de> <mpaul AT drdos DOT org>
> http://www.uni-bonn.de/~uzs180/mpdokeng.html
> ------------------------------------------------------------
> My homepage has moved, please update your pointers.
> 
> 



  Copyright © 2019   by DJ Delorie     Updated Jul 2019