Mail Archives: cygwin/2009/06/27/05:49:10

In-Reply-To: <20090615084443.GO5039@calimero.vinschen.de>
References: <20090512165404 DOT GW21324 AT calimero DOT vinschen DOT de> <20090512173153 DOT GY21324 AT calimero DOT vinschen DOT de> <3f0ad08d0905140858j17c7b374paa649f18ef18178d AT mail DOT gmail DOT com> <200905201652 DOT n4KGqYGm000509 AT mail DOT bln1 DOT bf DOT nsn-intra DOT net> <200906051625 DOT n55GP6t3028411 AT mail DOT bln1 DOT bf DOT nsn-intra DOT net> <3f0ad08d0906060242t275a78e7tb9913bf78d1c5e83 AT mail DOT gmail DOT com> <200906121538 DOT n5CFcSld014997 AT mail DOT bln1 DOT bf DOT nsn-intra DOT net> <3f0ad08d0906140604y49c470eeu68c6c307ec1cd073 AT mail DOT gmail DOT com> <3f0ad08d0906140618w53c82556ye709c70efc1c65e0 AT mail DOT gmail DOT com> <20090615084443 DOT GO5039 AT calimero DOT vinschen DOT de>
Date: Sat, 27 Jun 2009 10:48:42 +0100
Message-ID: <416096c60906270248h300b9c1cv3a04c251c96414f8@mail.gmail.com>
Subject: Re: [PATCH] Add "@cjknarrow" modifier (was Re: [Fwd: [1.7] wcwidth failing configure tests])
From: Andy Koppe <andy DOT koppe AT gmail DOT com>
To: cygwin AT cygwin DOT com, newlib AT sourceware DOT org

2009/6/15 Corinna Vinschen:
>> > Define the default for ja, ko, and zh to use width = 2, with a
>> > @cjknarrow (or whatever) modifier to use width = 1.
>>
>> I think it is good idea.
>
> If everybody agrees to this suggestion, here's the patch.  Tested
> with various combinations like
>
>  LANG=ja_JP.UTF-8@cjknarrow
>  LANG=ja_JP@cjknarrow
>  LANG=ja.UTF-8@cjknarrow
>  LANG=ja@cjknarrow

Apologies for harping on about this, especially as it was me who
suggested the @narrow scheme in the first place, but I do think this
is the wrong way to go.

MinTTY currently ignores POSIX locales completely, so I've been
pondering how to deal with locales and codepages more properly. One
thing I'd like to do is to automatically set LANG depending on the
Windows locale and the codepage and font settings in MinTTY (if LANG
isn't set already, that is).

Trouble is, what do I do if a cjkwide font is selected, yet the
Windows locale is not East Asian? I can't just randomly stick the user
into one of the three CJK countries, because people don't always take
kindly to being put into the wrong country.

That could be addressed by adding the @cjkwide modifier for non-CJK
languages, as discussed previously, but then MinTTY would still need
to parse the language setting to decide which modifier (if any) needs
to be used. Having the @cjkwide modifier only, independent of the
selected language, would keep things much easier to use and explain.
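To make the difference concrete, here is a rough Python sketch of what a terminal would have to do under each scheme to pick a modifier for a desired ambiguous width. The function names are made up for illustration, and none of this is MinTTY or Cygwin code; it only assumes the usual language[_territory][.charset][@modifier] shape of a locale name.

```python
# Languages that would default to ambiguous width 2 under the
# language-dependent scheme discussed in this thread.
CJK_LANGUAGES = {"ja", "ko", "zh"}


def language_of(locale_name):
    """Return the language part of a locale name such as 'ja_JP.UTF-8'."""
    base = locale_name.split("@", 1)[0]   # drop any modifier
    base = base.split(".", 1)[0]          # drop any charset
    return base.split("_", 1)[0]          # drop any territory


def modifier_language_dependent(locale_name, want_wide):
    """Pick a modifier when ja/ko/zh default to width 2 (hypothetical helper)."""
    is_cjk = language_of(locale_name) in CJK_LANGUAGES
    if is_cjk and not want_wide:
        return "@cjknarrow"
    if not is_cjk and want_wide:
        return "@cjkwide"
    return ""


def modifier_cjkwide_only(locale_name, want_wide):
    """Pick a modifier when @cjkwide is the only switch (hypothetical helper)."""
    # The language never needs to be inspected here.
    return "@cjkwide" if want_wide else ""
```

With the first scheme the terminal has to look at the language before it can choose anything; with the second the locale name is not consulted at all.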

And then there's the Linux compatibility angle: there, ja_JP.UTF-8
means an ambiguous width of 1, not 2.

To try to help with changing this, here's some text for the user guide.

Replace this:
"Right now the language and territory, as well as the modifier, are
not important to Cygwin, except to fix a single problem. There's a
class of characters in the Unicode character set, called the "CJK
Ambiguous Width Character set". For these characters the width
returned by the wcwidth/wcswidth function is usually 1. This is often
a problem in East-Asian languages, which historically use character
sets in which these characters have a width of 2. Kind of explains why
they are called "ambiguous"...

The problem has been fixed for now like this. wcwidth/wcswidth usually
return 1 as the width of these characters. However, if the language is
specified as "ja" (Japanese), "ko" (Korean), or "zh" (Chinese), wcwidth
returns 2 for these characters. Unfortunately this isn't correct in
all circumstances, so the user can specify the modifier "@cjknarrow",
which modifies the behaviour of wcwidth/wcswidth to return 1 for the
ambiguous width characters even in those languages."

With this:
"Right now the language and territory are not important to Cygwin, but
the modifier is used to deal with the issue of "CJK Ambiguous Width"
characters. For these characters the width returned by the wcwidth
function is usually 1. This is often a problem in East Asian
languages, which historically use character sets in which these
characters have a width of 2. Kind of explains why they are called
"ambiguous"... (See http://unicode.org/reports/tr11/ for a full
explanation.)

Therefore, if the modifier "@cjkwide" is specified, wcwidth returns 2
for these characters. For example, with ja_JP.UTF-8 their width is 1,
whereas with ja_JP.UTF-8@cjkwide it is 2."
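As a sanity check of the proposed wording, the rule can be written down in a few lines of Python. This is only a sketch of the rule as stated above, not the newlib implementation; ambiguous_width is a hypothetical name, and it assumes the modifier is whatever follows the "@" in the locale name.

```python
def ambiguous_width(locale_name):
    """Width of CJK ambiguous-width characters under the proposed rule.

    Only the modifier matters: language, territory and charset are ignored.
    """
    _, _, modifier = locale_name.partition("@")
    return 2 if modifier == "cjkwide" else 1


# The example from the proposed user guide text, plus a non-CJK language.
for name in ("ja_JP.UTF-8", "ja_JP.UTF-8@cjkwide", "en_GB.UTF-8@cjkwide"):
    print(name, ambiguous_width(name))
```

Note that a non-CJK locale with @cjkwide also gets width 2, which is the case a terminal needs when a wide font is selected outside East Asia.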

Andy

