Mail Archives: cygwin/2009/06/06/08:22:19

In-Reply-To: <200906051625.n55GP6t3028411@mail.bln1.bf.nsn-intra.net>
References: <20090512165404 DOT GW21324 AT calimero DOT vinschen DOT de> <416096c60905120956n5521929bm69586f5e6325a994 AT mail DOT gmail DOT com> <20090512173153 DOT GY21324 AT calimero DOT vinschen DOT de> <3f0ad08d0905140858j17c7b374paa649f18ef18178d AT mail DOT gmail DOT com> <200905201652 DOT n4KGqYGm000509 AT mail DOT bln1 DOT bf DOT nsn-intra DOT net> <200906051625 DOT n55GP6t3028411 AT mail DOT bln1 DOT bf DOT nsn-intra DOT net>
Date: Sat, 6 Jun 2009 21:21:51 +0900
Message-ID: <3f0ad08d0906060521w13c096bcw570436a2c3c9f2b3@mail.gmail.com>
Subject: Re: [Fwd: [1.7] wcwidth failing configure tests]
From: IWAMURO Motonori <deenheart AT gmail DOT com>
To: newlib AT sourceware DOT org, cygwin AT cygwin DOT com

# Continuation of the discussion.
#
# I hope that all applications will work correctly just by setting
# "LANG=ja_JP.UTF-8".
# I do not want to give up using the binary packages and have to keep
# applying many local patches.


> I don't think that it is the good idea because:
>
> - It is "a cygwin-specific solution (or workaround)".
> - In NetBSD, the change to which wcwidth of East Asian Ambiguous Characters returns 2 by CJK locale is planned.

- And I don't think that special cases should be given priority over
general cases.

>> - I heard that there is an existing implementation that behave like my
>> proposal. (Sorry, I didn't hear the system name.)
> Even if so, I think the way I described is more compatible with the locale
> mechanism as used elsewhere.

I think that ALL locale implementations should treat East Asian
Ambiguous characters as width 2 in CJK locales.
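
To make the behavior I am asking for concrete, here is a minimal sketch in Python. It is only an illustration, not how Cygwin or newlib implement wcwidth. The names `is_cjk_locale` and `char_width` are hypothetical, and the rule "language code is ja, zh, or ko" is my simplifying assumption for what counts as a CJK locale. The width classes come from the standard `unicodedata` module.

```python
import unicodedata

# Assumption: a locale counts as "CJK" when its language code is one of these.
CJK_LANGS = ("ja", "zh", "ko")

def is_cjk_locale(locale_name):
    """Return True if the locale name (e.g. 'ja_JP.UTF-8') has a CJK language code."""
    lang = locale_name.split("@", 1)[0].split(".", 1)[0].split("_", 1)[0]
    return lang.lower() in CJK_LANGS

def char_width(ch, locale_name):
    """Hypothetical locale-aware width of one character (simplified wcwidth)."""
    if unicodedata.combining(ch):
        return 0                      # combining marks occupy no cell
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2                      # Wide / Fullwidth: always 2 cells
    if eaw == "A" and is_cjk_locale(locale_name):
        return 2                      # Ambiguous: 2 cells only in CJK locales
    return 1

print(char_width("\u00b1", "ja_JP.UTF-8"))  # PLUS-MINUS SIGN, class "A"
print(char_width("\u00b1", "en_US.UTF-8"))
```

With this rule, setting only LANG=ja_JP.UTF-8 is enough to change the answer for ambiguous characters, and no extra locale modifier is needed.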

>> It is no problem because we -- most Japanese language users -- need
>> not change the settings of mintty and locale after first setup.
>> We set LANG=ja_JP.UTF-8 and select a Japanese font for mintty.
> In any case, mined running in mintty will detect CJK width itself,
> regardless of locale setting, with coming versions of both programs
> even when it gets changed on-the-fly :)

Sorry, I can't understand the above because I am not good at English.

> This sounds complicated.

I don't think so. I think that we should consider the following issues
if a new mechanism is introduced.

The existing locale and terminal APIs don't support:
- Unicode BiDi.
- Unicode control characters.
- Unicode combining characters.
- Multilingualization. (*)
- Detecting the font/fontset selected in the terminal emulator.
  (This also has to cover the case where there is no tty.)

* Currently, we can't use Japanese, Chinese, and Korean at the same time
  even if we use Unicode, because many glyphs differ considerably between
  these languages even when the code point is the same.

> With my proposal, an application that wishes to auto-adjust on width
> properties (maybe even when changing) and which (unlike mined) uses
> the system wcwidth functions could proceed as follows:
> * Detect CJK width by using a simple test string width detection.
> * (Optional) When receiving a SIGWINCH signal (future version of MinTTY),
>  repeat this detection.
> * If e.g. LC_CTYPE starts with "ja_JP.UTF-8", call setlocale with
>  either "ja_JP.UTF-8@cjkwidth" or "ja_JP.UTF-8".

How would the application detect it? An application that uses wcwidth is
not necessarily run in a terminal emulator (e.g. a text formatter).
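
As I understand the quoted steps, they could be sketched roughly as below. This is my own reading, not code from mined or mintty. The "@cjkwidth" modifier is taken from the proposal. The cursor position report (send ESC [ 6 n, receive ESC [ row ; col R) is a common terminal feature. All function names, and the `probe` callable that writes to the terminal and returns its reply, are hypothetical.

```python
import re

CPR_RE = re.compile(r"\x1b\[(\d+);(\d+)R")   # cursor position report: ESC [ row ; col R
TEST_CHAR = "\u00b1"                          # PLUS-MINUS SIGN, East Asian Ambiguous

def parse_cursor_column(reply):
    """Extract the 1-based column from a cursor position report, or None."""
    m = CPR_RE.search(reply)
    return int(m.group(2)) if m else None

def ambiguous_is_wide(column_after_test_char):
    """After '\\r' + TEST_CHAR, column 3 means the character took 2 cells."""
    return column_after_test_char == 3

def pick_locale(base, is_tty, probe=None):
    """Choose the locale name to pass to setlocale.

    probe is a hypothetical callable: it writes a string to the terminal
    and returns whatever the terminal sends back.
    """
    if not is_tty or probe is None:
        return base                   # no terminal to ask: nothing can be detected
    column = parse_cursor_column(probe("\r" + TEST_CHAR + "\x1b[6n"))
    if column is None:
        return base                   # terminal did not answer the query
    return base + "@cjkwidth" if ambiguous_is_wide(column) else base

# A fake terminal that draws TEST_CHAR in two cells:
print(pick_locale("ja_JP.UTF-8", True, lambda s: "\x1b[1;3R"))
# A text formatter writing to a file has no terminal at all:
print(pick_locale("ja_JP.UTF-8", False))
```

The second call is the case I am worried about: without a tty the program can only fall back to the plain locale, so the width still has to come from the locale itself.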

>> > I'm not happy with the idea of a cygwin-specific solution (or workaround).
>> I think that it is not cygwin-specific solution.
> As I tried to suggest above, using "UTF-8" for different width data on one
> system would be quite specific, using the "@" modifier syntax would not.

"UTF-8" is only an encoding scheme. It does not specify the character width.
-- 
IWAMURO Motonori <http://vmi.jp/>
