Date: Fri, 5 Jun 2009 18:25:06 +0200 (CEST)
From: Thomas Wolff
To: newlib AT sourceware DOT org, cygwin AT cygwin DOT com
Subject: Re: [Fwd: [1.7] wcwidth failing configure tests]

IWAMURO Motonori wrote:
> 2009/5/21 Thomas Wolff :
> >> > Therefore, I propose to use *_cjk() when the language part of LC_CTYPE
> >> > is 'ja', 'ko', 'vi' or 'zh'.
> > The problem with this is
> > 1. As you say, there is no standard.
> But,
> - I think that my proposal doesn't violate any specification.

I think it does. Part of the locale information is the "charmap" (called
"codepage" on DOS/Windows). It may be implicit, as with LC_CTYPE=zh_CN,
which implies "GB2312" as its charmap, but it is typically explicit, as in
en_US.UTF-8. The intention is that the charmap information should be the
same for all locales that share the "UTF-8" (or any other) charmap. So you
cannot freely change width information among locales with the same charmap.

Also, if ja_JP.UTF-8 meant "CJK width", how would you specify a working
locale setting for a terminal that does not use a CJK width font but should
still use the other Japanese settings? E.g.
with rxvt, which does not support CJK width.

However, there is one resort within the locale mechanism that can be used:
the locale syntax allows an optional "modifier", which can be used to
specify deviations, e.g.
  de_DE has charmap ISO-8859-1
  de_DE@euro has charmap ISO-8859-15
  uz_UZ has charmap ISO-8859-1
  uz_UZ@cyrillic has charmap UTF-8
  aa_ER and aa_ER@saaho both have charmap UTF-8 (with some other difference).
Thus you could define e.g. ja_JP.UTF-8@cjk or ja_JP.UTF-8@cjkwidth to
indicate CJK width properties. I guess this is the most compliant way to go.

> - I heard that there is an existing implementation that behave like my
> proposal. (Sorry, I didn't hear the system name.)

Even if so, I think the way I described is more compatible with the locale
mechanism as used elsewhere.

> > 2. If you wish to handle character widths compliant with the terminal
> >    your application is running in, there is no guarantee that your
> >    assumption of CJK width (or the actual locale setting if that model
> >    would be implemented) does indeed reflect the terminal's width properties.
> Yes, I understand it, too. My proposal is completely workaround.
> But it is the best solution because we have no specification/standard
> for my wish.

A well-chosen option like the one above, which stays within the standard
locale syntax, would be best accepted by other communities, I think, and
could be established for this purpose.

> > 3. In mintty, you can dynamically change width properties by selecting
> >    different fonts; mintty changes CJK width behaviour according to certain
> >    font properties. "static" configuration in your shell using a locale
> >    variable would not reflect this change
> It is no problem because we -- most Japanese language users -- need
> not change the settings of mintty and locale after first setup.
> We set LANG=ja_JP.UTF-8 and select a Japanese font for mintty.
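To make the modifier idea from above concrete, here is a minimal C sketch of how a wcwidth implementation could derive its width policy from the locale name alone. It assumes the hypothetical "cjkwidth" modifier proposed in this message (no existing C library is implied to accept it), and the names split_locale and wants_cjk_width are illustrative only. It splits a name of the form language[_territory][.codeset][@modifier] and selects CJK widths only when the modifier asks for it, so every locale with the UTF-8 charmap keeps the same default.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Parts of a locale name: language[_territory][.codeset][@modifier].
   Only the codeset (charmap) and the modifier are kept here. */
struct locale_parts {
    char codeset[32];  /* empty if the name has no explicit codeset */
    char modifier[32]; /* empty if the name has no "@modifier" */
};

/* Copy len bytes of src into dst as a NUL-terminated string. */
bool copy_part(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        return false;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return true;
}

/* Split a locale name; returns false if a part does not fit the buffers. */
bool split_locale(const char *name, struct locale_parts *out)
{
    const char *at = strchr(name, '@');
    size_t prefix = at ? (size_t)(at - name) : strlen(name);
    /* Look for the codeset dot only before the modifier. */
    const char *dot = memchr(name, '.', prefix);

    out->codeset[0] = '\0';
    out->modifier[0] = '\0';
    if (dot != NULL &&
        !copy_part(out->codeset, sizeof out->codeset, dot + 1,
                   (size_t)(name + prefix - (dot + 1))))
        return false;
    if (at != NULL &&
        !copy_part(out->modifier, sizeof out->modifier, at + 1, strlen(at + 1)))
        return false;
    return true;
}

/* Hypothetical policy: the charmap alone never selects CJK widths;
   only the explicit "@cjkwidth" modifier does. */
bool wants_cjk_width(const char *name)
{
    struct locale_parts p;
    return split_locale(name, &p) && strcmp(p.modifier, "cjkwidth") == 0;
}
```

With this policy, "ja_JP.UTF-8@cjkwidth" selects CJK widths while "ja_JP.UTF-8" and "en_US.UTF-8" share the narrow default, so a user of rxvt can still run plain ja_JP.UTF-8.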
In any case, mined running in mintty will detect CJK width by itself,
regardless of the locale setting, with upcoming versions of both programs,
even when the width is changed on the fly :)

> >    b) Determine the actual CJK width behaviour dynamically. That's what
> >       mined does (in addition to other width property detection in general).
> It is the best solution. I think that we need specify the following:
> - the escape sequence about language context for terminal emulater.
> -- setting language context
> -- getting language context
> -- getting capability of language context
> (context is fixed, static or dynamic / acceptable languages)
> - new multilingualized string/terminal API for terminal based applications.

This sounds complicated. With my proposal, an application that wishes to
auto-adjust to width properties (maybe even when they change) and which
(unlike mined) uses the system wcwidth functions could proceed as follows:
* Detect CJK width using a simple test string width detection.
* (Optional) When receiving a SIGWINCH signal (future version of MinTTY),
  repeat this detection.
* If e.g. LC_CTYPE starts with "ja_JP.UTF-8", call setlocale with either
  "ja_JP.UTF-8@cjkwidth" or "ja_JP.UTF-8".
The application would need to stay with the same locale prefix "ja_JP..."
because there is no reasonable way to choose a completely different locale,
which is another reason to just use the modifier suffix rather than
reserving the complete "ja_JP..." setting for CJK width.
Advantage of this approach: the system does not have to care about this
issue and can just follow the locale setting.

> And, we need rewrite too many applications by new API.

Well, alternatively, the system could follow the approach outlined above,
but maybe that's not the proper level to do it (?)

> > I'm not happy with the idea of a cygwin-specific solution (or workaround).
> I think that it is not cygwin-specific solution.
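The auto-adjust steps listed above could look roughly like the following C sketch. Several parts are assumptions, not anything stated in this thread: the probe prints one East Asian Ambiguous-width character at column 1 and asks for a cursor position report with the DSR sequence ESC [ 6 n, to which the terminal answers ESC [ row ; col R; the raw-mode terminal I/O that sends the probe and reads the reply is omitted; and "@cjkwidth" is the hypothetical modifier proposed here, which current C libraries will most likely reject, in which case setlocale returns NULL and apply_width_setting reports failure. All function names are illustrative. After SIGWINCH, a handler would typically just set a flag, and the main loop would probe again and call apply_width_setting.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse a cursor position report "ESC [ row ; col R" and return the
   column, or -1 if the reply is malformed. */
int cpr_column(const char *reply)
{
    int row, col;
    char term;
    if (sscanf(reply, "\033[%d;%d%c", &row, &col, &term) != 3 || term != 'R')
        return -1;
    return col;
}

/* The probe character was printed at column 1, so the reported column
   minus 1 is its display width: 1 (narrow) or 2 (CJK width). */
int probe_width(const char *reply)
{
    int col = cpr_column(reply);
    return (col == 2 || col == 3) ? col - 1 : -1;
}

/* Keep the current language/territory/codeset prefix and add or drop the
   hypothetical "@cjkwidth" modifier. Any other modifier is replaced. */
bool build_ctype_name(const char *current, bool cjk_width,
                      char *buf, size_t bufsize)
{
    const char *at = strchr(current, '@');
    size_t prefix = at ? (size_t)(at - current) : strlen(current);
    int n = snprintf(buf, bufsize, "%.*s%s", (int)prefix, current,
                     cjk_width ? "@cjkwidth" : "");
    return n >= 0 && (size_t)n < bufsize;
}

/* Re-apply LC_CTYPE after a probe (initially, and again after SIGWINCH).
   Returns false if the probe failed or the C library rejects the name. */
bool apply_width_setting(const char *cpr_reply)
{
    int width = probe_width(cpr_reply);
    const char *current = setlocale(LC_CTYPE, NULL);
    char name[128];

    if (width < 0 || current == NULL ||
        !build_ctype_name(current, width == 2, name, sizeof name))
        return false;
    /* name is a private copy, so overwriting setlocale's buffer is safe. */
    return setlocale(LC_CTYPE, name) != NULL;
}
```

Because only the modifier changes, the application stays on the same "ja_JP..." prefix, as the reasoning above requires.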
As I tried to suggest above, using "UTF-8" for different width data on one
system would be quite specific; using the "@" modifier syntax would not.

Kind regards,
Thomas