Mail Archives: cygwin/2009/05/13/15:40:49

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Wed, 13 May 2009 21:38:16 +0200
From: Corinna Vinschen <vinschen AT redhat DOT com>
To: newlib AT sourceware DOT org, cygwin AT cygwin DOT com
Subject: Re: [Fwd: [1.7] wcwidth failing configure tests]
Message-ID: <20090513193816.GA7650@calimero.vinschen.de>
Reply-To: newlib AT sourceware DOT org
Mail-Followup-To: newlib AT sourceware DOT org, cygwin AT cygwin DOT com
References: <20090512165404 DOT GW21324 AT calimero DOT vinschen DOT de> <416096c60905120956n5521929bm69586f5e6325a994 AT mail DOT gmail DOT com> <20090512173153 DOT GY21324 AT calimero DOT vinschen DOT de> <416096c60905131204r473ac1d3t4c811f7f0a4cb81f AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <416096c60905131204r473ac1d3t4c811f7f0a4cb81f@mail.gmail.com>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On May 13 20:04, Andy Koppe wrote:
> 2009/5/12 Corinna Vinschen:
> >> Trouble is, there's the thorny issue of the "CJK Ambiguous Width"
> >> category of characters, which consists of things like Greek and
> >> Cyrillic letters as well as line drawing symbols. Those have a width
> >> of 1 in Western use, yet with CJK fonts they have a width of 2. That's
> >> why Markus Kuhn's code includes the mk_wcswidth_cjk() variant.
> >
> > We should use the standard variation alone, imho.
> 
> I'm not sure that CJK users would be happy with that. See MinTTY issue
> 88 for my misguided attempts to dismiss this as a legacy issue:
> http://code.google.com/p/mintty/issues/detail?id=88
> 
> In comment 8 on that, "deenheart" mentioned that he was working on a
> fix for wcwidth(). I don't know what he had in mind, but I'd suspect
> something based on an environment variable setting.
> 
> > And we need some workaround for UTF-16 systems like Cygwin.
> > Unfortunately, surrogate pairs only work well as part of a string, not
> > as standalone chars.  So wcwidth would return -1 for each single char,
> > but wcswidth could be tweaked to handle them gracefully.
> 
> Looking at the ranges in wcwidth.c, it might be possible to decide the
> width of a surrogate pair based on the high surrogate only, and then
> treat the low surrogate as a combining character with length 0.

How should that work?  The first half of the surrogate pair does not carry
enough information to decide that.  For instance, take the ranges
{ 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }.  All of these code points
share the high surrogate 0xD802; the low 10 bits of the Unicode value
are encoded in the second half of the pair.  From the first half alone
you can't tell whether the char is 0x10A04 or one of the others.  So
you need both halves to make a decision.

A lone surrogate half is also always invalid.  That's something you
can't handle in wcwidth, which only ever sees a single wchar_t.


Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/



  Copyright © 2019   by DJ Delorie     Updated Jul 2019