Mail Archives: cygwin/2017/07/28/15:58:50

X-Recipient: archive-cygwin AT delorie DOT com
DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id
:list-unsubscribe:list-subscribe:list-archive:list-post
:list-help:sender:date:from:to:subject:message-id:reply-to
:references:mime-version:content-type:in-reply-to; q=dns; s=
default; b=I4WQWeQAxymZeXzvTO1Vejs39LOrPgAqhSCtKjLroMVvxm2xFhDu6
tzVRYqZ8/NQj67R4B7NAGSs+gKMRL1WN+gU8XPjpGBObSGguMY72dSyKMem+JMtj
Y3EF93RIF/RX6r4g1iOkSoo5AS08336gT8jO48RWVVV0JXMPY0xETY=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id
:list-unsubscribe:list-subscribe:list-archive:list-post
:list-help:sender:date:from:to:subject:message-id:reply-to
:references:mime-version:content-type:in-reply-to; s=default;
bh=pyx/ruFzXyWwFahuj+yppizqOwE=; b=tRX40Snoh8UR/dYsdwsoFSv8yyM2
7Ngnh9bOqn+1OJaV82N007Myn8dUb7k2lePW406mcqfS+k1npBSNRs+HCvOE7Rrl
td87YTSpKN3HXauaxanuQ0lgTqiz6X6zd0kzU+rE14QAaF+KaeW6XDqSlgStXa+c
G1xniYu5stp9bZE=
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-101.9 required=5.0 tests=AWL,BAYES_00,GOOD_FROM_CORINNA_CYGWIN,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_DNSWL_LOW,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=highly, H*c:application
X-HELO: drew.franken.de
Date: Fri, 28 Jul 2017 21:58:26 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Unicode width data inconsistent/outdated
Message-ID: <20170728195826.GI24013@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <f3c1b415-7a26-8bbe-a67f-5619d356f058 AT towo DOT net> <20170726080859 DOT GA24312 AT calimero DOT vinschen DOT de> <5d3cb047-49f8-26a6-d816-387a71486e99 AT cygwin DOT com> <20170726095016 DOT GA25666 AT calimero DOT vinschen DOT de> <289bd98b-e644-888d-07f8-8965b6538373 AT towo DOT net>
MIME-Version: 1.0
In-Reply-To: <289bd98b-e644-888d-07f8-8965b6538373@towo.net>
User-Agent: Mutt/1.8.3 (2017-05-23)

--uAgJxtfIS94j9H4T
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Jul 26 23:43, Thomas Wolff wrote:
> Am 26.07.2017 um 11:50 schrieb Corinna Vinschen:
> > On Jul 26 03:16, Yaakov Selkowitz wrote:
> > > On 2017-07-26 03:08, Corinna Vinschen wrote:
> > > > On Jul 26 08:49, Thomas Wolff wrote:
> > > > > It would be good to keep wcwidth/wcswidth in sync with the installed
> > > > > Unicode data version (package unicode-ucd).
> > > > > Currently it seems to be hard-coded (in newlib/libc/string/wcwidth.c);
> > > > > it refers to Unicode 5.0 while installed Unicode data suggest 9.0 would
> > > > > be used.
> > > > > I can provide some scripts to generate the respective tables if desired.
> > > > > Thomas
> > > > If you can update the newlib files this way and send matching patches
> > > > to the newlib list, this would be highly appreciated.
> > > Thomas, I just updated unicode-ucd to 10.0 for this purpose.
> Thanks.
> >
> > Oh, and, btw, the comment in wcwidth.c isn't quite correct.  The
> > current state in newlib is Unicode 5.2, see newlib/libc/ctype/towupper.c.
> Oh, there are a number of other embedded tables. To make the tow* and isw*
> functions more easily adaptable to Unicode updates, there will be some
> revisions to do here. And the to* and is* ones (without 'w') even refer to
> locales in a way I do not understand. Maybe I'll restrict my effort to
> wcwidth first...

The to* and is* ones (without 'w') don't matter at all and you don't
have to touch them.

The Unicode stuff only affects the tow* and isw* functions.
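On the tow* side, towupper() has to map one character to exactly one character, so a table like the one in towupper.c presumably corresponds to the simple case mappings in UnicodeData.txt, where field 12 holds Simple_Uppercase_Mapping. The Python sketch below is a hypothetical illustration of extracting such a mapping; it is not newlib's generator, and the function names are made up for this example.

```python
from typing import Dict, Iterable


def upper_map(lines: Iterable[str]) -> Dict[int, int]:
    """Build a simple uppercase mapping from UnicodeData.txt-style lines.

    Field 0 is the code point, field 12 the Simple_Uppercase_Mapping.
    An empty field 12 means the character maps to itself, so it is skipped.
    One-to-many mappings (e.g. U+00DF -> "SS") live in SpecialCasing.txt
    and cannot be expressed by a wint_t -> wint_t function anyway.
    """
    mapping: Dict[int, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        if len(fields) > 12 and fields[12]:
            mapping[int(fields[0], 16)] = int(fields[12], 16)
    return mapping


def towupper_sketch(c: int, mapping: Dict[int, int]) -> int:
    """Hypothetical towupper analogue: unmapped characters come back unchanged."""
    return mapping.get(c, c)
```

Only characters with a non-empty mapping are stored, which keeps the table small; everything else falls through to the identity case, as towupper() does for characters without an uppercase form.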

As for how to fetch the data, you may want to have a look at
newlib/libc/ctype/utf8alpha.h and newlib/libc/ctype/utf8print.h.  The
header comments contain the awk scripts used to collect the data.
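The awk scripts themselves are in those header comments and are not reproduced here. As a rough Python analogue of the same idea, the sketch below reads lines in the UnicodeData.txt format (semicolon-separated, General_Category in field 2), expands the "<..., First>"/"<..., Last>" line pairs that UnicodeData.txt uses for large blocks, selects code points with a caller-supplied category predicate, and compresses them into inclusive intervals. Treating "category starts with L" as "alphabetic" is an assumption for illustration; the selection actually used for utf8alpha.h may differ, and the C-initializer output format is likewise chosen only for this example.

```python
from typing import Callable, Iterable, List, Tuple


def parse_ucd(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (code point, general category) pairs from UnicodeData.txt-style lines."""
    result: List[Tuple[int, str]] = []
    range_start = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        cp, name, category = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            # Start of a block listed only by its first and last code point.
            range_start = cp
        elif name.endswith(", Last>") and range_start is not None:
            # Every code point in the block shares the same category.
            result.extend((c, category) for c in range(range_start, cp + 1))
            range_start = None
        else:
            result.append((cp, category))
    return result


def to_ranges(codepoints: Iterable[int]) -> List[Tuple[int, int]]:
    """Compress code points into sorted, inclusive (first, last) intervals."""
    ranges: List[Tuple[int, int]] = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


def collect(lines: Iterable[str],
            predicate: Callable[[str], bool]) -> List[Tuple[int, int]]:
    """Intervals of all code points whose general category satisfies predicate."""
    return to_ranges(cp for cp, cat in parse_ucd(lines) if predicate(cat))


# A few real UnicodeData.txt lines, enough to exercise the First/Last handling.
SAMPLE = """0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;
0043;LATIN CAPITAL LETTER C;Lu;0;L;;;;;N;;;;0063;
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
"""

if __name__ == "__main__":
    for first, last in collect(SAMPLE.splitlines(), lambda c: c.startswith("L")):
        print(f"{{ 0x{first:04x}, 0x{last:04x} }},")
```

Storing intervals rather than individual code points keeps the generated table compact, and a lookup can then binary-search the sorted intervals.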

All other isw* files like iswblank.c contain comments explaining
which Unicode character categories are covered.
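To illustrate what a category-based definition looks like, the sketch below derives a hypothetical "blank" set from Python's unicodedata module: TAB plus every code point in General_Category Zs, optionally minus the three no-break spaces. Whether iswblank.c excludes the no-break spaces is an assumption to check against the comment in that file. Note also that unicodedata reflects the Unicode version bundled with the running Python, not necessarily the version of the installed unicode-ucd package.

```python
import sys
import unicodedata
from typing import List

# Zs characters that forbid a line break at their position. Some C libraries
# leave these out of "blank" (assumption: verify against newlib's iswblank.c).
NO_BREAK_SPACES = {0x00A0, 0x2007, 0x202F}


def blank_codepoints(exclude_no_break: bool = True) -> List[int]:
    """Hypothetical blank set: TAB plus all Zs code points."""
    blanks = {0x0009}  # TAB is in category Cc, so it is added explicitly
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == "Zs":
            blanks.add(cp)
    if exclude_no_break:
        blanks -= NO_BREAK_SPACES
    return sorted(blanks)
```

Scanning the whole code space once is slow-ish but fine for a table generator; the resulting sorted list could be fed into an interval compressor to emit a compact C table.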


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

--uAgJxtfIS94j9H4T
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJZe5diAAoJEPU2Bp2uRE+gBZEP/iOUteETH9mpUB2Z+X4nAf2W
2kFw7SJkKj2SiMEZvFc05jHQqMgolFq0aw/guyP9lON8/nwEc6XzTOZCohQRojr8
Tqxof2+Bu2+bokWllY67yqsj3gMilNRrYARba3cMJBi2R1y4rmYbZv7xpmIdrpLY
qGtyRngq3cY4jBv8IDeU3EEs+g609pTGCvy7BeC06jqFvWlY5WsS/FAjhZoBDVrp
U0noUR4sm8iVuCQfPlDtJ2HTGgjsqo5bGQ1zgOo4hm2OLW/F5mADZFuOL718kLum
chjJT+RaG9nD0uLnrvLnjGXLLZ7J7p29aLYuPLp5Pect47ojNjHJyVoo/ag6THYq
d6LO0burPZ46nxKqfsc5c0h4mSdc4bZey7IAatIal1ZX/M7AJdQGydV4LIpU4HVx
lyF6KWrwHSNjRUAEM3u5juspbeYt50z+9r+hIkYx4bD5nyJtxg1hawr/Qml9pU7L
lWzemVviGqoIv16z+Wpkktl2B87bBfheYHlYlNM2ZDizRjrbYKiahXRY/L+0fHeF
kokDs0BC5do/8RG04o/Iyj8a4E2cUvr26cXhS4KpR98R7e0KJKdiaEz0WhPTlW9F
fo3rWJl0X4OsD+4JdlNxGl0xtMjA1eWLbs5NDWqi6zJFu6roIze20HLpt2ECoZEO
SZ2PJB85E+RE/0unKl2+
=ZTde
-----END PGP SIGNATURE-----

--uAgJxtfIS94j9H4T--



  Copyright © 2019   by DJ Delorie     Updated Jul 2019