Mail Archives: cygwin/2017/08/08/04:22:44
Date: | Tue, 8 Aug 2017 10:22:20 +0200
|
From: | Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
|
To: | cygwin AT cygwin DOT com
|
Subject: | Re: Unicode width data inconsistent/outdated
On Aug 7 21:27, Thomas Wolff wrote:
> Am 07.08.2017 um 11:28 schrieb Corinna Vinschen:
> > On Aug 5 21:06, Thomas Wolff wrote:
> > > I have a working version now, and it uses much less as the category table is
> > > range-based.
> > > Another table is needed for case conversion. Size estimates are as follows
> > > (based on Unicode 5.2 for a fair comparison, going up a little bit for 10.0
> > > of course):
> > >
> > > Categories: 2313 entries (10.0: 2715)
> > > each entry needs 9 bytes, total 20817 bytes
> > > I don't know whether that expands by some word-alignment.
> > > I could pack entries to 7 bytes, or even 6 bytes if that helps (total 16191
> > > or 13878).
> > >
> > > Case conversion: 2062 entries (10.0: 2621)
> > > each entry needs 12 bytes, total 24744
> > > packed 8 bytes, total 16496
> > >=20
> > > The Categories table could be boiled down to 1223 entries (penalty: double
> > > runtime for iswupper and iswlower)
> > > The Case conversion table could be transformed to a compact form
> > > Case conversion compact: 1201 entries
> > > each entry needs 16 bytes, total 19216
> > > packed 12 or 11 (or even 10), total 14412 (or 12010)
> > > So I think the increase is acceptable for the benefit of simple and
> > > automatic generation
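[Editorial sketch, not part of the original message.] The range-based category table described above could look roughly like the following. The field layout (two 32-bit bounds plus a one-byte category, 9 bytes of payload), the category names, and the four sample ranges are all assumptions made for illustration; the thread does not show the generated table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical category codes; a generated table would define its own. */
enum category { CAT_OTHER = 0, CAT_UPPER, CAT_LOWER, CAT_DIGIT };

/* One range entry: 4 + 4 + 1 = 9 bytes of payload (assumed layout).
   The compiler may pad this, which is the word-alignment question
   raised in the quoted text. */
struct cat_range {
    uint32_t first;   /* first code point of the range */
    uint32_t last;    /* last code point of the range (inclusive) */
    uint8_t  cat;     /* category shared by the whole range */
};

/* Tiny illustrative excerpt, sorted by `first`; not the real data. */
static const struct cat_range cat_table[] = {
    { 0x0030, 0x0039, CAT_DIGIT },
    { 0x0041, 0x005A, CAT_UPPER },
    { 0x0061, 0x007A, CAT_LOWER },
    { 0x00C0, 0x00D6, CAT_UPPER },
};

/* Binary search over the sorted, non-overlapping ranges. */
static enum category lookup_category(uint32_t c)
{
    size_t lo = 0;
    size_t hi = sizeof cat_table / sizeof cat_table[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c < cat_table[mid].first)
            hi = mid;
        else if (c > cat_table[mid].last)
            lo = mid + 1;
        else
            return (enum category) cat_table[mid].cat;
    }
    return CAT_OTHER;   /* code point not covered by any range */
}
```

With this unpacked layout, sizeof(struct cat_range) is typically larger than 9 because of alignment padding, which is why packing the entries to 7 or 6 bytes is discussed as a separate option.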
> > So we're at 40K+ plus code then.
> No, if I implement the packed versions, it's 19.3K, so even smaller than
> currently.
Apparently I added up wrongly.
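[Editorial sketch, not part of the original message.] The totals in the quoted estimates are entry count times entry size, so they can be rechecked with a few lines of C. The helper names below are hypothetical. Which combination produces the "19.3K" figure is not stated in the thread; one reading that lands on it, offered here only as an assumption, is the reduced 1223-entry Categories table at 6 bytes plus the compact 1201-entry case table at 10 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Total bytes for a table of `entries` records of `bytes_per_entry` each. */
static size_t table_bytes(size_t entries, size_t bytes_per_entry)
{
    assert(bytes_per_entry > 0);
    return entries * bytes_per_entry;
}

/* Combined size of a categories table plus a case conversion table. */
static size_t combined_bytes(size_t cat_entries, size_t cat_size,
                             size_t case_entries, size_t case_size)
{
    return table_bytes(cat_entries, cat_size)
         + table_bytes(case_entries, case_size);
}
```

For example, combined_bytes(2313, 9, 1201, 16) gives 40033, in line with the "40K+" remark, while combined_bytes(1223, 6, 1201, 10) gives 19348.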
> > > I had noticed meanwhile that this is not active in Cygwin, but it's broken
> > > anyway for multiple reasons:
> > > * platforms for which wchar_t is not Unicode should be explicitly listed
> > > * if used, the transformation needs to be applied to all non-Unicode
> > >   locales (also Chinese, Korean, and even 8-bit locales such as *.CP1252)
> > > * for towupper and towlower, the result must be back-transformed into the
> > >   respective locale encoding
> > > * particularly the locale-specific _l functions inconsistently do not use
> > >   the transformation but have this note:
> > No, no, no. The functionality is restricted to certain use-cases and
> > always was. It was a paid-for customer extension back in the day and it
> > was *sufficient* for the use-cases. It's not clear how many newlib
> > users are still using it, but it's not a good idea to remove it without
> > checking first. That means, ask on the newlib mailing list how many are
> > using the historical jp2uc code, and if we don't get a reply within,
> > say, a month, we can probably nuke it.
> OK, let's make such a request after holiday time.
> But, even if this shall persist as a special solution, it's still broken and
> should be fixed.
> Can we then substitute the current table with calling the iconvdata
> functions? In that case, as I said, the back-conversion would be available
> too, and I could fix that and add the missing handling of the _l functions,
> for a consistent solution.
I'm not quite sure I follow. Do you mean iconvdata tables for the
three Japanese codesets only? Wouldn't that mean converting the
multibyte stuff into Unicode and vice versa, basically getting rid
of the jp2uc workaround?
After a night's sleep, that might actually be the best way anyway. I
agree that the jp2uc workaround is a bit of a hack. Well, not a bit.
However, given that this does not affect Cygwin, we should really discuss
this on the newlib list.
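[Editorial sketch, not part of the original message.] The round trip under discussion (convert the locale's wide character to Unicode, apply the case mapping, convert the result back) can be illustrated as follows. Every function here is a hypothetical stand-in, not newlib or iconvdata API, and the sketch assumes the wide character holds the two-byte EUC-JP code. Only the fullwidth Latin letters are covered.

```c
#include <stdint.h>

/* Hypothetical stand-in for an iconvdata-backed conversion.
   Covers only EUC-JP fullwidth Latin letters:
   0xA3C1..0xA3DA (A-Z) and 0xA3E1..0xA3FA (a-z). */
static uint32_t eucjp_to_unicode(uint32_t c)
{
    if ((c >= 0xA3C1 && c <= 0xA3DA) || (c >= 0xA3E1 && c <= 0xA3FA))
        return c - 0xA3C1 + 0xFF21;
    return c;   /* outside the demo range: pass through unchanged */
}

/* Hypothetical back-conversion, the inverse of the function above. */
static uint32_t unicode_to_eucjp(uint32_t u)
{
    if ((u >= 0xFF21 && u <= 0xFF3A) || (u >= 0xFF41 && u <= 0xFF5A))
        return u - 0xFF21 + 0xA3C1;
    return u;
}

/* Minimal Unicode upper-casing for ASCII and fullwidth Latin only. */
static uint32_t unicode_toupper(uint32_t u)
{
    if ((u >= 0x0061 && u <= 0x007A) || (u >= 0xFF41 && u <= 0xFF5A))
        return u - 0x20;
    return u;
}

/* Case mapping in the locale encoding: convert, map, convert back.
   Without the final step the caller would receive a Unicode value
   where a locale-encoded one is expected. */
static uint32_t legacy_towupper(uint32_t c)
{
    return unicode_to_eucjp(unicode_toupper(eucjp_to_unicode(c)));
}
```

In this sketch legacy_towupper(0xA3E1) yields 0xA3C1; dropping the back-conversion would yield 0xFF21 instead, which is the inconsistency described in the quoted bullet about towupper and towlower.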
Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat