Mail Archives: cygwin/2017/08/07/06:41:47
Date: Mon, 7 Aug 2017 12:41:27 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Unicode width data inconsistent/outdated
Message-ID: <20170807104127.GT25551@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <f3c1b415-7a26-8bbe-a67f-5619d356f058 AT towo DOT net> <20170726080859 DOT GA24312 AT calimero DOT vinschen DOT de> <5d3cb047-49f8-26a6-d816-387a71486e99 AT cygwin DOT com> <20170726095016 DOT GA25666 AT calimero DOT vinschen DOT de> <289bd98b-e644-888d-07f8-8965b6538373 AT towo DOT net> <20170728195826 DOT GI24013 AT calimero DOT vinschen DOT de> <1244bd24-bb27-d185-1f24-61beae02c2cd AT towo DOT net> <20170804170156 DOT GL25551 AT calimero DOT vinschen DOT de> <30486790-c59d-9a78-6000-b3c20fb86d9d AT towo DOT net> <20170807092820 DOT GQ25551 AT calimero DOT vinschen DOT de>
MIME-Version: 1.0
In-Reply-To: <20170807092820.GQ25551@calimero.vinschen.de>
User-Agent: Mutt/1.8.3 (2017-05-23)

On Aug 7 11:28, Corinna Vinschen wrote:
> On Aug 5 21:06, Thomas Wolff wrote:
> > On 04.08.2017 at 19:01, Corinna Vinschen wrote:
> > > This shouldn't matter to you; just keep it in place. It's a historical,
> > > low-footprint conversion for Japanese characters without pulling in the
> > > Unicode stuff. Not used on Cygwin, so just ignore it.
> > I had noticed in the meantime that this is not active in Cygwin, but it's
> > broken anyway for multiple reasons:
> > * platforms for which wchar_t is not Unicode should be explicitly listed
> > * if used, the transformation needs to be applied to all non-Unicode
> >   locales (also Chinese, Korean, and even 8-bit locales such as *.CP1252)
> > * for towupper and towlower, the result must be back-transformed into the
> >   respective locale encoding
> > * particularly the locale-specific _l functions inconsistently do not use
> >   the transformation but have this note:
>
> No, no, no. The functionality is restricted to certain use-cases and
> always was. It was a paid-for customer extension back in the day and it
> was *sufficient* for the use-cases. It's not clear how many newlib
> users are still using it, but it's not a good idea to remove it without
> checking first. That means, ask on the newlib mailing list how many are
> using the historical jp2uc code, and if we don't get a reply within,
> say, a month, we can probably nuke it.
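Thomas's first bullet asks that platforms whose wchar_t is not Unicode be listed explicitly. Standard C already has a hook for this: an implementation may predefine __STDC_ISO_10646__ (an integer constant of the form yyyymmL) to promise that characters stored in a wchar_t have their ISO/IEC 10646 (Unicode) code point values. The minimal C sketch below only tests that macro; the function names are hypothetical, and it makes no claim about which newlib targets actually define it.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the implementation promises that characters stored in a
   wchar_t have their ISO/IEC 10646 code point values, 0 otherwise.
   (Hypothetical helper for illustration.) */
static int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Prints the promised ISO 10646 date, or a note that no promise is made. */
static void report_wchar_encoding(FILE *out)
{
    assert(out != NULL);
#ifdef __STDC_ISO_10646__
    fprintf(out, "wchar_t holds ISO 10646 code points (%ldL)\n",
            (long)__STDC_ISO_10646__);
#else
    fputs("no __STDC_ISO_10646__: wchar_t values are not guaranteed "
          "to be Unicode\n", out);
#endif
}
```

A conversion layer such as jp2uc could key off a check like this instead of a hand-maintained platform list, although whether that suits newlib's small targets is a separate question.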

To clarify where we're coming from:

If you look into newlib/libc/locale/locale.c, function __loadlocale,
you'll notice that outside of Cygwin, only six single-byte, double-byte,
and multibyte codesets are supported at all:

  ASCII
  ISO-8859-1
  EUCJP
  JIS
  SJIS
  UTF-8

The multichar/widechar conversion functions for EUCJP, JIS and SJIS were
implemented to have a low footprint in the first place; see, for
instance, __sjis_wctomb in newlib/libc/stdlib/wctomb_r.c.
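A hypothetical C sketch of what "low footprint" can mean: instead of consulting a Unicode table, a double-byte SJIS character is kept in the wchar_t as its raw two-byte code, lead byte in the high octet. This is an assumption in the spirit of __sjis_wctomb, not a copy of newlib's code; the function names and the lead/trail byte ranges are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed Shift_JIS byte ranges (JIS X 0208 lead bytes only); these
   helpers are hypothetical, not newlib's own macros. */
static int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xef);
}

static int sjis_is_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7e) || (c >= 0x80 && c <= 0xfc);
}

/* Decode one character from s (n >= 1 bytes available) into *wc without
   any Unicode table: a double-byte character is stored as
   (lead << 8) | trail. Returns bytes consumed, or -1 with errno = EILSEQ. */
static int sjis_mbtowc_sketch(wchar_t *wc, const unsigned char *s, size_t n)
{
    assert(wc != NULL && s != NULL && n >= 1);
    if (!sjis_is_lead(s[0])) {
        *wc = s[0];               /* single byte: value passes through */
        return 1;
    }
    if (n < 2 || !sjis_is_trail(s[1])) {
        errno = EILSEQ;
        return -1;
    }
    *wc = (wchar_t)((s[0] << 8) | s[1]);
    return 2;
}

/* Inverse: split the packed value back into bytes. out needs room for
   2 bytes. Returns bytes written, or -1 with errno = EILSEQ. */
static int sjis_wctomb_sketch(unsigned char *out, wchar_t wc)
{
    assert(out != NULL);
    if (wc < 0 || wc > 0xffff) {
        errno = EILSEQ;
        return -1;
    }
    unsigned char hi = (unsigned char)(wc >> 8);
    unsigned char lo = (unsigned char)(wc & 0xff);
    if (hi == 0) {
        out[0] = lo;
        return 1;
    }
    if (!sjis_is_lead(hi) || !sjis_is_trail(lo)) {
        errno = EILSEQ;
        return -1;
    }
    out[0] = hi;
    out[1] = lo;
    return 2;
}
```

For example, the SJIS bytes 0x82 0xA0 (HIRAGANA LETTER A) decode to 0x82A0, and encoding 0x82A0 gives back the same two bytes. Nothing beyond two range checks is needed, which is the point for small targets.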

This is all about simplification for small targets. There was never a
requirement that converting a UTF-8 char to wchar_t and converting the
equivalent SJIS char to wchar_t would result in the same wide char.
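To make that concrete, the hypothetical C sketch below obtains a wide value for HIRAGANA LETTER A (U+3042) twice: once by decoding its UTF-8 bytes E3 81 82, and once by a table-free packing of its Shift_JIS bytes 82 A0 into one value (lead byte high). The packing is an assumption modeled on the low-footprint approach described above, and the UTF-8 decoder is deliberately minimal (BMP only, no overlong or surrogate checks).

```c
#include <assert.h>
#include <stddef.h>

/* Minimal UTF-8 decoder for 1- to 3-byte sequences (BMP only).
   Returns bytes consumed, or -1 on malformed input. It does not
   reject overlong forms or encoded surrogates. */
static int utf8_decode_bmp(wchar_t *wc, const unsigned char *s, size_t n)
{
    assert(wc != NULL && s != NULL);
    if (n >= 1 && s[0] < 0x80) {
        *wc = s[0];
        return 1;
    }
    if (n >= 2 && (s[0] & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80) {
        *wc = (wchar_t)(((s[0] & 0x1f) << 6) | (s[1] & 0x3f));
        return 2;
    }
    if (n >= 3 && (s[0] & 0xf0) == 0xe0 && (s[1] & 0xc0) == 0x80
        && (s[2] & 0xc0) == 0x80) {
        *wc = (wchar_t)(((s[0] & 0x0f) << 12) | ((s[1] & 0x3f) << 6)
                        | (s[2] & 0x3f));
        return 3;
    }
    return -1;
}

/* Hypothetical table-free SJIS packing: the raw double-byte code. */
static wchar_t sjis_pack(unsigned char lead, unsigned char trail)
{
    return (wchar_t)((lead << 8) | trail);
}
```

Both inputs denote the same character, yet the wide values differ (0x3042 from UTF-8, 0x82A0 from SJIS). Under the small-target design that is acceptable, because nothing promised they would match.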

Consequently, Cygwin does not use these conversion functions. Rather,
it uses Windows conversion functions (see the conversion functions in
winsup/cygwin/strfuncs.cc) to get a consistent wide char representation
(UTF-16). Another side effect is that Cygwin does not support JIS at
all, only SJIS; see the comment in strfuncs.cc.
Corinna
-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat