Date: Mon, 7 Aug 2017 12:41:27 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Unicode width data inconsistent/outdated
Message-ID: <20170807104127.GT25551@calimero.vinschen.de>
In-Reply-To: <20170807092820.GQ25551@calimero.vinschen.de>

On Aug  7 11:28, Corinna Vinschen wrote:
> On Aug  5 21:06, Thomas Wolff wrote:
> > On 04.08.2017 at 19:01, Corinna Vinschen wrote:
> > > This shouldn't matter to you, just keep it in place.  It's a historical,
> > > low footprint conversion for Japanese characters without pulling in the
> > > Unicode stuff.  Not used on Cygwin so just ignore.
> > I had noticed meanwhile that this is not active in Cygwin, but it's broken
> > anyway for multiple reasons:
> >  * platforms for which wchar_t is not Unicode should be explicitly listed
> >  * if used, the transformation needs to be applied to all non-Unicode
> >    locales (also Chinese, Korean, and even 8-bit locales such as *.CP1252)
> >  * for towupper and towlower, the result must be back-transformed into the
> >    respective locale encoding
> >  * particularly the locale-specific _l functions inconsistently do not use
> >    the transformation but have this note:
>
> No, no, no.  The functionality is restricted to certain use-cases and
> always was.  It was a paid-for customer extension back in the day and it
> was *sufficient* for the use-cases.  It's not clear how many newlib
> users are still using it, but it's not a good idea to remove it without
> checking first.
> That means, ask on the newlib mailing list how many are
> using the historical jp2uc code, and if we don't get a reply within,
> say, a month, we can probably nuke it.

To clarify where we're coming from:

If you look into newlib/libc/locale/locale.c, function __loadlocale,
you'll notice that outside of Cygwin, only six single-, double-, or
multibyte codesets are supported at all:

  ASCII
  ISO-8859-1
  EUCJP
  JIS
  SJIS
  UTF-8

The multibyte/widechar conversion functions for EUCJP, JIS and SJIS were
implemented to have a low footprint in the first place, see, for
instance, __sjis_wctomb in newlib/libc/stdlib/wctomb_r.c.  This is all
about simplification for small targets.  There was never a requirement
that converting a UTF-8 char to wchar_t and converting the equivalent
SJIS char to wchar_t would result in the same wide char.

Consequently, Cygwin does not use these conversion functions.  Instead,
it uses Windows conversion functions, see the conversion functions in
winsup/cygwin/strfuncs.cc, to get a consistent wide char representation
(UTF-16).  Another side effect is that Cygwin does not support JIS at
all, only SJIS, see the comment in strfuncs.cc.
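To illustrate the point about UTF-8 and SJIS not having to yield the
same wide char, here is a minimal sketch in C.  It is not the newlib
code: the function names are made up for this example, and packing the
two SJIS bytes directly into one value is only an assumption about what
a table-free, low-footprint conversion could look like.

```c
#include <assert.h>

/* Hypothetical helper, not a newlib function: turn a two-byte SJIS
   sequence into a single value by packing the bytes, with no lookup
   table and no mapping to Unicode. */
unsigned long sketch_sjis_pack(const unsigned char *s)
{
    return ((unsigned long)s[0] << 8) | (unsigned long)s[1];
}

/* Hypothetical helper, not a newlib function: decode a three-byte
   UTF-8 sequence into its Unicode code point. */
unsigned long sketch_utf8_decode3(const unsigned char *s)
{
    /* Only three-byte sequences (lead byte 1110xxxx) are handled. */
    assert((s[0] & 0xF0) == 0xE0);
    return ((unsigned long)(s[0] & 0x0F) << 12)
         | ((unsigned long)(s[1] & 0x3F) << 6)
         |  (unsigned long)(s[2] & 0x3F);
}
```

For HIRAGANA LETTER A, the SJIS bytes 0x82 0xA0 pack to 0x82A0, while
the UTF-8 bytes 0xE3 0x81 0x82 decode to U+3042, so the two wide values
differ even though the character is the same.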

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat