Mail Archives: cygwin/2023/03/25/15:05:13

Message-ID: <8f2b7a49-2878-8481-233c-146fdb9a0e69@Shaw.ca>
Date: Sat, 25 Mar 2023 13:03:49 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.9.0
Subject: Re: newlocale: Linux incompatibility
To: cygwin AT cygwin DOT com
References: <bd7ebebc-cf90-d509-ef15-11d702a6126c AT cornell DOT edu>
<ZBzBL2jFO7Oltjd1 AT calimero DOT vinschen DOT de>
<ZB2U/JCFrwSUo1+U AT calimero DOT vinschen DOT de>
<bb1eed09-54c2-08a1-7eb9-40e39c7657d9 AT Shaw DOT ca>
<ZB7fsuNoUY9miWr+@calimero.vinschen.de>
Organization: Inglis
In-Reply-To: <ZB7fsuNoUY9miWr+@calimero.vinschen.de>
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Brian Inglis via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Brian Inglis <Brian DOT Inglis AT Shaw DOT ca>

On 2023-03-25 05:49, Corinna Vinschen via Cygwin wrote:
> On Mar 24 16:49, Brian Inglis via Cygwin wrote:
>> On 2023-03-24 06:18, Corinna Vinschen via Cygwin wrote:
>>>> First, it's a bug in the Emacs testsuite.  The test simply assumes that
>>>> there's no en_DE locale on any system, but that's just not true.
>>>> Windows supports the RFC 5646 locale "en-DE", which is called "English
>>>> (Germany)" in the "Region" settings.
>>>> You can also check with `locale -av | less' and search for en_DE.
>>>> For the remainder of this mail, I assume you're talking about Cygwin 3.5.
>>>> I won't fix this for 3.4 anymore, given how much locale handling has
>>>> changed for 3.5.
>>>> The second bug is that Cygwin blindly trusts the Windows function
>>>> ResolveLocaleName().  That function blatantly converts even vaguely
>>>> similar locales into something it supports.  E.g., it converts "en-XY"
>>>> to "en-US".  I.e., even if you use "en_XY.utf8" as locale, the above
>>>> testcase will wrongly succeed.  So I have to rethink how I resolve POSIX
>>>> locales to Windows locales.

>> Does Windows even consider https://www.rfc-editor.org/rfc/rfc4647 "Matching
>> of Language Tags", part of https://www.rfc-editor.org/info/bcp47 "Language
>> Tags", and if POSIX only matches exactly, will LANGUAGE be able to be used
>> for fallback?

> I never heard about an environment variable called LANGUAGE.  This is
> about LANG/LC_ALL/LC_whatever, so POSIX syntax is required...

Used by gettext:

https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html

also LINGUAS, FYI, for controlling, documenting, or limiting translations:

https://www.gnu.org/software/gettext/manual/html_node/po_002fLINGUAS.html
https://www.gnu.org/software/gettext/manual/html_node/Installers.html

as POSIX punts a lot of locale handling into the (hand-waving) implementation-defined
bucket, where this is the primary implementation.
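
As a rough illustration of the fallback idea: gettext documents LANGUAGE as a colon-separated priority list (for instance en_CA:en_GB:en). The C sketch below only picks the N-th entry out of such a list. The name language_entry is a hypothetical helper, not part of gettext or Cygwin, and the sketch does not reproduce gettext's actual catalog lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the INDEX-th colon-separated entry of LIST
   into OUT (capacity OUTSZ, including the terminating NUL).
   Returns 1 on success, 0 if there is no such entry, the entry is empty,
   or it does not fit into OUT. */
static int language_entry(const char *list, size_t index,
                          char *out, size_t outsz)
{
    assert(list != NULL && out != NULL && outsz > 0);

    const char *p = list;
    for (;;) {
        size_t len = strcspn(p, ":");   /* length up to next ':' or end */
        if (index == 0) {
            if (len == 0 || len >= outsz)
                return 0;
            memcpy(out, p, len);
            out[len] = '\0';
            return 1;
        }
        if (p[len] == '\0')             /* ran out of entries */
            return 0;
        p += len + 1;                   /* skip the entry and its ':' */
        index--;
    }
}
```

A caller could walk indexes 0, 1, 2, ... and stop at the first entry for which a translation exists, which is the kind of fallback the exact-match POSIX variables do not provide on their own.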

>> I currently define LANGUAGE=en_CA:en_GB:en in case en-CA is unsupported by
>> anything.
>> [I use my own en-CA locale not the glibc default created by https://rap.dk/.]
>> Will "-" be supported like "_" as a separator in values?

> In Cygwin?  No.  The POSIX syntax is required, it's converted into
> a matching Windows RFC 5646 locale internally.
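
To make the syntax difference concrete, here is a minimal C sketch of only the naive, purely textual part of such a conversion: drop the codeset and modifier, then replace "_" with "-". The name posix_to_bcp47 is hypothetical and this is not Cygwin's internal code. It ignores what a modifier might imply (a script, for example) and does not check whether Windows actually knows the resulting tag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn "lang_TERRITORY[.codeset][@modifier]" into
   "lang-TERRITORY". Returns 1 on success, 0 if the language part is
   empty or the result does not fit into OUT (capacity OUTSZ). */
static int posix_to_bcp47(const char *posix, char *out, size_t outsz)
{
    assert(posix != NULL && out != NULL && outsz > 0);

    /* Keep everything before the first '.' (codeset) or '@' (modifier). */
    size_t len = strcspn(posix, ".@");
    if (len == 0 || len >= outsz)
        return 0;

    memcpy(out, posix, len);
    out[len] = '\0';

    /* POSIX uses '_' between language and territory, RFC 5646 uses '-'. */
    for (char *c = out; *c != '\0'; c++)
        if (*c == '_')
            *c = '-';
    return 1;
}
```

With this sketch "en_DE.utf8" becomes "en-DE". Going the other way (accepting "en-DE" in LANG) is what the reply above rules out.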

>>>> And the third bug is that Cygwin fails to set errno if it doesn't
>>>> support a locale, but that's a minor inconvenience in comparison.
>>>> Thanks for the report, I totally missed the above problem with
>>>> ResolveLocaleName.
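
A small probe like the following C sketch can show both effects from the application side: whether newlocale() accepts a given name, and what errno holds when it does not. The name locale_is_supported is a hypothetical helper. POSIX specifies that newlocale() returns (locale_t)0 and sets errno on failure; which names are accepted depends entirely on the system and C library in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <locale.h>

/* Hypothetical helper: returns 1 if newlocale() accepts NAME for all
   categories, 0 otherwise. On failure *ERR receives the errno value
   left behind by newlocale() (0 if the library did not set one). */
static int locale_is_supported(const char *name, int *err)
{
    assert(name != NULL && err != NULL);

    errno = 0;
    locale_t loc = newlocale(LC_ALL_MASK, name, (locale_t)0);
    if (loc == (locale_t)0) {
        *err = errno;
        return 0;
    }
    freelocale(loc);
    *err = 0;
    return 1;
}
```

Calling it with "en_XY.utf8" is the interesting case from this thread: a result of 1 would indicate that the name was silently mapped to something else, and a result of 0 with *err still 0 would indicate the missing errno.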

>>> I pushed a couple of patches which hopefully clean up the code.  It's
>>> really frustrating how these Windows locale functions work.  Or, rather,
>>> not work.  I mean, come on...
>>> - ResolveLocaleName() resolves "ff-BF" to "ff-Latn-SN", not to
>>>     "ff-Adlm-BF" or "ff-Latn-BF", even though both exist.
>>> - There's a locale called "sd-Arab-PK" and a locale "sd-Deva-IN".  If
>>>     you ask for the script used in "sd-IN", the result is "Arab", not
>>>     "Deva".
>>> I had to create a replacement function for ResolveLocaleName which
>>> doesn't return totally screwy and unexpected results, and special case
>>> two more locales in /proc/locales output so the output makes sense.

>> Aha - a nice new 3.5.0 feature - as well as /proc/codesets - is that
>> charsets e.g. ISO-10646, etc. rather than encodings e.g. UTF-8, etc.!

> It's a list of what you can use as codeset in $LANG and friends as in
>    LC_CTYPE=lang_TERRITORY.codeset@modifier
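
For comparison, the value a program sees for the codeset part can be queried with nl_langinfo(CODESET) from POSIX <langinfo.h>. The C sketch below wraps that in a hypothetical helper, codeset_for. The string it returns is whatever the C library reports and varies between systems, so no particular output is assumed here.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch LC_CTYPE to NAME and return the codeset
   string reported for it, or NULL if the locale name is rejected.
   Note: this changes the process-wide LC_CTYPE locale, and the returned
   pointer may be invalidated by later setlocale()/nl_langinfo() calls. */
static const char *codeset_for(const char *name)
{
    assert(name != NULL);

    if (setlocale(LC_CTYPE, name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

Running this over the names accepted in LANG and printing the results is one way to compare what the system calls a codeset with the charmap names listed by locale -m.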

You are using codeset to mean encoding, whereas in Unicode and W3C usage it usually
means coded character set/charset; it can also mean charmap; see iconv(1):

	https://pubs.opengroup.org/onlinepubs/9699919799/utilities/iconv.html

Further confused by codeset definition:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_99

linking to:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_02

which says POSIX "provides no means of defining a wide-character codeset",
implying encodings such as UCS-2/UTF-16 and UCS-4/UTF-32 cannot be specified,
requiring a non-POSIX approach to conversion.

Also IBM uses codeset to distinguish between EBCDIC and ASCII encodings.

Adding to the confusion, ISO uses codeset to refer generically to each set of
codes supported by each part of ISO-639-1/2/3/5, ISO-3166-1/2/3, and ISO-15924,
as well as ISO-8859-1...16.

I get no hits from RFCs.

To avoid ambiguity and reduce possible confusion, might it be better to name this
charmaps, as used in locale(1) and produced by locale -m, with the same apparent
content?
It looks like /proc/locales contains the same content as produced by locale -a?

JM2c ;^>

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

