Mail Archives: cygwin/2015/12/24/13:22:52

Subject: Re: Default locale for Russian/Russia should be ru_RU.CP1251
To: cygwin AT cygwin DOT com
References: <567C1207 DOT 3020700 AT gmail DOT com>
From: Marco Atzeri <marco DOT atzeri AT gmail DOT com>
Message-ID: <567C37D9.8090102@gmail.com>
Date: Thu, 24 Dec 2015 19:22:17 +0100

On 24/12/2015 16:40, Andrey ``Bass'' Shcheglov wrote:
> Hi,
>
> I'm running Cygwin 2.2.0 on an English Windows 8.1 box:
>
>> CYGWIN_NT-6.3 UNIT-725 2.2.0(0.289/5/3) 2015-08-03 12:51 x86_64 Cygwin
>
> Windows regional settings are set to Russian/Russia.
>
> In the absence of any settings in bashrc/bash_profile, `locale` command
> outputs the following:
>
>> LANG=ru_RU
>> LC_CTYPE="ru_RU"
>> LC_NUMERIC="ru_RU"
>> LC_TIME="ru_RU"
>> LC_COLLATE="ru_RU"
>> LC_MONETARY="ru_RU"
>> LC_MESSAGES="ru_RU"
>> LC_ALL=
>
> This is perfectly fine, except that "no charset" in the locale output
> means "ISO charset", which is ISO-8859-5 for Russian/Russia and has
> never been used (historically, DOS used CP866, Windows used the CP1251
> ANSI codepage, and various Unices stuck with KOI8-R before the rise of
> the Unicode era).
>
> The above is consistent with locale charmap output, which is again
> ISO-8859-5.
>
>
> A short C example also confirms that ISO-8859-5 is used:
>
>> #include <stdio.h>
>>
>> #include <locale.h>
>> #include <langinfo.h>
>>
>> int main() {
>>      const char *locale = setlocale(LC_ALL, "");
>>      const char *codeset = nl_langinfo(CODESET);
>>      printf("locale: %s\n", locale);
>>      printf("codeset: %s\n", codeset);
>>
>>      return 0;
>> }
>
> outputs
>
>> locale: ru_RU/ru_RU/ru_RU/ru_RU/ru_RU/C
>> codeset: ISO-8859-5
>
>
> Cygwin docs state that
>
>> Starting with Cygwin 1.7.2, the default character set is determined by the default Windows ANSI codepage for this language and territory.
>
> which is not true in my case (the Windows ANSI codepage for Cyrillic is
> CP1251, not ISO-8859-5!). Surprisingly, for the Belarusian (a.k.a.
> Belorussian, an East Slavic language very close to Russian) "be_BY"
> locale, the default charset is indeed CP1251, which is in accordance
> with both the documentation and common sense.
>
>
> Additionally, in `strace locale -u` output, I see multiple
>> __get_lcid_from_locale: LCID=0x0419
> lines.
>
> "0x0419" corresponds to Russian/Russia (see
> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd318693%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396>).
>
> Despite that, $(locale -u) returns "en_GB", even though all regional
> settings are set to Russian/Russia. I believe this is not correct
> either, and needs to be fixed.

The current code in
   winsup/cygwin/nlsfuncs.cc

is responsible for the ISO-8859-5 default:
--------------------------------------------------------------
     case 1251:
       if (lcid == 0x0c1a                /* sr_CS (Serbian Language/Former
                                                   Serbia and Montenegro) */
           || lcid == 0x1c1a             /* sr_BA (Serbian Language/Bosnia
                                                   and Herzegovina) */
           || lcid == 0x281a             /* sr_RS (Serbian Language/Serbia) */
           || lcid == 0x301a             /* sr_ME (Serbian Language/Montenegro)*/
           || lcid == 0x0440             /* ky_KG (Kyrgyz/Kyrgyzstan) */
           || lcid == 0x0843             /* uz_UZ (Uzbek/Uzbekistan) */
                                         /* tt_RU (Tatar/Russia),
                                                  IQTElif alphabet */
           || (lcid == 0x0444 && has_modifier ("@iqtelif"))
           || lcid == 0x0450)            /* mn_MN (Mongolian/Mongolia) */
         cs = "UTF-8";
       else if (lcid == 0x0423)          /* be_BY (Belarusian/Belarus) */
         cs = has_modifier ("@latin") ? "UTF-8" : "CP1251";
       else if (lcid == 0x0402)          /* bg_BG (Bulgarian/Bulgaria) */
         cs = "CP1251";
       else if (lcid == 0x0422)          /* uk_UA (Ukrainian/Ukraine) */
         cs = "KOI8-U";
       else
         cs = "ISO-8859-5";
--------------------------------------------------------------
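
Russian/Russia (LCID 0x0419) is not named in any branch of that excerpt,
so it ends up in the final else and gets ISO-8859-5, while be_BY and
bg_BG are explicitly mapped to CP1251. The following is a stand-alone
approximation of that branch, written only to make the fall-through easy
to check. It is not the Cygwin code: default_charset_cp1251 and
modifier_is are hypothetical names, and the modifier argument stands in
for the has_modifier() check in the original.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for has_modifier(): true if the locale
   modifier (e.g. "@latin") equals `wanted`; NULL means no modifier. */
static bool modifier_is(const char *modifier, const char *wanted)
{
    return modifier != NULL && strcmp(modifier, wanted) == 0;
}

/* Sketch of the "case 1251" selection quoted above (illustrative only). */
const char *default_charset_cp1251(uint32_t lcid, const char *modifier)
{
    switch (lcid) {
    case 0x0c1a: /* sr_CS */
    case 0x1c1a: /* sr_BA */
    case 0x281a: /* sr_RS */
    case 0x301a: /* sr_ME */
    case 0x0440: /* ky_KG */
    case 0x0843: /* uz_UZ */
    case 0x0450: /* mn_MN */
        return "UTF-8";
    case 0x0444: /* tt_RU: UTF-8 only with the IQTElif modifier */
        return modifier_is(modifier, "@iqtelif") ? "UTF-8" : "ISO-8859-5";
    case 0x0423: /* be_BY */
        return modifier_is(modifier, "@latin") ? "UTF-8" : "CP1251";
    case 0x0402: /* bg_BG */
        return "CP1251";
    case 0x0422: /* uk_UA */
        return "KOI8-U";
    default:     /* everything else, including ru_RU (0x0419) */
        return "ISO-8859-5";
    }
}
```

Calling default_charset_cp1251(0x0419, NULL) takes the default branch,
which matches the ISO-8859-5 codeset reported in the original message.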

> Regards,
> Andrey.

As a temporary workaround, can you use UTF-8?

export LANG=ru_RU.UTF-8
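
To confirm that the override took effect, the check from the original
message can be reduced to a small helper. This is only a sketch:
codeset_for_locale is a made-up name, and whether a given locale name is
accepted depends on the system it runs on.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch the process to locale `name` and return
   the codeset reported for it, or NULL if the locale cannot be set.
   Passing "" selects the locale from the environment (LANG, LC_*). */
const char *codeset_for_locale(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

After the export above, printing codeset_for_locale("") from a test
program should report UTF-8 rather than ISO-8859-5 if the workaround is
in effect.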

Regards
Marco




