Subject: Re: Default locale for Russian/Russia should be ru_RU.CP1251
To: cygwin AT cygwin DOT com
References: <567C1207 DOT 3020700 AT gmail DOT com>
From: Marco Atzeri
Message-ID: <567C37D9.8090102@gmail.com>
Date: Thu, 24 Dec 2015 19:22:17 +0100
In-Reply-To: <567C1207.3020700@gmail.com>

On 24/12/2015 16:40, Andrey ``Bass'' Shcheglov wrote:
> Hi,
>
> I'm running Cygwin 2.2.0 on an English Windows 8.1 box:
>
>> CYGWIN_NT-6.3 UNIT-725 2.2.0(0.289/5/3) 2015-08-03 12:51 x86_64 Cygwin
>
> Windows regional settings are set to Russian/Russia.
>
> In the absence of any settings in bashrc/bash_profile, the `locale`
> command outputs the following:
>
>> LANG=ru_RU
>> LC_CTYPE="ru_RU"
>> LC_NUMERIC="ru_RU"
>> LC_TIME="ru_RU"
>> LC_COLLATE="ru_RU"
>> LC_MONETARY="ru_RU"
>> LC_MESSAGES="ru_RU"
>> LC_ALL=
>
> This is perfectly fine, except that "no charset" in the locale output
> means "ISO charset", which is ISO-8859-5 for Russian/Russia and has
> never been used in practice (historically, DOS used CP866, Windows
> used the CP1251 ANSI codepage, and various Unices stuck with KOI8-R
> before the rise of the Unicode era).
>
> The above is consistent with the `locale charmap` output, which is
> again ISO-8859-5.
>
>
> A short C example also confirms that ISO-8859-5 is used:
>
>> #include <stdio.h>
>>
>> #include <langinfo.h>
>> #include <locale.h>
>>
>> int main() {
>>         const char *locale = setlocale(LC_ALL, "");
>>         const char *codeset = nl_langinfo(CODESET);
>>         printf("locale: %s\n", locale);
>>         printf("codeset: %s\n", codeset);
>>
>>         return 0;
>> }
>
> outputs
>
>> locale: ru_RU/ru_RU/ru_RU/ru_RU/ru_RU/C
>> codeset: ISO-8859-5
>
>
> Cygwin docs state that
>
>> Starting with Cygwin 1.7.2, the default character set is determined
>> by the default Windows ANSI codepage for this language and territory.
>
> which is not true in my case (the Windows ANSI codepage for Cyrillic
> is CP1251, not ISO-8859-5!). Surprisingly, for the Belarusian (a.k.a.
> Belorussian, an Eastern Slavic language very close to Russian) "be_BY"
> locale, the default charset is indeed CP1251, which is in accordance
> with both the documentation and common sense.
>
>
> Additionally, in `strace locale -u` output, I see multiple
>> __get_lcid_from_locale: LCID=0x0419
> lines.
>
> "0x0419" corresponds to Russian/Russia (see
> ).
>
> Despite that, $(locale -u) returns "en_GB", even though all regional
> settings are set to Russian/Russia. I believe this is not correct,
> either, and needs to be fixed.

The current code in winsup/cygwin/nlsfuncs.cc is responsible for the
ISO-8859-5 default: for Windows codepage 1251, any LCID that is not
listed explicitly (including 0x0419, Russian/Russia) falls through to
the final else branch.

--------------------------------------------------------------
    case 1251:
      if (lcid == 0x0c1a        /* sr_CS (Serbian Language/Former Serbia and Montenegro) */
          || lcid == 0x1c1a     /* sr_BA (Serbian Language/Bosnia and Herzegovina) */
          || lcid == 0x281a     /* sr_RS (Serbian Language/Serbia) */
          || lcid == 0x301a     /* sr_ME (Serbian Language/Montenegro)*/
          || lcid == 0x0440     /* ky_KG (Kyrgyz/Kyrgyzstan) */
          || lcid == 0x0843     /* uz_UZ (Uzbek/Uzbekistan) */
                                /* tt_RU (Tatar/Russia), IQTElif alphabet */
          || (lcid == 0x0444 && has_modifier ("@iqtelif"))
          || lcid == 0x0450)    /* mn_MN (Mongolian/Mongolia) */
        cs = "UTF-8";
      else if (lcid == 0x0423)  /* be_BY (Belarusian/Belarus) */
        cs = has_modifier ("@latin") ? "UTF-8" : "CP1251";
      else if (lcid == 0x0402)  /* bg_BG (Bulgarian/Bulgaria) */
        cs = "CP1251";
      else if (lcid == 0x0422)  /* uk_UA (Ukrainian/Ukraine) */
        cs = "KOI8-U";
      else
        cs = "ISO-8859-5";
--------------------------------------------------------------

> Regards,
> Andrey.

As a temporary workaround, can you use UTF-8?

  export LANG=ru_RU.UTF-8

Regards
Marco
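
For anyone who wants to poke at the fall-through behaviour outside of
Cygwin, here is a minimal standalone sketch of the branch quoted above.
It is not the actual nlsfuncs.cc code: the function name
cp1251_default_charset and the helper modifier_is are made up for
illustration, and the locale modifier is passed in as a plain string
argument instead of going through has_modifier, whose definition is not
shown in the snippet.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for has_modifier(): true when the locale
   modifier string (e.g. "@latin") equals `wanted`.  NULL means the
   locale has no modifier. */
static int modifier_is(const char *modifier, const char *wanted)
{
    return modifier != NULL && strcmp(modifier, wanted) == 0;
}

/* Sketch of the "case 1251" branch: map an LCID whose Windows ANSI
   codepage is 1251 to the default charset name.  The function name
   and signature are assumptions made for this example. */
const char *cp1251_default_charset(unsigned lcid, const char *modifier)
{
    switch (lcid) {
    case 0x0c1a: /* sr_CS */
    case 0x1c1a: /* sr_BA */
    case 0x281a: /* sr_RS */
    case 0x301a: /* sr_ME */
    case 0x0440: /* ky_KG */
    case 0x0843: /* uz_UZ */
    case 0x0450: /* mn_MN */
        return "UTF-8";
    case 0x0444: /* tt_RU: UTF-8 only with the IQTElif alphabet */
        if (modifier_is(modifier, "@iqtelif"))
            return "UTF-8";
        break;
    case 0x0423: /* be_BY */
        return modifier_is(modifier, "@latin") ? "UTF-8" : "CP1251";
    case 0x0402: /* bg_BG */
        return "CP1251";
    case 0x0422: /* uk_UA */
        return "KOI8-U";
    default:
        break;
    }
    /* Everything else, including 0x0419 (ru_RU), ends up here. */
    return "ISO-8859-5";
}
```

Calling cp1251_default_charset(0x0419, NULL) returns "ISO-8859-5",
while cp1251_default_charset(0x0423, NULL) returns "CP1251", which
matches the ru_RU versus be_BY difference described in the report.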