Mail Archives: cygwin/2009/09/08/16:48:52

In-Reply-To: <20090908193456.GC17515@calimero.vinschen.de>
References: <416096c60908300959i1e0084b1xc8f6e65e792b035d AT mail DOT gmail DOT com> <20090831005258 DOT GG2068 AT ednor DOT casa DOT cgf DOT cx> <416096c60909012329l2f25e735yc07145b8d6698cda AT mail DOT gmail DOT com> <3f0ad08d0909020656v7d9fce6ft4afea63ed363b9a9 AT mail DOT gmail DOT com> <416096c60909071308qc5ff057sbe9cb1dbc270554f AT mail DOT gmail DOT com> <20090908193456 DOT GC17515 AT calimero DOT vinschen DOT de>
Date: Tue, 8 Sep 2009 21:48:25 +0100
Message-ID: <416096c60909081348n660b165eo8a71d8c65fca5204@mail.gmail.com>
Subject: Re: The C locale
From: Andy Koppe <andy DOT koppe AT gmail DOT com>
To: cygwin AT cygwin DOT com

2009/9/8 Corinna Vinschen:
>> Which leaves one apparently good solution for the "C" locale:
>> - Use the default Windows codepage for filenames, console, and
>>   multibyte functions. This is what happens already if you specify a
>>   locale with a language but no charset, e.g. "en". Maximum 1.5
>>   compatibility.
>
> UTF-8 has been chosen because it has the advantage that every UTF-16
> Windows filename will result in a valid multibyte string.

That would be fair enough if the console and the character conversion
functions used UTF-8 as well (and if applications such as mc, nano and
mutt were rebuilt with UTF-8 support).
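To make the quoted point concrete: any well-formed UTF-16 filename can be re-encoded as UTF-8, whereas a single-byte charset such as ISO-8859-1 or the Windows "ANSI" codepage cp1252 cannot represent most non-Latin characters. A minimal Python sketch, assuming the filename is well-formed UTF-16 (Windows does not validate names, so unpaired surrogates, which UTF-8 cannot encode, are possible). The filename and the `representable` helper are purely illustrative:

```python
def representable(name: str, charset: str) -> bool:
    """Return True if every character of `name` can be encoded in `charset`."""
    try:
        name.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Illustrative filename mixing Latin, Cyrillic and CJK characters.
name = "caf\u00e9-\u0444\u0430\u0439\u043b-\u65e5\u672c.txt"

# Simulate going from a Windows UTF-16 name to a UTF-8 multibyte string.
utf16 = name.encode("utf-16-le")
as_utf8 = utf16.decode("utf-16-le").encode("utf-8")

for charset in ("utf-8", "iso-8859-1", "cp1252"):
    print(f"{charset:>10}: {representable(name, charset)}")
```

UTF-8 succeeds for the whole name; the two single-byte charsets fail on the Cyrillic and CJK characters.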

Unfortunately, they use ISO-8859-1, so out of the box, support for
non-ASCII characters in Cygwin 1.7 is effectively broken. Please see
earlier posts in this thread for the problems this causes.

Yes, users can set a locale variable to get this working, but hacking
Cygwin.bat or finding the Windows environment variable dialog isn't
exactly intuitive. And they didn't have to do that in 1.5 to at least
get the Windows "ANSI" codepage working.


> Every choice has its advantage and its trade-offs.

The current choices have nothing but disadvantages, due to mixing of
UTF-8 and ISO-8859-1.
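The mixing problem is easy to reproduce: if a filename is stored as UTF-8 but the console or an mbstowcs-style conversion interprets the same bytes as ISO-8859-1, each non-ASCII character turns into two wrong characters, and input typed in ISO-8859-1 is not valid UTF-8 at all. A minimal Python illustration (the filename is illustrative):

```python
name = "r\u00e9sum\u00e9.txt"        # "résumé.txt"
raw = name.encode("utf-8")           # bytes as stored under the UTF-8 filename scheme
misread = raw.decode("iso-8859-1")   # the same bytes reinterpreted as ISO-8859-1

print(raw)       # b'r\xc3\xa9sum\xc3\xa9.txt'
print(misread)   # rÃ©sumÃ©.txt

# The reverse direction: a byte typed on an ISO-8859-1 console.
typed = "\u00e9".encode("iso-8859-1")  # b'\xe9'
try:
    typed.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```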

Besides, regarding the Windows codepage, wasn't the ^N scheme
introduced to deal with filename characters outside the current
charset?
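A minimal sketch of the idea behind such an escape scheme, assuming (this is not Cygwin's code) that a character the current charset cannot represent is written as the byte 0x0E (ASCII Ctrl-N, hence "^N") followed by its UTF-8 bytes. It handles single-byte charsets only, and a literal 0x0E in a name would be ambiguous; `encode_name` and `decode_name` are hypothetical helpers:

```python
ESCAPE = 0x0E  # ASCII Ctrl-N ("^N"); assumed escape byte

def utf8_seq_len(lead: int) -> int:
    """Length of a UTF-8 sequence, determined from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02x}")

def encode_name(name: str, charset: str) -> bytes:
    """Encode in `charset`, escaping unrepresentable characters as ^N + UTF-8."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out.append(ESCAPE)
            out += ch.encode("utf-8")
    return bytes(out)

def decode_name(data: bytes, charset: str) -> str:
    """Inverse of encode_name, assuming `charset` is single-byte."""
    chars, i = [], 0
    while i < len(data):
        if data[i] == ESCAPE and i + 1 < len(data):
            n = utf8_seq_len(data[i + 1])
            chars.append(data[i + 1:i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            chars.append(data[i:i + 1].decode(charset))
            i += 1
    return "".join(chars)

print(encode_name("caf\u00e9 \u20ac", "iso-8859-1"))  # b'caf\xe9 \x0e\xe2\x82\xac'
```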


>> On a closely related note, Debian are introducing a "C.UTF-8" locale
>> as a language-neutral locale with a UTF-8 character set. This is
>> useful for choosing UTF-8 without picking up language-specific stuff
>> like sorting rules. See here:
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776. It's a rather
>> lengthy thread, but in the end they did decide to go for it.
>
> Doesn't just setting LC_CTYPE=fo_ba.UTF-8 have the same result?

For newlib, yes, because it doesn't (yet) care about the language
part. But the language part nevertheless matters for many programs,
and it may also matter when connecting to other hosts, e.g. by
changing the sort order in 'ls'.

"C.charset" would mean: give me all the default behaviours, except
that I want this specific charset.
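In C library terms, "C.charset" amounts to keeping every category at "C" and changing only LC_CTYPE. A minimal Python sketch through the `locale` module, which wraps C's setlocale; which UTF-8 locale names exist depends on the system, so the candidate list is an assumption and the code simply reports when none is installed:

```python
import locale

# Start from the plain C locale in every category.
locale.setlocale(locale.LC_ALL, "C")

# Candidate UTF-8 locale names; availability varies by system (assumption).
candidates = ("C.UTF-8", "C.utf8", "en_US.UTF-8")

ctype = None
for candidate in candidates:
    try:
        ctype = locale.setlocale(locale.LC_CTYPE, candidate)
        break
    except locale.Error:
        continue  # not installed here; try the next name

print("LC_CTYPE:  ", ctype or "no UTF-8 locale available")
print("LC_COLLATE:", locale.setlocale(locale.LC_COLLATE))  # still "C"
```

Even with the "en_US.UTF-8" fallback, only the character set changes; collation, and with it the sort order of tools such as 'ls', stays at the C defaults.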


>> Cygwin 1.7, through newlib, already has "C-UTF-8", as well as the
>> likes of "C-ISO-8859-1" or "C-SJIS". So how about replacing the "C-"
>> with "C." in those, considering that Cygwin has no backward
>> compatibility requirement regarding those?
>
> No, but newlib has.

Understood. I meant a __CYGWIN__-guarded change.
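A rough sketch of what accepting both spellings could look like. It is written in Python for brevity and is not newlib's C code; `split_locale_name` is a hypothetical helper. The legacy "C-" form is treated as an alias, so existing names keep working while "C.UTF-8" parses like any other "language.charset" name:

```python
from typing import Optional, Tuple

def split_locale_name(name: str) -> Tuple[str, Optional[str]]:
    """Split a locale name into (language, charset), ignoring any @modifier."""
    base = name.split("@", 1)[0]
    if base.startswith("C-"):  # legacy newlib spelling, e.g. "C-UTF-8"
        return "C", base[2:]
    language, dot, charset = base.partition(".")
    return language, (charset if dot else None)

for name in ("C", "C-UTF-8", "C.UTF-8", "C-SJIS", "en", "de_DE.ISO-8859-15@euro"):
    print(f"{name!r:28} -> {split_locale_name(name)}")
```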

Andy


  Copyright © 2019   by DJ Delorie     Updated Jul 2019