Mail Archives: cygwin/2009/09/07/16:08:41

Date: Mon, 7 Sep 2009 21:08:25 +0100
Subject: Re: The C locale
From: Andy Koppe <andy DOT koppe AT gmail DOT com>
To: cygwin AT cygwin DOT com

2009/9/2 IWAMURO Motonori:
> I want to use UTF-8 throughout.
> Because:
> - many UNIX tools that work over the network (e.g. rsync, scp, ...)
> treat file names as 8-bit byte arrays.
> - the default locale of modern UNIX-based OSes is *.UTF-8.
> - files whose names include characters outside the codepage (e.g.
> files in the iTunes folder) can be handled.

I'm minded to agree, but actually there's a big stumbling block here:
many interactive programs in Cygwin do not (yet) support UTF-8, e.g.
nano, mutt, and mc. If you try, you get all sorts of funny effects
with invalid characters and mispositioned cursors. That's not
acceptable as default.
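
One plausible cause of the mispositioned cursors is a program counting one screen column per byte rather than per character. The following is a minimal sketch of that mismatch, not a description of how nano, mutt, or mc actually work; the function names are hypothetical, and it ignores double-width and combining characters.

```python
def naive_cursor_column(data: bytes) -> int:
    """Column a byte-oriented program would compute: one column per byte."""
    return len(data)


def utf8_cursor_column(data: bytes) -> int:
    """Column a UTF-8-aware program would compute: one column per code point.

    Simplification: double-width (e.g. CJK) and combining characters
    are not handled here.
    """
    return len(data.decode("utf-8"))


# "naive cafe" with two accented letters, each taking 2 bytes in UTF-8.
line = "na\u00efve caf\u00e9".encode("utf-8")

print(naive_cursor_column(line))  # 12
print(utf8_cursor_column(line))   # 10
```

The two-column difference is how far a byte-counting program would place the cursor from where the terminal actually drew the end of the text.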

Which leaves one apparently good solution for the "C" locale:
>> - Use the default Windows codepage for filenames, console, and
>> multibyte functions. This is what happens already if you specify a
>> locale with a language but no charset, e.g. "en". Maximum 1.5
>> compatibility.
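
To make the trade-off with the codepage approach concrete, here is a small sketch that checks which file names a given charset can represent at all. It only illustrates charset coverage and says nothing about how Cygwin itself converts file names; `cp1252` is an assumed example of a default Windows codepage, and `representable` is a hypothetical helper.

```python
def representable(filename: str, charset: str) -> bool:
    """Return True if every character of filename can be encoded in charset."""
    try:
        filename.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


names = [
    "report.txt",              # plain ASCII
    "caf\u00e9.txt",           # e-acute, present in cp1252
    "\u65e5\u672c\u8a9e.txt",  # Japanese, outside cp1252
]

for name in names:
    # ascii() keeps the output printable on any terminal charset.
    print(ascii(name), representable(name, "cp1252"), representable(name, "utf-8"))
```

Only the last name fails under the assumed codepage, while UTF-8 can represent all three.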

On a closely related note, Debian are introducing a "C.UTF-8" locale
as a language-neutral locale with a UTF-8 character set. This is
useful for choosing UTF-8 without picking up language-specific stuff
like sorting rules. See here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522776. It's a rather
lengthy thread, but in the end they did decide to go for it.
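
As a rough illustration of what "language-neutral" means for sorting: in the C/POSIX locale, strings collate by raw byte value, and for UTF-8 text that is the same order as sorting by code point. The sketch below shows that ordering using Python's default string comparison, without calling into any system locale; the word list is made up for the example.

```python
words = ["zebra", "Apple", "\u00e9clair", "apple"]

# Python compares strings by code point, which matches byte-wise
# comparison of their UTF-8 encodings.
by_code_point = sorted(words)
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

# ascii() keeps the output printable on any terminal charset.
print([ascii(w) for w in by_code_point])
print(by_code_point == by_utf8_bytes)
```

Uppercase sorts before lowercase and the accented word lands after "zebra", whereas a language-specific locale would typically place it among the other words starting with "e".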

Cygwin 1.7, through newlib, already has "C-UTF-8", as well as the
likes of "C-ISO-8859-1" or "C-SJIS". So how about replacing the "C-"
with "C." in those, considering that Cygwin has no backward
compatibility requirement regarding those?
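
For illustration only, the proposed renaming amounts to the mapping sketched below. `normalize_c_locale` is a hypothetical helper, not anything in newlib or Cygwin; it just shows which names would change and which would be left alone.

```python
def normalize_c_locale(name: str) -> str:
    """Map the 'C-<charset>' spelling to the proposed 'C.<charset>' form.

    Names that do not start with 'C-' are returned unchanged.
    """
    prefix = "C-"
    if name.startswith(prefix) and len(name) > len(prefix):
        return "C." + name[len(prefix):]
    return name


for old in ["C-UTF-8", "C-ISO-8859-1", "C-SJIS", "C", "en_GB.UTF-8"]:
    print(old, "->", normalize_c_locale(old))
```

The dotted form follows the usual `language.charset` shape of locale names, so the charset can be split off at the first dot in the same way as for "en_GB.UTF-8".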

Andy
