Mail Archives: cygwin/2009/09/22/04:43:45

To: cygwin AT cygwin DOT com
From: Lapo Luchini <lapo AT lapo DOT it>
Subject: Re: The C locale
Date: Tue, 22 Sep 2009 10:43:04 +0200
Lines: 50
Message-ID: <h9a2mo$867$1@ger.gmane.org>
References: <416096c60908300959i1e0084b1xc8f6e65e792b035d AT mail DOT gmail DOT com> <20090831005258 DOT GG2068 AT ednor DOT casa DOT cgf DOT cx> <416096c60909012329l2f25e735yc07145b8d6698cda AT mail DOT gmail DOT com> <3f0ad08d0909020656v7d9fce6ft4afea63ed363b9a9 AT mail DOT gmail DOT com> <416096c60909071308qc5ff057sbe9cb1dbc270554f AT mail DOT gmail DOT com> <20090908193456 DOT GC17515 AT calimero DOT vinschen DOT de> <416096c60909081449r1fe024dbm7b82a3719be05e9e AT mail DOT gmail DOT com> <20090921103758 DOT GE20981 AT calimero DOT vinschen DOT de> <416096c60909211420g4ac8ea93l80fc1f00dcd5c0f3 AT mail DOT gmail DOT com> <h99p3v$e38$1 AT ger DOT gmane DOT org> <416096c60909212347r7e03a4f3q7d518ff7e8bce55d AT mail DOT gmail DOT com>
Mime-Version: 1.0
User-Agent: Thunderbird 2.0.0.23 (X11/20090831)
In-Reply-To: <416096c60909212347r7e03a4f3q7d518ff7e8bce55d@mail.gmail.com>
OpenPGP: id=C8F252FB
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

Andy Koppe wrote:
> No, it isn't. UTF-16 filename characters that can't be represented in
> the current charset are encoded by a ^N followed by the character's
> UTF-8 representation.

OK, right.
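
To make the escaping rule quoted above concrete, here is a minimal sketch in Python. It assumes only what the quoted text states: a character that cannot be represented in the current charset is written as ^N (the ASCII SO byte, 0x0E) followed by its UTF-8 bytes. The function name escape_filename is hypothetical, and this is not Cygwin's actual code.

```python
SHIFT_OUT = b"\x0e"  # ^N, the ASCII SO control character


def escape_filename(name: str, charset: str) -> bytes:
    """Encode name in charset; write each character the charset cannot
    represent as ^N followed by that character's UTF-8 bytes.

    Hypothetical helper illustrating the rule described in the thread.
    """
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out += SHIFT_OUT + ch.encode("utf-8")
    return bytes(out)


print(escape_filename("bäh", "ascii"))       # b'b\x0e\xc3\xa4h'
print(escape_filename("bäh", "iso-8859-1"))  # b'b\xe4h'
```

With a plain ASCII charset the umlaut needs the escape; with ISO-8859-1 it is representable and no ^N appears.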

> For example, a Windows filename "bäh" turns into "bŤh" in the C locale,
> while it shows up correctly with explicitly set ISO-8859-1 or CP1252.

Uh? Doesn't seem so to me: if I create "bäh" in Windows Explorer and then
open a UTF-8 mintty console, I get consistent output with both
LANG=C and LANG=it_IT.UTF-8 (of course, since right now C is UTF-8):

% LANG=C ls -l|egrep b.h
-rw-r--r-- 1 lapo None     0 Sep 22 09:53 bäh
% LANG=it_IT.UTF-8 ls -l|egrep b.h
-rw-r--r-- 1 lapo None     0 22 Sep 09:53 bäh

So I'm not sure what you mean by 'a Windows filename "bäh" turns
into "bŤh" in the C locale'... do you mean that a script sees it as
62C3A468 as opposed to 62E468? Or that an actual "bŤh" is shown somewhere?
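
The two hex strings above are simply the UTF-8 and ISO-8859-1 encodings of the same name. A short Python check (the variable names are only illustrative):

```python
name = "bäh"

utf8 = name.encode("utf-8")         # ä becomes the two bytes C3 A4
latin1 = name.encode("iso-8859-1")  # ä becomes the single byte E4

print(utf8.hex().upper(), len(utf8))      # 62C3A468 4
print(latin1.hex().upper(), len(latin1))  # 62E468 3
```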

As "bŤh" is just a representation, it depends on the charset the
console expects (in fact, in this UTF-8-encoded message it will
probably be represented as 62C385C2A468)... if the console is UTF-8,
what's currently shown is what I'd expect.
If OTOH we're talking about what it is in raw form and not about what is
shown (i.e. a "3-byte" vs. a "4-byte" string), well, that's a different
issue, and I'm not sure why a program should prefer a 3-byte
representation as opposed to a 4-byte one...?
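
The difference between the raw bytes and what gets shown can be sketched in Python by decoding the same bytes under several charsets. This only illustrates the point; it does not model how mintty or any other console actually renders text, and the variable names are made up for the example.

```python
raw = bytes.fromhex("62C3A468")  # the UTF-8 bytes of the filename

# The same four bytes, interpreted by consoles expecting different charsets.
shown = {cs: raw.decode(cs) for cs in ("utf-8", "iso-8859-1", "cp1252")}
for charset, text in shown.items():
    print(charset, text)

# The hex string quoted in the message, decoded as UTF-8.
quoted = bytes.fromhex("62C385C2A468").decode("utf-8")
print(quoted)
```

Only the UTF-8 interpretation gives back "bäh"; the two single-byte charsets each show two separate characters in place of the umlaut.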

But OTOH as far as "not caring" goes, it sure can be a nice feature to
be backward-compatible in that single case, since the behavior is not
well-defined anyway...
But again, if a script creates a filename that happens to contain
Japanese characters (or even umlauts or r-quotes/l-quotes) I would
expect to see that on the filesystem too, and not some random-looking
escape sequence...

> Btw, are you actually using the C locale?

Not usually, but it happens from time to time (mostly in scripts, or in
cases such as the monotone "make check" unit tests, one of which tries to
create UTF-8 filenames and then ISO-8859-1 filenames and currently fails).

-- 
Lapo Luchini - http://lapo.it/

“Endure. In enduring, grow strong.” (Dak'kon, videogame "Torment", 1999)


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
