Mail Archives: cygwin/2009/03/19/13:13:47


X-Recipient: archive-cygwin AT delorie DOT com

X-Spam-Check-By: sourceware.org

Date: Thu, 19 Mar 2009 19:13:23 +0100

From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>

To: cygwin AT cygwin DOT com

Subject: Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?

Message-ID: <20090319181323.GB1868@calimero.vinschen.de>

Reply-To: cygwin AT cygwin DOT com

Mail-Followup-To: cygwin AT cygwin DOT com

References: <20090319130909 DOT GZ9322 AT calimero DOT vinschen DOT de> <49C281F7 DOT 6080602 AT acm DOT org>

MIME-Version: 1.0

In-Reply-To: <49C281F7.6080602@acm.org>

User-Agent: Mutt/1.5.19 (2009-02-20)

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm

List-Id: <cygwin.cygwin.com>

List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>

List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>

List-Archive: <http://sourceware.org/ml/cygwin/>

List-Post: <mailto:cygwin AT cygwin DOT com>

List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>

Sender: cygwin-owner AT cygwin DOT com

Delivered-To: mailing list cygwin AT cygwin DOT com

On Mar 19 10:33, David Rothenberger wrote:
> On 3/19/2009 6:09 AM, Corinna Vinschen wrote:
> > If you've set $LANG to, say, "en_US.UTF-8", Cygwin would use the UTF-8
> > charset *iff* the application switched the codepage by calling something
> > along the lines of `setlocale(LC_ALL, "");'.
> > An application which does not call setlocale (which means, it's not
> > native language aware anyway) would still use the default ANSI codepage.
> 
> First, please forgive my ignorance about LC_ALL, LANG, etc.
> 
> I ran into an issue yesterday where I was trying to "du -sh" a directory
> that contained files whose names included UTF characters, I think.
> Without CYGWIN=codepage:utf8, this failed. It worked fine when I added
> CYGWIN=codepage:utf8.

Yes, sure.  As described in the User's Guide.  That's exactly what bugs
me right now.  To get UTF-8 support you have to set LANG or LC_ALL or
whatever, *and* CYGWIN=codepage:utf8.

I *think* we can get rid of the codepage setting in favor of the
$LANG/$LC_foo setting, but in that case we could no longer support both
the ANSI and the OEM codepages.  In the long run I'd like to stop using
the ANSI/OEM codepages altogether and have real, full locale support
instead.  But that's a dream of the future.

> So my question is, will this work if codepage is dropped and I set LANG
> to en_US.UTF-8? Is there anything in the Cygwin DLL itself that uses
> codepage that might be valuable to enable even for applications that
> aren't native language aware and don't call setlocale()?

Not exactly.  However, if you have a file whose name uses characters
which are not in your current ANSI codeset, then you could only
manipulate that file when setting LANG="xx_YY.UTF-8", and only in
applications which call setlocale().
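
As a rough illustration of what "an application which calls setlocale()"
means here, the following minimal sketch adopts the locale from the
environment and reports the resulting character set.  The helper name
adopt_environment_codeset is hypothetical; setlocale() and
nl_langinfo(CODESET) are the standard C/POSIX calls.  An application
that never does this stays in the "C" locale it started in.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: adopt the locale named by LC_ALL / LC_CTYPE / LANG
 * and report the character set that is then in effect.
 * Returns NULL if the environment names a locale the system lacks. */
const char *adopt_environment_codeset(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return NULL;              /* unsupported locale; previous one stays */
    return nl_langinfo(CODESET);  /* e.g. "UTF-8" under LANG=en_US.UTF-8 */
}
```

A program would typically call this once, early in main(), before it
touches any filename.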

In contrast to UNIX systems, we have the problem that the underlying
filesystems are using the UTF-16 charset for filenames.  So we must
convert from whatever singlebyte or multibyte charset is in use to wide
characters.  Other systems don't care; to them the filename is just a
byte stream.  On Windows, you always have a conversion step which
requires knowing the multibyte character set.  There's no way to convert
a wide character string into a multibyte string without knowing that
charset.
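
The dependency on the charset can be seen with the standard C conversion
routines alone.  The sketch below is only an illustration using
mbstowcs(), not the code path Cygwin uses internally; the helper name
wide_char_count is hypothetical.  The same bytes decode differently, or
not at all, depending on the LC_CTYPE locale in effect at the time.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: decode a multibyte string under the current
 * LC_CTYPE locale and return the number of wide characters produced
 * (capped at 256), or -1 if the bytes are not valid in that locale's
 * charset. */
long wide_char_count(const char *mb)
{
    wchar_t buf[256];
    size_t n = mbstowcs(buf, mb, 256);
    return n == (size_t)-1 ? -1L : (long)n;
}
```

For example, the two bytes 0xC3 0xA9 are a single character in UTF-8 but
two separate characters in ISO-8859-1, so the result for that input
depends entirely on which locale was selected before the call.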

Of course, what we could do is call setlocale from within Cygwin so we
always have a base for the conversion, whether or not the application
calls it again.  In theory this should not affect applications which
don't call setlocale, since these applications behave as they would on
other OSes: they handle the filename as a simple bytestream.
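
To make the idea concrete, here is a sketch of a runtime selecting a
conversion charset before the application's main() runs.  It is purely
an assumption for illustration and not how the Cygwin DLL is
implemented: it relies on the GCC/Clang constructor attribute, all names
in it are hypothetical, and it touches only LC_CTYPE so that other
categories (number formatting, collation) stay in the "C" locale.

```c
#include <locale.h>

/* Flags set by the startup hook (hypothetical names). */
static int hook_ran = 0;
static int env_locale_accepted = 0;

/* GCC/Clang extension: runs before main().  Stand-in for a runtime that
 * picks a conversion charset on the application's behalf. */
__attribute__((constructor))
static void init_ctype_locale(void)
{
    hook_ran = 1;
    /* Only LC_CTYPE is changed; every other category stays in "C". */
    if (setlocale(LC_CTYPE, "") != NULL)
        env_locale_accepted = 1;
}

/* Returns 1 once the startup hook has executed. */
int startup_hook_ran(void)
{
    return hook_ran;
}

/* Returns 1 if the environment named a locale the system supports. */
int startup_locale_accepted(void)
{
    return env_locale_accepted;
}
```

An application that later calls setlocale() itself simply replaces
whatever the hook selected.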

The problem: I'm not really sure calling setlocale in Cygwin is a good
idea.  Maybe there's some downside I just don't see right now.

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

Copyright © 2019 by DJ Delorie	Updated Jul 2019