Mail Archives: cygwin/2009/03/19/13:48:29

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=2.1 required=5.0 tests=AWL,BAYES_20,BOTNET
X-Spam-Check-By: sourceware.org
Message-id: <49C29366.8080708@acm.org>
Date: Thu, 19 Mar 2009 11:48:06 -0700
From: David Rothenberger <daveroth AT acm DOT org>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.21) Gecko/20090302 Thunderbird/2.0.0.21 Mnenhy/0.7.6.666
MIME-version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?
References: <20090319130909 DOT GZ9322 AT calimero DOT vinschen DOT de> <49C281F7 DOT 6080602 AT acm DOT org> <20090319181323 DOT GB1868 AT calimero DOT vinschen DOT de>
In-reply-to: <20090319181323.GB1868@calimero.vinschen.de>
X-IsSubscribed: yes
Reply-To: cygwin AT cygwin DOT com
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On 3/19/2009 11:13 AM, Corinna Vinschen wrote:
> On Mar 19 10:33, David Rothenberger wrote:
>> On 3/19/2009 6:09 AM, Corinna Vinschen wrote:
>>> If you've set $LANG to, say, "en_US.UTF-8", Cygwin would use the UTF-8
>>> charset *iff* the application switched the codepage by calling something
>>> along the lines of `setlocale(LC_ALL, "");'.
>>> An application which does not call setlocale (which means, it's not
>>> native language aware anyway) would still use the default ANSI codepage.
>>
>> I ran into an issue yesterday where I was trying to "du -sh" a directory
>> that contained files whose names included UTF characters, I think.
>> Without CYGWIN=codepage:utf8, this failed. It worked fine when I added
>> CYGWIN=codepage:utf8.
> 
> Yes, sure.  As described in the User's Guide.  That's exactly what bugs
> me right now.  To get UTF-8 support you have to set LANG or LC_ALL or
> whatever, *and* CYGWIN=codepage:utf8.

In my specific case, I didn't need to set LANG or LC_ALL, just 
CYGWIN=codepage:utf8.

>> So my question is, will this work if codepage is dropped and I set LANG
>> to en_US.UTF-8? Is there anything in the Cygwin DLL itself that uses
>> codepage that might be valuable to enable even for applications that
>> aren't native language aware and don't call setlocale()?
> 
> Not exactly.  However, assuming you have a file using characters which
> are not in your current ANSI codeset, then you could only manipulate
> that file when setting LANG="xx_YY.UTF-8", and only in applications
> which call setlocale().

I have no idea whether du calls setlocale() or not. I think you're 
saying that today, with codepage:utf8, it is able to get sizes for files 
using non-ANSI characters, but if codepage is removed, it would not be 
able to do so unless it called setlocale(). Is that right?

-- 
David Rothenberger  ----  daveroth AT acm DOT org

The Abrams' Principle:
         The shortest distance between two points is off the wall.


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/



  Copyright © 2019   by DJ Delorie     Updated Jul 2019