From: David Rothenberger
Date: Thu, 19 Mar 2009 11:48:06 -0700
To: cygwin AT cygwin DOT com
Subject: Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?

On 3/19/2009 11:13 AM, Corinna Vinschen wrote:
> On Mar 19 10:33, David Rothenberger wrote:
>> On 3/19/2009 6:09 AM, Corinna Vinschen wrote:
>>> If you've set $LANG to, say, "en_US.UTF-8", Cygwin would use the UTF-8
>>> charset *iff* the application switched the codepage by calling something
>>> along the lines of `setlocale(LC_ALL, "");'.
>>> An application which does not call setlocale (which means, it's not
>>> native language aware anyway) would still use the default ANSI codepage.
>>
>> I ran into an issue yesterday where I was trying to "du -sh" a directory
>> that contained files whose names included UTF characters, I think.
>> Without CYGWIN=codepage:utf8, this failed. It worked fine when I added
>> CYGWIN=codepage:utf8.
>
> Yes, sure. As described in the User's Guide. That's exactly what bugs
> me right now. To get UTF-8 support you have to set LANG or LC_ALL or
> whatever, *and* CYGWIN=codepage:utf8.

In my specific case, I didn't need to set LANG or LC_ALL, just
CYGWIN=codepage:utf8.

>> So my question is, will this work if codepage is dropped and I set LANG
>> to en_US.UTF-8? Is there anything in the Cygwin DLL itself that uses
>> codepage that might be valuable to enable even for applications that
>> aren't native language aware and don't call setlocale()?
>
> Not exactly. However, assuming you have a file using characters which
> are not in your current ANSI codeset, then you could only manipulate
> that file when setting LANG="xx_YY.UTF-8", and only in applications
> which call setlocale().

I have no idea whether du calls setlocale() or not. I think you're
saying that today, with codepage:utf8, it is able to get sizes for files
using non-ANSI characters, but if codepage is removed, it would not be
able to do so unless it called setlocale(). Is that right?

-- 
David Rothenberger  ----  daveroth AT acm DOT org

The Abrams' Principle:
        The shortest distance between two points is off the wall.
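
For readers following the setlocale() point in this thread, here is a minimal C sketch of what "calling setlocale()" means for an application. It relies only on standard C behaviour: a program starts in the "C" locale and picks up LANG / LC_ALL from the environment only after it calls setlocale(LC_ALL, ""). The function names are hypothetical, chosen for illustration, and the sketch does not show how the Cygwin DLL maps a locale to a codepage.

```c
#include <locale.h>
#include <stdio.h>

/* Query the LC_CTYPE locale currently in effect.
   Passing NULL as the locale makes setlocale() a pure query. */
const char *current_ctype_locale(void)
{
    return setlocale(LC_CTYPE, NULL);
}

/* Adopt the locale named by LC_ALL / LC_CTYPE / LANG in the environment.
   Returns 1 on success, 0 if the requested locale is not supported
   (in which case the previous locale stays in effect). */
int adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical demo helper: show the locale before and after opting in. */
void report_locale(void)
{
    printf("before setlocale(): %s\n", current_ctype_locale());
    if (!adopt_environment_locale())
        fputs("environment locale not supported\n", stderr);
    printf("after setlocale():  %s\n", current_ctype_locale());
}
```

Calling report_locale() from main() with, say, LANG=en_US.UTF-8 exported should show the switch away from "C", provided the system supports that locale. A program that never makes the setlocale(LC_ALL, "") call stays in the "C" locale regardless of what LANG says, which is the case the thread is worried about.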