To: cygwin AT cygwin DOT com
From: Tuomo Valkonen
Subject: Re: default encoding (was: Re: GNU screen hangs)
Date: Sun, 30 Aug 2009 18:59:55 +0000 (UTC)

On 2009-08-30, Andy Koppe wrote:
> If a locale is specified without an encoding, Cygwin 1.7 uses the
> Windows system's default "ANSI" codepage, i.e. CP1252 or such like.
>
> Presumably X implements the encodings itself rather than use
> setlocale(LC_CTYPE, "") and rely on the standard conversion functions?
> Hence, for proper interoperability, it would need to duplicate the
> fallback to the Windows ANSI codepage as well.
>
> Unfortunately there doesn't seem to be a standard interface for
> finding out what charset is being used with a locale setting that
> doesn't explicitly specify one.

I have LC_CTYPE=en_US.UTF-8, of course. And still Xlib fails.

>> Another problem is that after an upgrade a couple of
>> months ago, various Python software (duplicity and eyeD3 at
>> least) stopped working with UTF-8 file names (and probably
>> other input too). This is fixed by adding the call
>>
>>  locale.setlocale(locale.LC_CTYPE, "")
>>
>> in the programs. Not sure where the fault is, or if it
>> has been fixed by now.
>
> Strictly speaking, the default "C" locale is ASCII only, so programs
> shouldn't rely on anything that happens to be working on a particular
> system. Having said that, handling of non-ASCII characters in Cygwin's
> C locale has indeed changed. Not sure how and why though. See my "The
> C Locale" post.

I'm not sure how this is relevant. The problem seems to be that since
that one update (it might have been a minor version change in Python),
Python programs are no longer in multibyte/locale-aware mode by
default. The call above enables that mode, my setting being
LC_CTYPE=en_US.UTF-8. Now, the question is:

 1. Have the Cygwin packagers somehow stopped the Python interpreter
    from calling setlocale?
 2. Or has the call been disabled in Python entirely?

There was no problem previously. I think the Python interpreter should
call setlocale, instead of having Python programs do it themselves,
because it is half an OS and does a lot of character set mangling that
Python software shouldn't have to be aware of.

Anyway, I think this problem may have been fixed already (not 100%
certain), since eyeD3 no longer dies on some tested file names that
do not fit into the ASCII range, and I never hacked it to include the
setlocale call, just some custom id3 tag backup scripts using its
library.

-- 
Stop Gnomes and other pests! Purchase Windows today!
http://iki.fi/tuomov/b/archives/2009/07/21/T17_26_09/
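
For reference, here is a minimal sketch of the setlocale workaround mentioned in the message, written for a POSIX-like Python. The helper name enable_locale_ctype is hypothetical and only for illustration. It records the LC_CTYPE setting before and after the call, then asks the C library which charset is in effect. Whether that charset reflects Cygwin's fallback to the Windows ANSI codepage is an assumption that has not been checked here.

```python
import locale


def enable_locale_ctype():
    """Adopt LC_CTYPE from the environment and report the charset in use.

    Returns a (before, after, charset) tuple. The name of this helper is
    hypothetical; only the locale module calls are standard library API.
    """
    # Passing no second argument only queries the current setting.
    before = locale.setlocale(locale.LC_CTYPE)
    try:
        # "" means: take the setting from LC_ALL / LC_CTYPE / LANG.
        after = locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale this system does not provide.
        after = before
    # nl_langinfo is not available on every platform, so fall back to
    # getpreferredencoding without letting it change the locale again.
    if hasattr(locale, "nl_langinfo"):
        charset = locale.nl_langinfo(locale.CODESET)
    else:
        charset = locale.getpreferredencoding(False)
    return before, after, charset


if __name__ == "__main__":
    before, after, charset = enable_locale_ctype()
    print("LC_CTYPE before:", before)
    print("LC_CTYPE after: ", after)
    print("charset:        ", charset)
```

If "before" prints as "C" while the environment has LC_CTYPE=en_US.UTF-8, the interpreter did not adopt the environment's locale at startup, which is the behaviour described above. If it already shows the UTF-8 locale, the interpreter or an earlier import has set it.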