Mail Archives: cygwin/2009/08/30/15:00:36

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.3 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS,SPF_PASS
X-Spam-Check-By: sourceware.org
To: cygwin AT cygwin DOT com
From: Tuomo Valkonen <tuomov AT iki DOT fi>
Subject: Re: default encoding (was: Re: GNU screen hangs)
Date: Sun, 30 Aug 2009 18:59:55 +0000 (UTC)
Lines: 60
Message-ID: <slrnh9lj1b.ens.tuomov@beer.modeemi.cs.tut.fi>
References: <416096c60908301114r62d7cad5qb167910ac97c278e AT mail DOT gmail DOT com>
Mime-Version: 1.0
User-Agent: slrn/pre1.0.0-2 (Linux)
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On 2009-08-30, Andy Koppe <andy DOT koppe AT gmail DOT com> wrote:
> If a locale is specified without an encoding, Cygwin 1.7 uses the
> Windows system's default "ANSI" codepage, i.e. CP1252 or such like.
>
> Presumably X implements the encodings itself rather than use
> setlocale(LC_CTYPE, "") and rely on the standard conversion functions?
> Hence, for proper interoperability, it would need to duplicate the
> fallback to the Windows ANSI codepage as well.
>
> Unfortunately there doesn't seem to be a standard interface for
> finding out what charset is being used with a locale setting that
> doesn't explicitly specify one.

I have LC_CTYPE=en_US.UTF-8, of course. And still Xlib fails.
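To make Andy's point concrete: a program that interprets the locale name itself, instead of calling setlocale() and using the C library's conversion functions, has to reproduce two things. First, the POSIX precedence of LC_ALL, LC_CTYPE and LANG. Second, Cygwin 1.7's fallback when no charset is named. The sketch below shows both. The helper names are hypothetical, and CP1252 is only an assumed stand-in for "the Windows system's default ANSI codepage".

```python
from typing import Mapping

# ASSUMPTION: CP1252 stands in for "the Windows system's default ANSI
# codepage"; a real system may use a different codepage.
ASSUMED_ANSI_CODEPAGE = "CP1252"


def effective_ctype_locale(env: Mapping[str, str]) -> str:
    """Return the locale name that governs LC_CTYPE.

    POSIX precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG;
    with none of them set, the "C" locale applies.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def charset_for_locale(name: str, fallback: str = ASSUMED_ANSI_CODEPAGE) -> str:
    """Extract the charset from language[_territory][.charset][@modifier].

    Hypothetical helper: when no ".charset" part is present, return
    `fallback`, imitating the Cygwin 1.7 behaviour described above.
    """
    base = name.split("@", 1)[0]      # drop any @modifier
    if base in ("C", "POSIX"):
        return "ASCII"                # strictly ASCII-only, as noted below
    _, _, charset = base.partition(".")
    return charset or fallback        # no explicit charset: use the fallback


if __name__ == "__main__":
    import os
    loc = effective_ctype_locale(os.environ)
    print(loc, "->", charset_for_locale(loc))
```

With LC_CTYPE=en_US.UTF-8 set, the charset is explicit, so fallback logic of this kind would never come into play.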

>> Another problem is that after an upgrade a couple of
>> months ago, various Python programs (duplicity and eyeD3 at
>> least) stopped working with UTF-8 file names (and probably
>> other input too). This is fixed by adding the call
>>
>>  locale.setlocale(locale.LC_CTYPE, "")
>>
>> in the programs. Not sure where the fault is, or if it
>> has been fixed by now.
>
> Strictly speaking, the default "C" locale is ASCII only, so programs
> shouldn't rely on anything that happens to be working on a particular
> system. Having said that, handling of non-ASCII characters in Cygwin's
> C locale has indeed changed. Not sure how and why though. See my "The
> C Locale" post.

I'm not sure how this is relevant. The problem seems to be
that since that one update (it might have been a minor version
change in Python), Python programs are no longer in
multibyte/locale-aware mode by default. The call above
enables that mode, my setting being LC_CTYPE=en_US.UTF-8.
Now, the question is:

  1. Have Cygwin packagers somehow prevented the Python
     interpreter from calling setlocale?

  2. Or has it been disabled in Python entirely? There
     was no problem previously.
I think the Python interpreter should call setlocale itself,
instead of leaving it to Python programs, because it is
half an OS and does lots of character set mangling that
Python software shouldn't have to be aware of.
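Until the interpreter does this itself, the per-program workaround quoted earlier amounts to a few lines at startup. Below is a minimal sketch in Python 3 syntax. The 2009 discussion concerns Python 2, where `locale.setlocale` and `locale.nl_langinfo` also exist. The function name `enable_user_locale` is hypothetical.

```python
import locale


def enable_user_locale() -> str:
    """Switch LC_CTYPE from the default "C" locale to the user's locale.

    Returns the codeset the C library now reports, e.g. "UTF-8" when
    LC_CTYPE=en_US.UTF-8 is set and that locale is installed.
    """
    try:
        # The empty string means "take the locale from LC_ALL, LC_CTYPE
        # or LANG", just like setlocale(LC_CTYPE, "") in C.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale the C library does not know;
        # keep whichever LC_CTYPE locale is currently active.
        pass
    # nl_langinfo is Unix-only (available on Cygwin, not on native Windows).
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print("LC_CTYPE codeset:", enable_user_locale())
```

Only the LC_CTYPE category is changed, so number and date formatting (LC_NUMERIC, LC_TIME) stay in the C locale. That matches the narrow fix described above.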

Anyway, I think this problem may have been fixed already
(I'm not 100% certain), since eyeD3 no longer dies on
some test file names outside the ASCII range. I never
patched eyeD3 itself to include the setlocale call. I only
added it to some custom ID3 tag backup scripts that use
its library.

-- 
Stop Gnomes and other pests! Purchase Windows today!
  http://iki.fi/tuomov/b/archives/2009/07/21/T17_26_09/


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
