Mail Archives: cygwin/2009/05/08/09:02:26

Date: Fri, 8 May 2009 22:02:08 +0900
Message-ID: <3f0ad08d0905080602s36a9eddg852eaa3ea3a2a69f@mail.gmail.com>
Subject: [1.7][python] File operation API to multibyte filenames fails.
From: IWAMURO Motonori <deenheart AT gmail DOT com>
To: cygwin AT cygwin DOT com

Hi.

File operation APIs fail on multibyte filenames when running Python on Cygwin-1.7.
Which one should be fixed, Python or Cygwin-1.7?

My environment: Windows XP SP3, Cygwin-1.7.0-46, and LANG=ja_JP.UTF-8

The following code fails in a directory that contains multibyte filenames:

>>> import os
>>> os.listdir(".")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 138] Invalid or incomplete multibyte or wide character: '.'

The following code works correctly:

>>> import os
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, '')
'ja_JP.UTF-8'
>>> os.listdir(".")
[(snip), '\xe3\x82\xb9\xe3\x82\xbf\xe3\x83\xbc\xe3\x83\x88 \xe3\x83\xa1\xe3\x83\x8b\xe3\x83\xa5\xe3\x83\xbc',
 '\xe3\x83\x87\xe3\x82\xb9\xe3\x82\xaf\xe3\x83\x88\xe3\x83\x83\xe3\x83\x97']
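
For reference, the byte strings in that output look like ordinary UTF-8. The following is a minimal sketch that decodes them; the helper name decode_name is hypothetical, and it assumes that the line break inside the first entry stands for a single space.

```python
# Byte strings copied from the os.listdir(".") output above.
# Assumption: the line break inside the first entry stands for one space.
START_MENU_BYTES = (b'\xe3\x82\xb9\xe3\x82\xbf\xe3\x83\xbc\xe3\x83\x88 '
                    b'\xe3\x83\xa1\xe3\x83\x8b\xe3\x83\xa5\xe3\x83\xbc')
DESKTOP_BYTES = (b'\xe3\x83\x87\xe3\x82\xb9\xe3\x82\xaf'
                 b'\xe3\x83\x88\xe3\x83\x83\xe3\x83\x97')


def decode_name(raw):
    """Decode a filename returned as bytes, assuming it is UTF-8 (hypothetical helper)."""
    return raw.decode('utf-8')


start_menu = decode_name(START_MENU_BYTES)
desktop = decode_name(DESKTOP_BYTES)

# Compare against the expected code points, written as escapes so the
# source file itself stays pure ASCII.
assert desktop == u'\u30c7\u30b9\u30af\u30c8\u30c3\u30d7'
assert start_menu == u'\u30b9\u30bf\u30fc\u30c8 \u30e1\u30cb\u30e5\u30fc'
```

Both names decode cleanly to katakana text, so once LC_CTYPE is set the bytes returned by os.listdir() are valid UTF-8 and not the special escape sequence described below.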

However, it is not feasible to add a setlocale() call to every Python script.

There are two causes:

- Python intentionally avoids calling setlocale(LC_ALL, "") and/or
setlocale(LC_CTYPE, "").
- When the locale is not set appropriately, Cygwin-1.7 converts each
non-ASCII character into a special sequence (see the "Convert chars
invalid in the current codepage to a sequence ASCII SO" part of
sys_cp_wcstombs in winsup/cygwin/strfuncs.cc).
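
The two causes can be illustrated with a rough sketch. It is only an analogy written in Python: it does not reproduce sys_cp_wcstombs, and the helper names ctype_from_environment and can_encode are hypothetical. The first helper reports the LC_CTYPE setting the process currently has and the one the environment (for example LANG) would select; the second shows that a non-ASCII name has no representation in an ASCII-only charset, which is the situation a "C" locale implies.

```python
import locale


def ctype_from_environment():
    """Return (current LC_CTYPE, LC_CTYPE the environment would select).

    The second item is None when the environment names an unsupported
    locale. The original setting is restored before returning.
    """
    current = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        wanted = locale.setlocale(locale.LC_CTYPE, '')
    except locale.Error:
        wanted = None
    finally:
        locale.setlocale(locale.LC_CTYPE, current)
    return current, wanted


def can_encode(text, codec):
    """Return True if `text` can be represented in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# A non-ASCII directory name, written with escapes to keep the source ASCII.
DESKTOP = u'\u30c7\u30b9\u30af\u30c8\u30c3\u30d7'

current, wanted = ctype_from_environment()
ascii_ok = can_encode(DESKTOP, 'ascii')   # False: no ASCII representation
utf8_ok = can_encode(DESKTOP, 'utf-8')    # True: UTF-8 can represent it
```

On the Python version discussed here, the first value would be expected to be "C" until the script calls setlocale itself; newer interpreters may already adopt the environment's LC_CTYPE at startup, so the two values can be equal there.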

So which should be fixed: Python or Cygwin-1.7?
-- 
IWAMURO Motonori <http://vmi.jp/>
