Date: Fri, 8 May 2009 15:09:01 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: [1.7][python] File operation API to multibyte filenames fails.
Message-ID: <20090508130901.GL21324@calimero.vinschen.de>
In-Reply-To: <3f0ad08d0905080602s36a9eddg852eaa3ea3a2a69f@mail.gmail.com>

On May  8 22:02, IWAMURO Motonori wrote:
> Hi.
>
> File operation API to multibyte filenames fails on Python and Cygwin-1.7.
> Which Python or Cygwin-1.7 should be fixed?
>
> My environment: Windows XP SP3, Cygwin-1.7.0-46, and LANG=ja_JP.UTF-8
>
> The following code fails on the directory which has multibyte filenames:
>
> >>> import os
> >>> os.listdir(".")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OSError: [Errno 138] Invalid or incomplete multibyte or wide character: '.'
>
> The following code works correctly:
>
> >>> import os
> >>> import locale
> >>> locale.setlocale(locale.LC_CTYPE, '')
> 'ja_JP.UTF-8'
> >>> os.listdir(".")
> [(snip), '\xe3\x82\xb9\xe3\x82\xbf\xe3\x83\xbc\xe3\x83\x88
> \xe3\x83\xa1\xe3\x83\x8b\xe3\x83\xa5\xe3\x83\xbc',
> '\xe3\x83\x87\xe3\x82\xb9\xe3\x82\xaf\xe3\x83\x88\xe3\x83\x83\xe3\x83\x97']
>
> However, it is impossible to fix all the python scripts.
>
> There are two causes.
>
> - Python has intentionally evaded the execution of setlocale(LC_ALL,
> "") and/or setlocale(LC_CTYPE, "").
> - When locale is not appropriately set, Cygwin-1.7 converts non-ASCII
> character into a special sequence. (see "Convert chars invalid in the
> current codepage to a sequence ASCII SO" part of sys_cp_wcstombs in
> winsup/cygwin/strfuncs.cc)
>
> Which Python or Cygwin-1.7 should be fixed?

Your scripts.  Python correctly doesn't call setlocale on its own, because
it's the responsibility of the application to set the locale if it uses
non-ASCII chars.  And Cygwin simply has no chance to convert UTF-8 to
UTF-16 if the application doesn't ask for UTF-8.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
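
For what it's worth, "fix your scripts" can be as small as one call at startup. The following is only a sketch, not a recommendation from the Python or Cygwin maintainers: the helper names init_locale and list_names are made up for illustration, and it assumes the environment (LANG, LC_ALL or LC_CTYPE) names a locale the system actually supports. It is written so that it also runs on current Python versions.

```python
import locale
import os


def init_locale():
    """Adopt the locale named by the environment (LANG, LC_ALL, LC_CTYPE).

    Hypothetical helper: returns the locale string now in effect for
    LC_CTYPE, or None if the environment names an unsupported locale.
    """
    try:
        # An empty string means "use whatever the environment asks for".
        return locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported locale: the process stays in the default "C" locale.
        return None


def list_names(path="."):
    """Return the entries of path, sorted (hypothetical helper)."""
    return sorted(os.listdir(path))


if __name__ == "__main__":
    # Call once, before any code touches non-ASCII filenames.
    init_locale()
    for name in list_names("."):
        print(name)
```

Setting only LC_CTYPE, as in the quoted session, keeps number and date formatting in the "C" locale, so the rest of the script behaves as before.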