Mail Archives: cygwin/2009/05/08/12:04:48

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Fri, 8 May 2009 18:04:15 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: [1.7][python] File operation API to multibyte filenames fails.
Message-ID: <20090508160415.GM21324@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <3f0ad08d0905080602s36a9eddg852eaa3ea3a2a69f AT mail DOT gmail DOT com> <20090508130901 DOT GL21324 AT calimero DOT vinschen DOT de> <3f0ad08d0905080621j2b1f97b9p317ee1df0f1dfc76 AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <3f0ad08d0905080621j2b1f97b9p317ee1df0f1dfc76@mail.gmail.com>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On May  8 22:21, IWAMURO Motonori wrote:
> Hi.
> 
> 2009/5/8 Corinna Vinschen <corinna-cygwin AT cygwin DOT com>:
> > Your scripts.  Python correctly doesn't use setlocale because it's
> > the responsibility of the application to set the locale if it uses
> > non-ASCII chars.  And Cygwin simply has no chance to convert UTF-8
> > to UTF-16 if the application doesn't ask for UTF-8.
> 
> Oh, it is very very difficult.
> Because ALL python utilities which access files or directories fail.
> For example, Mercurial doesn't work.

I can reproduce this issue and I created a simple application to
create your example filenames in the current dir (see below).

Given the Python testcase

  import os
  os.listdir(".")

I can't see a fault in Cygwin, neither from strace nor in a GDB session.
The readdir calls return the filenames using the SO sequences so that
a valid byte stream is created which also works in the C locale.
However, for some reason an EILSEQ (138) errno is generated, and
from what I can tell it's not generated in Cygwin or newlib code.
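
To see which errno the failing call carries, the test case can be wrapped so that the error is reported by name instead of as a bare traceback. This is only a minimal sketch, assuming a current Python 3 interpreter rather than the Python build from the 2009 report; `listdir_report` is a hypothetical helper name, not part of Python or Cygwin. It looks the number up through `errno.errorcode` instead of hard-coding 138, since that value is newlib's and differs on other platforms.

```python
import errno
import os


def listdir_report(path="."):
    """Return (names, None) on success or (None, errno_name) on failure.

    Hypothetical helper, for illustration only.
    """
    try:
        return os.listdir(path), None
    except OSError as exc:
        # errorcode maps the numeric errno to its symbolic name, e.g. "EILSEQ";
        # fall back to the raw number if the platform has no name for it.
        return None, errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    names, err = listdir_report(".")
    if err is None:
        print(names)
    else:
        print("listdir failed with", err)
```

If the listing fails as described above, the second branch would be expected to print the symbolic errno name.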

So I'd like to ask Jason, our Python maintainer, to have a look into
that.  Maybe we just need a Python rebuild for 1.7?


Corinna


This is the simple code I used to create the Japanese filenames:

#include <fcntl.h>
#include <locale.h>
#include <unistd.h>	/* close */

int main (void)
{
  /* UTF-8 bytes for the katakana name "スタートメニュー" (Start Menu). */
  const char file1[] = "\xe3\x82\xb9\xe3\x82\xbf\xe3\x83\xbc"
		       "\xe3\x83\x88\xe3\x83\xa1\xe3\x83\x8b"
		       "\xe3\x83\xa5\xe3\x83\xbc";
  /* UTF-8 bytes for the katakana name "デスクトップ" (Desktop). */
  const char file2[] = "\xe3\x83\x87\xe3\x82\xb9\xe3\x82\xaf"
		       "\xe3\x83\x88\xe3\x83\x83\xe3\x83\x97";
  /* Request a UTF-8 locale so Cygwin converts the names to UTF-16. */
  setlocale (LC_ALL, "en_US.UTF-8");
  int fd = open (file1, O_CREAT|O_RDWR, 0644);
  if (fd >= 0)
    close (fd);
  fd = open (file2, O_CREAT|O_RDWR, 0644);
  if (fd >= 0)
    close (fd);
  return 0;
}
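
As a cross-check, the two byte sequences in the C program are valid UTF-8 and decode to the katakana names スタートメニュー ("Start Menu") and デスクトップ ("Desktop"). The Python sketch below verifies the decoding and creates the same two files by passing the raw bytes as the path. It assumes a Python 3 interpreter and a filesystem that accepts these byte sequences as names; `create_test_files` is a hypothetical helper name used only for illustration.

```python
import os

# Same byte values as in the C program.
FILE1 = bytes([0xe3, 0x82, 0xb9, 0xe3, 0x82, 0xbf, 0xe3, 0x83, 0xbc,
               0xe3, 0x83, 0x88, 0xe3, 0x83, 0xa1, 0xe3, 0x83, 0x8b,
               0xe3, 0x83, 0xa5, 0xe3, 0x83, 0xbc])
FILE2 = bytes([0xe3, 0x83, 0x87, 0xe3, 0x82, 0xb9, 0xe3, 0x82, 0xaf,
               0xe3, 0x83, 0x88, 0xe3, 0x83, 0x83, 0xe3, 0x83, 0x97])


def create_test_files(directory="."):
    """Create both files in `directory` and return their byte-string paths."""
    created = []
    for name in (FILE1, FILE2):
        # Build the path as bytes so the name reaches open() unmodified.
        path = os.path.join(os.fsencode(directory), name)
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
        os.close(fd)
        created.append(path)
    return created


if __name__ == "__main__":
    print(FILE1.decode("utf-8"), FILE2.decode("utf-8"))
    create_test_files(".")
```

Passing byte-string paths sidesteps the interpreter's own filename encoding, so the bytes handed to `open` match the C program's regardless of the locale Python was started in.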

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

