Message-ID: <01f801c22212$7d0cecf0$0100a8c0@advent02>
From: "Chris January" <chris@atomice.net>
To: <cygwin@cygwin.com>, <v@iki.fi>
References: <20020701085851.GD9092@niksula.cs.hut.fi> <20020702213825.GF9092@niksula.cs.hut.fi>
Subject: Re: Accessing filenames with different charsets
Date: Tue, 2 Jul 2002 22:50:31 +0100

> > Sorry if this has already been discussed, but I couldn't find it in the
> > archive nor in the FAQ...
> >
> > If I have a file name with Russian characters in it, cygwin is unable to
> > access it:
> >
> > > ls
> > ????.TEST
> >
> > (Russian characters are shown as '?' in directory listing, but ls does
> > find the file).
> >
> > If I try to access it, however, open fails:
> >
> > > touch *
> > touch: '????.TEST': no such file or directory
> >
> > same deal with less, cp, rm, rsync etc.
>
> Okay, it seems cygwin readdir() returns the filenames as "????.TEST" (where
> the ?s are literal question marks, ASCII 0x3f). Looking at
> fhandler_disk_file.cc, this can hardly be caused by anything other than
> FindFirstFileA() returning "????.TEST". And indeed, I made a little
> non-Unicode test program that called FindFirstFile, and it returned
> "????.TEST" ("\x3f\x3f\x3f\x3f.TEST").
>
> To access the file, the wide-char versions of the Find*File() functions would
> probably have to be used (or is there another way?). I have no idea how this
> could be integrated into the cygwin framework...
>
> Any ideas?

Qt (from Trolltech) encodes Unicode filenames before they are used. In
Cygwin we could do the reverse, i.e. call the Find*FileW functions and then
encode the Unicode names as local ANSI strings ourselves. Doing the encoding
manually in Cygwin, rather than letting Windows do it for us, would overcome
the problem. I will try to put together a patch for this that you can test.
One possibility is to encode the Unicode strings as UTF-8.
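
To make the UTF-8 idea concrete, here is a rough sketch of the manual
encoding step. It is not the patch itself and not existing Cygwin code:
utf16_to_utf8() is a hypothetical helper name, and I am assuming the input
is the NUL-terminated UTF-16 name that a wide-char call would hand back.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Cygwin): encode a NUL-terminated UTF-16
   string as UTF-8. Returns the number of bytes written, excluding the
   terminating NUL, or (size_t)-1 if the output buffer is too small. */
size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    while (*src) {
        uint32_t cp = *src++;
        size_t need;

        if (cp >= 0xD800 && cp <= 0xDBFF && *src >= 0xDC00 && *src <= 0xDFFF) {
            /* Combine a high/low surrogate pair into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            /* Unpaired surrogate: substitute U+FFFD (a choice of this sketch). */
            cp = 0xFFFD;
        }

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + need + 1 > dstlen)      /* keep room for the final NUL */
            return (size_t)-1;

        switch (need) {
        case 1:
            dst[n++] = (char)cp;
            break;
        case 2:
            dst[n++] = (char)(0xC0 | (cp >> 6));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            dst[n++] = (char)(0xE0 | (cp >> 12));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
            break;
        default:
            dst[n++] = (char)(0xF0 | (cp >> 18));
            dst[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
            break;
        }
    }

    if (n >= dstlen)                    /* empty input with a zero-size buffer */
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

In Cygwin the input would presumably be the cFileName member of the
WIN32_FIND_DATAW that FindFirstFileW fills in. The matching UTF-8 to UTF-16
decoder would be needed on the way back, before the name is handed to a
wide-char call such as CreateFileW. Because every Unicode name gets a
distinct byte string, nothing collapses to '?' the way it does with the
ANSI conversion.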

Chris




