From: "Chris January"
Subject: Re: Accessing filenames with different charsets
Date: Tue, 2 Jul 2002 22:50:31 +0100

> > Sorry if this has already been discussed, but I couldn't find it in the
> > archive or in the FAQ...
> >
> > If I have a file name with Russian characters in it, Cygwin is unable to
> > access it:
> >
> > > ls
> > ????.TEST
> >
> > (The Russian characters are shown as '?' in the directory listing, but ls
> > does find the file.)
> >
> > If I try to access it, however, open fails:
> >
> > > touch *
> > touch: '????.TEST': no such file or directory
> >
> > Same deal with less, cp, rm, rsync, etc.
>
> Okay, it seems Cygwin's readdir() returns the filename as "????.TEST" (where
> the ?s really are '?' characters, ASCII 0x3f). Looking at
> fhandler_disk_file.cc, this can hardly be caused by anything other than
> FindFirstFileA() returning "????.TEST". And indeed, when I made a small
> non-Unicode test program that called FindFirstFile, it returned "????.TEST"
> ("\x3f\x3f\x3f\x3f.TEST").
>
> To access the file, the wide-character versions of the Find*File()
> functions would probably have to be used (or is there another way?). I have
> no idea how this could be integrated into the Cygwin framework...
>
> Any ideas?

Qt (from Trolltech) encodes Unicode filenames before they are used.
In Cygwin we could do the reverse: call Find*FileW() and then encode the
Unicode name as a local 8-bit string ourselves. When Windows does this
conversion for the ANSI ("A") functions, every character that the current
code page cannot represent is replaced with '?', so the original name is lost
and can no longer be opened. If Cygwin does the encoding itself, using an
encoding that can represent every character, the name can be converted back
to the exact Unicode name (and passed to the wide-character APIs) when the
file is opened. One possibility is to encode Unicode strings as UTF-8.

I will try to put together a patch for this that you can test.

Chris
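A minimal sketch of the idea in C, not the actual Cygwin patch. It contrasts two conversions of a UTF-16 filename such as the cFileName that FindFirstFileW() fills in: a simplified model of the lossy ANSI conversion (assumed here to be Latin-1, where anything above U+00FF becomes '?'; the real code uses the system ANSI code page), and a hand-written UTF-8 encoder. Both function names are hypothetical helpers invented for illustration; on Windows, WideCharToMultiByte() with CP_UTF8 could do the UTF-8 step instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: simplified model of what the "A" (ANSI) APIs do.
   Converts UTF-16 to a single-byte "code page" (Latin-1 here, purely for
   illustration) and replaces anything it cannot hold with '?'.
   Works per UTF-16 code unit; truncates if out is too small. */
static void utf16_to_latin1_lossy(const uint16_t *in, char *out, size_t outsz)
{
    size_t n = 0;
    assert(outsz > 0);
    for (; *in != 0 && n + 1 < outsz; in++)
        out[n++] = (*in <= 0xFF) ? (char)*in : '?';  /* the lossy step */
    out[n] = '\0';
}

/* Hypothetical helper: encode a NUL-terminated UTF-16 string as UTF-8.
   Returns the number of bytes written (excluding the NUL), or (size_t)-1
   if the buffer is too small or the input has an unpaired surrogate. */
static size_t utf16_to_utf8(const uint16_t *in, char *out, size_t outsz)
{
    size_t n = 0;
    while (*in != 0) {
        uint32_t cp = *in++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            uint32_t lo = *in;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            in++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;  /* low surrogate without a high one */
        }

        unsigned char buf[4];
        size_t len;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }
        if (n + len + 1 > outsz)  /* keep room for the NUL */
            return (size_t)-1;
        memcpy(out + n, buf, len);
        n += len;
    }
    if (n + 1 > outsz)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

For an example name made of the UTF-16 code units {0x0424, 0x0430, 0x0439, 0x043B, '.', 'T', 'E', 'S', 'T', 0} ("Файл.TEST"), the lossy model produces "????.TEST", just like the report above, while utf16_to_utf8() produces a 13-byte string that still carries every character. UTF-8 also leaves plain ASCII names byte-for-byte unchanged. A complete patch would need the reverse conversion (UTF-8 back to UTF-16) before calling the wide-character open functions; that direction is not shown here.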