Mail Archives: cygwin/2002/07/03/05:59:14

X-WM-Posted-At: avacado.atomice.net; Wed, 3 Jul 02 10:58:39 +0100
Message-ID: <004e01c22278$346ec610$0100a8c0@advent02>
From: "Chris January" <chris AT atomice DOT net>
To: "Ville Herva" <vherva AT niksula DOT hut DOT fi>
Cc: <cygwin AT cygwin DOT com>
References: <20020701085851 DOT GD9092 AT niksula DOT cs DOT hut DOT fi> <20020702213825 DOT GF9092 AT niksula DOT cs DOT hut DOT fi> <01f801c22212$7d0cecf0$0100a8c0 AT advent02> <20020703093837 DOT GI9092 AT niksula DOT cs DOT hut DOT fi>
Subject: Re: Accessing filenames with different charsets
Date: Wed, 3 Jul 2002 10:58:38 +0100

> > Qt (from Trolltech) encodes Unicode filenames before they are used. In
> > Cygwin we could do the reverse, i.e. use Find*FileW and then encode the
> > Unicode as a local ANSI string. If we do the encoding manually in Cygwin,
> > rather than let Windows do it for us, this would overcome the problem. I
> > will try to put together a patch for this that you can test. One
> > possibility is to encode Unicode strings as UTF-8.
>
> Another idea that comes to mind: use the cAlternateFileName field from
> WIN32_FIND_DATA - that is, the 8.3 filename. I tried it, and I can access
> the file via its 8.3 name in cygwin:
>
>   wc F305~1.TES
>     318    1214   10141 F305~1.TES
>
> So all that'd have to be done is make cygwin readdir (and friends) return
> the 8.3 name if the normal name is inaccessible (different charset, too
> long name... I'm not yet sure how to detect this).
>
> The advantage over encoding the wide char name somehow is that the 8.3
> name is usable in DOS/windows as well. The disadvantage is that a name like
> F305~1 doesn't really tell anything about the real filename. And, if you
> back it up (say, with tar), and then restore it, you lose the original
> name. While not perfect that's still better than losing the whole file.

I wrote a patch for Cygwin yesterday that converts Unicode filenames to UTF-8
and back for some file operations. This should do what you want and allow
you to restore the names correctly later. I will post it to cygwin-patches
sometime today, but I'm not sure whether the patch will appear in a Cygwin
snapshot anytime soon. If not, I can send you the modified binary and the
patch directly for you to try. The only disadvantage of this method is that
it still makes the filenames impossible to type. However, if you have your
terminal set up correctly, it is certainly possible to read them as they
should be (e.g. in xterm with UTF-8 support turned on). If you are using a
graphical file browser like Konqueror, that makes things even easier.
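
To illustrate why the UTF-8 route lets names be restored while an ANSI
conversion does not, here is a rough sketch in Python. It is not the patch
itself (which lives inside Cygwin and is not shown here); the helper names,
the example filename and the choice of code page 1252 are all assumptions
made up for the illustration.

```python
# Illustrative only: helper names, the sample filename and the cp1252 code
# page are assumptions for this sketch, not taken from the Cygwin patch.

def wide_to_utf8(wide_name: bytes) -> bytes:
    """Convert a UTF-16LE filename (the form wide Win32 calls use) to UTF-8."""
    return wide_name.decode("utf-16-le").encode("utf-8")


def utf8_to_wide(utf8_name: bytes) -> bytes:
    """Convert a UTF-8 filename back to UTF-16LE for the wide Win32 calls."""
    return utf8_name.decode("utf-8").encode("utf-16-le")


def wide_to_ansi(wide_name: bytes, codepage: str = "cp1252") -> bytes:
    """Lossy conversion to an ANSI code page; unrepresentable chars become '?'."""
    return wide_name.decode("utf-16-le").encode(codepage, errors="replace")


# A Cyrillic name that code page 1252 cannot represent (made-up example).
original = "тест.txt".encode("utf-16-le")

utf8_name = wide_to_utf8(original)
ansi_name = wide_to_ansi(original)

# The UTF-8 form converts back to exactly the original wide name.
print(utf8_to_wide(utf8_name) == original)
# The ANSI form has lost the letters, so the original name is unrecoverable.
print(ansi_name)
```

A tool such as tar that stores the UTF-8 bytes can therefore hand the same
bytes back on restore, and the wide name can be rebuilt from them.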

Regards
Chris