Mail Archives: cygwin/2009/05/09/06:02:58

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Sat, 9 May 2009 12:02:31 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Cygwin programs doesn't support non-ASCII filenames
Message-ID: <20090509100231.GR21324@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <gu2u4o$f2i$3 AT ger DOT gmane DOT org>
MIME-Version: 1.0
In-Reply-To: <gu2u4o$f2i$3@ger.gmane.org>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

[Repeated and additional question.  I accidentally sent this as PM.
 Sorry about that.  Let's keep this on the list, please]

On May  9 11:43, Lenik wrote:
> (My system locale is zh_CN)

What ANSI codepage is that?

And which OEM codepage does the console window use by default?

> 1, test path
>     >>> set LANG=& cygpath -am .
>     C:/Profiles/Shecti/??????
>
>     >>> set LANG=zh_CN.GBK& cygpath -am .
>     C:/Profiles/Shecti/??????
>
>     >>> set LANG=C& cygpath -am .
>     C:/Profiles/Shecti/×ÀÃæ

Can you please give us the exact name of the directory in either
UTF-8 or UTF-16 notation?
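
One way to produce such a notation is sketched below in Python.  It rests
on an assumption that nothing in this thread confirms: that the four
characters printed by the LANG=C run are the raw GBK bytes of the name,
each shown as one Latin-1 character.  The helper names
raw_bytes_from_display and hex_units are made up for this illustration.

```python
def raw_bytes_from_display(shown: str) -> bytes:
    """Recover raw bytes, ASSUMING each displayed character is one Latin-1 byte."""
    return shown.encode("latin-1")


def hex_units(data: bytes, width: int) -> str:
    """Format bytes as space-separated hex groups of `width` bytes each."""
    return " ".join(data[i:i + width].hex() for i in range(0, len(data), width))


# The four characters printed after the last '/' in the LANG=C output.
shown = "\u00d7\u00c0\u00c3\u00e6"

raw = raw_bytes_from_display(shown)
name = raw.decode("gbk")  # ASSUMPTION: the raw bytes are GBK-encoded

print("GBK bytes :", hex_units(raw, 1))
print("UTF-8     :", hex_units(name.encode("utf-8"), 1))
print("UTF-16 BE :", hex_units(name.encode("utf-16-be"), 2))
```

If the assumption is wrong, the decode step may raise UnicodeDecodeError or
yield a different name, so the output is only a candidate to check against
the real directory.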

> 2, the `test' utility
>     >>> set LANG=& bash -c "D=$(cygpath -am .); if [ -d $D ]; then echo  
> ok $D; else echo fail $D; fi"
>     fail C:/Profiles/Shecti/??????

What you're actually testing here, every time, is cygpath.  If you stop
using cygpath, start a bash shell, and use the Cygwin commands with paths
in POSIX notation, you will have much less trouble.  Cygwin is a POSIX
emulation layer, after all.

If you give me the above information I'll look into fixing cygpath.

>     The GB2312 charset is a subset of GBK charset, and the characters `  
> ??????' is included in GB2312 charset. So in this example, GB2312 SHOULD 
> WORK.

Sorry, no.  It's documented that GBK is supported, GB2312 isn't.  From
what I read about GB2312 it's not actually a subset of GBK in terms
of character definitions, it's just a subset in terms of supported
characters.  AFAICS, GB2312 uses chars < 0x7f in multibyte sequences,
which is not feasible for Cygwin.  We could support EUC-CN, which
seems to be another way to encode GB2312 chars, but I'm not exactly
willing to add that now.  I'd rather stabilize what we have now and
add further charset support in a later, official 1.7 release.

So you can use LANG=zh_CN.GBK, but not LANG=zh_CN.GB2312.  The latter is
just treated as invalid input.  Better: Use LANG=zh_CN.UTF-8.
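
The difference between these encodings can be illustrated with Python's
codecs.  This is only a sketch, and it says nothing about how Cygwin
implements its charset handling.  It assumes Python's "gb2312" codec
(which implements the EUC-CN form) and its "hz" codec (a 7-bit encoding
of GB2312 characters); the sample strings are arbitrary and are not the
directory name from this thread.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


sample = "\u4e2d\u6587"  # arbitrary sample made of characters found in GB2312

euc_cn = sample.encode("gb2312")  # EUC-CN form of GB2312
gbk = sample.encode("gbk")
hz = sample.encode("hz")          # 7-bit form of GB2312

# For GB2312 characters, the EUC-CN and GBK byte sequences are identical,
# and every byte has the high bit set.
print("EUC-CN == GBK:", euc_cn == gbk)
print("EUC-CN bytes all >= 0x80:", all(b >= 0x80 for b in euc_cn))

# The 7-bit form uses only bytes below 0x80 inside multibyte sequences.
print("HZ bytes all < 0x80:", all(b < 0x80 for b in hz))

# GBK covers characters outside GB2312, for example a traditional form.
extra = "\u9ad4"
print("in GBK:", encodable(extra, "gbk"), "in GB2312:", encodable(extra, "gb2312"))
```

For text limited to GB2312 characters, then, encoding as GBK yields the
same bytes as EUC-CN in this sketch.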


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
