To: cygwin@cygwin.com
From: Lenik <lenik@bodz.net>
Subject:  Re: Cygwin programs doesn't support non-ASCII filenames
Date:  Tue, 12 May 2009 23:07:47 +0800
Message-ID: <guc3cc$8kf$1@ger.gmane.org>
References:  <gu2u4o$f2i$3@ger.gmane.org> <20090509100231.GR21324@calimero.vinschen.de> <gu46gf$3tf$1@ger.gmane.org> <20090512135424.GT21324@calimero.vinschen.de>
Mime-Version:  1.0
Content-Type:  text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding:  8bit
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090223 Thunderbird/3.0b2
In-Reply-To: <20090512135424.GT21324@calimero.vinschen.de>

On 2009-5-12 21:54, Corinna Vinschen wrote:
> On May  9 23:12, Lenik wrote:
>> (This mail is encoded in utf-8)
>> [...]
>> The two Chinese characters are encoded as:
>> GB2312: d7 c0 c3 e6
>> UTF-8: e6 a1 8c e9 9d a2
>> Unicode: \u684c \u9762
>> [...]
>> This is a new test that doesn't use cygpath:
>>      C:\Profiles\Shecti>  set LANG=&  bash -c "cat ??????"
>>      cat: ??????: No such file or directory
>
> I'm just looking into this issue and I do not quite understand how you
> came up with the filename in this example.  Above you mention that the
> mail is in UTF-8.  However, when I look into this email using `od -t
> x1', the multibyte sequence in your example is e4 bd a0 e5 a5 bd, rather
> than the aforementioned UTF-8 sequence e6 a1 8c e9 9d a2.  Nor does it
> match the aforementioned GB2312 sequence d7 c0 c3 e6.  Can you please
> explain how the multibyte sequence in the example is related to the
> above GB2312 and UTF-8 sequences?
>
>
> Corinna
>
Sorry for the confusion. There are two examples: the first uses 桌面 and
the second uses 你好. You can test with either one.

桌面: UTF-8 = e6 a1 8c e9 9d a2, GB2312 = d7 c0 c3 e6
你好: UTF-8 = e4 bd a0 e5 a5 bd, GB2312 = c4 e3 ba c3
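
To double-check these byte sequences independently of any mail client, a small Python sketch like the following should do. It is only an illustration and not part of the original test: the helper name hex_bytes is made up here, and it assumes Python's standard "utf-8" and "gb2312" codecs.

```python
# Print the UTF-8 and GB2312 byte sequences of the two test filenames.
# hex_bytes is a hypothetical helper, introduced only for this illustration.

def hex_bytes(text, encoding):
    """Return the encoded bytes of text as space-separated lowercase hex."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))

for name in ("桌面", "你好"):
    print(f"{name}: UTF-8 = {hex_bytes(name, 'utf-8')}, "
          f"GB2312 = {hex_bytes(name, 'gb2312')}")
```

The output can be compared directly with what `od -t x1' shows for the raw mail or for a directory listing.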

Thanks,
Lenik



