Mail Archives: cygwin/2010/02/20/04:17:26

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Sat, 20 Feb 2010 10:17:10 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: 1.7.1: unable to run the a bash script resides in chinese path using: c:\cygwin\bin\bash --login script.
Message-ID: <20100220091710.GI5683@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <t94sn59ntooeal9hc0a25hkk7ntphg99cf AT 4ax DOT com> <c6fsn5ln6bdtgr86bp3ri44ui48kf57ica AT 4ax DOT com> <416096c61002191229x670cbb63gf5c693056af727a2 AT mail DOT gmail DOT com> <drmun5969k15jlm1ji2auh5cojrnakc6uu AT 4ax DOT com> <416096c61002200000r549264c4tfdf46a9b71700bc AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <416096c61002200000r549264c4tfdf46a9b71700bc@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Feb 20 08:00, Andy Koppe wrote:
> Hongyi Zhao:
> >>Looks like there's some sort of GBK vs UTF-8 mixup going on, because
> >>'??????????????????' is the same byte sequence in GBK as '新查文献' is in UTF-8:
> >>\xE6\x96\xB0\xE6\x9F\xA5\xE6\x96\x87\xE7\x8C\xAE
> >
> > Could you please give me some hints on the tools
> > used by you to obtain this conclusion?
> 
> That was just a hunch based on the length of the two strings, and I
> confirmed it by pasting the strings into mintty running a utility for
> echoing keycodes, switching charset as appropriate.
> 
> Anyway, I had a look into why the dosfilewarning prints the wrong
> filename: it calls small_sprintf to print the message, and
> small_sprintf uses the ANSI version of WriteFile to write to
> STD_ERROR_HANDLE, so it ends up interpreting a UTF-8 string as GBK.
> Seems sys_mbstowcs and WriteFileW are needed there.

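[Editorial illustration, not part of the original message.] The mixup described in the quoted text can be reproduced by decoding the quoted byte sequence under both charsets. This is a minimal sketch in Python, assuming the standard "utf-8" and "gbk" codecs; the variable names are made up for the example, and it only demonstrates the misreading, not Cygwin's own code path through small_sprintf.

```python
# Byte sequence quoted in the message above.
raw = b"\xE6\x96\xB0\xE6\x9F\xA5\xE6\x96\x87\xE7\x8C\xAE"

# Intended reading: the filename is UTF-8 encoded.
as_utf8 = raw.decode("utf-8")

# Misreading: an ANSI API on a GBK (code page 936) system treats the
# same bytes as two-byte GBK characters.
as_gbk = raw.decode("gbk")

# Same bytes, different character counts under each charset.
print(len(raw), len(as_utf8), len(as_gbk))

# ascii() keeps the output printable on any terminal encoding.
print(ascii(as_utf8))
print(ascii(as_gbk))
```

Both decodings succeed, which is why nothing fails loudly: the UTF-8 filename is simply shown as a different, longer string of GBK characters.
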
There's no such thing as a WriteFileW function.  Since that only affects
a few error messages, I don't think it's overly important.  The simplest
approach here is to print all characters > 0x7f as hex values, as in:

  MS-DOS style path detected: \tmp\t\\xC3\xB6\xC3\xA4
  Preferred POSIX equivalent is: /tmp/t/\xC3\xB6\xC3\xA4
  [...]

I've changed that in CVS.
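
[Editorial illustration, not part of the original message.] The hex-printing approach can be sketched as follows. This is a hypothetical Python helper, not the actual Cygwin change (which lives in Cygwin's C sources); the name hex_escape and the \xNN output format are assumptions modelled on the sample output above.

```python
def hex_escape(data: bytes) -> str:
    """Return data as text, with every byte > 0x7f written as \\xNN."""
    parts = []
    for byte in data:
        if byte > 0x7F:
            # Non-ASCII byte: emit an uppercase two-digit hex escape.
            parts.append("\\x%02X" % byte)
        else:
            # Plain ASCII byte: pass through unchanged.
            parts.append(chr(byte))
    return "".join(parts)


# "/tmp/t/" followed by o-umlaut and a-umlaut, encoded as UTF-8.
posix_path = "/tmp/t/\u00f6\u00e4".encode("utf-8")
print(hex_escape(posix_path))  # prints /tmp/t/\xC3\xB6\xC3\xA4
```

Because the output is pure ASCII, it reads the same whatever code page the console uses, at the cost of not showing the actual characters.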


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
