Mail Archives: cygwin/2014/01/07/12:25:44

Date: Tue, 7 Jan 2014 18:14:46 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: command line argument parsing get extra ^X for Chinese characters when started from native win app
Message-ID: <20140107171446.GL2440@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <CAMs-qv9Lk1a8K6b_bJ3a_EBRxSXo32N69+f934oWD7pk3wrWLA AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <CAMs-qv9Lk1a8K6b_bJ3a_EBRxSXo32N69+f934oWD7pk3wrWLA@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)


On Dec 24 15:36, Xuefer wrote:
> tested with
> $ uname -a
> CYGWIN_NT-6.1 mOo-PC 1.7.27(0.271/5/3) 2013-12-09 11:54 x86_64 Cygwin
>
> run the following code in .bat file, the file should be in GBK
> encoding. as your system should be GBK encoding by default to parse
> the batch file correctly
> or copy paste the code to start->run
> ==[ to get actual wrong output ]
> c:\app\cygwin\bin\env LANG=zh_CN.UTF-8 PATH=/usr/bin bash -c "echo 中文;
> echo 中文 > a.txt; cat a.txt; xxd a.txt; echo please vim a.txt; sh"
> ===============
>
> ==[  actual output ]
>  中 文
>  中 文
> 0000000: 18e4 b8ad 18e6 9687 0a                   .........
> please vim a.txt
> sh-4.1$
> ===============
> now when you do "vim a.txt", you see
> a.txt
> ^X中^X文

I'm sorry, but I have a hard time testing this.  I don't have a system
that lets me switch the console to codepage 936, which would be
required to give this a try.  Also, the a.bat.txt file you attached to
your mail seems to be broken.  The characters in the `echo' commands
seem to consist of four 0x3f bytes (ASCII `?'), which is probably not
what you wanted.  This doesn't look like valid GBK encoding.

I have a hunch what the problem might be, though.

When you start the batch file, you don't have any POSIX environment
variable set to tell Cygwin which codeset you're using.  The first
Cygwin process started here is `env'.  LANG is set by env itself, but
only *after* env has read its command line.  While evaluating that
command line, env uses whatever was set in the environment before it
started.  So env assumes that the Cygwin locale is supposed to be "C"
or "POSIX", which is ASCII-only per POSIX.  In that case, every
non-ASCII char in the input is converted to a replacement byte
sequence: a ^X (== 0x18), followed by the UTF-8 encoding of the input
character.  That's what you see.
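
To illustrate the byte pattern, here is a small Python sketch that models
the conversion described above and compares it with the xxd dump quoted
earlier.  It is only a model of the observed output, not Cygwin's actual
conversion code; the function name `to_ascii_locale_bytes' is made up for
this example.

```python
# Illustrative model only: this is NOT Cygwin's implementation.
# Assumption: each non-ASCII character is emitted as 0x18 (^X)
# followed by its UTF-8 bytes, while ASCII passes through unchanged.
ESCAPE = b"\x18"  # ^X, ASCII CAN


def to_ascii_locale_bytes(text: str) -> bytes:
    """Model the ^X-prefixed output seen under an ASCII-only locale."""
    out = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:  # more than one UTF-8 byte means non-ASCII
            out += ESCAPE
        out += encoded
    return bytes(out)


# Bytes of a.txt as shown by xxd in the quoted report.
observed = bytes.fromhex("18e4b8ad18e696870a")

# `echo' appends a newline, hence the trailing "\n".
print(to_ascii_locale_bytes("中文\n") == observed)  # prints True
```

The match with the dump is consistent with the hunch, but it does not
prove that this is the code path taken.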

If my hunch is more or less correct, a workaround would be to make sure
the LANG or LC_CTYPE variable is set before calling the first Cygwin
process.  So, please change your bat file to something like this and try
again:

  set LC_CTYPE=zh_CN.UTF-8
  c:\app\cygwin\bin\env PATH=/usr/bin bash -c "echo 中文;
  echo 中文 > a.txt; cat a.txt; xxd a.txt; echo please vim a.txt; sh"


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

