Date: Tue, 7 Jan 2014 18:14:46 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: command line argument parsing get extra ^X for Chinese characters when started from native win app
Message-ID: <20140107171446.GL2440@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com

On Dec 24 15:36, Xuefer wrote:
> tested with
> $ uname -a
> CYGWIN_NT-6.1 mOo-PC 1.7.27(0.271/5/3) 2013-12-09 11:54 x86_64 Cygwin
>
> run the following code in .bat file, the file should be in GBK
> encoding. as your system should be GBK encoding by default to parse
> the batch file correctly
> or copy paste the code to start->run
> ==[ to get actual wrong output ]
> c:\app\cygwin\bin\env LANG=zh_CN.UTF-8 PATH=/usr/bin bash -c "echo 中文;
> echo 中文 > a.txt; cat a.txt; xxd a.txt; echo please vim a.txt; sh"
> ===============
>
> ==[ actual output ]
> 中 文
> 中 文
> 0000000: 18e4 b8ad 18e6 9687 0a .........
> please vim a.txt
> sh-4.1$
> ===============
> now when you do "vim a.txt", you see
> a.txt
> ^X中^X文

I'm sorry, but I have a hard time testing this.  I don't have a system
that allows switching the console to codepage 936, which would be
required to give this a try.

Also, the a.bat.txt file you attached to your mail seems to be broken.
The characters in the `echo' commands seem to consist of four 0x3f hex
values (ASCII question marks), which is probably not what you wanted.
This doesn't look like valid GBK encoding.

I have a hunch what the problem might be, though.  When you start the
batch file, no POSIX environment variable is set to tell Cygwin which
codeset you're using.  The first process started here is `env'.  When
you set LANG, it's env doing this, but it does so only *after* reading
the command line.  Env itself uses whatever is set in the environment
prior to starting env.  So when env evaluates the command line, it
assumes that the Cygwin locale is supposed to be "C" or "POSIX", which
is ASCII-only per POSIX.

In that case, all non-ASCII chars in the input will be converted to
replacement byte values, starting with ^X (== 0x18), followed by the
UTF-8 value of the input character.  That's what you see.

If my hunch is more or less correct, a workaround would be to make sure
the LANG or LC_CTYPE variable is set before calling the first Cygwin
process.  So, please change your bat file to something like this and
try again:

  set LC_CTYPE=zh_CN.UTF-8
  c:\app\cygwin\bin\env PATH=/usr/bin bash -c "echo 中文; echo 中文 > a.txt; cat a.txt; xxd a.txt; echo please vim a.txt; sh"

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
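
To illustrate the byte pattern in the xxd output quoted above, here is a
minimal Python sketch.  It only inspects the nine bytes from that dump and
assumes, based on the dump alone, that each ^X (0x18) marker precedes one
intact UTF-8 sequence.  The helper name `strip_markers' is hypothetical and
not part of Cygwin; this is not how Cygwin itself performs the conversion.

```python
# Bytes copied from the quoted xxd line: 18e4 b8ad 18e6 9687 0a
dump = bytes.fromhex("18e4b8ad18e696870a")

MARKER = 0x18  # ^X, the ASCII CAN control character


def strip_markers(data: bytes) -> bytes:
    """Drop every 0x18 byte (hypothetical helper, for illustration only)."""
    return bytes(b for b in data if b != MARKER)


# Count the markers, then decode what remains as UTF-8.
marker_count = dump.count(bytes([MARKER]))
cleaned = strip_markers(dump)
text = cleaned.decode("utf-8")

print(marker_count, repr(text))
```

With the markers removed, the remaining bytes decode as plain UTF-8 for the
two characters from the `echo' command plus the trailing newline, which
matches the description of one ^X in front of each non-ASCII character.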