Mail Archives: cygwin/2009/05/28/16:24:01

Message-ID: <4A1EF2CE.2060509@sidefx.com>
Date: Thu, 28 May 2009 16:23:42 -0400
From: Edward Lam <edward AT sidefx DOT com>
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line
References: <200905281541 DOT 33404 DOT michael DOT renner AT gmx DOT de> <20090528145106 DOT GA23970 AT ednor DOT casa DOT cgf DOT cx> <4A1EAA75 DOT 7030203 AT sidefx DOT com> <4A1EAAED DOT 1060702 AT cygwin DOT com> <4A1EAD61 DOT 5010308 AT sidefx DOT com> <4A1EAD91 DOT 1060701 AT sidefx DOT com> <e2480c70905281131u37651a2eoba946637bd414516 AT mail DOT gmail DOT com>
In-Reply-To: <e2480c70905281131u37651a2eoba946637bd414516@mail.gmail.com>

Alexey Borzenkov wrote:
> On Thu, May 28, 2009 at 7:28 PM, Edward Lam <edward AT sidefx DOT com> wrote:
>> PS. In case you haven't noticed, copyright.txt is not a long file. It
>> consists of a single byte, 0xA9.
> 
> Did you try utf-8 encoding copyright.txt? Perhaps your locale is utf-8
> and the encoder fails.

How is one supposed to determine one's locale in Cygwin? I do NOT have
LANG or any of the LC_* environment variables set. I even tried
explicitly setting LANG=C, and it still fails.

The problem does seem to stem from the new UTF-8 support in Cygwin 1.7.
However, I think something unexpected is going on here, because trying
something similar on Linux causes no problems. To confirm that it is a
UTF-8-related problem, let me repeat the steps slightly differently.
Here I assume that bug.exe, a native program that simply prints out its
arguments, has already been compiled.

$ export LANG=C

$ ./bug arg1 "before `cat copyright.txt` after" arg3
0: E:\cygwin1.7\tmp\bug.exe
1: arg1
2: before

*Notice that argc is 3 when it should be 4!*

$ piconv -f iso-8859-1 -t utf8 < copyright.txt > fubar.txt

$ ./bug arg1 "before `cat fubar.txt` after" arg3
0: E:\cygwin1.7\tmp\bug.exe
1: arg1
2: before © after
3: arg3

*So now everything works because I converted the character into UTF-8.*

I think this points to some form of incorrect encoding conversion of
the command-line arguments when spawning NATIVE applications.

Here's what happens when I compile bug.c with Cygwin's gcc instead:

$ gcc bug.c -o bug-gcc.exe

$ ./bug-gcc arg1 "before `cat copyright.txt` after" arg3
0: ./bug-gcc
1: arg1
2: before © after
3: arg3

So there seems to be some special marshaling of the command-line
arguments that works when spawning Cygwin apps but breaks when spawning
native apps.

Regards,
-Edward

