Mail Archives: cygwin/2009/06/03/12:56:14

Message-ID: <4A26AB1D.1090404@sidefx.com>
Date: Wed, 03 Jun 2009 12:55:57 -0400
From: Edward Lam <edward AT sidefx DOT com>
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line
References: <3f0ad08d0905290852xe41338alfda89c622f92f677 AT mail DOT gmail DOT com> <4A200BC0 DOT 9010704 AT sidefx DOT com> <e2480c70905291142o2bcc65ccw2287d175dbd09dd5 AT mail DOT gmail DOT com> <4A204149 DOT 2050009 AT sidefx DOT com> <e2480c70905291337g6c8bcca7xd0baba79c84629db AT mail DOT gmail DOT com> <4A2051E5 DOT 6060600 AT sidefx DOT com> <20090602205440 DOT GF23519 AT calimero DOT vinschen DOT de> <4A26782C DOT 9040207 AT sidefx DOT com> <20090603142755 DOT GM23519 AT calimero DOT vinschen DOT de> <20090603160225 DOT GA27039 AT ednor DOT casa DOT cgf DOT cx> <20090603161158 DOT GB23419 AT calimero DOT vinschen DOT de>
In-Reply-To: <20090603161158.GB23419@calimero.vinschen.de>

Corinna Vinschen wrote:
> On Jun  3 12:02, Christopher Faylor wrote:
>> On Wed, Jun 03, 2009 at 04:27:55PM +0200, Corinna Vinschen wrote:
>>> On Jun  3 09:18, Edward Lam wrote:
>>>> Corinna Vinschen wrote:
>>>>> The question is, what do you expect?  [...]
>>>> [...]
>>>> Wikipedia has several suggestions on how to handle invalid UTF-8 byte  
>>>> sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the  
>>>> rule that uses the replacement character.
>>> Chris implemented using the invalid code point solution.  The discussion
>>> in http://www.mail-archive.com/linux-utf8 AT nl DOT linux DOT org/msg00080.html
>>> supports this solution.  What's missing so far is the way back, from
>>> an invalid single second half of a surrogate pair in the 0xDCxx range
>>> back to the correct byte value.  I'm just looking into that.
>> The way back was not, AFAIK, needed for Cygwin programs.  I don't think
>> there is a valid way back for Windows programs.
> 
> The way back is not needed for the argv handling in Cygwin, but it
> gets necessary if you converted to UTF-16 in other circumstances.
> It's not much of a problem since the way back is a no-brainer, in
> contrast to the conversion to UTF-16.

What is the current state of affairs in Cygwin 1.7.0-48? Is the invalid
code point solution currently used when the command line is converted
to UTF-16 for spawning non-Cygwin processes? What I'm trying to
understand is where the command line truncation takes place: in the
parent or in the child process.

If the truncation happens in the child process because of the
invalid code point, then perhaps we should consider using the
replacement character solution when spawning non-Cygwin child processes.
IMHO, a bad character is better than a truncated command line. At
least the problem (invalid UTF-8) then becomes more obvious.
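
To make the two options concrete, here is a minimal sketch in Python. It
is not Cygwin code: it relies on Python's built-in "replace" and
"surrogateescape" error handlers, and the helper names and the sample
argument are made up for illustration. Python's "surrogateescape" maps an
undecodable byte 0xNN to the lone low surrogate U+DCNN, which resembles
the 0xDCxx approach described above; whether Cygwin uses exactly the same
mapping is an assumption on my part.

```python
def decode_replace(raw: bytes) -> str:
    """Replacement character solution: each bad byte becomes U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def decode_escape(raw: bytes) -> str:
    """Invalid code point solution: each bad byte 0xNN becomes U+DCNN."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_escape(text: str) -> bytes:
    """The way back: lone surrogates U+DCNN become the original byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical argument: 0xE9 is a Latin-1 e-acute, which is not valid
# UTF-8 when followed by a space.
raw = b"caf\xe9 arg"

replaced = decode_replace(raw)   # the original byte value is lost
escaped = decode_escape(raw)     # the original byte value is preserved

print(repr(replaced))
print(repr(escaped))
print(encode_escape(escaped) == raw)
```

In this sketch the replacement character variant is lossy but always
yields well-formed text, while the invalid code point variant can be
converted back exactly but hands a lone surrogate to whatever consumes
the string.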

-Edward

