Mail Archives: cygwin/2009/06/03/09:19:06

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-1.2 required=5.0 tests=AWL,BAYES_00,SPF_PASS
X-Spam-Check-By: sourceware.org
Message-ID: <4A26782C.9040207@sidefx.com>
Date: Wed, 03 Jun 2009 09:18:36 -0400
From: Edward Lam <edward AT sidefx DOT com>
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line
References: <e2480c70905281131u37651a2eoba946637bd414516 AT mail DOT gmail DOT com> <4A1EF2CE DOT 2060509 AT sidefx DOT com> <3f0ad08d0905290813m39999f81q918e94e3c960eb3f AT mail DOT gmail DOT com> <4A200287 DOT 8030403 AT sidefx DOT com> <3f0ad08d0905290852xe41338alfda89c622f92f677 AT mail DOT gmail DOT com> <4A200BC0 DOT 9010704 AT sidefx DOT com> <e2480c70905291142o2bcc65ccw2287d175dbd09dd5 AT mail DOT gmail DOT com> <4A204149 DOT 2050009 AT sidefx DOT com> <e2480c70905291337g6c8bcca7xd0baba79c84629db AT mail DOT gmail DOT com> <4A2051E5 DOT 6060600 AT sidefx DOT com> <20090602205440 DOT GF23519 AT calimero DOT vinschen DOT de>
In-Reply-To: <20090602205440.GF23519@calimero.vinschen.de>
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

Corinna Vinschen wrote:
> On May 29 17:21, Edward Lam wrote:
>> 
>> I think the problem I'm running into is:
>> - I give cygwin 1.7's bash a string that is in my system default
>>   code page.
>> - cygwin 1.7 thinks the string is actually UTF-8 and tries to
>>   convert it as UTF-8 into UTF-16, resulting in a truncated command
>>   line that is passed to the child process.
> 
> The question is, what do you expect?  I know, you expect that it
> "just works", but that's not as easy as you might assume,
> unfortunately.

Yes, Alexey and I had a lengthy argument on this thread already.
Disagreements on the default LANG behaviour notwithstanding, I think
that it should still NOT truncate, but instead substitute something
else for the invalid character.

Here's a quote from Alexey previously on this thread:

"In my opinion: truncation is a bug (should use replacement character,
or fail exec altogether), expecting utf-8 is not"

Wikipedia has several suggestions on how to handle invalid UTF-8 byte 
sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the 
rule that uses the replacement character.
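
To make the difference between truncating and substituting concrete,
here is a minimal Python sketch. It is not Cygwin's actual conversion
code. The helper names decode_truncating and decode_replacing are made
up for illustration, and the sample command line is assumed to be CP1252
text containing the byte 0xA9.

```python
def decode_truncating(raw: bytes) -> str:
    """Decode as UTF-8, dropping everything from the first invalid byte on.

    This only imitates the truncation described in this thread; it is an
    assumption about the observable effect, not Cygwin's implementation.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that failed to decode.
        return raw[:err.start].decode("utf-8")


def decode_replacing(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for each invalid byte."""
    return raw.decode("utf-8", errors="replace")


# Hypothetical command line: 0xA9 is the copyright sign in CP1252,
# but on its own it is not a valid UTF-8 sequence.
arg = b"bug.exe \xa9 2009"

truncated = decode_truncating(arg)
replaced = decode_replacing(arg)

print(repr(truncated))
print(repr(replaced))
```

With replacement, the rest of the command line after the bad byte
survives and the child process can at least see that something was
there.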

> You get the idea.  The character 0xa9 has no meaning in itself.  It
> only has a meaning when you consider the character set or codepage in
> which you use this character.
...
> How is anybody supposed to know that the file which consists
> of the single byte 0xa9 has *any* meaning at all?  Why should it be
> the copyright sign, of all things?
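
The point about 0xa9 having no meaning of its own can be shown with a
short Python sketch. The choice of CP1252 and CP437 here is only an
example of two common Windows code pages, not something taken from this
thread.

```python
raw = b"\xa9"

# The same byte decodes to different characters in different code pages.
as_cp1252 = raw.decode("cp1252")  # Windows Western European (ANSI)
as_cp437 = raw.decode("cp437")    # original IBM PC / DOS (OEM) code page

# As UTF-8, a lone 0xA9 is a continuation byte with nothing to continue,
# so strict decoding rejects it.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(f"cp1252: U+{ord(as_cp1252):04X}")
print(f"cp437:  U+{ord(as_cp437):04X}")
print(f"valid UTF-8: {valid_utf8}")
```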

What I was attempting to do was to have NO conversion. In the
real case where I ran into this, "bug.exe" was the one that properly
interpreted what the byte 0xA9 on the command line meant. Yes, I know
there are several workarounds.
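
As an illustration of what "no conversion" can look like when bytes do
have to pass through a Unicode string, here is a small Python sketch
using Python's own "surrogateescape" error handler. This is Python's
mechanism, shown only as an analogy. It is not something Cygwin does,
and the function name passthrough is made up for the example.

```python
def passthrough(raw: bytes) -> bytes:
    """Round-trip bytes through str without losing undecodable bytes."""
    # Invalid bytes are mapped to lone surrogates (0xA9 -> U+DCA9) ...
    text = raw.decode("utf-8", errors="surrogateescape")
    # ... and mapped back to the original bytes on the way out.
    return text.encode("utf-8", errors="surrogateescape")


arg = b"bug.exe \xa9"
roundtripped = passthrough(arg)

print(roundtripped == arg)
```

The receiving program gets exactly the bytes it was given and is free
to interpret 0xA9 in whatever code page it expects.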

> If we default to the ANSI codepage, you will have the same problem,
> just upside down.  In both cases you will have even more problems if
> you start using characters not available in your default codepage.

This is where I disagreed with Alexey. What we're really arguing about
here is which default will run into the fewest problems for the most
common usage. This is subjective, of course.

-Edward

