X-Recipient: archive-cygwin@delorie.com
X-Spam-Check-By: sourceware.org
Date: Wed, 3 Jun 2009 18:11:58 +0200
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line
Message-ID: <20090603161158.GB23419@calimero.vinschen.de>
Reply-To: cygwin@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
References: <3f0ad08d0905290852xe41338alfda89c622f92f677@mail.gmail.com> <4A200BC0.9010704@sidefx.com> <e2480c70905291142o2bcc65ccw2287d175dbd09dd5@mail.gmail.com> <4A204149.2050009@sidefx.com> <e2480c70905291337g6c8bcca7xd0baba79c84629db@mail.gmail.com> <4A2051E5.6060600@sidefx.com> <20090602205440.GF23519@calimero.vinschen.de> <4A26782C.9040207@sidefx.com> <20090603142755.GM23519@calimero.vinschen.de> <20090603160225.GA27039@ednor.casa.cgf.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090603160225.GA27039@ednor.casa.cgf.cx>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Precedence: bulk
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie.com@cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe@cygwin.com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin@cygwin.com>
List-Help: <mailto:cygwin-help@cygwin.com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner@cygwin.com
Delivered-To: mailing list cygwin@cygwin.com

On Jun  3 12:02, Christopher Faylor wrote:
> On Wed, Jun 03, 2009 at 04:27:55PM +0200, Corinna Vinschen wrote:
> >On Jun  3 09:18, Edward Lam wrote:
> >> Corinna Vinschen wrote:
> >>> The question is, what do you expect?  [...]
> >> [...]
> >> Wikipedia has several suggestions on how to handle invalid UTF-8 byte  
> >> sequences (http://en.wikipedia.org/wiki/UTF-8). Personally, I favor the  
> >> rule that uses the replacement character.
> >
> >Chris implemented using the invalid code point solution.  The discussion
> >in http://www.mail-archive.com/linux-utf8@nl.linux.org/msg00080.html
> >supports this solution.  What's missing so far is the way back, from
> >an invalid single second half of a surrogate pair in the 0xDCxx range
> >back to the correct byte value.  I'm just looking into that.
> 
> The way back was not, AFAIK, needed for Cygwin programs.  I don't think
> there is a valid way back for Windows programs.

The way back is not needed for the argv handling in Cygwin, but it
becomes necessary if you convert to UTF-16 in other circumstances.
It's not much of a problem, since the way back is a no-brainer, in
contrast to the conversion to UTF-16.
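
To illustrate why the way back is simple, here is a minimal sketch.  It
assumes each invalid input byte was stored as 0xDC00 plus the byte
value, i.e. as a lone low surrogate in the 0xDC80..0xDCFF range.  The
function names are made up for this example and are not the names used
in the Cygwin sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map an invalid input byte (0x80..0xFF) to a lone
   low surrogate in the 0xDC80..0xDCFF range.  The 0xDC00 + byte mapping
   is an assumption made for this sketch. */
static uint16_t
byte_to_escape (unsigned char b)
{
  return (uint16_t) (0xDC00 | b);
}

static int
is_high_surrogate (uint16_t u)
{
  return u >= 0xD800 && u <= 0xDBFF;
}

/* The way back: if U is a low surrogate in the escape range and PREV
   (the preceding UTF-16 unit, or 0 at the start of the string) is not a
   high surrogate, U cannot be part of a valid pair.  In that case store
   the original byte in *OUT and return 1.  Otherwise return 0 and leave
   *OUT untouched so the caller can convert U normally. */
static int
escape_to_byte (uint16_t prev, uint16_t u, unsigned char *out)
{
  if (u >= 0xDC80 && u <= 0xDCFF && !is_high_surrogate (prev))
    {
      *out = (unsigned char) (u & 0xFF);
      return 1;
    }
  return 0;
}
```

The only check required on the way back is whether the preceding unit
is a high surrogate, which is why it is so much easier than detecting
invalid sequences during the conversion to UTF-16.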


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


