Date: Mon, 22 Jul 2013 10:12:00 +0200
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: regex library fails git tests

On Jul 21 22:59, Mark Levedahl wrote:
> On 07/21/2013 03:39 PM, Corinna Vinschen wrote:
> 
> >So, what I did now was this:  I added a workaround to Cygwin's regcomp.
> >If the current codeset is ASCII, the characters in the pattern are
> >converted to wchar_t by simply using their unsigned value verbatim.
> >This allows to compile (and test) the patterns in the git testcases.
> >
> >However, please note that this behaviour, while being provided by glibc
> >and now by Cygwin, is *not* standards-compliant.  In the narrow sense
> >the characters beyond 0x7f are still invalid ASCII chars, and other
> >functions working with wchar_t strings won't be as forgiving when using
> >invalid input.
> >
> >
> >HTH,
> >Corinna
> >
> 
> Thank you. I confirm that git passes the two test cases (t4018 and
> t4034) using today's snapshot.

Thanks for your feedback and for testing the snapshot.  I created the
snapshots yesterday but then forgot to mention them here.

> I will pass your comments about use
> of characters 0x80 and above to the git list to see if they wish to
> change anything.

After some sleep, I think I now understand why the glibc devs made
regcomp work this way.  This behaviour is backward compatible with
non-locale-aware applications.  In the "C" locale, a char is just some
arbitrary byte between 0 and 255.  So this pattern always worked before
in the "C" locale, therefore it makes sense that it continues to work,
even if it won't when using other locales/codesets.
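
A minimal sketch of what this means for an application, assuming a
regcomp that accepts bytes beyond 0x7f in the "C" locale as described
above.  The helper name match_in_c_locale and the sample byte 0xe9 are
made up for illustration; they are not taken from the git testcases.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile `pattern` as an extended regex in the
 * "C" locale and test it against `subject`.
 *
 * Returns  1 if the subject matches,
 *          0 if it does not match,
 *         -1 if regcomp() rejects the pattern.
 */
int match_in_c_locale(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* Force the "C" locale, where a char is treated as a plain byte. */
    setlocale(LC_ALL, "C");

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

Calling match_in_c_locale("\xe9", "caf\xe9") is the interesting case:
with the behaviour described above it should compile and match, while a
strictly conforming regcomp may reject the pattern, in which case the
helper returns -1.  Plain ASCII patterns are unaffected either way.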


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


