Date: Mon, 22 Jul 2013 10:12:00 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: regex library fails git tests
Message-ID: <20130722081200.GE2661@calimero.vinschen.de>
In-Reply-To: <51ECA00D.6030105@gmail.com>

On Jul 21 22:59, Mark Levedahl wrote:
> On 07/21/2013 03:39 PM, Corinna Vinschen wrote:
> >So, what I did now was this: I added a workaround to Cygwin's regcomp.
> >If the current codeset is ASCII, the characters in the pattern are
> >converted to wchar_t by simply using their unsigned value verbatim.
> >This allows to compile (and test) the patterns in the git testcases.
> >
> >However, please note that this behaviour, while being provided by glibc
> >and now by Cygwin, is *not* standards-compliant. In the narrow sense
> >the characters beyond 0x7f are still invalid ASCII chars, and other
> >functions working with wchar_t strings won't be as forgiving when using
> >invalid input.
> >
> >
> >HTH,
> >Corinna
> >
>
> Thank you. I confirm that git passes the two test cases (t4018 and
> t4034) using today's snapshot.

Thanks for your feedback and for testing the snapshot. I created the
snapshots yesterday but then forgot to mention them here.

> I will pass your comments about use
> of characters 0x80 and above to the git list to see if they wish to
> change anything.

After some sleep, I think I now understand why the glibc devs made
regcomp work this way. This behaviour is backward compatible with
non-locale-aware applications. In the "C" locale, a char is just some
arbitrary byte between 0 and 255. So this pattern always worked before
in the "C" locale, and therefore it makes sense that it continues to
work, even if it won't when using other locales/codesets.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
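
For illustration only, the conversion described in the quoted text (taking each pattern character's unsigned value verbatim as a wchar_t) might look roughly like the sketch below. This is not Cygwin's actual regcomp code; the function name widen_bytes_verbatim and its signature are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, NOT taken from Cygwin's sources: widen a byte
   string by using each char's unsigned value verbatim, without any
   locale-aware decoding.  Writes at most dstlen wide characters
   including the terminator and returns the number of characters
   written, not counting the terminator. */
static size_t widen_bytes_verbatim(const char *src, wchar_t *dst, size_t dstlen)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return 0;

    while (src[i] != '\0' && i + 1 < dstlen) {
        /* The cast through unsigned char keeps bytes >= 0x80 in the
           range 0x80..0xff instead of sign-extending them. */
        dst[i] = (wchar_t)(unsigned char)src[i];
        i++;
    }
    dst[i] = L'\0';
    return i;
}
```

With this sketch, the byte 0xc3 simply becomes the wide character 0xc3. A locale-aware conversion such as mbstowcs in a UTF-8 locale would instead reject a lone byte like 0xff as an invalid sequence, which is the "won't be as forgiving" case mentioned above.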