Date: Wed, 26 Jun 2013 11:19:38 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
Message-ID: <20130626091938.GA6966@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
References: <20130625152356 DOT GD11958 AT calimero DOT vinschen DOT de>
 <5F8AAC04F9616747BC4CC0E803D5907D0C37C240 AT MLBXv04 DOT nih DOT gov>
 <20130625160359 DOT GB14459 AT calimero DOT vinschen DOT de>
 <20130625160911 DOT GC14459 AT calimero DOT vinschen DOT de>
In-Reply-To:
 <20130625160911.GC14459@calimero.vinschen.de>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Jun 25 18:09, Corinna Vinschen wrote:
> On Jun 25 18:03, Corinna Vinschen wrote:
> > On Jun 25 15:38, Lavrentiev, Anton (NIH/NLM/NCBI) [C] wrote:
> > > > Your locale is zh_CN.UTF-8.  What you're expecting is only guaranteed
> > > > in the C locale:
> > > [...]
> Which also means, AFAICS, Cygwin's sed is doing it right, Linux' sed
> is doing it wrong.  Yes, that puzzles me a bit at the moment, too.

I had a discussion with my colleagues from the Linux side of Red Hat.
The bottom line is, we're both doing it right, just differently.

As for the difference itself, here's what happened:

The gawk maintainer was unhappy with how regex ranges worked when using
locales other than the C locale.  So he implemented a change to regex
which he called "rational ranges".  The idea is that something like
[b-d] always means lowercase only and [B-D] always means uppercase only,
independent of the locale we're in.

This change to the regex handling made it not only into gawk(*), but
also into glibc(**) and perl regex.  It did not make it into sed or
bash, for instance.

That's why sed under Cygwin shows the default, collation-abiding
behaviour when using a non-C locale.  Under Fedora 18 it shows the new
"rational ranges" behaviour, because glibc supports them and sed has
been built with the --without-included-regex option.

I just checked the new upstream sed 4.2.2 (will upload shortly) and it
still doesn't implement "rational ranges", even though its regex is
derived from gnulib's regex.


Corinna


(*) Try

      echo abcdeABCDE | awk '{ gsub(/[B-D]/, "_"); print }'

(**) http://sourceware.org/ml/libc-alpha/2012-12/msg00456.html

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
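
For anyone who wants a result that does not depend on which of the two
behaviours described above a given sed implements, here is a minimal
sketch, assuming a POSIX shell with sed on the PATH.  The variable names
c_locale and explicit are made up for illustration.  The first command
forces the C locale, where a range follows code point order.  The second
avoids the range altogether by listing the characters.  The output under
a non-C locale is deliberately not shown, since it depends on that
locale's collation order and on whether the regex code in use implements
"rational ranges".

```shell
#!/bin/sh
# Force the C locale for this one command: [B-D] then covers exactly
# the code points B, C and D.
c_locale=$(echo abcdeABCDE | LC_ALL=C sed -e 's/[B-D]/_/g')
echo "C locale:      $c_locale"

# List the characters explicitly instead of using a range, so the
# bracket expression involves no range semantics at all.
explicit=$(echo abcdeABCDE | sed -e 's/[BCD]/_/g')
echo "explicit list: $explicit"
```

Both commands should print abcdeA___E.  Setting LC_ALL=C only on the sed
command leaves the rest of the session in its original locale.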