Mail Archives: cygwin/2013/06/25/12:09:34

X-Recipient: archive-cygwin AT delorie DOT com
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.1
Date: Tue, 25 Jun 2013 18:09:11 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
Message-ID: <20130625160911.GC14459@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
References: <CA+nJC97He=j-O2FZ-Y2jJhYXEJn2o2EfC1wO39+2bZ=nj1f-zA AT mail DOT gmail DOT com> <20130625152356 DOT GD11958 AT calimero DOT vinschen DOT de> <5F8AAC04F9616747BC4CC0E803D5907D0C37C240 AT MLBXv04 DOT nih DOT gov> <20130625160359 DOT GB14459 AT calimero DOT vinschen DOT de>
MIME-Version: 1.0
In-Reply-To: <20130625160359.GB14459@calimero.vinschen.de>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Jun 25 18:03, Corinna Vinschen wrote:
> On Jun 25 15:38, Lavrentiev, Anton (NIH/NLM/NCBI) [C] wrote:
> > > Your locale is zh_CN.UTF-8.  What you're expecting is only guaranteed
> > > in the C locale:
> > 
> > I'm not quite sure it applies here.  I'm using US English Windows 7.
> > 
> > LANG = 'en_US.UTF-8'
> > 
> > I get the same result:
> > 
> > $ echo abcdeABCDE | sed -e 's/[B-D]/_/g'
> > ab__eA___E
> > 
> > BUT:
> > 
> > $ echo abcdeABCDE | LANG=C sed 's/[B-D]/_/g'
> > abcdeA___E
> > 
> > This is very weird, indeed.
> > 
> > OTOH, in Linux I have the same LANG setup, yet it does work
> > correctly:
> > 
> >   $ echo $LANG
> >   en_US.UTF-8
> >   $ echo abcdeABCDE | sed -e 's/[B-D]/_/g'
> >   abcdeA___E
> > 
> > I believe that an en_US UTF-8 string representation for
> > "abcdeABCDE" is not any different from ASCII.
> 
> Wrong.  Try this:
> 
>   $ sort
>   a
>   b
>   c
>   d
>   e
>   A
>   B
>   C
>   D
>   E
>   <Ctrl-D>
>   a
>   A
>   b
>   B
>   c
>   C
>   d
>   D
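
To see why that collation order matters for the bracket range, here is a
minimal sketch.  It assumes the order aAbBcCdDeE (read off the sort output
above, with e and E assumed to follow D) and that a range covers everything
collating between its two endpoints.  The variable names are only
illustrative, and this simulates the effect; it is not what sed does
internally.

```shell
# Assumed collation order, taken from the sort output above.
order="aAbBcCdDeE"

# Everything that collates between B and D in that order.
tail=${order#*B}          # strip up to and including B -> cCdDeE
span="B${tail%%D*}D"      # keep up to the first D      -> BcCdD
echo "$span"              # prints: BcCdD

# Use that span as an explicit set, in the C locale so nothing else interferes.
simulated=$(echo abcdeABCDE | LC_ALL=C sed -e "s/[$span]/_/g")
echo "$simulated"         # prints: ab__eA___E
```

The simulated result matches what Cygwin's sed printed for [B-D] under
en_US.UTF-8.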

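Which also means, AFAICS, that Cygwin's sed is doing it right and Linux's
sed is doing it wrong: with this collation order, the range [B-D] also
covers c and d, since they sort between B and D.  Yes, that puzzles me a
bit at the moment, too.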
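
If the plain ASCII behaviour is what you want regardless of locale, two
approaches that should work are forcing the C locale for the one command
(as in Anton's LANG=C test quoted above) or listing the characters
explicitly instead of using a range.  A small sketch, with hypothetical
variable names:

```shell
# Force the C locale for this command only, so [B-D] means the
# code points for B, C and D.  LC_ALL overrides LANG and the other LC_* vars.
range_c=$(echo abcdeABCDE | LC_ALL=C sed -e 's/[B-D]/_/g')
echo "$range_c"     # prints: abcdeA___E

# Enumerate the characters; this does not depend on collation order.
explicit=$(echo abcdeABCDE | sed -e 's/[BCD]/_/g')
echo "$explicit"    # prints: abcdeA___E
```

Setting LC_ALL rather than LANG is the safer choice here, because an
already-set LC_ALL or LC_COLLATE would take precedence over LANG.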


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
