From: "Lavrentiev, Anton (NIH/NLM/NCBI) [C]"
To: "cygwin AT cygwin DOT com"
Subject: RE: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
Date: Tue, 25 Jun 2013 15:38:19 +0000
Message-ID: <5F8AAC04F9616747BC4CC0E803D5907D0C37C240@MLBXv04.nih.gov>
In-Reply-To: <20130625152356.GD11958@calimero.vinschen.de>

> Your locale is zh_CN.UTF-8.  What you're expecting is only guaranteed
> in the C locale:

I'm not quite sure it applies here. I'm using US English Windows 7.

LANG = 'en_US.UTF-8'

I get the same result:

$ echo abcdeABCDE | sed -e 's/[B-D]/_/g'
ab__eA___E

BUT:

$ echo abcdeABCDE | LANG=C sed 's/[B-D]/_/g'
abcdeA___E

This is very weird, indeed.

OTOH, in Linux I have the same LANG setup, yet it does work correctly:

> echo $LANG
en_US.UTF-8
> echo abcdeABCDE | sed -e 's/[B-D]/_/g'
abcdeA___E

I believe that an en_US UTF-8 string representation for "abcdeABCDE" is not any different from ASCII.

Anton Lavrentiev
Contractor NIH/NLM/NCBI
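
For reference, the two runs above differ only in the locale that sed sees. A minimal sketch of two ways to get the C-locale result regardless of the LANG setting, assuming a POSIX shell and a sed on the PATH (the function names replace_range and replace_listed are made up for illustration):

```shell
# Force the C locale for this one command, so that [B-D] is the range B..D
# in ASCII order. LC_ALL takes precedence over LANG and LC_COLLATE.
replace_range() {
    LC_ALL=C sed -e 's/[B-D]/_/g'
}

# Locale-independent alternative: list the characters explicitly instead of
# relying on a range expression.
replace_listed() {
    sed -e 's/[BCD]/_/g'
}

echo abcdeABCDE | replace_range    # prints abcdeA___E
echo abcdeABCDE | replace_listed   # prints abcdeA___E
```

Using LC_ALL rather than LANG is a deliberate choice here: if LC_ALL or LC_COLLATE is already set in the environment, setting LANG alone would have no effect on how the range is interpreted.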