From: "Buchbinder, Barry (NIH/NIAID) [E]"
To: "cygwin AT cygwin DOT com"
Subject: RE: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
Date: Tue, 25 Jun 2013 16:06:58 +0000
Message-ID: <6CF2FC1279D0844C9357664DC5A08BA20D86A2@MLBXV06.nih.gov>
References: <20130625152356 DOT GD11958 AT calimero DOT vinschen DOT de> <5F8AAC04F9616747BC4CC0E803D5907D0C37C25C AT MLBXv04 DOT nih DOT gov>
In-Reply-To: <5F8AAC04F9616747BC4CC0E803D5907D0C37C25C@MLBXv04.nih.gov>

Lavrentiev, Anton sent the following at Tuesday, June 25, 2013 11:44 AM
>> The character ordering is based on the default Windows ordering for the
>> locale, and that's dictionary ordering, apparently.
>
>Ah, I see what you meant here. There's an elaborated explanation:
>
>http://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

Also, the sed info documentation "Reporting Bugs" explicitly says that this
is not a bug.

`[a-z]' is case insensitive
     You are encountering problems with locales. POSIX mandates that
     `[a-z]' uses the current locale's collation order - in C parlance,
     that means using `strcoll(3)' instead of `strcmp(3)'. Some locales
     have a case-insensitive collation order, others don't.

     Another problem is that `[a-z]' tries to use collation symbols.
     This only happens if you are on the GNU system, using GNU libc's
     regular expression matcher instead of compiling the one supplied
     with GNU sed. In a Danish locale, for example, the regular
     expression `^[a-z]$' matches the string `aa', because this is a
     single collating symbol that comes after `a' and before `b'; `ll'
     behaves similarly in Spanish locales, or `ij' in Dutch locales.

     To work around these problems, which may cause bugs in shell
     scripts, set the `LC_COLLATE' and `LC_CTYPE' environment variables
     to `C'.
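
For what it's worth, the workaround from the sed documentation might look something like the sketch below. It assumes a POSIX shell with sed on the PATH; the function name replace_b_to_d is made up for illustration. LC_ALL, when set, overrides both LC_COLLATE and LC_CTYPE, so the sketch clears it inside a subshell before setting the two variables.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): replace the
# characters B, C and D on stdin with underscores, forcing the C locale
# so that the range [B-D] is evaluated in plain byte order.
replace_b_to_d() {
    (
        # LC_ALL overrides LC_COLLATE and LC_CTYPE, so drop it first.
        # The subshell keeps the caller's environment untouched.
        unset LC_ALL
        LC_COLLATE=C LC_CTYPE=C sed -e 's/[B-D]/_/g'
    )
}

# Only the uppercase B, C and D should be replaced.
printf '%s\n' 'ABCDE abcde' | replace_b_to_d
```

With the C locale forced this way, lowercase letters are left alone regardless of what the surrounding locale's collation order is.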