Mail Archives: cygwin/2010/12/04/10:07:10

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Sat, 4 Dec 2010 16:06:42 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Problem with Bash regex test case sensitivity
Message-ID: <20101204150642.GA26471@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <4CF96F70 DOT 3090507 AT veritech DOT com> <AANLkTikQJEJ6kHKZdzzA_YB_DHgZBevCLDKtAEm6ZgBg AT mail DOT gmail DOT com> <4CF9BA08 DOT 8060703 AT redhat DOT com> <AANLkTi=pSXnqvF5OsQbaP8nE6sGHsL6crOG3z9D6SzWs AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <AANLkTi=pSXnqvF5OsQbaP8nE6sGHsL6crOG3z9D6SzWs@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Dec  4 10:05, Lee wrote:
> On 12/3/10, Eric Blake <eblake@ > wrote:
> > Read the FAQ.  http://www.faqs.org/faqs/unix-faq/shell/bash/, E9.
> 
> Which says the en_US locale collates the upper and lower case letters like this:
> 	AaBb...Zz
> 
> I got that much :)  What I don't get is why someone would _want_ the
> collating sequence to be AaBb... or why that sequence was picked for
> en_US instead of using the natural order of A-Za-z.
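The FAQ entry's point can be modelled in a few lines. The sketch below is a simplification. It assumes that a bracket range such as [A-Z] matches every character whose position in the collating sequence lies between the two endpoints, which is roughly the behaviour the FAQ entry cited above describes. Real regex engines and locales add further rules. The two sequences are the ASCII order and the "AaBb...Zz" order quoted above. The names C_ORDER, DICT_ORDER and bracket_range are hypothetical.

```python
import string

# Letters in the C/POSIX locale: plain ASCII code order, "AB...Zab...z"
C_ORDER = string.ascii_uppercase + string.ascii_lowercase

# Simplified en_US-style sequence as quoted from the FAQ: "AaBb...Zz"
DICT_ORDER = "".join(u + l for u, l in zip(string.ascii_uppercase,
                                           string.ascii_lowercase))

def bracket_range(order, lo, hi):
    """Characters a range [lo-hi] matches if ranges follow `order` (hypothetical model)."""
    start, end = order.index(lo), order.index(hi)
    return order[start:end + 1]

for name, order in (("C", C_ORDER), ("AaBb", DICT_ORDER)):
    print(name, "[A-Z] ->", bracket_range(order, "A", "Z"))
    print(name, "[a-z] ->", bracket_range(order, "a", "z"))
```

Under the "AaBb" sequence, [A-Z] picks up every lowercase letter except z, and [a-z] picks up every uppercase letter except A. Under the C order, both ranges contain only one case.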

It's not the "natural" order; it's an arbitrary order that was chosen
back in 1963, when the ASCII code was defined.  It isn't used as the
"natural" order outside of computer systems, and it's not even the
natural order on all computer systems (see EBCDIC).

If you look into a hardcopy encyclopedia written in English, you'll
probably be quite comfortable with the fact that the words are ordered
lexicographically rather than by ASCII code.  Needless to say, the
ordering criteria for non-English languages may contain more characters
in the sequence; in German, for instance:

  "AaäBb...Ooö...Ssß...Uuü...Zz"
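A rough way to get a dictionary-style order like the German sequence above is a multi-level sort key. It compares base letters first, then accents, then case. The sketch below approximates this with Python's unicodedata module. It is not the collation table any real locale uses. It assumes "ß" sorts as "ss". To match the sequences quoted in this thread, it breaks case ties uppercase first; real locales may put lowercase first. The function names are hypothetical.

```python
import unicodedata

def strip_accents(s):
    # NFD splits "ä" into "a" + COMBINING DIAERESIS; drop the combining marks
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

def dictionary_key(s):
    # Level 1: base letters, case-insensitive; casefold() also maps "ß" to "ss"
    primary = strip_accents(s).casefold()
    # Level 2: unaccented characters sort before accented ones
    secondary = [unicodedata.normalize("NFD", c) != c for c in s]
    # Level 3: uppercase before lowercase, mirroring "AaBb..."
    tertiary = [c.islower() for c in s]
    return (primary, secondary, tertiary)

letters = ["b", "ä", "B", "a", "A"]
print(sorted(letters))                      # ['A', 'B', 'a', 'b', 'ä'] (code point order)
print(sorted(letters, key=dictionary_key))  # ['A', 'a', 'ä', 'B', 'b']
```

Plain sorted() reproduces the code-point order that LC_COLLATE=C gives. The key function only imitates the shape of a natural-language order.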

So, let's reiterate:

- If I need the order used by computers (plain character-code order),
  I say so:

   LC_COLLATE=C.UTF-8

- Otherwise, if I need the order of a natural language, I say so:

   LC_COLLATE=en_US.UTF-8
   LC_COLLATE=de_DE.UTF-8
   ...
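Programs don't hard-code either order; they consult LC_COLLATE. The following illustration is not from the thread. Python's locale module exposes the same setting: setlocale() selects the collation, and strxfrm() turns strings into keys that compare in that order. The "C" locale is always available. en_US.UTF-8 or de_DE.UTF-8 only work if they are installed, so the hypothetical helper sort_for_locale returns None when setlocale() raises locale.Error.

```python
import locale

def sort_for_locale(words, name):
    """Sort `words` by the LC_COLLATE rules of locale `name`; None if unavailable."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore previous collation

words = ["banana", "Apple", "apple", "Banana"]
print(sort_for_locale(words, "C"))            # ['Apple', 'Banana', 'apple', 'banana']
print(sort_for_locale(words, "en_US.UTF-8"))  # dictionary order if installed, else None
```

Restoring the saved setting in the finally clause keeps the change local to the call. This matters because LC_COLLATE is process-wide state.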


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple



  Copyright © 2019   by DJ Delorie     Updated Jul 2019