Date: Sat, 4 Dec 2010 16:06:42 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Problem with Bash regex test case sensitivity

On Dec 4 10:05, Lee wrote:
> On 12/3/10, Eric Blake wrote:
> > Read the FAQ. http://www.faqs.org/faqs/unix-faq/shell/bash/, E9.
>
> Which says the en_US locale collates the upper and lower case letters like this:
> AaBb...Zz
>
> I got that much :) What I don't get is why someone would _want_ the
> collating sequence to be AaBb... or why that sequence was picked for
> en_US instead of using the natural order of A-Za-z.

It's not the "natural" order; it's an arbitrary order that was chosen
back in 1963, when the ASCII code was defined. It isn't used as the
"natural" order outside of computer systems, and it isn't even the
natural order on some computer systems (see EBCDIC). If you look into a
printed encyclopedia written in English, you'll probably be quite glad
that the words are in dictionary order rather than in ASCII code order.
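The ASCII order described above can be seen directly from the shell. The following is a minimal bash sketch, assuming bash 3.2 or later and a POSIX `sort`; the helper names `c_sort` and `is_upper_c` are hypothetical, made up for illustration. It forces the C collation, which gives plain ASCII order (every uppercase letter before every lowercase one) and makes the range `[A-Z]` in a `=~` test match uppercase letters only. The plain C locale is used instead of C.UTF-8 because it exists on every system; what `en_US.UTF-8` does with the same range depends on the installed locale data and C library version, so it is not shown.

```shell
#!/usr/bin/env bash

# Hypothetical helper: sort the arguments using the C collation (ASCII order).
# LC_ALL is unset inside the subshell because, when set, it overrides LC_COLLATE.
c_sort() {
  ( unset LC_ALL; export LC_COLLATE=C; printf '%s\n' "$@" | sort )
}

# Hypothetical helper: succeed if $1 is a single character in the range [A-Z]
# under the C collation. The subshell keeps the locale change from leaking out.
is_upper_c() {
  ( unset LC_ALL; export LC_COLLATE=C; [[ $1 =~ ^[A-Z]$ ]] )
}

c_sort b B a A        # prints A, B, a, b (one per line)

for letter in a B; do
  if is_upper_c "$letter"; then
    echo "$letter matches [A-Z]"
  else
    echo "$letter does not match [A-Z]"
  fi
done
```

Running it should print `A`, `B`, `a`, `b` on separate lines, then report that `a` does not match `[A-Z]` while `B` does.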
Needless to say, the ordering criteria for non-English languages may
include more characters in the sequence. In German, for instance:
"AaäBb...Ooö...Ssß...Uuü...Zz"

So, let's reiterate:

- If I need the order computers use (plain ASCII/code point order), I say so:

    LC_COLLATE=C.UTF-8

- Otherwise, if I need the order of a natural language, I say so:

    LC_COLLATE=en_US.UTF-8
    LC_COLLATE=de_DE.UTF-8
    ...


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat