Date: Sat, 4 Dec 2010 17:08:25 -0400
Subject: Re: Problem with Bash regex test case sensitivity
From: Lee
To: cygwin AT cygwin DOT com

On 12/4/10, Corinna Vinschen wrote:
> On Dec  4 10:05, Lee wrote:
>> On 12/3/10, Eric Blake wrote:
>> > Read the FAQ.  http://www.faqs.org/faqs/unix-faq/shell/bash/, E9.
>>
>> Which says the en_US locale collates the upper and lower case letters like
>> this:
>>   AaBb...Zz
>>
>> I got that much :)  What I don't get is why someone would _want_ the
>> collating sequence to be AaBb... or why that sequence was picked for
>> en_US instead of using the natural order of A-Za-z.
>
> It's not the "natural" order, it's an arbitrary order which has been
> chosen back in 1963 when the ASCII code has been defined.  It's not used
> as "natural" order outside of computer systems and it's not even the
> natural order on some computer systems (See EBCDIC).

My idea of "natural order" is treating each character as an unsigned integer.
So even though ASCII has a different collating sequence than EBCDIC,
the characters are still treated as unsigned integers when sorting
them.  Setting LANG to something other than C seems to break that
model..

> If you take a look into a hardcopy encyclopedia written in english,
> you'll be very comfortable that the words are ordered lexicographically
> instead of in ASCII coding, probably.

I never paid all that much attention to how the words were ordered,
but now that I have.. they're backwards!  "god" comes before "God",
"hopper" before "Hopper", etc.

> Needless to say that ordering
> criteria for non-english languages may contain more characters in the
> sequence, in german for instance
>
>   "AaäBb...Ooö...Ssß...Uuü...Zz"
>
> So, let's reiterate:
>
> - If I need the order for the computer language, I say so:
>
>     LC_COLLATE=C.UTF-8
>
> - Otherwise, if I need the order for the natural language, I say so:
>
>     LC_COLLATE=en_US.UTF-8
>     LC_COLLATE=de_DE.UTF-8

You're quite good at explaining this..  I think I'm actually beginning
to understand it :)

So... the reason for setting LANG is a shorthand method of setting all
the LC_xxx environment variables?

Thanks,
Lee
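
For anyone who wants to compare the two orderings discussed in this thread, here is a minimal sketch. It is only an illustration: the word list is made up from the encyclopedia example, and the second command assumes an en_US.UTF-8 locale is installed, which may not be true on every system. LC_ALL is used because, under POSIX, it takes precedence over both LANG and the individual LC_* variables.

```shell
#!/bin/sh
# Hypothetical sample words, borrowed from the encyclopedia example.
words='god
God
hopper
Hopper'

# LC_ALL overrides LANG and every LC_* variable, so this forces
# byte-value ("unsigned integer") ordering regardless of the environment.
c_order=$(printf '%s\n' "$words" | LC_ALL=C sort | paste -s -d ' ' -)
echo "C locale:     $c_order"

# Assumption: the en_US.UTF-8 locale exists on this machine. If it does
# not, sort may fall back to the C ordering, so no output is promised here.
en_order=$(printf '%s\n' "$words" | LC_ALL=en_US.UTF-8 sort | paste -s -d ' ' -)
echo "en_US locale: $en_order"
```

In the C locale the capitalized words sort first, because "G" and "H" have smaller byte values than "g" and "h". What the second line prints depends on the collation tables shipped with the system.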