Date: Sat, 4 Dec 2010 18:34:59 -0400
Subject: Re: Problem with Bash regex test case sensitivity
From: Lee
To: cygwin AT cygwin DOT com

On 12/4/10, Lee Rothstein wrote:
> On 12/4/2010 10:06 AM, Corinna Vinschen wrote:
>
> > On Dec 4 10:05, Lee wrote:
>
> >> On 12/3/10, Eric Blake wrote:
> >>> Read the FAQ. http://www.faqs.org/faqs/unix-faq/shell/bash/, E9.
>
> >> Which says the en_US locale collates the upper and lower case
> >> letters like this:
> >> AaBb...Zz
>
> >> I got that much :) What I don't get is why someone would _want_ the
> >> collating sequence to be AaBb... or why that sequence was picked for
> >> en_US instead of using the natural order of A-Za-z.
>
> > It's not the "natural" order, it's an arbitrary order which has been
> > chosen back in 1963 when the ASCII code has been defined. It's not used
> > as "natural" order outside of computer systems and it's not even the
> > natural order on some computer systems (See EBCDIC).
>
> > If you take a look into a hardcopy encyclopedia written in english,
> > you'll be very comfortable that the words are ordered lexicographically
> > instead of in ASCII coding, probably. Needless to say that ordering
> > criteria for non-english languages may contain more characters in the
> > sequence, in german for instance
>
> > "AaäBb...Ooö...Ssß...Uuü...Zz"
>
> > So, let's reiterate:
>
> > - If I need the order for the computer language, I say so:
>
> > LC_COLLATE=C.UTF-8
>
> > - Otherwise, if I need the order for the natural language, I
> > say so:
>
> > LC_COLLATE=en_US.UTF-8
> > LC_COLLATE=de_DE.UTF-8
> > ...
>
> Here's my takeaway, given Corinna's interesting and complete
> context, and my intents. (My intentions, BTW, are for my scripts
> to have as much generality as possible [given my limited skills
> ;-|].)
>
> Therefore, instead of using '[A-Z]' to represent caps, I should
> have used (?) the Posixly Correct, '[:upper:]'.

Close, you should have used '[[:upper:]]'

$ cat t_regex
#!/bin/bash
# t_regex: Test test regex
# By Lee Rothstein, 2010-12-03, 16:27:38

regex_test () {
  echo -n "[A-Z] test: "
  if [[ "$1" =~ [A-Z] ]] ; then
    echo Contains Capital Letters: $1
  else
    echo Doesn\'t Contain Capital Letters: $1
  fi

  echo -n "[:upper:] test: "
  if [[ "$1" =~ [[:upper:]] ]] ; then
    echo Contains Capital Letters: $1
  else
    echo Doesn\'t Contain Capital Letters: $1
  fi
}

unset LC_COLLATE

export LANG="C.UTF-8"
echo "=== LANG=$LANG"
regex_test dfgh
regex_test Dfgh
echo
echo
export LANG="en_US.UTF-8"
echo "=== LANG=$LANG"
regex_test dfgh
regex_test Dfgh

~/src
$ ./t_regex
=== LANG=C.UTF-8
[A-Z] test: Doesn't Contain Capital Letters: dfgh
[:upper:] test: Doesn't Contain Capital Letters: dfgh
[A-Z] test: Contains Capital Letters: Dfgh
[:upper:] test: Contains Capital Letters: Dfgh


=== LANG=en_US.UTF-8
[A-Z] test: Contains Capital Letters: dfgh
[:upper:] test: Doesn't Contain Capital Letters: dfgh
[A-Z] test: Contains Capital Letters: Dfgh
[:upper:] test: Contains Capital Letters: Dfgh

~/src
$

Lee
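
For a script that should give the same answer in every locale, the '[[:upper:]]' fix could be wrapped up roughly as follows. This is only a sketch: the function name has_upper is made up for illustration and is not part of t_regex, and it assumes a bash new enough to have the =~ operator (3.0 or later).

```shell
#!/bin/bash
# has_upper: hypothetical helper, not taken from t_regex.
# Succeeds (exit status 0) if the argument contains at least one
# uppercase letter. It relies on the POSIX character class instead of
# the range A-Z, so the answer does not depend on how the current
# locale interleaves upper and lower case in its collating sequence.
has_upper () {
    # Keep the pattern in a variable so it is never quoted on the
    # right-hand side of =~ (a quoted pattern is matched literally).
    local re='[[:upper:]]'
    [[ "$1" =~ $re ]]
}

for word in dfgh Dfgh; do
    if has_upper "$word"; then
        echo "Contains Capital Letters: $word"
    else
        echo "Doesn't Contain Capital Letters: $word"
    fi
done
```

If a range such as '[A-Z]' really is wanted, the other route is the one from the quoted message: ask for the computer ordering explicitly with LC_COLLATE=C.UTF-8 before the test runs.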