Mail Archives: cygwin/2010/12/04/17:35:17
Date: Sat, 4 Dec 2010 18:34:59 -0400
Subject: Re: Problem with Bash regex test case sensitivity
From: Lee <ler762 AT gmail DOT com>
To: cygwin AT cygwin DOT com

On 12/4/10, Lee Rothstein <lee@ > wrote:
> On 12/4/2010 10:06 AM, Corinna Vinschen wrote:
>
> > On Dec 4 10:05, Lee wrote:
>
> >> On 12/3/10, Eric Blake <eblake@ > wrote:
> >>> Read the FAQ. http://www.faqs.org/faqs/unix-faq/shell/bash/, E9.
>
> >> Which says the en_US locale collates the upper and lower case
> >> letters like this:
> >> AaBb...Zz
>
> >> I got that much :) What I don't get is why someone would _want_ the
> >> collating sequence to be AaBb... or why that sequence was picked for
> >> en_US instead of using the natural order of A-Za-z.
>
> > It's not the "natural" order, it's an arbitrary order which has been
> > chosen back in 1963 when the ASCII code has been defined. It's not used
> > as "natural" order outside of computer systems and it's not even the
> > natural order on some computer systems (See EBCDIC).
>
> > If you take a look into a hardcopy encyclopedia written in english,
> > you'll be very comfortable that the words are ordered lexicographically
> > instead of in ASCII coding, probably. Needless to say that ordering
> > criteria for non-english languages may contain more characters in the
> > sequence, in german for instance
>
> > "AaäBb...Ooö...Ssß...Uuü...Zz"
>
> > So, let's reiterate:
>
> > - If I need the order for the computer language, I say so:
>
> > LC_COLLATE=C.UTF-8
>
> > - Otherwise, if I need the order for the natural language, I
> > say so:
>
> > LC_COLLATE=en_US.UTF-8
> > LC_COLLATE=de_DE.UTF-8
> > ...
>
> Here's my takeaway, given Corinna's interesting and complete
> context, and my intents. (My intentions, BTW, are for my scripts
> to have as much generality as possible [given my limited skills
> ;-|].)
>
> Therefore, instead of using '[A-Z]' to represent caps, I should
> have used (?) the Posixly Correct, '[:upper:]'.

Close, you should have used '[[:upper:]]'.

$ cat t_regex
#!/bin/bash
# t_regex: Test test regex
# By Lee Rothstein, 2010-12-03, 16:27:38

regex_test () {
    echo -n "[A-Z] test: "
    if [[ "$1" =~ [A-Z] ]] ; then
        echo Contains Capital Letters: $1
    else
        echo Doesn\'t Contain Capital Letters: $1
    fi

    echo -n "[:upper:] test: "
    if [[ "$1" =~ [[:upper:]] ]] ; then
        echo Contains Capital Letters: $1
    else
        echo Doesn\'t Contain Capital Letters: $1
    fi
}

unset LC_COLLATE
export LANG="C.UTF-8"
echo "=== LANG=$LANG"
regex_test dfgh
regex_test Dfgh
echo
echo
export LANG="en_US.UTF-8"
echo "=== LANG=$LANG"
regex_test dfgh
regex_test Dfgh

~/src
$ ./t_regex
=== LANG=C.UTF-8
[A-Z] test: Doesn't Contain Capital Letters: dfgh
[:upper:] test: Doesn't Contain Capital Letters: dfgh
[A-Z] test: Contains Capital Letters: Dfgh
[:upper:] test: Contains Capital Letters: Dfgh
=== LANG=en_US.UTF-8
[A-Z] test: Contains Capital Letters: dfgh
[:upper:] test: Doesn't Contain Capital Letters: dfgh
[A-Z] test: Contains Capital Letters: Dfgh
[:upper:] test: Contains Capital Letters: Dfgh
~/src
$
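
If the goal is a script that behaves the same whatever locale the user happens to have, something along these lines might do. It is only a sketch: the function names has_upper and has_upper_c are made up for illustration and are not part of t_regex. The first relies on the POSIX character class, which does not depend on the collating sequence. The second keeps the [A-Z] range but forces the C locale for the match, which assumes you only care about ASCII capitals.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not from t_regex).

# Locale-independent: a POSIX character class does not depend on
# the collating sequence of the current locale.
has_upper () {
    [[ "$1" =~ [[:upper:]] ]]
}

# Alternative: keep the [A-Z] range but force the C locale for the
# match. The parentheses run the body in a subshell, so the caller's
# locale settings are left untouched. ASCII capitals only.
has_upper_c () (
    LC_ALL=C
    [[ "$1" =~ [A-Z] ]]
)

for word in dfgh Dfgh; do
    if has_upper "$word"; then
        echo "$word: contains capital letters"
    else
        echo "$word: doesn't contain capital letters"
    fi
done
```

Of the two, the character class is probably the better default, since it also covers non-ASCII capitals in locales that have them.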
Lee