X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=0.1 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
Message-ID: <4CFAB766.9030900@veritech.com>
Date: Sat, 04 Dec 2010 16:49:26 -0500
From: Lee Rothstein
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.12) Gecko/20101027 Lightning/1.0b2 Thunderbird/3.1.6
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: Problem with Bash regex test case sensitivity
References: <4CF96F70 DOT 3090507 AT veritech DOT com>
 <4CF9BA08 DOT 8060703 AT redhat DOT com>
 <20101204150642 DOT GA26471 AT calimero DOT vinschen DOT de>
In-Reply-To: <20101204150642.GA26471@calimero.vinschen.de>
Content-Type: multipart/mixed;
 boundary="------------060503060205090409040107"
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

--------------060503060205090409040107
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 12/4/2010 10:06 AM, Corinna Vinschen wrote:
> On Dec  4 10:05, Lee wrote:
>> On 12/3/10, Eric Blake wrote:
>>> Read the FAQ.  http://www.faqs.org/faqs/unix-faq/shell/bash/, E9.
>> Which says the en_US locale collates the upper and lower case
>> letters like this:
>>     AaBb...Zz
>> I got that much :)  What I don't get is why someone would _want_ the
>> collating sequence to be AaBb... or why that sequence was picked for
>> en_US instead of using the natural order of A-Za-z.
> It's not the "natural" order, it's an arbitrary order which has been
> chosen back in 1963 when the ASCII code has been defined.  It's not used
> as "natural" order outside of computer systems and it's not even the
> natural order on some computer systems (See EBCDIC).
>
> If you take a look into a hardcopy encyclopedia written in english,
> you'll be very comfortable that the words are ordered lexicographically
> instead of in ASCII coding, probably.  Needless to say that ordering
> criteria for non-english languages may contain more characters in the
> sequence, in german for instance
>
>   "AaäBb...Ooö...Ssß...Uuü...Zz"
>
> So, let's reiterate:
>
> - If I need the order for the computer language, I say so:
>
>     LC_COLLATE=C.UTF-8
>
> - Otherwise, if I need the order for the natural language, I
>   say so:
>
>     LC_COLLATE=en_US.UTF-8
>     LC_COLLATE=de_DE.UTF-8
>     ...

Here's my takeaway, given Corinna's interesting and complete context
and my own intent. (My intent, BTW, is for my scripts to be as general
as possible [given my limited skills ;-|].)

Therefore, instead of using '[A-Z]' to represent caps, I should have
used (?) the Posixly Correct '[:upper:]'.

However, with this change the attached test script still doesn't work
on either my Cygwin config or a Linux config. (I have not yet made the
environment variable changes indicated above, since I am still waiting
for clarification of the new issue I bring up here.)

The latter test would, IMHO, seem to imply that the changes to 'NIX
shells were mandated by I18N considerations, BUT the other required
changes in code or default settings were NOT implemented. This would
seem to penalize only those folks who are conversant with the
long-term conventions of the 'NIX world.

Please correct my misunderstanding if I'm wrong!

Lee

--------------060503060205090409040107
Content-Type: text/plain;
 name="t_regex"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="t_regex"

#!/bin/bash

# t_regex: Tutorial on regex, test

# By Lee Rothstein, 2010-12-04, 13:57:54

# Each Test performed on:

# * CYGWIN_NT-6.0-WOW64 1.7.7(0.230/5/3)
# * Linux 2.6.15-55-amd64-generic

#if [[ "$1" =~ [A-Z] ]] ; then       # doesn't work
if [[ "$1" =~ [:upper:] ]] ; then   # doesn't work
#if [[ "$1" =~ [ABCDEFGHIJKLMNOPQRSTUVWXYZ] ]] ; then  # Works
  echo Contains Capital Letters: $1
else
  echo Doesn\'t Contain Capital Letters: $1
fi

--------------060503060205090409040107
Content-Type: text/plain; charset=us-ascii

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

--------------060503060205090409040107--
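
One possible explanation for the '[:upper:]' result in the attached t_regex script, offered as a sketch and not as a definitive diagnosis: in POSIX regular expressions a character class name such as [:upper:] only has its special meaning inside a bracket expression, so the usual spelling is '[[:upper:]]'. Written alone, '[:upper:]' is generally parsed as a plain bracket expression listing the characters ':', 'u', 'p', 'e' and 'r'. The helper names has_upper and bare_class_match below are hypothetical and are not part of the original script; the behaviour assumes bash 3.2 or later with a POSIX-style regex library.

```shell
#!/bin/bash
# Sketch only. has_upper and bare_class_match are hypothetical helper names,
# introduced here for illustration; they are not in the attached t_regex.

# Succeeds (exit status 0) if "$1" contains at least one uppercase letter.
has_upper() {
  # Outer [ ] is the bracket expression; inner [:upper:] is the POSIX class.
  # The class follows the locale's character classification (LC_CTYPE),
  # so it does not depend on the collation order discussed in this thread.
  local re='[[:upper:]]'
  [[ "$1" =~ $re ]]
}

# For contrast: the pattern as written in the attached script.
# This is assumed to match any one of the literal characters : u p e r.
bare_class_match() {
  local re='[:upper:]'
  [[ "$1" =~ $re ]]
}

for word in "Hello" "hello" "HELLO"; do
  if has_upper "$word"; then
    echo "Contains capital letters: $word"
  else
    echo "Doesn't contain capital letters: $word"
  fi
done
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of =~ avoids bash treating it as a literal string. Under this reading, the bare form reports a match for "hello" (because of the 'e') and no match for "HELLO", which would be consistent with the "doesn't work" comment in the script.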