Mail Archives: cygwin/2011/04/04/18:19:31

Message-ID: <4D9A43E4.50305@redhat.com>
Date: Mon, 04 Apr 2011 16:19:16 -0600
From: Eric Blake <eblake AT redhat DOT com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110307 Fedora/3.1.9-0.39.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.9
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: grep problem?
References: <CAD31248BE809F4A869854C597A558EE9433F274 AT TROUX-EX01 DOT hq DOT troux DOT com>
In-Reply-To: <CAD31248BE809F4A869854C597A558EE9433F274@TROUX-EX01.hq.troux.com>
OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On 04/04/2011 04:09 PM, Jim Garrison wrote:
> I'm getting weird behavior from grep. Searching for a bracketed range of
> characters (i.e. [A-Z]) is doing case-insensitive matching, while an
> identical but explicit character set match (i.e. [ABCDE...Z]) does not.

Your problem is not with grep, but with your LC_COLLATE setting (which
LC_ALL overrides when it is set, and which otherwise falls back to LANG).
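
To check which collation locale is actually in effect, the usual POSIX
precedence (LC_ALL first, then LC_COLLATE, then LANG) can be sketched in
plain shell. The function name effective_collate is hypothetical, and
the final fallback to C is an assumption: POSIX leaves the default
implementation-defined, although C/POSIX is the common choice.

```shell
# Hypothetical helper: report the locale name that governs collation.
# An empty variable is treated the same as an unset one.
effective_collate() {
  # LC_ALL wins, then LC_COLLATE, then LANG; "C" is an assumed default.
  printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collate
```

Running "locale" with no arguments should report the same information for
every category at once.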

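POSIX states that range expressions (such as [A-Z]) have unspecified
behavior in any locale except C (also known as POSIX); and some locales
(like en_US.UTF-8) happen to treat A-B as AaB, A-b as AaBb, and so forth
(that is, they interleave upper- and lowercase letters in the collation
order instead of sorting all uppercase letters first).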

>
> $ grep '[a-b]' test.dat
> abcde
> ABCDE

So, in a case-insensitive collation, this range expression includes at
least one of A or B (but probably not both); and since that matches the
ABCDE line, you get a correct result for the collation locale you requested.

>
> Contrast with the correctly-working examples below
>
> $ grep '[ab]' test.dat
> abcde

Here, there's no range, so there's no ambiguity.
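
As a rough illustration of range-free alternatives, the sketch below
compares an explicit list with a POSIX character class. Neither depends on
collation order, so both should select only the lowercase line in any
locale. The two-line sample file is an assumption reconstructed from the
output quoted in this thread, not the original test.dat.

```shell
# Build a throwaway sample file (contents assumed from the quoted output).
sample=$(mktemp)
printf 'abcde\nABCDE\n' > "$sample"

# Explicit list: matches only a literal "a" or "b".
list_matches=$(grep '[ab]' "$sample")

# Character class: matches any lowercase letter, with no range involved.
class_matches=$(grep '[[:lower:]]' "$sample")

printf 'list:  %s\nclass: %s\n' "$list_matches" "$class_matches"
rm -f "$sample"
```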

Also, try "LC_ALL=C grep '[a-b]' test.dat" to see a difference.
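
A self-contained version of that comparison might look like the following
sketch. The sample file contents are again an assumption based on the
output quoted above. In the C locale the range [a-b] covers only the two
characters a and b, so only the lowercase line matches; what the second
command prints depends on the session locale and the grep version.

```shell
# Build a throwaway sample file (contents assumed from the quoted output).
sample=$(mktemp)
printf 'abcde\nABCDE\n' > "$sample"

# Force the C locale for this one command: [a-b] means exactly a or b.
c_matches=$(LC_ALL=C grep '[a-b]' "$sample")
printf 'C locale: %s\n' "$c_matches"

# Same search under whatever locale the session already uses.
grep '[a-b]' "$sample"

rm -f "$sample"
```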

-- 
Eric Blake   eblake AT redhat DOT com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


