Mailing-List: | contact cygwin-help AT cygwin DOT com; run by ezmlm |
List-Id: | <cygwin.cygwin.com> |
List-Subscribe: | <mailto:cygwin-subscribe AT cygwin DOT com> |
List-Archive: | <http://sourceware.org/ml/cygwin/> |
List-Post: | <mailto:cygwin AT cygwin DOT com> |
List-Help: | <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs> |
Sender: | cygwin-owner AT cygwin DOT com |
Mail-Followup-To: | cygwin AT cygwin DOT com |
Delivered-To: | mailing list cygwin AT cygwin DOT com |
Subject: | Re: BUG: grep (GNU grep) 2.24 |
To: | cygwin AT cygwin DOT com |
References: | <70C6F637A7E2844391D854CF15BB7A8F5AAA58FF06 AT TUS1XCHEVSPIN42 DOT SYMC DOT SYMANTEC DOT COM> |
From: | Eric Blake <eblake AT redhat DOT com> |
Openpgp: | url=http://people.redhat.com/eblake/eblake.gpg |
Message-ID: | <56F021B1.20802@redhat.com> |
Date: | Mon, 21 Mar 2016 10:30:41 -0600 |
User-Agent: | Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 |
MIME-Version: | 1.0 |
In-Reply-To: | <70C6F637A7E2844391D854CF15BB7A8F5AAA58FF06@TUS1XCHEVSPIN42.SYMC.SYMANTEC.COM> |
X-IsSubscribed: | yes |

On 03/21/2016 07:40 AM, Gordon Grimes wrote:
> Hi,
>
> I had generated a FILE by simply doing a 'find' on a directory and used grep to cull the results. I wasn't working so I repeated and tried the following trivial 'grep':
>
> % wc -l FILE
> 48786
> % grep . FILE
> 2240
>
> Very wrong.

Umm, grep doesn't output counts unless you use 'grep -c'.

Also, one of the big changes in recent grep is more efficient handling of encoding errors; remember, the regular expression '.' is only supposed to match valid characters, and that an encoding error can cause grep to quit checking; so in all likelihood, your problem stems from the fact that the contents of FILE contain an encoding error in your current locale.

But, as others have already pointed out, you didn't post a simple reproducible example for us to confirm, nor tell us what locale you are using, nor tell us whether you have tried LC_ALL=C to see if forcing a single-byte locale with no encoding errors cleans up the problem. So, as the grep maintainer, I'm awaiting proof that there is a problem (or confirmation that the bug is on your end, and not in grep) before I worry about putting out another build of grep.

Something like this is repeatable:

$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 wc -l
3
$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 src/grep -c .
2
$ printf 'a\n\x80\nc\n' | LC_ALL=C src/grep -c .
3

Note how wc counts \n characters, regardless of encoding errors elsewhere, while grep -c skips the \x80 line because it contains nothing but encoding errors in UTF-8.
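
For readers without the relevant locales installed, the difference between the two counts can be approximated in a few lines of Python. This is only a rough model of the behavior described above, not grep's implementation: the function names are made up for illustration, and "latin-1" is an assumed stand-in for a single-byte locale such as C, where every byte counts as a valid character.

```python
def count_newlines(data: bytes) -> int:
    """Model of 'wc -l': count newline bytes, ignoring encoding entirely."""
    return data.count(b"\n")


def count_lines_with_valid_char(data: bytes, encoding: str) -> int:
    """Rough model of 'grep -c .': count lines holding at least one
    character that decodes cleanly in the given encoding.

    Bytes that are invalid in the encoding are dropped before the check,
    so a line made up only of encoding errors (or an empty line) is
    not counted.
    """
    count = 0
    for line in data.split(b"\n"):
        if line.decode(encoding, errors="ignore"):
            count += 1
    return count


# Same three-line input as the printf example: 'a', a lone 0x80 byte, 'c'.
sample = b"a\n\x80\nc\n"

print(count_newlines(sample))                          # 3
print(count_lines_with_valid_char(sample, "utf-8"))    # 2
print(count_lines_with_valid_char(sample, "latin-1"))  # 3
```

The lone 0x80 byte is not a valid UTF-8 sequence, so the middle line contributes nothing in the UTF-8 case, while in a single-byte encoding it is an ordinary character and the line is counted.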

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org