Mail Archives: cygwin/2016/03/21/12:31:06

X-Recipient: archive-cygwin AT delorie DOT com
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Subject: Re: BUG: grep (GNU grep) 2.24
To: cygwin AT cygwin DOT com
References: <70C6F637A7E2844391D854CF15BB7A8F5AAA58FF06 AT TUS1XCHEVSPIN42 DOT SYMC DOT SYMANTEC DOT COM>
From: Eric Blake <eblake AT redhat DOT com>
Openpgp: url=http://people.redhat.com/eblake/eblake.gpg
X-Enigmail-Draft-Status: N1110
Message-ID: <56F021B1.20802@redhat.com>
Date: Mon, 21 Mar 2016 10:30:41 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <70C6F637A7E2844391D854CF15BB7A8F5AAA58FF06@TUS1XCHEVSPIN42.SYMC.SYMANTEC.COM>
X-IsSubscribed: yes

On 03/21/2016 07:40 AM, Gordon Grimes wrote:
> Hi,
>
> I had generated a FILE by simply doing a 'find' on a directory and used
> grep to cull the results.  I wasn't working so I repeated and tried the
> following trivial 'grep':
>
> % wc -l FILE
> 48786
> % grep . FILE
> 2240
>
> Very wrong.

Umm, grep doesn't output counts unless you use 'grep -c'.  Also, one of
the big changes in recent grep is more efficient handling of encoding
errors.  Remember, the regular expression '.' is only supposed to match
valid characters, and an encoding error can cause grep to quit
checking.  So in all likelihood, your problem stems from the fact that
FILE contains an encoding error in your current locale.

But, as others have already pointed out, you didn't post a simple
reproducible example for us to confirm, nor tell us what locale you are
using, nor tell us whether you have tried LC_ALL=C to see if forcing a
single-byte locale with no encoding errors cleans up the problem.
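
A rough sketch of how that information could be gathered. The temp file
is a hypothetical stand-in for the real FILE, and \200 is the octal
spelling of the \x80 byte, used here only because plain POSIX printf
does not promise to understand \x escapes:

```shell
# Hypothetical stand-in for FILE: three lines, the middle one a lone 0x80 byte.
sample=$(mktemp)
printf 'a\n\200\nc\n' > "$sample"

# Report the locale settings in effect (fall back to the raw variables
# if the 'locale' utility is not installed).
locale 2>/dev/null || echo "LANG=${LANG-} LC_ALL=${LC_ALL-}"

# Count lines matching '.' in the current locale and again in the C locale.
# If the two numbers differ, encoding errors are the likely cause.
current_count=$(grep -c . "$sample")
c_count=$(LC_ALL=C grep -c . "$sample")
echo "current locale: $current_count"
echo "C locale:       $c_count"

rm -f "$sample"
```

Posting the locale output together with both counts would probably be
enough for others to tell a locale problem from a grep problem.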

So, as the grep maintainer, I'm awaiting proof that there is a problem
(or confirmation that the bug is on your end, and not in grep) before I
worry about putting out another build of grep.  Something like this is
repeatable:

$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 wc -l
3
$ printf 'a\n\x80\nc\n' | LC_ALL=en_US.UTF-8 src/grep -c .
2
$ printf 'a\n\x80\nc\n' | LC_ALL=C src/grep -c .
3

Note how wc counts \n characters, regardless of encoding errors
elsewhere, while grep -c skips the \x80 line because it contains nothing
but encoding errors in UTF-8.
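
To track down which lines of a larger file are affected, one possible
approach (a sketch, not something taken from the original report) is to
feed each line to iconv, which exits non-zero when its input is not
valid in the stated source encoding.  The temp file and variable names
are placeholders, and UTF-8 is assumed to be the intended encoding:

```shell
# Hypothetical sample data; substitute the real file. \200 is octal for \x80.
sample=$(mktemp)
printf 'a\n\200\nc\n' > "$sample"

# Collect the number of every line that is not valid UTF-8.
# iconv rejects an invalid input sequence regardless of the current locale.
bad_lines=$(
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
    printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 || echo "$n"
  done < "$sample"
)
echo "lines with encoding errors: $bad_lines"

rm -f "$sample"
```

This starts one iconv process per line, so it is slow on a file with
tens of thousands of lines, but it does not depend on which locales are
installed.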

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


