X-Recipient: archive-cygwin@delorie.com
X-Spam-Check-By: sourceware.org
Date: Fri, 6 Nov 2009 08:47:51 -0500
From: Christopher Faylor <cgf-use-the-mailinglist-please@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Message-ID: <20091106134751.GA1311@ednor.casa.cgf.cx>
Reply-To: cygwin@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
References: <26224019.post@talk.nabble.com>  <4AF393C6.3000505@tlinx.org>  <20091106033243.GB30410@ednor.casa.cgf.cx>  <4AF42027.80604@towo.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4AF42027.80604@towo.net>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Precedence: bulk
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie.com@cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe@cygwin.com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin@cygwin.com>
List-Help: <mailto:cygwin-help@cygwin.com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner@cygwin.com
Delivered-To: mailing list cygwin@cygwin.com

On Fri, Nov 06, 2009 at 02:09:59PM +0100, Thomas Wolff wrote:
>Christopher Faylor wrote:
>> On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
>>   
>>> aputerguy wrote:
>>>     
>>>> Running grep on a 20MB file with ~100,000 matches takes an incredible almost
>>>> 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
>>>> (on a 2nd machine).
>>>>       
>>> I've seen nasty behavior with grep that isn't Cygwin-specific.  Try
>>> "pcregrep" and see if you have the same issue.
>>>
>>> I found it to be about 100 times faster under _some_ searches, though
>>> 2-3x is more typical.  The GNU regex parser isn't very efficient under
>>> some circumstances.
>>>
>>> If you find a big difference, you might also want to report it to the
>>> bug-grep@gnu.org mailing list, but last time I did, they told me
>>> "that's the way it is" due to some posix conformance thing...
>>>     
>>
>> The fact that it behaves differently between Cygwin 1.5 and 1.7 would
>> suggest that this isn't a grep problem.
>>   
>This is likely to be triggered by the transition to UTF-8 as a default 
>charset. The same problem is observed on Linux, with grep as well as 
>with sed.
>That's why I have changed most of my shell scripts to use something like
>LC_ALL=C grep or LC_ALL=C sed
>where possible. Please try this.

Thanks for catching this.  I'll hold off on trying the test case until I
hear a report about running the same test with LC_ALL=C.
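
A minimal sketch of such a comparison, assuming a synthetic file rather
than the original poster's data.  The file contents, the line count, and
the pattern "needle" are illustrative and do not come from the report;
only the LC_ALL=C prefix is taken from the suggestion quoted above.

```shell
#!/bin/sh
# Hypothetical reproduction: contents, size, and pattern are assumptions,
# not the data from the original report.
tmpfile=$(mktemp) || exit 1

# Build a file in which every one of 100,000 lines matches the pattern.
awk 'BEGIN { for (i = 0; i < 100000; i++) print "line " i " contains the needle here" }' > "$tmpfile"

# Show which locale setting the first grep run inherits from the environment.
echo "inherited locale: ${LC_ALL:-${LC_CTYPE:-${LANG:-unset}}}"

# Count matching lines with the inherited locale, then with the C locale forced.
count_default=$(grep -c 'needle' "$tmpfile")
count_c=$(LC_ALL=C grep -c 'needle' "$tmpfile")

echo "inherited locale: $count_default matching lines"
echo "LC_ALL=C:         $count_c matching lines"

rm -f "$tmpfile"
```

Both runs should report the same count.  Running each grep under the
shell's "time" (before the file is removed) gives the wall-clock
comparison; a large gap between the two would point at the locale
rather than at Cygwin itself.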

cgf
