Date: Fri, 6 Nov 2009 08:47:51 -0500
From: Christopher Faylor
To: cygwin AT cygwin DOT com
Subject: Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file

On Fri, Nov 06, 2009 at 02:09:59PM +0100, Thomas Wolff wrote:
>Christopher Faylor wrote:
>> On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
>>
>>> aputerguy wrote:
>>>
>>>> Running grep on a 20MB file with ~100,000 matches takes an incredible almost
>>>> 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
>>>> (on a 2nd machine).
>>>>
>>> I've seen nasty behavior with grep that isn't cygwin specific. Try
>>> "pcregrep" and see if you have the same issue.
>>>
>>> I found it to be about ~100 times faster under _some_ searches though
>>> 2-3x is more typical. The gnu re-parser isn't real efficient under
>>> some circumstances.
>>>
>>> If you find a big difference, you might also want to report it to the
>>> bug-grep AT gnu DOT org mailing list, but last time I did, they told me
>>> "that's the way it is" due to some posix conformance thing...
>>>
>>
>> The fact that it behaves differently between Cygwin 1.5 and 1.7 would
>> suggest that this isn't a grep problem.
>>
>This is likely to be triggered by the transition to UTF-8 as a default
>charset. The same problem is observed on Linux, with grep as well as
>with sed.
>That's why I have changed most of my shell scripts to use something like
>LC_ALL=C grep or LC_ALL=C sed
>where possible. Please try this.

Thanks for catching this. I'll hold off on trying the test case until I
hear a report about running the same test with LC_ALL=C.

cgf
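
For anyone wanting to run the comparison requested above, here is a minimal sketch, assuming a POSIX shell with grep and mktemp available. The sample file, its contents, and the variable names are made up for illustration and are not the original reporter's test case. It only shows the mechanics of overriding the locale for a single command.

```shell
#!/bin/sh
# Hypothetical sample data: 1000 ASCII lines that all match the pattern.
sample=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i: match me"
    i=$((i + 1))
done > "$sample"

# Run grep under whatever locale the environment provides
# (UTF-8 by default on Cygwin 1.7, per the thread).
default_count=$(grep -c 'match' "$sample")

# Same search with the locale forced to C for this one command only;
# the surrounding shell's locale is left untouched.
c_count=$(LC_ALL=C grep -c 'match' "$sample")

echo "default locale: $default_count matches"
echo "C locale:       $c_count matches"

rm -f "$sample"
```

To compare speed rather than counts, prefix each grep invocation with `time` in an interactive shell and point it at the real 20MB file. Note that LC_ALL=C makes grep treat input as single bytes, so patterns that rely on multibyte characters or locale-aware case folding may match differently.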