Message-ID: | <4AF42027.80604@towo.net> |
Date: | Fri, 06 Nov 2009 14:09:59 +0100 |
From: | Thomas Wolff <towo AT towo DOT net> |
User-Agent: | Thunderbird 2.0.0.23 (Windows/20090812) |
MIME-Version: | 1.0 |
To: | cygwin AT cygwin DOT com |
Subject: | Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file |
References: | <26224019 DOT post AT talk DOT nabble DOT com> <4AF393C6 DOT 3000505 AT tlinx DOT org> <20091106033243 DOT GB30410 AT ednor DOT casa DOT cgf DOT cx> |
In-Reply-To: | <20091106033243.GB30410@ednor.casa.cgf.cx> |
X-IsSubscribed: | yes |
Mailing-List: | contact cygwin-help AT cygwin DOT com; run by ezmlm |
List-Id: | <cygwin.cygwin.com> |
List-Subscribe: | <mailto:cygwin-subscribe AT cygwin DOT com> |
List-Archive: | <http://sourceware.org/ml/cygwin/> |
List-Post: | <mailto:cygwin AT cygwin DOT com> |
List-Help: | <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs> |
Sender: | cygwin-owner AT cygwin DOT com |
Mail-Followup-To: | cygwin AT cygwin DOT com |
Delivered-To: | mailing list cygwin AT cygwin DOT com |
Christopher Faylor wrote:
> On Thu, Nov 05, 2009 at 07:11:02PM -0800, Linda Walsh wrote:
>
>> aputerguy wrote:
>>
>>> Running grep on a 20MB file with ~100,000 matches takes an incredible almost
>>> 8 minutes under Cygwin 1.7 while taking just 0.2 seconds under Cygwin 1.5
>>> (on a 2nd machine).
>>>
>> I've seen nasty behavior with grep that isn't cygwin specific. Try
>> "pcregrep" and see if you have the same issue.
>>
>> I found it to be about ~100 times faster under _some_ searches though
>> 2-3x is more typical. The gnu re-parser isn't real efficient under
>> some circumstances.
>>
>> If you find a big difference, you might also want to report it to the
>> bug-grep AT gnu DOT org mailing list, but last time I did, they told me
>> "that's the way it is" due to some posix conformance thing...
>>
>
> The fact that it behaves differently between Cygwin 1.5 and 1.7 would
> suggest that this isn't a grep problem.
>

This is likely to be triggered by the transition to UTF-8 as the default
charset. The same problem is observed on Linux, with grep as well as with
sed. That's why I have changed most of my shell scripts to use something
like
    LC_ALL=C grep
or
    LC_ALL=C sed
where possible. Please try this.

The problem *is* with grep (and sed), however: there is no good reason
that UTF-8 should give us a penalty of being 100 times slower on most
search operations. This is just poor programming of grep and sed.

Thomas

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
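To make the suggested workaround concrete, here is a minimal sketch of running the same search once in the current locale and once with LC_ALL=C. The sample file, its contents, and the variable names are made up for illustration; they are not the 20MB file from the original report, and a file this small will not reproduce the slowdown.

```shell
#!/bin/sh
# Build a small throwaway sample file (hypothetical contents).
sample=$(mktemp)
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: match here"
    echo "line $i: nothing"
    i=$((i + 1))
done > "$sample"

# Same search twice: once in whatever locale is active (often UTF-8),
# once forced to the C locale for this single command only.
count_default=$(grep -c 'match' "$sample")
count_c=$(LC_ALL=C grep -c 'match' "$sample")

echo "default locale: $count_default"
echo "C locale:       $count_c"

rm -f "$sample"
```

To compare speed on a real file, prefix each grep invocation with `time` in a shell that provides it. Note that LC_ALL=C makes grep and sed match byte-wise, so patterns that rely on non-ASCII character classes or case folding may behave differently; for plain ASCII patterns like the one above, the results should be the same.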