delorie.com/archives/browse.cgi   search  
Mail Archives: cygwin/2009/11/08/05:30:58

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Sun, 8 Nov 2009 11:30:38 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: 1.7] BUG - GREP slows to a crawl with large number of matches on a single file
Message-ID: <20091108103038.GY26344@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <26224019 DOT post AT talk DOT nabble DOT com> <4AF393C6 DOT 3000505 AT tlinx DOT org> <20091106033243 DOT GB30410 AT ednor DOT casa DOT cgf DOT cx> <4AF42027 DOT 80604 AT towo DOT net> <20091106135152 DOT GK26344 AT calimero DOT vinschen DOT de> <814i7o$49eric AT dmzms99802 DOT na DOT baesystems DOT com> <806a89db0911061422l290ff84u3d58cbbe1d3eface AT mail DOT gmail DOT com> <1257632832 DOT 5773 DOT 48 DOT camel AT fast> <26249599 DOT post AT talk DOT nabble DOT com>
MIME-Version: 1.0
In-Reply-To: <26249599.post@talk.nabble.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Nov  7 15:26, aputerguy wrote:
> 
> Changing LC_ALL also solved the problem for me.
> But it begs the question of how many other basic and take-for-granted
> functions might be affected by this apparent UTF-8 slowdown. And again we,
> are not talking about some minor overhead, we are talking about a slowdown
> of 1500X or 150,000%!!!!

Yeah, that's really still strange to me.  In my testing, the multibyte
to widechar conversion performed by grep in case of UTF-8 took only
1.5 up to 4 seconds for 10 times the number of input lines as in your
case.  It still puzzles me where the time is wasted in grep.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019