From: "Zack Weinberg"
Date: Fri, 8 Jun 2001 09:59:32 -0700
To: Eli Zaretskii
Cc: dj AT redhat DOT com, gcc AT gcc DOT gnu DOT org, gdb AT sources DOT redhat DOT com, binutils AT sources DOT redhat DOT com, cygwin AT sources DOT redhat DOT com
Subject: Re: Another RFC: regex in libiberty
Message-ID: <20010608095932.S979@stanford.edu>
In-Reply-To: <9003-Fri08Jun2001100651+0300-eliz@is.elta.co.il>

On Fri, Jun 08, 2001 at 10:06:51AM +0300, Eli Zaretskii wrote:
>
> One notorious problem with GNU regex is that it is quite slow for many
> simple jobs, such as matching a simple regular expression with no
> backtracking.  It seems that the main reason for this slowness is the
> fact that GNU regex supports null characters in strings.  For
> example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
> on simple jobs than the same Sed compiled with Spencer's regex
> library.

I think the null characters are a red herring.

I looked into GNU regex's performance last year, in the context of
GCC's fixincludes program.  On a platform that has mostly-okay
headers, fixincludes spends most of its time matching regular
expressions.  The regex.c that came with GDB 4.18, which I think is
the one that got spread around widely, had a bug in its implementation
of the POSIX regcomp/regexec interface, which caused a major
performance hit.  That bug has been fixed in GNU libc for a long time.
When I replaced fixincludes' copy of regex.c with a more recent
version from glibc, fixincludes was sped up by a factor of nine.
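
A comparison of this kind can be repeated with a small timing harness
built on the POSIX regcomp/regexec interface.  The following is only a
sketch under stated assumptions: time_regexec is a hypothetical helper
name, and any pattern or input passed to it is illustrative, not taken
from fixincludes or Sed.  It does not try to reproduce the bug itself;
it just measures CPU time spent in regexec so that the same file can
be built against each regex.c under test.

```c
/* Timing sketch for the POSIX regcomp/regexec interface.
   time_regexec is a hypothetical helper, not part of any library. */
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Compile PATTERN once as an extended regex, run it ITERATIONS times
   against TEXT, and return the CPU seconds spent in the matching loop.
   Returns -1.0 if the pattern does not compile.  On success, *MATCHES
   receives the number of iterations in which regexec() found a match. */
double time_regexec(const char *pattern, const char *text,
                    long iterations, long *matches)
{
    regex_t re;
    clock_t start, stop;
    long i, hits = 0;

    assert(iterations > 0);

    /* REG_NOSUB: only a yes/no answer is needed, no submatch offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1.0;

    start = clock();
    for (i = 0; i < iterations; i++) {
        if (regexec(&re, text, 0, NULL, 0) == 0)
            hits++;
    }
    stop = clock();

    regfree(&re);
    *matches = hits;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_regexec from a small main with a preprocessor-style
pattern and a header-like line, then linking the program once against
each regex implementation, gives numbers that can be compared
directly.  The iteration count should be large enough that clock()
resolution does not dominate.
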
The same bug affects Sed 3.02: replace the regex.c it ships with by
the one from glibc 2.2.x and I bet you'll see better performance.
There's some discussion in these messages:

http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00764.html
http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00765.html

The relevant fix is in there, too, if you want to pull it out and
apply it.

I did some benchmarking of fixincludes with Spencer's regexp library
as well.  IIRC, it was about the same as the fixed GNU regex.c.

-- 
zw      This is, no doubt, the rational strategy; quite possibly the only one
        that will work.  But it ignores the exigencies of the tenure system
        and is therefore impractical.
        -- Jerry Fodor, _The Mind Doesn't Work That Way_