From: Radical DOT NetSurfer AT delorie DOT com
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Benchmark Testing
Date: Thu, 27 Sep 2001 09:06:49 -0400
Organization: Posted via Supernews, http://www.supernews.com
Message-ID: <ep86rtg0mp41toqlis77ojcd50mojb3tc7@4ax.com>
References: <b4o2rtg4vhngk1ielgftmdo2jk4d9ogu86 AT 4ax DOT com> <9osdbt$r8k$1 AT nets3 DOT rz DOT RWTH-Aachen DOT DE> <ao54rtg7v4un2mc04466ch55eq53fvha1h AT 4ax DOT com> <9ov09c$gig$1 AT nets3 DOT rz DOT RWTH-Aachen DOT DE>
X-Newsreader: Forte Agent 1.8/32.548
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Complaints-To: newsabuse AT supernews DOT com
Lines: 86
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

On 27 Sep 2001 10:50:20 GMT, Hans-Bernhard Broeker
<broeker AT physik DOT rwth-aachen DOT de> wrote:

>Radical wrote:
>> On 26 Sep 2001 11:15:09 GMT, Hans-Bernhard Broeker
>> <broeker AT physik DOT rwth-aachen DOT de> wrote:
>
>>>What is the total number of keywords you search for?  And their
>>>average length?
>
>> The master file (not actual file that gets linked) is 60,000 terms,
>> the actual program uses only 25,000 max terms or so.
>
>That's pretty many, indeed.
>
>> The only way I can see the process running any faster, is perform as
>> much as the operation in RAM as possible, with the code optimized for
>> the best register use...
>
>You don't see far enough, then.  Or so I strongly suspect.
>
>Just compare what your program does to what GCC has to do when it
>compiles a pretty big program: search for appearances of one out of a
>rather large number of token strings, and act accordingly.  Now, how
>much source code can you compile, in the time your program takes to
>check through those 2 Megabytes of input?  I bet it's more than 2 Megs
>even before preprocessing, which usually blows up the file size a bit.
>
>> in crude terms, all my program does is use   strstr(lineoftext,term1)
>> and then it checks the left and right -hand sides of the position
>> returned, and if they are whitespace, or non-alnum, the substitution
>> is allowed to take place using direct pointer manipulations.
>
>Bad plan, I'd say. Here's a better one:
>
>1) organize your search words into lots of individual lists, one list per
>   initial letter.
>
>2) Use strpbrk() or similar methods to cut your input line into words.
>
>3) For each word, do a string comparison against each entry the list
>   of search terms that corresponds to its first letter (start strcmp 
>   at the second letter --- you already know the first one matches).
>
>It may be advisable to sub-structure the word lists even further.
>Organize them by classes of equal length, say, or array out by second
>letter, too.

the list is already alphabetized; that leads to something outragious
like a 1/10th the time....
the method taken has a certain "elegance" and other, and apparently
unseen  benefit

>If that word list is stable for extended periods of time, you'll
>definitely want to give "flex" a try, instead --- you might be
>surprised...

I have no objection to being flex-"ible"
I'd have to find some flex-style programs to decide it, all other
things being considered, that would work for me here.

>> It's too simple to be time consuming, at least thats what I would of
>> thought.
>
>Au contraire: it's time-consuming because it's *too* simple. 25000
>strstrs()'s per input line, and the complicated way you jump back and
>forth in processing it is almost certain to be time-consuming.  You're
>comparing 2 million input letters against 25000 words. Even without
>taking into account word length overheads, that's giving you 50
>billion operations.  No wonder that takes so long, then.

Hmm.....

>> and even GCC 2.95.3 still does have   strrev()  in   libc.a, sheesh!
>> (just a comment).

That was supposed to have said NOT HAVE strrev()...double-sheesh!

>It's not GCC that has anything in its libc --- that's provided
>independently of GCC version, by DJGPP.  And what on earth is so wrong
>about having a strrev() in there, that it makes you sheesh?

PLEASE libc.a people, __add__  strrev() its been missing for far too
long. Thanks.