From: Radical DOT NetSurfer AT delorie DOT com Newsgroups: comp.os.msdos.djgpp Subject: Re: Benchmark Testing Date: Thu, 27 Sep 2001 09:06:49 -0400 Organization: Posted via Supernews, http://www.supernews.com Message-ID: References: <9osdbt$r8k$1 AT nets3 DOT rz DOT RWTH-Aachen DOT DE> <9ov09c$gig$1 AT nets3 DOT rz DOT RWTH-Aachen DOT DE> X-Newsreader: Forte Agent 1.8/32.548 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: newsabuse AT supernews DOT com Lines: 86 To: djgpp AT delorie DOT com DJ-Gateway: from newsgroup comp.os.msdos.djgpp Reply-To: djgpp AT delorie DOT com On 27 Sep 2001 10:50:20 GMT, Hans-Bernhard Broeker wrote: >Radical wrote: >> On 26 Sep 2001 11:15:09 GMT, Hans-Bernhard Broeker >> wrote: > >>>What is the total number of keywords you search for? And their >>>average length? > >> The master file (not actual file that gets linked) is 60,000 terms, >> the actual program uses only 25,000 max terms or so. > >That's pretty many, indeed. > >> The only way I can see the process running any faster, is perform as >> much as the operation in RAM as possible, with the code optimized for >> the best register use... > >You don't see far enough, then. Or so I strongly suspect. > >Just compare what your program does to what GCC has to do when it >compiles a pretty big program: search for appearances of one out of a >rather large number of token strings, and act accordingly. Now, how >much source code can you compile, in the time your program takes to >check through those 2 Megabytes of input? I bet it's more than 2 Megs >even before preprocessing, which usually blows up the file size a bit. > >> in crude terms, all my program does is use strstr(lineoftext,term1) >> and then it checks the left and right -hand sides of the position >> returned, and if they are whitespace, or non-alnum, the substitution >> is allowed to take place using direct pointer manipulations. > >Bad plan, I'd say. Here's a better one: > >1) organize your search words into lots of individual lists, one list per > initial letter. > >2) Use strpbrk() or similar methods to cut your input line into words. > >3) For each word, do a string comparison against each entry the list > of search terms that corresponds to its first letter (start strcmp > at the second letter --- you already know the first one matches). > >It may be advisable to sub-structure the word lists even further. >Organize them by classes of equal length, say, or array out by second >letter, too. the list is already alphabetized; that leads to something outragious like a 1/10th the time.... the method taken has a certain "elegance" and other, and apparently unseen benefit >If that word list is stable for extended periods of time, you'll >definitely want to give "flex" a try, instead --- you might be >surprised... I have no objection to being flex-"ible" I'd have to find some flex-style programs to decide it, all other things being considered, that would work for me here. >> It's too simple to be time consuming, at least thats what I would of >> thought. > >Au contraire: it's time-consuming because it's *too* simple. 25000 >strstrs()'s per input line, and the complicated way you jump back and >forth in processing it is almost certain to be time-consuming. You're >comparing 2 million input letters against 25000 words. Even without >taking into account word length overheads, that's giving you 50 >billion operations. No wonder that takes so long, then. Hmm..... >> and even GCC 2.95.3 still does have strrev() in libc.a, sheesh! >> (just a comment). That was supposed to have said NOT HAVE strrev()...double-sheesh! >It's not GCC that has anything in its libc --- that's provided >independently of GCC version, by DJGPP. And what on earth is so wrong >about having a strrev() in there, that it makes you sheesh? PLEASE libc.a people, __add__ strrev() its been missing for far too long. Thanks.