Mail Archives: djgpp/2001/09/27/09:19:14
On 27 Sep 2001 10:50:20 GMT, Hans-Bernhard Broeker
<broeker AT physik DOT rwth-aachen DOT de> wrote:
>Radical wrote:
>> On 26 Sep 2001 11:15:09 GMT, Hans-Bernhard Broeker
>> <broeker AT physik DOT rwth-aachen DOT de> wrote:
>
>>>What is the total number of keywords you search for? And their
>>>average length?
>
>> The master file (not actual file that gets linked) is 60,000 terms,
>> the actual program uses only 25,000 max terms or so.
>
>That's pretty many, indeed.
>
>> The only way I can see the process running any faster, is perform as
>> much as the operation in RAM as possible, with the code optimized for
>> the best register use...
>
>You don't see far enough, then. Or so I strongly suspect.
>
>Just compare what your program does to what GCC has to do when it
>compiles a pretty big program: search for appearances of one out of a
>rather large number of token strings, and act accordingly. Now, how
>much source code can you compile, in the time your program takes to
>check through those 2 Megabytes of input? I bet it's more than 2 Megs
>even before preprocessing, which usually blows up the file size a bit.
>
>> in crude terms, all my program does is use strstr(lineoftext,term1)
>> and then it checks the left and right -hand sides of the position
>> returned, and if they are whitespace, or non-alnum, the substitution
>> is allowed to take place using direct pointer manipulations.
>
>Bad plan, I'd say. Here's a better one:
>
>1) organize your search words into lots of individual lists, one list per
> initial letter.
>
>2) Use strpbrk() or similar methods to cut your input line into words.
>
>3) For each word, do a string comparison against each entry the list
> of search terms that corresponds to its first letter (start strcmp
> at the second letter --- you already know the first one matches).
>
>It may be advisable to sub-structure the word lists even further.
>Organize them by classes of equal length, say, or array out by second
>letter, too.
the list is already alphabetized; that leads to something outragious
like a 1/10th the time....
the method taken has a certain "elegance" and other, and apparently
unseen benefit
>If that word list is stable for extended periods of time, you'll
>definitely want to give "flex" a try, instead --- you might be
>surprised...
I have no objection to being flex-"ible"
I'd have to find some flex-style programs to decide it, all other
things being considered, that would work for me here.
>> It's too simple to be time consuming, at least thats what I would of
>> thought.
>
>Au contraire: it's time-consuming because it's *too* simple. 25000
>strstrs()'s per input line, and the complicated way you jump back and
>forth in processing it is almost certain to be time-consuming. You're
>comparing 2 million input letters against 25000 words. Even without
>taking into account word length overheads, that's giving you 50
>billion operations. No wonder that takes so long, then.
Hmm.....
>> and even GCC 2.95.3 still does have strrev() in libc.a, sheesh!
>> (just a comment).
That was supposed to have said NOT HAVE strrev()...double-sheesh!
>It's not GCC that has anything in its libc --- that's provided
>independently of GCC version, by DJGPP. And what on earth is so wrong
>about having a strrev() in there, that it makes you sheesh?
PLEASE libc.a people, __add__ strrev() its been missing for far too
long. Thanks.
- Raw text -