Mail Archives: djgpp/1997/03/01/10:01:54
Brian Osman (osmanb AT rpi DOT edu) writes:
> nikki wrote:
>>
>> hardly a great surprise seeing as the loop above would quite probably fit in
>> the cache when well optimised, but unrolled would thrash it horribly.
>> unrolling loops won't save an enormous amount of time, after all a jump
>> instruction will only take you 3 or 4 cycles at most.
>>
>> nik
>>
>> --
>> Graham Tootell
>> nikki AT gameboutique DOT com
>
> Bear in mind that on many of the newer processors (e.g. the PPro) which
> use branch prediction, branches are among the most expensive
> instructions. A mispredicted branch means that the whole pipeline
> has to be flushed and refilled. Not pretty.
> There are some cases where loop unrolling won't help much, but
> it's still a valid and useful optimization technique. I don't
> suppose -O3 is causing any unrolling? :)
No, -O3 does inlining but not unrolling. And I don't trust -funroll-loops
or -funroll-all-loops, because they might unroll loops whose iteration
counts are only known at run time. Such a loop will die very horribly if it
runs a few more times than it is supposed to, e.g. by accessing a malloc'ed
array out of bounds. So, I did it manually.
#define XLOOP stuff in the innermost loop
#define YLOOP outer loop stuff and about twenty of
XLOOP;XLOOP;XLOOP etc.
#define ZLOOP about 14 of YLOOP; YLOOP etc.
(Each macro is made "function-like" by leaving the semicolon off its last
statement, so that ZLOOP; etc. is correct syntax when expanded.)
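A minimal sketch of the macro approach in C, for illustration only. The loop
body (summing an int array), the names sum_unrolled and XLOOP4, and the unroll
factor of four are assumptions, not the original code. Since the element count
here is only known at run time, a remainder loop handles the leftovers so the
unrolled body never reads past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop body: add one element and advance the pointer.
   No trailing semicolon, so "XLOOP;" is a complete statement. */
#define XLOOP  sum += *p++

/* Four copies of the body; again the last one has no semicolon. */
#define XLOOP4 XLOOP; XLOOP; XLOOP; XLOOP

/* Sum n ints, running the body four times per loop iteration. */
long sum_unrolled(const int *p, size_t n)
{
    long sum = 0;
    size_t blocks = n / 4;  /* number of full groups of four */
    size_t rest = n % 4;    /* elements left over after the groups */
    size_t i;

    assert(p != NULL || n == 0);

    /* Unrolled part: one loop branch per four elements. */
    for (i = 0; i < blocks; i++) {
        XLOOP4;
    }

    /* Remainder: at most three single iterations, so the unrolled
       body never touches memory beyond the n elements. */
    for (i = 0; i < rest; i++) {
        XLOOP;
    }

    return sum;
}
```

Calling sum_unrolled(a, 10) on a ten-element array runs two unrolled groups
and then two single iterations.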
--
.*. Where feelings are concerned, answers are rarely simple [GeneDeWeese]
-() < When I go to the theater, I always go straight to the "bag and mix"
`*' bulk candy section...because variety is the spice of life... [me]
Paul Derbyshire ao950 AT freenet DOT carleton DOT ca, http://chat.carleton.ca/~pderbysh