Mail Archives: djgpp/1996/12/24/10:39:16
In article <59bopp$vn3 AT winx03 DOT informatik DOT uni-wuerzburg DOT de>, Manuel
Kessler <mlkessle AT cip DOT physik DOT uni-wuerzburg DOT de> writes
>Leath Muller (leathm AT gbrmpa DOT gov DOT au) wrote:
>I have no manuals at hand, but I KNOW that the Pentium is capable of
>doing one fmul EVERY cycle, because I DID it. For serious problems you
>don't get that throughput, but something around 2 cycles per flop (fmul
>or fadd/fsub) is possible, if no memory is slowing things down. See the
>BLAS homepage at
The P5 has a 3 clk latency (the time it takes from issuing an op to
retiring it) and a throughput (the time before another op can be issued)
of 1 clk, *unless* you issue consecutive multiplies, in which case it has
a 2 clk throughput.
AFAIK the best multiply throughput you can achieve is 2 clks/mul.
However, in real code you have to load the next operand or sum the
result anyway, and that eats up the otherwise wasted cycle. The gcc FPU
code is actually pretty good.
---
Paul Shirley: shuffle chocolat before foobar for my real email address