Mail Archives: djgpp/1996/12/22/02:30:28
> :> Well, I have the Pentium Programmer's Manual sitting in front of me in
> :> Acrobat, and it says it _does_ do 3 cycles per mul. If you want proof
> :> of the speed, look at Quake. Even Abrash said he couldn't get the same
> :> performance out of the Pentium with fixed point as he could with
> :> floating point.
> I have no manuals at hand, but I KNOW that the Pentium is capable of
> doing one fmul EVERY cycle, because I DID it. For serious problems you
> don't get that throughput, but something around 2 cycles per flop (fmul
> or fadd/fsub) is possible, as long as memory accesses aren't slowing things down. See the
> BLAS homepage at
> http://cip.physik.uni-wuerzburg.de/~mlkessle/blas1.html
> For simple functions like the dot product of short vectors coming out of the
> L1 cache, it's possible to achieve 79 MFLOPS on a P-133. That is about one
> FPU result every 1.7 cycles. Latency for both fmul and fadd is three cycles,
> so you have to use fxch heavily, but it's mostly free anyway.
> Of course, it's not very easy to get that performance, but it's
> possible.
And as Lord Shaman says:
> Anyway, even you are wrong: it's 3 clocks for the first mul, and if the next
> FP operation is a mul, it goes through in 1 clock.
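As a worked version of the timing just described, the following assumes what the quoted post says: a 3-clock latency ($L = 3$), a unit that accepts a new fmul every clock, and $n$ multiplies that are independent of one another. The second line is the contrasting case, in which each multiply needs the previous result:

```latex
T_{\text{indep}}(n) = L + (n - 1) = n + 2, \qquad
T_{\text{dep}}(n) = nL = 3n, \qquad L = 3
```

For $n = 10$, that is 12 clocks against 30. The "3 cycles per mul" and "1 cycle per mul" figures therefore describe the same pipeline, seen as a dependent chain in one case and as an independent stream in the other.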
Which sounds about right... but then I thought this was in the lower
precision modes. Maybe I should go check that... :)
Leathal.