Mail Archives: djgpp/1997/01/22/04:45:43
In article <853780174 DOT 909237 AT araga DOT funcom DOT com>, Kurt Skauen
<kurt DOT skauen AT funcom DOT com> writes
>I don't have the exact numbers, but float to int, and int to float
>conversions are supposed to be very slow. So normally it is a good idea
>to avoid them. But since the FPU can process one instruction in
>parallel with the CPU (four on Cyrix I heard?), you can execute all
>FPU instructions in *ONE* cycle as long as you have enough integer
>instructions to fill in between each FPU instruction.
Not quite: fild (int->float) is fast, but fist (float->int) is very slow.
So loading ints is OK, but avoid converting floats to ints wherever
possible. Using ints in an operation is risky, however: whilst int loads
are fast, fpu operations on int operands (fiadd, for instance) are slow,
so it generally makes sense to stick with floats all the way. It's also
important to remember that the integer and fpu units can't pass data
between each other directly; values have to go through memory (another
reason for not mixing them).
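
A minimal sketch of the "stick with floats" advice, in C. The function names are made up for illustration, and what a given compiler actually emits for the casts is an assumption you should check in the assembly output. Note that the two versions only agree when the inputs are whole numbers, because the first one truncates every element separately.

```c
/* Converts on every iteration: one float->int conversion per element,
   which is the slow direction described above. */
static int sum_convert_each(const float *v, int n)
{
    int total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += (int)v[i];
    return total;
}

/* Stays in floating point inside the loop and converts once at the end. */
static int sum_convert_once(const float *v, int n)
{
    float total = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        total += v[i];
    return (int)total;
}
```

If the per-element truncation matters to your result, the second form is not a drop-in replacement.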
Also, whilst most simple fpu ops can effectively execute in 1 clk, there
is still the 2 clk pipeline delay to account for when you try to use the
result.
One useful trick is to manually split floating point calculations up and
interleave them with integer ops. This gives you more control, since in
general the compiler will not rearrange floating point calculations.
With a non-P5-aware compiler (djgpp) this may only be relevant for
known slow operations, of course!
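
As a rough illustration of the interleaving idea, here is a sketch in C. The function and its arguments are hypothetical, and there is no guarantee that a compiler keeps the statements in source order, so treat it as a starting point and check the generated code.

```c
/* Scales src into dst (dst[i] = src[i] * k + 1) and, in the same loop,
   sums an unrelated int array. The float calculation is split into two
   statements with independent integer work placed between them. */
static long scale_and_sum(float *dst, const float *src, float k,
                          const int *vals, int n)
{
    long sum = 0;
    int i;
    for (i = 0; i < n; i++) {
        float t = src[i] * k;   /* fpu: start the multiply */
        sum += vals[i];         /* integer work that does not need t */
        t = t + 1.0f;           /* fpu: dependent add, issued later */
        dst[i] = t;
    }
    return sum;
}
```

The point of the split is only that the integer add does not depend on the multiply, so it can fill the gap before the result is needed.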
Other things:
*Don't* use comparisons with floats; the code to access the fpu
flags is awful. If you have to test them, it's quicker to write a float
result to memory and test the sign bit (b31 in a float, b63 in a
double). This will also allow more integer/float overlap.
Define temporary values as 'long double's so that the fpu
stack can be used (otherwise they will often get copied to ram).
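
Both tips can be sketched in C roughly as follows. The helper names are invented for this example, and the code assumes IEEE-754 floats (32-bit float, 64-bit double) and a compiler that provides <stdint.h>. Two caveats with the sign-bit test: -0.0 counts as negative, and NaNs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if bit 31 (the sign bit) of a float is set, else 0.
   memcpy reinterprets the stored bytes as an integer. */
static int float_sign_bit(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)(bits >> 31);
}

/* Same idea for a double: the sign is bit 63. */
static int double_sign_bit(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)(bits >> 63);
}

/* a < b tested via the sign of (a - b) instead of a float compare. */
static int float_less(float a, float b)
{
    float d = a - b;
    return float_sign_bit(d);
}

/* Temporary declared as long double, as suggested above. Whether it
   really stays on the fpu stack depends on the compiler. */
static float sum_floats(const float *v, int n)
{
    long double acc = 0.0L;
    int i;
    for (i = 0; i < n; i++)
        acc += v[i];
    return (float)acc;
}
```

Because the sign test is plain integer code, it can be scheduled alongside other fpu work rather than waiting on the fpu flags.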
Finally, whilst the Cyrix does have a 4 level fifo on its fpu, the fpu
is unpipelined. fpu intensive code will still stall very quickly. And
the stacked operations use the speculative execution hardware, so you
may get unexpected delays during branches. Even so, all the tips for P5
code will help on a Cyrix (the same instructions are relatively slower).
---
Paul Shirley: shuffle chocolat before foobar for my real email address