Mail Archives: djgpp/1997/01/22/04:45:43

From: Paul Shirley <Paul AT foobar DOT co DOT uk DOT chocolat>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: floating point is... fast???
Date: Tue, 21 Jan 1997 18:56:15 +0000
Organization: wot? me?
Lines: 43
Distribution: world
Message-ID: <CXSrODAPFR5yEwJB@foobar.co.uk>
References: <5brd2e$dap AT lyra DOT csx DOT cam DOT ac DOT uk> <32e22337 DOT 2066519 AT ursa DOT smsu DOT edu>
<5bvjeb$mji AT lyra DOT csx DOT cam DOT ac DOT uk> <853780174 DOT 909237 AT araga DOT funcom DOT com>
Reply-To: Paul Shirley <junk AT defeating DOT email DOT address>
NNTP-Posting-Host: chocolat.foobar.co.uk
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <853780174 DOT 909237 AT araga DOT funcom DOT com>, Kurt Skauen
<kurt DOT skauen AT funcom DOT com> writes
>I don't have the exact numbers, but float to int, and int to float
>conversions are supposed to be very slow. So normally it is a good idea
>to avoid them. But since the FPU can process one instruction in
>parallel with the CPU (four on Cyrix, I heard?), you can execute all
>FPU instructions in *ONE* cycle as long as you have enough integer
>instructions to fill in between each FPU instruction.

Not quite: fild (int->float) is fast, but fist (float->int) is very slow.
So loading ints is OK, but avoid converting float->int wherever you can.
Using ints in an operation is risky, however: whilst the loads are
fast, fpu int operations (fiadd, for instance) are slow, so it generally
makes sense to stick with floats all the way. It's also important to
remember that the integer and fpu units can't pass data between each
other directly; it has to go through memory (another reason for not
mixing them).
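
As a rough illustration of "stick with floats all the way", here is a
minimal C sketch. The function name scaled_sum and its parameters are
made up for the example. The integer samples are loaded into the fpu
(the cheap direction), all the arithmetic stays in float, and the
expensive float->int conversion happens exactly once, on the way out.

```c
#include <stddef.h>

/* Hypothetical helper: sum of samples[i] * gain, returned as an int.
   On an x87 target the int->float loads would typically become fild;
   the only float->int conversion is the single cast at the end. */
int scaled_sum(const int *samples, size_t n, float gain)
{
    float acc = 0.0f;
    size_t i;

    for (i = 0; i < n; i++)
        acc += (float)samples[i] * gain;   /* int->float, stays in float */

    return (int)acc;                       /* one float->int conversion */
}
```

The version to avoid is the one that casts each product to int inside
the loop and accumulates in an integer; that pays for the slow
conversion on every iteration.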

Also, whilst most simple fpu ops can effectively execute in 1 clk, there
is still the 2 clk pipeline delay to account for when you try to use the
result.
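
One common way to hide that delay is to keep two independent chains of
calculation going, so the next fpu op never needs the result of the one
just issued. The sketch below is an assumption about how that might look
in C (dot2 is a made-up name); whether the compiler keeps the ordering
is not guaranteed, so check the generated asm.

```c
#include <stddef.h>

/* Hypothetical dot product with two independent accumulators.
   s0 and s1 do not depend on each other, so an add into s1 can be
   issued while the previous add into s0 is still in the pipeline. */
float dot2(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)                 /* odd element left over */
        s0 += a[i] * b[i];

    return s0 + s1;
}
```

Note that splitting the sum changes the order of the additions, so the
rounding can differ slightly from a single-accumulator loop.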

One useful trick is to manually split floating point calculations up and
interleave the pieces with integer ops. This gives you more control,
since in general the compiler will not rearrange floating point
calculations. On a non-P5-aware compiler (djgpp) this may only be
relevant for known slow operations, of course!
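
A minimal sketch of what that manual interleaving might look like at
the C level. The routine scale_and_sum and its arguments are invented
for illustration: the multiply is started, some unrelated integer work
is placed in the gap, and only then is the multiply's result used. The
compiler is free to reorder the statements, so treat this as a hint to
it rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical routine: scales f[] in place and returns the sum of the
   bytes in tags[]. The two jobs are unrelated; they share a loop only
   so the integer work can sit between the multiply and its use. */
unsigned scale_and_sum(float *f, const unsigned char *tags, size_t n,
                       float gain, float bias)
{
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        float t = f[i] * gain;   /* fp multiply issued first */
        sum += tags[i];          /* integer work placed in the gap */
        f[i] = t + bias;         /* multiply result used afterwards */
    }
    return sum;
}
```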

Other things:
        *Don't* use comparisons with floats; the code to access the fpu
flags is awful. If you have to test them, it's quicker to write a float
result to memory and test the sign bit (b31 in a float, b63 in a
double). This will also allow more integer/float overlap.
        Define temporary values as 'long double's; that way the fpu
stack can be used (otherwise they will often get copied to ram).
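
The sign-bit test can be sketched in portable C as below. The helper
name float_sign_bit is made up, and the code assumes float is a 32-bit
IEEE single and unsigned int is 32 bits wide (true for djgpp on x86,
not guaranteed by the C standard). It uses memcpy to read the stored
bits; a pointer cast is the more traditional way but breaks aliasing
rules on newer compilers.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the sign bit (bit 31) of x is set.
   Assumes sizeof(float) == sizeof(unsigned int) == 4. */
int float_sign_bit(float x)
{
    unsigned int bits;

    memcpy(&bits, &x, sizeof bits);   /* read the float's stored bits */
    return (int)(bits >> 31);         /* integer test, no fpu flags */
}
```

This is not quite the same as (x < 0): negative zero has the sign bit
set, and so can a NaN, so only use it where those cases can't occur or
don't matter.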

Finally, whilst the Cyrix does have a 4-level fifo on its fpu, the fpu
is unpipelined, so fpu-intensive code will still stall very quickly. And
the stacked operations use the speculative execution hardware, so you
may get unexpected delays during branches. Even so, all the tips for P5
code will help on a Cyrix (the same instructions are relatively slower).

---
Paul Shirley: shuffle chocolat before foobar for my real email address

  Copyright © 2019   by DJ Delorie     Updated Jul 2019