From: Paul Shirley
Newsgroups: comp.os.msdos.djgpp
Subject: Re: floating point is... fast???
Date: Tue, 21 Jan 1997 18:56:15 +0000
Organization: wot? me?
Lines: 43
Distribution: world
Message-ID:
References: <5brd2e$dap AT lyra DOT csx DOT cam DOT ac DOT uk> <32e22337 DOT 2066519 AT ursa DOT smsu DOT edu> <5bvjeb$mji AT lyra DOT csx DOT cam DOT ac DOT uk> <853780174 DOT 909237 AT araga DOT funcom DOT com>
Reply-To: Paul Shirley
NNTP-Posting-Host: chocolat.foobar.co.uk
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <853780174 DOT 909237 AT araga DOT funcom DOT com>, Kurt Skauen writes
>I don't have the exact numbers, but float to int, and int to float
>conversions are supposed to be very slow. So normally it is a good idea
>to avoid them. But since the FPU can process one instruction in
>parallel with the CPU (four on Cyrix I heard?), you can execute all
>FPU instructions in *ONE* cycle as long as you have enough integer
>instructions to fill in between each FPU instruction.

Not quite: fild is fast (int->float), fist is very slow (float->int). So loading ints is OK, but don't ever convert floats->int if you can avoid it. Using ints in an operation is risky, however, because whilst the loads are fast, fpu int operations are slow (fiadd, for instance), so generally it makes sense to stick with floats all the way.

It's also important to remember that the integer and fpu units can't pass data between each other directly; every value has to go through memory (another reason for not mixing them).

Also, whilst most simple fpu ops can effectively execute in 1 clk, there is still the 2 clk pipeline delay to account for when you try to use the result.

One useful trick is to manually split floating point calculations up and rearrange them, interleaved with integer ops. This gives you more control, since in general the compiler will not rearrange floating point calculations. On a non-P5-aware compiler (djgpp) this may only be relevant for known slow operations, of course!
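A minimal C sketch of the advice above, assuming a hypothetical task of scaling and summing an array of ints (`scaled_sum` is an illustrative name, not something from the post). Each int is converted to float once on the way in, all arithmetic stays in floating point, nothing is converted back to int inside the loop, and the sum is split by hand into two independent accumulators so that one addition need not wait for the result of the previous one. C compilers do not reassociate floating-point additions on their own (unless told to, e.g. with gcc's `-ffast-math`), which is why a split like this has to be written in the source; the exact instructions emitted and their scheduling are still up to the compiler.

```c
/* Hypothetical example: sum of v[i] * scale over an int array.
   - int -> float happens once per element (cheap fild-style load)
   - all arithmetic stays in float, no float -> int (fist) in the loop
   - two independent accumulators, so consecutive additions do not
     depend on each other's results */
float scaled_sum(const int *v, int n, float scale)
{
    float acc0 = 0.0f;   /* even-indexed elements */
    float acc1 = 0.0f;   /* odd-indexed elements */
    int i;

    for (i = 0; i + 1 < n; i += 2) {
        float a = (float)v[i];
        float b = (float)v[i + 1];
        acc0 += a * scale;
        acc1 += b * scale;
    }
    if (i < n)           /* leftover element when n is odd */
        acc0 += (float)v[i] * scale;

    return acc0 + acc1;  /* combine the two chains once, at the end */
}
```

For example, `scaled_sum((int[]){1, 2, 3, 4, 5}, 5, 0.5f)` gives 7.5. Splitting the accumulator changes the order of the additions, so for general inputs the result can differ in the last bits from a plain left-to-right sum.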
Other things:

*Don't* use comparisons with floats; the code to get the fpu flags over to the integer unit (fnstsw/sahf after the compare) is awful. If you have to test them, it's quicker to write the float result to memory and test the sign bit (bit 31 in a float, bit 63 in a double). This also allows more integer/float overlap.

Define temporary values as 'long double's; that way the fpu stack can be used (otherwise they will often get copied to RAM).

Finally, whilst the Cyrix does have a 4-level FIFO on its fpu, the fpu itself is unpipelined, so fpu-intensive code will still stall very quickly. And the stacked operations use the speculative execution hardware, so you may get unexpected delays during branches. Even so, all the tips for P5 code will help on a Cyrix (the instructions that are slow on a P5 are relatively slow on it too).

--
Paul Shirley: shuffle chocolat before foobar for my real email address
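A minimal C sketch of the sign-bit test and the 'long double' temporaries described in the post. The helper names (`float_sign_bit`, `double_sign_bit`, `cross_is_negative`) are hypothetical, and the code assumes IEEE-754 single and double formats. `memcpy` is used to reach the raw bits because it is well-defined in C, whereas reading a float through a cast `uint32_t *` breaks the aliasing rules; whether a particular compiler turns this into exactly one store plus an integer test is not guaranteed. One behavioural difference from `x < 0`: the sign bit is also set for `-0.0` (and may be set for a NaN), so negative zero counts as negative here.

```c
#include <stdint.h>
#include <string.h>

/* Sign bit of a float: bit 31 of its IEEE-754 single representation.
   The value is copied out as raw bits and tested with integer ops
   instead of a floating-point compare. */
int float_sign_bit(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (int)(bits >> 31);
}

/* Same test for a double: the sign is bit 63. */
int double_sign_bit(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (int)(bits >> 63);
}

/* Hypothetical use: is the 2D cross product ax*by - ay*bx negative?
   Temporaries are long double (on x86 gcc targets, including djgpp,
   this is the x87's 80-bit register format); only the final result
   is rounded to float before its sign bit is tested. */
int cross_is_negative(float ax, float ay, float bx, float by)
{
    long double t0 = (long double)ax * by;
    long double t1 = (long double)ay * bx;
    float r = (float)(t0 - t1);
    return float_sign_bit(r);
}
```

For example, `cross_is_negative(0.0f, 1.0f, 1.0f, 0.0f)` computes 0 - 1 and returns 1, while `cross_is_negative(1.0f, 0.0f, 0.0f, 1.0f)` computes 1 - 0 and returns 0.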