Message-ID: <360C0985.8C21E9F1@mailexcite.com>
Date: Fri, 25 Sep 1998 17:22:14 -0400
From: Doug Gale <dgale AT mailexcite DOT com>
MIME-Version: 1.0
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Floating/fixed point
References: <000101bddf18$9db5fa00$d54b08c3 AT arthur> <01bde20c$af410200$0200a8c0 AT clive>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: oshawappp34.idirect.com
Organization: "Usenet User"
Lines: 49
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Precedence: bulk


Clive Paterson wrote:

> Just in case anyone is interested, I've done a fair bit of experimentation
> with fixed point versus floating point math. I found some very interesting
> results.
>
> Firstly, Intel definitely has superior floating point. You will see more of
> a speed increase using fixed point math on a K6 or a Cyrix than on an Intel.
> I don't know about the K6-2, but I hear they've improved their floating point.
>
> Secondly, most people will say that the load/store operations are slow with
> the FPU. This is partially true, because a load/store operation will take
> about 33 clock ticks on the old Intel FPU. But with newer FPUs, everything
> is cached, so if you are doing repetitive calculations using the FPU the
> load/store time is negligible.
>
> Also, signed divisions will take longer using fixed point math because you
> have to check the signs, negate any negative numbers, and then, if necessary,
> negate the result back to a negative number.
>
> Overall though, fixed point math can be much quicker when applied
> correctly. I tested fixed point vs floating point math on my K6 200 and a
> Pentium 100. The program was a fractal drawer written in assembler. The K6
> showed a speed increase of about 3 times using fixed point math, and the
> P100 a speed increase of about 2 times.
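
To make the sign handling in the quoted text concrete, here is a minimal
sketch of a signed fixed-point divide in C. The 16.16 format and the names
fixed_t, FIXED_SHIFT, FIXED_ONE and fixed_div are my own assumptions for
illustration; they are not from Clive's program, which was written in
assembler. In C a plain signed 64-bit divide would also work, so the explicit
negation below is only there to mirror the steps he describes.

```c
#include <stdint.h>

typedef int32_t fixed_t;                 /* assumed 16.16 fixed-point format */
#define FIXED_SHIFT 16
#define FIXED_ONE   ((fixed_t)1 << FIXED_SHIFT)

/* Signed divide in the style described above: strip the signs, divide the
   magnitudes as unsigned values, then negate the result if exactly one
   operand was negative. The caller must ensure b != 0 and that the
   quotient fits in 16.16. */
fixed_t fixed_div(fixed_t a, fixed_t b)
{
    int negative = (a < 0) != (b < 0);
    uint64_t ua = (a < 0) ? (uint64_t)(-(int64_t)a) : (uint64_t)a;
    uint64_t ub = (b < 0) ? (uint64_t)(-(int64_t)b) : (uint64_t)b;

    /* Pre-shift the dividend so the quotient keeps 16 fraction bits. */
    uint64_t q = (ua << FIXED_SHIFT) / ub;

    return negative ? (fixed_t)(-(int64_t)q) : (fixed_t)q;
}
```

For example, fixed_div(3 * FIXED_ONE, -2 * FIXED_ONE) gives the 16.16
representation of -1.5.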

Using the FPU can be a big improvement if you schedule your instructions so
that the FPU result is not stored until the operation has finished. One popular
use of this trick is perspective-correct texture mapping. You start your divide
and, while the divide is executing, draw 16 pixels using the integer unit. After
16 pixels, store the division result, start another divide, and draw more
pixels. I know Quake uses this trick, among others. One thing that is nice to
know is that the GNU C compiler (and C++, I assume) in DJGPP does this
scheduling for you. Just try not to use the result of the FPU math right after
it is calculated, and the compiler will move as many integer instructions as it
can between the calculation and its use.

One more thing. The K6 has an excellent multiplier unit. It can do an integer
multiply in 2 cycles (3 if the result is >= 2^32). On the Pentium, an integer
multiply takes about 10 cycles, which is even slower than its floating point
multiply (about 3 cycles)! (Compare a 2 cycle multiply on the K6 to the
multiply on the 486, which can take over 40 cycles! OUCH! :)

Anyone who has done geometry calculations like rotation, or classifying points
with respect to a plane (i.e. dot products/cross products), etc., will know that
3D math is full of multiplications. I think the K6 has a clear advantage here.
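
To show how quickly the multiplies add up, here is a small sketch of
classifying a point against a plane with integer multiplies only. The 16.16
format and the names fixed_t, FIXED_ONE, vec3, dot3 and classify_point are
assumptions I made up for the example; the plane is taken to be all points p
with dot(normal, p) = dist.

```c
#include <stdint.h>

typedef int32_t fixed_t;                 /* assumed 16.16 fixed-point format */
#define FIXED_ONE ((fixed_t)1 << 16)

typedef struct { fixed_t x, y, z; } vec3;

/* Dot product: three integer multiplies, accumulated in 64 bits so the
   intermediate products cannot overflow, then shifted back to 16.16.
   Assumes arithmetic right shift of negative values, as GCC provides. */
fixed_t dot3(vec3 a, vec3 b)
{
    int64_t sum = (int64_t)a.x * b.x
                + (int64_t)a.y * b.y
                + (int64_t)a.z * b.z;
    return (fixed_t)(sum >> 16);
}

/* Returns +1 if p is in front of the plane, -1 if behind, 0 if on it. */
int classify_point(vec3 normal, fixed_t dist, vec3 p)
{
    fixed_t side = dot3(normal, p) - dist;
    return (side > 0) - (side < 0);
}
```

A rotation by a 3x3 matrix is three of these dot products per point, so nine
multiplies, which is where a fast integer multiplier pays off.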