Message-ID: <360C0985.8C21E9F1@mailexcite.com>
Date: Fri, 25 Sep 1998 17:22:14 -0400
From: Doug Gale
MIME-Version: 1.0
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Floating/fixed point
References: <000101bddf18$9db5fa00$d54b08c3 AT arthur> <01bde20c$af410200$0200a8c0 AT clive>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: oshawappp34.idirect.com
Organization: "Usenet User"
Lines: 49
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Precedence: bulk

Clive Paterson wrote:

> Just in case anyone is interested, I've done a fair bit of experimentation
> with fixed point versus floating point math. I found some very interesting
> results.
>
> Firstly, Intel definitely has superior floating point. You will see more of
> a speed increase using fixed point math on a K6 or a Cyrix than on an Intel.
> I don't know about the K6-2, but I hear they've improved their floating
> point.
>
> Secondly, most people will say that the load/store operations are slow with
> the FPU. This is partially true, because a load/store operation will take
> about 33 clock ticks on the old Intel FPU. But with newer FPUs, everything
> is cached, so if you are doing repetitive calculations using the FPU the
> load/store time is negligible.
>
> Also, signed divisions will take longer using fixed point math because you
> have to check the signs and negate negative numbers, then if necessary
> negate the result back to a negative number.
>
> Overall though, fixed point math can be much quicker when applied
> correctly. I tested fixed point vs floating point math on my K6 200 and a
> Pentium 100. The program was a fractal drawer written in assembler. The K6
> showed a speed increase of about 3 times using fixed point math, and the
> P100 a speed increase of about 2 times.

Using the FPU can be a big improvement if you schedule your instructions so
that nothing stores the FPU result until the operation has finished.
One popular use of this trick is perspective-correct texture mapping. You
start your divide and, while the divide is executing, draw 16 pixels using
the integer unit. After 16 pixels, store the division result, start another
divide, and draw more pixels. I know Quake uses this trick, among others.

One thing that is nice to know is that the GNU C compiler (and C++, I
assume) in DJGPP does this scheduling for you. Just try not to use the
result of the FPU math right after it is calculated, and the compiler will
move as many integer instructions as it can between the calculation and its
use.

One more thing. The K6 has an excellent multiplier unit. It can do an
integer multiply in 2 cycles (3 if the result is >= 2^32). On the Pentium, a
floating point multiply takes 10 cycles, and the Pentium integer multiply is
even slower! (Compare a 2 cycle multiply on the K6 to the 43 cycle multiply
on the 486! OUCH! :)

Anyone who has done geometry calculations like rotation, or classifying
points with respect to a plane (i.e. dot products/cross products), will
know that 3D math is full of multiplications. I think the K6 has a clear
advantage here.
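
For anyone who wants to see the 16-pixel divide trick in C rather than
assembler, here is a rough sketch. It is not Quake's code. The function name
span_u, the 16.16 fixed point format, and the choice to track only the u
coordinate are all assumptions made to keep it short. It also assumes
non-negative texture coordinates below 32768 and a positive 1/z. The divide
for the next span is issued before the pixel loop and its result is not
touched until the loop is done, which gives the compiler room to overlap the
two. Whether it actually does depends on the compiler and the CPU.

```c
#include <stdint.h>

#define SPAN      16   /* pixels drawn per divide, as described in the post */
#define FIX_SHIFT 16   /* 16.16 fixed point: an assumption of this sketch */

/* Hypothetical helper, not taken from Quake or any library.
 * Writes the integer texture coordinate u for each of `count` pixels.
 * uoz = u/z and ooz = 1/z at the left edge; d_uoz and d_ooz are the
 * per-pixel steps of those two values along the scanline. */
void span_u(float uoz, float ooz, float d_uoz, float d_ooz,
            int count, int *out_u)
{
    int x = 0;
    int n, next_n, i;
    float u0, u1, u2;

    if (count <= 0)
        return;

    /* Exact u at the left edge and at the end of the first span. */
    n = count < SPAN ? count : SPAN;
    u0 = uoz / ooz;
    uoz += d_uoz * n;
    ooz += d_ooz * n;
    u1 = uoz / ooz;

    while (n > 0) {
        /* Convert the span endpoints to fixed point and get the step. */
        int32_t fu     = (int32_t)(u0 * (1 << FIX_SHIFT));
        int32_t fu_end = (int32_t)(u1 * (1 << FIX_SHIFT));
        int32_t du     = (fu_end - fu) / n;

        /* Issue the divide for the end of the NEXT span now... */
        next_n = count - (x + n);
        if (next_n > SPAN)
            next_n = SPAN;
        u2 = u1;
        if (next_n > 0) {
            uoz += d_uoz * next_n;
            ooz += d_ooz * next_n;
            u2 = uoz / ooz;
        }

        /* ...and draw this span using integer adds and shifts only.
         * Nothing in this loop reads u2. */
        for (i = 0; i < n; i++) {
            out_u[x + i] = fu >> FIX_SHIFT;
            fu += du;
        }

        x += n;
        u0 = u1;
        u1 = u2;
        n = next_n;
    }
}
```

Call it once per scanline with the values at the left edge, for example
span_u(0.0f, 1.0f, 1.0f, -0.01f, 32, buf) with buf holding at least 32 ints.
A real mapper would carry v the same way and fetch a texel inside the loop
instead of storing u. Within each span the coordinate is only a linear
approximation; it is exact at the span boundaries.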