From: leathm AT solwarra DOT gbrmpa DOT gov DOT au (Leath Muller)
Message-Id: <199704302348.JAA11429@solwarra.gbrmpa.gov.au>
Subject: Re: Alignment
To: wapex AT silesia DOT top DOT pl (Michal)
Date: Thu, 1 May 1997 09:48:35 +3400 (EST)
Cc: djgpp AT delorie DOT com
In-Reply-To: <3367B958.88@silesia.top.pl> from "Michal" at Apr 30, 97 11:27:52 pm
Content-Type: text
Precedence: bulk

> As far as I know double and single operations are the same speed on
> pentium. The only instruction, which is faster in single precision is
> fdiv, but it takes more time to put 8 pixels (I interpolate u & v in
> texture and light value linearly every 8 pixels) than to execute double
> precision fdiv. With double I can use some tricks, and have better
> precision.

No - you're wrong... :) The fdiv, fsqrt, fmul, fadd and fsub instructions
are all affected by moving the FPU into single precision mode...

I also get the impression that you're texturing 8 pixels, lighting
8 pixels, texturing 8 pixels, lighting... etc. Basically, this is
_really_ bad for cache coherency - you're better off texturing the
complete scanline and then lighting the complete scanline. I moved to
this approach, using a temporary offscreen memory buffer of 2560 bytes
(I do stuff in true colour). Write the texture stuff to the offscreen
memory (which in my inner loop never left the 8k cache area per line),
and then do your lighting from there...

If you're wondering, I had my perspective correct, sub-pixel accurate,
true colour, light-sourced, gouraud shaded engine running at 16 cycles
per pixel. With MMX registers, I could get it running in 9 cycles per
pixel... which is faster than Quake and looks a whole lot better...

Leathal.
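
For anyone who wants to see the two-pass scanline idea in code, here is a minimal C sketch. It is not the engine's actual inner loop, and everything specific in it is an assumption made for illustration: the names (texture_span, light_span, draw_span, SPAN_MAX), the 640-pixel span limit (640 pixels x 4 bytes gives the 2560-byte buffer mentioned above), the 256x256 texture, the 16.16 fixed-point steps, and the 0x00RRGGBB pixel layout. It also steps u and v linearly rather than doing the perspective divide, to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed maximum span width: 640 pixels * 4 bytes = 2560 bytes. */
#define SPAN_MAX 640

/* Scratch buffer that is written by pass 1 and read by pass 2. */
static uint32_t scratch[SPAN_MAX];

/* Pass 1: fetch texels for the whole span into the scratch buffer.
   u, v, du, dv are 16.16 fixed point; the texture is assumed 256x256. */
static void texture_span(const uint32_t *tex, uint32_t *out, size_t n,
                         uint32_t u, uint32_t v, uint32_t du, uint32_t dv)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = tex[((v >> 16) & 255u) * 256u + ((u >> 16) & 255u)];
        u += du;
        v += dv;
    }
}

/* Pass 2: scale each colour channel by a linearly interpolated light
   value. l is 16.16 fixed point; its integer part runs from 0 (black)
   to 256 (full brightness). Pixels are assumed to be 0x00RRGGBB. */
static void light_span(const uint32_t *in, uint32_t *dst, size_t n,
                       uint32_t l, int32_t dl)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t s = l >> 16;
        uint32_t p = in[i];
        uint32_t r = (((p >> 16) & 255u) * s) >> 8;
        uint32_t g = (((p >> 8) & 255u) * s) >> 8;
        uint32_t b = ((p & 255u) * s) >> 8;
        dst[i] = (r << 16) | (g << 8) | b;
        l += (uint32_t)dl; /* unsigned wraparound handles negative steps */
    }
}

/* Texture the complete span first, then light the complete span. */
void draw_span(const uint32_t *tex, uint32_t *dst, size_t n,
               uint32_t u, uint32_t v, uint32_t du, uint32_t dv,
               uint32_t l, int32_t dl)
{
    assert(n <= SPAN_MAX);
    texture_span(tex, scratch, n, u, v, du, dv);
    light_span(scratch, dst, n, l, dl);
}
```

The point of the split is that each loop touches only one kind of data at a time: the first walks the texture and writes a small sequential buffer, the second reads that buffer and writes the destination. Whether this beats an interleaved loop on a given machine is something to measure rather than assume.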