Mail Archives: djgpp/1997/03/05/06:28:24
> Hmmm...you realise if you extend this code to use all 8 registers, you can
> speed it up even more, performing only 2 cache loads per loop. You can also
> remove the addl's by using indexed addressing to save another cycle each
> loop...
Ah, but then it's over 32 bytes and won't fit in a single cache line, so the
resulting loss probably isn't worth it :( If you have a moment, give it a try
anyway and see if you can come up with any hard numbers here; my timing
routines suck pretty badly :(
> I haven't played with this at all actually, because I haven't needed to fully
> optimise yet. But I will have a look at it tonight and see how I go. Have
> you tried putting the FPU into double precision mode before doing this? If
Ah, there's a problem there: using 80-bit values will take longer to load :(
It's 3 cycles for an 80-bit load and 1 cycle for a 64-bit load.
How do I change the FPU mode in inline asm like that anyway, btw? I've never
managed to get that to work :(
> you do that, the values should be stored as loaded, and no conversion should
> occur. If you are using the FPU in extended precision, it might be causing
> problems with the 64-80-64 bit conversion process. Reducing the precision
> would probably help by causing no conversions to be done...and not run any
> slower because you're still moving 8 bytes at a time...
I suspect the 6-cycle loading (rather than 2-cycle loading) now causes a
considerable slowdown, though :(
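On the question of changing the FPU mode from inline asm: a minimal sketch, assuming GCC-style inline asm on x86 (the syntax DJGPP uses), is to read the control word with FNSTCW, replace the precision-control (PC) field in bits 8-9 (00 = single/24-bit, 10 = double/53-bit, 11 = extended/64-bit), and write it back with FLDCW. All macro and function names below are hypothetical. Note that Intel's manuals say the PC field only affects the results of arithmetic instructions (FADD, FSUB, FMUL, FDIV, FSQRT and their variants), so it may not change how FLD/FSTP move 64-bit values at all.

```c
#include <assert.h>

/* x87 control word, precision-control (PC) field: bits 8-9. */
#define FPU_PC_MASK     0x0300u
#define FPU_PC_SINGLE   0x0000u  /* 24-bit mantissa */
#define FPU_PC_DOUBLE   0x0200u  /* 53-bit mantissa */
#define FPU_PC_EXTENDED 0x0300u  /* 64-bit mantissa */

/* Pure helper: return cw with its PC field replaced by pc. */
static unsigned short fpu_cw_with_precision(unsigned short cw, unsigned short pc)
{
    return (unsigned short)((cw & ~FPU_PC_MASK) | (pc & FPU_PC_MASK));
}

#if defined(__i386__) || defined(__x86_64__)
/* Read the current control word (FNSTCW stores it to memory). */
static unsigned short fpu_get_cw(void)
{
    unsigned short cw;
    __asm__ __volatile__("fnstcw %0" : "=m"(cw));
    return cw;
}

/* Load a new control word (FLDCW reads it from memory). */
static void fpu_set_cw(unsigned short cw)
{
    __asm__ __volatile__("fldcw %0" : : "m"(cw));
}

/* Switch to double precision; returns the old word so it can be restored. */
static unsigned short fpu_set_double_precision(void)
{
    unsigned short old = fpu_get_cw();
    fpu_set_cw(fpu_cw_with_precision(old, FPU_PC_DOUBLE));
    return old;
}
#endif
```

Restoring the saved word with `fpu_set_cw(old)` afterwards is probably wise, since other code may expect the default setting.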
regards,
nik
--
Graham Tootell
nikki AT gameboutique DOT com