Mail Archives: djgpp/1997/03/05/18:39:56
> ah, but then it's >32 bytes and won't fit in a cache. the resulting loss is
> probably not worth it therefore :( if you have a moment give it a try though
> and see if you can come up with any hard and fast values here, my timing
> routines suck pretty bad :(
I timed it last night and found that using all 8 registers sped it up overall by
about 1 cycle per 2 pixels... If you're moving a huge chunk of memory around,
you're going to hit the cache anyway. While moving around 8 lots of 8 bytes,
you're filling two 32-byte cache lines automatically, because the instructions
are reading sequentially from the same memory. Did that make sense? :)
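To illustrate the "8 lots of 8 bytes" access pattern, here is a rough, portable C sketch that copies memory in 64-byte blocks: eight sequential 8-byte loads, then eight sequential 8-byte stores, so each block spans exactly two 32-byte cache lines. The name copy64_blocks is made up for this example, and it uses plain integer temporaries rather than the 8 FPU registers, so it shows the pattern only and says nothing about timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LOTS = 8, BLOCK = LOTS * 8 };   /* 64 bytes = two 32-byte lines */

/* Hypothetical helper: copies n bytes; n must be a multiple of 64. */
static void copy64_blocks(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint64_t r[LOTS];                  /* stands in for the 8 registers */
    size_t i, k;

    assert(n % BLOCK == 0);
    for (i = 0; i + BLOCK <= n; i += BLOCK) {
        for (k = 0; k < LOTS; k++)     /* 8 sequential 8-byte loads */
            memcpy(&r[k], s + i + k * 8, 8);
        for (k = 0; k < LOTS; k++)     /* 8 sequential 8-byte stores */
            memcpy(d + i + k * 8, &r[k], 8);
    }
}
```

Grouping all the loads before the stores mirrors the "fill the registers, then empty them" shape of the routine being discussed.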
> ah there's a problem there. using 80bit values will take longer to load :(
> it's 3 cycles for an 80bit load and 1cycle for a 64bit load.
> how do i change the fpu mode in inline asm like that anyway btw? i haven't
> managed to ever get that to work :(
No, I mean manually put the machine into double (64 bit) precision to ensure
you're running at that precision. I can't remember the exact numbers, but from
memory, the code to put the FPU into, say, single precision is something of
the sort:
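short OldFPUCW, FPUCW;   /* globals; DJGPP prefixes C symbols with _ */
asm volatile (
  "fstcw _OldFPUCW\n\t"       /* fstcw only takes a memory operand */
  "movw _OldFPUCW, %%ax\n\t"
  "andw $value, %%ax\n\t"     /* value: the mask discussed below */
  "movw %%ax, _FPUCW\n\t"
  "fldcw _FPUCW"
  : : : "ax", "memory");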
then to restore precision to its previous state:
asm volatile ("fldcw _OldFPUCW");
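I think single precision mode can be attained by ANDing with NOT 1100000000
binary (that is, with 0xFCFF), which clears the two precision-control bits
(8 and 9). For double precision, clear those bits and then OR in 1000000000
binary (0x0200). Try that value and see how you go...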
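For reference, a minimal sketch of the same idea using GCC extended asm with memory operands, which avoids naming the globals inside the asm string. It assumes the standard x87 control word layout, where the precision-control field is bits 8 and 9 (00 = single, 10 = double, 11 = extended). All helper and macro names here are made up for this example, and the asm parts only compile with GCC-style compilers on x86 targets.

```c
#include <assert.h>

#define FPU_PC_MASK     0x0300u  /* precision-control field, bits 8-9 */
#define FPU_PC_SINGLE   0x0000u  /* 24-bit mantissa */
#define FPU_PC_DOUBLE   0x0200u  /* 53-bit mantissa */
#define FPU_PC_EXTENDED 0x0300u  /* 64-bit mantissa */

/* Pure bit arithmetic: return cw with its precision field set to pc. */
static unsigned short cw_with_precision(unsigned short cw, unsigned pc)
{
    assert((pc & ~FPU_PC_MASK) == 0);
    return (unsigned short)((cw & ~FPU_PC_MASK) | pc);
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Read the current control word. */
static inline unsigned short fpu_get_cw(void)
{
    unsigned short cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));
    return cw;
}

/* Load a new control word. */
static inline void fpu_set_cw(unsigned short cw)
{
    __asm__ volatile ("fldcw %0" : : "m" (cw));
}

/* Switch precision; returns the old control word for restoring later. */
static inline unsigned short fpu_set_precision(unsigned pc)
{
    unsigned short old = fpu_get_cw();
    fpu_set_cw(cw_with_precision(old, pc));
    return old;
}
#endif
```

Typical use would be to call fpu_set_precision(FPU_PC_DOUBLE) before the inner loop, keep the returned value, and pass it to fpu_set_cw() afterwards. With the usual default control word of 0x037F, single precision works out to 0x007F and double to 0x027F.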
> i suspect the 6 cycle loading rather than 2 cycle loading now causes considerable
> slowdown though :(
I wrote a small routine last night to use normal fldl's and fstpl's and
didn't have a problem. The screen (appeared to) blit perfectly. One note
though, it was blitting to a true colour (32 bit) display...
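A rough sketch of what such an fldl/fstpl copy loop might look like, not the routine I timed. The names copy8_fpu and blit_fpu are made up for this example. One caveat, as far as I understand it: the FPU treats each 8 bytes as a double, and loading a signalling-NaN bit pattern can alter it, so this is only safe for data that never forms one. 32-bit pixels whose top (unused) byte is zero cannot, because the exponent field never reaches all ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy one 8-byte unit. On x86 with GCC this goes through the x87 stack
   (assumes at least one free x87 register); elsewhere it falls back to
   memcpy so the sketch still compiles. */
static void copy8_fpu(void *dst, const void *src)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile ("fldl %1\n\t"
                      "fstpl %0"
                      : "=m" (*(double *)dst)
                      : "m" (*(const double *)src)
                      : "memory");
#else
    memcpy(dst, src, 8);
#endif
}

/* Hypothetical blit: n_bytes must be a multiple of 8. */
static void blit_fpu(void *dst, const void *src, size_t n_bytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    assert(n_bytes % 8 == 0);
    for (i = 0; i + 8 <= n_bytes; i += 8)
        copy8_fpu(d + i, s + i);
}
```

An integer load/store pair (fildq/fistpq) is the usual alternative when the data could contain arbitrary bit patterns.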
Leathal.