Mail Archives: djgpp/1997/03/05/18:39:56

From: leathm AT solwarra DOT gbrmpa DOT gov DOT au (Leath Muller)
Message-Id: <199703052331.JAA14058@solwarra.gbrmpa.gov.au>
Subject: Re: Allegro perspective-correct .. (fpu memcopy)
To: nikki AT gameboutique DOT co (nikki)
Date: Thu, 6 Mar 1997 09:31:41 +1000 (EST)
Cc: djgpp AT delorie DOT com
In-Reply-To: <5fji73$8fo@flex.uunet.pipex.com> from "nikki" at Mar 5, 97 10:34:11 am

> ah, but then it's >32 bytes and won't fit in a cache. the resulting loss is
> probably not worth it therefore :( if you have a moment give it a try though
> and see if you can come up with any hard and fast values here, my timing
> routines suck pretty bad :(

I timed it last night and found that using all 8 registers sped it up overall
by about 1 cycle per 2 pixels... If you're moving a huge chunk of memory
around, you're going to hit the cache anyway. When moving 8 lots of 8 bytes,
you're filling two 32-byte cache lines automatically, because the instructions
read sequentially from the same memory. Did that make sense? :)
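
To make the arithmetic concrete (8 loads x 8 bytes = 64 bytes = two 32-byte
lines), here is a rough portable C sketch of the access pattern. It is not the
routine I timed: copy_blocks64 is a made-up name, and the array r[] of 64-bit
integers only stands in for the eight FPU registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes (n must be a multiple of 64) as
   eight sequential 8-byte loads followed by eight 8-byte stores. */
void copy_blocks64(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	uint64_t r[8];	/* assumption: stand-in for the 8 FPU registers */
	size_t i;
	int k;

	assert(n % 64 == 0);
	for (i = 0; i < n; i += 64) {
		/* 8 loads from consecutive addresses: 64 bytes in total,
		   i.e. two 32-byte cache lines' worth of data */
		for (k = 0; k < 8; k++)
			memcpy(&r[k], s + i + 8 * k, 8);
		/* 8 stores to consecutive addresses */
		for (k = 0; k < 8; k++)
			memcpy(d + i + 8 * k, &r[k], 8);
	}
}
```

Calling copy_blocks64(dst, src, 128) moves two such 64-byte blocks. The real
thing would use fldl/fstpl pairs in place of the memcpy calls.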
 
> ah there's a problem there. using 80bit values will take longer to load :(
> it's 3 cycles for an 80bit load and 1cycle for a 64bit load. 
> how do i change the fpu mode in inline asm like that anyway btw? i haven't
> managed to ever get that to work :(

No, I mean manually put the FPU into double (64 bit) precision to ensure
you're running at that precision. I can't remember the exact numbers, but from
memory, the code to put the FPU in, say, single precision is something like
this:
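	/* Globals, so the asm can name them (DJGPP prefixes C symbols with _). */
	short	OldFPUCW, FPUCW;

	asm volatile (
		"fstcw	_OldFPUCW\n\t"		/* fstcw needs a memory operand */
		"movw	_OldFPUCW, %%ax\n\t"
		"andw	$0xFCFF, %%ax\n\t"	/* clear PC bits 8-9: single precision */
		"movw	%%ax, _FPUCW\n\t"
		"fldcw	_FPUCW"
		: : : "ax", "memory");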

then to restore precision to its previous state:

	asm volatile ("fldcw	_OldFPUCW");

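I think single precision mode can be attained with NOT 1100000000 (binary),
i.e. by ANDing the control word with 0xFCFF, which clears the precision-control
bits (8 and 9). Try that value and see how you go...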
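
If it helps to see the bit fiddling on its own, here is a small C sketch that
only computes the new control word value; it doesn't touch the FPU. The names
(FPU_PC_MASK, fpu_cw_with_precision and so on) are made up for illustration.
The field encodings are the standard x87 ones: 00 = single, 10 = double,
11 = extended.

```c
#include <assert.h>

/* Precision-control (PC) field of the x87 control word: bits 8 and 9.
   The macro and function names below are hypothetical. */
#define FPU_PC_MASK	0x0300u
#define FPU_PC_SINGLE	0x0000u	/* 24-bit mantissa */
#define FPU_PC_DOUBLE	0x0200u	/* 53-bit mantissa */
#define FPU_PC_EXTENDED	0x0300u	/* 64-bit mantissa */

/* Return cw with its PC field replaced by pc; all other bits are kept. */
unsigned short fpu_cw_with_precision(unsigned short cw, unsigned short pc)
{
	assert((pc & ~FPU_PC_MASK) == 0);	/* pc may only use bits 8-9 */
	return (unsigned short)((cw & ~FPU_PC_MASK) | pc);
}
```

Starting from 0x037F (the control word after an finit, extended precision),
fpu_cw_with_precision(0x037F, FPU_PC_DOUBLE) gives 0x027F, which is the value
you'd hand to fldcw for double precision.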

> i suspect the 6 cycle loading rather than 2 cycle loading now causes considerable
> slowdown though :(
 
I wrote a small routine last night to use normal fldl's and fstpl's and
didn't have a problem. The screen (appeared to) blit perfectly. One note
though, it was blitting to a true colour (32 bit) display...

Leathal.
