Mail Archives: djgpp/1997/03/04/23:29:52

Date: Tue, 4 Mar 1997 23:24:28 -0500 (EST)
From: Michael Phelps <morphine AT hops DOT cs DOT jhu DOT edu>
To: nikki <nikki AT gameboutique DOT co>
cc: djgpp AT delorie DOT com
Subject: Re: Allegro perspective-correct .. (fpu memcopy)
In-Reply-To: <5fhtio$rqm@flex.uunet.pipex.com>
Message-ID: <Pine.GSO.3.95.970304231733.4697A-100000@hops.cs.jhu.edu>
MIME-Version: 1.0

On 4 Mar 1997, nikki wrote:

> >> > 	2) Cause your program to ignore FP exceptions by including the
> >> > following somewhere at its beginning:
> >> > 
> >> > 	#include <signal.h>
> >> > 	...
> >> > 	signal (SIGFPE, SIG_IGN);
> >> 
> >> i wasn't aware you could do this actually. does this perhaps mean that you
> >> could use a memcopy with fld and fstp and just ignore errors like this? it
> >> would be much faster than the fild fistp version obviously...
> 
> well, for the benefit of the djgpp community as a whole here's the result.
> first the standard fpu memcopy which i use. this is 2 cycles faster than the
> fastest i've ever seen anywhere else (the agner fog articles) and is 100%
> accurate :
> 
> asm volatile ("1:\n\t"
>               "fildq (%%esi)\n\t"             // load first qword  1 NP (2,3)
>               "fildq 8(%%esi)\n\t"            // load second qword 2 NP (3,4)
>               "addl $16,%%esi\n\t"            // update esi        3 uv
>               "addl $16,%%edi\n\t"            // update edi        3 uv
>               "fistpq -8(%%edi)\n\t"          // save 2nd qword    4 NP (-9)
>               "fistpq -16(%%edi)\n\t"         // save 1st qword   10 NP (-15)
>               "decl %%ecx\n\t"                // dec ecx          16 uv
>               "jnz 1b"                        // (loop)           16  v
>              : "+S" (scr_buf), "+D" (videoptr), "+c" (no_to_move) // all three are modified
>              :
>              : "memory", "cc" );
> 
> as you can see, the slow part is the fist which takes a fat 6NP :( but it
> still manages 16 bytes in 16 cycles with 1/2 the normal write misses and
> associated cache penalties.
> 
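For anyone building with a current GCC, here is a minimal sketch of the same fild/fistp copy wrapped in a function. The name fpu_copy16 and its parameters are hypothetical, not from nik's post. The loop body follows the quoted code but uses generic register operands, so it should assemble on both i386 and x86-64, and it falls back to memcpy elsewhere. It makes no attempt to reproduce the Pentium cycle pairing described above.

```c
#include <stddef.h>
#include <string.h>

/* Copy `blocks` 16-byte blocks from src to dst through the x87 FPU.
 * fildq/fistpq treat each 8 bytes as a 64-bit integer, which the 80-bit
 * x87 register format holds exactly, so the copy is bit-for-bit.
 * fpu_copy16 is a hypothetical name used only for this sketch. */
static void fpu_copy16(void *dst, const void *src, size_t blocks)
{
    if (blocks == 0)
        return;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile (
        "1:\n\t"
        "fildq  (%[s])\n\t"     /* push first qword                  */
        "fildq  8(%[s])\n\t"    /* push second qword (now st(0))     */
        "fistpq 8(%[d])\n\t"    /* pop and store second qword        */
        "fistpq (%[d])\n\t"     /* pop and store first qword         */
        "add    $16, %[s]\n\t"  /* advance source pointer            */
        "add    $16, %[d]\n\t"  /* advance destination pointer       */
        "dec    %[n]\n\t"       /* one block done                    */
        "jnz    1b"
        : [s] "+r" (src), [d] "+r" (dst), [n] "+r" (blocks)
        :
        : "memory", "cc", "st", "st(1)");
#else
    /* Not an x86 GNU C target: plain memcpy gives the same result. */
    memcpy(dst, src, blocks * 16);
#endif
}
```

A call such as fpu_copy16(videoptr, scr_buf, no_to_move) would correspond to the quoted loop, with no_to_move counting 16-byte blocks rather than bytes.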
> now the fast (and theoretically not so accurate) version i came up with.
> replace the fild and fistp with fld and fstp and ignore fp exceptions as eli
> described above. the result is an 8 cycle loop - twice as fast in fact.
> the disadvantage is that this is a 'lossy' form of moving data about. there
> are some sequences of numbers which cause errors and these show quite visibly
> if you're using a blit to screen for instance. my suggestion therefore is to
> only use this for 24bit screen displays and to nudge the values that cause
> fpu errors by +-1 so that this never happens. the result is something that's
> visually indistinguishable from what you want but twice as fast. (and 4 times
> faster than the rep movs versions) so my question really is - does anyone know
> which sequences cause fpu errors so i can avoid them? :) perhaps leath would
> know?
> 
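A possible answer to the question above, assuming the fast loop moves 8 bytes at a time as doubles (fldl/fstpl in AT&T syntax; the post does not show the code, so this is an assumption). As far as I know, with FP exceptions masked the x87 round-trips every 64-bit pattern unchanged except signaling NaNs: loading one raises the invalid-operation exception and the value comes back as a quiet NaN, i.e. with the top mantissa bit set. The sketch below tests a raw 64-bit pattern for that case. The function name is hypothetical.

```c
#include <stdint.h>

/* Return 1 if `bits`, read as an IEEE 754 double, is a signaling NaN:
 * exponent field all ones, mantissa nonzero, top mantissa bit clear.
 * These are the patterns an x87 double load/store is expected to alter.
 * is_signaling_nan_bits is a hypothetical helper for illustration. */
static int is_signaling_nan_bits(uint64_t bits)
{
    uint64_t exponent  = (bits >> 52) & 0x7FF;
    uint64_t mantissa  = bits & 0x000FFFFFFFFFFFFFULL;
    uint64_t quiet_bit = 0x0008000000000000ULL;

    return exponent == 0x7FF && mantissa != 0 && (mantissa & quiet_bit) == 0;
}
```

On a little-endian x86 the exponent field sits in the last two bytes of each 8-byte group, so only groups whose final bytes look like 0x7FF or 0xFFF in the top 12 bits can be affected.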
> regards,
> nik
> 
> 
> 
> -- 
> Graham Tootell           
> nikki AT gameboutique DOT com  
> 

Now this is interesting.  I have a program that I translated part of into
extended asm because it was taking way too long on our workstation when
programmed in C.  This is basically what it does:
	1) subtract one long integer from another
	2) perform a negl if the result is negative
	3) check to see if the absolute value of the difference is < a
	   given number
	4) store result of comparison (0 or 1) in a given array
	5) repeat step #1 with next number in sequence
	6) when all numbers have been compared with that first number,
	   repeat step #1 using the next number and scanning through all
	   the rest, until all numbers have been compared with each other

(Actually, this is somewhat simplified, since I have taken into account
the fact that the comparison is symmetric, and that each number matches
exactly with itself, so the actual number of comparisons I have to do is
about half of the above, but it gives the idea.)
	Anyway, is there a way to do this faster using FPU instructions
under DJGPP?  If you would like more information, I will mail you a
piece of the actual code.
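For reference, the steps listed above can be sketched in plain C as follows. This is only an illustration of the described algorithm, not the actual program: the names abs_diff and compare_all_pairs, the n-by-n result layout, and the unsigned limit are all assumptions. It uses the symmetry mentioned in the parenthetical, so each pair is compared once and the result is stored in both positions.

```c
#include <stddef.h>

/* |a - b| computed in unsigned arithmetic so the subtraction cannot
 * overflow. Hypothetical helper for this sketch. */
static unsigned long abs_diff(long a, long b)
{
    return a > b ? (unsigned long)a - (unsigned long)b
                 : (unsigned long)b - (unsigned long)a;
}

/* Fill the n*n row-major array `out` so that out[i*n + j] is 1 when
 * |v[i] - v[j]| < limit and 0 otherwise. Only pairs with j > i are
 * actually compared; the mirrored entry is copied. */
static void compare_all_pairs(const long *v, size_t n,
                              unsigned long limit, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        /* A number differs from itself by 0, which is < any limit > 0. */
        out[i * n + i] = (unsigned char)(limit > 0);
        for (size_t j = i + 1; j < n; j++) {
            unsigned char is_close =
                (unsigned char)(abs_diff(v[i], v[j]) < limit);
            out[i * n + j] = is_close;
            out[j * n + i] = is_close;
        }
    }
}
```

With v = {10, 13, 20, 11} and limit 3, only the pairs (10, 11) and (13, 11) come out as 1, apart from the diagonal.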



						---Michael Phelps
						   morphine AT cs DOT jhu DOT edu


                               CH3
                               |
                               N
                             / |
                     ______/   |
                    /      \   CH2
             _____/         \__|__      
           //     \\        /  |  \\     
         //        \\______/___CH2 \\  
          \        /       \       /
           \______/         \_____/
          / ------ \       /      \
        OH           \   /         OH
                       O
 
                   Morphine


	
