Message-ID: <32B53FBE.1B0A@pobox.oleane.com>
Date: Mon, 16 Dec 1996 13:25:34 +0100
From: Francois Charton
Organization: CCMSA
MIME-Version: 1.0
To: murray AT southeast DOT net
CC: djgpp AT delorie DOT com
Subject: Re: math optimization
References: <32b26866 DOT 241305048 AT nntp DOT southeast DOT net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Murray Stokely wrote:
>
> This is a little code snippet from a vesa lens effect I coded,
> The diameter and magnification of the lens are adjustable in realtime
> via the +/- keys so I need the lens calculating function as fast as
> possible.  Someone gave me a good tip with my sqrt problem earlier in
> using the difference of squares to take out one of the multiplies but
> I'd like to take it a step further.  This routine is actually faster
> than I expected it would be on my 486dx4, but still every clock cycle
> counts ;)  I know there's lots of room for improvement, so any tips
> would be appreciated.
>

Here are a few ideas (which should make your code a bit faster)...

void calculate_tfm(int diameter, char magnification)
{
  int a,b,x,y,z,s;
  int y2,x2y2;
  int radius,rad2;

  radius=diameter/2;
  /* this is used a lot, let's precalculate */
  rad2=radius*radius;
  /* save a sqrt() */
  s=abs(rad2 - (magnification * magnification));

  /* use square symmetry: 4 times fewer calculations */
  /* the loop headers below are assumed: bounds 0..radius-1 and
     x2y2 = x*x + y*y */
  y2=0;
  for(y=0;y<radius;y++) {
    y2=y*y;
    for(x=0;x<radius;x++) {
      x2y2=x*x+y2;
      if(x2y2 >= s) {
        /* this can be improved: for x or y = 0 some values are calculated too many times.
           maybe also the condition x2y2>=s can be taken out of the loop */
        tfm[(y+radius)*diameter+(x+radius)]=(y+radius)*diameter+(x+radius);
        tfm[(-y+radius)*diameter+(x+radius)]=(-y+radius)*diameter+(x+radius);
        tfm[(y+radius)*diameter+(-x+radius)]=(y+radius)*diameter+(-x+radius);
        tfm[(-y+radius)*diameter+(-x+radius)]=(-y+radius)*diameter+(-x+radius);
      }
      else {
        z=round(sqrt(rad2-x2y2));
        a=round(x*magnification/z);
        b=round(y*magnification/z);
        tfm[(y+radius)*diameter+(x+radius)]=(b+radius)*diameter+(a+radius);
        tfm[(-y+radius)*diameter+(x+radius)]=(-b+radius)*diameter+(a+radius);
        tfm[(y+radius)*diameter+(-x+radius)]=(b+radius)*diameter+(-a+radius);
        tfm[(-y+radius)*diameter+(-x+radius)]=(-b+radius)*diameter+(-a+radius);
      }
    } // end of for x
  } // end of for y
}

> ( I'll eventually convert all doubles/floats to 16.16 fixed point, so
> skip that obvious MAJOR optimization )

Do you really need 16.16? That leaves little room for the integer part,
especially if you have to compute squares, products and things like that.
Isn't 22.10 enough (that's three decimal places), or even 25.7 (two
decimal places)?

Francois
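
To put numbers on the 16.16 question, here is a minimal sketch. It assumes signed 32-bit storage and a C99 compiler with <stdint.h>, neither of which is stated in the thread. The helper names fix_max_int, fix_fits, fix_from_int and fix_mul are hypothetical, chosen only for illustration. The number of fraction bits is a parameter, so the same code covers 16.16, 22.10 and 25.7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not from the original post.
   frac_bits selects the format: 16 -> 16.16, 10 -> 22.10, 7 -> 25.7.
   Assumption: values are stored in a signed 32-bit integer. */

/* Largest integer part representable with frac_bits fraction bits. */
static int32_t fix_max_int(int frac_bits)
{
    return (int32_t)(INT32_MAX >> frac_bits);
}

/* Returns 1 if the integer n fits in the chosen format, else 0. */
static int fix_fits(int64_t n, int frac_bits)
{
    int64_t max = fix_max_int(frac_bits);
    return n >= -max - 1 && n <= max;
}

/* Convert an integer to fixed point; asserts that it fits. */
static int32_t fix_from_int(int32_t n, int frac_bits)
{
    assert(fix_fits(n, frac_bits));
    return (int32_t)((int64_t)n * ((int64_t)1 << frac_bits));
}

/* Multiply two fixed-point values through a 64-bit intermediate.
   Assumption: >> on a negative value is an arithmetic shift, which is
   implementation-defined in C but what common compilers do. */
static int32_t fix_mul(int32_t a, int32_t b, int frac_bits)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> frac_bits);
}
```

With a lens radius of 200, rad2 is 40000. fix_fits(40000, 16) fails because the integer part of 16.16 stops at 32767, while fix_fits(40000, 10) succeeds because 22.10 reaches 2097151.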