delorie.com/archives/browse.cgi   search  
Mail Archives: djgpp/1997/02/02/11:18:31

From: ovek AT arcticnet DOT no (Ove Kaaven)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: fixed point math: NEED HELP QUICK!
Date: Sun, 02 Feb 1997 08:44:50 GMT
Organization: Vplan Programvare AS
Lines: 111
Message-ID: <5d2as2$d48$1@troll.powertech.no>
References: <Pine DOT LNX DOT 3 DOT 95 DOT 970201053600 DOT 140d-100000 AT capslock DOT com> <19970201 DOT 123135 DOT 4847 DOT 3 DOT chambersb AT juno DOT com>
NNTP-Posting-Host: alwayscold.darkness.arcticnet.no
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

chambersb AT juno DOT com (Benjamin D Chambers) wrote:

>On Sat, 1 Feb 1997 05:50:56 -0500 (EST) mharris AT blackwidow DOT saultc DOT on DOT ca
>writes:
>>Write a program which demonstrates that using shifts/adds is more
>>efficient than using mul/divs for the majority of integer
>>multipliers/divisors between -32768 and +32767 (signed) and 0 and
>>65535 unsigned.
>Well, I'll give it a try :)
>Just for the record, I spent a few minutes on that example working out
>the ASM code, but for this I'll just do the mul's and shifts
>incrementally in ASM and do them again in C (the optimizer should switch
>them to shifts/muls).  If I have to, I WILL NOT write out 65536 shifting
>routines to prove my point - I don't have that kind of time ;)  But, I'll
>post the code when I get the results...

Need help?
After trying this out, I no longer have any doubt that gcc can make
shifting faster than mul's, it will employ lea, sal, add, and sub in
pretty ingenious ways (*never* resorts to mul). On my Pentium, I found
that the multiplication with shift versions was almost double as fast
on average as mul versions (at least the range I checked). Look at the
generated mult.s and be impressed by gcc's optimization prowess. I
don't have much time for further testing, so if you want to conduct
more extensive tests, modify the generator and test code below as
appropriate, provided you have enough RAM and stack (which I didn't
for the entire 64K range). This also shows the power of preprocessors.

Makefile:
---------
all: mult.exe

CFLAGS=-Wall -O2 -m486

mult.exe: mult.o
        gcc -s -o mult.exe mult.o

mult.o: mult.s
        gcc -c -o mult.o mult.s

mult.s: nums.h mult.c
        gcc -c -S $(CFLAGS) -o mult.s mult.c

nums.h: gen.exe
        gen

gen.exe: gen.o
        gcc -s -o gen.exe gen.o

gen.c:
------
#include <stdio.h>

int main()
{
 FILE*Out;
 unsigned Cnt;

 Out=fopen("nums.h","w");
 for (Cnt=0; Cnt<1024; Cnt++) {
  fprintf(Out,"MKMULT(Mult%d,%d)\n",Cnt,Cnt);
 }
 fclose(Out);
 return(0);
}

mult.c:
-------
#include <stdio.h>
#include <time.h>

#define MKMULT(x,y) \
 unsigned x(unsigned Val,unsigned dummy) \
 { return(Val*y); }

unsigned MultX(unsigned Val,unsigned X)
{ return (Val*X); }

#include "nums.h"

void TestIn(unsigned Val)
{
#undef MKMULT
#define MKMULT(x,y) x(Val,y);
#include "nums.h"
}

void TestMul(unsigned Val)
{
#undef MKMULT
#define MKMULT(x,y) MultX(Val,y);
#include "nums.h"
}

int main()
{
 unsigned Val=37,Cnt;
 clock_t InClk,MulClk;

 clock();
 for (Cnt=0; Cnt<4096; Cnt++)
  TestIn(Val);
 InClk=clock();
 for (Cnt=0; Cnt<4096; Cnt++)
  TestMul(Val);
 MulClk=clock();
 printf("Shift clock: %d  Multiply clock: %d\n",InClk,MulClk);
 return(0);
}


- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019