From: ams AT ludd DOT luth DOT se (Martin Str|mberg) Newsgroups: comp.os.msdos.djgpp Subject: Re: gcc optimizes divisons! Date: 25 Sep 1999 17:34:40 GMT Organization: University of Lulea, Sweden Lines: 76 Message-ID: <7sj13g$3k0$1@news.luth.se> References: <37e37095 AT news DOT xenologics DOT com> NNTP-Posting-Host: queeg.ludd.luth.se X-Newsreader: TIN [UNIX 1.3 950824BETA PL0] To: djgpp AT delorie DOT com DJ-Gateway: from newsgroup comp.os.msdos.djgpp Reply-To: djgpp AT delorie DOT com Oliver Roese (oroese AT edina DOT xnc DOT com) wrote: : Yesterday i found out by accident, that gcc knows how to divide by constants. : This is a optimization technique, mentioned by several experts (http://www.agner.org/ or : http://www.azillionmonkeys.com/qed/index .html). If you know about this, i am sure you are : interested to hear, that gcc 2.95 can do that for you. : Example: : #include : int Divc (int u) : { return u / 3;} // u%3 would work : too. : : int main(void) : { : int u = (int)&u; : printf ("%d",Div3(u)); : return 0; : } : : compile: gcc -S : I have seen that gcc sometimes inlines short functions, but it doesent do that here, even with -O6. : Anyway here is the result: : ... : .globl _Divc : _Divc: : pushl %ebp : movl %esp,%ebp : movl $1431655766,%edx : movl 8(%ebp),%ecx : movl %ecx,%eax : imull %edx : sarl $31,%ecx : movl %ebp,%esp : popl %ebp : subl %ecx,%edx : movl %edx,%eax : ret : ... : I dont tried to decipher it. : If you declare u as unsigned, you get this: : : LC2: : .long 0xaaaaaaab : .p2align 2 : .globl _Divc Things are becoming clear now. Note that 0xaaaaaaab =~ 2*1431655766. So it seems the signed version is using the same technique as the unsigned one, but for 31 bits instead of 32. Then it adjusts the result with 1 if the operator is negative as the C operator / truncates towards 0 (while the optimisation technique truncates towards -infinity). : _Divc: : pushl %ebp : movl %esp,%ebp : movl 8(%ebp),%eax : mull LC2 : movl %ebp,%esp : popl %ebp : shrl $1,%edx : movl %edx,%eax : ret : : We learn from that: : floor(x/3) = floor((0xaaaaaaab*x)/2^33) if 0<=x<2^32 : Trickery is simpler here, but suffers from "mull", which is not as optimized as "imull". : Which case is faster? I wonder if the same results would be obatined if imull was used instead of mull. If so and if imull is faster than mull, perhaps a possible improvement? Right, MartinS