delorie.com/archives/browse.cgi   search  
Mail Archives: djgpp/1999/09/25/15:26:22

From: ams AT ludd DOT luth DOT se (Martin Str|mberg)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: gcc optimizes divisons!
Date: 25 Sep 1999 17:34:40 GMT
Organization: University of Lulea, Sweden
Lines: 76
Message-ID: <7sj13g$3k0$1@news.luth.se>
References: <37e37095 AT news DOT xenologics DOT com>
NNTP-Posting-Host: queeg.ludd.luth.se
X-Newsreader: TIN [UNIX 1.3 950824BETA PL0]
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Oliver Roese (oroese AT edina DOT xnc DOT com) wrote:
: Yesterday i found out by accident, that gcc knows how to divide by constants.
: This is a optimization technique, mentioned by several experts (http://www.agner.org/ or
: http://www.azillionmonkeys.com/qed/index .html). If you know about this, i am sure you are
: interested to hear, that gcc 2.95 can do that for you.
: Example:
: #include <stdio.h>
: int Divc (int u)
: { return u / 3;}    // u%3 would work
: too.
: 
: int main(void)
: {
:  int u = (int)&u;
:  printf ("%d",Div3(u));
:  return 0;
: }
: 
: compile: gcc -S <filename>
: I have seen that gcc sometimes inlines short functions, but it doesent do that here, even with -O6.
: Anyway here is the result:
: ...
: .globl _Divc
: _Divc:
:  pushl %ebp
:  movl %esp,%ebp
:  movl $1431655766,%edx
:  movl 8(%ebp),%ecx
:  movl %ecx,%eax
:  imull %edx
:  sarl $31,%ecx
:  movl %ebp,%esp
:  popl %ebp
:  subl %ecx,%edx
:  movl %edx,%eax
:  ret
: ...
: I dont tried to decipher it.
: If you declare u as unsigned, you get this:
: 
: LC2:
:  .long 0xaaaaaaab
:  .p2align 2
: .globl _Divc

Things are becoming clear now. Note that 0xaaaaaaab =~ 2*1431655766.
So it seems the signed version is using the same technique as the
unsigned one, but for 31 bits instead of 32. Then it adjusts the
result with 1 if the operator is negative as the C operator /
truncates towards 0 (while the optimisation technique truncates
towards -infinity).

: _Divc:
:  pushl %ebp
:  movl %esp,%ebp
:  movl 8(%ebp),%eax
:  mull LC2
:  movl %ebp,%esp
:  popl %ebp
:  shrl $1,%edx
:  movl %edx,%eax
:  ret
: 
: We learn from that:
: floor(x/3) = floor((0xaaaaaaab*x)/2^33)  if 0<=x<2^32
: Trickery is simpler here, but suffers from "mull", which is not as optimized as "imull".
: Which case is faster?

I wonder if the same results would be obatined if imull was used
instead of mull. If so and if imull is faster than mull, perhaps a
possible improvement?


Right,

							MartinS

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019