X-Authentication-Warning: sal.physics.ucsb.edu: dwhysong owned process doing -bs Date: Thu, 13 May 1999 18:19:43 -0700 (PDT) From: David Whysong To: Marc Lehmann cc: pgcc AT delorie DOT com Subject: Common subexpression elimination In-Reply-To: <19990511004113.N22062@cerebro.laendle> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Reply-To: pgcc AT delorie DOT com X-Mailing-List: pgcc AT delorie DOT com X-Unsubscribes-To: listserv AT delorie DOT com Precedence: bulk On Tue, 11 May 1999, Marc Lehmann wrote: >Have you benchmarked it? fmuls are not really slower than fadds, for >example. However, if you can manage to get a (sensibly sized ;) example >of where cse fails I'd be happy to look into it! Did you see my previous example? If it was too large, here is a small one. Here's a simple test function: inline double work(double A1, double A2, double A3, double A4) { return(A1*A2*A3 + A1*A3*A4 + A2*A3*A4); } There are 8 arithmetic operations, 6 multiplies and two adds. Now here is the same function, with manual subexpression elimination: inline double fast(double A1, double A2, double A3, double A4) { register double tmp=A3*A4; return( A1*(A2*A3+tmp) + A2*tmp); } There are only 5 arithmetic operations, 3 multiplies and 2 adds. Take a look at the assembly produced by pgcc-1.1.3: /local/bin/gcc -O6 -Wall -mcpu=pentiumpro -march=pentiumpro -mpentiumpro -ffast-math -mno-ieee-fp -fschedule-insns -fstrength-reduce -fomit-frame-pointer -malign-double -mstack-align-double -funroll-loops -funroll-all-loops test.c -S -o test.s .globl work .type work,@function work: subl $4,%esp fldl 8(%esp) fld %st(0) fldl 24(%esp) fldl 16(%esp) fxch %st(3) fmul %st(1),%st fxch %st(2) fmul %st(3),%st fldl 32(%esp) fxch %st(4) fmul %st(2),%st fxch %st(3) fmul %st(4),%st fxch %st(1) fmulp %st,%st(2) fxch %st(2) fmulp %st,%st(3) faddp %st,%st(1) faddp %st,%st(1) popl %ecx ret .Lfe2: .size work,.Lfe2-work .align 16 .globl fast .type fast,@function fast: subl $4,%esp fldl 16(%esp) fldl 24(%esp) fld %st(1) fld %st(1) fxch %st(1) fmulp %st,%st(2) fmull 32(%esp) fadd %st,%st(1) fmulp %st,%st(2) fmull 8(%esp) faddp %st,%st(1) popl %ecx ret .Lfe3: .size fast,.Lfe3-fast .ident "GCC: (GNU) pgcc-2.91.66 19990314 (egcs-1.1.2 release)" Function "work" has 6 multiplies and 2 adds, while functin "fast" has 4 multiplies and 2 adds (I'm not sure why it has 4 fmul? operations isntead of 3). Benchmarking shows almost no difference in speed, but the functions are so small that I doubt I could measure any difference reliably anyway. As far as I can tell, the compiler does no common subexpression elimination at all. Dave David Whysong dwhysong AT physics DOT ucsb DOT edu Astrophysics graduate student University of California, Santa Barbara My public PGP keys are on my web page - http://www.physics.ucsb.edu/~dwhysong DSS PGP Key 0x903F5BD6 : FE78 91FE 4508 106F 7C88 1706 B792 6995 903F 5BD6 D-H PGP key 0x5DAB0F91 : BC33 0F36 FCCD E72C 441F 663A 72ED 7FB7 5DAB 0F91