Mail Archives: pgcc/1999/05/13/21:22:20
On Tue, 11 May 1999, Marc Lehmann wrote:
>Have you benchmarked it? fmuls are not really slower than fadds, for
>example. However, if you can manage to get a (sensibly sized ;) example
>of where cse fails I'd be happy to look into it!
Did you see my previous example? If it was too large, here is a small one.
Here's a simple test function:
inline double work(double A1, double A2, double A3, double A4)
{ return(A1*A2*A3 + A1*A3*A4 + A2*A3*A4); }
There are 8 arithmetic operations, 6 multiplies and two adds.
Now here is the same function, with manual subexpression elimination:
inline double fast(double A1, double A2, double A3, double A4)
{
register double tmp=A3*A4;
return( A1*(A2*A3+tmp) + A2*tmp);
}
There are only 5 arithmetic operations, 3 multiplies and 2 adds.
Take a look at the assembly produced by pgcc-1.1.3:
/local/bin/gcc -O6 -Wall -mcpu=pentiumpro -march=pentiumpro -mpentiumpro
-ffast-math -mno-ieee-fp -fschedule-insns -fstrength-reduce
-fomit-frame-pointer -malign-double -mstack-align-double -funroll-loops
-funroll-all-loops test.c -S -o test.s
.globl work
.type work,@function
work:
subl $4,%esp
fldl 8(%esp)
fld %st(0)
fldl 24(%esp)
fldl 16(%esp)
fxch %st(3)
fmul %st(1),%st
fxch %st(2)
fmul %st(3),%st
fldl 32(%esp)
fxch %st(4)
fmul %st(2),%st
fxch %st(3)
fmul %st(4),%st
fxch %st(1)
fmulp %st,%st(2)
fxch %st(2)
fmulp %st,%st(3)
faddp %st,%st(1)
faddp %st,%st(1)
popl %ecx
ret
.Lfe2:
.size work,.Lfe2-work
.align 16
.globl fast
.type fast,@function
fast:
subl $4,%esp
fldl 16(%esp)
fldl 24(%esp)
fld %st(1)
fld %st(1)
fxch %st(1)
fmulp %st,%st(2)
fmull 32(%esp)
fadd %st,%st(1)
fmulp %st,%st(2)
fmull 8(%esp)
faddp %st,%st(1)
popl %ecx
ret
.Lfe3:
.size fast,.Lfe3-fast
.ident "GCC: (GNU) pgcc-2.91.66 19990314 (egcs-1.1.2 release)"
Function "work" has 6 multiplies and 2 adds, while functin "fast" has 4
multiplies and 2 adds (I'm not sure why it has 4 fmul? operations isntead
of 3). Benchmarking shows almost no difference in speed, but the functions
are so small that I doubt I could measure any difference reliably anyway.
As far as I can tell, the compiler does no common subexpression
elimination at all.
Dave
David Whysong dwhysong AT physics DOT ucsb DOT edu
Astrophysics graduate student University of California, Santa Barbara
My public PGP keys are on my web page - http://www.physics.ucsb.edu/~dwhysong
DSS PGP Key 0x903F5BD6 : FE78 91FE 4508 106F 7C88 1706 B792 6995 903F 5BD6
D-H PGP key 0x5DAB0F91 : BC33 0F36 FCCD E72C 441F 663A 72ED 7FB7 5DAB 0F91
- Raw text -