Date: Mon, 10 May 1999 00:50:56 -0700 (PDT)
From: David Whysong <dwhysong AT physics DOT ucsb DOT edu>
To: pgcc AT delorie DOT com
Subject: Optimization question
Message-ID: <Pine.LNX.4.04.9905100038010.12328-100000@sal.physics.ucsb.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: pgcc AT delorie DOT com


Hi,

I'm not on the mailing list, so please cc me on any replies.

I have a 3300-line source file that calculates 24 double-precision vector
coefficients in only 24 statements. Yes, it's ugly :-) but it's necessary.
I'm using pgcc 1.1.3, compiling with:

/usr/local/bin/gcc dynamics_exact.c -c -O6 -mcpu=pentiumpro -Wall
-ffast-math -mstack-align-double -mpentiumpro -mieee-fp -fstrength-reduce
-malign-double -funroll-loops -funroll-all-loops -fomit-frame-pointer
-DNDEBUG -DARCRANDOM -DDOUBLEPREC -I../..//include

My question is, how well does the compiler optimize code that looks like
this:

inline void Q_k(fourvector u, fourvector du, vector xo, vector vo,
		vector px, vector pv, coeffmatrix *Q)
{

fourvector Asq={u[0]*u[0],u[1]*u[1],u[2]*u[2],u[3]*u[3]};
fourvector Acu={Asq[0]*u[0],Asq[1]*u[1],Asq[2]*u[2],Asq[3]*u[3]};
fourvector Afo={Asq[0]*Asq[0],Asq[1]*Asq[1],Asq[2]*Asq[2],Asq[3]*Asq[3]};
fourvector Bsq={du[0]*du[0],du[1]*du[1],du[2]*du[2],du[3]*du[3]};
fourvector Bcu={Bsq[0]*du[0],Bsq[1]*du[1],Bsq[2]*du[2],Bsq[3]*du[3]};
fourvector Bfo={Bsq[0]*Bsq[0],Bsq[1]*Bsq[1],Bsq[2]*Bsq[2],Bsq[3]*Bsq[3]};

*Q[0][0]=(((Asq[0] + Asq[1] + Asq[2] + Asq[3])* (Acu[0]*(Pv1*v01 +
Px1*x01) - Acu[1]*(Pv2*v01 + Px2*x01) - A3*(Asq[2]*Pv3*v01 -
Asq[3]*Pv3*v01 + 2*A3*A4*Pv3*v02 + Asq[2]*Px3*x01 - Asq[3]*Px3*x01 +
2*A3*A4*Px3*x02) - Asq[1]*(A3*Pv3*v01 - 2*A4*Pv2*v03 + A3*Px3*x01 -
2*A4*Px2*x03) + A2*(-(Asq[2]*Pv2*v01) + Asq[3]*Pv2*v01 - 2*A3*A4*Pv2*v02 +
2*A3*A4*Pv3*v03 - Asq[2]*Px2*x01 + Asq[3]*Px2*x01 - 2*A3*A4*Px2*x02 +
2*A3*A4*Px3*x03) + Asq[0]*(A2*(Pv2*v01 + 2*Pv1*v02 + Px2*x01 + 2*Px1*x02)
+ A3*(Pv3*v01 + 2*Pv1*v03 + Px3*x01 + 2*Px1*x03)) + A1*(-(Asq[2]*Pv1*v01)
+ Asq[3]*Pv1*v01 - 2*A3*A4*Pv1*v02 + 2*Asq[2]*Pv3*v03 - Asq[2]*Px1*x01 +
Asq[3]*Px1*x01 - 2*A3*A4*Px1*x02 - Asq[1]*(Pv1*v01 - 2*Pv2*v02 + Px1*x01 -
2*Px2*x02) + 2*Asq[2]*Px3*x03 + 2*A2*(A3*Pv3*v02 + A4*Pv1*v03 + A3*Pv2*v03
+ A3*Px3*x02 + A4*Px1*x03 + A3*Px2*x03))))/2);

...and so on. Actually this is by far the smallest statement in the file.
Can I do anything so that the compiler produces a faster binary? I have
already algebraically simplified each term. This is very time-critical
code; unfortunately, it runs in an inner loop of my n-body gravitational
simulation...
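
One option that may be worth trying is to hoist repeated sub-expressions
into temporaries by hand. The sketch below is only an illustration, not a
measured result: it takes the A2*(...) sub-term from the statement above
and writes it twice, once expanded and once with the shared products
(2*A3*A4, Asq[3] - Asq[2], and the Pv*v + Px*x pairs) computed once. The
struct q_inputs and both function names are hypothetical, and since the
definitions of A2, Pv2, v01 and friends are not shown, plain doubles are
assumed. Unless flags such as -ffast-math allow it, a C compiler is
generally not free to regroup floating-point arithmetic, so it may not be
able to do this factoring on its own.

```c
/* Hypothetical container for the symbols used in the posted statement.
   Their real definitions are not shown, so plain doubles are assumed. */
typedef struct {
    double A2, A3, A4;
    double Asq2, Asq3;          /* stand-ins for Asq[2] and Asq[3] */
    double Pv2, Pv3, Px2, Px3;
    double v01, v02, v03;
    double x01, x02, x03;
} q_inputs;

/* The A2*(...) sub-term, written out term by term as in the post. */
double term_expanded(const q_inputs *p)
{
    return p->A2 * (-(p->Asq2 * p->Pv2 * p->v01) + p->Asq3 * p->Pv2 * p->v01
        - 2 * p->A3 * p->A4 * p->Pv2 * p->v02
        + 2 * p->A3 * p->A4 * p->Pv3 * p->v03
        - p->Asq2 * p->Px2 * p->x01 + p->Asq3 * p->Px2 * p->x01
        - 2 * p->A3 * p->A4 * p->Px2 * p->x02
        + 2 * p->A3 * p->A4 * p->Px3 * p->x03);
}

/* The same sub-term with each repeated product computed once. */
double term_hoisted(const q_inputs *p)
{
    const double d   = p->Asq3 - p->Asq2;               /* Asq[3] - Asq[2] */
    const double a34 = 2 * p->A3 * p->A4;               /* 2*A3*A4         */
    const double s21 = p->Pv2 * p->v01 + p->Px2 * p->x01;
    const double s22 = p->Pv2 * p->v02 + p->Px2 * p->x02;
    const double s33 = p->Pv3 * p->v03 + p->Px3 * p->x03;

    return p->A2 * (d * s21 + a34 * (s33 - s22));
}
```

Counted from the source text, the expanded form has 25 multiplications
and the hoisted form has 11. The two versions agree to rounding error but
not necessarily bit for bit, because the order of operations differs.
Whether this actually runs faster would have to be timed in the real
inner loop.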

Dave

David Whysong                                       dwhysong AT physics DOT ucsb DOT edu
Astrophysics graduate student         University of California, Santa Barbara
My public PGP keys are on my web page - http://www.physics.ucsb.edu/~dwhysong
DSS PGP Key 0x903F5BD6  :  FE78 91FE 4508 106F 7C88  1706 B792 6995 903F 5BD6
D-H PGP key 0x5DAB0F91  :  BC33 0F36 FCCD E72C 441F  663A 72ED 7FB7 5DAB 0F91