X-Authentication-Warning: sal.physics.ucsb.edu: dwhysong owned process doing -bs
Date: Mon, 10 May 1999 00:50:56 -0700 (PDT)
From: David Whysong
To: pgcc AT delorie DOT com
Subject: Optimization question
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: pgcc AT delorie DOT com

Hi,

I'm not on the mailing list so please cc me with any replies.

I have a 3300 line source file that calculates 24 double precision
vector coefficients in only 24 statements. Yes, it's ugly :-) but it's
necessary.

I'm using pgcc 1.1.3, compiling with:

    /usr/local/bin/gcc dynamics_exact.c -c -O6 -mcpu=pentiumpro -Wall \
        -ffast-math -mstack-align-double -mpentiumpro -mieee-fp \
        -fstrength-reduce -malign-double -funroll-loops \
        -funroll-all-loops -fomit-frame-pointer -DNDEBUG -DARCRANDOM \
        -DDOUBLEPREC -I../..//include

My question is, how well does the compiler optimize code that looks like
this:

    inline void Q_k(fourvector u, fourvector du, vector xo, vector vo,
                    vector px, vector pv, coeffmatrix *Q)
    {
      fourvector Asq={u[0]*u[0],u[1]*u[1],u[2]*u[2],u[3]*u[3]};
      fourvector Acu={Asq[0]*u[0],Asq[1]*u[1],Asq[2]*u[2],Asq[3]*u[3]};
      fourvector Afo={Asq[0]*Asq[0],Asq[1]*Asq[1],Asq[2]*Asq[2],Asq[3]*Asq[3]};
      fourvector Bsq={du[0]*du[0],du[1]*du[1],du[2]*du[2],du[3]*du[3]};
      fourvector Bcu={Bsq[0]*du[0],Bsq[1]*du[1],Bsq[2]*du[2],Bsq[3]*du[3]};
      fourvector Bfo={Bsq[0]*Bsq[0],Bsq[1]*Bsq[1],Bsq[2]*Bsq[2],Bsq[3]*Bsq[3]};

      *Q[0][0]=(((Asq[0] + Asq[1] + Asq[2] + Asq[3])*
          (Acu[0]*(Pv1*v01 + Px1*x01) - Acu[1]*(Pv2*v01 + Px2*x01) -
           A3*(Asq[2]*Pv3*v01 - Asq[3]*Pv3*v01 + 2*A3*A4*Pv3*v02 +
               Asq[2]*Px3*x01 - Asq[3]*Px3*x01 + 2*A3*A4*Px3*x02) -
           Asq[1]*(A3*Pv3*v01 - 2*A4*Pv2*v03 + A3*Px3*x01 - 2*A4*Px2*x03) +
           A2*(-(Asq[2]*Pv2*v01) + Asq[3]*Pv2*v01 - 2*A3*A4*Pv2*v02 +
               2*A3*A4*Pv3*v03 - Asq[2]*Px2*x01 + Asq[3]*Px2*x01 -
               2*A3*A4*Px2*x02 + 2*A3*A4*Px3*x03) +
           Asq[0]*(A2*(Pv2*v01 + 2*Pv1*v02 + Px2*x01 + 2*Px1*x02) +
                   A3*(Pv3*v01 + 2*Pv1*v03 + Px3*x01 + 2*Px1*x03)) +
           A1*(-(Asq[2]*Pv1*v01) + Asq[3]*Pv1*v01 -
               2*A3*A4*Pv1*v02 + 2*Asq[2]*Pv3*v03 - Asq[2]*Px1*x01 +
               Asq[3]*Px1*x01 - 2*A3*A4*Px1*x02 -
               Asq[1]*(Pv1*v01 - 2*Pv2*v02 + Px1*x01 - 2*Px2*x02) +
               2*Asq[2]*Px3*x03 +
               2*A2*(A3*Pv3*v02 + A4*Pv1*v03 + A3*Pv2*v03 +
                     A3*Px3*x02 + A4*Px1*x03 + A3*Px2*x03))))/2);

...and so on. Actually this is by far the smallest statement in the file.

Can I do anything so that the compiler produces a faster binary? I have
already algebraically simplified each term. This is very time-critical
code, unfortunately it runs in an inner loop of my n-body gravitational
simulation...

Dave

David Whysong                       dwhysong AT physics DOT ucsb DOT edu
Astrophysics graduate student       University of California, Santa Barbara
My public PGP keys are on my web page - http://www.physics.ucsb.edu/~dwhysong
DSS PGP Key 0x903F5BD6  :  FE78 91FE 4508 106F 7C88  1706 B792 6995 903F 5BD6
D-H PGP key 0x5DAB0F91  :  BC33 0F36 FCCD E72C 441F  663A 72ED 7FB7 5DAB 0F91
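
A note on the question above: an optimizing compiler can typically reuse
subexpressions that are written identically, but it is not guaranteed to
refactor a sum of products algebraically, even with -ffast-math. One option
that may be worth trying is to hoist the repeated factors by hand. The sketch
below takes only the A3*(...) fragment of the posted statement and writes it
both ways. The struct frag_inputs and both function names are hypothetical,
and the scalars are passed in explicitly because the post does not show how
A3, Pv3, v01 and the rest are defined.

```c
/* Hypothetical container for the scalars used by one fragment of the
 * posted expression. Asq2 and Asq3 stand in for Asq[2] and Asq[3]; how
 * the original code defines A3, A4, Pv3, Px3, v01, ... is not shown in
 * the post, so they are plain inputs here (an assumption). */
typedef struct {
    double A3, A4;
    double Asq2, Asq3;
    double Pv3, Px3;
    double v01, v02;
    double x01, x02;
} frag_inputs;

/* The fragment exactly as written in the posted statement:
 * A3*(Asq[2]*Pv3*v01 - Asq[3]*Pv3*v01 + 2*A3*A4*Pv3*v02 +
 *     Asq[2]*Px3*x01 - Asq[3]*Px3*x01 + 2*A3*A4*Px3*x02) */
double frag_direct(const frag_inputs *p)
{
    return p->A3 * (p->Asq2 * p->Pv3 * p->v01 - p->Asq3 * p->Pv3 * p->v01
                    + 2 * p->A3 * p->A4 * p->Pv3 * p->v02
                    + p->Asq2 * p->Px3 * p->x01 - p->Asq3 * p->Px3 * p->x01
                    + 2 * p->A3 * p->A4 * p->Px3 * p->x02);
}

/* The same fragment with the shared factors pulled out by hand. */
double frag_hoisted(const frag_inputs *p)
{
    const double d  = p->Asq2 - p->Asq3;                 /* Asq[2] - Asq[3]   */
    const double t  = 2 * p->A3 * p->A4;                 /* 2*A3*A4           */
    const double s1 = p->Pv3 * p->v01 + p->Px3 * p->x01; /* Pv3*v01 + Px3*x01 */
    const double s2 = p->Pv3 * p->v02 + p->Px3 * p->x02; /* Pv3*v02 + Px3*x02 */

    return p->A3 * (d * s1 + t * s2);
}
```

By a rough count the hoisted form needs 9 multiplications against 17 for the
direct form. Whether that shows up as a faster binary would have to be
measured, and the two forms can differ in the last bits of rounding, so the
results should be compared within a tolerance rather than for exact equality.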