Date: Mon, 10 May 1999 13:07:16 +0100 (BST)
From: "Dr H. T. Leung"
To: David Whysong
cc: pgcc AT delorie DOT com
Subject: Re: Optimization question
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: pgcc AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

If you had read the mailing list archive, you would know that it is
terribly unfair to people on the list to ask them to cc you replies when
you are not on the list. If you want to ask a question, subscribe, read on
for a while, then post (then maybe unsubscribe).

Your code is very badly written (probably generated by an automatic
symbolic algebra package like Reduce or Mathematica?). An optimizing
compiler can't help you much if your code doesn't lend itself to
optimization. As a very simple example, you were doing the multiplication
"Pv1*v01" 4 times; that means retrieving 2 values from memory, multiplying
them, and storing the result, all done 4 times. If you had instead defined
a new variable "Pv1_v01 = Pv1*v01", it would be two retrievals, one
multiplication and one store, followed by 4 retrievals of the result.

You would be much better off spending some time defining new variables
like this and breaking down your calculation so that it doesn't redo
little multiplications like this. pgcc can only bring you a 10-20% speed
improvement; if speed is really important to you, you should rewrite your
code as described above so that simple operations are not repeated. But of
course, don't overdo it: storage and retrieval between memory and the
cache also take time.

Good luck and have fun rewriting the code. Relativity is not meant to be
that terrible algebra-wise. (Yes, I am a physicist...)

On Mon, 10 May 1999, David Whysong wrote:
>
> Hi,
>
> I'm not on the mailing list so please cc me with any replies.
>
> I have a 3300 line source file that calculates 24 double precision vector
> coefficients in only 24 statements. Yes, it's ugly :-) but it's necessary.
> I'm using pgcc 1.1.3, compiling with:
>
> /usr/local/bin/gcc dynamics_exact.c -c -O6 -mcpu=pentiumpro -Wall
> -ffast-math -mstack-align-double -mpentiumpro -mieee-fp -fstrength-reduce
> -malign-double -funroll-loops -funroll-all-loops -fomit-frame-pointer
> -DNDEBUG -DARCRANDOM -DDOUBLEPREC -I../..//include
>
> My question is, how well does the compiler optimize code that looks like
> this:
>
> inline void Q_k(fourvector u, fourvector du, vector xo, vector vo,
> vector px, vector pv, coeffmatrix *Q)
> {
>
> fourvector Asq={u[0]*u[0],u[1]*u[1],u[2]*u[2],u[3]*u[3]};
> fourvector Acu={Asq[0]*u[0],Asq[1]*u[1],Asq[2]*u[2],Asq[3]*u[3]};
> fourvector Afo={Asq[0]*Asq[0],Asq[1]*Asq[1],Asq[2]*Asq[2],Asq[3]*Asq[3]};
> fourvector Bsq={du[0]*du[0],du[1]*du[1],du[2]*du[2],du[3]*du[3]};
> fourvector Bcu={Bsq[0]*du[0],Bsq[1]*du[1],Bsq[2]*du[2],Bsq[3]*du[3]};
> fourvector Bfo={Bsq[0]*Bsq[0],Bsq[1]*Bsq[1],Bsq[2]*Bsq[2],Bsq[3]*Bsq[3]};
>
> *Q[0][0]=(((Asq[0] + Asq[1] + Asq[2] + Asq[3])* (Acu[0]*(Pv1*v01 +
> Px1*x01) - Acu[1]*(Pv2*v01 + Px2*x01) - A3*(Asq[2]*Pv3*v01 -
> Asq[3]*Pv3*v01 + 2*A3*A4*Pv3*v02 + Asq[2]*Px3*x01 - Asq[3]*Px3*x01 +
> 2*A3*A4*Px3*x02) - Asq[1]*(A3*Pv3*v01 - 2*A4*Pv2*v03 + A3*Px3*x01 -
> 2*A4*Px2*x03) + A2*(-(Asq[2]*Pv2*v01) + Asq[3]*Pv2*v01 - 2*A3*A4*Pv2*v02 +
> 2*A3*A4*Pv3*v03 - Asq[2]*Px2*x01 + Asq[3]*Px2*x01 - 2*A3*A4*Px2*x02 +
> 2*A3*A4*Px3*x03) + Asq[0]*(A2*(Pv2*v01 + 2*Pv1*v02 + Px2*x01 + 2*Px1*x02)
> + A3*(Pv3*v01 + 2*Pv1*v03 + Px3*x01 + 2*Px1*x03)) + A1*(-(Asq[2]*Pv1*v01)
> + Asq[3]*Pv1*v01 - 2*A3*A4*Pv1*v02 + 2*Asq[2]*Pv3*v03 - Asq[2]*Px1*x01 +
> Asq[3]*Px1*x01 - 2*A3*A4*Px1*x02 - Asq[1]*(Pv1*v01 - 2*Pv2*v02 + Px1*x01 -
> 2*Px2*x02) + 2*Asq[2]*Px3*x03 + 2*A2*(A3*Pv3*v02 + A4*Pv1*v03 + A3*Pv2*v03
> + A3*Px3*x02 + A4*Px1*x03 + A3*Px2*x03))))/2);
>
> ...and so on. Actually this is by far the smallest statement in the file.
> Can I do anything so that the compiler produces a faster binary? I have
> already algebraically simplified each term.
> This is very time-critical
> code, unfortunately it runs in an inner loop of my n-body gravitational
> simulation...
>
> Dave
>
> David Whysong dwhysong AT physics DOT ucsb DOT edu
> Astrophysics graduate student University of California, Santa Barbara
> My public PGP keys are on my web page - http://www.physics.ucsb.edu/~dwhysong
> DSS PGP Key 0x903F5BD6 : FE78 91FE 4508 106F 7C88 1706 B792 6995 903F 5BD6
> D-H PGP key 0x5DAB0F91 : BC33 0F36 FCCD E72C 441F 663A 72ED 7FB7 5DAB 0F91
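
A minimal sketch of the hoisting suggested in the reply, written out for
one small fragment modelled on the A1*(...) part of the quoted statement.
The function names term_repeated and term_hoisted and the flat double
parameters are hypothetical, chosen only for illustration; the original
file uses its own types and identifiers, which are not shown in full.

```c
#include <stdio.h>

/* Hypothetical fragment in the style of the generated code: the products
   Pv1*v01 and Px1*x01 are each written out three times. */
static double term_repeated(double A1, double Asq1, double Asq2, double Asq3,
                            double Pv1, double v01, double Px1, double x01)
{
    return A1 * (-(Asq2 * Pv1 * v01) + Asq3 * Pv1 * v01
                 - Asq2 * Px1 * x01 + Asq3 * Px1 * x01
                 - Asq1 * (Pv1 * v01 + Px1 * x01));
}

/* Same fragment with the repeated products computed once and reused. */
static double term_hoisted(double A1, double Asq1, double Asq2, double Asq3,
                           double Pv1, double v01, double Px1, double x01)
{
    const double Pv1_v01 = Pv1 * v01;   /* computed once, reused below */
    const double Px1_x01 = Px1 * x01;   /* computed once, reused below */

    return A1 * (-(Asq2 * Pv1_v01) + Asq3 * Pv1_v01
                 - Asq2 * Px1_x01 + Asq3 * Px1_x01
                 - Asq1 * (Pv1_v01 + Px1_x01));
}
```

Note that Asq2*Pv1*v01 is evaluated as (Asq2*Pv1)*v01, while the hoisted
form computes Asq2*(Pv1*v01). In floating point the two groupings can
differ in the last bits, so the rewritten code should be checked against
the original within a tolerance rather than for exact equality. Whether
the rewrite is actually faster depends on the compiler and the machine,
so it is worth timing both versions.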