Mail Archives: pgcc/1999/05/10/08:09:19

Date: Mon, 10 May 1999 13:07:16 +0100 (BST)
From: "Dr H. T. Leung" <htl10 AT cus DOT cam DOT ac DOT uk>
To: David Whysong <dwhysong AT physics DOT ucsb DOT edu>
cc: pgcc AT delorie DOT com
Subject: Re: Optimization question
In-Reply-To: <Pine.LNX.4.04.9905100038010.12328-100000@sal.physics.ucsb.edu>
Message-ID: <Pine.SOL.3.96.990510125208.12696D-100000@ursa.cus.cam.ac.uk>
MIME-Version: 1.0
Reply-To: pgcc AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

If you had read the mailing list archive, you would know that it is
terribly unfair to the people on the list to ask them to cc replies to you
when you are not on the list. If you want to ask a question, subscribe,
read the list for a while, then post (and then maybe unsubscribe).

Your code is very badly written (probably generated by an automatic
symbolic algebra package like REDUCE or Mathematica?). An optimizing
compiler can't help you much if your code doesn't let itself be
optimized. For a very simple example, you are doing the multiplication
"Pv1*v01" 4 times; each time, that means retrieving 2 values from memory,
multiplying them and storing the result back. If instead you define a new
variable "Pv1_v01 = Pv1*v01", it is two retrievals, one multiplication
and one store, then 4 retrievals of the result. You would be much better
off spending some time defining new variables like this and breaking
down your calculation so that it doesn't redo little multiplications
like this. pgcc can only bring you a 10-20% speed improvement, but if
speed is really important for you, you should rewrite your code as
described above so that simple operations are not repeated. But of
course, don't overdo it: storing and retrieving values between memory
and the cache also takes time.

Good luck and have fun rewriting the code. Relativity is not meant to be
that terrible algebra-wise. (Yes, I am a physicist...)

On Mon, 10 May 1999, David Whysong wrote:

> 
> Hi,
> 
> I'm not on the mailing list so please cc me with any replies.
> 
> I have a 3300-line source file that calculates 24 double-precision vector
> coefficients in only 24 statements. Yes, it's ugly :-) but it's necessary.
> I'm using pgcc 1.1.3, compiling with:
> 
> /usr/local/bin/gcc dynamics_exact.c -c -O6 -mcpu=pentiumpro -Wall
> -ffast-math -mstack-align-double -mpentiumpro -mieee-fp -fstrength-reduce
> -malign-double -funroll-loops -funroll-all-loops -fomit-frame-pointer
> -DNDEBUG -DARCRANDOM -DDOUBLEPREC -I../..//include
> 
> My question is, how well does the compiler optimize code that looks like
> this:
> 
> inline void Q_k(fourvector u, fourvector du, vector xo, vector vo,
> 		vector px, vector pv, coeffmatrix *Q)
> {
> 
> fourvector Asq={u[0]*u[0],u[1]*u[1],u[2]*u[2],u[3]*u[3]};
> fourvector Acu={Asq[0]*u[0],Asq[1]*u[1],Asq[2]*u[2],Asq[3]*u[3]};
> fourvector Afo={Asq[0]*Asq[0],Asq[1]*Asq[1],Asq[2]*Asq[2],Asq[3]*Asq[3]};
> fourvector Bsq={du[0]*du[0],du[1]*du[1],du[2]*du[2],du[3]*du[3]};
> fourvector Bcu={Bsq[0]*du[0],Bsq[1]*du[1],Bsq[2]*du[2],Bsq[3]*du[3]};
> fourvector Bfo={Bsq[0]*Bsq[0],Bsq[1]*Bsq[1],Bsq[2]*Bsq[2],Bsq[3]*Bsq[3]};
> 
> *Q[0][0]=(((Asq[0] + Asq[1] + Asq[2] + Asq[3])* (Acu[0]*(Pv1*v01 +
> Px1*x01) - Acu[1]*(Pv2*v01 + Px2*x01) - A3*(Asq[2]*Pv3*v01 -
> Asq[3]*Pv3*v01 + 2*A3*A4*Pv3*v02 + Asq[2]*Px3*x01 - Asq[3]*Px3*x01 +
> 2*A3*A4*Px3*x02) - Asq[1]*(A3*Pv3*v01 - 2*A4*Pv2*v03 + A3*Px3*x01 -
> 2*A4*Px2*x03) + A2*(-(Asq[2]*Pv2*v01) + Asq[3]*Pv2*v01 - 2*A3*A4*Pv2*v02 +
> 2*A3*A4*Pv3*v03 - Asq[2]*Px2*x01 + Asq[3]*Px2*x01 - 2*A3*A4*Px2*x02 +
> 2*A3*A4*Px3*x03) + Asq[0]*(A2*(Pv2*v01 + 2*Pv1*v02 + Px2*x01 + 2*Px1*x02)
> + A3*(Pv3*v01 + 2*Pv1*v03 + Px3*x01 + 2*Px1*x03)) + A1*(-(Asq[2]*Pv1*v01)
> + Asq[3]*Pv1*v01 - 2*A3*A4*Pv1*v02 + 2*Asq[2]*Pv3*v03 - Asq[2]*Px1*x01 +
> Asq[3]*Px1*x01 - 2*A3*A4*Px1*x02 - Asq[1]*(Pv1*v01 - 2*Pv2*v02 + Px1*x01 -
> 2*Px2*x02) + 2*Asq[2]*Px3*x03 + 2*A2*(A3*Pv3*v02 + A4*Pv1*v03 + A3*Pv2*v03
> + A3*Px3*x02 + A4*Px1*x03 + A3*Px2*x03))))/2);
> 
> ...and so on. Actually this is by far the smallest statement in the file.
> Can I do anything so that the compiler produces a faster binary? I have
> already algebraically simplified each term. This is very time-critical
> code; unfortunately, it runs in an inner loop of my n-body gravitational
> simulation...
> 
> Dave
> 
> David Whysong                                       dwhysong AT physics DOT ucsb DOT edu
> Astrophysics graduate student         University of California, Santa Barbara
> My public PGP keys are on my web page - http://www.physics.ucsb.edu/~dwhysong
> DSS PGP Key 0x903F5BD6  :  FE78 91FE 4508 106F 7C88  1706 B792 6995 903F 5BD6
> D-H PGP key 0x5DAB0F91  :  BC33 0F36 FCCD E72C 441F  663A 72ED 7FB7 5DAB 0F91

  Copyright © 2019   by DJ Delorie     Updated Jul 2019