Date: Tue, 6 Feb 2001 21:58:22 +0100 (MET)
From: Hans-Bernhard Broeker
To: Eli Zaretskii
cc: djgpp AT delorie DOT com
Subject: Re: Function and File ordering and speed optimization
In-Reply-To: <6480-Tue06Feb2001223313+0200-eliz@is.elta.co.il>

On Tue, 6 Feb 2001, Eli Zaretskii wrote:

> > From: Hans-Bernhard Broeker
[...]
> > I don't think it makes any sense to even try.  Except for situations
> > which other compiler optimizations should already have taken care of,
> > the relative position of functions in address space doesn't make any
> > difference at all on x86 processors in 32-bit flat mode and using GCC.
>
> ??? One thing I'd do is to group functions which are called many
> times together.  This would maximize the probability that they are in
> the L1 cache most of the time.

The things you might be missing are that the L1 cache is a dynamic
beast, and that the chunks it caches as one piece of memory ("cache
lines") are small compared to the size of an average function.  I.e.,
you'll hardly ever fit two or more functions into a single cache line,
and even if you can, I doubt the effect is going to be measurable.
Besides, it would probably be a more promising idea to ensure that such
very small functions are inlined or optimized out of existence (see the
sketch at the end of this message).

Any pair of functions that call each other frequently enough to matter
will sooner or later both end up in the L1 cache anyway.  In that case,
keeping the functions small, or removing them completely, matters more
than their absolute positions.  It's all a balance between subroutine
call overhead and the size of the "active code set" at any given time.

The original motivation for the function ordering offered by gprof,
IIRC, was processors where the cost of a jump varies strongly with its
distance.  In segmented x86 operation modes, for example, it could pay
off to reorder functions if that allowed near jumps and calls instead
of far ones.

-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.
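
To make the inlining point concrete, here is a minimal sketch.  The
function names, the constant 255 and the file name prog.c are invented
purely for illustration; the comments assume GCC with optimization
enabled and a GNU gprof recent enough to offer --function-ordering.

  /* A helper this small is better inlined than carefully placed next
     to its caller: with -O2, GCC will normally expand it at the call
     site, so its position in the executable stops mattering at all.  */

  static inline int clamp_byte(int v)
  {
      return v < 0 ? 0 : (v > 255 ? 255 : v);
  }

  int saturated_add(int a, int b)
  {
      return clamp_byte(a + b);   /* no call overhead, no second cache line */
  }

  /* To actually try the gprof-based ordering, the sequence is roughly:

         gcc -O2 -pg -o prog prog.c
         ./prog                                 (writes gmon.out)
         gprof --function-ordering prog gmon.out

     which prints a suggested function ordering based on the recorded
     call graph; it only helps on systems whose linker can actually
     place functions in that order in the executable.  */

Once the helper has been expanded at every call site, the question of
where to put it relative to its callers simply disappears.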