Date: Wed, 7 Feb 2001 08:40:39 +0200 (IST)
From: Eli Zaretskii
X-Sender: eliz AT is
To: Hans-Bernhard Broeker
cc: djgpp AT delorie DOT com
Subject: Re: Function and File ordering and speed optimization
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: djgpp AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On Tue, 6 Feb 2001, Hans-Bernhard Broeker wrote:

> > ??? One thing I'd do is to group functions which are called many
> > times together.  This would maximize the probability that they are in
> > the L1 cache most of the time.
>
> The things you might be missing would be that the L1 cache is a dynamic
> beast, and that the chunks it caches as one piece of memory ("cache
> lines") are small compared to the average size of an average function.
> I.e. you'll hardly ever fit two or more functions into a single cache
> line.

Sorry, the cache is indeed not the issue.  But the resident set in a
virtual-memory environment, especially on Windows, _is_ an issue,
because 4KB, the size of a page, can hold quite a lot of code.
Keeping frequently-used code together improves the locality of the
code, exactly as referencing a large array in order improves the
locality of data.  When your program or your OS pages, this really
makes a difference.

> The original motivation for the function ordering offered by gprof, IIRC,
> is for processors where the cost of a jump varies strongly with its
> distance.  In segmented x86 operation modes, e.g., it could pay off to
> reorder functions if it allowed short jumps and calls instead of far ones.

Does this hold on IA64 as well?