Date: Tue, 6 Feb 2001 21:58:22 +0100 (MET)
From: Hans-Bernhard Broeker
To: Eli Zaretskii
cc: djgpp AT delorie DOT com
Subject: Re: Function and File ordering and speed optimization
In-Reply-To: <6480-Tue06Feb2001223313+0200-eliz@is.elta.co.il>

On Tue, 6 Feb 2001, Eli Zaretskii wrote:

> > From: Hans-Bernhard Broeker
[...]
> > I don't think it makes any sense to even try.  Except for situations
> > which other compiler optimizations should already have taken care of,
> > the relative position of functions in address space doesn't make any
> > difference at all on x86 processors in 32-bit flat mode and using GCC.
>
> ??? One thing I'd do is to group functions which are called many
> times together.  This would maximize the probability that they are in
> the L1 cache most of the time.

The things you might be missing are that the L1 cache is a dynamic
beast, and that the chunks it caches as one piece of memory ("cache
lines") are small compared to the size of an average function.  I.e.,
you'll hardly ever fit two or more functions into a single cache line,
and even if you can, I doubt the effect is going to be measurable.
Besides, it would probably be a more promising idea to ensure that such
very small functions are inlined or optimized out of existence (see the
sketch at the end of this message).

Any pair of functions that call each other frequently enough to matter
will sooner or later both end up in the L1 cache anyway.  In that case,
keeping the functions small, or removing them completely, matters more
than their absolute positions.  It's all a balance between subroutine
call overhead and the size of the "active code set" at any given time.

The original motivation for the function ordering offered by gprof,
IIRC, was processors where the cost of a jump varies strongly with its
distance.  In segmented x86 operation modes, for example, it could pay
off to reorder functions if that allowed near jumps and calls instead
of far ones.

-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.
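
To make the inlining point concrete, here is a minimal sketch.  The
function names, the constant 255 and the file name prog.c are invented
purely for illustration; the comments assume GCC with optimization
enabled and a GNU gprof recent enough to offer --function-ordering.

  /* A helper this small is better inlined than carefully placed next
     to its caller: with -O2, GCC will normally expand it at the call
     site, so its position in the executable stops mattering at all.  */

  static inline int clamp_byte(int v)
  {
      return v < 0 ? 0 : (v > 255 ? 255 : v);
  }

  int saturated_add(int a, int b)
  {
      return clamp_byte(a + b);   /* no call overhead, no second cache line */
  }

  /* To actually try the gprof-based ordering, the sequence is roughly:

         gcc -O2 -pg -o prog prog.c
         ./prog                                 (writes gmon.out)
         gprof --function-ordering prog gmon.out

     which prints a suggested function ordering based on the recorded
     call graph; it only helps on systems whose linker can actually
     place functions in that order in the executable.  */

Once the helper has been expanded at every call site, the question of
where to put it relative to its callers simply disappears.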