Mail Archives: djgpp/2000/10/19/19:15:26

From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Performance 2.7.2 -> 2.8.1
Date: 19 Oct 2000 15:10:32 GMT
Organization: Aachen University of Technology (RWTH)
Lines: 36
Message-ID: <8sn2t8$q6d$1@nets3.rz.RWTH-Aachen.DE>
References: <8smgof$l4f$1 AT nnrp1 DOT deja DOT com> <971961901 DOT 585069 AT shelley DOT paradise DOT net DOT nz> <8sn0t7$1nd$1 AT nnrp1 DOT deja DOT com>
NNTP-Posting-Host: acp3bf.physik.rwth-aachen.de
X-Trace: nets3.rz.RWTH-Aachen.DE 971968232 26829 137.226.32.75 (19 Oct 2000 15:10:32 GMT)
X-Complaints-To: abuse AT rwth-aachen DOT de
NNTP-Posting-Date: 19 Oct 2000 15:10:32 GMT
Originator: broeker@
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

gdemont AT my-deja DOT com wrote:

> Some of the programs I have under hand do run slower.

*How much* slower? It's really hard to discuss optimization
differences without knowing how big they are. Is it one percent
slower? Fifty? Taking 5 times as long?

And what's that code doing, in the first place?

> Target is a Pentium-S 166Mhz.
> The options are
> * For gcc 2.7.2 (gnat 3.10):
>  -i -gnatpn -O2 -fomit-frame-pointer -funroll-loops
>  -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2

'-funroll-loops' hardly does you any good on any x86-type machine. It
increases the code size, and as soon as the amount of code being
looped over (the 'active set') grows beyond the size of the 1st-level
cache, you'll see a noticeable performance degradation. The same
happens when you cross other size barriers. Pentium-class machines are
good enough at branching (and branch prediction, in particular) that
unrolling loops doesn't gain you terribly much anyway, before the loss
of cache locality strikes back.

For further exploration, a look at compiler input (source) and output
(assembly) would be necessary. And of course some profiling to see
where the code spends the majority of its time, in the first place, so
the scrutiny can be limited in scope.

The version of the DJGPP runtime and binutils can also play an
important role. Fully correct alignment of the code on 32-byte
boundaries helps, but it took us some iterations to get it right.
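
For reference, "aligned on a 32-byte boundary" just means the address
is a multiple of 32. The arithmetic can be sketched as below; the
helper names and the sample addresses are made up for illustration
and say nothing about how the DJGPP toolchain implements alignment
internally.

```c
#include <stdio.h>

#define ALIGNMENT 32UL          /* must be a power of two */

/* Round addr up to the next multiple of ALIGNMENT. */
static unsigned long align_up(unsigned long addr)
{
    return (addr + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
}

/* Nonzero if addr already sits on an ALIGNMENT boundary. */
static int is_aligned(unsigned long addr)
{
    return (addr & (ALIGNMENT - 1)) == 0;
}

int main(void)
{
    /* Arbitrary sample addresses, not taken from a real program. */
    unsigned long samples[4] = {0x1000UL, 0x1001UL, 0x101fUL, 0x1021UL};
    int i;

    for (i = 0; i < 4; i++)
        printf("0x%lx -> 0x%lx (aligned: %d)\n",
               samples[i], align_up(samples[i]), is_aligned(samples[i]));
    return 0;
}
```
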
-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.
