From: Hans-Bernhard Broeker
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Performance 2.7.2 -> 2.8.1
Date: 19 Oct 2000 15:10:32 GMT
Organization: Aachen University of Technology (RWTH)
Lines: 36
Message-ID: <8sn2t8$q6d$1@nets3.rz.RWTH-Aachen.DE>
References: <8smgof$l4f$1 AT nnrp1 DOT deja DOT com> <971961901 DOT 585069 AT shelley DOT paradise DOT net DOT nz> <8sn0t7$1nd$1 AT nnrp1 DOT deja DOT com>
NNTP-Posting-Host: acp3bf.physik.rwth-aachen.de
X-Trace: nets3.rz.RWTH-Aachen.DE 971968232 26829 137.226.32.75 (19 Oct 2000 15:10:32 GMT)
X-Complaints-To: abuse AT rwth-aachen DOT de
NNTP-Posting-Date: 19 Oct 2000 15:10:32 GMT
Originator: broeker@
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

gdemont AT my-deja DOT com wrote:
> Some of the programs I have under hand do run slower.

*How much* slower? It's really hard to discuss optimization differences if their size isn't known. Is it one percent slower? Fifty? Taking five times as long? And what is that code doing in the first place?

> Target is a Pentium-S 166Mhz.
> The options are
> * For gcc 2.7.2 (gnat 3.10):
> -i -gnatpn -O2 -fomit-frame-pointer -funroll-loops
> -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2

'-funroll-loops' hardly does you any good on any x86-type machine. It increases the code size, and as soon as the amount of code being looped over (the 'active set') grows beyond the size of the first-level cache, you'll see a noticeable performance degradation. The same happens each time you cross one of the other size barriers. Pentium-class machines are good enough at branching (and at branch prediction in particular) that unrolling loops doesn't gain you terribly much anyway, before the loss of cache locality strikes back.

For further exploration, a look at the compiler input (source) and output (assembly) would be necessary. And of course some profiling to see where the code spends the majority of its time, so the scrutiny can be limited in scope.
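To make the unrolling trade-off concrete at the source level, here is a minimal C sketch (the function names are hypothetical, not from the original thread). It puts a plain summing loop next to a hand-unrolled-by-4 version, which is conceptually similar to the transformation -funroll-loops applies; gcc's actual generated code differs in detail. The unrolled version executes fewer loop branches per element, but its body is larger and it needs an extra remainder loop, and that extra code is what pushes the active set toward the L1 cache limit.

```c
#include <stddef.h>

/* Plain loop: one add plus one compare-and-branch per element. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4: one compare-and-branch per four elements,
   at the cost of a bigger loop body and a second loop that
   handles the n % 4 leftover elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)      /* remainder loop for leftover elements */
        s += a[i];
    return s;
}
```

Both functions return the same sum for any input; only the code shape differs. Compiling them with `gcc -O2 -S` and comparing the assembly, then timing them on the actual target, is the kind of source/output inspection and profiling the post recommends before drawing conclusions.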
The versions of the DJGPP runtime and binutils can also play an important role. Fully correct alignment of the code on 32-byte boundaries helps, but it took us some iterations to get it right.

-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.