[an error occurred while processing this directive]
Node:<a name="Older%20is%20faster">Older is faster</a>,
Next:<a rel=next href="faq14_3.html">Pentium</a>,
Previous:<a rel=previous href="faq14_1.html">How fast</a>,
Up:<a rel=up href="faq14.html">Performance</a>
<br>

<h2>14.2 Comparing newer versions with old ones</h2>

<p><em><strong>Q</strong>: I switched to v2 and my programs now run slower than when
compiled with v1.x<small>...</small>.</em>

<br><p>
<p><em><strong>Q</strong>: I timed a test program and it seems that GCC 2.8.1 produces
slower executables than GCC 2.7.2.1 was, which in turn was slower than
DJGPP v1.x.  Why are we giving up so much speed as we get newer
versions?</em>

<br><p>
<p><em><strong>Q</strong>: I installed Binutils 2.8.1, and my programs are now much slower
than when they are linked with Binutils 2.7!</em>

<br><p>
<p><strong>A</strong>: In general, newer versions of GCC generate tighter, faster code,
than older versions.  Comparison between different versions of GCC shows
that they all optimize reasonably well, but it takes a different
combination of the optimization-related options to achieve the greatest
speed in each compiler version.  The default optimization options can
also change; for example, <code>--force-mem</code> is switched on by
<code>-O2</code> in 2.7.2.1; it wasn't before.  GCC offers a plethora of
optimization options which might make your code faster or slower (see
the GCC docs for a complete list); the best way to find the correct
combination for a given program is to profile and experiment.  Here are
some tips:

<ul>
<li>Compile your code using the GCC switches <code>-O2 -mpentium
-fomit-frame-pointer -ffast-math</code>.  (For PGCC and GCC version 2.95 and
later, use <code>-O6</code> instead of <code>-O2</code>.)

<li>Profile your code and check which functions are "hot spots". 
Disassemble them, or compile with <code>-S</code> (see <a href="faq8_20.html">getting assembly listing</a>),
and examine the machine code.

<li>If the disassembly shows there aren't too many memory accesses inside
the inner loops, try adding <code>-fforce-addr</code> option.  This option
helps a lot if a couple of pointers are used heavily within a single
loop.  If there are a lot of memory references, try adding
<code>-fno-force-mem</code>, to prevent GCC from repeatedly copying variables
from memory into registers.

<li>Sometimes, <code>-fomit-frame-pointer</code> might make things worse, since it
uses stack-relative addresses which have longer encoding and could
therefore overflow the CPU cache.  So try with and without this switch.

<li>With newer versions of GCC, programs whose inner loops include many
function calls, or which are deeply recursive, could benefit from using
the <code>-mpreferred-stack-boundary=2</code> compiler option.  This causes
the compiler to relax its stack-alignment requirements that need a lot
of <code>sub esp,xx</code> instructions.  The default stack alignment is 16
bytes, unless overridden by <code>-mpreferred-stack-boundary</code>.  The
argument to this option is the power of 2 used for alignment, so 2 means
4-byte alignment; if your code uses <code>double</code> and <code>long double</code>
variables, an argument of 3 might be a better choice.

<li>Depending on the structure of your code, try different combinations of
alignment for loops (<code>-malign-loops</code>), jumps
(<code>-malign-jumps</code>), and function entry points
(<code>-malign-functions</code>).  Alignment changes can have especially
profound effects when programs are run on AMD's K6 CPU, since these CPUs
suffer significant slowdown for code aligned on 4-byte boundaries.

<li>Try adding <code>-funroll-loops</code> and <code>-funroll-all-loops</code> and
profile the effect.

<li>Try compiling with <code>-fno-strength-reduce</code>.  In some cases where GCC
is in dire need of registers, this could be a substantial win, since
strength reduction typically results in using additional registers to
replace multiplication with addition.

<li>If different time-critical functions exhibit different behavior under
some of the optimization options, try compiling them with the best
combination for each one of them. 
</ul>

<p>I'm told that the PGCC version of GCC has bugs in its optimizer which
show when you use level 7 or higher.  Until that is solved in some
future version, you are advised to stick to <code>-O6</code>.  Some
programs actually run faster when compiled with <code>-O2</code> or
<code>-O3</code>, even when compiled with PGCC, so you might try that as
well.  Several users reported that PGCC v2.95.1 tends to crash a lot
during compilation, especially with <code>-O5</code>, <code>-O6</code> and
<code>-mpentium</code> options.  (In general, PGCC version 2.95 is deemed
buggy; you are advised not to use it.)

<p>Programs which manipulate multi-dimensional arrays inside their
innermost loops can sometimes gain speed by switching from dynamically
allocated arrays to static ones.  This can speed up code because the
size of a static array is known to GCC at compile time, which allows it
to avoid dedicating a CPU register to computing offsets.  This register
is then available for general-purpose use.

<p>Another problem that is related to C<tt>++</tt> programs which manipulate
arrays happens when you fail to qualify the methods used for array
manipulation as <code>inline</code>.  Each method or function that wasn't
declared <code>inline</code> will <em>not</em> be inlined by GCC, and will incur
an overhead of a function call at run time.

<p>However, inlining only helps with small functions/methods; large inlined
functions will overflow the CPU cache and typically slow down the code
instead of speeding it up.

<p>If your CPU is AMD's K6, try upgrading to GCC 2.96 or later and use the
<code>-mcpu=k6</code> switch.  I'm told that K6-specific optimizations are
much better in these versions of GCC.

<p>A bug in the startup code distributed with DJGPP versions before v2.02
can also be a reason for slow-down.  The problem is that the runtime
stack of DJGPP programs was not guaranteed to be properly aligned.  This
usually only shows up on Windows (since CWSDPMI aligns the stack on its
own), and even then only sometimes.  But it has been reported that
switching to Binutils 2.8.1 sometimes causes such slow-down, and
switching to PGCC can reveal this problem as well.  In some cases,
restarting Windows would cause programs run at normal speed again.  If
you experience such problems too much, upgrade to v2.02.

<p><hr>
[an error occurred while processing this directive]