Mail Archives: djgpp/2000/01/18/15:46:11
On 17 Jan 2000, Dieter Buerssner wrote:
> Btw. I always use -O. For my programs this almost always produces
> the fastest code. And I tried this with various hardware and
> with FPU intensive code as well as with code that doesn't
> need the FPU. I started with djgpp on a 386SX with 16 MHz.
> I thought this may be due to my ancient hardware. I upgraded to
> a 486 with 66 MHz and used the -m486. Often the -m486 would result
> in slower code and -O2 almost always produced slower code than -O.
> Recently I upgraded to AMD K6-2 266 MHz (running at 333 MHz).
> The same thing. Almost all my programs run fastest when compiled
> with -O (and -fomit-frame-pointer -ffast-math).
>
> Also I upgraded gcc. From (I believe) 1.39 up to 2.9.2 now.
> The fastest code seems to be produced by 2.7.3. Even when I
> compile with -march=k6 or -march=586 with 2.9.2, it won't produce
> faster code than 2.7.3 in the examples I tested [1].
>
> So, am I stupid or has anybody got similar experiences?
Some quantitative data about the relative speed of -O and -O2 would
probably get our feet on the ground when discussing this. Without the
numbers, we are just waving hands here.
Having said that, here's what I know about this (see also section 14.2
in the FAQ):
-O and -O2 usually produce very similar code, with a slight advantage
for -O2. But, for any specific code, it's quite possible that the
combination of optimization options defined by -O is a larger win than
the combination defined by -O2. In particular, your code might be
overflowing the CPU cache under -O2, but not under -O.
Most of the strange effects like what you describe are due to
alignment problems. The causes for these problems are distributed
between the compiler, Binutils, and the library in a complex way that
changes depending on the versions you are using. Short summary:
* The library wasn't aligning assembly functions and labels until
v2.03. Library functions written in C are not aligned optimally
due to problems with GCC versions before 2.9x (the v2.03 library
was compiled with GCC 2.8.1).
* GCC and Binutils were configured inconsistently as far as the
meaning of the .align directive is concerned. This caused code
and data to be misaligned.
* GCC 2.9x finally gets the alignment right (and also produces the
right .align directives that avoid cache misses on a Pentium).
* Binutils 2.8.1 and even 2.9.1 (for which there's no official port
yet) still align subsections on 4-byte boundaries, which can
easily cause significant run-time penalties in code that branches
and calls functions a lot. The next version of Binutils will
correct that.
* All versions of GCC before 2.9x were misaligning the stack, in
particular if the program used the double data type.
So, to get rid of the alignment problems at this time, you need to do
the following (in the order listed):
- build Binutils with a patch that bumps up the subsection
alignment;
- rebuild libc.a with GCC 2.95.2 and the patched Binutils;
- recompile and relink your program with GCC 2.95.2 and the patched
Binutils.
I suspect that step 2 above will require changing the compiler
switches used for the library build (some of them define alignment).