Mail Archives: djgpp/2000/01/19/07:40:27

Date: Wed, 19 Jan 2000 11:31:06 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
X-Sender: eliz AT is
To: Dieter Buerssner <buers AT gmx DOT de>
cc: djgpp AT delorie DOT com
Subject: Re: gcc optimization (Was: Executable size: limit to acceptability?)
In-Reply-To: <8623d1$26n4p$1@fu-berlin.de>
Message-ID: <Pine.SUN.3.91.1000119113049.9609J-100000@is>
MIME-Version: 1.0
Reply-To: djgpp AT delorie DOT com
Errors-To: dj-admin AT delorie DOT com
X-Mailing-List: djgpp AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

On 18 Jan 2000, Dieter Buerssner wrote:

> My CPU is AMD K6-2 266.

I don't know anything about K6.  AFAIK, GCC's code is optimized
towards Intel's recommendations; I don't know how well these fit K6.

> gcc 2.9.2: flags -fomit-frame-pointer -ffast-math + indicated flags

You mean 2.95.2, right?

>           -On -mcpu=k6  -On -march=k6
> -O   86383       92070          92070
> -O2  85852       86966          87009
> -O3  81476       89791          89814
> -O6  81421       89833          89818
> 
> In all three cases -O produces the fastest code.

The differences are small enough to be explained by alignment.  I
suggest looking at the code (disassemble it inside a debugger) to see
how many targets of jmp and call instructions are misaligned.  Intel
recommends aligning them on 16-byte boundaries, unless they are more
than 7 bytes away from the next such boundary.  GCC 2.95.2 emits the
correct alignment directives (.balign 16,,7), but your Binutils mess
that up, because each .o file is aligned on a 4-byte boundary instead
of a 16-byte one.  In effect, you are disrupting the CPU's prefetch
queues, which can have a significant effect on performance.
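
To make the alignment rule concrete, here is a minimal sketch in C of
what ".balign 16,,7" asks for, and of how a section placed on only a
4-byte boundary undoes it.  The function names and the sample
addresses are made up for illustration; they do not come from GCC or
Binutils.

```c
#include <assert.h>
#include <stdio.h>

/* Padding that ".balign align,,max_skip" would insert at addr: pad up
   to the next multiple of align, but only if that takes at most
   max_skip bytes; otherwise insert nothing.  align must be a power
   of 2. */
unsigned long balign_padding(unsigned long addr, unsigned long align,
                             unsigned long max_skip)
{
    unsigned long pad;

    assert(align != 0 && (align & (align - 1)) == 0);
    pad = (align - (addr & (align - 1))) & (align - 1);
    return pad <= max_skip ? pad : 0;
}

/* Final address of a label at `offset` inside an object file's code
   section, once the linker places that section at `section_base`. */
unsigned long final_address(unsigned long section_base,
                            unsigned long offset)
{
    return section_base + offset;
}

/* Nonzero if addr is a multiple of align (a power of 2). */
int is_aligned(unsigned long addr, unsigned long align)
{
    return (addr & (align - 1)) == 0;
}

/* Print whether one hypothetical jump target ends up 16-byte aligned. */
void report_target(unsigned long section_base, unsigned long offset)
{
    unsigned long addr = final_address(section_base, offset);

    printf("base 0x%lx + offset 0x%lx = 0x%lx (%s)\n",
           section_base, offset, addr,
           is_aligned(addr, 16) ? "aligned" : "misaligned");
}
```

Calling report_target(0x1000, 0x20) and report_target(0x1004, 0x20)
shows the problem: the same offset, correctly padded inside the
object file, is aligned only when the section itself starts on a
16-byte boundary.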

> The produced code runs slower than code produced with gcc 2.6.3!
> The same was true for my old 486 66 and 386SX when comparing
> newer versions of gcc with 2.6.3.

You need to experiment with more optimization options than just -mcpu
and -march.  GCC has lots of different optimization options, and -O2
turns on almost all of them; you should try to selectively turn on
only some of them.  Section 14.2 of the FAQ refers to this, although
it's probably not up-to-date yet with the latest GCC releases.
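
One way to run such experiments is to keep a small timing harness and
rebuild it with one option changed at a time.  The sketch below is
only an assumption about how that could look: workload() is a
placeholder for your real inner loop, and both function names are
hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Placeholder workload: sums 0..n-1.  Replace it with the code you
   actually want to measure. */
unsigned long workload(unsigned long n)
{
    /* volatile keeps the compiler from folding the loop away */
    volatile unsigned long sum = 0;
    unsigned long i;

    for (i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* CPU time, in seconds, used by `reps` calls of workload(n). */
double time_workload(unsigned long n, unsigned long reps)
{
    clock_t start, stop;
    unsigned long r;

    assert(reps > 0);
    start = clock();
    for (r = 0; r < reps; r++)
        (void) workload(n);
    stop = clock();
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Build it first with your baseline (say -O -fomit-frame-pointer
-ffast-math), then add a single option such as -funroll-loops or
-fstrength-reduce and compare the timings.  Check the GCC manual for
the options your version actually supports.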

Also, GCC tries very hard to align the stack on an 8-byte boundary,
and that causes it to emit a lot of stack-alignment instructions
(subl $4,%esp etc.).  This could lose big time if your program doesn't
need this alignment.  I suggest experimenting with the
alignment-related options.
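
To see what a given set of options does to stack alignment, you can
check where a local variable lands.  This is a rough sketch with
hypothetical function names; it assumes a compiler that provides
<stdint.h>, and the result for a local variable depends on the
compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Distance, in bytes, of p past the previous multiple of boundary;
   0 means p is aligned on that boundary. */
unsigned misalignment(const void *p, unsigned boundary)
{
    assert(boundary != 0);
    return (unsigned) ((uintptr_t) p % boundary);
}

/* Misalignment of a local double relative to an 8-byte boundary.
   volatile forces the variable to live on the stack. */
unsigned local_double_misalignment(void)
{
    volatile double d = 0.0;

    return misalignment((const void *) &d, 8);
}
```

Print local_double_misalignment() from a few call depths and compare
builds.  As far as I know, -mpreferred-stack-boundary is one of the
relevant i386 options in GCC 2.95, but verify that against the manual
for your version.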

> My conclusion is, to usually use -O only, and to still have
> an old version of gcc around.

There's nothing wrong with this conclusion, but I think there's lots
more to check before this conclusion is general enough.
