Newsgroups: comp.os.msdos.djgpp
From: "e.oti" <e DOT oti AT stud DOT warande DOT ruu DOT nl>
Subject: Re: Speed Optimization is getting worse with V2.01
Sender: usenet AT fys DOT ruu DOT nl (News system Tijgertje)
Message-ID: <3275129C.1FC5@stud.warande.ruu.nl>
Date: Mon, 28 Oct 1996 20:07:56 GMT
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
References: <551i8f$2dv AT sjx-ixn9 DOT ix DOT netcom DOT com>
Mime-Version: 1.0
Organization: Physics and Astronomy, University of Utrecht, The Netherlands
Lines: 56
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

William D. Kirby wrote:
> 
> I have a test program for timing tests, and it shows that DJGPP v2.01
> produces slower executables than v2.0, which in turn are slower than
> v1.0. We are giving up a lot of speed with the improvements being made.
> Presently DJGPP v2.01 executables are about 10% slower than targets made
> with Borland 4.5 with Power Pack, a 32-bit DPMI extension.
> 
> --

I recently downloaded v2.01 too and noticed a couple of things
(probably already mentioned here in other threads):

*The executables are larger. This seems to be due to the symbol
 table: stripping the COFF output produces an executable of the
 same size as with v2.00.

*The optimisation flags are switched on differently: -fforce-mem
 is now part of -O2, which it wasn't before. I spent a couple of
 hours playing around with optimisation switches to get a feel for
 how it works. The end result is that v2.01 optimises just as well
 as v2.00, but it takes a different combination of switches to get
 the same result. Gcc lets you fiddle with the optimisation quite a
 bit, and it helps to know what your code is doing.
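One way to see what -fforce-mem changes is to compile a small test kernel both ways and compare the assembler output, e.g. `gcc -O2 -S` versus `gcc -O2 -fno-force-mem -S`. The function below is a hypothetical test case (the names `total` and `accumulate` are made up for illustration): the `+=` on a global variable is arithmetic on a memory operand, which is the situation -fforce-mem is about. Whether the two listings actually differ depends on the gcc version and the surrounding code; in later gcc releases the switch no longer has any effect.

```c
#include <stddef.h>

/* Hypothetical test case for comparing -O2 with -O2 -fno-force-mem.
   'total' is a global, so each "+=" below is arithmetic on a memory
   operand; -fforce-mem asks gcc to copy such operands into registers
   before doing the arithmetic. */
long total;

void accumulate(const int *a, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
}
```

Compile this file once with each switch set, keeping the two .s files, and compare the inner loop.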

 Here are a couple of tips:
 
1. Profile your code and find out which routines are time-critical.
   Compile your code with the ordinary optimisation switches:
    -O2 -m486 -fomit-frame-pointer -ffast-math
   Disassemble it, or compile with -S to get the assembler (gas)
   source, and examine the generated code.

2. If there aren't too many memory accesses within the inner loop,
   try adding -fforce-addr to the optimisation switches. This makes
   gcc copy memory addresses into registers before doing arithmetic
   on them. It helps a lot if a couple of pointers are used heavily
   within a single loop. It doesn't help if you're referencing dozens
   of different addresses in the inner loop.
   
3. If there are a lot of memory references, try adding -fno-force-mem,
   because repeatedly copying memory variables into registers causes a
   lot of bloat and, naturally, slows the code down.

4. Profile the effect of -funroll-loops, -funroll-all-loops and
   -fstrength-reduce. Try compiling different source files with
   different optimisation flags.
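The tips above can be tried out on a small test kernel. The sketch below is a minimal example, not taken from the original post: `scale_add` is a hypothetical inner loop driven by two heavily used pointers (the case where tip 2 suggests -fforce-addr may help), `scale_add_unrolled` is the same loop unrolled four times by hand to show the kind of transformation -funroll-loops performs, and `time_kernel` is a crude clock()-based timer for comparing builds made with different switches. All three names are made up for illustration.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: a loop dominated by two heavily used pointers,
   the case where -fforce-addr may pay off (tip 2). */
void scale_add(float *dst, const float *src, float k, size_t n)
{
    float *end = dst + n;
    while (dst < end)
        *dst++ += k * *src++;
}

/* The same loop unrolled four times by hand, illustrating the kind of
   transformation -funroll-loops applies automatically (tip 4). */
void scale_add_unrolled(float *dst, const float *src, float k, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     += k * src[i];
        dst[i + 1] += k * src[i + 1];
        dst[i + 2] += k * src[i + 2];
        dst[i + 3] += k * src[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n % 4 != 0 */
        dst[i] += k * src[i];
}

typedef void (*kernel_fn)(float *, const float *, float, size_t);

/* Crude timer: runs the kernel 'reps' times and returns the elapsed
   processor time in seconds, as measured by clock(). */
double time_kernel(kernel_fn f, float *dst, const float *src,
                   size_t n, int reps)
{
    clock_t start = clock();
    int r;
    for (r = 0; r < reps; r++)
        f(dst, src, 1.0f, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

clock() has coarse resolution on many systems, so pick a large array and enough repetitions that each measurement runs for a second or more, then rebuild with each switch combination and compare both the timings and the -S output.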


Finally, may I add that I've always succeeded in getting gcc executables
to run faster than the Borland-compiled versions (version 5.0 excepted,
as I don't have access to it), and certainly faster than djgpp v1.
Not that that's an objective criterion or anything, but let that
be a stimulus to optimise further.

Elliott