Mail Archives: djgpp/1996/10/28/06:15:05
William D. Kirby wrote:
>
> I have a test program for timing tests, and it shows that DJGPP v2.01
> produces slower executables than v2.0 which are slower than v1.0. We
> are giving up a lot of speed with the improvements being made. Presently
> DJGPP v2.01 exe's are about 10% slower than targets made with Borland 4.5
> with Power Pack, a 32-bit DPMI extension.
>
> --
I recently downloaded v2.01 too and noticed a couple of things
(probably already mentioned here in other threads):
* The executables are larger. This seems to be due to the symbol
table, because stripping the COFF output produces an executable
of the same size as v2.00.
* The optimisation flags are switched on differently: -fforce-mem
is now part of -O2, which it wasn't before. I spent a couple of hours
playing around with optimisation switches to get a feel for
how they work. The end result is that v2.01 optimises just as well as
v2.00, but it takes a different combination of switches to get there.
Gcc allows you to fiddle with the nature of optimisation quite a
bit, and it helps to know what your code is doing.
Here are a couple of tips:
1. Profile your code and check which routines are time critical.
Compile your code with the ordinary optimisation switches:
-O2 -m486 -fomit-frame-pointer -ffast-math
Disassemble it or compile with -S to get the gas input file.
Examine the assembler code.
2. If there aren't too many memory accesses within the inner loop
try adding -fforce-addr to the optimisation switch list. This
copies addresses into registers for pointer arithmetic. It helps
a lot if a couple of pointers are used heavily within one single
loop. It doesn't help if you're referencing dozens of different
addresses in the inner loop.
3. If there are a lot of memory references, try adding -fno-force-mem
(-O2 now switches -fforce-mem on), because repeatedly copying memory
variables into registers bloats the code and, naturally, slows it down.
4. Profile the effect of -funroll-loops, -funroll-all-loops
and -fstrength-reduce. Try compiling different source files
with different optimisation flags.
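
To make tips 1 and 2 concrete, here is a minimal sketch of the kind of test I mean. The routine and the names dot_product and time_dot_product are made up for illustration; they are not from anybody's real program. The inner loop walks two pointers heavily, which is the case where -fforce-addr is worth a try. The timing uses only the standard clock() from <time.h>, so the resolution is coarse and you need plenty of repetitions.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical hot routine: two pointers used heavily in one loop. */
long dot_product(const int *a, const int *b, size_t n)
{
    long sum = 0;
    const int *end = a + n;

    while (a < end)
        sum += (long)(*a++) * (long)(*b++);
    return sum;
}

/* Call dot_product() reps times and return the elapsed CPU time in
   seconds. The accumulated sum is stored in *result so the compiler
   cannot throw the calls away as dead code. Keep the inputs small
   enough that the accumulated sum fits in a long. */
double time_dot_product(const int *a, const int *b, size_t n,
                        int reps, long *result)
{
    clock_t start, stop;
    long acc = 0;
    int i;

    assert(reps > 0);
    start = clock();
    for (i = 0; i < reps; i++)
        acc += dot_product(a, b, n);
    stop = clock();

    *result = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Build it once with -O2 -m486 -fomit-frame-pointer -ffast-math and once more with -fforce-addr added, call time_dot_product() from a small main() with large arrays and a high repetition count, and compare the times. Compiling the same file with -S shows what actually changed in the loop.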
Finally, may I add that I've always succeeded in getting gcc executables
to run faster than the Borland-compiled versions (version 5.0 excepted,
as I don't have access to it) and certainly faster than djgpp v1.
Not that that's an objective criterion or anything, but let it
be a stimulus to optimise further.
Elliott