Mail Archives: djgpp/1997/11/05/22:49:04

Date: Wed, 5 Nov 1997 19:45:44 -0800 (PST)
Message-Id: <199711060345.TAA03125@adit.ap.net>
Mime-Version: 1.0
To: Alistair Bain <AHBain AT pulsedesign DOT demon DOT co DOT uk>, djgpp AT delorie DOT com
From: Nate Eldredge <eldredge AT ap DOT net>
Subject: Re: Code optimisations

At 11:27  11/4/1997 +0000, Alistair Bain wrote:
>Ok don't want to start a war here but...
>I'm new-ish to C and I am looking for ways to speed up my code.
>I don't care a whit about readability and maintainability of code, or
>the size of code generated (As long as it runs on clean 16Mb system). I
>just want the best performance poss. (for games programming).
Okay, but... if you're "new-ish" to C, focusing on performance over
maintainability might not be such a good idea. Just a suggestion.
>
>
>1) structs - Are they slow? what do they compile to? 
No, not particularly. The compiler knows the offset of each member of the
struct, so something like this:
    foo.x = 3;
compiles to (pseudocode)
    mov [foo+offsetof(x)],3

A member dereference (you know, the `->' operator) may be slightly slower
than a straight pointer dereference, because it must add the member offset
to the pointer, then move indirect. But the compiler can often optimize
the add away (on x86 the offset usually folds into the addressing mode as a
constant displacement)... you might be surprised.
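
To make the offset arithmetic concrete, here is a minimal sketch in C. The
struct and the function names are made up for illustration; they are not
from the original question. The second function spells out by hand roughly
what the compiler does for the first: base address plus the member's offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, for illustration only. */
struct point {
    int x;
    int y;
};

/* Store through the member-access operator. */
void set_y_member(struct point *p, int value)
{
    assert(p != NULL);
    p->y = value;
}

/* The same store written out by hand: base address plus member offset. */
void set_y_offset(struct point *p, int value)
{
    char *base;

    assert(p != NULL);
    base = (char *)p;
    *(int *)(base + offsetof(struct point, y)) = value;
}
```

Both functions should leave the same value in the same member. If you
compile with `gcc -S' and look at the assembly, I would expect the two to
come out nearly identical, but check it on your own compiler.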
>
>2) unrolling loops - to what extent does -funroll-loops do this? Does
>-O3 do this? Adding both I get an executable of exactly the same size as
>using just -O. Should I just manually unroll the loops?
`-funroll-loops' unrolls all the loops where "the number of iterations is
known at compile time" (GCC manual). Things like this:
    for (i=0; i<1000; i++) /* do something */;
So perhaps your loop wasn't so precisely determined. `-funroll-all-loops'
unrolls even loops whose iteration count it can't determine, but this is
generally a Bad Thing, since there is still a test and conditional jump
after each iteration, and the code just gets huge and overflows the cache.
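
If you do want to see what unrolling by hand looks like, here is a rough
sketch. The function names and the unroll factor of 4 are my own choices
for illustration; the trip count of 1000 is taken from the loop above. It
only works as written because 1000 is a multiple of 4.

```c
#include <assert.h>

#define N 1000  /* trip count known at compile time */

/* Plain loop: one add, one test and one jump per element. */
long sum_rolled(const int *a)
{
    long total = 0;
    int i;

    for (i = 0; i < N; i++)
        total += a[i];
    return total;
}

/* Hand-unrolled by 4: one test and one jump per four elements. */
long sum_unrolled(const int *a)
{
    long total = 0;
    int i;

    assert(N % 4 == 0);  /* otherwise the tail elements need a cleanup loop */
    for (i = 0; i < N; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    return total;
}
```

Both versions return the same sum. Whether the second one is actually
faster is something to measure rather than assume.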

`-O3' does not unroll loops. If you read the GCC manual, it says that:
* `-O' performs the optimizations that have the most effect for the time
they take;
* `-O2' performs almost all optimizations that don't trade off size for speed;
* `-O3' is the same as `-O2' but also inlines functions when possible.

Also, I seem to remember something about a bug causing crashes or incorrect
code using loop unrolling. IMO, it doesn't tend to be a win anyway.
>
>3) global variables - I heard, read, dreamt something about them being
>faster. ie declare all variables at top of prog and just be careful
>about accessing wrong ones, etc.
Not that I know of. An access to a global variable compiles to:
    mov [12345],42 ; where 12345 is the variable's address

while a local access becomes:
    mov [ebp-17],42 ; where the variable is 17 bytes into the stack frame

My 386 manual gives identical instruction timings for both forms, and AFAIK
this hasn't changed on newer chips. In fact, local variables may even be
stored in registers and become faster yet.
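
Here is a small sketch of the two styles side by side. The names are
hypothetical, and whether the local version really ends up in a register
depends on your compiler and options, so treat it as something to
experiment with rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

long global_total;  /* hypothetical global accumulator */

/* Accumulates into the global; every update may go through memory. */
void sum_into_global(const int *a, size_t n)
{
    size_t i;

    assert(a != NULL);
    global_total = 0;
    for (i = 0; i < n; i++)
        global_total += a[i];
}

/* Accumulates into a local, which the compiler is free to keep in a register. */
long sum_into_local(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    assert(a != NULL);
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

The results are the same either way; only where the running total lives
is different.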

I've found that the best options for optimizing are:
-O3,                     to make the compiler work hard;
-fomit-frame-pointer,    unless you need to debug your code;
-ffast-math,             unless you need strict ANSI/IEEE floating point;
-m486,                   assuming you have a 486 or better.

Here are some tips on code optimizing that I saw somewhere:
* First, profile your code and find what needs improving; it's often not
what you think.
* A good algorithm is likely to make the most difference in speed.
* Tricks in rearranging syntax to get the compiler to make better code
rarely help.
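
As a crude first step before reaching for a real profiler, you can time a
suspect routine with the standard clock() function. This is only a sketch:
busy_work() is a made-up stand-in for whatever routine you suspect, and
time_busy_work() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload standing in for the routine you suspect is slow. */
unsigned long busy_work(unsigned long n)
{
    unsigned long acc = 0;
    unsigned long i;

    for (i = 0; i < n; i++)
        acc += i % 7;
    return acc;
}

/* Runs busy_work(n), stores its result, and returns the CPU seconds used. */
double time_busy_work(unsigned long n, unsigned long *result)
{
    clock_t start;

    assert(result != NULL);
    start = clock();
    *result = busy_work(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

The resolution of clock() is coarse, so call the routine enough times for
the total to be measurable, and keep using the result so the compiler
can't throw the work away.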

Hope this helps!

Nate Eldredge
eldredge AT ap DOT net


