Mail Archives: djgpp/1997/11/06/13:33:31

Message-Id: <m0xTR1J-000S1iC@inti.gov.ar>
Comments: Authenticated sender is <salvador AT natacha DOT inti DOT gov DOT ar>
From: "Salvador Eduardo Tropea (SET)" <salvador AT inti DOT edu DOT ar>
Organization: INTI
To: Nate Eldredge <eldredge AT ap DOT net>, AHBain AT pulsedesign DOT demon DOT co DOT uk
Date: Thu, 6 Nov 1997 15:37:57 +0000
MIME-Version: 1.0
Subject: Re: Code optimisations
CC: djgpp AT delorie DOT com

Nate Eldredge <eldredge AT ap DOT net> replied:
> At 11:27  11/4/1997 +0000, Alistair Bain wrote:
> >Ok don't want to start a war here but...
> >I'm new-ish to C and I am looking for ways to speed up my code.
> >I don't care a whit about readability and maintainability of code, or
> >the size of code generated (As long as it runs on clean 16Mb system). I
> >just want the best performance poss. (for games programming).
> Okay, but... if you're "new-ish" to C, focusing on performance over
> maintainability might not be such a good idea. Just a suggestion.
> >
> >
> >1) structs - Are they slow? what do they compile to? 
> No, not particularly. The compiler knows the offset of each member of the
> struct, so something like this:
> foo.x = 3;
> compiles to (pseudocode)
> mov [foo+offsetof(x)],3
> 
> A member/dereference (you know, the `->' operator) may be slightly slower
> than a straight pointer dereference, because it must add the member offset
> to the pointer, then move indirect. But the compiler can often optimize
> these away... you might be surprised.
Nate is right. But for maximum speed it is better to declare:

static int XCoors[10];
static int YCoors[10];

and then use XCoors[index] and YCoors[index]

rather than Coors[index].X and Coors[index].Y

Both take the same number of clocks, but the first uses less cache, so it is
normally faster.
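
For illustration, here is a minimal sketch of the two layouts side by side. The
names `struct Coor`, `fill_coords`, `sum_x_parallel` and `sum_x_struct` are
made up for this example. Note that the cache saving shows up mainly when a
loop reads only one of the coordinates, as below; a loop that always uses X and
Y together touches the same amount of memory with either layout.

```c
#define NPOINTS 10

/* Layout 1: two parallel arrays. */
static int XCoors[NPOINTS];
static int YCoors[NPOINTS];

/* Layout 2: one array of structs (hypothetical type name). */
struct Coor { int X; int Y; };
static struct Coor Coors[NPOINTS];

/* Hypothetical helper: put the same data into both layouts. */
void fill_coords(void)
{
    int i;
    for (i = 0; i < NPOINTS; i++) {
        XCoors[i] = Coors[i].X = i;
        YCoors[i] = Coors[i].Y = 2 * i;
    }
}

/* Reads only the X values, which are packed next to each other. */
int sum_x_parallel(void)
{
    int i, sum = 0;
    for (i = 0; i < NPOINTS; i++)
        sum += XCoors[i];
    return sum;
}

/* Same sum, but every X is followed by an unused Y, so the loop
   walks over twice as many bytes to read the same values. */
int sum_x_struct(void)
{
    int i, sum = 0;
    for (i = 0; i < NPOINTS; i++)
        sum += Coors[i].X;
    return sum;
}
```

Both functions return the same result; only the amount of memory pulled
through the cache differs.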

> >2) unrolling loops - to what extent does -funroll-loops do this? Does
> >-O3 do this? Adding both I get an executable of exactly the same size as
> >using just -O. Should I just manually unroll the loops?
> `-funroll-loops' unrolls all the loops where "the number of iterations is
> known at compile time" (GCC manual). Things like this:
> for (i=0; i<1000; i++) /* do something */;
> So perhaps your loop wasn't so precisely determined. `-funroll-all-loops'
> unrolls even loops where it doesn't know how many iterations, but this is
> generally a Bad Thing, since there is still a test and conditional jump
> after each iteration, and the code just gets huge and exceeds caches.
Yes, that's the downside: sometimes the unrolled code is huge and runs
slower. You must experiment with your particular case.
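
As a rough sketch of what manual unrolling looks like, the two functions below
compute the same sum, the second one unrolled by a factor of four with a
clean-up loop for the leftover elements. The function names and the factor of
four are arbitrary choices for this example, not something taken from the GCC
manual; whether the unrolled version is actually faster has to be measured.

```c
/* Straightforward loop: one test and one jump per element. */
long sum_plain(const int *a, int n)
{
    long s = 0;
    int i;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by hand, four elements per iteration:
   one test and one jump per four elements, but more code. */
long sum_unrolled4(const int *a, int n)
{
    long s = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    /* Clean-up loop for the remaining 0 to 3 elements. */
    for (; i < n; i++)
        s += a[i];
    return s;
}
```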
 
> `-O3' does not unroll loops. If you read the GCC manual, it says that:
> * `-O' performs the optimizations that have the most effect for the time
> they take;
> * `-O2' performs almost all optimizations that don't trade-off size for speed;
> * `-O3' is the same as `-O2' but also inlines functions when possible.
> 
> Also, I seem to remember something about a bug causing crashes or incorrect
> code using loop unrolling. IMO, it doesn't tend to be a win anyway.
Don't discard a feature just because it has bugs: even with -O1 you can hit bugs
in GCC (just take a look at the list of bugs on my web page). The probability of
hitting one is normally very low.

> >3) global variables - I heard, read, dreamt something about them being
> >faster. ie declare all variables at top of prog and just be careful
> >about accessing wrong ones, etc.
> Not that I know of. An access to a global variable compiles to:
> mov [12345],42 ; where 12345 is the variable's address
> 
> while a local access becomes:
> mov [ebp-17],42 ; where the variable is 17 bytes into the stack frame
> 
> My 386 manual gives identical instruction timings for both forms, and AFAIK
> this hasn't changed on newer chips. In fact, local variables may even be
> stored in registers and become faster yet.
True, but it depends on the code. Global (static) arrays are faster than
dynamically allocated arrays because with the global ones you don't lose a
register (used as the base). So it depends on the variable; there isn't one answer.
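
A small sketch of the difference, assuming a 32-bit x86 target. The names
`table`, `dyn_table` and the helper functions are hypothetical, and the
assembly in the comments only indicates the typical shape of the generated
code, not the exact output of any particular GCC version.

```c
#include <stdlib.h>

#define N 256

static int table[N];    /* address is a constant fixed at link time */
static int *dyn_table;  /* address is only known at run time */

void fill_static(void)
{
    int i;
    for (i = 0; i < N; i++)
        table[i] = i * i;   /* typically: mov [table+ecx*4],eax */
}

/* Returns 0 on success, -1 if the allocation fails. */
int fill_dynamic(void)
{
    int i;
    if (dyn_table == NULL)
        dyn_table = malloc(N * sizeof *dyn_table);
    if (dyn_table == NULL)
        return -1;
    for (i = 0; i < N; i++)
        dyn_table[i] = i * i;   /* typically: mov [ebx+ecx*4],eax,
                                   with ebx tied up holding the base */
    return 0;
}

int get_static(int i)  { return table[i]; }
int get_dynamic(int i) { return dyn_table[i]; }
```

With only a handful of general-purpose registers on the x86, the extra base
register can matter in a tight inner loop.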
 
> I've found that the best options for optimizing are:
> -O3,                     to make the compiler work hard;
Perhaps true for the P5, but not for the 486 and 5x86, because the larger code bloats the cache.

> -fomit-frame-pointer,    unless you need to debug your code
Code compiled with -O2 is hard to debug anyway ;-))).

> -ffast-math,             unless you use strict ANSI/IEEE floating point
> -m486                    assuming you have a 486 or better
It has a very small impact, and normally the code bloat makes it invisible,
but yes, it can be a little better on a 486.
 
> Here are some tips on code optimizing that I saw somewhere:
> * First, profile your code and find what needs improving; it's often not
> what you think.
> * A good algorithm is likely to make the most difference in speed.
> * Tricks in rearranging syntax to get the compiler to make better code
> rarely help.

I agree with Nate on 1 and 2, but not on 3 if you use GCC, because for
some things it is really stupid and needs some help (I don't know about a better
compiler).
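
Since every suggestion in this thread comes down to "measure it", here is a
minimal timing sketch using the standard `clock()` function. The names `work`
and `time_it` are hypothetical, and `work` merely stands in for the code being
measured. For a real program a profiler such as gprof (compile with `-pg`)
gives a per-function breakdown instead.

```c
#include <time.h>

/* Hypothetical stand-in for the code you want to measure. */
long work(int n)
{
    long s = 0;
    int i;
    for (i = 0; i < n; i++)
        s += i % 7;
    return s;
}

/* Calls fn(n) `reps` times and returns the CPU time used, in seconds.
   The last return value of fn is stored in *result if result is not NULL. */
double time_it(long (*fn)(int), int n, int reps, long *result)
{
    clock_t start = clock();
    long out = 0;
    int r;
    for (r = 0; r < reps; r++)
        out = fn(n);
    if (result != NULL)
        *result = out;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call it as, for example, `time_it(work, 1000000, 100, &res)` for each variant
you want to compare, and use enough repetitions that the elapsed time is well
above the resolution of `clock()`.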

SET 
------------------------------------ 0 --------------------------------
Visit my home page: http://www.geocities.com/SiliconValley/Vista/6552/
Salvador Eduardo Tropea (SET). (Electronics Engineer)
Alternative e-mail: set-sot AT usa DOT net - ICQ: 2951574
Address: Curapaligue 2124, Caseros, 3 de Febrero
Buenos Aires, (1678), ARGENTINA
TE: +(541) 759 0013

  Copyright © 2019   by DJ Delorie     Updated Jul 2019