Mail Archives: djgpp/1997/05/13/12:23:13

From: Andrew Crabtree <andrewc AT typhoon DOT rose DOT hp DOT com>
Message-Id: <199705131619.AA156030341@typhoon.rose.hp.com>
Subject: Re: Any tips on optimizing C code?
To: quacci AT vera DOT com (jon)
Date: Tue, 13 May 1997 9:19:00 PDT
Cc: djgpp AT delorie DOT com
In-Reply-To: <33775c59.19219875@news.cis.yale.edu>; from "jon" at May 12, 97 6:15 pm

> C code. In the specific thing I am writing, I've already done the
> obvious things, like switched most calcs from FP to integer, using bit
> shifting wherever possible for multiplying and dividing, etc. But is

This kind of optimization is likely unneeded these days. Intel's
optimization guide states that on a Pentium Pro, a multiply should be
replaced only when a single shift will do. On a Pentium, a shift, a mov,
and a subtract/add beat a multiply, but any longer sequence does not.
The guide gives a small algorithm, based on the number of bits set in
the constant, that tells you whether to use shifts/adds on a given
processor. Unless you're targeting a 386 or 486, it's probably not worth
it. GCC will find the simple cases (like powers of 2) on its own.
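To make the trade-off concrete, here is a minimal sketch of replacing a multiply by a constant with shifts and adds. The function names and the constant 10 are illustrative only, not taken from Intel's guide. Since 10 is binary 1010 (two bits set), the naive decomposition needs two shifts and one add; every extra set bit in the constant adds roughly another shift and add, which is why the bit count decides whether the trick pays off.

```c
/* Hypothetical helpers illustrating strength reduction of a multiply
   by a constant. Unsigned arithmetic is used so overflow is well
   defined: both versions wrap identically modulo 2^N. */

/* Plain multiply: let the compiler choose the instruction sequence. */
unsigned mul10_plain(unsigned x)
{
    return x * 10u;
}

/* Hand-reduced: 10 = 8 + 2, so x * 10 == (x << 3) + (x << 1).
   Two shifts and an add; per the advice above, on a Pentium Pro a
   single multiply is likely no slower than this. */
unsigned mul10_shift(unsigned x)
{
    return (x << 3) + (x << 1);
}
```

Whether the hand-written version wins depends entirely on the target CPU, so compare the generated assembly for both before committing to it.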

> Like, does running a "for" loop by
> decrementing rather than incrementing actually save a cycle? 

Running a for loop that terminates on 0 saves a couple of cycles. Again,
GCC figures out how to do this for you most of the time.

I have seen it generate both of these for a loop like

for (i=0; i<10; i++)

1)

	mov ECX, 10
top:
	...
	dec ECX
	jnz top

2)

	mov ECX, -10
top:
	...
	inc ECX
	jnz top

Both of these eliminate the compare instruction: dec and inc already set
the zero flag, so jnz can test it directly.
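At the C level, the equivalent idea is to write the loop so that its exit test is against zero. This is a minimal sketch assuming a simple array sum; the function names are hypothetical. With optimization on, GCC often performs this transformation itself, so the count-up version is usually just as good.

```c
#include <stddef.h>

/* Count up: the exit test compares i against n on every iteration. */
long sum_up(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Count down to zero: the loop ends when i reaches 0, so the compiler
   can branch on the flags set by the decrement instead of emitting a
   separate compare. Elements are visited from last to first. */
long sum_down(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = n; i != 0; i--)
        total += a[i - 1];
    return total;
}
```

Reversing the visiting order is harmless for an integer sum, but check that the loop body really does not depend on the direction before rewriting it this way.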


> or does a
> "case" command actually beat a series of "if"s? 

If the compiler can turn a switch statement into a jump table (which
takes a fair number of cases with reasonably dense values), it is much
faster than a chain of ifs.
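As a rough illustration, here is a sketch of a switch whose case values are dense (0 through 7) next to an equivalent if chain. The mapping and function names are made up for the example. For the switch, a compiler may emit one bounds check plus an indexed jump or table lookup, while the if chain can cost up to eight compares. Whether GCC actually builds a table depends on its version, the flags, and the number and spread of the cases. Newer compilers may even turn such an if chain back into a switch.

```c
/* Hypothetical example: map a small opcode to a cost. */

/* Dense case values 0..7: a good candidate for a jump table. */
int cost_switch(int op)
{
    switch (op) {
    case 0: return 1;
    case 1: return 3;
    case 2: return 4;
    case 3: return 1;
    case 4: return 5;
    case 5: return 9;
    case 6: return 2;
    case 7: return 6;
    default: return -1;
    }
}

/* Equivalent if chain: tested in order, up to eight compares. */
int cost_if(int op)
{
    if (op == 0) return 1;
    if (op == 1) return 3;
    if (op == 2) return 4;
    if (op == 3) return 1;
    if (op == 4) return 5;
    if (op == 5) return 9;
    if (op == 6) return 2;
    if (op == 7) return 6;
    return -1;
}
```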

You should probably play around with the different optimization flags
and see what kind of assembly they output. You'd be surprised at how
good the compiler can be sometimes, and also at how dumb it can be.
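One way to run that experiment is to put a few tiny functions in a file and compare what `gcc -S` produces at different optimization levels. The `-S` option stops after generating assembly and writes it to a `.s` file. The file name `probe.c` and the functions below are only illustrative, and the "expect" comments are likely outcomes rather than guarantees.

```c
/* probe.c: small functions for comparing GCC's output across flags.
   Compile with, e.g.:
       gcc -O1 -S probe.c -o probe-O1.s
       gcc -O2 -S probe.c -o probe-O2.s
   then read and diff the generated .s files. */

/* Multiply by a power of two: expect a single shift (or lea). */
unsigned times8(unsigned x)
{
    return x * 8u;
}

/* Multiply by a non-power of two: shows whether the compiler picks
   a multiply or a shift/add (lea) sequence for this target. */
unsigned times10(unsigned x)
{
    return x * 10u;
}

/* Unsigned divide by a power of two: expect a logical shift right. */
unsigned div4(unsigned x)
{
    return x / 4u;
}
```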

I would be careful using -m486, as it can often lead to slower code on
Pentium-class machines or newer (it screws up alignment and uses shifts
where it should use a multiply). I now have an environment set up where
I can rebuild gcc 2.8 with the Pentium optimization patches, and I will
make the compiled EXEs available to whoever wants them. My preliminary
testing shows a pretty decent speed increase.

Your best bet in a high-level language is algorithm choice and overall
program structure.

Andrew







  Copyright © 2019   by DJ Delorie     Updated Jul 2019