Mail Archives: djgpp/1997/05/13/04:18:29
jon (quacci AT vera DOT com) writes:
> I'm interested in understanding what can be done to speed up straight
> C code. In the specific thing I am writing, I've already done the
> obvious things, like switched most calcs from FP to integer, using bit
> shifting wherever possible for multiplying and dividing, etc. But is
> there a compiled source of information on just
> what-is-faster-than-what? Like, does running a "for" loop by
> decrementing rather than incrementing actually save a cycle? or does a
> "case" command actually beat a series of "if"s? Do global variables
> speed things up? I figure there must be something out there that has
> the low-down on just this sort of nitty-gritty info.
CASE statements: Put the most likely case first, and so on. When the
compiler turns the switch into a chain of comparisons rather than a jump
table, this reduces the number of comparisons until it finds the right case.
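The same ordering idea applies to a chain of ifs. A minimal sketch, assuming
a hypothetical character classifier (the names classify and enum kind are
made up for illustration) working on ASCII text where letters are the most
common input, so they are tested first:

```c
/* Hypothetical classifier; assumes ASCII and that letters dominate the input. */
enum kind { KIND_LETTER, KIND_SPACE, KIND_DIGIT, KIND_OTHER };

enum kind classify(int c)
{
    /* Most likely case first: one range test settles most calls. */
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return KIND_LETTER;
    /* Next most likely. */
    if (c == ' ' || c == '\t' || c == '\n')
        return KIND_SPACE;
    if (c >= '0' && c <= '9')
        return KIND_DIGIT;
    /* Rare leftovers fall through every test. */
    return KIND_OTHER;
}
```

Whether the ordering matters for a real switch depends on what code the
compiler generates, so it is worth timing both orders.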
Loops: If possible, make a loop decrement, and make it stop on zero, since
this saves an entire comparison instruction; GCC can optimize
    for (y=21; y; y--) { foo(y); bar(x); }
by directly following the decrement of y with a jz or a jnz, testing whether
the decrement set the zero flag. With for (y=1; y<21; y++) there is a
cmpl $21,%eax or somesuch as well. That's a bunch of lost cycles.
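A small sketch of the two loop shapes side by side. The function names
sum_up and sum_down are made up for illustration; both compute the same
result, and whether the countdown version is actually faster depends on the
compiler and flags, so treat it as something to measure rather than a
guarantee:

```c
/* Counts up: each iteration has to compare i against n. */
unsigned long sum_up(unsigned n)
{
    unsigned long total = 0;
    unsigned i;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Counts down to zero: the loop test is just "is i nonzero", which the
   compiler may be able to take from the flags set by the decrement. */
unsigned long sum_down(unsigned n)
{
    unsigned long total = 0;
    unsigned i;
    for (i = n; i; i--)
        total += i;
    return total;
}
```

The counter is unsigned so that the "stop on zero" test cannot run away on
a negative starting value.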
Use longs and ints, avoid shorts where speed is paramount. (Use shorts
where shorts are enough and space is at a premium, such as on disk; I
often store data that is "short" as shorts in files, but convert them to
regular (in gcc, long) ints in RAM and in calculation areas.) Shorts
require extra instructions here and there when the CPU operates in 32 bit
protected mode.
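One way to do the "shorts on disk, ints in RAM" conversion is a pair of
small helpers. This is only a sketch; the names widen_shorts and narrow_ints
are hypothetical, and the file reading and writing around them is left out:

```c
#include <stddef.h>

/* Widen n 16-bit values (for example, freshly read from a file) into ints,
   so that the calculation code works on native 32-bit quantities. */
void widen_shorts(const short *src, int *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Narrow back for storage. Assumes every value fits in a short;
   nothing here checks for overflow. */
void narrow_ints(const int *src, short *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = (short)src[i];
}
```

The conversion costs one pass over the data at load and save time, in
exchange for keeping shorts out of the inner loops.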
Compile with -m486, since anything speed-requiring should be run on 486
and up nowadays. This causes gcc to optimize for 486 and up chips.
Compile with -O3. Also -fforce-addr might help; it sometimes saves time
here and there. -fomit-frame-pointer frees an extra *whole bloody
register*. But it makes debugging hairier, in that a stack traceback after
a segfault will only show the last couple of calls, rather than the entire
subroutine nesting stack since launch. Usually the problem is discernible
from the line of the crash, but sometimes the info -fomit-frame-pointer
loses is needed to track down some of the more diabolical bugs.
(Without this flag, GCC reserves a register, the frame pointer, to hold
the stack-frame address that tracebacks rely on.)
If bugs seem to appear and disappear or prove very elusive, compile
without optimizations to debug, but be sure to test with optimization
flags after!
Also, read the info docs for special function attributes. A function
called frequently, especially recursively or in inner loops, can be
declared with the "regparm" attribute so that registers are used to pass
parameters instead of the stack. With regparm(3), a function that has up to
three integer arguments gets them all passed by register. This saves a lot
of push and pop instructions, but it might cause gcc to juggle registers
instead, so test your program's speed with and without this for each
function. It of course does not affect functions with no arguments.
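In GCC on 32-bit x86, register argument passing is requested with the
regparm function attribute. A minimal sketch, assuming a hypothetical hot
function called blend; the FASTARGS macro is also made up here, and expands
to nothing on other targets, where the attribute does not apply:

```c
/* regparm is specific to 32-bit x86 GCC; elsewhere FASTARGS is empty. */
#if defined(__GNUC__) && defined(__i386__)
#define FASTARGS __attribute__((regparm(3)))
#else
#define FASTARGS
#endif

/* Hypothetical inner-loop helper: mixes a and b, with weight in 0..256.
   With regparm(3), all three int arguments arrive in registers. */
FASTARGS int blend(int a, int b, int weight)
{
    return (a * weight + b * (256 - weight)) >> 8;
}
```

The attribute has to be visible at every call site (put it on the prototype
in the header), otherwise callers and callee will disagree about where the
arguments live.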
--
.*. Where feelings are concerned, answers are rarely simple [GeneDeWeese]
-() < When I go to the theater, I always go straight to the "bag and mix"
`*' bulk candy section...because variety is the spice of life... [me]
Paul Derbyshire ao950 AT freenet DOT carleton DOT ca, http://chat.carleton.ca/~pderbysh