From: ao950 AT FreeNet DOT Carleton DOT CA (Paul Derbyshire)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Any tips on optimizing C code?
Date: 13 May 1997 03:40:10 GMT
Organization: The National Capital FreeNet
Lines: 62
Message-ID: <5l8nqq$f5e@freenet-news.carleton.ca>
References: <33775c59 DOT 19219875 AT news DOT cis DOT yale DOT edu>
Reply-To: ao950 AT FreeNet DOT Carleton DOT CA (Paul Derbyshire)
NNTP-Posting-Host: freenet2.carleton.ca
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Precedence: bulk

jon (quacci AT vera DOT com) writes:
> I'm interested in understanding what can be done to speed up straight
> C code. In the specific thing I am writing, I've already done the
> obvious things, like switched most calcs from FP to integer, using bit
> shifting wherever possible for multiplying and dividing, etc. But is
> there a complied source of information on just
> what-is-faster-than-what? Like, does running a "for" loop by
> decrementing rather than incrementing actually save a cycle? or does a
> "case" command actually beat a series of "if"s? Do global variable
> speed things up? I figure there must be something out there that has
> the low-down on just this sort of nitty-gritty info.

CASE statements: Put the most likely case first, and so on. When the
compiler emits a chain of comparisons rather than a jump table, this
reduces the number of comparisons made before the right case is found.

Loops: If possible, make a loop count down and stop on zero, since this
saves an entire comparison instruction. GCC can optimize

    for (y=21; y; y--) { foo(y); bar(x); }

by directly following the y-- with a jz or a jnz that tests whether the
decrement set the zero flag. With for (y=1; y<=21; y++) there is a
cmpl $21,%eax or somesuch as well. That's a bunch of lost cycles.

Use longs and ints, avoid shorts where speed is paramount.
(Use shorts where shorts are enough and space is at a premium, such as
on disk; I often store data that is "short" as shorts in files, but
convert them to regular ints (32 bits in gcc, the same as long) in RAM
and in calculation areas.) Shorts require extra instructions here and
there when the CPU operates in 32-bit protected mode.

Compile with -m486, since anything speed-requiring should be run on 486
and up nowadays. This causes gcc to optimize for 486 and up chips.

Compile with -O3. Also -fforce-addr might help; it sometimes gains a
little speed here and there.

-fomit-frame-pointer frees an extra *whole bloody register*. But it
makes debugging hairier, in that a stack traceback after a segfault will
only show the last couple of calls, rather than the entire subroutine
nesting stack since launch. Usually the problem is discernible from the
line of the crash, but sometimes the info that -fomit-frame-pointer
loses is needed to track down some of the more diabolical bugs. (Without
this flag, GCC reserves a register, EBP, as the frame pointer, which is
what makes the full traceback possible.)

If bugs seem to appear and disappear or prove very elusive, compile
without optimizations to debug, but be sure to test with optimization
flags after!

Also, read the info docs for special function attributes. A function
called frequently, especially recursively or in inner loops, can be
declared with the "regparm" attribute, so that registers are used to
pass parameters instead of the stack. With regparm(3), if the function
has only two or three integer arguments, they all get passed by
register. This saves a lot of push and pop instructions, but it might
cause gcc to juggle registers instead, so test your program's speed with
and without this for each function. It of course does not affect
functions with no arguments.

-- 
.*. Where feelings are concerned, answers are rarely simple [GeneDeWeese]
-() < When I go to the theater, I always go straight to the "bag and mix"
`*' bulk candy section...because variety is the spice of life...
[me] Paul Derbyshire ao950 AT freenet DOT carleton DOT ca, http://chat.carleton.ca/~pderbysh
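
A rough sketch of two of the tips above, in case it helps to see them
side by side: a count-up loop next to its count-down-to-zero twin, and a
hot function given GCC's regparm attribute. The names work, sum_up,
sum_down, check_loops_agree and the REGPARM3 macro are made up for
illustration and are not from any library. The regparm part only applies
to 32-bit x86 GCC targets (DJGPP is one), so the macro expands to
nothing elsewhere. Whether either trick buys anything depends on the
compiler version and flags, so compare the output of gcc -S and time it
rather than taking this on faith.

```c
#include <assert.h>

/* regparm only applies to 32-bit x86 GCC targets; on anything else the
   macro is empty so the file still compiles unchanged. */
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

/* Hypothetical hot function. The attribute goes on the prototype; on
   i386 it asks GCC to pass up to three integer arguments in registers
   (EAX, EDX, ECX) instead of on the stack. */
long work(long a, long b, long c) REGPARM3;

long work(long a, long b, long c)
{
    return a * b + c;
}

/* Counting up: each pass has to compare y against the bound n. */
long sum_up(unsigned int n)
{
    long total = 0;
    unsigned int y;

    for (y = 1; y <= n; y++)
        total += work((long)y, (long)y, 0);
    return total;
}

/* Counting down to zero: on x86 the decrement sets the zero flag, so
   the compiler may branch on it without a separate compare. The loop
   visits the same values of y as sum_up, in reverse order. */
long sum_down(unsigned int n)
{
    long total = 0;
    unsigned int y;

    for (y = n; y; y--)
        total += work((long)y, (long)y, 0);
    return total;
}

/* Sanity check: both loop shapes must produce the same result. */
void check_loops_agree(void)
{
    assert(sum_up(21) == sum_down(21));
    assert(sum_up(0) == sum_down(0));
}
```

Note the count-down form only drops in cleanly when the order of
iteration does not matter, as with the sum here.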