Date: Mon, 9 Mar 1998 17:41:22 -0800 (PST)
Message-Id: <199803100141.RAA17266@adit.ap.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: "Randy Sorensen" , djgpp AT delorie DOT com
From: Nate Eldredge
Subject: Re: Optimized code, comparing with Borland C++ 4.5 w/ Power Pack
Precedence: bulk

At 05:01 3/8/1998 -0700, Randy Sorensen wrote:
>Here's my problem. When I do shut down out of DOS and run the exec's
>included on the CD, they run up to 60 fps, where as the code that I ported
>to DJGPP runs at 40 fps with the following optimizations:
>"-O6 -ffast-math -funroll-loops -finline -m486". Is there any other
>optimizations that will speed it up? Also, I've heard that using high
>"-O"'s will cause problems.. should I bring it down to 4 or 3?

Standard GCC only supports optimization levels up to `-O3', so anything
higher is silently treated as `-O3'. (For PGCC, `-O6' or above enables
Pentium-specific optimizations.)

GCC's `-O3' is frequently not a win. It adds `-finline-functions', which
lets GCC inline *all* simple enough functions, not just the ones you
declared `inline'. That can make your code much larger and cause it to
overflow the caches, which slows things down. Try `-O2'.

There are reports that `-funroll-loops' doesn't pay off either, for the
same reason: the unrolled loops are bigger and put more pressure on the
cache.

`-finline' is turned on automatically by all `-O' levels (except, of
course, `-O0'), so that's redundant.

`-m486' is probably only a good idea if you really have a 486. It uses
liberal alignment for jump targets and various other things. Other CPUs
take no performance hit without that alignment, so on them you just get
bloated code. (The Cyrix 6x86 is also reported to be helped by `-m486',
but it doesn't help mine...)

You probably want to use the `-fomit-frame-pointer' switch if you don't
plan to debug the code at all. It lets GCC skip setting up a stack frame
for each function (`pushl %ebp; movl %esp, %ebp; ...') where it can.
This is usually a win because you get both smaller code and an extra
register (%ebp). As always, YMMV.
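Since none of the flag advice above is guaranteed for a given program, the
reliable answer is to measure. Here is a minimal sketch, assuming a
point-transform loop similar to the one in the ported demo (Vec3,
transform_points and time_transform are hypothetical names, not from the
original code). Build the same file once with `gcc -O2 -fomit-frame-pointer'
and once with `gcc -O3 -funroll-loops', call time_transform from a small
main with a few thousand points, and compare the times.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical point type for the sketch. */
typedef struct { float x, y, z; } Vec3;

/* Transform n points by a row-major 4x4 matrix, assuming w = 1 for every
   input point, so only the top three rows are used. */
static void transform_points(const float m[16], const Vec3 *in, Vec3 *out,
                             int n)
{
    int i;
    for (i = 0; i < n; i++) {
        float x = in[i].x, y = in[i].y, z = in[i].z;
        out[i].x = m[0] * x + m[1] * y + m[2]  * z + m[3];
        out[i].y = m[4] * x + m[5] * y + m[6]  * z + m[7];
        out[i].z = m[8] * x + m[9] * y + m[10] * z + m[11];
    }
}

/* Run `reps' passes over the points and return the CPU time in seconds.
   Compare this number between builds made with different flag sets. */
static double time_transform(const float m[16], const Vec3 *in, Vec3 *out,
                             int n, int reps)
{
    clock_t start;
    int r;

    assert(n >= 0 && reps >= 0);
    start = clock();
    for (r = 0; r < reps; r++)
        transform_points(m, in, out, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Timing the real inner loop this way also shows whether the extra size
from `-O3' or `-funroll-loops' costs more than it saves on your CPU.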
>
>I should note that when I ported the code, the original author didn't do a
>very "standard" job with it. Some of the matrix vector and point
>transformation code was inlined (they were C++ methods) and I couldn't
>figure out how to make DJGPP inline them. Also, since you can't write to
>video memory in DJGPP by default, I went about doing so using
>__djgpp_nearptr_enable() and adding __djgpp_conventional_base to the video
>memory address. Is there a faster way of going about video memory writing?

Probably not, although you are living dangerously by doing this. Enabling
near pointers turns off DJGPP's memory protection, so a stray pointer can
silently overwrite DOS or other memory instead of crashing cleanly.

HTH.

Nate Eldredge
eldredge AT ap DOT net
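On the video-memory question, here is a minimal sketch of the near-pointer
approach Randy describes. It assumes VGA mode 13h (320x200, one byte per
pixel, framebuffer at physical address 0xA0000) has already been set; the
mode switch itself is not shown. video_init, video_shutdown, video_base and
put_pixel are hypothetical names. Under DJGPP the sketch uses
__djgpp_nearptr_enable(), __djgpp_nearptr_disable() and
__djgpp_conventional_base from <sys/nearptr.h>. Elsewhere it falls back to a
plain array, so the logic can be tried off DOS. Because protection is off
once near pointers are enabled, the bounds check in put_pixel is the only
guard against a bad coordinate.

```c
#include <assert.h>
#include <string.h>

/* Assumed video mode: 13h, 320x200, one byte per pixel at 0xA0000. */
#define SCREEN_W 320
#define SCREEN_H 200

#ifdef __DJGPP__
#include <sys/nearptr.h>

/* Returns nonzero on success; near pointers can be refused by some hosts. */
static int video_init(void) { return __djgpp_nearptr_enable(); }

static void video_shutdown(void) { __djgpp_nearptr_disable(); }

/* Design choice (assumption): re-read __djgpp_conventional_base on every
   call instead of caching the pointer. It costs almost nothing and stays
   correct even if the base value ever changes. */
static unsigned char *video_base(void)
{
    return (unsigned char *)(0xA0000 + __djgpp_conventional_base);
}
#else
/* Hypothetical stand-in framebuffer so the sketch builds and runs off DOS. */
static unsigned char fake_vram[SCREEN_W * SCREEN_H];

static int video_init(void)
{
    memset(fake_vram, 0, sizeof fake_vram);
    return 1;
}

static void video_shutdown(void) { }

static unsigned char *video_base(void) { return fake_vram; }
#endif

/* Plot one pixel. With protection off, this assert is what stops an
   off-screen write from landing in arbitrary memory. */
static void put_pixel(int x, int y, unsigned char color)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    video_base()[y * SCREEN_W + x] = color;
}
```

Once the drawing code is known to be correct, compiling with `-DNDEBUG'
removes the assert from the inner loop.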