Date: Mon, 9 Mar 1998 17:41:22 -0800 (PST)
Message-Id: <199803100141.RAA17266@adit.ap.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: "Randy Sorensen" <randy AT idcomm DOT com>, djgpp AT delorie DOT com
From: Nate Eldredge <eldredge AT ap DOT net>
Subject: Re: Optimized code, comparing with Borland C++ 4.5 w/ Power
  Pack
Precedence: bulk

At 05:01  3/8/1998 -0700, Randy Sorensen wrote:
>Here's my problem.  When I do shut down out of DOS and run the exec's
>included on the CD, they run up to 60 fps, where as the code that I ported
>to DJGPP runs at 40 fps with the following optimizations:
>"-O6 -ffast-math -funroll-loops -finline -m486".  Is there any other
>optimizations that will speed it up?  Also, I've heard that using high
>"-O"'s will cause problems.. should I bring it down to 4 or 3?

Standard GCC only has optimization levels up to `-O3', so anything higher
than that is silently treated as `-O3'. (For PGCC, `-O6' or above enables
Pentium-specific optimizations.)

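GCC's `-O3' is frequently not a win. On top of `-O2' it turns on
`-finline-functions', which inlines every function GCC considers simple
enough, not just those declared `inline'. That can make your code very large
and cause it to overflow caches, slowing things down. Try `-O2'.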

There are reports that `-funroll-loops' is often not advantageous either,
for much the same reason: the unrolled code is bigger.

`-finline' is turned on automatically by all `-O' levels (except of course
0), so that's redundant.
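
A minimal sketch of what that means for small helpers like the vector code
mentioned in the question. The `vec3' type and the `vec3_dot' name are made
up for illustration and are not from the original program. The point is only
that a function marked `inline' becomes a candidate for inlining as soon as
any `-O' level is given, with no extra switch:

```c
/* Hypothetical 3-component vector type, for illustration only. */
typedef struct {
    float x, y, z;
} vec3;

/* `static inline' keeps the definition local to this file, so the
   compiler is free to expand it at each call site when optimizing.
   Without optimization it is simply compiled as an ordinary function. */
static inline float vec3_dot(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Whether GCC actually inlines a given call is still its decision; the keyword
is a request, not a guarantee.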

`-m486' is probably only a good idea if you really have a 486. It uses
liberal alignment for jump targets and various other things. Other CPUs
don't take any performance hit for not having such alignment, so on them you
just get bloated code. (The Cyrix 6x86 is also reported to be helped by
`-m486', but it doesn't help mine...)

You probably want to use the `-fomit-frame-pointer' switch if you don't plan
to debug the code at all. This lets GCC get away without setting up a stack
frame for each function (`pushl %ebp; movl %esp, %ebp; ...') if it can. This
is usually a win because it gives you both smaller code and an extra
register, though YMMV.
>
>I should note that when I ported the code, the original author didn't do a
>very "standard" job with it.  Some of the matrix vector and point
>transformation code was inlined (they were C++ methods) and I couldn't
>figure out how to make DJGPP inline them.  Also, since you can't write to
>video memory in DJGPP by default, I went about doing so using
>__djgpp_nearptr_enable() and adding __djgpp_conventional_base to the video
>memory address.  Is there a faster way of going about video memory writing?

Probably not, although you are living dangerously by doing this.
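
For reference, the near-pointer approach described in the question looks
roughly like the sketch below. It assumes VGA mode 13h (320x200, one byte per
pixel, frame buffer at linear address 0xA0000), which the original post does
not state, and `put_pixel' and `vga_framebuffer' are made-up names. The
DJGPP-only part sits inside `#ifdef __DJGPP__' so the rest compiles
elsewhere. The danger is that enabling near pointers switches off the memory
protection you normally get, so a stray pointer can scribble over DOS itself.

```c
#include <stddef.h>

#ifdef __DJGPP__
#include <sys/nearptr.h>  /* __djgpp_nearptr_enable, __djgpp_conventional_base */
#endif

/* Assumed for illustration: VGA mode 13h, 320x200, one byte per pixel. */
#define SCREEN_W 320
#define SCREEN_H 200
#define VGA_LINEAR_ADDR 0xA0000UL

/* Write one pixel into a 320x200 byte buffer.
   Out-of-range coordinates are ignored instead of corrupting memory. */
void put_pixel(unsigned char *fb, int x, int y, unsigned char color)
{
    if (x < 0 || x >= SCREEN_W || y < 0 || y >= SCREEN_H)
        return;
    fb[(size_t)y * SCREEN_W + x] = color;
}

#ifdef __DJGPP__
/* Return a near pointer to video memory, or NULL if near pointers
   could not be enabled (__djgpp_nearptr_enable returns 0 on failure). */
unsigned char *vga_framebuffer(void)
{
    if (!__djgpp_nearptr_enable())
        return NULL;
    return (unsigned char *)(VGA_LINEAR_ADDR + __djgpp_conventional_base);
}
#endif
```

Call `__djgpp_nearptr_disable()' once you are done drawing. The far-pointer
routines in `<sys/farptr.h>' (`_farpokeb' and friends) are the safer route,
at some cost in speed.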

HTH.


Nate Eldredge
eldredge AT ap DOT net