From: baldo AT luna DOT internet DOT com DOT uy
Message-Id: <3.0.1.32.19971229014358.0069b790@mail.internet.com.uy>
Date: Mon, 29 Dec 1997 01:43:58 -0300
To: dave DOT nugent AT ns DOT sympatico DOT ca, djgpp AT delorie DOT com
Subject: Re: Help with optimizing for speed
In-Reply-To: <34A40538.F19@ns.sympatico.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Precedence: bulk

Hello!

At 11:27 AM 26/12/1997 -0800, you wrote:
>Hello, can anyone tell me if they see a way to optimize this code at all.
>I am trying to write a scrolling style game.  Nothing fancy.
>I have a large buffer set up around 1.2MB called screen_hold that holds
>the entire level (draw).  I then copy 160 lines*320 bytes of this to a
>secondary buffer called screen where I will then add sprites to the
>background and blast to vga memory screen mode13.  I am using 160 lines,
>because the bottom 40 lines will be used for a score bar & other info
>that will not always need to be redrawn constantly.  I am just new to
>DJGPP and have been using Borland C++ v3.0 for DOS up til now, but
>figured I could really increase the speed with a 32 bit compiler, but
>with the code I am using, there is not much difference (speed wise)
>between the code generated by DJGPP and Borland's Turbo C++ 16bit code.
>Is there a way that would be faster?  I'm trying to get speed similar to
>that in Jazz JackRabbit.. I can't think of a faster way than this in
>32 bit!!

	DJGPP should definitely be faster. And I have a question: how did you
manage to handle 1.2 MB of memory in Borland?

>
>// xoxoxoxoxoxo    Snipped code... xoxoxoxoxoxoxoxoxoxoxoxox
>for(loop1=0;loop1<160;loop1++)
>memcpy(screen+loop1*320,screen_hold+offset+loop1*3200,320);
	Try to do it in this manner:
for(loop1=159; loop1>=0; loop1--)
memcpy(screen+loop1*320,screen_hold+offset+loop1*3200,320);

	Try to write all your loops in this manner, counting down from a number
to zero (decrementing with --). It optimizes a little, because a test
against zero is usually cheaper than a comparison with another value.
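	As a minimal sketch, the row copy above can be wrapped in a function
like this. The names copy_window, VIEW_W, VIEW_H and LEVEL_W are mine, not
from the original code; the numbers 320, 160 and 3200 are the ones in the
quoted loop. Whether counting down really gains anything depends on the
compiler and its options, so measure before relying on it.

```c
#include <string.h>

#define VIEW_W   320   /* bytes per visible line (mode 13h width)      */
#define VIEW_H   160   /* lines copied each frame                      */
#define LEVEL_W  3200  /* bytes per line of the level buffer (assumed
                          from the 3200 in the quoted code)            */

/* Copy a VIEW_W x VIEW_H window that starts at byte `offset` of the
   level buffer into the contiguous screen buffer.  The loop counts
   down to zero, as suggested above; it copies exactly the same bytes
   as the counting-up version. */
void copy_window(unsigned char *screen,
                 const unsigned char *screen_hold,
                 long offset)
{
    int row;

    for (row = VIEW_H - 1; row >= 0; row--) {
        memcpy(screen + (long)row * VIEW_W,
               screen_hold + offset + (long)row * LEVEL_W,
               VIEW_W);
    }
}
```

	You would call it once per frame, for example
copy_window(screen, screen_hold, offset), before drawing the sprites.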
	Also (it does not appear in this case, but...) you can optimize this kind of code:
	for(Y=123; Y>=0; Y--) {
		for(X=321; X>=0; X--) {
			BUFFER[X][Y]=A_VALUE;
			}
		}
	changing it in this manner:
	for(X=321; X>=0; X--) {
		for(Y=123; Y>=0; Y--) {
			BUFFER[X][Y]=A_VALUE;
			}
		}
	It is faster when the innermost loop runs over the last index of the
array, because in C those elements are next to each other in memory.
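	Here is a small sketch of why the second form is better. The names
NX, NY, fill_fast, step_last_index and step_first_index are only for
illustration; the sizes come from the loop bounds 321 and 123 above.

```c
#include <stddef.h>

#define NX 322   /* first index runs 0..321, as in the loops above */
#define NY 124   /* last index runs 0..123                         */

/* Y loop innermost: each store lands right after/before the previous
   one in memory, because C stores the last index contiguously. */
void fill_fast(unsigned char buffer[NX][NY], unsigned char value)
{
    int x, y;

    for (x = NX - 1; x >= 0; x--)
        for (y = NY - 1; y >= 0; y--)
            buffer[x][y] = value;
}

/* Bytes between buffer[0][0] and buffer[0][1]: one element. */
ptrdiff_t step_last_index(unsigned char buffer[NX][NY])
{
    return &buffer[0][1] - &buffer[0][0];
}

/* Bytes between buffer[0][0] and buffer[1][0]: a whole row of NY. */
ptrdiff_t step_first_index(unsigned char buffer[NX][NY])
{
    return (const unsigned char *)&buffer[1]
         - (const unsigned char *)&buffer[0];
}
```

	With the X loop innermost, every access jumps a whole row of NY
bytes; with the Y loop innermost it moves only one byte, which is kinder
to the cache.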

	These are some of the little optimizations I know. There are others, but you
will have to figure them out yourself because they depend on the kind of code.
	
>// Now send the screen buffer to VGA Memory..
>_dosmemputl(screen, 16000, 0xa0000);  // Send "screen" buffer to VGA MEMORY
>
>// Ok.. firstly I know I can use shifts << for the multip's.  I just wrote
>it this way to make easier to understand.
	DJGPP and many other compilers optimize it for you, don't worry.
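	For reference, a sketch of the two forms for the 320-byte line
offset (the function names are hypothetical). Both return the same value,
so you can keep the readable multiplication and let the optimizer choose.

```c
/* Byte offset of line y in a 320-byte-wide buffer, with a multiply. */
unsigned long line_offset_mul(unsigned long y)
{
    return y * 320;
}

/* The same offset with shifts: 320 = 256 + 64 = (1 << 8) + (1 << 6). */
unsigned long line_offset_shift(unsigned long y)
{
    return (y << 8) + (y << 6);
}
```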

>I am copying 160 lines of 320 bytes from screen_hold to screen buffers
>the offset and 3200 are related to where the screen is located in the
>buffer screen_hold.
>
	Goodbye! HTH!


Ivan Baldo:
E-Mail: baldo AT internet DOT com DOT uy.
Alternate E-Mail: ibaldo AT usa DOT net.
Another alternate E-Mail: lubaldo AT adinet DOT com DOT uy.
Web page: http://xoom.com/baldo.
Phone: (598) (2) 613 3223.
Caldas 1781, Malvin, Montevideo, Uruguay, South America.

My WEB page:
	- Icd (a fast fuzzy directory changer for DOS,
          Freeware compiled with DJGPP and with full
          source code).
	- Some other silly things...
	- English (new) and Spanish languages.