From: vecna AT inlink DOT com ([vecna])
Subject: Optimization and bug smashing.. a lot of other questions too :)
Date: Mon, 11 Aug 1997 00:20:11 GMT
Message-ID: <33ee3f7f.4973504@news.inlink.com>
Newsgroups: comp.os.msdos.djgpp,rec.games.programmer
Lines: 216
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Precedence: bulk

Hello, 

I know that crossposting is the spawn of satan, but I think both of
these groups are relevant... :)

Basically, I'm the programmer of VERGE, a console-style RPG
construction kit. I have a ton of new features I'd love to add, but I
have three main issues that I need to ask for help on :)   (I imagine
this will be a rather long post)

First, optimization. gprof now seems to give totally random results
each time I run it on VERGE, but once upon a time when gprof
worked, it told me that the biggest routines I should focus on
optimizing are my tcopysprite() routine and my vgadump() routine.
First of all, my scrolling technique is to use an oversized drawing
buffer and just draw the inside window rather than use a clipping
tilecopier. Not only does it seem easier, but I'd think it'd be faster
as well, but then I'm not really sure. Anyhow, without further ado,
here's some source:

This is the routine that blits the 352x232 virtual screen to the
320x200 real screen. This is one of those 'every cycle counts'
routines. Any optimization hints would be appreciated.

vgadump()
{
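  if (waitvrt) wait();
  /* 5648 = 16*352 + 16: start inside the 16-pixel border of virscr */
  asm("movl _virscr, %%esi              \n\t"
      "addl $5648, %%esi                \n\t"
      "movl _screen, %%edi              \n\t"
      "xorl %%eax, %%eax                \n\t"
"lineloop:                              \n\t"
      "movl $80, %%ecx                  \n\t"   /* 80 dwords = 320 bytes */
      "rep                              \n\t"
      "movsl                            \n\t"
      "addl $32, %%esi                  \n\t"   /* skip 352-320 border bytes */
      "incl %%eax                       \n\t"
      "cmpl $200, %%eax                 \n\t"   /* stop after 200 scanlines */
      "jb lineloop                      \n\t"
      :
      :
      : "esi", "edi", "cc", "eax", "ecx", "memory");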
}
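
For comparison, here is a rough plain-C sketch of the same blit, which
could serve as a reference to time the asm against. It assumes the
16-pixel border implied by the 5648 offset (16*352 + 16). blit_window
and the constant names are made up for illustration and are not part
of VERGE.

```c
#include <string.h>

enum {
    VIRT_W = 352,   /* virtual screen width, from the post */
    REAL_W = 320,   /* real screen width, from the post */
    REAL_H = 200,   /* real screen height, from the post */
    BORDER = 16     /* assumed border size on each side */
};

/* Copy the visible 320x200 window out of the 352-wide virtual screen.
   Hypothetical helper, not VERGE code. */
void blit_window(unsigned char *dst, const unsigned char *src)
{
    int line;

    /* 16 * 352 + 16 = 5648, the same offset the asm adds to virscr */
    src += BORDER * VIRT_W + BORDER;

    for (line = 0; line < REAL_H; line++) {
        memcpy(dst, src, REAL_W);
        dst += REAL_W;
        src += VIRT_W;   /* stepping a full row skips the 32 border bytes */
    }
}
```

Whether the library memcpy beats rep movsl here depends on the libc
and the CPU, so timing both is the only way to know.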

This is the routine used for non-transparent 16x16 tiles in the
background. It's pretty fast already and isn't that big of an
optimization deal, but still, any optimization hints would be great. 

copytile(int x, int y, char *spr)
{ asm("movl $16, %%edx                  \n\t"
      "movl %2, %%esi                   \n\t"
      "movl %1, %%eax                   \n\t"
      "imul $352, %%eax                 \n\t"
      "addl %0, %%eax                   \n\t"
      "addl _virscr, %%eax              \n\t"
      "movl %%eax, %%edi                \n\t"
"ctl0:                                  \n\t"
      "movsl                            \n\t"
      "movsl                            \n\t"
      "movsl                            \n\t"
      "movsl                            \n\t"
      "addl $336, %%edi                 \n\t"
      "decl %%edx                       \n\t"
      "jnz ctl0                         \n\t"
      :
      : "m" (x), "m" (y), "m" (spr)
      : "eax","edx","esi","edi","ecx","cc" );
}
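
A plain-C sketch of the same 16x16 tile copy, mainly to make the
addressing explicit (y*352 + x, then 16 bytes per row and 352 bytes to
the next row). copytile_c is a hypothetical name, and virscr is passed
in as a parameter here instead of being a global.

```c
#include <string.h>

#define VIRT_W 352   /* virtual screen width, from the post */
#define TILE   16    /* tile width and height, from the post */

/* Copy one opaque 16x16 tile to (x, y) in the virtual screen.
   Hypothetical helper, not VERGE code. */
void copytile_c(unsigned char *virscr, int x, int y,
                const unsigned char *spr)
{
    unsigned char *dst = virscr + y * VIRT_W + x;
    int row;

    for (row = 0; row < TILE; row++) {
        memcpy(dst, spr, TILE);   /* same 16 bytes as the four movsl */
        spr += TILE;
        dst += VIRT_W;            /* 16 copied + 336 skipped = 352 */
    }
}
```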

Okay, this is the single most important routine to optimize. It's the
transparent blitter. It's very important to optimize already, and it
will get VERY important to optimize in the next version... EVERY CYCLE
COUNTS in this one. 

tcopysprite(int x, int y, int width, int height, char *spr)
{ asm("movl %3, %%ecx                   \n\t"
      "movl %4, %%esi                   \n\t"
"tcsl0:                                 \n\t"
      "movl %1, %%eax                   \n\t"
      "imul $352, %%eax                 \n\t"
      "addl %0, %%eax                   \n\t"
      "addl _virscr, %%eax              \n\t"
      "movl %%eax, %%edi                \n\t"
      "movl %2, %%edx                   \n\t"
"drawloop:                              \n\t"
      "lodsb                            \n\t"
      "orb %%al, %%al                   \n\t"
      "jz nodraw                        \n\t"
      "stosb                            \n\t"
      "decl %%edx                       \n\t"
      "orl %%edx, %%edx                 \n\t"
      "jz endline                       \n\t"
      "jmp drawloop                     \n\t"
"nodraw:                                \n\t"
      "incl %%edi                       \n\t"
      "decl %%edx                       \n\t"
      "orl %%edx, %%edx                 \n\t"
      "jnz drawloop                     \n\t"
"endline:                               \n\t"
      "incl %1                          \n\t"
      "decl %%ecx                       \n\t"
      "jnz tcsl0                        \n\t"
      :
      : "m" (x), "m" (y), "m" (width), "m" (height), "m" (spr)
      : "eax","edx","esi","edi","ecx","cc" );
}
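
One thing that stands out in the routine above is that y*352 is
recomputed with imul on every row. A plain-C sketch of the same
transparent blit that computes the start address once and then just
adds the row stride might look like this. tcopysprite_c is a
hypothetical name, virscr is passed as a parameter, and color 0 is
treated as transparent as in the asm.

```c
#define VIRT_W 352   /* virtual screen width, from the post */

/* Transparent blit: pixels with value 0 are skipped.
   Hypothetical helper, not VERGE code. */
void tcopysprite_c(unsigned char *virscr, int x, int y,
                   int width, int height, const unsigned char *spr)
{
    unsigned char *dst = virscr + y * VIRT_W + x;  /* multiply once */
    int skip = VIRT_W - width;   /* bytes from row end to next row start */
    int row, col;

    for (row = 0; row < height; row++) {
        for (col = 0; col < width; col++) {
            unsigned char c = *spr++;
            if (c)               /* 0 means transparent: leave dst alone */
                *dst = c;
            dst++;
        }
        dst += skip;
    }
}
```

The same idea carries over to the asm: keep the destination pointer in
a register and add (352 - width) at the end of each row instead of
redoing the multiply.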

Okay, that's all the optimization I'll bug anyone about now... :) 

Second, one thing that's hindered VERGE's progress is that it has some
unfortunate bugs. Too many small, extremely bizarre ones to list,
really... I sincerely doubt that it's a compiler error, particularly
because using optimizations or not makes no difference. But, I have
become rather convinced that I have some underlying culprit making
otherwise working code turn evil on me. :)  For example, one of my bugs
is in my VSP loading code (where the tile data is stored),
particularly dealing with the tile animation data. The problem is that
the very first map loaded, no matter what map it is, will not animate
at all, but the second map will. So, you could start out in a town and
animations wouldn't work, walk into the overworld, and animations
would work, and back into the town with the animations working now.
Now, I created a workaround that simply loaded the animation data a
second time, but all indications pointed to the fread simply NOT
working the first time. The thing that really miffed me is that if you
single-step through the code in RHIDE's debugger, the code DOES work.
Is this any indication of anything specific to check for? The code is
so simple, too, I don't see where it could be bugged. Note that this
bug is not on the top of my hitlist, it's just an example of
weirdness. Another bug is that when you near the bottom of a map, it
often crashes. Resizing the map to make it bigger vertically seems to
fix it. The weird part... VERGE uses the same size data buffer
regardless of map size. You wouldn't think making the map take up MORE
memory would fix it.. Idunno. Sometimes zones don't activate, but only
sometimes, and there's no pattern to it. Sometimes the menu system
behaves in irrational ways, but again, there's no tangible pattern,
and debugging is virtually impossible. One thing that's come to mind:
I have no formal C training. I compile my code with -w .. without it,
a pretty decent number of warnings are generated. Could this be a
source of the seemingly unstable code generated? The thing is, most of
these warnings are stupid things like:

char *ptr;
ptr=malloc(65000);

instead of ptr = (char *) malloc(65000); This seems to make no
functional difference.. I'm just wondering if a buildup of these
warnings could give me unstable code (BTW, I only recently learned of
how to fix that particular warning.. there are a lot of other warnings
that I'm not even sure how to fix.. all of them seem to be stupid if
you ask me, but then, I'm an Assembly diehard :).
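
A note on that particular warning: in C, assigning the result of
malloc to a char * needs no cast as long as <stdlib.h> is included.
If the header is missing, the compiler assumes malloc returns int, and
that is what it complains about; the cast only hides the complaint.
The sketch below assumes the file is compiled as C (not C++) and also
checks the fread return value, since a short read is one possible way
for a load to fail silently. load_block is a hypothetical helper, not
VERGE code.

```c
#include <stdio.h>
#include <stdlib.h>   /* declares malloc and free */

/* Read exactly `size` bytes from f into a fresh buffer.
   Returns the buffer, or NULL on allocation failure or short read.
   Hypothetical helper, not VERGE code. */
unsigned char *load_block(FILE *f, size_t size)
{
    unsigned char *ptr = malloc(size);   /* no cast needed in C */

    if (ptr == NULL)
        return NULL;

    /* fread returns the number of items actually read */
    if (fread(ptr, 1, size, f) != size) {
        free(ptr);
        return NULL;
    }
    return ptr;
}
```

Code that works when single-stepped but not at full speed is often a
sign of an uninitialized variable or an out-of-bounds write, which are
exactly the things the suppressed warnings might be pointing at.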

Third, I just have a general technique question.. the issue of bitmap
transparency (translucency) in 8 bit color... :) I've seen both
Tyrian and Allegro do it pretty well, at least, better than I'm doing
it, so I know it's possible... I believe it's just that my
best-match-picker routine isn't as good as it could be, I think my
actual routines for color mixing are fine. I intended to look at the
Allegro source, but 1) I do want to write most of the code myself,
instead of just ripping off other people's code, when possible, and 2)
I simply can't find the code dealing with the color mixing in
Allegro's source code. :) Shawn is an amazing programmer, but the one
complaint I have about Allegro is that it's too much of a 'do
everything' lib, which wouldn't be that bad except that it's so
integrated that you can't really separate any one seemingly simple
routine from the rest of the library. :)

Also, I am not using the weighted color values.. I know that you're
supposed to weight certain colors more than others, and I tried that,
and all it did was make everything look more green. :).
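
To make the question concrete, here is a minimal sketch of one common
way to do this: a best-match search over the palette using a weighted
squared distance, and a 256x256 lookup table of 50/50 blends built
from it. This is not VERGE's code and not Allegro's. The names rgb_t,
color_dist, best_match and build_blend_table are made up, and the
30/59/11 weights are the usual brightness approximation, used here as
an assumption. Note that the weights are applied only when comparing
candidates, never to the mixed color itself; applying them to the mix
would be one possible cause of a color cast.

```c
/* Hypothetical palette entry; 8-bit components assumed. */
typedef struct { unsigned char r, g, b; } rgb_t;

/* Weighted squared distance between a palette entry and a target.
   The 30/59/11 weights are an assumption, not taken from VERGE. */
static long color_dist(rgb_t a, int r, int g, int b)
{
    long dr = a.r - r, dg = a.g - g, db = a.b - b;
    return 30 * dr * dr + 59 * dg * dg + 11 * db * db;
}

/* Return the index of the palette entry closest to (r, g, b). */
int best_match(const rgb_t pal[256], int r, int g, int b)
{
    int i, best = 0;
    long best_d = color_dist(pal[0], r, g, b);

    for (i = 1; i < 256; i++) {
        long d = color_dist(pal[i], r, g, b);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    return best;
}

/* Precompute table[a][b] = palette index nearest to the unweighted
   average of colors a and b. */
void build_blend_table(const rgb_t pal[256],
                       unsigned char table[256][256])
{
    int a, b;

    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            table[a][b] = (unsigned char)best_match(pal,
                (pal[a].r + pal[b].r) / 2,
                (pal[a].g + pal[b].g) / 2,
                (pal[a].b + pal[b].b) / 2);
}
```

With the table built once per palette, a translucent pixel costs one
lookup: dst = table[src][dst].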

Anyhow, I'm sure I look like a real idiot right about now with this
post.... so, go actually check out VERGE at
http://www.inlink.com/~vecna/crs.html to see that I may be an idiot,
but I've miraculously put together a pretty decent program with some,
IMHO, pretty snazzy features. Thanks in advance for any help... :)

- vecna AT inlink DOT com