From: vecna AT inlink DOT com ([vecna])
Subject: Optimization and bug smashing.. a lot of other questions too :)
Date: Mon, 11 Aug 1997 00:20:11 GMT
Message-ID: <33ee3f7f.4973504@news.inlink.com>
Newsgroups: comp.os.msdos.djgpp,rec.games.programmer
Lines: 216
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Precedence: bulk

Hello,

I know that crossposting is the spawn of satan, but I think both of these groups are relevant... :)

Basically, I'm the programmer of VERGE, a console-style RPG construction kit. I have a ton of new features I'd love to add, but I have three main issues that I need to ask for help on :) (I imagine this will be a tad long post)

First, optimization. gprof seems to give totally random results each time I run it on VERGE now, but once upon a time when gprof worked, it told me that the biggest routines I should focus on optimizing are my tcopysprite() routine and my vgadump() routine.

First of all, my scrolling technique is to use an oversized drawing buffer and just draw the inside window rather than use a clipping tilecopier. Not only does it seem easier, but I'd think it'd be faster as well, but then I'm not really sure. Anyhow, without further ado, here's some source:

This is the routine that blits the 352x232 virtual screen to the 320x200 real screen. This is one of those 'every cycle counts' routines. Any optimization hints would be appreciated.

vgadump()
{
  if (waitvrt) wait();
  asm("movl _virscr, %%esi            \n\t"
      "addl $5648, %%esi              \n\t"
      "movl _screen, %%edi            \n\t"
      "xorl %%eax, %%eax              \n\t"
"lineloop:                            \n\t"
      "movl $80, %%ecx                \n\t"
      "rep                            \n\t"
      "movsl                          \n\t"
      "addl $32, %%esi                \n\t"
      "incl %%eax                     \n\t"
      "cmpl $201, %%eax               \n\t"
      "jb lineloop                    \n\t"
      :
      :
      : "esi", "edi", "cc", "eax", "ecx");
}

This is the routine used for non-transparent 16x16 tiles in the background. It's pretty fast already and isn't that big of an optimization deal, but still, any optimization hints would be great.

copytile(int x, int y, char *spr)
{
  asm("movl $16, %%edx                \n\t"
      "movl %2, %%esi                 \n\t"
      "movl %1, %%eax                 \n\t"
      "imul $352, %%eax               \n\t"
      "addl %0, %%eax                 \n\t"
      "addl _virscr, %%eax            \n\t"
      "movl %%eax, %%edi              \n\t"
"ctl0:                                \n\t"
      "movsl                          \n\t"
      "movsl                          \n\t"
      "movsl                          \n\t"
      "movsl                          \n\t"
      "addl $336, %%edi               \n\t"
      "decl %%edx                     \n\t"
      "jnz ctl0                       \n\t"
      :
      : "m" (x), "m" (y), "m" (spr)
      : "eax","edx","esi","edi","ecx","cc" );
}

Okay, this is the single most important routine to optimize. It's the transparent blitter. It's very important to optimize already, and it will get VERY important to optimize in the next version... EVERY CYCLE COUNTS in this one.

tcopysprite(int x, int y, int width, int height, char *spr)
{
  asm("movl %3, %%ecx                 \n\t"
      "movl %4, %%esi                 \n\t"
"tcsl0:                               \n\t"
      "movl %1, %%eax                 \n\t"
      "imul $352, %%eax               \n\t"
      "addl %0, %%eax                 \n\t"
      "addl _virscr, %%eax            \n\t"
      "movl %%eax, %%edi              \n\t"
      "movl %2, %%edx                 \n\t"
"drawloop:                            \n\t"
      "lodsb                          \n\t"
      "orb %%al, %%al                 \n\t"
      "jz nodraw                      \n\t"
      "stosb                          \n\t"
      "decl %%edx                     \n\t"
      "orl %%edx, %%edx               \n\t"
      "jz endline                     \n\t"
      "jmp drawloop                   \n\t"
"nodraw:                              \n\t"
      "incl %%edi                     \n\t"
      "decl %%edx                     \n\t"
      "orl %%edx, %%edx               \n\t"
      "jnz drawloop                   \n\t"
"endline:                             \n\t"
      "incl %1                        \n\t"
      "decl %%ecx                     \n\t"
      "jnz tcsl0                      \n\t"
      :
      : "m" (x), "m" (y), "m" (width), "m" (height), "m" (spr)
      : "eax","edx","esi","edi","ecx","cc" );
}

Okay, that's all the optimization I'll bug anyone about now... :)

Second, one thing that's hindered VERGE's progress is that it has some unfortunate bugs. Too many small, extremely bizarre ones to list, really... I sincerely doubt that it's a compiler error, particularly because using optimizations or not makes no difference. But I have become rather convinced that I have some underlying culprit making otherwise working code turn evil on me. :)

For example, one of my bugs is in my VSP loading code (where the tile data is stored), particularly dealing with the tile animation data.
The problem is that the very first map loaded, no matter what map it is, will not animate at all, but the second map will. So, you could start out in a town and animations wouldn't work, walk into the overworld, and animations would work, and then go back into the town with the animations working now. Now, I created a workaround that simply loads the animation data a second time, but all indications pointed to the fread simply NOT working the first time. The thing that really miffed me is that if you single-step through the code in RHIDE's debugger, the code DOES work. Is this any indication of anything specific to check for? The code is so simple, too, I don't see where it could be bugged. Note that this bug is not at the top of my hitlist, it's just an example of weirdness.

Another bug is that when you near the bottom of a map, it often crashes. Resizing the map to make it bigger vertically seems to fix it. The weird part... VERGE uses the same size data buffer regardless of map size. You wouldn't think making the map take up MORE memory would fix it.. I dunno.

Sometimes zones don't activate, but only sometimes, and there's no pattern to it. Sometimes the menu system behaves in irrational ways, but again, there's no tangible pattern, and debugging is virtually impossible.

One thing that's come to mind: I have no formal C training. I compile my code with -w .. without it, a pretty decent amount of warnings are generated. Could this be a source of the seemingly unstable code generated? The thing is, most of these warnings are stupid things like:

  char *ptr;
  ptr=malloc(65000);

instead of

  ptr = (char *) malloc(65000);

This seems to make no functional difference.. I'm just wondering if a buildup of these warnings could give me unstable code (BTW, I only recently learned how to fix that particular warning.. there are a lot of other warnings that I'm not even sure how to fix.. all of them seem to be stupid if you ask me, but then, I'm an Assembly diehard :).

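To make the malloc case above concrete, here is a minimal sketch. It assumes the warning comes from calling malloc() with no prototype in scope, which is the usual cause in C when <stdlib.h> is not included; the post does not say which headers VERGE includes, so treat that as a guess. The name alloc_buffer is made up for illustration and is not from VERGE.

```c
#include <assert.h>
#include <stdlib.h>   /* declares malloc(); without it the compiler assumes it returns int */

/* Hypothetical helper, not part of VERGE: allocate a byte buffer. */
char *alloc_buffer(size_t size)
{
    char *ptr;

    /* malloc(0) is implementation-defined, so insist on a real size. */
    assert(size > 0);

    /* No cast needed in C: void * converts to char * implicitly. */
    ptr = malloc(size);

    /* May be NULL if the allocation failed; the caller has to check. */
    return ptr;
}
```

With the header included, the uncast assignment is valid C and should compile quietly. On a 32-bit target such as DJGPP, int and pointers are the same size, which would be consistent with the uncast version appearing to work either way.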
Third, I just have a general technique question.. the issue of bitmap transparency (translucency) in 8 bit color... :) I've seen both Tyrian and Allegro do it pretty well, at least, better than I'm doing it, so I know it's possible... I believe it's just that my best-match-picker routine isn't as good as it could be, I think my actual routines for color mixing are fine.

I intended to look at the Allegro source, but 1) I do want to write most of the code myself, instead of just ripping off other people's code, when possible, and 2) I simply can't find the code dealing with the color mixing in Allegro's source code. :) Shawn is an amazing programmer, but the one complaint I have about Allegro is that it's too much of a 'do everything' lib, which wouldn't be that bad except that it's so integrated that you can't really separate any one, seemingly simple routine from the rest of the library. :)

Also, I am not using the weighted color values.. I know that you're supposed to weight certain colors more than others, and I tried that, and all it did was make everything look more green. :)

Anyhow, I'm sure I look like a real idiot right about now with this post.... so, go actually check out VERGE at http://www.inlink.com/~vecna/crs.html to see that I may be an idiot, but I've miraculously put together a pretty decent program with some, IMHO, pretty snazzy features.

Thanks in advance for any help... :)

- vecna AT inlink DOT com
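
For reference, the kind of best-match picker the translucency question is about can be sketched like this. It is a minimal sketch of one common table-based approach, not VERGE's or Allegro's actual code: it precomputes a 256x256 lookup table of 50/50 mixes using plain unweighted squared RGB distance. The names rgb_t, best_match and build_mix_table are made up for illustration, and the 0..63 channel range assumes the palette is stored as raw VGA DAC values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical palette entry; VGA DAC values run 0..63 per channel. */
typedef struct { unsigned char r, g, b; } rgb_t;

/* Index of the palette entry nearest to (r, g, b), using plain
   unweighted squared distance. Ties go to the lowest index. */
static unsigned char best_match(const rgb_t *pal, int r, int g, int b)
{
    int i, best = 0;
    long best_dist = -1;

    for (i = 0; i < 256; i++) {
        long dr = (long)pal[i].r - r;
        long dg = (long)pal[i].g - g;
        long db = (long)pal[i].b - b;
        long dist = dr * dr + dg * dg + db * db;

        if (best_dist < 0 || dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return (unsigned char)best;
}

/* Fill table[src][dst] with the palette index closest to a 50/50 mix
   of palette colors src and dst. */
void build_mix_table(const rgb_t *pal, unsigned char table[256][256])
{
    int src, dst;

    assert(pal != NULL && table != NULL);
    for (src = 0; src < 256; src++) {
        for (dst = 0; dst < 256; dst++) {
            table[src][dst] = best_match(pal,
                (pal[src].r + pal[dst].r) / 2,
                (pal[src].g + pal[dst].g) / 2,
                (pal[src].b + pal[dst].b) / 2);
        }
    }
}
```

The table costs 64K and is built once per palette; after that, a translucent pixel is a single lookup of the form table[sprite_pixel][screen_pixel], so the quality of the result depends entirely on the matching step and on how well the palette covers the in-between colors.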