From: broeker AT acp3bf DOT knirsch DOT de (Hans-Bernhard Broeker)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Plague of the slow 'blit' routine :)
Date: 21 Oct 1999 16:32:27 +0200
Organization: RWTH Aachen, III. physikalisches Institut B
Lines: 74
Message-ID: <7un85r$2jm@acp3bf.knirsch.de>
References: <380E497E DOT 273EE838 AT connect DOT ab DOT ca>
NNTP-Posting-Host: acp3bf.physik.rwth-aachen.de
X-Trace: nets3.rz.RWTH-Aachen.DE 940516351 16760 137.226.32.75 (21 Oct 1999 14:32:31 GMT)
X-Complaints-To: abuse AT rwth-aachen DOT de
NNTP-Posting-Date: 21 Oct 1999 14:32:31 GMT
Cc: djgpp AT delorie DOT com
X-Newsreader: TIN [version 1.2 PL2]
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Tom Fjellstrom (tomcf AT connect DOT ab DOT ca) wrote:
> I've been playing around with mode 13h, and the
> only thing I've been really stuck on is my 'blit'
> routine. Suffice it to say it is extremely slow.

How slow is 'extremely'? Figures in MByte/second, or equivalently in
blits/second (with a given size of blitted image), if possible, please.

> It may have to do with '__djgpp_nearptr_enable()'ing
> before blit and '__djgpp_nearptr_disable()'ing after,
> for every bitmap (but that is only required when reading
> or writing to the gfx card memory. as far as i know),

[...]

You only mention it in passing, but let me spell out a warning to you:
_reading_ from the graphics card's memory is almost always _terribly_
slow, compared to writing to it. If you do read from the graphics card,
ever, that may easily be the reason why things are so slow, for you.
Avoid blitting 'from' screen using the CPU, at all costs. The only
partly efficient way to implement blitting from screen to screen memory
would be letting the graphics accelerator hardware do it for you.

> #define put_pixel(bmp,x,y,c) (bmp)->dat[((y) << 8) + ((y) << 6) + (x)] =
> (c)

That put_pixel() of yours tries to be smart about multiplying y by 320.
Well --- it isn't even as smart as gcc would have been, on its own.
Let's face it: gcc is already rather hard for mere mortals to outsmart,
so you might just as well write _clean_ code, and leave the being smart
to gcc, for starters. If you find slow spots, later on, you can still
come back and try to beat gcc.

> typedef struct BITMAP {
>   int w,h;
>   /* should add clip rect. */
>   char *dat;
> } BITMAP;

> void blit(BITMAP *src, BITMAP *dest, int srcw, int srch, int x, int y)
> {
>   register int i,j;
>   if(!src || !dest || !srcw || !srch) return;
>   if((x+srcw>dest->w) || (y+srch>dest->h)) return;
>   ENABLE();
>   for(i=0; i!=srch+1; i++) {
>     if(y+i>dest->h) break;

Repeating this test every time the loop repeats is very inefficient.
Work out the real maximum i may reach before entering the loop,
instead.

>     for(j=0; j!=srcw+1; j++) {
>       if((x+j) > dest->w) break;

Same as with the test above.

>       put_pixel(dest, j, i, src->dat[(i << 8) + (i << 6) + j]);

Code duplication. You've written the same access calculation twice.
You should consider having a 'pixel_addr(bitmap, x, y)' macro (or
inline function) instead, which you could then use as

    pixel_addr(dest, j, i) = pixel_addr(src, j, i);

As a general rule: check each and every statement in the innermost
loops of your program: does it *really* need to run every time the
loop goes around?

-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.
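
For illustration only, here is a rough sketch of how the points above
might fit together: the clip tests are done once before the loops, the
address calculation lives in a single inline function using a plain
multiplication, and each row is copied in one go. Several things here
are assumptions and not part of the original code: the name
blit_clipped(), the 'unsigned char' pixel type, the use of bmp->w as
the row stride in place of the hard-coded 320, the per-row memcpy(),
and the fact that the region is placed at (x, y) in the destination. It
works on ordinary memory only, so the nearptr calls are left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bitmap type modelled on the quoted struct: `dat` holds
   w*h bytes, one byte per pixel, rows stored back to back. */
typedef struct BITMAP {
    int w, h;
    unsigned char *dat;
} BITMAP;

/* Address of pixel (x, y). A plain multiplication; the compiler is
   left to pick shifts or whatever else it finds fastest. */
static inline unsigned char *pixel_addr(const BITMAP *bmp, int x, int y)
{
    /* Debug-only bounds check; compiled out when NDEBUG is defined. */
    assert(x >= 0 && x < bmp->w && y >= 0 && y < bmp->h);
    return bmp->dat + y * bmp->w + x;
}

/* Copy the top-left srcw x srch region of src to (x, y) in dest,
   clipped to both bitmaps. Returns the number of pixels copied. */
static int blit_clipped(const BITMAP *src, BITMAP *dest,
                        int srcw, int srch, int x, int y)
{
    int sx = 0, sy = 0, row;

    if (!src || !dest || !src->dat || !dest->dat) return 0;

    /* Never read past the source bitmap. */
    if (srcw > src->w) srcw = src->w;
    if (srch > src->h) srch = src->h;

    /* Clip against the left and top edges of dest. */
    if (x < 0) { sx = -x; srcw += x; x = 0; }
    if (y < 0) { sy = -y; srch += y; y = 0; }

    /* Clip against the right and bottom edges of dest. */
    if (x + srcw > dest->w) srcw = dest->w - x;
    if (y + srch > dest->h) srch = dest->h - y;

    /* Nothing left to draw after clipping. */
    if (srcw <= 0 || srch <= 0) return 0;

    /* All limits are fixed by now: no tests inside the loop. */
    for (row = 0; row < srch; row++)
        memcpy(pixel_addr(dest, x, y + row),
               pixel_addr(src, sx, sy + row),
               (size_t)srcw);

    return srcw * srch;
}
```

With a 2x2 source and a 3x3 destination, a call such as
blit_clipped(&src, &dst, 2, 2, 2, 2) would copy only the single pixel
that still falls inside the destination. Whether memcpy() beats a
hand-written loop on a given machine is something to measure, not to
assume.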