From: broeker AT acp3bf DOT knirsch DOT de (Hans-Bernhard Broeker)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Plague of the slow 'blit' routine :)
Date: 21 Oct 1999 16:32:27 +0200
Organization: RWTH Aachen, III. physikalisches Institut B
Lines: 74
Message-ID: <7un85r$2jm@acp3bf.knirsch.de>
References: <380E497E DOT 273EE838 AT connect DOT ab DOT ca>
NNTP-Posting-Host: acp3bf.physik.rwth-aachen.de
X-Trace: nets3.rz.RWTH-Aachen.DE 940516351 16760 137.226.32.75 (21 Oct 1999 14:32:31 GMT)
X-Complaints-To: abuse AT rwth-aachen DOT de
NNTP-Posting-Date: 21 Oct 1999 14:32:31 GMT
Cc: djgpp AT delorie DOT com
X-Newsreader: TIN [version 1.2 PL2]
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Tom Fjellstrom (tomcf AT connect DOT ab DOT ca) wrote:
> I've been playing around with mode 13h, and the
> only thing I've been really stuck on is my 'blit'
> routine. Suffice it to say it is extremely slow.

How slow is 'extremely'? Figures in MByte/second, or equivalently in
blits/second (with a given size of blitted image), if possible,
please.
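
A minimal sketch of how such figures could be obtained, using only the standard clock() function. The names copy_image(), time_blits() and mbytes_per_second() are made up for this illustration, and the plain memcpy() is only a stand-in for whatever blit routine is actually being measured:

```c
#include <string.h>
#include <time.h>

#define IMG_W 320
#define IMG_H 200

/* Keeps the compiler from discarding the copies as dead code. */
static volatile unsigned char sink;

/* Stand-in for the routine under test (hypothetical): copies one full
   320x200 8-bit image between two buffers in ordinary memory. */
static void copy_image(unsigned char *dest, const unsigned char *src)
{
    memcpy(dest, src, (size_t)IMG_W * IMG_H);
}

/* Runs `count` blits and returns the elapsed CPU time in seconds. */
static double time_blits(long count)
{
    static unsigned char src[IMG_W * IMG_H], dest[IMG_W * IMG_H];
    clock_t start = clock();
    long n;

    for (n = 0; n < count; n++) {
        src[0] = (unsigned char)n;      /* vary the input a little */
        copy_image(dest, src);
        sink = dest[0];
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Converts a blit count and an elapsed time (which must be > 0)
   into MByte/second for full-screen images. */
static double mbytes_per_second(long count, double seconds)
{
    return (double)count * IMG_W * IMG_H / (1024.0 * 1024.0) / seconds;
}
```

Pick a count large enough that the run takes a second or more, since clock() has a rather coarse resolution on many systems.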

> It may have to do with '__djgpp_nearptr_enable()'ing
> before blit and '__djgpp_nearptr_disable()'ing after,
> for every bitmap (but that is only required when reading
> or writing to the gfx card memory. as far as i know),
[...]

You only mention it in passing, but let me spell out a warning:
_reading_ from the graphics card's memory is almost always _terribly_
slow compared to writing to it. If you ever read from the graphics
card, that may easily be the reason why things are so slow for you.
Avoid blitting 'from' the screen using the CPU, at all costs. The
only reasonably efficient way to blit from screen memory to screen
memory is to let the graphics accelerator hardware do it for you.
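
One common way to stay clear of video memory reads, sketched here under assumptions of my own: keep the whole frame in an ordinary buffer, do all copying there, and push the finished frame to the card in a single write-only pass. The names back_buffer, copy_region() and present() are invented for this example, and `screen` stands for whatever pointer the program uses to reach video memory (with DJGPP, a near pointer or a call like dosmemput() would be the usual candidates):

```c
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Off-screen copy of the frame, kept in ordinary (fast) memory. */
static unsigned char back_buffer[SCREEN_W * SCREEN_H];

/* Copies a w*h region from (sx, sy) to (dx, dy) inside the back
   buffer. Reads and writes both go to system RAM, never to the card.
   The caller must keep both regions inside the buffer. */
static void copy_region(int sx, int sy, int dx, int dy, int w, int h)
{
    int n, row;

    for (n = 0; n < h; n++) {
        /* Walk bottom-up when moving down, so that overlapping rows
           are read before they get overwritten. */
        row = (dy > sy) ? h - 1 - n : n;
        memmove(&back_buffer[(dy + row) * SCREEN_W + dx],
                &back_buffer[(sy + row) * SCREEN_W + sx], (size_t)w);
    }
}

/* Sends the finished frame to the screen: the card is only written. */
static void present(unsigned char *screen)
{
    memcpy(screen, back_buffer, sizeof back_buffer);
}
```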

> #define put_pixel(bmp,x,y,c) (bmp)->dat[((y) << 8) + ((y) << 6) + (x)] = (c)

That put_pixel() of yours tries to be smart about multiplying y by
320. Well --- it isn't even as smart as gcc would have been on its
own. Let's face it: gcc is rather hard for mere mortals to outsmart,
so you might just as well write _clean_ code and leave the being
smart to gcc, for starters. If you find slow spots later on, you can
still come back and try to beat gcc.
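
For illustration, a 'clean' version might look roughly like the following. Two things here are my assumptions rather than part of the quoted code: the pixel data is declared unsigned char, and the row length is taken from the bitmap's own w field instead of a hard-wired 320:

```c
typedef struct BITMAP {
    int w, h;
    unsigned char *dat;     /* assumption: unsigned 8-bit pixels */
} BITMAP;

/* Plain multiplication by the row length; turning that into shifts
   and adds, where it pays off, is left to the compiler. */
static void put_pixel(BITMAP *bmp, int x, int y, unsigned char c)
{
    bmp->dat[y * bmp->w + x] = c;
}
```

For a 320 pixel wide bitmap this addresses exactly the same byte as the shift version, since (y << 8) + (y << 6) equals y * 320.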

> typedef struct BITMAP {
>  int w,h;
>  /* should add clip rect. */
>  char *dat;
> } BITMAP;

> void blit(BITMAP *src, BITMAP *dest, int srcw, int srch, int x, int y)
> {
>  register int i,j;
>  if(!src || !dest || !srcw || !srch) return;
>  if((x+srcw>dest->w) || (y+srch>dest->h)) return;

>  ENABLE();

>  for(i=0; i!=srch+1; i++) {
>   if(y+i>dest->h) break;

Repeating this test every time the loop repeats is very inefficient.
Predetermine the real maximum i may reach, before entering the loop,
instead.

>   for(j=0; j!=srcw+1; j++) {
>    if((x+j) > dest->w) break;

Same as with the test above.
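
A sketch of what moving both tests out of the loops could look like. It assumes the intent is to copy the top-left srcw x srch part of src to position (x, y) in dest; the name blit_clipped(), the unsigned char pixel type and the returned row count are my additions for the example:

```c
typedef struct BITMAP {
    int w, h;
    unsigned char *dat;     /* assumption: unsigned 8-bit pixels */
} BITMAP;

/* Copies the top-left srcw*srch part of src to (x, y) in dest,
   clipped against dest. Returns the number of rows copied. */
static int blit_clipped(const BITMAP *src, BITMAP *dest,
                        int srcw, int srch, int x, int y)
{
    int i, j, rows, cols;

    /* Negative offsets are simply rejected in this sketch. */
    if (!src || !dest || x < 0 || y < 0)
        return 0;
    if (srcw > src->w) srcw = src->w;
    if (srch > src->h) srch = src->h;

    /* The clipping is worked out once, before any loop starts. */
    rows = (y + srch > dest->h) ? dest->h - y : srch;
    cols = (x + srcw > dest->w) ? dest->w - x : srcw;
    if (rows <= 0 || cols <= 0)
        return 0;

    /* No bounds tests are left inside the loops. */
    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            dest->dat[(y + i) * dest->w + (x + j)] = src->dat[i * src->w + j];

    return rows;
}
```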

>    put_pixel(dest, j, i, src->dat[(i << 8) + (i << 6) + j]);

Code duplication. You've written the same access calculation twice.
You should consider having a 'pixel_addr(bitmap, x, y)' macro (or
inline function) instead, which you could then use as

	pixel_addr(dest, j, i) = pixel_addr(src, j, i);
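
One possible shape for such a macro, as a sketch only: it expands to the pixel itself (an lvalue), so it can stand on either side of an assignment. The row length taken from the w field, the unsigned char pixel type and the copy_pixel() helper are assumptions made for this example:

```c
typedef struct BITMAP {
    int w, h;
    unsigned char *dat;     /* assumption: unsigned 8-bit pixels */
} BITMAP;

/* Expands to the pixel at (x, y) as an lvalue. The address
   calculation now lives in exactly one place. */
#define pixel_addr(bmp, x, y) ((bmp)->dat[(y) * (bmp)->w + (x)])

/* Hypothetical helper showing the macro used for reading and writing. */
static void copy_pixel(BITMAP *dest, const BITMAP *src, int x, int y)
{
    pixel_addr(dest, x, y) = pixel_addr(src, x, y);
}
```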

As a general rule: check each and every statement in the innermost
loops of your program: does it *really* need to run every time the
loop goes around?
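
As one example of applying that rule, the start address of each row depends only on the row number, so it can be computed once per row rather than once per pixel. In this sketch the function name blit_rows() and the unsigned char pixel type are assumptions, and the caller is expected to have done all clipping beforehand. An optimizing gcc may well do this hoisting on its own, so measure before relying on it:

```c
typedef struct BITMAP {
    int w, h;
    unsigned char *dat;     /* assumption: unsigned 8-bit pixels */
} BITMAP;

/* Copies a cols*rows block from the top-left of src to (x, y) in
   dest. The caller must already have clipped cols and rows. */
static void blit_rows(const BITMAP *src, BITMAP *dest,
                      int cols, int rows, int x, int y)
{
    int i, j;

    for (i = 0; i < rows; i++) {
        /* These depend only on i: computed once per row. */
        const unsigned char *s = src->dat + i * src->w;
        unsigned char *d = dest->dat + (y + i) * dest->w + x;

        /* The innermost loop is left with a bare copy. */
        for (j = 0; j < cols; j++)
            d[j] = s[j];
    }
}
```
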
-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.