Mail Archives: djgpp/1997/11/29/15:02:35

From: "A. Sinan Unur" <sinan DOT unur AT cornell DOT edu>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Newbie Needs Help
Date: Sat, 29 Nov 1997 08:47:44 -0500
Organization: Cornell University (http://www.cornell.edu/)
Lines: 147
Sender: asu1 AT cornell DOT edu (Verified)
Message-ID: <34801D00.4AE9CC56@cornell.edu>
References: <19971129 DOT 101142 DOT 16006 DOT 1 DOT matthew DOT krause AT juno DOT com> <65n24n$q6h AT bgtnsc03 DOT worldnet DOT att DOT net> <34815eb2 DOT 62354295 AT news DOT cableol DOT co DOT uk>
NNTP-Posting-Host: cu-dialup-0091.cit.cornell.edu
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

Tom Robinson wrote:

> Quick note:
> 
> _farpokeb(_dos_ds, 0xA0000+(y<<8)+(y<<6)+x, c);
> 
> is a faster way of doing it...

i wouldn't try to outsmart the compiler's optimization on such a mundane
operation as multiplication. did you ever compile the two versions with
optimizations turned on and look at the assembler output? i recommend
doing it.
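
before comparing assembler output (gcc -O2 -S writes it to a .s file),
it helps to be sure the two forms compute the same thing. the sketch
below is plain, portable c and touches no video memory; the names
offset_mul, offset_shift and check_offsets are made up for this
illustration and are not part of any library.

```c
#include <assert.h>

#define XRES 320
#define YRES 200

/* straightforward multiply: byte offset of pixel (x, y) in mode 13h */
unsigned long offset_mul(int x, int y)
{
  return 320UL * (unsigned long)y + (unsigned long)x;
}

/* shift-and-add form: y*256 + y*64 is the same as y*320 */
unsigned long offset_shift(int x, int y)
{
  return ((unsigned long)y << 8) + ((unsigned long)y << 6)
         + (unsigned long)x;
}

/* aborts via assert if the two forms ever disagree on a 320x200 screen */
void check_offsets(void)
{
  int x, y;

  for (y = 0; y < YRES; y++)
    for (x = 0; x < XRES; x++)
      assert(offset_mul(x, y) == offset_shift(x, y));
}
```

since the two are equivalent, the compiler is free to pick whichever
instruction sequence it considers cheaper for either spelling, so the
choice in the source should be made for readability.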

> 
> >That requires an odd header file, it might be <sys\farptr.h>, but you
> >should look it up in LIBC.INF as well.  That's the second fastest way 
> > I know how to.  (The fastest way I know how, requires disabling 
> > protected memory, and is more complicated to get started)

please mention the drawbacks of using near pointers, too. remember, you
are supposedly helping a newbie. let people learn the proper (in a lot
of environments), more portable way of doing things first.

take a look at the test program below. the results when i run it under
windows 95 on dx-4/75 with 16 mb (with recompilations in between tests
to get rid of the contents of the cpu cache) are:

                         multiply   shift
-----------------------------------------
with no optimization:  |   127       129
with -O                |    46        47
with -O2               |    56        57
with -O3               |    19        22

  - times are in seconds, measured using time()
  - 1000x320x200 putpixel operations
  - ATI Mach 64 CX with 2Mb DRAM

now, i also tried switching around the calls to the two versions. the
results were unchanged.

i do not claim this to be the perfect test or anything. i am just
unable to see an overwhelming advantage to using shifts as opposed to a
straightforward multiply which also makes sense to a newbie who just
knows that the screen resolution is 320x200.

quite clearly, fruitful optimizations in putpixel routines lie in
judicious use of the _farns functions, converting 8-bit writes to 32-bit
writes where appropriate, and above all, using movedata for 'blits'. for
example, changing f1 and f2 in the following test routine to their
_farns equivalents yielded speed increases of 25% to 30% in the no
optimization and O3 cases. again, i am not claiming that the numbers are
extremely accurate or anything, but they give information on relative
magnitudes. my sole point is that providing this "great" optimization to
a newbie is counterproductive. the only real optimization method i know
of is think, measure, think, measure ...
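
to illustrate the 8-bit versus 32-bit write idea, here is a minimal
portable sketch in which an ordinary array stands in for video memory.
it is not the djgpp code itself: under djgpp the byte store would go
through _farnspokeb and the 32-bit store through _farnspokel, after
selecting the segment once with _farsetsel(_dos_ds). the function names
fill_row_bytes, fill_row_dwords and rows_match are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XRES 320

/* one scanline, a byte at a time: XRES separate 8-bit writes */
void fill_row_bytes(unsigned char *row, unsigned char c)
{
  int x;

  for (x = 0; x < XRES; x++)
    row[x] = c;
}

/* the same scanline, four pixels at a time: XRES/4 32-bit writes.
 * the 4-byte memcpy keeps the store legal for any alignment. */
void fill_row_dwords(unsigned char *row, unsigned char c)
{
  uint32_t packed = (uint32_t)c * 0x01010101u;  /* c in all four bytes */
  int x;

  for (x = 0; x < XRES; x += 4)   /* XRES is a multiple of 4 */
    memcpy(row + x, &packed, sizeof packed);
}

/* returns 1 if both fills leave identical scanlines */
int rows_match(unsigned char c)
{
  unsigned char a[XRES], b[XRES];

  fill_row_bytes(a, c);
  fill_row_dwords(b, c);
  return memcmp(a, b, XRES) == 0;
}
```

the result is the same either way; the second version simply issues a
quarter as many stores. whether that pays off on a given card is, once
more, something to measure.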

finally, moving the calculation of the index into the buffer to the
outer loop rather than leaving it in the function call caused a 25%
speed-up with no optimization, and a 5% speed-up in the O3 version.
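
the change just described can be sketched like this, again with plain
arrays standing in for the source buffer and the screen so that it
runs anywhere. copy_per_pixel, copy_row_hoisted and copies_match are
hypothetical names, not the routines that were actually timed.

```c
#include <assert.h>
#include <string.h>

#define XRES 320
#define YRES 200

static unsigned char src[XRES * YRES];
static unsigned char dst1[XRES * YRES];
static unsigned char dst2[XRES * YRES];

/* index recomputed for every pixel, as inside a putpixel call */
void copy_per_pixel(unsigned char *dst, const unsigned char *from)
{
  int x, y;

  for (y = 0; y < YRES; y++)
    for (x = 0; x < XRES; x++)
      dst[XRES * y + x] = from[XRES * y + x];
}

/* row offset computed once per scanline in the outer loop */
void copy_row_hoisted(unsigned char *dst, const unsigned char *from)
{
  int x, y;

  for (y = 0; y < YRES; y++) {
    int row = XRES * y;
    for (x = 0; x < XRES; x++)
      dst[row + x] = from[row + x];
  }
}

/* returns 1 if both loops produce the same picture */
int copies_match(void)
{
  int i;

  for (i = 0; i < XRES * YRES; i++)
    src[i] = (unsigned char)(i * 7);
  copy_per_pixel(dst1, src);
  copy_row_hoisted(dst2, src);
  return memcmp(dst1, dst2, sizeof dst1) == 0
      && memcmp(dst1, src, sizeof src) == 0;
}
```

an optimizing compiler will often do this hoisting by itself, which
would be consistent with the much smaller gain seen at O3.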

here is the test (in my case it was compiled with
gcc xp.c -o xp.exe -Wall -DITERATIONS=1000
and the appropriate optimization switch)

/* the following code is for informational purposes
 * you can do whatever you want with it, so long as
 * you understand that there are no explicit or
 * implicit warranties. if you fry your computer
 * while running it, you are on your own.
 */

#include <sys/farptr.h>
#include <go32.h>
#include <dpmi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define XRES 320
#define YRES 200
#define VGA13 0x13
#define CO80  0x03

void f1(int x, int y, unsigned char c)
{
 _farpokeb(_dos_ds, 0xa0000 + 320*y + x, c);
 return;
}

void f2(int x, int y, unsigned char c)
{
  _farpokeb(_dos_ds, 0xa0000 + (y<<8)+(y<<6)+x, c);
  return;
}

int set_gr_mode(int mode)
{
  __dpmi_regs r;
  memset(&r, 0, sizeof(r));
  r.x.ax = mode;
  return ( __dpmi_int(0x10, &r) ? -1 : mode );
}


unsigned char s[YRES*XRES];

int main(void)
{
  int i, x, y;
  time_t t1, t2;

  set_gr_mode(VGA13);

  /* fill the source buffer with random colours; the row stride is XRES */
  for(y=0; y<YRES; y++)
   for(x=0; x<XRES; x++)
    s[y*XRES + x] = (unsigned char)(256.0*(rand()/(RAND_MAX+1.0)));

  /* time the multiply version */
  time(&t1);
  for(i=0; i<ITERATIONS; i++)
   for(y=0; y<YRES; y++)
    for(x=0; x<XRES; x++)
     f1(x, y, s[y*XRES+x]);
  t1 = time(NULL) - t1;

  /* time the shift version */
  time(&t2);
  for(i=0; i<ITERATIONS; i++)
   for(y=0; y<YRES; y++)
    for(x=0; x<XRES; x++)
     f2(x, y, s[y*XRES+x]);

  t2 = time(NULL) - t2;

  set_gr_mode(CO80);
  printf("f1: %ld\nf2: %ld\n", (long)t1, (long)t2);

  return 0;
}
-- 
----------------------------------------------------------------------
A. Sinan Unur
Department of Policy Analysis and Management, College of Human Ecology,
Cornell University, Ithaca, NY 14853, USA

mailto:sinan DOT unur AT cornell DOT edu
http://www.people.cornell.edu/pages/asu1/
