From: "A. Sinan Unur"
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Newbie Needs Help
Date: Sat, 29 Nov 1997 08:47:44 -0500
Organization: Cornell University (http://www.cornell.edu/)
Message-ID: <34801D00.4AE9CC56@cornell.edu>

Tom Robinson wrote:
> Quick note:
>
> _farpokeb(_dos_ds, 0xA0000+(y<<8)+(y<<6)+x, c);
>
> is a faster way of doing it...

i wouldn't try to outsmart the compiler's optimizer on an operation as mundane as multiplication. did you ever compile the two versions with optimizations turned on and look at the assembler output? i recommend doing it.

> >That requires an odd header file, it might be , but you
> >should look it up in LIBC.INF as well. That's the second fastest way
> > I know how to. (The fastest way I know how, requires disabling
> > protected memory, and is more complicated to get started)

please mention the drawbacks of using near pointers, too. remember, you are supposedly helping a newbie. let people learn the proper (in a lot of environments), more portable way of doing things first. take a look at the test program below.
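the shift trick in the quoted line rests on 320 = 256 + 64, so 320*y + x and (y<<8) + (y<<6) + x are the same offset, and an optimizing compiler is free to apply that identity to a multiply by the constant 320 on its own (reading the output of gcc -O2 -S is the way to see what it actually chose). a small portable check of the equivalence, assuming a plain 320x200 one-byte-per-pixel layout; the function names are made up for illustration and no VGA memory is touched:

```c
#include <assert.h>
#include <stddef.h>

#define XRES 320
#define YRES 200

/* Offset of pixel (x, y) in a linear 320x200 buffer with one byte per
   pixel (the mode 13h layout), written the obvious way. */
static size_t offset_mul(int x, int y)
{
    assert(x >= 0 && x < XRES && y >= 0 && y < YRES);
    return (size_t)(XRES * y + x);
}

/* The same offset written with shifts, using 320*y = 256*y + 64*y.
   Only correct while XRES is exactly 320. */
static size_t offset_shift(int x, int y)
{
    assert(x >= 0 && x < XRES && y >= 0 && y < YRES);
    return (size_t)((y << 8) + (y << 6) + x);
}

/* Returns 1 if both formulas give the same offset for every pixel. */
static int offsets_agree(void)
{
    for (int y = 0; y < YRES; y++)
        for (int x = 0; x < XRES; x++)
            if (offset_mul(x, y) != offset_shift(x, y))
                return 0;
    return 1;
}
```

since the two formulas are provably identical, the only question left is which instructions the compiler emits, and that is something to measure rather than assume.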
the results when i run it under windows 95 on a dx-4/75 with 16 mb (recompiling between tests to get rid of the contents of the cpu cache) are:

                          multiply (s)   shift (s)
----------------------------------------------------
with no optimization   |      127           129
with -O                |       46            47
with -O2               |       56            57
with -O3               |       19            22

- times are measured using time(), in seconds
- 1000x320x200 putpixel operations
- ATI Mach 64 CX with 2 MB DRAM

i also tried swapping the order of the calls to the two versions; the results were unchanged.

i do not claim this to be the perfect test or anything. i am just unable to see an overwhelming advantage to using shifts as opposed to a straightforward multiply, which also makes sense to a newbie who just knows that the screen resolution is 320x200.

quite clearly, the fruitful optimizations in putpixel routines lie in judicious use of the _farns functions, converting 8-bit writes to 32-bit writes where appropriate, and, above all, using movedata for 'blits'. for example, changing f1 and f2 in the following test routine to their _farns equivalents yielded speed increases of 25% to 30% in both the no-optimization and -O3 cases. again, i am not claiming that the numbers are extremely accurate, but they do show relative magnitudes. my sole point is that handing this "great" optimization to a newbie is counterproductive. the only real optimization method i know of is think, measure, think, measure ...

finally, moving the calculation of the buffer index to the outer loop, rather than leaving it in the function call, gave a 25% speed-up with no optimization and a 5% speed-up with -O3.

here is the test (in my case it was compiled with

    gcc xp.c -o xp.exe -Wall -DITERATIONS=1000

plus the appropriate optimization switch):

/* the following code is for informational purposes
 * you can do whatever you want with it, so long as
 * you understand that there are no explicit or
 * implicit warranties. if you fry your computer
 * while running it, you are on your own.
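 */
#include <string.h>      /* memset */
#include <time.h>        /* time_t */
#include <dpmi.h>        /* __dpmi_regs, __dpmi_int */
#include <go32.h>        /* _dos_ds */
#include <sys/farptr.h>  /* _farpokeb */

#define XRES 320
#define YRES 200
#define VGA13 0x13   /* 320x200, 256 colours */
#define CO80  0x03   /* 80x25 colour text */

/* pixel offset computed with a plain multiply */
void f1(int x, int y, unsigned char c)
{
    _farpokeb(_dos_ds, 0xa0000 + 320*y + x, c);
    return;
}

/* same offset via shifts: 320*y == (y<<8) + (y<<6) */
void f2(int x, int y, unsigned char c)
{
    _farpokeb(_dos_ds, 0xa0000 + (y<<8)+(y<<6)+x, c);
    return;
}

/* BIOS int 0x10 with AH=0 sets video mode AL;
   returns -1 if the DPMI call fails, else the mode */
int set_gr_mode(int mode)
{
    __dpmi_regs r;

    memset(&r, 0, sizeof(r));
    r.x.ax = mode;
    return ( __dpmi_int(0x10, &r) ? -1 : mode );
}

unsigned char s[YRES*XRES];

int main(void)
{
    int i, x, y;
    time_t t1, t2;

    set_gr_mode(VGA13);
    for(y=0; y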
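the test above writes through _farpokeb into VGA memory, so it needs DJGPP and a mode 13h screen. a portable analogue of the same multiply-versus-shift comparison, writing into an ordinary array instead, might look like the sketch below. the names are hypothetical, and it uses clock() instead of time() (an assumption on my part) so that runs shorter than a second still register:

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define XRES 320
#define YRES 200
#ifndef ITERATIONS
#define ITERATIONS 1000
#endif

/* Ordinary memory standing in for the mode 13h frame buffer.  volatile
   makes every store observable, so the optimizer cannot drop or merge
   the repeated writes, much as it could not for real video memory. */
static volatile unsigned char frame[YRES * XRES];

/* Same offset two ways: 320*y + x versus (y<<8) + (y<<6) + x. */
static void put_mul(int x, int y, unsigned char c)
{
    frame[XRES * y + x] = c;
}

static void put_shift(int x, int y, unsigned char c)
{
    frame[(y << 8) + (y << 6) + x] = c;
}

/* Each timer draws `iterations` full frames and returns the processor
   time used, in seconds. */
static double time_mul(int iterations)
{
    assert(iterations > 0);
    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        for (int y = 0; y < YRES; y++)
            for (int x = 0; x < XRES; x++)
                put_mul(x, y, (unsigned char)i);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

static double time_shift(int iterations)
{
    assert(iterations > 0);
    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        for (int y = 0; y < YRES; y++)
            for (int x = 0; x < XRES; x++)
                put_shift(x, y, (unsigned char)i);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Prints both timings for one build. */
static void report(int iterations)
{
    printf("multiply: %.2f s\n", time_mul(iterations));
    printf("shift:    %.2f s\n", time_shift(iterations));
}
```

calling report(ITERATIONS) from main and rebuilding once per switch (-O0, -O, -O2, -O3) gives a table shaped like the one above; the absolute numbers on a modern machine will of course be nothing like a dx-4/75.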
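the optimizations listed above (index calculation moved to the outer loop, 32-bit instead of 8-bit writes, and drawing off-screen then copying the whole frame in one block) can also be tried on ordinary memory. this is a plain C sketch, not DJGPP code: the arrays stand in for VGA memory, memcpy stands in for movedata, and every function name is hypothetical. under DJGPP the 32-bit store would presumably become _farnspokel after _farsetsel(_dos_ds), and the final copy a movedata call with _dos_ds as the destination selector.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XRES 320
#define YRES 200

static unsigned char screen[YRES * XRES];   /* stands in for VGA memory */
static unsigned char backbuf[YRES * XRES];  /* off-screen drawing buffer */

/* Baseline: the offset is recomputed inside every putpixel call. */
static void putpixel(unsigned char *buf, int x, int y, unsigned char c)
{
    assert(x >= 0 && x < XRES && y >= 0 && y < YRES);
    buf[XRES * y + x] = c;
}

static void fill_per_pixel(unsigned char *buf, unsigned char c)
{
    for (int y = 0; y < YRES; y++)
        for (int x = 0; x < XRES; x++)
            putpixel(buf, x, y, c);
}

/* Row offset hoisted into the outer loop; the inner loop only walks
   along one scanline. */
static void fill_hoisted(unsigned char *buf, unsigned char c)
{
    for (int y = 0; y < YRES; y++) {
        unsigned char *row = buf + XRES * y;
        for (int x = 0; x < XRES; x++)
            row[x] = c;
    }
}

/* Four 8-bit pixels per 32-bit store.  XRES * YRES is a multiple of 4,
   so no partial word is left over.  memcpy avoids alignment and
   aliasing problems; an optimizing compiler usually turns it into a
   single 32-bit move. */
static void fill_words(unsigned char *buf, unsigned char c)
{
    uint32_t quad = 0x01010101u * c;   /* c replicated into all 4 bytes */
    for (size_t i = 0; i < (size_t)XRES * YRES; i += 4)
        memcpy(buf + i, &quad, sizeof quad);
}

/* "Blit": the frame is drawn in src, then copied out in one block. */
static void blit(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, (size_t)XRES * YRES);
}

/* Returns 1 if every byte of buf equals c. */
static int all_equal(const unsigned char *buf, unsigned char c)
{
    for (size_t i = 0; i < (size_t)XRES * YRES; i++)
        if (buf[i] != c)
            return 0;
    return 1;
}
```

all four routines leave the buffer in the same state, so any of them can be timed against the others with the same think, measure, think, measure loop.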