From: Tal Lavi
Newsgroups: comp.os.msdos.djgpp
Subject: Re: MAJOR slowdowns in translating TP7 gfx code to DJGPP2: Suplement
Date: Tue, 08 Sep 1998 17:41:06 -0700
Organization: Tel Aviv University
Lines: 124
Message-ID: <35F5CEA2.15AF@post.tau.ac.il>
References:
NNTP-Posting-Host: slip-103.tau.ac.il
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Precedence: bulk

Eli Zaretskii wrote:
>
> On Tue, 8 Sep 1998, Tal Lavi wrote:
>
> > The profiler says that __dpmi_int was used a lot, but I swear I haven't
> > put even one in the actual run-time part!
>
> Read section 13.5 of the DJGPP FAQ list, it explains that `__dpmi_int' is
> called by every low-level library function that calls either DOS or
> BIOS.  So if you use, e.g., `printf' or `getc', `__dpmi_int' gets linked
> into your program and gets called.

Well, you were right. Making the program run by itself, without human
interaction, made all the difference. The __dpmi_int run time went
down 20%!

>
> > Alas, the profiler won't tell me from which function the
> > __dpmi_int(s) were called.
> > Does anyone know whether getch() or _far* calls __dpmi_int?
>
> Use the sources.  v2/djlsr201.zip is the file with all the library
> sources.  Download it, and you can answer such questions yourself.
>
> _far* functions never call any other functions, they expand into 2-3
> inline assembly instructions that access a memory address (see
> header file, it's all there).
>
> `getch' calls DOS, so it calls `__dpmi_int', as explained above.  If your
> program calls `getch' to read user's input, your profile will be totally
> skewed because a normal human reaction to interactive prompts is so slow
> that it will shadow the time spent in other functions.  Replace `getch'
> with a stub that feeds the program with some input, and then profile it
> again.  Only then will you see the real picture.
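
The kind of stub Eli describes can be very small. What follows is only a
minimal sketch, assuming the program needs nothing more than single key
presses. The name scripted_getch, the key string, and the use of 'q' as
a quit key are all made up for illustration and do not come from the
program discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical scripted input: the keys a user would have pressed. */
static const char scripted_keys[] = "wasd";
static size_t next_key = 0;

/* Stand-in for getch() during profiling runs: returns the next scripted
   key immediately, and 'q' (assumed here to mean "quit") once the
   script has been used up. No DOS call is made, so no time is spent
   waiting for a human. */
int scripted_getch(void)
{
    if (scripted_keys[next_key] == '\0')
        return 'q';
    return (unsigned char)scripted_keys[next_key++];
}
```

For the profiling build, calls to getch() in the main loop would be
pointed at scripted_getch() instead, so that the profile reflects only
the program's own work.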
>
> > And yet another thing, the div function in stdlib.h takes a lot of
> > computation time too! I only do two divisions with it per loop cycle!!!
> > What's wrong with that picture?!?
>
> How many times does `div' get called (it's in the profile)?  Post here
> how much time PER CALL does `div' take, and then we can discuss if
> something's wrong with that.  For all I know, you could call it
> gazillions of times.

You were right about another thing too: it isn't the screen writing
(_far*) that is slow, it's the CastRay routine, which seems to run so
slowly because of the four little integer divs...
This is the thing that still puzzles me (and apparently, you too).
Look at it! 70% (!?!) of the running time!
As you can see, I can't tell the average running time per call. Am I
using the profiler wrong? It seems that every function that I did not
implement myself (div, __dpmi_int, and cos too) is not being profiled
completely.
Even so, I know that div is only called from CastRay, four times per
call. That's a bit more than 50,000 calls. Not THAT much to ask from a
Pentium 166.

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 69.81      2.06     2.06                             div
 18.87      2.61     0.56    12800    43.40    43.40  CastRay
  3.77      2.72     0.11       20  5555.56  5555.56  FlushPage
  3.77      2.83     0.11                             __dpmi_int

>
> Why do you need `div', anyway?  Are you sure you can get away with simple
> 32-bit division?

I need `div' so I can calculate the quot AND rem at once, since I need
them both.

>
> > div is an integer based division, right?
>
> Yes.
>
> > then why does it take over 40% of the computation time?
>
> The interesting thing is how much time per call does it take.  And then
> we need to know what kind of CPU do you have.

As I said, Pentium 166.

>
> > I'd like to try to inline my putpixel routine myself, instead of using
> > _far* but I can't get it to work!
>
> This is dead end.
> _farptr functions are already written in inline
> assembly, and they are as fast as you can get (you *did* compile with -O2,
> did you?), so you won't find any faster way of doing that part.  _farptr
> is NOT your problem, look for the reasons of the slow-down elsewhere.

I tried -O2 before, but I haven't seen any difference (probably because
of the div slowing everything down). I don't usually trust a compiler to
make my program faster. Is that thing safe, anyway?
Even though the main slowdown is not in the _far*, inlining the memory
writing myself will make things easier for the compiler, and will
eliminate any chance for error.

Besides the stupid div, I could use some optimization of the FlushPage
routine, which fills the screen with a given color in 640x480x64K mode.
Any suggestions?

void FlushPage(unsigned short C)
{
  unsigned long i;

  _farsetsel(LFBSelector[ScreenNum]);
  for(i=0;i<614400;i+=2)
    _farnspokew(i,C);
}

where LFBSelector is an array of two unsigned shorts that holds the
selectors of the two screens, and ScreenNum is an unsigned char that
holds the number of the screen currently being written to.
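
On the div question earlier in this message: one alternative to the
library call is to write the quotient and the remainder with the `/' and
`%' operators. An optimizing compiler targeting x86 can often get both
results from a single divide instruction, but that depends on the
compiler and the flags, so it would have to be measured. The sketch
below only shows that the two forms give the same results. split_t,
split_ops and split_div are hypothetical names, not taken from CastRay.

```c
#include <stdlib.h>

/* Hypothetical result type holding both parts of an integer division. */
typedef struct { int quot; int rem; } split_t;

/* Quotient and remainder via the built-in operators; no function call
   into the C library is involved. */
split_t split_ops(int num, int den)
{
    split_t s;
    s.quot = num / den;
    s.rem  = num % den;
    return s;
}

/* The same pair of results via the standard library's div(). */
split_t split_div(int num, int den)
{
    div_t d = div(num, den);
    split_t s;
    s.quot = d.quot;
    s.rem  = d.rem;
    return s;
}
```

For non-negative operands both functions agree. With negative operands,
div() always truncates toward zero, while older C compilers were allowed
to round `/' either way, so that case is worth checking if it can occur.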