Mail Archives: djgpp/1998/09/08/12:01:49

From: Tal Lavi <ranla AT post DOT tau DOT ac DOT il>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: MAJOR slowdowns in translating TP7 gfx code to DJGPP2: Suplement
Date: Tue, 08 Sep 1998 17:41:06 -0700
Organization: Tel Aviv University
Lines: 124
Message-ID: <35F5CEA2.15AF@post.tau.ac.il>
References: <Pine DOT SUN DOT 3 DOT 91 DOT 980908154235 DOT 28002B-100000 AT is>
NNTP-Posting-Host: slip-103.tau.ac.il
Mime-Version: 1.0
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

Eli Zaretskii wrote:
> 
> On Tue, 8 Sep 1998, Tal Lavi wrote:
> 
> > The profiler says that __dpmi_int was used a lot, but I swear I haven't
> > put even one in the actual run-time part!
> 
> Read section 13.5 of the DJGPP FAQ list, it explains that `__dpmi_int' is
> called by every low-level library function that calls either DOS or
> BIOS.  So if you use, e.g., `printf' or `getc', `__dpmi_int' gets linked
> into your program and gets called.

Well, you were right. Making the program run by itself, without human
interaction, made all the difference. The __dpmi_int run time went down
20%!
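
For anyone repeating this, here is a minimal sketch of one way to run
without human interaction: a stand-in for getch() that replays a canned
key script. The names scripted_getch, scripted_keys and KEY_ESC are
hypothetical, and the script contents are made up; the real keys depend
on what the program expects.

```c
#include <stddef.h>

/* Hypothetical canned input: the keys a user would otherwise type. */
static const char scripted_keys[] = "wwwwaaaadddd";
static size_t next_key = 0;

#define KEY_ESC 27  /* assumed here to be the program's "quit" key */

/* Drop-in replacement for getch() while profiling: returns the next
   scripted key, and KEY_ESC forever once the script is exhausted. */
int scripted_getch(void)
{
    if (scripted_keys[next_key] == '\0')
        return KEY_ESC;
    return (unsigned char)scripted_keys[next_key++];
}
```

Calling scripted_getch() wherever the program called getch() keeps the
DOS keyboard wait out of the profile, so the remaining numbers reflect
the program's own work.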

> 
> > Alas, the profiler won't tell me from which function the
> > __dpmi_int(s) were called.
> > Does anyone know whether getch() or _far* calls __dpmi_int?
> 
> Use the sources.  v2/djlsr201.zip is the file with all the library
> sources.  Download it, and you can answer such questions yourself.
> 
> _far* functions never call any other functions, they expand into 2-3
> inline assembly instructions that access a memory address (see
> <sys/farptr.h> header file, it's all there).
> 
> `getch' calls DOS, so it calls `__dpmi_int', as explained above.  If your
> program calls `getch' to read user's input, your profile will be totally
> skewed because a normal human reaction to interactive prompts is so slow
> that it will shadow the time spent in other functions.  Replace `getch'
> with a stub that feeds the program with some input, and then profile it
> again.  Only then will you see the real picture.
> 
> > And yet another thing, the div function in stdlib.h takes a lot of
> > computation time too! I only do two divisions with it per loop cycle!!!
> > What's wrong with that picture?!?
> 
> How many times does `div' get called (it's in the profile)?  Post here
> how much time PER CALL does `div' take, and then we can discuss if
> something's wrong with that.  For all I know, you could call it
> gazillions of times.

You were right about another thing too: it isn't the screen writing
(_far*) that is slow, it's the CastRay routine, which seems to run so
slowly because of the four little integer divs...
This is what still puzzles me (and apparently you too).
Look at it! 70% (!?!) of the running time!
As you can see, the profile shows no call count or per-call time for
div, so I can't work out the average running time. Am I using the
profiler wrong? It seems that every function I did not implement myself
(div, __dpmi_int, and cos too) is not profiled completely.
Even so, I know that div is only called from CastRay, four times per
call. That's a bit more than 50,000 calls, not THAT much to ask from a
Pentium 166.


  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
69.81       2.06     2.06                             div
18.87       2.61     0.56    12800    43.40    43.40  CastRay
 3.77       2.72     0.11       20  5555.56  5555.56  FlushPage
 3.77       2.83     0.11                             __dpmi_int
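
The missing per-call figure for div can be estimated by hand from the
numbers above. The sketch below assumes what the post states (div is
called exactly four times per CastRay call) and that the 2.06 seconds
really belong to div; the function names are hypothetical.

```c
#include <assert.h>

/* Average microseconds per call from total seconds and a call count. */
double us_per_call(double self_seconds, unsigned long calls)
{
    assert(calls > 0);
    return self_seconds * 1e6 / (double)calls;
}

/* CPU cycles per call, given microseconds per call and the clock in MHz. */
double cycles_per_call(double us, double mhz)
{
    return us * mhz;
}

/* Assumed div call count: 12800 CastRay calls, four div calls each. */
unsigned long estimated_div_calls(void)
{
    return 12800UL * 4UL;
}
```

With 2.06 seconds over 51200 calls this comes to roughly 40 us per
call, or several thousand cycles at 166 MHz. That is far more than a
single integer division should need, so the time charged to div may
not all be spent dividing.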

> 
> Why do you need `div', anyway?  Are you sure you can get away with simple
> 32-bit division?

I need 'div' so that I can calculate the quot AND the rem at once, since
I need them both.
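
For comparison, a sketch of getting both values with the plain `/' and
`%' operators instead of the library call. On x86 a single divide
instruction produces quotient and remainder together, and an optimizing
compiler can typically use one division for both expressions; whether a
given compiler version does so is an assumption worth checking in the
generated assembly. The names divmod_lib and divmod_ops are
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Quotient and remainder through the library call, as in the post. */
div_t divmod_lib(int num, int den)
{
    assert(den != 0);
    return div(num, den);
}

/* The same pair through the built-in operators, with no function call. */
div_t divmod_ops(int num, int den)
{
    div_t r;
    assert(den != 0);
    r.quot = num / den;
    r.rem  = num % den;
    return r;
}
```

Both versions agree for non-negative operands. For negative operands
they agree wherever `/' truncates toward zero, which C99 requires and
which older compilers were allowed to do differently.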

> 
> > div is a integer based division, right?
> 
> Yes.
> 
> > then why does it take over 40% of the computation time?
> 
> The interesting thing is how much time per call does it take.  And then
> we need to know what kind of CPU do you have.

As I said, a Pentium 166.
> 
> > I'd like to try to inline my putpixel routine myself, instead of using
> > _far* but I can't get it to work!
> 
> This is dead end.  _farptr functions are already written in inline
> assembly, and they are as fast as you can get (you *did* compile with -O2,
> didn't you?), so you won't find any faster way of doing that part.  _farptr
> is NOT your problem, look for the reasons of the slow-down elsewhere.

I tried -O2 before, but I haven't seen any difference (probably because
of the div slowing everything down).
I don't usually trust a compiler to make my program faster. Is that
option safe, anyway?

Even though the main slowdown is not in _far*, inlining the memory
writes myself would make things easier for the compiler and would
eliminate any chance of error.

Besides the stupid div, I could use some optimization of the FlushPage
routine, which fills the screen with a given color in 640x480x64K mode.
Any suggestions?

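	 #include <sys/farptr.h>   /* _farsetsel, _farnspokew */

	 /* Fill the current 640x480x64K page (614400 bytes) with color C. */
	 void FlushPage(unsigned short C)
	 {
	   unsigned long i;
	   /* Select the linear frame buffer of the page being drawn. */
	   _farsetsel(LFBSelector[ScreenNum]);
	   /* One 16-bit pixel per store, so the offset steps by 2 bytes. */
	   for(i=0;i<614400;i+=2)
	     _farnspokew(i,C);
	 }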

where LFBSelector is an array of two unsigned shorts that holds the
selectors of the two screen pages, and ScreenNum is an unsigned char
that holds the index of the page currently being written.
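
One possible direction for FlushPage, sketched here against an ordinary
memory buffer so it can be tried anywhere: pack two 16-bit pixels into
one 32-bit word and store four bytes at a time, which halves the number
of stores. The names pack_pixel_pair and fill_page32 are hypothetical.
In the DJGPP version the store would be _farnspokel(i, packed) from
<sys/farptr.h> with i stepping by 4; whether that is actually faster on
a given video card is an assumption that only profiling can confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Put the same 16-bit pixel into both halves of a 32-bit word. */
uint32_t pack_pixel_pair(uint16_t c)
{
    return ((uint32_t)c << 16) | c;
}

/* Fill `bytes` bytes at `dst` with color `c`, four bytes per store.
   `bytes` must be a multiple of 4 (614400 is). */
void fill_page32(void *dst, size_t bytes, uint16_t c)
{
    uint32_t packed = pack_pixel_pair(c);
    unsigned char *p = dst;
    size_t i;

    assert(bytes % 4 == 0);
    for (i = 0; i < bytes; i += 4)
        memcpy(p + i, &packed, sizeof packed);  /* one 32-bit store */
}
```

Because both halves of the packed word are identical, the result does
not depend on byte order.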
