Mail Archives: djgpp/1996/12/22/11:01:28
On Fri, 20 Dec 1996, Jordy Potman wrote:
> I've tried to reproduce the problem in a smaller program but haven't
> succeeded yet. Small programs with only one simple function that is called
> repeatedly (my program calls the main rendering function repeatedly)
> don't show the problem. I think the problem is caused by repeatedly calling
> a lot of functions (in my main rendering function there are a lot of calls
> to other functions). This is not very easy to reproduce in a smaller
> program.
One situation where this might happen is if the code fits into the CPU
cache when compiled with v2.0, but is too large under v2.01, and so cache
thrashing happens. I'll admit I've never seen this happen except in test
cases specifically designed to exhibit the problem, but it might just be
the case.
> I have profiled my program compiled with v2.0 and v2.01. There are some
> differences in the output of gprof.
> Profile compiled with DJGPP v2.00:
> Each sample counts as 0.055556 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 71.05 1.50 1.50 151 9.93 9.93 BitBlt
> 10.53 1.72 0.22 151 1.47 1.47 ScanEdges
> 5.26 1.83 0.11 94475 0.00 0.00 DotProduct
> 5.26 1.94 0.11 151 0.74 13.24 UpdateWorld
> 5.26 2.06 0.11 _int86
> 2.63 2.11 0.06 13139 0.00 0.01 ClipToPlane
> 0.00 2.11 0.00 1 0.00 1999.82 main
>
> Profile compiled with DJGPP v2.01:
> Each sample counts as 0.055556 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 33.33 0.11 0.11 18 6.17 6.17 BitBlt
> 33.33 0.22 0.11 _int86
> 16.67 0.28 0.06 883 0.06 0.06 ClipToFrustum
> 16.67 0.33 0.06 18 3.09 3.09 ScanEdges
> 0.00 0.33 0.00 11394 0.00 0.00 DotProduct
> 0.00 0.33 0.00 1592 0.00 0.00 ClipToPlane
> 0.00 0.33 0.00 18 0.00 12.34 UpdateWorld
> 0.00 0.33 0.00 1 0.00 222.20 main
I'm a bit at a loss with these profiles. First, the v2.01 profile seems
to indicate *faster* runtime, not slower: the ms/call figures are smaller
(6.17 for BitBlt in v2.01 as opposed to 9.93 in v2.0; other functions are
also consistently faster in v2.01). Second, you obviously run both
versions for a *very* different elapsed times (a factor of 9 between 222ms
for the main function in v2.01 as opposed to 2000ms in v2.0), so this is
probably the cause of the different distribution of time in percents,
since the timer granularity is too crude to correctly measure short runs.
You should run your program for much longer (say, tens or hundreds of
seconds) in order to obtain a meaningful profile.
> I can't explain why _int86 takes such a large % of time in v2.01.
I submit that this is an artefact: the time is 0.22 seconds, which is
1/9th of the 2.0 seconds in the v2.0 run, which was almost exactly 9 times
longer in elapsed numbers (or do I miss something?). So the different
percentage is most probably due to the fact that functions which take much
less time weren't sampled at all in the much shorter v2.01 run.
> The
> profile also shows that the program is a lot slower in v2.01, take a look
> at the number of calls to UpdateWorld (the main rendering function).
I don't see what bothers you here. UpdateWorld is the only function
which has almost the same time per call in both versions. Its number of
calls (18 in v2.01 as opposed to 151 in v2.0) is consistent with the fact
that the total runtime of the v2.01 version was 9 times shorter than that
of the v2.0 version.
To sum up:
1) The two versions were run for a short (and unequal) time which
might make the profile inexact. Please run both versions for much longer
times and post the profiles then.
2) If anything, the profile indicates that the v2.01 version is
*faster*, not slower. Please explain why do you think that the v2.01
version is ``much slower''.
- Raw text -