Mail Archives: djgpp/2002/02/11/10:59:34
Eli Zaretskii wrote:
>
> On Mon, 11 Feb 2002, CBFalconer wrote:
>
> > gcc 2.953.
>
> I'm not sure -a is supported. In fact, I'm not sure -a is supported in
> _any_ DJGPP port of any GCC version.
>
> > Lots of time, see below. At least until I can get a
> > detailed look inside free.
>
> Well, I'm not sure: the 1.17 second taken by `free' is only about 20
> clock ticks, so you are still pretty much on the clock granularity
> level.
>
> > I suspect free is talking back to the dpmi memory manager.
>
> You have the source, and it should show that `free' doesn't issue any
> system calls (because DJGPP programs never release any memory to the DPMI
> server). So I think the calls to __dpmi_int are irrelevant to what
> `free' does, it's just that your program calls some system services
> elsewhere, and the profiling code catches some of these calls when it
> randomly probes the EIP at each clock tick.
>
> I think if the main loop of your program, the one that allocates and
> frees objects, runs longer, the percent of time spent inside __dpmi_int
> will go down (assuming it's more-or-less constant overhead during program
> initialization and shut-down).
>
> `free' calls 2 subroutines (`merge' and `b2bucket'), but they are both
> inlined. I'd suggest to edit malloc.c, delete "inline" from these two
> functions' declarations, and then rebuild and rerun the test program
> again. Perhaps that will show that one of them takes most of the time.
I made the makefile do a "-Dinline= " and the areas showed up.
Now, with a more detailed look, I turned up the time and got:
Flat profile:
Each sample counts as 0.0555556 seconds.
% cumulative self self total
time seconds seconds calls ns/call ns/call name
82.04 9.39 9.39 30001 312952.53 314921.80 merge
4.37 9.89 0.50 __dpmi_int
2.43 10.17 0.28 30066 9238.93 10851.34 malloc
2.43 10.44 0.28 inserted
1.46 10.61 0.17 mcount
0.97 10.72 0.11 53249 2086.63 2086.63 size2bucket
0.97 10.83 0.11 30015 3701.85 320446.04 free
0.97 10.94 0.11 putintbl
0.97 11.06 0.11 t1cmp
0.49 11.11 0.06 60016 925.68 1969.27 b2bucket
0.49 11.17 0.06 dotest4
0.49 11.22 0.06 getcounts
0.49 11.28 0.06 hshkill
0.49 11.33 0.06 randomMT
0.49 11.39 0.06 reorganize
0.49 11.44 0.06 t1undupe
0.00 11.44 0.00 6834 0.00 0.00 split_block
which puts the problem (going by name) precisely where it was
expected. Next I shall take a look at the code in merge, where
this seems to be the O(N*N) coding:
> bpp = freelist+bu;
> for (bp=freelist[bu]; bp; bpp=&(bp->next), bp=bp->next)
> {
> #if DEBUG
> printf(" %08x", bp);
> #endif
> if (bp == c)
> {
> #if DEBUG
> printf("\n snipping %u/%08x from freelist[%d]\n", bp->size, bp, bu);
> #endif
> *bpp = bp->next;
> break;
> }
> }
> CHECK(c);
This could probably be obviated at the application level by
freeing in the order in which things are malloced, but that is
onerous. I still don't know how you are organizing the free list,
but in principle the only checks needed for merging are against
the blocks immediately before and after the block being freed. If
something identifies such blocks as being free or not, independant
of the free list, this could be made linear. This implies that
all blocks contain fields for next, size, and one bit for flags.
I think it is inconceivable that size will ever exceed 32 bits, so
the free flag could be packed into the size bit of flags. If the
free list is maintained in order of memory location, there only
need be one walk of it, and the free flag is not needed. Such
maintenance again requires a single walk of the free list for each
free.
Where does malloc, realloc get the memory to manage, or what
describes it to the package?
--
Chuck F (cbfalconer AT yahoo DOT com) (cbfalconer AT XXXXworldnet DOT att DOT net)
Available for consulting/temporary embedded and systems.
(Remove "XXXX" from reply address. yahoo works unmodified)
mailto:uce AT ftc DOT gov (for spambots to harvest)
- Raw text -