X-Authentication-Warning: delorie.com: mailnull set sender to djgpp-bounces using -f Message-ID: <3C67DE6D.DEE8305C@yahoo.com> From: CBFalconer Organization: Ched Research X-Mailer: Mozilla 4.75 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 Newsgroups: comp.os.msdos.djgpp Subject: Re: Alignment problem References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Lines: 106 Date: Mon, 11 Feb 2002 15:10:03 GMT NNTP-Posting-Host: 12.90.176.218 X-Complaints-To: abuse AT worldnet DOT att DOT net X-Trace: bgtnsc05-news.ops.worldnet.att.net 1013440203 12.90.176.218 (Mon, 11 Feb 2002 15:10:03 GMT) NNTP-Posting-Date: Mon, 11 Feb 2002 15:10:03 GMT To: djgpp AT delorie DOT com DJ-Gateway: from newsgroup comp.os.msdos.djgpp Reply-To: djgpp AT delorie DOT com Eli Zaretskii wrote: > > On Mon, 11 Feb 2002, CBFalconer wrote: > > > gcc 2.953. > > I'm not sure -a is supported. In fact, I'm not sure -a is supported in > _any_ DJGPP port of any GCC version. > > > Lots of time, see below. At least until I can get a > > detailed look inside free. > > Well, I'm not sure: the 1.17 second taken by `free' is only about 20 > clock ticks, so you are still pretty much on the clock granularity > level. > > > I suspect free is talking back to the dpmi memory manager. > > You have the source, and it should show that `free' doesn't issue any > system calls (because DJGPP programs never release any memory to the DPMI > server). So I think the calls to __dpmi_int are irrelevant to what > `free' does, it's just that your program calls some system services > elsewhere, and the profiling code catches some of these calls when it > randomly probes the EIP at each clock tick. > > I think if the main loop of your program, the one that allocates and > frees objects, runs longer, the percent of time spent inside __dpmi_int > will go down (assuming it's more-or-less constant overhead during program > initialization and shut-down). > > `free' calls 2 subroutines (`merge' and `b2bucket'), but they are both > inlined. I'd suggest to edit malloc.c, delete "inline" from these two > functions' declarations, and then rebuild and rerun the test program > again. Perhaps that will show that one of them takes most of the time. I made the makefile do a "-Dinline= " and the areas showed up. Now, with a more detailed look, I turned up the time and got: Flat profile: Each sample counts as 0.0555556 seconds. % cumulative self self total time seconds seconds calls ns/call ns/call name 82.04 9.39 9.39 30001 312952.53 314921.80 merge 4.37 9.89 0.50 __dpmi_int 2.43 10.17 0.28 30066 9238.93 10851.34 malloc 2.43 10.44 0.28 inserted 1.46 10.61 0.17 mcount 0.97 10.72 0.11 53249 2086.63 2086.63 size2bucket 0.97 10.83 0.11 30015 3701.85 320446.04 free 0.97 10.94 0.11 putintbl 0.97 11.06 0.11 t1cmp 0.49 11.11 0.06 60016 925.68 1969.27 b2bucket 0.49 11.17 0.06 dotest4 0.49 11.22 0.06 getcounts 0.49 11.28 0.06 hshkill 0.49 11.33 0.06 randomMT 0.49 11.39 0.06 reorganize 0.49 11.44 0.06 t1undupe 0.00 11.44 0.00 6834 0.00 0.00 split_block which puts the problem (going by name) precisely where it was expected. Next I shall take a look at the code in merge, where this seems to be the O(N*N) coding: > bpp = freelist+bu; > for (bp=freelist[bu]; bp; bpp=&(bp->next), bp=bp->next) > { > #if DEBUG > printf(" %08x", bp); > #endif > if (bp == c) > { > #if DEBUG > printf("\n snipping %u/%08x from freelist[%d]\n", bp->size, bp, bu); > #endif > *bpp = bp->next; > break; > } > } > CHECK(c); This could probably be obviated at the application level by freeing in the order in which things are malloced, but that is onerous. I still don't know how you are organizing the free list, but in principle the only checks needed for merging are against the blocks immediately before and after the block being freed. If something identifies such blocks as being free or not, independant of the free list, this could be made linear. This implies that all blocks contain fields for next, size, and one bit for flags. I think it is inconceivable that size will ever exceed 32 bits, so the free flag could be packed into the size bit of flags. If the free list is maintained in order of memory location, there only need be one walk of it, and the free flag is not needed. Such maintenance again requires a single walk of the free list for each free. Where does malloc, realloc get the memory to manage, or what describes it to the package? -- Chuck F (cbfalconer AT yahoo DOT com) (cbfalconer AT XXXXworldnet DOT att DOT net) Available for consulting/temporary embedded and systems. (Remove "XXXX" from reply address. yahoo works unmodified) mailto:uce AT ftc DOT gov (for spambots to harvest)