X-Authentication-Warning: delorie.com: mailnull set sender to djgpp-workers-bounces using -f
Message-ID: <3C72DA25.F1AE5FF0@yahoo.com>
Date: Tue, 19 Feb 2002 18:05:09 -0500
From: CBFalconer
Organization: Ched Research
X-Mailer: Mozilla 4.75 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: djgpp-workers AT delorie DOT com
CC: Eli Zaretskii
Subject: Re: Malloc/free DJGPP code
References: <3C723299 DOT 563B5B72 AT yahoo DOT com>
 <8971-Tue19Feb2002141104+0200-eliz AT is DOT elta DOT co DOT il>
 <3C72560A DOT EA7F5099 AT yahoo DOT com>
 <6048-Tue19Feb2002180659+0200-eliz AT is DOT elta DOT co DOT il>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Reply-To: djgpp-workers AT delorie DOT com

Eli Zaretskii wrote:
>
> > Date: Tue, 19 Feb 2002 08:41:30 -0500
> > From: CBFalconer
> >
> > > > As long as merging requires an exhaustive search of a singly
> > > > linked list for merge candidates, it is sooner or later going
> > > > to require such an O(N**2) algorithm.
> > >
> > > I don't understand why; please explain.  AFAICS, `merge' simply
> > > scans the appropriate sublist linearly until it finds the block
> > > being freed.
> >
> > It has to search all the lists for adjacencies.  O(n).  And it has
> > to search for each item freed.  O(n) again, resulting in O(n*n).
>
> Can you please identify these two nested searches in malloc.c?  I
> don't see them.  What I see is this (cleaned up of debug code for
> clarity):

The outer loop isn't in malloc, it is in the application, which has
the array or linked list or whatever of allocated memory to free.

>     for (bp = freelist[bu]; bp; bpp = &(bp->next), bp = bp->next)

This is the only thing that matters.  It makes each merge O(n), and
it is executed once per block freed, i.e. O(n) times, thus O(n*n)
overall.

>
> This boils down to a single call to b2bucket, followed by a linear
> scan of a single sublist.  b2bucket loops up to 32 times to find the
> closest integral power of 2, so at worst, this is O(n+32), not
> O(n*m); and `n' here is typically smaller than the number of
> allocated blocks, but can at worst be equal to it.  Am I missing
> something?

Yes.  For a long list, free is called for each of the m items.
merge takes time proportional to n, where n is the number of free
blocks.  Once a significant portion of the m items has been freed,
the number of free blocks peaks (because the freeing is being done
in random order).

The effect will probably never show up when freeing is done in
roughly the order (or the reverse order) of allocation, because the
count of free blocks remains low.  It takes the randomization to
make it bite.  That can be caused by many things (the hashlib
system I sent you is a prime example), including sorting either a
pointer array or a linked list.

On my machine objectionable delays start at about 10,000 items, and
become completely intolerable at 50,000.  VC6 doesn't have the
problem, and doesn't die until things slop over into disk thrashing
at over 1,000,000 items.  LCCWIN32 has the same problem.

On the hashlib system I sent, run "hashtest 4 N" where N is the
count of items.  Odd values of N run without calling the free
mechanism at exit, while even values call it.  The time difference
belongs to merge.

I don't care how fast your machine is, or how big its memory is:
sooner or later you will run into objectionable delays.  I believe
the test program is actually limited to about 2**23 entries, but
that is easily increased.  You won't need to.
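To make the shape of the problem concrete, here is a standalone
sketch (made up for illustration; it is neither hashlib nor anything
from malloc.c) of the kind of teardown that triggers it: allocate N
blocks, shuffle the pointers, then free them in random order.  Every
free() that reaches merge rescans a freelist that is growing toward
N entries, so the total teardown time grows roughly as O(N*N):

   #include <stdio.h>
   #include <stdlib.h>
   #include <time.h>

   int main(int argc, char **argv)
   {
      size_t n = (argc > 1) ? (size_t)atol(argv[1]) : 50000;
      void **p = malloc(n * sizeof *p);
      size_t i;
      clock_t t0;

      if (!p) return EXIT_FAILURE;
      for (i = 0; i < n; i++) p[i] = malloc(32);

      /* shuffle so the blocks are freed in random order */
      srand((unsigned)time(NULL));
      for (i = n - 1; i > 0; i--) {
         size_t j = (size_t)rand() % (i + 1);
         void *t = p[i]; p[i] = p[j]; p[j] = t;
      }

      t0 = clock();
      for (i = 0; i < n; i++) free(p[i]);  /* n calls, each an O(n) merge scan */
      printf("freed %lu blocks in %.2f s\n", (unsigned long)n,
             (double)(clock() - t0) / CLOCKS_PER_SEC);

      free(p);
      return 0;
   }

Free the blocks in allocation order instead and the freelist scans
stay short, which is why the quadratic behaviour hides until the
order is randomized.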
>
> The profile you posted last time indicates that b2bucket took about
> 4 microseconds per call, while merge took 315 microseconds per
> call--80 times longer!  So evidently, only one loop, the one in
> merge, matters here.

Exactly.  Once that becomes an O(1) algorithm in place of an O(n)
one, the problem will go away.

>
> Since there's only one loop in merge, this would mean the number of
> free blocks on the sublist is very large.  Can you see how large it
> is?  Are all the 30K objects on the same sublist, perhaps, and is
> that the reason for the slow search?  If so, how can we modify the
> code to run faster in this situation?

Dividing into sublists changes at most the constant factor; the
O(n*n) will always dominate.  The search has to go on until either
an adjacency is found or the whole list has been searched and none
has been found.  Sublists don't change that.

>
> These are the kinds of questions, and in that order, that I'd
> recommend to discuss.  Jumping to rewriting the code with a
> different data structure design without understanding those issues
> first is not a good way of dealing with such problems.

I can easily make a portion of the revised memblock compatible with
the original, in that the size and the freelist pointer will be at
the same (negative) offsets from the user pointer.  That would
forbid the safety improvement I am making to guard against
off-by-one indexed operations.  Publishing the offsets of those two
items in implementation space would allow gradual conversion of
unclean software that uses them, so that eventually the module
would be truly independent.  If the other software uses only the
size, so much the better.

I envision a globally available variable:

   const int __msize_offset = offsetof(memblock, data)
                            - offsetof(memblock, size);

and the header contains:

   extern const int __msize_offset;

which is initially constrained (and documented) to be the same
value as in the present implementation.  (A rough sketch of the
sort of layout I have in mind is appended after my sig.)

Have I answered your questions?

BTW, please don't pass on the hashlib I sent; it has since been
revised, but the revision is in abeyance because of this.  If
desired I will attach a later version to the mailing list.

--
Chuck F (cbfalconer AT yahoo DOT com) (cbfalconer AT XXXXworldnet DOT att DOT net)
   Available for consulting/temporary embedded and systems.
   (Remove "XXXX" from reply address.  yahoo works unmodified)
   mailto:uce AT ftc DOT gov  (for spambots to harvest)
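Rough sketch of the memblock idea referred to above.  This is
illustrative only: the field names, the exact struct layout, and the
helper below are made up for the example and are not the actual
revised allocator code.  The point is just that the size and the
freelist link sit at fixed negative offsets from the user pointer,
and that __msize_offset publishes where the size lives:

   #include <stddef.h>

   typedef struct memblock {
      size_t           size;    /* usable size of the block         */
      struct memblock *next;    /* freelist link, valid when free   */
      char             data[1]; /* the user pointer is &block->data */
   } memblock;

   /* defined once in the library, declared extern in the header */
   const int __msize_offset = offsetof(memblock, data)
                            - offsetof(memblock, size);

   /* Unclean code that currently peeks at the size word could be
      converted gradually to go through the published offset, e.g.: */
   size_t stored_size(void *userptr)
   {
      return *(size_t *)((char *)userptr - __msize_offset);
   }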