delorie.com/archives/browse.cgi   search  
Mail Archives: djgpp/2000/09/16/13:30:11

From: "Charles Sandmann" <sandmann AT clio DOT rice DOT edu>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Malloc bug in DJGPP V2.03
Date: Sat, 16 Sep 2000 11:01:49
Organization: Aspen Technology, Inc.
Lines: 73
Message-ID: <39c3531d.sandmann@clio.rice.edu>
References: <20000916140347 DOT B284 AT ajax DOT netspace DOT net DOT au>
NNTP-Posting-Host: dcloan.hou.aspentech.com
X-Trace: selma.aspentech.com 969124317 6876 10.32.115.107 (16 Sep 2000 17:11:57 GMT)
X-Complaints-To: postmaster AT aspentech DOT com
NNTP-Posting-Date: 16 Sep 2000 17:11:57 GMT
X-NewsEditor: ED-1.5.8
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

> that DJGPP's malloc could be improved to cope with certain allocation
> limitations in DPMIs (like CWSDPMI, as I mentioned before changing its
> internal heap parameter affects how far the testmem program gets),
> unfortunately I don't have the time to work on it.

Here's the ugly problem with malloc, sbrk and CWSDPMI.

Malloc algorithms are typically designed with either a speed or memory 
efficiency goal as the first priority.  Since advanced operating systems
handle memory efficiently (being able to sparsely allocate memory in the
address space) they typically choose a malloc algorithm with speed in mind.
The DJGPP malloc is a "speed" algorithm and doesn't mind wasting memory.
For relatively small allocations, it keeps them in bins which are segregated
by factors of 2 (ie 256byte, 512byte, 1Kbyte, etc).  In the worst case
you have a factor of 2 memory overhead - unless you realloc to a bigger
bin and never use the small sized bin again.

But malloc() is just a bookkeeper - it actually gets the memory via sbrk().
In DJGPP sbrk() also tries to be smart and do bookkeeping too, since it
knows it's calling DPMI and wants to avoid performance and fragmentation
issues.  DJGPP's sbrk() tries to make sure each request is 64Kb in size or
larger.  In the days of 4Mb 386s this made lots of sense.  Today, with
64Mb being an entry level machine, we have some problems.

The problem comes in when you do lots of relatively small allocations - 
for example lets say you want to allocate 33Kb Kb structures.  malloc asks
sbrk() for 64Kb, sbrk() will ask CWSDPMI for 64Kb.  CWSDPMI has to keep
track of the memory, so it uses 14 bytes of DOS memory for each 64Kb 
request.  CWSPDMI has to get that from it's internal heap, and it's a 16-bit
application.  If it has 1000 paragraphs to work with (16Kb is a big chunk of
the 64Kb max data space) that will be 1170 memory zones - or around 75Mb of 
memory.  The default internal stack is less and you only get around 20Mb of 
memory.

I could increase the default internal CWSDPMI heap allocation which defers
the problem (but this hurts small systems which need the memory the most).
I could fix CWSDPMI to try decrease the storage pointer size.  I could try and
buffer more in DOS allocated far pointer memory.  I could even try to move
some structure storage to extended memory.  Or I could do nothing - since
I haven't used CWSDPMI myself on a real project in several years.

Another fix would be to make sbrk() smarter.  It internally stores a list
of memory zones it allocated for some other libc features.  That list is only
256 long if I remember correctly - which means some libc features break if
you have lots of little memory zones.  Since sbrk() is already broken, lets
fix it.  Instead of a static 64Kb chunk - lets make that a variable size,
and increase it the more you ask for.  So the first 32 are 64K, then they 
start to grow in 128K, 256K round off chunks.  By the time you get to sbrk()
chunk 224 or so, we ought to be up to a 8Mb minimum request per buffer.  For
example, this would allow you to allocate over 500Mb of memory in small
chunks before the internal sbrk() bookkeeping overflowed 256 zones.  For
current well behaved programs that allocate a mixture of big and small
chunks, there would be no change at all until they used more than 32 memory
zones.  I suspect performance would greatly improve for programs making
more than 32 memory zone requests also...

I prototyped something like this some years back, but it had bugs, and I don't
know what happened to it.  The crt0.s also changed, and there were merge
issues.  The fact that sbrk() is written in GAS is a real pain and has 
prevented it from getting fixed.

Memory table example:

Count	End	SizeK	T Bin	TotalK
32	32	64	2048	2048
32	64	128	4096	6144
32	96	256	8192	14336
32	128	512	16384	30720
32	160	1024	32768	63488
32	192	2048	65536	129024
32	224	4096	131072	260096
32	256	8192	262144	522240

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019