Mail Archives: cygwin/2002/12/31/19:47:33
Christopher Faylor wrote:
> On Tue, Dec 31, 2002 at 09:43:50AM -0500, Charles Wilson wrote:
>
>>Did gcc (pre 3.2) automatically initialize data to 0, while gcc-3.2 does
>>not? Hmmm...waitaminute, I do have gcc2 installed...
>
>
> If gcc/ld is not initializing static data to zero then there are some
> pretty serious problems. Neither gcc, nor any other compiler that I
> am aware of, is supposed to initialize automatic data to zero.
You're right, as usual. (Plus empirical evidence: no change when
compiling with gcc-2. Still get the SIGSEGV)
You're also right [I think] about the buffer overrun/whatever problem in
glib. I haven't found the specific, offending command yet, but it seems
pretty obvious from the postmortem that that is what has happened.
Single-stepping thru the code shows some interesting things. Basically,
we have a g_string (structure that contains a char* field, a length
field, plus some other fields that are hidden from the public interface.
The whole structure is dynamically allocated, and the char* field
points to additional dynamically allocation storage.
This g_string is initially allocated as a minimum size string, where the
char* points to a minimum size buffer (4 bytes, it appears), but
contains "" (e.g. *char = '\0', len = 0). [as far as glib is concerned,
the char* points to a chunk of memory 4 bytes long. But dlmalloc
actually uses a chunk that is 16 bytes long == 3 32bit words of dlmalloc
overhead, plus the user data]
Then, the testsuite attempts to append a char* that contains approx 20k
characters. This means that glib must realloc a larger buffer for the
g_string's internal storage. It rounds up to the nearest power of two
-- 32768 -- and attempts to realloc the space:
line 220: gstring.c (g_string_maybe_expand)
string->str = g_realloc (string->str, string->alloc);
"string" is the g_string, recast to GRealString* to reveal the
private members of the structure (like 'alloc', which is how
much storage glib originally requested for this buffer, even
though it only needed 1 byte for the 0-length string "".)
in g_realloc, we simply thunk to the system realloc
line 338: gmem.c (g_realloc)
p = (gpointer) realloc (mem, size);
where mem is the string->str argument above, and size is
the string->alloc argument (it equals 32768)
mem, at this point (or string->str, if you like) is a viable pointer to
a previously allocated chunk of memory (4 bytes long) that contains '\0'
and 3 unknown other values. It has not previously been freed.
But now, as I said, we're in the bowels of dlmalloc.
line 146: malloc_wrapper.cc (realloc)
res = dlrealloc (p, size);
again, p (== mem == string->str) is a valid pointer. size is
32768.
line 4078: malloc.cc (dlrealloc == rEALLOc)
newmem = mALLOc(nb - MALLOC_ALIGN_MASK);
we've already determined that the request cannot be satisfied by
expanding the allocation within the current chunk, nor can it be
satisfied by expanding the allocation into the next chunk. So, we fall
back on the "allocate-copy-free" algorithm, and start by allocating a
new chunk of memory.
line 3516: malloc.cc (dlmalloc == mALLOc)
bck->fd = unsorted_chunks(av);
How we got to this line: there have been (other pointers) freed,
so we have to first search the list of recently freed memory for
appropriately sized chunks.
1) this request is not a 'fast bin' size, so skip that codeblock
2) it's not a small request, so skip the smallbin check
3) so, we consolidate all fastbins. That succeeds.
4) We enter a while loop, as described by this comment:
/* Process recently freed or remaindered chunks, taking one only if it
is exact fit, or, if this a small request, the chunk is remainder from
the most recent non-exact fit. Place other traversed chunks in bins.
Note that this step is the only place in any routine where chunks are
placed in bins. */
while ( (victim = unsorted_chunks(av)->bk) != unsorted_chunks(av)) {
bck = victim->bk;
size = chunksize(victim);
...(not a small request, so skip the next codeblock)...
(now we pull "victim" out of the doubly-linked list.)
unsorted_chunks(av)->bk = bck;
bck->fd = unsorted_chunks(av);
And it's here that the error occurs: trying to hook up the forward
pointer in victim's predecessor to point to victim's successor.
The problem is, 'victim' is full of garbage. Supposedly this free'd
chunk of memory is 808,464,432 bytes long. Its predecessor is also
808,464,432 bytes long. And its predecessor and successor are both
located at 0x30303030. Which is bogus, becuase dlmalloc hasn't obtained
that region of memory from the system (and besides, 0x30303030 is
suspiciously like " ")
So it looks like glib's test program has clobbered out-of-bounds memory
in a given malloc'ed block -- but only a few bytes, enough to clobber
dlmalloc's bookkeeping but not enough to trigger an actual segfaut. And
then it freed the (clobbered) block -- but we don't notice THAT until
the NEXT time we try to malloc/realloc some memory. E.g. now.
So the error is not in
string-test.c (main) line 109: g_string_sprintf() -->
g_string_sprintfa_int() --> g_string_append()...
It might be in
string-test.c (main) line 109: g_string_sprintf() -->
g_string_sprintfa_int() --> g_strdup_vprintf(), which is immediately
before the line above,
or it might be in some earlier test in string-test.c (main).
Have I mentioned how much I hate memory allocation? I'm starting to
think, since the problem DOES seem to be in glib, that it'd be faster to
compile this mess on linux and use a debugging malloc like
electricfence. But I am having a hard time believing that of the
thousands of glib/gtk/gnome developers out there, nobody has noticed a
fundamental problem in [glib|glib-test-suite] before now.
[...but I can't reproduce the fault on linux. Even if I link in
dlmalloc. Bleah. ElectricFence on linux couldn't find anything
suspicious either.]
But I'm still confused as to why this worked in May. When cgf
introduced Doug Lea's malloc on 2002/08/16, what were we using before?
Lea's malloc is interesting, in that memory fragments obtained from the
systems memorymanager contain both dlmalloc's own bookkeeping inline and
user data. So **small** buffer overruns by a userland program will
screw up dlmalloc's own accounting without triggering a segfault
immediately. Other malloc algorithms maintain the bookkeeping
elsewhere, and buffer overruns will (usually) only clobber other user
data. (According to Lea's documentation, this is a feature of dlmalloc,
not a bug).
So if cygwin's earlier malloc implementation had "separate books" as it
were, then the buffer overrun (which I have not yet found) could have
occurred without creating an obvious runtime error.
But that still doesn't explain why it works without fault on linux.
Mebbe I'll try to build efence on cygwin. But not tonight; that's
enough. Happy New Year, everybody.
--Chuck
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
- Raw text -