Mail Archives: djgpp/2000/04/19/06:47:30

From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: dead beef
Date: 19 Apr 2000 10:50:21 GMT
Organization: Aachen University of Technology (RWTH)
Lines: 83
Message-ID: <8dk31d$m3n$1@nets3.rz.RWTH-Aachen.DE>
References: <Pine DOT SUN DOT 3 DOT 91 DOT 1000418113426 DOT 28255U-100000 AT is> <38FC4A45 DOT 54C24CDF AT bigfoot DOT com> <8dhrpn$q3s$1 AT nets3 DOT rz DOT RWTH-Aachen DOT DE> <38FCB4D5 DOT B3BB6044 AT bigfoot DOT com>
NNTP-Posting-Host: acp3bf.physik.rwth-aachen.de
X-Trace: nets3.rz.RWTH-Aachen.DE 956141421 22647 137.226.32.75 (19 Apr 2000 10:50:21 GMT)
X-Complaints-To: abuse AT rwth-aachen DOT de
NNTP-Posting-Date: 19 Apr 2000 10:50:21 GMT
Originator: broeker@
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

J.P. Morris <doug-15 AT bigfoot DOT com> wrote:
> Hans-Bernhard Broeker wrote:

[Using the zeroing sbrk(), you get a NULL pointer dereference, which
is different behaviour from what the default or the 'deadbeef' one
yields.]

>> You should have tried that in a debugger, and checked where this NULL
>> pointer came from, to find the bug.

> Assuming I can get it to compile so that it will work inside the
> debugger, what then?  I've only ever used debuggers with faults
> that occur every time, not intermittent ones.  

It doesn't matter all that much. Just run the program inside the
debugger.  When the program would normally crash due to SIGSEGV or
NULL pointer dereference, the debugger will fire up, with the
necessary information about variables and source code positions so you
can see *which* pointer variable is bad, and what value it has, and
how the execution flow of the program got to that place.

Now, as Eli already pointed out, you'll want to repeat the program's
execution, but this time with a watchpoint set on the pointer variable
that contained the invalid data (NULL, or garbage) in the first run.
I'd advise using the 0xdeadbeef method, and maybe making the watchpoint
conditional, so it only triggers when the new value of the pointer
is the 'dead beef' one.

> Since I will only know the bug has occurred when it has already happened,
> how can I trace this kind of bug?

If the bug is anywhere near reproducible, things like breakpoints with
ignore-counts can be useful. I.e. if you know that the bug happens in a
certain routine, but only on the 12345th invocation of it, you can do:

	break routine
	# let's say the breakpoint got number 11
	ignore 11 12344
	run

To find the right number of invocations, set the ignore count to a
very high number, and once the crash has happened, use 'info break' to
see how many times the breakpoint was ignored, before the crash.

[...]
> CHECK_OBJECT(object); // This bombs out if object pointer is invalid
> move_object(object,10,10);

> Now, the worst thing is that quite often, the pointer passes the
> CHECK_OBJECT() test OK, but when it reaches move_object(), it has
> turned into 0x203206 or something.

That would hint at CHECK_OBJECT() as the culprit. It may be modifying
the value of 'object', in certain, rare cases.
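
To illustrate how a checking macro could change the caller's pointer only
in rare cases, here is a minimal sketch. Everything in it is hypothetical:
the real CHECK_OBJECT() and the object structure were not posted, so
struct object, OBJ_MAGIC and both macros are invented for the example. The
buggy variant uses its argument as a loop cursor, so it leaves the pointer
alone for a valid object but silently advances it otherwise. The safe
variant passes the pointer by value to a function, which cannot touch the
caller's variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker value and object layout, invented for this sketch. */
#define OBJ_MAGIC 0x4F424A21u

struct object {
    unsigned magic;          /* equals OBJ_MAGIC while the object is valid */
    int x, y;
    struct object *next;
};

/* Buggy: the macro assigns to its argument, so in the rare case that the
   first object is not valid, the caller's pointer ends up somewhere else. */
#define CHECK_OBJECT_BUGGY(p)                               \
    do {                                                    \
        while ((p) != NULL && (p)->magic != OBJ_MAGIC)      \
            (p) = (p)->next;                                \
        assert((p) != NULL);                                \
    } while (0)

/* Safe: the pointer is passed by value, so the check can only read it. */
static int object_is_valid(const struct object *p)
{
    return p != NULL && p->magic == OBJ_MAGIC;
}

#define CHECK_OBJECT_SAFE(p) assert(object_is_valid(p))
```

Calling CHECK_OBJECT_BUGGY(obj) on a pointer to a valid object leaves obj
unchanged, which is why such a bug can hide for a long time. Note that
assert() compiles to nothing when NDEBUG is defined.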

> Also, I have just found, the program works without crashing in a DOS
> box, but crashes in pure DOS.  None of the classic causes in the FAQ
> seem to be the problem, unless I'm missing something.

That's expected behaviour if you used the zeroing sbrk(). The
resulting NULL pointer dereference is not caught by the Windows DPMI
host. CWSDPMI, on the other hand, will catch it, so that's where that
difference would come from. And this _is_ in the FAQ, unless my memory
is betraying me even beyond its usual lossiness.

>> Right. To detect overruns or underruns in arrays not coming from
>> malloc() (i.e. automatic ones on the stack, or static ones), you need
>> other tools. 
[...]
> I'll try that.  If there is a problem like this it should find it
> even if the game doesn't crash in linux.

Well, actually, given the way your bug changes behaviour as you switch from
one version of sbrk() to the other, the actual source of the bug must
at least partly be related to malloc()ed storage. The program is using the
'random garbage' from malloc()ed blocks that haven't been filled with
anything sensible yet. The only truly puzzling part of this is that
the crash manages to avoid happening for such a long runtime of the
program.
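
The idea of making uninitialized malloc()ed storage recognizable can be
sketched in portable C. This is only a stand-in for illustration, not the
sbrk() mechanism discussed above: malloc_poisoned() and
looks_uninitialized() are hypothetical helper names, and the sketch
assumes you can route the suspect allocations through such a wrapper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FILL_PATTERN 0xDEADBEEFu

/* Hypothetical debugging wrapper: allocate a block and fill it with a
   repeating 0xdeadbeef pattern instead of leaving random garbage in it. */
static void *malloc_poisoned(size_t size)
{
    const uint32_t pattern = FILL_PATTERN;
    unsigned char *block = malloc(size);
    size_t i;

    if (block == NULL)
        return NULL;
    /* Copy byte by byte so sizes that are not multiples of 4 work too. */
    for (i = 0; i < size; i++)
        block[i] = ((const unsigned char *)&pattern)[i % sizeof pattern];
    return block;
}

/* Return 1 if the first four bytes at p hold the fill pattern. p must
   point at a pattern-aligned offset inside a poisoned block (a multiple
   of 4 bytes from its start) with at least four readable bytes. */
static int looks_uninitialized(const void *p)
{
    uint32_t word;

    assert(p != NULL);
    memcpy(&word, p, sizeof word);
    return word == FILL_PATTERN;
}
```

A pointer member read from such a block before anything was stored in it
then shows up as 0xdeadbeef in the debugger (or as a repetition of that
pattern where pointers are wider than 32 bits), which is much easier to
spot than a value like 0x203206.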

-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.