Date: Mon, 22 Jan 2001 22:19:58 +0200
From: "Eli Zaretskii" <eliz AT is DOT elta DOT co DOT il>
Sender: halo1 AT zahav DOT net DOT il
To: Martin Str|mberg <ams AT ludd DOT luth DOT se>
Message-Id: <3405-Mon22Jan2001221957+0200-eliz@is.elta.co.il>
X-Mailer: Emacs 20.6 (via feedmail 8.3.emacs20_6 I) and Blat ver 1.8.6
CC: djgpp-workers AT delorie DOT com
In-reply-to: <200101221730.SAA15795@father.ludd.luth.se> (message from Martin
	Str|mberg on Mon, 22 Jan 2001 18:30:32 +0100 (MET))
Subject: Re: Debugging on 386
References:  <200101221730 DOT SAA15795 AT father DOT ludd DOT luth DOT se>
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

> From: Martin Str|mberg <ams AT ludd DOT luth DOT se>
> Date: Mon, 22 Jan 2001 18:30:32 +0100 (MET)
> 
> Well it seems gdb can go into loop if I insist on "n":

Patient: Doctor, it hurts when I do like this!
Doctor: Then don't do that.

Seriously, though: this issue of debugging under signals is
exceedingly tricky.  It is a miracle that it works at all (mostly no
thanks to me).  GDB gets all the exceptions, whether they occur in its
own code or in the debuggee's code.  For each exception, GDB must
decide where it originated; that is non-trivial trick number one.  If
it decides that the exception originated in the debuggee, it needs to
jump to the debuggee's exception code; that is non-trivial trick
number two (how do you jump to code which is in exceptn.S and thus
usually doesn't have any debug info?).

So, if you need to debug a program and you know it will generate
exceptions, as in this case, you should do everything you can to stay
out of harm's way.  Set the related signal(s) to nostop noprint (for
example, "handle SIGEMT nostop noprint" in GDB), and don't step where
you don't need to.  Then pray.

> -> r
> Program received signal SIGEMT, Emulation trap.
> 0x5535 in _control87 ()
> -> bt
> #0  0x5535 in _control87 ()
> #1  0x2ec3 in _npxsetup ()
> #2  0x3337 in __crt1_startup ()
> -> c
> Program received signal SIGEMT, Emulation trap.
> 0x554f in _control87 ()
> -> bt
> #0  0x554f in _control87 ()
> #1  0x2ec3 in _npxsetup ()
> #2  0x3337 in __crt1_startup ()

These two are expected: the startup code issues a couple of FP
instructions, for the reasons I explained earlier.

> -> c
> Breakpoint 1, main (argc=1, argv=0x905d4) at analyse_ints.c:129
> 129       if( argc != 2)
> -> bt
> #0  main (argc=1, argv=0x905d4) at analyse_ints.c:129
> #1  0x3368 in __crt1_startup ()
> -> n
> Exiting due to signal SIGFPE

This one is not.  What does "disassemble analyse_ints" print near the
EIP of the breakpoint (0x1a64)?  Do you see any FP instructions
anywhere around that?

> Coprocessor Error at eip=00001a64, x87 status=
> Program received signal SIGEMT, Emulation trap.
> 0x9611 in _status87 ()
> -> bt
> #0  0x9611 in _status87 ()
> #1  0x47da in do_faulting_finish_message ()
> #2  0x4d13 in __djgpp_traceback_exit ()
> #3  0x4da0 in raise ()
> #4  0x2c3a in nofpsig ()
> #5  0x4daa in raise ()
> #6  0x4e07 in __djgpp_exception_processor ()
> #7  0x1 in ?? ()
> #8  0x3368 in __crt1_startup ()

This is expected: the code which prints the traceback calls
_status87.  But what is that 0x1 on the stack?

> -> n
> Single stepping until exit from function _status87,
> which has no line number information.
> Exiting due to signal SIGFPE

I suspect that this happens because GDB is single-stepping the
program.  Doing so near DPMI calls is another non-trivial trick,
because you cannot have the TF (trap) flag set when issuing an INT xx
instruction.

Or maybe there's another bug.  The complexity of what happens there is
really mind-boggling.

> -> c
> 00c1
> eax=000000c1 ebx=00000010 ecx=00000000 edx=0004fa10 esi=00000000 edi=00010167
> ebp=0008f874 esp=0008f83c program=F:\HACKERY\STAT\NEW_STAT\ANALYSE_
> cs: sel=0167  base=104c0000  limit=0009ffff
> ds: sel=016f  base=104c0000  limit=0009ffff
> es: sel=016f  base=104c0000  limit=0009ffff
> fs: sel=015f  base=0004fa10  limit=00003fff
> gs: sel=017f  base=00000000  limit=0010ffff
> ss: sel=016f  base=104c0000  limit=0009ffff
> App stack: [000901ac..000101ac]  Exceptn stack: [00010120..0000e1e0]
> 
> Call frame traceback EIPs:
>   0x00009616 __status87+6

Not bad at all: it eventually got to printing the crash message and
exiting the program ``almost normally''.

You are welcome to work on this, if you feel like it.  I'd be happy to
give directions if you do.