Mail Archives: djgpp-workers/2001/04/05/23:34:25

Message-Id: <3.0.1.32.20010406113549.006c38e0@wingate>
X-Sender: n_abing#ns DOT roxas-online DOT net DOT ph AT wingate
X-Mailer: Windows Eudora Pro Version 3.0.1 (32)
Date: Fri, 06 Apr 2001 11:35:49 +0800
To: djgpp-workers AT delorie DOT com
From: "Nimrod A. Abing" <n_abing AT ns DOT roxas-online DOT net DOT ph>
Subject: Re: That crash message from the core dumper.
In-Reply-To: <Pine.SUN.3.91.1010405122711.11266D-100000@is>
References: <Pine DOT OSF DOT 4 DOT 30 DOT 0104050954110 DOT 13452-100000 AT sirppi DOT helsinki DOT fi>
Mime-Version: 1.0
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

Hello. This is in response to all the replies to my last message about the
core dumper crash. Sorry I could not reply sooner, I was on vacation and
have a lot of catching up to do. Anyway, here we go...

Eli, you wanted a disassembly of __dj_movedata+33:

[--cut here--]
(gdb) disas __dj_movedata+33
Dump of assembler code for function big_move:
0x8eba <big_move>:      mov    %cl,%al
0x8ebc <big_move+2>:    shr    $0x2,%ecx
0x8ebf <big_move+5>:    and    $0x3,%al
0x8ec1 <big_move+7>:    repz movsl %ds:(%esi),%es:(%edi)
0x8ec3 <big_move+9>:    mov    %al,%cl
End of assembler dump.
(gdb)
[--cut here--]

This is the disassembly from the test program sigabrt.exe. The results are
what you expected, I believe. As for the AV software causing the crash, it
is very possible; so far it is the only probable cause. When I disable
real-time scanning, everything works fine and core dumps are written
without any errors. When I turn the AV program back on, the SIGSEGV rears
its ugly head again, and it occurs within the signal handler (the core
dumper code). It doesn't matter what kind of program it is, large or small:
as long as it calls the core dumper, $#!+ happens. Below is another crash
dump, for a very short program that causes a SIGFPE (division by zero):

[--cut here--]
Division by Zero at eip=0000157eExiting due to signal SIGSEGV
An error occured while writing core file. (signal: 14, progress number: 11)
Page fault at eip=00008e91, error=0004
eax=00000000 ebx=00004000 ecx=00001000 edx=0000f620 esi=00030000 edi=0000f620
ebp=002f5cf4 esp=002f5ce4 program=C:\PROJECTS\PMDB\COREDUMP\SIGFPE1.EXE
cs: sel=00f7  base=830bf000  limit=002f5fff
ds: sel=00ff  base=830bf000  limit=002f5fff
es: sel=010f  base=00000000  limit=0010ffff
fs: sel=010f  base=00000000  limit=0010ffff
gs: sel=010f  base=00000000  limit=0010ffff
ss: sel=00ff  base=830bf000  limit=002f5fff
App stack: [002f6000..00276000]  Exceptn stack: [0000fd40..0000de00]

Call frame traceback EIPs:
  0x00008e91 ___dj_movedata+33
  0x0000869c ___dosmemput+44
  0x0000559b __write+123
  0x00002c28 _main+5816
  0x000034e5 ___djgpp_traceback_exit+177
  0x00003586 _raise+118
  0x0000364f ___djgpp_exception_processor+43
  0x00000001 0x1
  0x000050b8 ___crt1_startup+204
h:/pmdb/coredump $ symify sigfpe1.exe
[--cut here--]

This is the code for that program:

[--cut here--]
#include <stdlib.h>



int main(void)
{
        int i;

        i = 1/0;
        exit(0);
}
[--cut here--]

As you can see, the crash still happens in __dj_movedata. This doesn't seem
to be __dj_movedata's fault; somehow the base addresses got mixed up. It
all looks similar to the first crash.
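
In case it helps when reading the dump above, here is a small sketch that
decodes the ``error=0004'' field using the standard x86 page-fault error
code bits. The helper names are my own and are not part of DJGPP or the
core dumper; the sketch only describes the faulting access and does not
prove what caused it.

```c
#include <assert.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code (the "error=" field). */
#define PF_PRESENT 0x1u  /* set: protection violation; clear: page not present */
#define PF_WRITE   0x2u  /* set: the faulting access was a write; clear: a read */
#define PF_USER    0x4u  /* set: the fault was taken in user mode */

/* Hypothetical helpers, named for this example only. */
int pf_page_present(unsigned code) { return (code & PF_PRESENT) != 0; }
int pf_is_write(unsigned code)     { return (code & PF_WRITE) != 0; }
int pf_is_user(unsigned code)      { return (code & PF_USER) != 0; }

/* Print a one-line description of the error code to `out`. */
void print_page_fault(FILE *out, unsigned code)
{
    assert(out != NULL);
    fprintf(out, "error=%04x: %s-mode %s, page %s\n",
            code,
            pf_is_user(code) ? "user" : "supervisor",
            pf_is_write(code) ? "write" : "read",
            pf_page_present(code) ? "present" : "not present");
}
```

Calling print_page_fault(stderr, 0x0004) describes the fault in the dump as
a user-mode read of a page that was not present.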

As for this one:

> Abort!
> Exiting due to signal SIGABRT
> Raised at eip=00003786
> writing contents of address 00237000
> block size 524288
> 
> writing contents of address 00020000
> block size 2228224

The first two lines indicate _the_ exception that was raised using
abort(). This is the original signal. The address is from the memory handle
data maintained by sbrk, which is copied by the core dumper into its
internal data structures. The block size is the chunk size calculated by
GF's core dumping code as it walks the DPMI memory handles. The code also
does some more fiddling with the chunk size to reduce the size of the core
dump and to avoid dumping redundant data. For instance, it takes advantage
of the fact that memory segments do not overlap and does some arithmetic on
them. I wrote some code to dump the address and the chunk size to stderr as
well, but I removed it from my last released code. The following patch
should put it back in...

[--cut here--]
*** dpmiexcp.c~ Thu Apr  5 22:16:50 2001
--- dpmiexcp.c  Thu Apr  5 22:18:02 2001
***************
*** 577,582 ****
--- 577,589 ----
  #endif
          if (mem_block_list[i].address >= 0x1000)
          {
+           err("\r\n");
+           err("writing contents of address ");
+           itox(mem_block_list[i].address, 8);
+           err("\r\n");
+           err("block size ");
+           itod(mem_block_list[i].chunks << 16);
+           err("\r\n");
            _write(corefile,
                   (void *)mem_block_list[i].address,
                   mem_block_list[i].chunks << 16);
[--cut here--]
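
To make the ``block size'' numbers easier to read: the patch prints
chunks << 16, so each chunk is 64 KB. Here is a minimal sketch of that
arithmetic. The helper names are hypothetical and only for illustration;
only the shift by 16 comes from the patch.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SHIFT 16  /* the patch prints chunks << 16: one chunk is 64 KB */

/* Hypothetical helper: bytes written for a block of `chunks` chunks. */
unsigned long block_size_bytes(unsigned long chunks)
{
    return chunks << CHUNK_SHIFT;
}

/* Hypothetical helper: the inverse, for reading sizes off the debug output. */
unsigned long chunks_from_bytes(unsigned long bytes)
{
    /* A printed block size is always a whole number of 64 KB chunks. */
    assert((bytes & ((1UL << CHUNK_SHIFT) - 1)) == 0);
    return bytes >> CHUNK_SHIFT;
}

/* Print the same two lines as the patch (with \n instead of \r\n). */
void print_block(FILE *out, unsigned long address, unsigned long chunks)
{
    fprintf(out, "writing contents of address %08lx\n", address);
    fprintf(out, "block size %lu\n", block_size_bytes(chunks));
}
```

With these, the two blocks quoted earlier come out as 8 chunks (524288
bytes) and 34 chunks (2228224 bytes).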

As for this line in the crash message:

``An error occured while writing core file. (signal: 14, progress number:
11)''

This was part of GF's original code, and I decided to keep it while the
core dumper is still in the testing stage. So if it says ``progress number:
11'', egrep -n "progress = 11" on the source will tell me where to start
looking. As for the ``signal: 14'', this is the exception number behind the
SIGSEGV (14 is the x86 page fault exception, which matches the ``Page
fault'' line in the dump). Maybe I should rewrite it to say ``exception''
or ``DJGPP signal'' to differentiate it from signal 11, which is the _real_
signal number for SIGSEGV on Unix.
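
For anyone unfamiliar with the ``progress number'' trick, here is a rough
sketch of the idea. This is not GF's code: the stage function and all names
are made up for illustration, and failures are simulated with return codes
instead of a nested exception.

```c
#include <assert.h>
#include <stdio.h>

/* Updated before every stage so a failure report can say how far we got. */
static volatile int progress = 0;

/* Hypothetical stand-in for one stage of writing the core file. */
static int stage(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Returns 0 on success, or -1 with `progress` left at the failing stage.
   Each assignment is a literal, so grepping for "progress = 2" finds it. */
int write_core_sketch(int failing_stage)
{
    assert(failing_stage >= 0);
    progress = 1;
    if (stage(failing_stage == 1)) return -1;
    progress = 2;
    if (stage(failing_stage == 2)) return -1;
    progress = 3;
    if (stage(failing_stage == 3)) return -1;
    return 0;
}

int last_progress(void)
{
    return progress;
}

/* Report a failure in the spirit of the message discussed above. */
void report_failure(FILE *out, int exception_number)
{
    fprintf(out, "error while writing core file (exception: %d, progress number: %d)\n",
            exception_number, last_progress());
}
```

If write_core_sketch(2) fails, last_progress() returns 2, and a grep for
"progress = 2" points at the stage that was running.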

Eli, it gets weirder all the time. When I run the test program under gdb
(gdb 4.18 and 5.0) with the AV software running in the background, the
SIGSEGV does *not* happen. This is unqualified weirdness if you ask me.
Again, let me reiterate that the AV scanner I am using is InoculateIT
Personal Edition Version 5.2.9.0. Is anyone out there running the same AV
software but *not* getting the same results when the test program is run in
a DOS box? IIRC, there was also a similar issue with Norton AV 95, but that
was with GDB 4.18, and I have since removed Norton and replaced it with
InoculateIT because the latter is ``GDB friendly''.

About this AV thing, I guess it's caused by the real-time scanner when it
tries to read and examine the instructions used by the program. Maybe it
tries to move the chunk of memory (stupidly) to another location to examine
it. But then again, we would never know because we don't have the source
code for the AV scanner, eh?


nimrod_a_abing
--------------

+========================================+
|  Home page: www.geocities.com/n_abing  |
+========================================+

"Tinimbang ka ngunit kulang."
If you understand that phrase, i-email mo'ko. ;-)
