Message-Id: <3.0.1.32.20010406113549.006c38e0@wingate>
Date: Fri, 06 Apr 2001 11:35:49 +0800
To: djgpp-workers AT delorie DOT com
From: "Nimrod A. Abing"
Subject: Re: That crash message from the core dumper.

Hello.

This is in response to all the replies to my last message about the core
dumping crash. Sorry I was not able to reply sooner; I was on vacation.
I've so much catching up to do, yes? Anyway, here we go...

Eli, you wanted a disassembly of __dj_movedata+33:

[--cut here--]
(gdb) disas __dj_movedata+33
Dump of assembler code for function big_move:
0x8eba : mov %cl,%al
0x8ebc : shr $0x2,%ecx
0x8ebf : and $0x3,%al
0x8ec1 : repz movsl %ds:(%esi),%es:(%edi)
0x8ec3 : mov %al,%cl
End of assembler dump.
(gdb)
[--cut here--]

This is the disassembly from the test program sigabrt.exe. The results are
what you expected, I believe.

As for the AV software causing the crash, it is very possible; in fact it is
the only probable cause. When I disable real-time scanning, everything works
fine and core dumps complete without any errors. When I turn on the AV
program, the SIGSEGV rears its ugly head again. It occurs within the signal
handler (the core dumper code). It doesn't matter what kind of program it is,
large or small: as long as it calls the core dumper, $#!+ happens.

Below is another crash dump, this time for a very short program that causes
a SIGFPE (division by zero):

[--cut here--]
Division by Zero at eip=0000157e
Exiting due to signal SIGSEGV
An error occured while writing core file.
(signal: 14, progress number: 11)
Page fault at eip=00008e91, error=0004
eax=00000000 ebx=00004000 ecx=00001000 edx=0000f620 esi=00030000 edi=0000f620
ebp=002f5cf4 esp=002f5ce4 program=C:\PROJECTS\PMDB\COREDUMP\SIGFPE1.EXE
cs: sel=00f7 base=830bf000 limit=002f5fff
ds: sel=00ff base=830bf000 limit=002f5fff
es: sel=010f base=00000000 limit=0010ffff
fs: sel=010f base=00000000 limit=0010ffff
gs: sel=010f base=00000000 limit=0010ffff
ss: sel=00ff base=830bf000 limit=002f5fff
App stack: [002f6000..00276000] Exceptn stack: [0000fd40..0000de00]

Call frame traceback EIPs:
  0x00008e91 ___dj_movedata+33
  0x0000869c ___dosmemput+44
  0x0000559b __write+123
  0x00002c28 _main+5816
  0x000034e5 ___djgpp_traceback_exit+177
  0x00003586 _raise+118
  0x0000364f ___djgpp_exception_processor+43
  0x00000001 0x1
  0x000050b8 ___crt1_startup+204
h:/pmdb/coredump $ symify sigfpe1.exe
[--cut here--]

This is the code for that program:

[--cut here--]
#include <stdlib.h>

int main(void)
{
  int i;
  i = 1/0;   /* deliberate division by zero to raise SIGFPE */
  exit(0);
}
[--cut here--]

As you can see, the crash still happens in __dj_movedata. This doesn't seem
to be __dj_movedata's fault; somehow the base addresses got mixed up. And it
all looks similar to the first one.

As for this one:

> Abort!
> Exiting due to signal SIGABRT
> Raised at eip=00003786
> writing contents of address 00237000
> block size 524288
>
> writing contents of address 00020000
> block size 2228224

The first two lines indicate _the_ exception that was raised using abort();
this is the original signal. The address comes from the memory handle data
maintained by sbrk, which the core dumper copies into its internal data
structures. The block size is the chunk size calculated by GF's core dumping
code as it walks the DPMI memory handles. The code also does some more
fiddling with the chunk size to reduce the size of the core dump and to avoid
dumping redundant data. For instance, it takes advantage of the fact that
memory segments do not overlap and does some arithmetic on them.
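The ``block size'' figures quoted above are the chunk count shifted left by
16, the same mem_block_list[i].chunks << 16 that the patch below prints and
passes to _write(), so the dumper counts memory in 64 KiB units. A minimal C
sketch of that arithmetic, assuming a simplified block record: struct
mem_block, chunks_for_size, block_bytes and should_dump are hypothetical
names, not the actual dpmiexcp.c code, and rounding a partial chunk up is an
assumption rather than what GF's code is known to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of one entry of the core dumper's block list.
   The real mem_block_list entries in dpmiexcp.c may differ. */
struct mem_block {
    uint32_t address; /* linear start address, from the sbrk handle data */
    uint32_t chunks;  /* block length in 64 KiB units */
};

#define CHUNK_SHIFT 16u          /* one chunk = 1 << 16 = 65536 bytes */
#define MIN_DUMP_ADDRESS 0x1000u /* the patch only dumps blocks at or above this */

/* Round a byte count up to whole 64 KiB chunks. Rounding up (rather than
   truncating) is an assumption made for this sketch. The 64-bit intermediate
   keeps the addition from wrapping for sizes near 4 GiB. */
uint32_t chunks_for_size(uint32_t size)
{
    uint64_t rounded = (uint64_t)size + ((1u << CHUNK_SHIFT) - 1u);
    return (uint32_t)(rounded >> CHUNK_SHIFT);
}

/* Bytes written for a block: the same chunks << 16 that _write() receives. */
uint32_t block_bytes(const struct mem_block *b)
{
    /* More than 0xFFFF chunks shifted left by 16 would overflow 32 bits. */
    assert(b->chunks <= 0xFFFFu);
    return b->chunks << CHUNK_SHIFT;
}

/* Mirrors the patch's guard: blocks below 0x1000 are not written out. */
int should_dump(const struct mem_block *b)
{
    return b->address >= MIN_DUMP_ADDRESS;
}
```

For the SIGABRT run quoted above, chunks_for_size(524288) is 8 and
chunks_for_size(2228224) is 34, so both reported sizes are whole multiples
of 64 KiB.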
I wrote some code to dump the address and the chunk size to stderr as well,
but I removed it from my last released code. The following patch should put
it back in...

[--cut here--]
*** dpmiexcp.c~ Thu Apr 5 22:16:50 2001
--- dpmiexcp.c Thu Apr 5 22:18:02 2001
***************
*** 577,582 ****
--- 577,589 ----
  #endif
    if (mem_block_list[i].address >= 0x1000)
    {
+     err("\r\n");
+     err("writing contents of address ");
+     itox(mem_block_list[i].address, 8);
+     err("\r\n");
+     err("block size ");
+     itod(mem_block_list[i].chunks << 16);
+     err("\r\n");
      _write(corefile, (void *)mem_block_list[i].address,
             mem_block_list[i].chunks << 16);
[--cut here--]

As for this line in the crash message:

``An error occured while writing core file. (signal: 14, progress number: 11)''

This was part of GF's original code, and I decided to keep it while the core
dumper is still in the testing stage. So if it says ``progress number: 11'',
egrep -n "progress = 11" will tell me where to start looking. As for
``signal: 14'', that is the exception number (14 is the x86 page fault, which
DJGPP reports as SIGSEGV). Maybe I should rewrite it to say ``exception'' or
``DJGPP signal'' to differentiate it from signal 11, which is the _real_
signal number for SIGSEGV on Unix.

Eli, it gets weirder all the time. When I run the test program under GDB
(gdb 4.18 and 5.0) with the AV software running in the background, the
SIGSEGV does *not* happen. This is unqualified weirdness if you ask me.

Let me reiterate that the AV scanner I am using is InoculateIT Personal
Edition Version 5.2.9.0. Is there anyone out there with the same AV software
who does not get the same results when the test program is run in a DOS box?

IIRC, there was also a similar issue with Norton AV 95, but that was with
GDB 4.18. I have since removed Norton and replaced it with InoculateIT,
because the latter is ``GDB friendly''.

About this AV thing, I guess it's caused by the real-time scanner when it
tries to read and examine the instructions used by the program.
Maybe it (stupidly) tries to move the chunk of memory to another location to
examine it. But then again, we will never know, because we don't have the
source code for the AV scanner, eh?

nimrod_a_abing
--------------
+========================================+
| Home page: www.geocities.com/n_abing |
+========================================+
"Tinimbang ka ngunit kulang." ("You were weighed but found wanting.")
If you understand that phrase, email me. ;-)