Mail Archives: djgpp-workers/2001/04/06/05:56:46
> Date: Fri, 06 Apr 2001 11:35:49 +0800
> From: "Nimrod A. Abing" <n_abing AT ns DOT roxas-online DOT net DOT ph>
>
> Eli, you wanted a disassembly of __dj_movedata+33:
>
> [--cut here--]
> (gdb) disas __dj_movedata+33
> Dump of assembler code for function big_move:
> 0x8eba <big_move>: mov %cl,%al
> 0x8ebc <big_move+2>: shr $0x2,%ecx
> 0x8ebf <big_move+5>: and $0x3,%al
> 0x8ec1 <big_move+7>: repz movsl %ds:(%esi),%es:(%edi)
> 0x8ec3 <big_move+9>: mov %al,%cl
> End of assembler dump.
> (gdb)
> [--cut here--]
>
> This is disas from the test program sigabrt.exe. The results are what you
> expected I believe.
Yes: it crashes on this instruction:
0x8ec1 <big_move+7>: repz movsl %ds:(%esi),%es:(%edi)
So the question remains: what is the problem with the value of ESI and
maybe also EDI that causes the crash? See below.
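For readers who don't read AT&T assembly daily, here is a rough C rendering of the sequence in the dump. It is only a sketch: copy_like_big_move is a made-up name, not a DJGPP function, and the handling of the leftover bytes is an assumption, since the dump stops right after the count is reloaded into CL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C equivalent of the disassembled big_move sequence. */
void copy_like_big_move(void *dst, const void *src, size_t count)
{
    unsigned char *d = dst;            /* plays the role of ES:EDI */
    const unsigned char *s = src;      /* plays the role of DS:ESI */
    size_t tail = count & 3;           /* mov %cl,%al ; and $0x3,%al */
    size_t dwords = count >> 2;        /* shr $0x2,%ecx */

    /* rep movsl: each step reads 4 bytes via ESI, then writes 4 bytes via EDI.
       A fault on the read side points at ESI, on the write side at EDI. */
    while (dwords--) {
        uint32_t w;
        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
        s += 4;
        d += 4;
    }

    /* mov %al,%cl: the 0..3 leftover bytes (assumed to be copied bytewise
       by code that follows the part shown in the dump). */
    while (tail--)
        *d++ = *s++;
}
```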
> As for the AV software causing the crash, it is very
> possible and the only probable cause. When I disable real-time scanning,
> everything works fine, core dumps go without any errors.
I don't argue with facts; I agree that the AV somehow causes the
crashes. What I don't understand is HOW it causes the crashes,
and the key to that is to understand exactly why the crash
happens. In any case, your initial hypothesis that the segment loaded
into ES is somehow corrupted is not true: inside __dj_movedata, ES is
loaded with the _dos_ds selector, so ES's value printed in the crash
message is perfectly normal.
The reason I'm trying so hard to understand this crash is that I'm
not sure it is limited to AV software. It's possible that there's a
real bug somewhere in the core dumper, which will show in other
circumstances as well. I think we will not be able to dismiss this
case until we gain some insight into why it happens. Right now,
I'm clueless, and I don't like that ;-).
> Division by Zero at eip=0000157eExiting due to signal SIGSEGV
> An error occured while writing core file. (signal: 14, progress number: 11)
> Page fault at eip=00008e91, error=0004
> eax=00000000 ebx=00004000 ecx=00001000 edx=0000f620 esi=00030000 edi=0000f620
> ebp=002f5cf4 esp=002f5ce4 program=C:\PROJECTS\PMDB\COREDUMP\SIGFPE1.EXE
> cs: sel=00f7 base=830bf000 limit=002f5fff
> ds: sel=00ff base=830bf000 limit=002f5fff
> es: sel=010f base=00000000 limit=0010ffff
> fs: sel=010f base=00000000 limit=0010ffff
> gs: sel=010f base=00000000 limit=0010ffff
> ss: sel=00ff base=830bf000 limit=002f5fff
> App stack: [002f6000..00276000] Exceptn stack: [0000fd40..0000de00]
>
> Call frame traceback EIPs:
> 0x00008e91 ___dj_movedata+33
Yes, this is the same crash.
Since this is a Page Fault, and the error code is 4, the primary
suspect is the value of ESI, which points to the address where the
data is read from. Can you please see if the value shown above,
0x30000, is valid given the address and the size of the chunk of
memory the code is trying to dump at this stage?
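To spell out why error code 4 points at the read side: in the x86 Page Fault error code, bit 0 tells whether the page was present, bit 1 whether the access was a write, and bit 2 whether the fault came from user mode. A small sketch of the decoding (decode_pf_error and struct pf_cause are names invented for this illustration):

```c
#include <stdio.h>

/* Low three bits of the x86 Page Fault error code. */
struct pf_cause {
    int protection;  /* bit 0: 1 = protection violation, 0 = page not present */
    int write;       /* bit 1: 1 = write access, 0 = read access */
    int user;        /* bit 2: 1 = fault happened in user mode */
};

/* Hypothetical helper: split the error code into its three flags. */
struct pf_cause decode_pf_error(unsigned error_code)
{
    struct pf_cause c;
    c.protection = (int)(error_code & 1u);
    c.write = (int)((error_code >> 1) & 1u);
    c.user = (int)((error_code >> 2) & 1u);
    return c;
}

/* Print a one-line description of the fault cause. */
void print_pf_cause(unsigned error_code)
{
    struct pf_cause c = decode_pf_error(error_code);
    printf("%s of a %s page in %s mode\n",
           c.write ? "write" : "read",
           c.protection ? "protected" : "non-present",
           c.user ? "user" : "supervisor");
}
```

For error=0004 only the user-mode bit is set, i.e. a read from a page that is not present, which is why the source address in ESI is the first thing to check.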
As for the value of EDI, it should be compared with the value of
_go32_info_block.linear_address_of_transfer_buffer. Can you post the
address of the transfer buffer in that specific program when AV is
enabled?
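The comparison amounts to a simple range check. The sketch below is plain C so it can be tried anywhere; range_inside is a hypothetical helper, and in a DJGPP program the base and size arguments would presumably be taken from _go32_info_block.linear_address_of_transfer_buffer and _go32_info_block.size_of_transfer_buffer (declared in <go32.h>).

```c
#include <stdint.h>

/* Hypothetical helper: return 1 if the len bytes starting at addr lie
   entirely inside the size bytes starting at base, 0 otherwise.
   Written so that none of the subtractions can wrap around. */
int range_inside(uint32_t addr, uint32_t len, uint32_t base, uint32_t size)
{
    if (addr < base || len > size)
        return 0;
    return (addr - base) <= (size - len);
}
```

With EDI as addr and the number of bytes still to be copied as len, a result of 0 would mean the destination runs outside the transfer buffer.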
> As for this line in the crash message:
>
> ``An error occured while writing core file. (signal: 14, progress number:
> 11)''
>
> This was part of GF's original code and I decided to keep it while the core
> dumper is still in testing stage. So if it says ``progress number: 11'',
> egrep -n "progress = 11" will tell me where to start looking. As for the
> ``signal: 14'' this is an exception number for SIGSEGV, maybe I should
> rewrite it to say ``exception'' or ``DJGPP signal''
I'd say "exception" is more accurate.
> Eli, it gets weirder all the time. When I gdb (gdb 4.18 and 5.0) the test
> program (with AV software running in the background), the SIGSEGV does
> *not* happen. This is unqualified weirdness if you ask me.
One more reason to dig deeper into this, I'd say.
> About this AV thing, I guess it's caused by the real-time scanner when it
> tries to read and examine the instructions used by the program.
That's possible, but it still doesn't explain why the program
crashes.
> Maybe it tries to move the chunk of memory (stupidly) to another
> location to examine it. But then again, we would never know because
> we don't have the source code for the AV scanner, eh?
The key to this problem is that the DJGPP program crashes. So
something inside _our_ code causes the crash. We need to try to
understand what that something is.
Charles, is it possible for another program, such as an antivirus, to
cause a Page Fault by something its code does, but have Windows abort
our program instead?
In other words, what could be a reason for a program to get a Page
Fault if the instruction is a perfectly valid one and all the
registers hold valid values?
Nimrod, one thing to try is to set up a signal handler for SIGSEGV,
around the code which dumps one chunk of memory, and have that SIGSEGV
handler longjmp to restart the dumping of that same chunk of memory.
This could help if the problem is not permanent and does not originate
in the core dumper's own code.
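Something along these lines is what I have in mind. This is only a sketch under several assumptions: the names (dump_chunk_with_retry, chunk_writer, flaky_writer, faults_remaining) are invented here and are not part of the core dumper; the fault is simulated with raise() so the code can be tried outside the real dumper; and it uses the POSIX sigsetjmp/siglongjmp pair so that the signal mask is restored on each retry, where the real code might use plain setjmp/longjmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical signature for the routine that dumps one chunk of memory. */
typedef int (*chunk_writer)(const void *addr, size_t len);

static sigjmp_buf retry_point;

/* SIGSEGV handler: abandon the faulting attempt and jump back. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(retry_point, 1);
}

/* Try to dump one chunk, restarting it after a SIGSEGV, at most
   max_attempts times. Returns the writer's result, or -1 if every
   attempt faulted. */
int dump_chunk_with_retry(chunk_writer write_chunk, const void *addr,
                          size_t len, int max_attempts)
{
    volatile int attempts = 0;   /* volatile: must survive siglongjmp */
    volatile int result = -1;
    void (*previous)(int) = signal(SIGSEGV, on_segv);

    /* Execution resumes here after every fault caught by on_segv. */
    (void)sigsetjmp(retry_point, 1);

    if (attempts < max_attempts) {
        attempts++;
        signal(SIGSEGV, on_segv);  /* re-arm in case the handler was reset */
        result = write_chunk(addr, len);
    }

    signal(SIGSEGV, previous);     /* put back whatever was installed before */
    return result;
}

/* Test stand-in for the real writer: faults a given number of times. */
int faults_remaining = 0;

int flaky_writer(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
    if (faults_remaining > 0) {
        faults_remaining--;
        raise(SIGSEGV);            /* simulated transient fault */
    }
    return 0;
}
```

The retry limit matters: if the fault is permanent, an unbounded retry would simply loop forever, and hitting the limit is itself useful evidence that the problem is not transient.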