Mail Archives: djgpp-workers/2001/04/23/14:30:21
> Date: Mon, 23 Apr 2001 15:03:51 +0200 (MET DST)
> From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
>
> The most curious one I observed was with two DOS boxes open, in
> Win98 (one with DJGPP environment set up, the other a plain DOS shell, but
> that's not an important detail, I think). Running the test program in one
> of the shells crashed (SIGSEGV, coredump progress level reported to be
> 11), but *only* if another DOS shell was open. I.e. closing the other DOS
> window, the test program successfully dumped a correct core, opening the
> other window again and repeating the test in the first window caused it to
> crash, again. All the while, running the test in the _other_ window worked
> fine.
>
> I then went on and investigated a bit further. I found that switching to
> Unixy sbrk() algorithm via the crt0 startup flag fixed the bug. With
> non-moving sbrk() in use, the crash usually happened when it tried to dump
> the (large) memory block sitting between the stack and the memory space
> reserved by the stub/crt0.
This is typical of the case where the base address of our DS is near
the upper edge of 4GB, so that the segment wraps around into the low
addresses, and the DS limit is very large. In the past we had reports
of weird crashes in old versions of GCC, also due to Page Faults, which
went away when Unixy sbrk was used.
You should see the difference between these two cases in the segment
base address and limit printed in the crash message.
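To make the wrap-around concrete, here is a small sketch of the 32-bit
arithmetic involved. The function names and the sample numbers are
made up for illustration; they are not taken from any actual crash
message. It only assumes that a linear address is the segment base
plus the offset, truncated to 32 bits.

```c
#include <stdint.h>

/* Returns 1 if a segment with this base and limit reaches past the
   4GB boundary, i.e. base + limit does not fit in 32 bits. */
static int segment_wraps(uint32_t base, uint32_t limit)
{
    return limit > UINT32_MAX - base;
}

/* Linear address of an offset inside the segment. The cast makes the
   modulo-2^32 truncation explicit. */
static uint32_t linear_address(uint32_t base, uint32_t offset)
{
    return (uint32_t)(base + offset);
}
```

For example, with a hypothetical base of 0xFFF00000 and a limit of
0x001FFFFF, segment_wraps() reports 1, and offset 0x00100000 lands on
linear address 0, at the very bottom of the address space.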
> I.e: the bug may be related to the fragmented memory layout created by
> non-Unix sbrk. It happens as the coredumper tries to dump a rather large
> memory block (several megabytes, typically) that isn't actually all used
> by the program (the coredump, if it succeeds, is around 500 to 700 KB,
> altogether).
This seems to indicate that some of these pages are not mapped into
the program's address space, or become unmapped at Windows' whim.
Charles, any idea how this can happen? I understand that
__djgpp_memory_handle_list[] only holds pages that must be mapped into
our address space, right? So touching that memory should never Page
Fault our application, it should at most Page Fault the Windows memory
manager.
If we cannot find any problem in our code, perhaps a SIGSEGV handler
that simply skips a problematic page and longjmps to continue with the
other pages would be an acceptable work-around?
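A rough sketch of that work-around, not the actual coredumper code.
All names here (dump_pages, page_reader, reader_with_hole and so on)
are hypothetical. It is written against POSIX sigaction and
sigsetjmp/siglongjmp rather than plain longjmp, so that the signal
mask is restored after each fault on hosts where that matters; whether
the same calls behave identically under DJGPP would need checking.
The reader_with_hole() function is only a test double that raises
SIGSEGV to stand in for a real Page Fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader type: copies one page from src to dst. */
typedef void (*page_reader)(unsigned char *dst,
                            const unsigned char *src, size_t n);

static sigjmp_buf fault_env;               /* resume point after a fault  */
static volatile sig_atomic_t fault_armed;  /* set only while reading a page */

static void on_segv(int sig)
{
    if (fault_armed)
        siglongjmp(fault_env, 1);  /* abandon the page being read */
    signal(sig, SIG_DFL);          /* unexpected fault: default action */
    raise(sig);
}

/* Copies npages pages from src to dst. A page whose read faults is
   zero-filled and counted. Returns the number of skipped pages. */
static size_t dump_pages(unsigned char *dst, const unsigned char *src,
                         size_t npages, size_t page_size,
                         page_reader read_page)
{
    struct sigaction sa, old;
    volatile size_t i, skipped = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    for (i = 0; i < npages; i++) {
        if (sigsetjmp(fault_env, 1) == 0) {
            fault_armed = 1;
            read_page(dst + i * page_size, src + i * page_size, page_size);
        } else {
            /* We got here via siglongjmp: mark the hole and move on. */
            memset(dst + i * page_size, 0, page_size);
            skipped++;
        }
        fault_armed = 0;
    }

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous handler */
    return skipped;
}

/* Test double, not part of any real dumper: pretends that the page
   starting at simulated_hole is unmapped. */
static const unsigned char *simulated_hole;

static void reader_with_hole(unsigned char *dst,
                             const unsigned char *src, size_t n)
{
    if (src == simulated_hole)
        raise(SIGSEGV);            /* stands in for a real Page Fault */
    memcpy(dst, src, n);
}
```

A real dumper would pass a plain memcpy wrapper as the reader. Filling
skipped pages with zeros keeps the offsets in the dump intact, which
seems preferable to dropping the page, but that is a design guess.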
> The details of the bug depend on the status of Windows' memory management,
> too, it seems. E.g., I failed to reproduce it at all, for several days.
> But for some reason I don't know, it reappeared after another turn-on
> of the machine, and once it has appeared, it happens somewhat reliably
> until shutdown.
It probably depends on exactly what you do after booting.
Try to record everything you do, each command you invoke and in what
order, and reproduce that exactly the next time.