Date: Mon, 23 Apr 2001 20:58:04 +0300
From: "Eli Zaretskii"
Sender: halo1 AT zahav DOT net DOT il
To: djgpp-workers AT delorie DOT com
Message-Id: <7263-Mon23Apr2001205804+0300-eliz@is.elta.co.il>
X-Mailer: Emacs 20.6 (via feedmail 8.3.emacs20_6 I) and Blat ver 1.8.9
CC: Charles Sandmann, n_abing AT ns DOT roxas-online DOT net DOT ph
In-reply-to: (message from Hans-Bernhard Broeker on Mon, 23 Apr 2001 15:03:51 +0200 (MET DST))
Subject: Re: Fixed core dumper in dpmiexcp.c
References:
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

> Date: Mon, 23 Apr 2001 15:03:51 +0200 (MET DST)
> From: Hans-Bernhard Broeker
>
> The most curious one I observed was with two DOS boxes open, in
> Win98 (one with DJGPP environment set up, the other a plain DOS shell, but
> that's not an important detail, I think). Running the test program in one
> of the shells crashed (SIGSEGV, coredump progress level reported to be
> 11), but *only* iff another DOS shell was open. I.e. closing the other DOS
> window, the test program successfully dumped a correct core, opening the
> other window again and repeating the test in the first window caused it to
> crash, again. All the while, running the test in the _other_ window worked
> fine.
>
> I then went on and investigated a bit further. I found that switching to
> Unixy sbrk() algorithm via the crt0 startup flag fixed the bug. With
> non-moving sbrk() in use, the crash usually happened when it tried to dump
> the (large) memory block sitting between the stack and the memory space
> reserved by the stub/crt0.

This is typical of the case where the base address of our DS is near
the upper edge of 4GB, so that it wraps around into the low addresses,
and the DS limit is very large.
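
The wrap-around mentioned above can be illustrated with plain 32-bit arithmetic. This is only a sketch: the helper names `segment_wraps` and `segment_last_linear` are hypothetical and are not taken from dpmiexcp.c, and the base and limit values are assumed to be the ones printed in the crash message.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if a segment with this base and limit
   runs past the 4GB boundary, i.e. base + limit overflows 32 bits. */
int segment_wraps(uint32_t base, uint32_t limit)
{
    return limit > UINT32_MAX - base;
}

/* Hypothetical helper: linear address of the last byte covered by the
   segment.  Unsigned arithmetic is modulo 2^32, which mirrors how the
   addresses wrap around into the low part of the address space. */
uint32_t segment_last_linear(uint32_t base, uint32_t limit)
{
    return base + limit;
}
```

For example, a base of 0xFFF00000 with a limit of 0x001FFFFF wraps, and the last byte of the segment lands at linear address 0x000FFFFF; a base of 0x00400000 with a limit of 0x00FFFFFF does not wrap.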
We had reports in the past of weird crashes in old versions of GCC,
also due to Page Faults, which went away when Unixy sbrk was used.

You should see the difference between these two cases in the segment
base address and limit printed in the crash message.

> I.e: the bug may be related to the fragmented memory layout created by
> non-Unix sbrk. It happens as the coredumper tries to dump a rather large
> memory block (several megabytes, typically) that isn't actually all used
> by the program (the coredump, if it succeeds, is around 500 to 700 KB,
> altogether).

This seems to indicate that some of these pages are not mapped into
the program's address space, or become unmapped at Windows' whim.

Charles, any ideas how this can happen? I understand that
__djgpp_memory_handle_list[] only holds pages that must be mapped into
our address space, right? So touching that memory should never Page
Fault our application; at most it should Page Fault the Windows memory
manager.

If we cannot find any problem in our code, perhaps setting up a
SIGSEGV handler that simply skips a problematic page and longjmp's
back to continue with the other pages would be an acceptable
work-around?

> The details of the bug depend on the status of Windows' memory management,
> too, it seems. E.g., I failed to reproduce it at all, for several days.
> But for some reason I don't know, it reappeared after another turn-on
> of the machine, and once it has appeared, it happens somewhat reliably
> until shutdown.

It probably depends on exactly what you do after booting. Try to
record everything you do, each command you invoke and in what order,
and reproduce that exactly the next time.
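
The SIGSEGV work-around suggested above could look roughly like the sketch below. It is not code from dpmiexcp.c: all names in it (`dump_skipping_bad_pages`, `page_writer`, `demo_write_page`, `DUMP_PAGE_SIZE`) are hypothetical, it assumes the block starts on a page boundary, and it uses the POSIX calls sigaction() and sigsetjmp()/siglongjmp() rather than plain signal() and longjmp(), so the details may need adjusting for the DJGPP library. The demo writer fakes a fault with raise() so the control flow can be tried without a real unmapped page.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction and sigsetjmp on strict compilers */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

#define DUMP_PAGE_SIZE 4096u      /* i386 page size */

/* Hypothetical callback that writes one page of the block to the core file. */
typedef int (*page_writer)(const unsigned char *page, size_t len);

static sigjmp_buf skip_page_env;

/* On a fault, abandon the current page and resume in the dump loop. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(skip_page_env, 1);
}

/* Dump [start, start + len) one page at a time, skipping any page whose
   access raises SIGSEGV.  Returns the number of pages skipped. */
size_t dump_skipping_bad_pages(const unsigned char *start, size_t len,
                               page_writer write_page)
{
    struct sigaction sa, old_sa;
    /* volatile so the values are well defined after siglongjmp */
    volatile size_t off = 0;
    volatile size_t skipped = 0;

    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_sa);

    while (off < len) {
        size_t chunk = len - off < DUMP_PAGE_SIZE ? len - off : DUMP_PAGE_SIZE;

        /* savemask = 1: SIGSEGV is unblocked again after each skip */
        if (sigsetjmp(skip_page_env, 1) == 0)
            write_page(start + off, chunk);
        else
            skipped++;            /* we got here via the handler */
        off += chunk;
    }

    sigaction(SIGSEGV, &old_sa, NULL);   /* restore the previous handler */
    return skipped;
}

/* Demo writer: treats a page whose first byte is 0xFF as "unmapped" and
   simulates the Page Fault with raise(SIGSEGV). */
static size_t pages_written;

static int demo_write_page(const unsigned char *page, size_t len)
{
    (void)len;
    if (page[0] == 0xFF)
        raise(SIGSEGV);
    pages_written++;
    return 0;
}
```

Calling `dump_skipping_bad_pages(buf, sizeof buf, demo_write_page)` on a three-page buffer whose second page starts with 0xFF skips that page and writes the other two. A real version would also want to record which pages were skipped, so that the core file can tell the reader that data is missing there.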