Mail Archives: djgpp-workers/2001/04/24/06:55:14

X-Authentication-Warning: acp3bf.physik.rwth-aachen.de: broeker owned process doing -bs
Date: Tue, 24 Apr 2001 12:54:56 +0200 (MET DST)
From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
X-Sender: broeker AT acp3bf
To: djgpp-workers AT delorie DOT com
cc: Charles Sandmann <sandmann AT clio DOT rice DOT edu>, n_abing AT ns DOT roxas-online DOT net DOT ph
Subject: Re: Fixed core dumper in dpmiexcp.c
In-Reply-To: <7263-Mon23Apr2001205804+0300-eliz@is.elta.co.il>
Message-ID: <Pine.LNX.4.10.10104241240140.5316-100000@acp3bf>
MIME-Version: 1.0
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

On Mon, 23 Apr 2001, Eli Zaretskii wrote:

> > Date: Mon, 23 Apr 2001 15:03:51 +0200 (MET DST)
> > From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
[...]

> > Unixy sbrk() algorithm via the crt0 startup flag fixed the bug.  With
> > non-moving sbrk() in use, the crash usually happened when it tried to dump
> > the (large) memory block sitting between the stack and the memory space
> > reserved by the stub/crt0.
> 
> This is typical of the case when the base address of our DS is near
> the upper edge of 4GB, so that it wraps around into the low addresses,
> and the DS limit is very large.  

Sounds familiar. Sorry, I don't have a crash dump with me (my DJGPP box at
home is not connected to the net), but IIRC the CS/DS base address in
crashing runs (but also in some of the non-crashing ones!) was slightly
above 0x80000000, with a limit of almost 4 GB. So yes: this segment
does wrap around the 4 GB boundary of the linear address space.
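To make the wrap-around concrete: the CPU forms a linear address as DS base + offset, truncated to 32 bits, so with a base just above 0x80000000 every offset past 2^32 - base lands in low linear memory. Below is a minimal C model of that arithmetic (not DJGPP code); the base and limit values in the usage note are hypothetical, chosen only to resemble the ones described above.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address the CPU would form for DS:offset. The sum is computed in
   uint32_t, so it wraps modulo 2^32 just like 32-bit address arithmetic. */
static uint32_t linear_address(uint32_t base, uint32_t limit, uint32_t offset)
{
  assert(offset <= limit);   /* an expand-up data segment faults beyond its limit */
  return base + offset;
}

/* Nonzero if the segment's last byte (base + limit) lies beyond 4 GB, i.e.
   its upper offsets wrap around into low linear addresses. */
static int segment_wraps(uint32_t base, uint32_t limit)
{
  return (uint64_t)base + limit > UINT32_MAX;
}
```

With a hypothetical base of 0x80001000 and limit of 0xFFFF0FFF, offset 0x7FFFEFFF maps to linear 0xFFFFFFFF, while offset 0x80000000 maps to linear 0x00001000: a dumper walking high DS offsets ends up reading low linear memory.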

The crash happened when it tried to dump the 2nd or 3rd of the 64 chunks
(64 KB each) it had computed for that block.
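For scale: 64 chunks of 64 KB add up to a 4 MB block, and the 2nd and 3rd chunks start 64 KB and 128 KB into it, so the fault hit near the start of the block. The dumper's own chunking code isn't quoted here; the helpers below are a hypothetical sketch of that arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE 0x10000u  /* 64 KB, the chunk size mentioned above */

/* Number of 64 KB chunks needed to cover a block of `size` bytes (rounded up). */
static uint32_t chunk_count(uint32_t size)
{
  return size / CHUNK_SIZE + (size % CHUNK_SIZE != 0);
}

/* Byte offset of chunk k (0-based) from the start of the block. */
static uint32_t chunk_offset(uint32_t k)
{
  assert(k < 0x10000u);      /* keep k * 64 KB within 32 bits */
  return k * CHUNK_SIZE;
}
```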

So yes, it may well be that the code computing the actual size of
individual memory blocks is wrong, and thus tries to dump unmapped memory.
As far as I understand the code, it assumes that the whole address range
spanned by __djgpp_memory_handle_list[] is mapped, up to
__djgpp_selector_limit, with no holes. It computes the size of each
block as its distance to the block with the next-higher start
address.
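A sketch of the size computation as described above, written as a plain C model: the block_t type, compute_block_sizes(), and the sample addresses are hypothetical stand-ins, not the actual dpmiexcp.c code. It shows how a hole between two blocks is silently counted as part of the lower block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry of the handle list: only the start
   offset (relative to DS) matters for the size computation. */
typedef struct {
  uint32_t start;
  uint64_t size;   /* filled in by compute_block_sizes() */
} block_t;

static int by_start(const void *a, const void *b)
{
  uint32_t x = ((const block_t *)a)->start;
  uint32_t y = ((const block_t *)b)->start;
  return (x > y) - (x < y);
}

/* Size each block as the distance to the next-higher start address; the
   highest block extends up to selector_limit (inclusive). Any unmapped hole
   between two blocks is counted as part of the lower block. */
static void compute_block_sizes(block_t *blocks, size_t n, uint32_t selector_limit)
{
  size_t i;
  qsort(blocks, n, sizeof *blocks, by_start);
  for (i = 0; i < n; i++)
  {
    uint64_t end = (i + 1 < n) ? blocks[i + 1].start
                               : (uint64_t)selector_limit + 1;
    assert(end >= blocks[i].start);
    blocks[i].size = end - blocks[i].start;
  }
}
```

With blocks at 0x0, 0x10000 and 0x400000 and a limit of 0x40FFFF, the middle block is sized 0x3F0000 bytes even if, say, only its first 64 KB were actually mapped; a dumper trusting that size would read straight into the hole.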

> You should see the difference between these two cases in the segment
> base address and limit printed in the crash message.

Indeed, in Unix-sbrk() mode, the DS limit is a *lot* smaller. I don't
remember clearly whether the base was affected, too...

> This seems to indicate that some of these pages are not mapped into
> the program's address space, or become unmapped at Windows' whim.

That could explain the effect, yes.

> Charles, any idea how this can happen?  I understand that
> __djgpp_memory_handle_list[] only holds pages that must be mapped into
> our address space, right?  So touching that memory should never
> page-fault our application; at most, it should cause a page fault that
> the Windows memory manager handles.

One possible reason why this never causes problems when CWSDPMI is used
might be that CWSDPMI lets the DPMI call in this chunk of code inside the
core dumper (make_decent_memory_block_list) succeed:

  /* Now try the DPMI call; if it works, we can override the previous 
   * data; however, I have yet to find a DPMI server that supports it
   */
  for (i = 0; i < num_mem_blocks; i++)
  {
    __dpmi_meminfo info = { 0, 0, 0 };
    info.handle = __djgpp_memory_handle_list[i].handle;
    if (__dpmi_get_memory_block_size_and_base (&info) != -1)
      if (info.size) mem_block_list[i].size = info.size;
  }

Or maybe CWSDPMI is simply better behaved than Win9x in non-moving sbrk() mode.

> If we cannot find any problem in our code, perhaps setting up a
> SIGSEGV handler that simply skips a problematic page and longjmps
> to continue with the other pages would be an okay work-around?

That makes sense: if there's no other way to detect unmapped pages of
memory, that would be the way out.
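A minimal sketch of the suggested work-around, assuming POSIX sigsetjmp()/siglongjmp() and signal() so it can be tried on a Unix host; whether the same pattern works unchanged inside DJGPP's core dumper is an assumption, not something tested here. dump_pages() is a hypothetical helper: it copies memory page by page and, when a page faults, jumps back, zero-fills that page in the output, and moves on. sigsetjmp() is called with a nonzero savemask so the signal mask in effect before the fault is restored after each jump.

```c
#define _DEFAULT_SOURCE          /* expose sigsetjmp/mmap on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>              /* malloc, for a test buffer */
#include <string.h>
#include <sys/mman.h>            /* mmap, mprotect, for a test mapping */
#include <unistd.h>              /* sysconf(_SC_PAGESIZE) */

static sigjmp_buf probe_env;

/* Fault handler: abandon the faulting copy and resume in dump_pages(). */
static void on_fault(int sig)
{
  (void)sig;
  siglongjmp(probe_env, 1);
}

/* Hypothetical helper: copy npages pages from src to dst (dst must be
   writable). A page whose read faults is zero-filled in dst instead.
   Returns the number of skipped pages. */
static size_t dump_pages(const unsigned char *src, unsigned char *dst,
                         size_t npages, size_t pagesize)
{
  volatile size_t skipped = 0;   /* volatile: must survive siglongjmp */
  volatile size_t i;
  void (*old_segv)(int);
  void (*old_bus)(int);

  assert(pagesize > 0);
  old_segv = signal(SIGSEGV, on_fault);
  old_bus = signal(SIGBUS, on_fault);   /* some systems raise SIGBUS instead */

  for (i = 0; i < npages; i++)
  {
    if (sigsetjmp(probe_env, 1) == 0)
      memcpy(dst + i * pagesize, src + i * pagesize, pagesize);
    else
    {
      memset(dst + i * pagesize, 0, pagesize);
      skipped++;
    }
  }

  signal(SIGSEGV, old_segv);            /* restore previous handlers */
  signal(SIGBUS, old_bus);
  return skipped;
}
```

To try it, map three pages with mmap(), mprotect() the middle one to PROT_NONE, and call dump_pages(); it should report one skipped page. Jumping out of a SIGSEGV handler is not strictly portable C, but it is a common pattern for synchronous faults like this one.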

> > The details of the bug depend on the state of Windows' memory management,
> > too, it seems. E.g., for several days I failed to reproduce it at all.
> > But for some reason I don't know, it reappeared after I switched the
> > machine on again, and once it has appeared, it happens fairly reliably
> > until shutdown.
> 
> It probably depends on what exactly you do after booting.
> Try to record everything you do, each command you invoke and in what
> order, and reproduce that exactly the next time.

Will try. But that's a very tedious procedure, of course :-( And even
then, it still won't help anyone else reproduce the problem on another
machine, due to ever-so-slight differences in the set of installed
software versions.

-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.

  Copyright © 2019   by DJ Delorie     Updated Jul 2019