From: sandmann AT clio DOT rice DOT edu (Charles Sandmann)
Message-Id: <10207201605.AA21517@clio.rice.edu>
Subject: Re: Emacs CVS and Windows NT 4
To: djgpp-workers AT delorie DOT com
Date: Sat, 20 Jul 2002 11:05:38 -0500 (CDT)
Cc: rich AT phekda DOT freeserve DOT co DOT uk
In-Reply-To: <3D396624.53AE0ADE@phekda.freeserve.co.uk> from "Richard Dawe" at Jul 20, 2002 02:31:16 PM
Reply-To: djgpp-workers AT delorie DOT com

> As I've said in another mail, it was crashing on a cli instruction in
> brk_common. 

That (adding a CLI in brk_common) was the fix that worked for me under
Windows 2000 with a small test program.  I didn't have time to test it on NT4.

> I discovered that I wasn't running a "pure" copy of CVS. So I've
> reverted the changes to src/libc/crt0 and rebuilt. 

What changes were those?

> Now it looks like the crash is in crt0.S here:
> 
> 	LCALL(sbrk16_api_ofs) <--- CRASHES HERE

This is a wrapper around the DPMI resize-memory call.  A crash here usually
means a hardware interrupt arrived while the memory block was being moved,
but be careful: you can't step into this call with the debugger.  You
should never get a hardware interrupt there, since we disable them first,
but this call still fails on NT4 and Win2K (it seems OK on XP).  At least
on Win2K, adding the CLI mentioned above fixed it (when not running under
a debugger).  Such a crash usually takes down the entire NTVDM.

> This is with the same pattern of calls to __default_morecore. I get backtraces
> like this, when I hit the crash in brk_common under gdb:
> 
> "Exiting due to signal SIGTRAP
> Debug at eip=00000012
> eax=005e0901 ebx=0000005e ecx=005e0000 edx=005b8000 esi=00590176 edi=01761da0
> ebp=0056ad28 esp=0056acf0 program=C:\DEVELOP\EMACS\BIN\EMACS.EXE
> cs: sel=02cf  base=000221c0  limit=0000026f

CS:EIP points into DOS memory, through a selector whose limit is 0x26f
(623).  That looks like the sbrk16 helper to me.  EIP=12 is the first
executable instruction (if I count right), which should never fail.  This
looks like a debugger-caused crash (perhaps an inability to debug code
that runs under more than one code selector?)
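To double-check the reading of the dump, here is a small sketch of the
arithmetic, using only the CS base, limit and EIP values copied from the
crash dump above.  The helper names are made up for illustration.  It
assumes a byte-granular segment (so the limit is the last valid offset)
and takes 640 KB (linear 0xA0000) as the top of conventional DOS memory.

```c
#include <stdbool.h>

/* Values copied from the "cs:" line and EIP of the crash dump above. */
#define CS_BASE  0x000221c0UL
#define CS_LIMIT 0x0000026fUL   /* last valid offset: 0x26f = 623, so 624 bytes */
#define CS_EIP   0x00000012UL

/* Assumed top of conventional DOS memory (640 KB). */
#define DOS_MEMORY_TOP 0xA0000UL

/* Linear address referred to by a selector base plus an offset. */
static unsigned long linear_address(unsigned long base, unsigned long offset)
{
    return base + offset;
}

/* True if the offset lies inside a byte-granular segment with this limit. */
static bool offset_in_limit(unsigned long offset, unsigned long limit)
{
    return offset <= limit;
}

/* True if a linear address lies below the assumed 640 KB boundary. */
static bool in_conventional_memory(unsigned long linear)
{
    return linear < DOS_MEMORY_TOP;
}
```

With these values CS:EIP resolves to linear 0x221d2, well inside
conventional memory and inside the segment limit, which is consistent with
a small real-mode-area helper rather than a wild jump.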

> It seems to be sensitive to how much keyboard activity there is. I haven't
> worked out the relation between keypresses and crashes yet.

Right: it has to be a hardware interrupt, and it has to arrive between the
moment we resize/move the memory area and the moment the selector is
updated with the new memory base.
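The timing window can be illustrated with a toy model.  This is NOT the
crt0 or sbrk16 code and not how the DPMI host works internally; every name
below is hypothetical.  It only assumes what is said above: the resize may
move the block first, the selector base is updated afterwards, and a
handler that touches memory through the stale selector in between is what
crashes.  use_cli=false stands in for the case where our existing interrupt
disabling does not take effect (NT4/Win2K); use_cli=true models a real CLI,
which defers the IRQ until after the base is fixed up.

```c
#include <stdbool.h>

/* Toy machine state for the sbrk race (hypothetical model). */
typedef struct {
    unsigned long block_base;    /* where the host currently has the block */
    unsigned long selector_base; /* where our selector still points         */
    bool interrupts_enabled;     /* models the interrupt flag (CLI/STI)     */
    bool irq_pending;            /* a hardware IRQ is waiting to be taken   */
    int stale_accesses;          /* times a handler saw a stale base        */
} machine;

/* Interrupt handler that touches program memory through the selector. */
static void run_handler(machine *m)
{
    if (m->selector_base != m->block_base)
        m->stale_accesses++;     /* in real life: fault or corruption */
    m->irq_pending = false;
}

/* Hardware raises an IRQ; it is serviced now only if interrupts are on. */
static void raise_irq(machine *m)
{
    m->irq_pending = true;
    if (m->interrupts_enabled)
        run_handler(m);
}

/*
 * Grow the heap: step 1 resizes (and possibly moves) the block, step 2
 * updates the selector base.  If irq_in_window is set, an IRQ such as a
 * key release arrives between the two steps.
 */
static void grow_heap(machine *m, unsigned long new_base,
                      bool use_cli, bool irq_in_window)
{
    if (use_cli)
        m->interrupts_enabled = false;   /* CLI */

    m->block_base = new_base;            /* step 1: resize/move */
    if (irq_in_window)
        raise_irq(m);
    m->selector_base = new_base;         /* step 2: update base */

    if (use_cli) {
        m->interrupts_enabled = true;    /* STI */
        if (m->irq_pending)
            run_handler(m);              /* deferred IRQ sees a valid base */
    }
}
```

Calling grow_heap() with an IRQ in the window and use_cli=false records
one stale access; the same call with use_cli=true records none, because the
handler only runs after step 2.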

Try the program I provided in my email dated 19 May 2002 15:36; it is
small and easy to debug.  With it, releasing a key (the key-up keyboard
interrupt) very easily triggers an interrupt inside the sbrk and causes a
crash.

> Has anyone got any ideas what's going on? Is there any more information I can
> provide?

See if you can make the simple test program crash under NT 4 with and without
the CLI change in crt0.