Mail Archives: djgpp-workers/2001/04/23/11:09:46.1

From: "The Owl" <theowl AT freemail DOT c3 DOT hu>
To: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
Date: Mon, 23 Apr 2001 17:05:57 +0200
MIME-Version: 1.0
Subject: Re: win2000/ntvdm/djgpp (fwd)
CC: sandmann AT clio DOT rice DOT edu, djgpp-workers AT delorie DOT com
Message-ID: <3AE460F5.18608.15FA3F3@localhost>
References: <3AE33D9C DOT 1078 DOT 1AC82EC3 AT localhost>
In-reply-to: <Pine.SUN.3.91.1010423162856.27048C-100000@is>
X-mailer: Pegasus Mail for Win32 (v3.12c)
Reply-To: djgpp-workers AT delorie DOT com

> This means that Bash is also part of the picture, since the commands 
> involve pipes and quoting.  It might be interesting to see if setting 
> SHELL to point to COMMAND.COM will change the phenomena you observe, 
> because Bash participation in this means you have one more level of 
> nesting of DPMI programs.

anything that makes the nesting level less than 2 below 'make' (i.e.
'make' would spawn 'gcc' directly, not via 'bash') should 'fix' the
problem in that we wouldn't trigger the bug in ntvdm, at least while
'make' executes; once 'make' exits, we get the problem again.

to demonstrate this, try out the following:

1. create a small makefile like this:

-----------------
all:
	-grep 1 dummy
	-grep 2 dummy
-----------------

2. start 'bash'

3. start 'make' at the 'bash' prompt

two things can happen now, depending on what your SHELL is and when
hardware interrupts are going to be reflected into ntvdm.

either 'make' itself crashes some time after the first 'grep' has
returned, or, if 'make' finishes, you will get a crash in 'bash' as soon
as you touch the keyboard.

all this is because the exception stack is gone as soon as the first
'grep' exits (if SHELL is some dpmi app) or when 'make' exits (if SHELL
is eg. command.com), and next time a hardware interrupt is going to be
reflected, you get the fault in ntvdm.

> I can easily produce such a change in dosexec.c and send you the
> diffs, but do you have a convenient way of rebuilding Make with the
> modified libc.a?

yes, i can pretty much rebuild everything, i just did not want to
patch libc/dosexec myself as i am not familiar with the djgpp port
and hence wanted to let the more knowledgeable guys produce the fix.

> (If this works with Make and your cmd1 and cmd2, it will probably make
> sense to rebuild gcc and binutils as well, and maybe Bash, and try some
> deeply nested builds of complicated packages.)

yes, this is right, pretty much everything that statically links to
libc and executes other dpmi apps must be rebuilt ;-(. maybe i can
convince myself and will produce some easier to use workaround which
would be a patch or extension to ntvdm or dosx - i will see how much
time/mood i will have at the next weekend. or if there is someone else
interested in doing it, i would be more than happy to provide the
necessary information. but please note that this approach may raise
some eyebrows at microsoft, so i'd appreciate if someone more familiar
with legal issues would voice his opinion on this (i can imagine that
djgpp could not include such a patch/solution in the official distribution).

> From your description, it sounds like we will still have a (much
> narrower) window of opportunity--between the time NTVDM resets
> _CurrentPSPSelector to zero and the time the parent calls 21/50--where
> any interrupt that has to be reflected will crash.  Is that right?

no. this is because the exception stack gets freed only when a dpmi
app exits while _CurrentPSPSelector is 0, which in turn gets set
to 0 after any dpmi app exits. in plain english, you need *two* dpmi
apps to exit right after each other to get ntvdm to free the exception
stack. if we ensure that _CurrentPSPSelector is never 0 before a
dpmi app exits, we have solved the problem: there will be no chance for
ntvdm to free the exception stack.

what my suggested change would do is that between two such dpmi app
exits there would always be a call to int 21/50 which would set
_CurrentPSPSelector to a non-0 value (and for that matter, it would
be the proper selector of the 'current dpmi app', therefore ntvdm
would free up the right guy's memory on exit).

of course, if a dpmi app itself directly executes other dpmi apps
without using libc, we get the problem - but going around libc
was never guaranteed to have a well-defined (system-independent)
behaviour anyway.

there is a certain kind of window of opportunity in the sense that
_CurrentPSPSelector will be 0 between the child's exit and our
call to set the PSP selector. as far as i can tell, there is no
critical code in ntvdm that could be called during this window.

> Also, I wonder how come this never happened in NT4.  What you describe
> seems like a very basic functionality of NTVDM, which I would not
> expect to change between v4 and v5 of the same OS.  Go figure...

well, i would never 'not expect' something from microsoft, especially
if it is some internal change to one of their officially unsupported
pieces of code... actually, in this case someone wanted to be nice and
decided that he would clean up allocated memory that dpmi apps failed to
free, but at the same time he did not consider that dpmi apps could be
nested. this change
occurred somewhere between nt4 and w2k (i had a brief look at various
ntvdm versions as far back as 3.51). while we are at it, i would like
to add that w2k sp1 does not change ntvdm/dosx, so that explains why the
bug is still there - and i bet it will be present in sp2/3 (whichever
comes out next ;-) as well.

  Copyright © 2019   by DJ Delorie     Updated Jul 2019