From: "The Owl"
To: Eli Zaretskii
Date: Mon, 23 Apr 2001 17:05:57 +0200
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: Re: win2000/ntvdm/djgpp (fwd)
CC: sandmann AT clio DOT rice DOT edu, djgpp-workers AT delorie DOT com
Message-ID: <3AE460F5.18608.15FA3F3@localhost>
References: <3AE33D9C DOT 1078 DOT 1AC82EC3 AT localhost>
In-reply-to:
X-mailer: Pegasus Mail for Win32 (v3.12c)
Reply-To: djgpp-workers AT delorie DOT com

> This means that Bash is also part of the picture, since the commands
> involve pipes and quoting. It might be interesting to see if setting
> SHELL to point to COMMAND.COM will change the phenomena you observe,
> because Bash participation in this means you have one more level of
> nesting of DPMI programs.

anything that makes the nesting level less than 2 below 'make' (i.e.
'make' would spawn 'gcc' directly, not via 'bash') should 'fix' the
problem in that we wouldn't trigger the bug in ntvdm (at least while
'make' executes; once 'make' exits, we get the problem again).

to demonstrate this, try out the following:

1. create a small makefile like this:

-----------------
all:
	-grep 1 dummy
	-grep 2 dummy
-----------------

2. start 'bash'

3. start 'make' at the 'bash' prompt

two things can happen now, depending on what your SHELL is and when
hardware interrupts are going to be reflected into ntvdm. either 'make'
itself crashes some time after the first 'grep' has returned, or, if
'make' finishes, you will get a crash in 'bash' as soon as you touch
the keyboard.

all this is because the exception stack is gone as soon as the first
'grep' exits (if SHELL is some dpmi app) or when 'make' exits (if SHELL
is e.g. command.com), and the next time a hardware interrupt is going
to be reflected, you get the fault in ntvdm.

> I can easily produce such a change in dosexec.c and send you the
> diffs, but do you have a convenient way of rebuilding Make with the
> modified libc.a?

yes, i can pretty much rebuild everything, i just did not want to patch
libc/dosexec myself as i am not familiar with the djgpp port and hence
wanted to let the more knowledgeable guys produce the fix.

> (If this works with Make and your cmd1 and cmd2, it will probably make
> sense to rebuild gcc and binutils as well, and maybe Bash, and try some
> deeply nested builds of complicated packages.)

yes, this is right, pretty much everything that statically links to
libc and executes other dpmi apps must be rebuilt ;-(.

maybe i can convince myself and will produce some easier to use
workaround which would be a patch or extension to ntvdm or dosx - i
will see how much time/mood i will have next weekend. or if there is
someone else interested in doing it, i would be more than happy to
provide the necessary information. but please note that this approach
may raise some eyebrows at microsoft, so i'd appreciate it if someone
more familiar with legal issues would voice his opinion on this (i can
imagine that djgpp could not include such a patch/solution in the
official distribution).

> From your description, it sounds like we will still have a (much
> narrower) window of opportunity--between the time NTVDM resets
> _CurrentPSPSelector to zero and the time the parent calls 21/50--where
> any interrupt that has to be reflected will crash. Is that right?

no. this is because the exception stack gets freed only when a dpmi app
exits while _CurrentPSPSelector is 0, and _CurrentPSPSelector in turn
gets set to 0 after any dpmi app exits. in plain english, you need
*two* dpmi apps to exit right after each other to get ntvdm to free the
exception stack. if we ensure that _CurrentPSPSelector is never 0
before a dpmi app exits, we have solved the problem: there will be no
chance for ntvdm to free the exception stack.

what my suggested change would do is that between two such dpmi app
exits there would always be a call to int 21/50 which would set
_CurrentPSPSelector to a non-0 value (and for that matter, it would be
the proper selector of the 'current dpmi app', therefore ntvdm would
free up the right guy's memory on exit). of course, if a dpmi app
itself directly executes other dpmi apps without using libc, we get the
problem - but going around libc was never guaranteed to have a
well-defined (system-independent) behaviour anyway.

there is a certain kind of window of opportunity in the sense that
_CurrentPSPSelector will be 0 between the child's exit and our call to
set the PSP selector. as far as i can tell, there is no critical code
in ntvdm that could be called during this window.

> Also, I wonder how come this never happened in NT4. What you describe
> seems like a very basic functionality of NTVDM, which I would not
> expect to change between v4 and v5 of the same OS. Go figure...

well, i would never 'not expect' something from microsoft, especially
if it is some internal change to some of their officially unsupported
code... actually, in this case someone wanted to be nice and decided
that he would clean up allocated memory that dpmi apps failed to free,
but at the same time he did not consider that dpmi apps could be
nested. this change occurred somewhere between nt4 and w2k (i had a
brief look at various ntvdm versions as far back as 3.51).

while we are at it, i would like to add that w2k sp1 does not change
ntvdm/dosx, so that explains why the bug is still there - and i bet it
will be present in sp2/3 (whichever comes out next ;-) as well.