Message-ID: <4A293237.2010102@cwilson.fastmail.fm>
Date: Fri, 05 Jun 2009 10:56:55 -0400
From: Charles Wilson
To: cygwin AT cygwin DOT com
Subject: Re: ASLR sometimes stops working on Vista with 1.7? [was: Re: Cygwin 1.7 release (was ...)]
In-Reply-To: <20090605120936.GD23519@calimero.vinschen.de>

Corinna Vinschen wrote:
> I can reproduce the "unable to remap" on W7RC by running `cygport
> automake1.11-1.11-10 compile'.

Uhhh...I'm glad to hear that?  Or not...

> The culprit in my case is always the
> same DLL, a run-time loaded perl DLL called Cwd.dll.  Even after
> rebaseall, it still doesn't work because the Windows Loader tries to
> load the DLL into an entirely different address.

You did reboot, right?  IIRC Windows only calculates the new base
address(es) for dynbase-marked DLLs loaded after a boot.
Also, to "turn off" dynbase for a particular DLL, you have to reboot
after removing the dynbase flag, so that the OS "forgets" about the
randomized Image Base it computed for the current boot session, while
the DLL still had the +dynbase flag.

> When examining the memory layout of the parent, it stands out that
> Cwd.dll was already loaded into another address than the DLLs base
> address.  The base addr of Cwd.dll is 0x6ee00000, the end address would
> be 0x6ee08000.

Is that the ASLR-computed Image Base, or the one reported by objdump
from the file on disk?  See below.

> There's no other DLL in this memory area according to
> the memory map.  Nevertheless the DLL has been loaded into the rather
> low address 0xa00000 in the parent.  When trying to map this DLL into
> the same address in the child, it fails.

Right.  ASLR isn't putting the DLL in the correct location *in the
parent*.  (See below).

> When I rebase Cwd.dll to some other address like 0x65000000, then it
> works for me.
>
> Probably the memory at 0x6ee00000 is actually used by some Windows DLL
> at that time.  The fact that the DLL got rebased already in the parent
> is not exactly surprising, just very annoying.
>
> I don't think that this has anything to do with ASLR.  It's not the way
> ASLR is documented to work.  Setting or resetting the ASLR flag should
> have no effect from all I can tell.  If anything, setting the ASLR
> flag in the executable should make things worse in case of fork().

ASLR works on DLLs, not EXEs IIRC.  peflagsall by default doesn't even
set +dynbase on EXEs.

  echo "Note: peflagsall will NOT set the dynamicbase flag on executables, nor will"
  echo "      it set the tsaware flag on dlls. If you must do this, use peflags itself"

At each boot, an entirely new random starting location is computed (call
it the "uber-base" address).  Then, as each dynbase-marked DLL is first
loaded into memory for this boot session, it is "rebased" to the next
available memory location starting at the "uber-base" address.
These new base addresses are remembered for each DLL.  In effect, the OS
"pretends" that the Image Base is the new randomized Base address;
obviously for this initial load of any given dynbase-marked DLL, the
Base address will be equal to the (random) Image Base.  Since the new
(random) Image Base is hopefully NOT the same as the actual, on-disk
Image Base, the DLL will have had relocations updated even though a
Process Viewer will report that the in-memory DLL's Base addr == its
"Image Base" -- it's just that the (new) Image Base is the ASLR-directed
lie.

So, *after* a reboot following marking cygwin DLLs with dynbase, IF you
have *ever* successfully loaded Cwd.dll into memory, then that "slot" --
the new "random base address" for Cwd -- is forever [*] reserved for
Cwd.dll and no other (dynbase-marked) DLL, including windows ones,
should ever be allowed by the OS to conflict with that reserved slot.
At least, until you run out of ASLR-tracked memory addrs.

You can tell what the ASLR-reserved base address for a DLL is by looking
at a running process that has loaded the DLL (e.g. in the Sysinternals
process viewer) and checking what PV reports as the Image Base.  (It
ought to match the reported 'Base' address, if ASLR is working).
However, this 'Image Base' will be *different* from what objdump reports
when examining the DLL on disk.  That's your proof that ASLR has
computed a new (random) base address for this DLL, for this boot
session: the claimed Image Base of an in-memory DLL doesn't match the
objdump-reported Image Base of the DLL on disk.

[*] at least until the next reboot.

> This is entirely the good old fork() problem trying to get the memory
> layout of the child into the same shape as in the parent.

Right -- but the problem is that ASLR is not setting up the memory
layout the way that it promised to, in the parent.

> This is really a bad problem since it seem to have gotten even worse
> with W7.

Crap.
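
For readers who want the slot idea spelled out, here is a minimal toy
model in Python of the behaviour described above.  It is a sketch of the
description in this mail, not of the real Windows algorithm: the class
BootSession, the function remap_failures, the 64 KiB granularity and the
0x60000000 starting range are all assumptions made for illustration.
Only the addresses 0x6ee00000, 0x8000 (size) and 0xa00000 come from the
thread.

```python
import random

GRANULARITY = 0x10000  # assumed 64 KiB allocation granularity


def round_up(size, gran=GRANULARITY):
    """Round a DLL size up to a whole number of slots."""
    return (size + gran - 1) // gran * gran


class BootSession:
    """Hypothetical per-boot table: one remembered base address per DLL."""

    def __init__(self, seed):
        rng = random.Random(seed)
        # The "uber-base": a random, aligned start chosen once per boot.
        self.next_free = 0x60000000 + rng.randrange(256) * GRANULARITY
        self.slots = {}

    def load(self, name, size):
        """Return the session-wide base for `name`; assign it on first load."""
        if name not in self.slots:
            self.slots[name] = self.next_free
            self.next_free += round_up(size)
        return self.slots[name]


def remap_failures(parent_map, session, sizes):
    """List DLLs a forked child could not place at the parent's address."""
    return [name for name, base in parent_map.items()
            if session.load(name, sizes[name]) != base]


SIZES = {"Cwd.dll": 0x8000}
ON_DISK_BASE = {"Cwd.dll": 0x6EE00000}  # what objdump would report

session = BootSession(seed=2009)
aslr_base = session.load("Cwd.dll", SIZES["Cwd.dll"])

# The "proof" from the text: in-memory Image Base differs from on-disk.
print(hex(aslr_base), "in memory vs", hex(ON_DISK_BASE["Cwd.dll"]), "on disk")

healthy_parent = {"Cwd.dll": aslr_base}   # parent got the reserved slot
broken_parent = {"Cwd.dll": 0x00A00000}   # parent got the low address
print(remap_failures(healthy_parent, session, SIZES))
print(remap_failures(broken_parent, session, SIZES))
```

In this model a parent that holds Cwd.dll in its reserved slot forks
cleanly, while a parent that somehow ended up with the DLL at 0xa00000
produces a remap failure, which is the situation reported in this
thread.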

> I think I'm going to ask MSFT if there's any workaround for
> this problem.

If my understanding of ASLR is correct, then ASLR *ought* to have solved
this problem, except for systems with a LOT of dynbase-marked DLLs that
have been loaded during the same boot session, such that you "run out"
of ASLR-tracked addresses (the ASLR mappings are shared across all
processes and are persistent for the entire logon session, I think -- so
you could eventually run out).  But IMO it is not working, for some
reason, with the perl DLLs.

Note that it's not always Cwd.dll.  If you reboot, rename Cwd.dll to
something else, and keep going, a few things will happen:

1) perl won't work quite right, because the Cwd.dll really is needed by
   the scripts that 'use Cwd;'
2) ignoring that, keep going.  Eventually the remap problem will hit
   another perl DLL.  In my case, Posix.dll.

I don't think this is the fault of these DLLs, per se.  It's just that
perl has a lot of 'em, and when repetitively running the autotools it
always uses the same set.  Plus, you tend to have a TON of individual
perl processes that run, and each time, that perl is going to dlopen
those DLLs.  This ought to be where ASLR shines...but one of those
times, the DLL gets LoadLibraried to the wrong location.  This is fine,
until THAT process happens to fork.  Bang, you're dead.

Could it be possible that cygwin's dlopen (or fork) implementation is
doing something that occasionally defeats ASLR, such that eventually a
perl parent process [**] dlopen's Cwd.dll at the wrong memory location?

[**] obviously this perl "parent" process was itself invoked as a
fork/exec from, say, bash, but we've long since gotten past the exec()
for perl, if we're down to dlopening DLLs needed by virtue of 'use'
statements in a particular .pl script

Hmmm...what if it's a race condition in fork/exec during a chain of
perl's?  Let's take a look at what happens in autoreconf...(note that
this is all supposition.
I hope it is accurate, and believe it is reasonably so, but I haven't
explicitly straced the process)

perl1 -- This one was fork/exec'ed by bash.  So, after fork() you have a
child process whose memory looks just like the parent bash, with all of
its DLLs.  Then, you exec() perl, which eventually causes CreateProcess
to do its thing -- the windows loader loads cygwin1.dll,
cygperl5_10.dll, and various dependent DLLs (which do NOT include any of
the little perl DLLs like Cwd).  Eventually, we get to perl's main(),
and it parses some script.  First thing it does is see some 'use'
statements that (may) force it to dlopen some "little" perl DLLs like
Cwd.  This works as expected, dynbase and all.

Then, suppose the script that perl1 is interpreting (autoreconf, in this
case) has a fork().  Maybe to run an OS command, or one of the other
autotools like aclocal (here's an example from autoreconf-2.63):

  xsystem ("$aclocal $flags");

xsystem is an AutoM4te function that eventually calls "system (@_)",
where system() is obviously implemented by cygperl5_10.dll

Now, aclocal is also a perl script.  So, deep in the bowels of perl's
system() implementation, first thing that happens is a call to cygwin's
fork() implementation -- cygwin successfully reproduces the memory
layout of the autoreconf perl1, Cwd.dll and all.  But, this is in fact a
different process than perl1 -- call it perl2.

Then, there's an exec() in perl's system() implementation.  Cygwin
eventually figures out that aclocal is a perl script, with a #!/bin/perl
shebang, and realizes it needs to CreateProcess perl.exe with a command
line containing the original argv[0].  In this third process (perl3),
only cygwin1.dll, cygperl5_10.dll, and dependencies are initially loaded
(by the windows Runtime Loader).  NOT Cwd.dll or any of the "little"
DLLs that both perl1 and perl2 have.

Then some magic [A] happens, and perl2 goes away, and cygwin magically
connects perl1 and perl3 as parent and child.
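
The perl1 -> perl2 -> perl3 chain described above can be sketched as a
small Python model that tracks only which DLLs each process has mapped.
Everything here is hypothetical bookkeeping for illustration: Proc,
spawn, fork and dlopen are made-up helpers, not Cygwin or Windows APIs,
and the base addresses are placeholder values.

```python
from dataclasses import dataclass, field

# DLLs the Windows loader maps at process start, per the description above.
STARTUP_DLLS = ("cygwin1.dll", "cygperl5_10.dll")

# Placeholder base addresses, chosen only so the model has something to copy.
BASES = {
    "cygwin1.dll": 0x61000000,
    "cygperl5_10.dll": 0x62000000,
    "Cwd.dll": 0x63000000,
}


@dataclass
class Proc:
    name: str
    dlls: dict = field(default_factory=dict)  # DLL name -> base address


def spawn(name):
    """Model exec()/CreateProcess: a fresh image with only startup DLLs."""
    return Proc(name, {dll: BASES[dll] for dll in STARTUP_DLLS})


def fork(parent, name):
    """Model cygwin's fork(): the child reproduces the parent's layout."""
    return Proc(name, dict(parent.dlls))


def dlopen(proc, dll):
    """Model a run-time load triggered by a 'use' statement."""
    proc.dlls[dll] = BASES[dll]


perl1 = spawn("perl1")          # autoreconf, started from bash
dlopen(perl1, "Cwd.dll")        # 'use Cwd;'
perl2 = fork(perl1, "perl2")    # first half of system()
perl3 = spawn("perl3")          # exec of aclocal: new perl.exe image

for p in (perl1, perl2, perl3):
    print(p.name, sorted(p.dlls))
```

The point of the model is only that perl2 inherits Cwd.dll from perl1
while perl3 starts without it and has to dlopen it again on its own.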

Next, perl3 starts parsing the aclocal script...and eventually hits the
'use' statements, including 'use Cwd;'  THIS time, however, when it
dlopen's Cwd.dll the library is loaded into the "wrong" address in
perl3's virtual memory layout.  Now, this is not an immediate problem,
because we're not (yet) trying to "match" any existing memory layout.

The question is -- WHY does it happen?  I'm wondering if somewhere in
the magic [A] above, in the perl2 process, cygwin is marking the memory
in perl3 at the location of the dlopen'ed libraries in perl1/2 as used,
and hasn't yet gotten around to realizing that those locations are, in
fact, not used by perl3?  So that when dlopen() is called, the virtual
memory map of perl3 has the memory needed by Cwd.dll (that is, the
ASLR-computed "fake" Image Base) marked as used, such that when dlopen
eventually delegates to LoadLibrary, LoadLibrary has no choice but to
find somewhere else to put it.

If so, this would be a possible problem for ANY dll dlopened by both
parent (perl1) and child (perl3).  Given the sporadic nature, I'm
wondering about a race condition in the fork/exec [A] magic...

In that case, it wouldn't be ASLR's fault.  It (and the apparent
sensitivity of cygwin-1.7 to the issue as compared to cygwin-1.5) could
be explained by a combination of changes in cygwin-1.7's fork()
implementation, coupled with Vista/W7's CreateProcess not behaving in
exactly the way we expect given XP and older's behavior. [***]

Anyway, perl3 (the aclocal process) continues merrily until it, too,
hits a fork/exec -- maybe because of this line:

  xsystem ('cp', $src, $dest);

At this point, it's a standard remap problem: cygwin does the fork()
which creates perl4, and while manually loading all the DLLs currently
in use by perl3 and trying to ensure the memory map of perl3 and perl4
match, Cwd.dll is loaded -- perhaps into the "correct" (ASLR-directed)
Image Base.  But perl3 had Cwd in the "wrong" (low) address...and bang.
We never get to the exec("cp",...) step in perl4, and never try to
create the cp5 process.

[***] if this sounds reasonable, I'll take a look at the 1.7 changes in
fork() but it'll have to wait until Sunday or Monday 'cause of other
commitments.  Unless somebody beats me to it.

--
Chuck