Date: Fri, 5 Jun 2009 18:35:10 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: ASLR sometimes stops working on Vista with 1.7? [was: Re: Cygwin 1.7 release (was ...)]
Message-ID: <20090605163510.GF23519@calimero.vinschen.de>
In-Reply-To: <4A293237.2010102@cwilson.fastmail.fm>
References: <1244131746 DOT 30024 DOT 1318796263 AT webmail DOT messagingengine DOT com>
 <4A282063 DOT 9030804 AT users DOT sourceforge DOT net>
 <4A286B99 DOT 6020702 AT users DOT sourceforge DOT net>
 <20090605120936 DOT GD23519 AT calimero DOT vinschen DOT de>
 <4A293237 DOT 2010102 AT cwilson DOT fastmail DOT fm>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com

[Caution, another long reply.  For those just looking for a simple
workaround, skip to line 86 of the mail body.]

On Jun  5 10:56, Charles Wilson wrote:
> Corinna Vinschen wrote:
> > I can reproduce the "unable to remap" on W7RC by running `cygport
> > automake1.11-1.11-10 compile'.
>
> Uhhh...I'm glad to hear that?  Or not...
>
> > The culprit in my case is always the
> > same DLL, a run-time loaded perl DLL called Cwd.dll.  Even after
> > rebaseall, it still doesn't work because the Windows Loader tries to
> > load the DLL into an entirely different address.
>
> You did reboot, right?  IIRC Windows only calculates the new base

No.  Because *none* of my DLLs is marked to be ASLR compatible.  I'm
testing what happens OOTB.
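
For readers who want to check what "marked to be ASLR compatible" means
for a given DLL: the mark is a single bit in the PE optional header,
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040) in the DllCharacteristics
field.  The following is only a minimal sketch of such a check, not a
tool from the Cygwin distribution; the function name is_dynamic_base is
hypothetical, and the input is assumed to be the raw bytes of a PE file.

```python
import struct

# Bit in the PE optional header's DllCharacteristics field that marks an
# image as relocatable by ASLR.
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040


def is_dynamic_base(image: bytes) -> bool:
    """Return True if the PE image bytes carry the dynamic-base flag.

    Hypothetical helper for illustration only.
    """
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and the 20-byte
    # COFF header.  DllCharacteristics sits at offset 70 within it, for
    # both PE32 and PE32+ images.
    optional_header = pe_offset + 4 + 20
    (characteristics,) = struct.unpack_from("<H", image, optional_header + 70)
    return bool(characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
```

To use it, read a DLL with open(path, "rb").read() and pass the bytes
to is_dynamic_base.  A result of False corresponds to the out-of-the-box
situation described here.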

The entire problem already starts in the parent when the DLL base
address is 0x6ee00000.  The parent inevitably rebases the DLL to a very
low address like 0xa00000 or even 0x900000 at load time.  The child then
fails to map the DLL to the same address.  However, if I rebase the DLL
to some other spot, like 0x65000000, then the DLL is loaded at exactly
that address, and everything works fine.

I still don't think this has anything to do with ASLR.  ASLR only
complicates the picture.  AFAICS, there's no guarantee that the address
computed by ASLR will help forever.  It only eases the underlying
problem by chance if the addresses happen to have a low chance of
collision.

The problem is not the fact that the DLL is rebased at all in the
parent.  Even though in my case the address range 0x6ee00000-0x6ee08000
isn't taken by another DLL, it could be taken by memory dynamically
allocated by one of the previously loaded DLLs.

The real shit starts with the fact that W7 (and Vista, too, apparently)
rebases the DLL to an address which is so very low in the address space
of the application.  This is uncomfortably near to where the process
heap is expected to be.

When Vista was new, we already had the problem in a somewhat different
way.  Note this comment in Cygwin's heap.cc:

  /* For some obscure reason Vista and 2003 sometimes reserve space after
     calls to CreateProcess overlapping the spot where the heap has been
     allocated.  This apparently spoils fork.  The behaviour looks quite
     arbitrary.  Experiments on Vista show a memory size of 0x37e000 or
     0x1fd000 overlapping the usual heap by at most 0x1ed000.  So what
     we do here is to allocate the heap with an extra slop of (by default)
     0x400000 and set the appropriate pointers to the start of the heap
     area + slop.  A forking child then creates its heap at the new start
     address and without the slop factor.
     Since this is not entirely foolproof we add a registry setting
     "heap_slop_in_mb" so the slop factor can be influenced by the user
     if the need arises. */

This problem with dynamically loaded DLLs looks quite similar.  Some
space after the heap is reserved in the child which wasn't reserved in
the parent.  If Vista/W7 refrained from using the lowest available
address in the parent in the first place, the entire problem might go
away (aka "occur very, very seldom").

> > I think I'm going to ask MSFT if there's any workaround for
> > this problem.
>
> If my understanding of ASLR is correct, then ASLR *ought* to have solved
> this problem, except for systems with a LOT of dynbase-marked DLLs that
> have been loaded during the same boot session, such that you "run out"
> of ASLR-tracked addresses (The ASLR mappings are shared across all
> processes, are persistent for the entire logon session, I think -- so
> you could eventually run out).

As I mentioned above, I don't think that ASLR can solve this problem
once and for all.  Wherever the ASLR mechanism rebases a DLL to, there's
a chance that the address is already taken in the child when LoadLibrary
is called for the dynamically loaded DLL.

> But IMO it is not working, for some reason, with the perl DLLs.  Note
> that it's not always Cwd.dll.  If you reboot, rename Cwd.dll to
> something else, and keep going, a few things will happen:
> 1) perl won't work quite right, because the Cwd.dll really is needed by
> the scripts that 'use Cwd;'
> 2) ignoring that, keep going.  Eventually the remap problem will hit
> another perl DLL.  In my case, Posix.dll.

Here's another thought: I examined the address layout of the perl
process again, and it struck me as weird that the base addresses of all
the DLLs which get dynamically loaded by perl are so close together.  It
looks like the problem is actually aggravated by the order in which the
DLLs are rebased by rebaseall, and the order in which the DLLs are
loaded into the running process.
Some perl DLL (Dumper.dll?) allocates additional memory, and that memory
sits right after its own image.  That's where Cwd.dll is based.  Cwd.dll
gets rebased and ... poof.

What I did then was to change the offset passed to rebaseall:

  ash$ rebaseall -o 0x20000    (default is 0x10000)

Then I reinstalled /bin/cyggmp-3.dll and reran cygport.  This time it
ran fine.  This is still w/o ASLR flags.  In this configuration, cygport
runs successfully every time.

> Could it be possible that cygwin's dlopen (or fork) implementation is
> doing something that occasionally defeats ASLR, such that eventually a
> perl parent process [**] dlopen's Cwd.dll at the wrong memory location?

Not that I can see.  The data describing the loaded DLLs is read from
the parent's memory into a stack slot.  There's no other memory
allocation going on.  Well, except when LoadLibrary has already failed.

> [**] obviously this perl "parent" process was itself invoked as a
> fork/exec from, say, bash, but we've long since gotten past the exec()
> for perl, if we're down to dlopening DLLs needed by virtue of 'use'
> statements in a particular .pl script
>
> Hmmm...what if it's a race condition in fork/exec during a chain of
> perl's?  Let's take a look at what happens in autoreconf...(note that
> this is all supposition.  I hope it is accurate, and believe it is
> reasonable so, but I haven't explicitly straced the process)

What I see only affects one single perl parent and the forked child.
Not a single perl process involved had the dynamically loaded DLLs
loaded at the correct (aka "desired") spot in memory.


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
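
The effect of the larger rebaseall offset described above can be
illustrated with a toy model.  This is only a sketch under loud
assumptions, not the actual algorithm of the rebase tool: DLLs are
assumed to be placed top-down from a start address, image sizes are
assumed to be rounded up to 64 KiB, and the offset is assumed to act as
a free gap above each image.  The Cwd.dll size of 0x8000 is taken from
the address range quoted in the mail; the Dumper.dll size, the start
address 0x70000000, and the 0x18000 extra allocation are hypothetical
values chosen for illustration.

```python
GRANULARITY = 0x10000  # assumed 64 KiB alignment of image bases


def round_up(value, align=GRANULARITY):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def layout_top_down(dlls, top, offset):
    """Assign bases downward from top, leaving offset bytes above each image.

    dlls is a list of (name, image_size) pairs in rebase order.
    Returns {name: (base, end)} with end exclusive.  Simplified model only.
    """
    bases = {}
    cursor = top
    for name, size in dlls:
        base = cursor - round_up(size) - offset
        bases[name] = (base, base + size)
        cursor = base
    return bases


def overlaps(a, b):
    """True if the half-open ranges a and b share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def extra_alloc_hits(dlls, top, offset, allocator, victim, alloc_size):
    """Does memory allocated right after allocator's image reach victim's base?"""
    bases = layout_top_down(dlls, top, offset)
    start = round_up(bases[allocator][1])
    return overlaps((start, start + alloc_size), bases[victim])


# Hypothetical scenario: Dumper.dll is loaded first and grabs 0x18000
# bytes right behind its image; Cwd.dll is based directly above it.
DLLS = [("Cwd.dll", 0x8000), ("Dumper.dll", 0x9000)]
```

With these made-up numbers,
extra_alloc_hits(DLLS, 0x70000000, 0x10000, "Dumper.dll", "Cwd.dll", 0x18000)
returns True, while the same call with an offset of 0x20000 returns
False: the wider gap leaves room for the extra allocation, so the
preferred base of Cwd.dll stays free.  A sufficiently large allocation
would still collide, which matches the observation that the bigger
offset only makes the collision less likely.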