From: "James Johnston"
Subject: Two probable basing issues causing fork failures: (1) cygreadline7.dll has ASLR enabled, (2) default base address conflicts with ASLR-relocated/system DLLs
Date: Fri, 20 Apr 2012 17:44:38 -0000
Message-ID: <00f201cd1f1d$43430230$c9c90690$@motionview3d.com>
Mail-Followup-To: cygwin AT cygwin DOT com

Hi,

Before I describe the issues I think I've found, I'd like to clarify my understanding. In the past, I have periodically run into random failures when starting new programs from the bash shell. The errors are always random and cryptic ("bad address" and the like). Today this came to a head on one installation of Cygwin 1.7.9, where simply opening the bash shell produced a large number of these errors because commands in /etc/profile were failing.

From reading this mailing list, my understanding is that this is an issue with the basing of DLLs - is that correct? Specifically, Cygwin requires that a child process created by fork() have its Cygwin DLLs loaded into its address space at exactly the same addresses as in the parent process, correct? And is it also correct that Cygwin does not care where Windows / operating system DLLs are loaded, provided that they do not conflict with Cygwin DLLs or any heap/stack allocated by the Cygwin parent process?

I assume the answer to all these questions is yes.
Please correct me if I am wrong!

So, I then decided to analyze the address space of bash.exe to see if I could determine the source of the problem, using VMMap (http://technet.microsoft.com/en-us/sysinternals/dd535533) to see where each DLL is loaded. (ListDLLs, another Sysinternals tool, is handy too.) I checked it against a clean Cygwin installation with the default packages, which I installed today in a clean virtual machine. The issues still exist in that virtual machine, so I don't think they are specific to my particular configuration. In both cases, rebaseall was run as part of setup.

An analysis of bash.exe in VMMap on a clean Windows 7 computer shows the following PE images loaded in bash's address space (I have cut out most of the operating system DLLs at the end of the address space):

* 0x00040000: C:\Windows\System32\apisetschema.dll (ASLR)
* 0x00400000: C:\cygwin\bin\bash.exe
* 0x61000000: C:\cygwin\bin\cygwin1.dll
* 0x6F4A0000: C:\Windows\SysWOW64\winrnr.dll (ASLR)
* 0x6F500000: C:\cygwin\bin\cygncursesw-10.dll
* 0x6F710000: C:\cygwin\bin\cygreadline7.dll (ASLR)
* 0x6F740000: C:\Windows\SysWOW64\pnrpnsp.dll (ASLR)
* 0x6F760000: C:\Windows\SysWOW64\NapiNSP.dll (ASLR)
* 0x6F780000: C:\cygwin\bin\cygintl-8.dll
* 0x6F7E0000: C:\cygwin\bin\cygiconv-2.dll
* 0x6FA80000: C:\cygwin\bin\cyggcc_s-1.dll
* 0x73720000: C:\Windows\System32\wow64cpu.dll (ASLR)
* .... rest of system DLLs ....

Here is the same thing, but on a Windows XP computer (remember, XP does not support ASLR):

* 0x00400000: C:\cygwin\bin\bash.exe
* 0x61000000: C:\cygwin\bin\cygwin1.dll
* 0x629C0000: C:\WINDOWS\system32\lpk.dll
* 0x6FB70000: C:\cygwin\bin\cygreadline7.dll
* 0x6FC20000: C:\cygwin\bin\cygncursesw-10.dll
* 0x6FDA0000: C:\cygwin\bin\cygintl-8.dll
* 0x6FDB0000: C:\cygwin\bin\cygiconv-2.dll
* 0x6FF90000: C:\cygwin\bin\cyggcc_s-1.dll
* 0x71A50000: C:\WINDOWS\system32\mswsock.dll
* .... rest of system DLLs ....
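
The Windows 7 listing above can also be checked mechanically. What follows is only a minimal sketch in Python: the addresses, paths, and ASLR markers are copied from the listing, while the function names (is_cygwin, aslr_cygwin_images, interleaved_system_images) are made up for illustration and are not part of any Cygwin or Sysinternals tool. It reports which Cygwin images carry the ASLR marker and which system images sit between the lowest and highest Cygwin DLL base.

```python
# Windows 7 listing: (base address, path, ASLR marker as shown by VMMap)
IMAGES = [
    (0x00040000, r"C:\Windows\System32\apisetschema.dll", True),
    (0x00400000, r"C:\cygwin\bin\bash.exe", False),
    (0x61000000, r"C:\cygwin\bin\cygwin1.dll", False),
    (0x6F4A0000, r"C:\Windows\SysWOW64\winrnr.dll", True),
    (0x6F500000, r"C:\cygwin\bin\cygncursesw-10.dll", False),
    (0x6F710000, r"C:\cygwin\bin\cygreadline7.dll", True),
    (0x6F740000, r"C:\Windows\SysWOW64\pnrpnsp.dll", True),
    (0x6F760000, r"C:\Windows\SysWOW64\NapiNSP.dll", True),
    (0x6F780000, r"C:\cygwin\bin\cygintl-8.dll", False),
    (0x6F7E0000, r"C:\cygwin\bin\cygiconv-2.dll", False),
    (0x6FA80000, r"C:\cygwin\bin\cyggcc_s-1.dll", False),
    (0x73720000, r"C:\Windows\System32\wow64cpu.dll", True),
]


def is_cygwin(path):
    """Hypothetical helper: treat anything under C:\\cygwin\\ as a Cygwin image."""
    return path.lower().startswith("c:\\cygwin\\")


def aslr_cygwin_images(images):
    """Cygwin images that VMMap marked as ASLR-relocated."""
    return [path for _, path, aslr in images if aslr and is_cygwin(path)]


def interleaved_system_images(images):
    """System images based between the lowest and highest Cygwin DLL base."""
    dll_bases = [
        base for base, path, _ in images
        if is_cygwin(path) and path.lower().endswith(".dll")
    ]
    lo, hi = min(dll_bases), max(dll_bases)
    return [
        path for base, path, _ in sorted(images)
        if lo < base < hi and not is_cygwin(path)
    ]


if __name__ == "__main__":
    print("Cygwin images with ASLR:", aslr_cygwin_images(IMAGES))
    print("System images inside the Cygwin DLL range:")
    for path in interleaved_system_images(IMAGES):
        print("  ", path)
```

The same functions could be pointed at the XP listing by swapping in its tuples with the ASLR marker set to False throughout.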

Now, right off I see two problems with the Cygwin DLLs; here they are, with some possible solutions:

1. cygreadline7.dll was linked with /DYNAMICBASE, as indicated by the ASLR flag (you can confirm this with the Visual C++ dumpbin command). See http://msdn.microsoft.com/en-us/library/bb384887.aspx for more. What's really important to note is that this is turned ON by default in newer Visual C++ linkers. I'm not sure why this flag was set, but perhaps the package maintainer forgot to turn it off?

Anyway, the DYNAMICBASE flag means that the operating system is free to rebase the DLL anywhere it likes: a sure recipe for disaster if fork() requires the DLL to be loaded in the same place every time! In practice this is not usually an issue, because the operating system only rebases the image once so that its memory pages can still be shared across multiple processes. But I don't think that is set in stone: it is free to base the DLL in one place for one process and somewhere else for a different process (I have observed this on the problem computer where bash couldn't even run /etc/profile).

It seems to me that the cygreadline7 maintainer needs to make sure this flag is turned off. And maybe other EXE or DLL files in Cygwin have this flag turned on? I haven't checked. In that case, perhaps the rebaseall command could also turn the DYNAMICBASE flag off while it is doing the rebase? Additionally, rebaseall could operate on EXE files too, just to turn off the DYNAMICBASE flag. That way there would be no concern about package maintainers who forget to turn it off.

2. Cygwin DLL basing seems to start at 0x61000000 and continue until almost 0x70000000. This is too close to the upper end of the address space: it has a high risk of collision with operating system DLLs and ASLR-relocated images. In my opinion, starting the rebase at an address like 0x30000000 or 0x40000000 would be a lot safer.
You can see that some system DLLs are being placed in the same general address range that Cygwin is using. It's just by luck that there wasn't a collision! Are there good reasons for not rebasing at a lower address? In general, I have observed ASLR-relocated DLLs as low as between 0x50000000 and 0x60000000 on my main development computer, which has a lot of software installed. Several Windows XP system DLLs (i.e. no ASLR) are placed in the 0x60000000 range as well (one example, lpk.dll at 0x629C0000, is shown above).

Thoughts, anyone?

Best regards,

James Johnston
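
For readers without dumpbin at hand, the DYNAMICBASE flag discussed in item 1 can also be read straight from the PE header. The following is a minimal sketch in Python, not the way rebaseall or any Cygwin tool works: it relies on the documented PE layout (the PE header offset stored at 0x3C, a 20-byte COFF header, and the DllCharacteristics field at offset 70 of the optional header) and on the documented flag value 0x0040. The function names are made up for illustration.

```python
import struct
import sys

# Documented PE/COFF flag: the image may be relocated by ASLR.
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040


def dll_characteristics(data):
    """Return the DllCharacteristics field from the raw bytes of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    optional_header = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", data, optional_header)
    if magic not in (0x10B, 0x20B):  # PE32 or PE32+
        raise ValueError("unknown optional header magic")
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+.
    (flags,) = struct.unpack_from("<H", data, optional_header + 70)
    return flags


def has_dynamic_base(data):
    """True if the image is marked as relocatable by ASLR."""
    return bool(dll_characteristics(data) & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)


if __name__ == "__main__":
    # Usage: pass one or more EXE/DLL paths on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            marked = has_dynamic_base(f.read())
        print(path, "DYNAMICBASE set" if marked else "DYNAMICBASE not set")
```

Run against the files in C:\cygwin\bin, a script like this would list any images besides cygreadline7.dll that carry the flag, which is the check the message above leaves open.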