From: Jon TURNEY
Date: Sat, 12 Mar 2011 18:03:50 +0000
To: cygwin AT cygwin DOT com
Subject: Re: Debugging help for fork failure: resource temporarily unavailable

On 07/03/2011 15:10, Ryan Johnson wrote:
> Actually, a follow-up question: what is the difference between the fork(e.g.
> resource unavailable) failures vs. the errors about 'failed to remap dll...'
> ? Looking at the code in dll_init.cc, if failure to remap a dll were really
> the source of fork failing, the error message should say so. Is there some
> other issue due to BLODA that also causes forks to fail?

First of all, I don't know the answer to this question :-)

EAGAIN ("Resource temporarily unavailable") is the error which fork() returns
when a remap failure occurs, so if you don't have the stderr output, I guess
that might be all you'll see?

Secondly, I'm by no means an expert on this issue; these are just my
observations.
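For callers that only ever see the errno, here is a minimal Python sketch of
how a script might at least point the user at the possible underlying cause.
The helper names (explain_fork_error, fork_with_hint) are made up for
illustration and are not part of Cygwin or Python; the hint text is only a
guess that a remap failure may be involved, since EAGAIN has other causes too.

```python
import errno
import os


def explain_fork_error(exc):
    """Turn an OSError raised by os.fork() into a one-line hint."""
    reason = os.strerror(exc.errno)
    if exc.errno == errno.EAGAIN:
        # Assumption: on Cygwin, EAGAIN from fork() may be hiding a DLL
        # remap failure whose details were only written to stderr.
        return ("fork: %s (EAGAIN); on Cygwin, check stderr for "
                "'unable to remap' messages" % reason)
    return "fork: %s" % reason


def fork_with_hint():
    """Call os.fork(), re-raising any failure with the hint attached."""
    try:
        return os.fork()
    except OSError as exc:
        raise RuntimeError(explain_fork_error(exc))
```

A caller would use fork_with_hint() in place of os.fork(); nothing here
changes what fork() itself does.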

On 09/03/2011 17:04, Ryan Johnson wrote:
> BTW, while looking at the code I noticed a potential source of remap problems:
> if B depends on A, and we remap A first, then only A's location will be
> checked carefully; B will be pulled in wherever it happens to end up when we
> do the full load of A. The code seems to assume that every DLL we try to remap
> is currently not loaded.
>
> I'm actually not sure what would happen when time came to remap B, because
> loading it would just return the handle we didn't know we had, and closing
> that handle wouldn't take its reference count to zero.

I too have idly mused that there might be an issue with dependent DLLs here.

But, since dll_list::load_after_fork() walks the dll list in the same order as
the dlopen() calls occur, I've never been able to convince myself there is a
real problem, barring esoteric scenarios like:

B depends on A, C depends on A, load B, load C (C collides with A so loads at
non-preferred address), unload B, fork

That doesn't match what really happens though: where problems are seen it's
often with python or perl, which dynamically load libraries when modules are
imported, but won't unload them in normal use.

> Incidentally, this
> same problem would arise if a BLODA injected a DLL into the process -- that
> DLL would be on the todo list for fork() to process (because it was also
> injected into the parent process), but would already be loaded by the time we
> try to remap it. Also, if we do want to force Windows not to put a dll in a
> certain address, wouldn't it make more sense to reserve the (wrong) space it
> went into on the first try? Right now if the offending location is higher than
> the one we want, nothing stops Windows from just putting it right back in its
> old spot because the code only reserves locations lower than the desired one.
>
> Is this accurate or am I missing something here?

I'm not sure that particular scenario with injected DLLs is possible, as the
list traversed in dll_list::load_after_fork() is only of dynamically loaded
cygwin-based DLLs?

I think some more investigation of what the actual memory layout of DLLs is
when a remap failure occurs is needed before trying to fix the problem.

So, here is an actual test case, distilled from a failure I have observed
running the twisted test suite:

cygpyglib-2.0-python2.6-0.dll (glib bindings for python) depends on
cygglib-2.0-0.dll, but also has the same preferred base address, as there is a
collision in the hash of the filename used by --enable-auto-image-base.

$ objdump -p /usr/bin/cygpyglib-2.0-python2.6-0.dll | grep ^ImageBase
ImageBase               6aa40000
$ objdump -p /usr/bin/cygglib-2.0-0.dll | grep ^ImageBase
ImageBase               6aa40000

A small python program which just loads the DLLs and then forks:

#!/usr/bin/env python
import os
import glib # comment this line out to succeed

pid = os.fork()
if pid:
    # wait for child to exit
    os.waitpid(pid, 0)

Here's the failure (with a little extra debugging output inserted into
cygwin1.dll to make it a little clearer what it's trying to do):

$ ./testcase.py
    431 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygiconv-2.dll @ 0x674C0000 using DONT_RESOLVE_DLL_REFERENCES
    719 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygintl-8.dll @ 0x6F5C0000 using DONT_RESOLVE_DLL_REFERENCES
    979 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygpcre-0.dll @ 0x64240000 using DONT_RESOLVE_DLL_REFERENCES
   1227 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
   1263 [main] python 3008 dll_list::load_after_fork: reserve_upto 0x18C40000 to try to force it to load there
   1473 [main] python 3008 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
   1620 [main] python 3008 C:\cygwin\bin\python.exe: *** fatal error - unable to remap C:\cygwin\bin\cygglib-2.0-0.dll to same address as parent: 0x18C40000 != 0x6AA40000

and I've confirmed that in the parent, cygpyglib-2.0-python2.6-0.dll loads at
0x6AA40000 and cygglib-2.0-0.dll loads at 0x18C40000.

At a wild guess, it looks like LoadLibraryEx() maps DLLs into memory starting
from the top of the dependency chain, but then calls the DLL's entry point
starting from the bottom of the dependency chain (which makes all kinds of
sense, but leads to this inversion of the load order in the child).

This is trivially worked around by rebasing one of the conflicting DLLs to a
different address, e.g.:

$ rebase -b 0x60000000 /usr/bin/cygglib-2.0-0.dll
$ ./testcase.py
      2 [main] python 2916 dll_list::load_after_fork: C:\cygwin\bin\cyggcc_s-1.dll (type 1743781888) expected @ 0x0
    138 [main] python 2916 dll_list::load_after_fork: C:\cygwin\bin\libpython2.6.dll (type 1741422592) expected @ 0x0
    728 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygiconv-2.dll @ 0x674C0000 using DONT_RESOLVE_DLL_REFERENCES
   1049 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygintl-8.dll @ 0x6F5C0000 using DONT_RESOLVE_DLL_REFERENCES
   1421 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygpcre-0.dll @ 0x64240000 using DONT_RESOLVE_DLL_REFERENCES
   1945 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygglib-2.0-0.dll @ 0x60000000 using DONT_RESOLVE_DLL_REFERENCES
   2791 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cyggthread-2.0-0.dll @ 0x6C000000 using DONT_RESOLVE_DLL_REFERENCES
   3110 [main] python 2916 dll_list::load_after_fork: LoadLibrary C:\cygwin\bin\cygpyglib-2.0-python2.6-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
   3647 [main] python 2916 dll_list::load_after_fork: LoadLibrary \\?\C:\cygwin\lib\python2.6\site-packages\gtk-2.0\glib\_glib.dll @ 0x61680000 using DONT_RESOLVE_DLL_REFERENCES
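
The objdump check above can be scripted to hunt for other colliding pairs. The
following is a minimal sketch, not an existing tool: image_base() and
find_collisions() are names invented here, and it assumes the PE headers sit
within the first 4096 bytes of each file. It only reports identical ImageBase
values; it does not look at image sizes, so it would miss DLLs whose ranges
merely overlap.

```python
import struct
import sys
from collections import defaultdict


def image_base(data):
    """Return the preferred ImageBase from the leading bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # e_lfanew at offset 0x3C gives the file offset of the PE signature
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_off + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:     # PE32: 32-bit ImageBase at offset 28
        return struct.unpack_from("<I", data, opt + 28)[0]
    if magic == 0x20B:     # PE32+: 64-bit ImageBase at offset 24
        return struct.unpack_from("<Q", data, opt + 24)[0]
    raise ValueError("unknown optional header magic")


def find_collisions(bases):
    """Map each ImageBase shared by two or more names to those names."""
    groups = defaultdict(list)
    for name, base in bases.items():
        groups[base].append(name)
    return dict((b, sorted(n)) for b, n in groups.items() if len(n) > 1)


if __name__ == "__main__":
    bases = {}
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            bases[path] = image_base(f.read(4096))
    for base, names in sorted(find_collisions(bases).items()):
        print("0x%08X: %s" % (base, " ".join(names)))
```

Saved under a hypothetical name such as imagebase_collisions.py, it would be
run with the DLL paths as arguments, e.g. /usr/bin/cyg*.dll.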

This perhaps explains some remap issues which rebaseall fixes, as that avoids
the possibility of dependent DLLs with colliding preferred base addresses.

I'm not sure what can be done to fix this programmatically.

> I assume there's a way to enumerate the dlls loaded in a given process; would
> it make sense to use a three-step algorithm?
> 1. Unload all currently-loaded dlls, complaining loudly to stderr or a log
> file (these are due to BLODA and deserve to be called out)
> 2. Load without deps every DLL and make sure it lands at the right address
> (using memory reservation tricks if needed)
> 3. Reload with deps every DLL. Presumably once it has landed correctly once it
> will do so thereafter (the current code assumes this, at least)

Doing 2 & 3 is an interesting idea: the first call lets you pin the DLL at a
particular address and the second makes it executable. I've no idea what
actually happens, but unfortunately the comments in
dll_list::load_after_fork() seem to suggest this doesn't work, as the DLL's
entry point doesn't get called the second time it's loaded.

> In theory, the first step might allow cygwin to resist dll injection (maybe on
> an opt-out basis?), though I don't know what the consequences of that choice
> would be.
>
> The third step would be significantly easier if we had a dependency graph so
> that we could ensure dependencies always get processed before they're needed,
> but I don't know if that's feasible. How expensive/embeddable is cygcheck?

Another idea (assuming my guess about LoadLibrary() behaviour above is correct)
would be to have dlopen(), rather than simply calling LoadLibrary() on a DLL,
construct the dependency tree of the DLL it's been asked to open and load the
DLLs starting from the bottom, so that the order of loading into memory matches
the order in which entry points are called (and hence the order in dll_list)?
(This would have the advantage of not making fork() even more heavyweight.)

Alternatively, maybe all that is needed is a slightly more complex approach to
forcing the DLL to load at a particular address? If reserve_upto() has been
called, but the DLL loads higher than that, can we assume load order inversion
has occurred, and try to block it from loading at its preferred address by
VirtualAlloc()-ing there as well?

I think I might even try to write a patch to do that...
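
To make the load-order inversion and the bottom-up dlopen() idea concrete, here
is a toy model in Python. It is only a sketch of my guess about the loader, not
of what Windows or cygwin1.dll really do: each DLL occupies a single slot
identified by its base address, the fallback address is arbitrary (chosen to
echo the log above), and all the function names are invented for illustration.

```python
# Both DLLs prefer the same base, as in the test case above.
PREFERRED = {"cygglib": 0x6AA40000, "cygpyglib": 0x6AA40000}
DEPS = {"cygpyglib": ["cygglib"], "cygglib": []}
FALLBACK = 0x18C40000  # arbitrary relocation address for this toy model


def map_image(name, layout):
    """Map at the preferred base if it is free, else relocate to a fallback."""
    base = PREFERRED[name]
    used = set(layout.values())
    if base in used:
        base = FALLBACK
        while base in used:
            base += 0x10000
    layout[name] = base


def init_order(name, seen):
    """Post-order walk: dependencies get their entry points called first."""
    for dep in DEPS[name]:
        if dep not in seen:
            init_order(dep, seen)
    seen.append(name)
    return seen


def parent_load(name, bottom_up):
    """Return (dll_list, layout) for the parent process."""
    dll_list = init_order(name, [])
    # Guessed LoadLibrary behaviour: map from the top of the chain down.
    # The proposed dlopen() change would map in dll_list order instead.
    map_order = dll_list if bottom_up else list(reversed(dll_list))
    layout = {}
    for dll in map_order:
        map_image(dll, layout)
    return dll_list, layout


def child_replay(dll_list):
    """The child maps each DLL on its own, in dll_list order."""
    layout = {}
    for dll in dll_list:
        map_image(dll, layout)
    return layout


if __name__ == "__main__":
    for bottom_up in (False, True):
        dll_list, parent = parent_load("cygpyglib", bottom_up)
        child = child_replay(dll_list)
        for dll in dll_list:
            status = "ok" if parent[dll] == child[dll] else "MISMATCH"
            print("bottom_up=%s %s parent=0x%08X child=0x%08X %s"
                  % (bottom_up, dll, parent[dll], child[dll], status))
```

With top-down mapping the parent and child layouts disagree, which is the
shape of the remap failure; with bottom-up mapping the child reproduces the
parent's layout, at the cost of one DLL living away from its preferred base in
both processes.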