Mail Archives: cygwin/2011/04/04/09:01:01
On 15/03/2011 13:53, Ryan Johnson wrote:
> All of this assumes Windows is consistent in choosing locations when conflicts
> are involved.
It's assumed that CreateProcess() produces the same layout, yes.
> IOW, consider the case that B depends on A, with A and B both
> conflicting with a later-loaded C. The first time A and C load Windows will
> choose alternate locations for them, and if that order changes in the child,
> it's totally possible that A ends up in the child where C was in the parent.
I'm not sure what you mean here: all that matters is duplicating the layout at
the point in time when fork() occurs.
>>> Incidentally, this
>>> same problem would arise if a BLODA injected a DLL into the process -- that
>>> DLL would be on the todo list for fork() to process (because it was also
>>> injected into the parent process), but would already be loaded by the time we
>>> try to remap it. Also, if we do want to force Windows not to put a dll in a
>>> certain address, wouldn't it make more sense to reserve the (wrong) space it
>>> went into on the first try? Right now if the offending location is higher than
>>> the one we want, nothing stops Windows from just putting it right back in its
>>> old spot because the code only reserves locations lower than the desired one.
>>>
>>> Is this accurate or am I missing something here?
>> I'm not sure that particular scenario with injected DLLs is possible, as the
>> list traversed in dll_list::load_after_fork() is only of dynamically loaded
>> cygwin-based DLLs?
> Oh, so injected dlls, though not statically linked in, still wouldn't be on
> this list?
>
> BTW, I found a good way to identify, if not fix, BLODA: given an app which
> loads no libraries at runtime -- such as 'ls' -- any dlls mentioned in
> /proc/$$/maps which cygcheck does not mention are probably dodgy. In my case,
> Windows Live (which I didn't think was even installed on my machine) has
> injected a WLIDNSP.DLL ("Microsoft Windows Live ID Namespace Provider") in all
> my processes.
>
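A rough sketch of that check, assuming the text of /proc/$$/maps and the
output of cygcheck have already been captured as strings. The helper names
are made up for illustration, and the only thing assumed about either format
is that a DLL path ends its line:

```python
import re


def dll_names(text):
    """Collect lower-cased file names of every line that ends in a .dll path."""
    names = set()
    for line in text.splitlines():
        # Keep only what follows the last '/' or '\' on the line.
        base = re.split(r"[\\/]", line.strip())[-1].lower()
        if base.endswith(".dll"):
            names.add(base)
    return names


def unexpected_dlls(maps_text, cygcheck_text):
    """DLLs mapped into the process that cygcheck did not list."""
    return sorted(dll_names(maps_text) - dll_names(cygcheck_text))
```

Anything this returns for a program that never calls dlopen() is only a
candidate for closer inspection, not proof of BLODA.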
>> $ objdump -p /usr/bin/cygpyglib-2.0-python2.6-0.dll | grep ^ImageBase
>> ImageBase 6aa40000
>>
>> $ objdump -p /usr/bin/cygglib-2.0-0.dll | grep ^ImageBase
>> ImageBase 6aa40000
>>
>> C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
>> 1263 [main] python 3008 dll_list::load_after_fork: reserve_upto 0x18C40000
>> to try to force it to load there
>> 1473 [main] python 3008 dll_list::load_after_fork: LoadLibrary
>> C:\cygwin\bin\cygglib-2.0-0.dll @ 0x6AA40000 using DONT_RESOLVE_DLL_REFERENCES
>> 1620 [main] python 3008 C:\cygwin\bin\python.exe: *** fatal error - unable
>> to remap C:\cygwin\bin\cygglib-2.0-0.dll to same address as parent: 0x18C40000
>> != 0x6AA40000
>>
>> and I've confirmed that in the parent, cygpyglib-2.0-python2.6-0.dll loads at
>> 0x6AA40000 and cygglib-2.0-0.dll loads at 0x18C40000.
>>
>> At a wild guess, it looks like LoadLibraryEx() maps DLLs into memory starting
>> from the top of the dependency chain, but then calls the DLL's entry point
>> starting from the bottom of the dependency chain (which makes all kinds of
>> sense, but leads to this inversion of the load order in the child)
>>
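To make the suspected inversion concrete, here is a toy model in Python. It
is not Cygwin or Windows code: place() is a made-up stand-in for the loader
that gives a DLL its ImageBase if nothing has taken it yet and otherwise a
fixed fallback address. The fallback for cygglib is the address seen in the
parent; the one for cygpyglib is invented for the example.

```python
PYGLIB = "cygpyglib-2.0-python2.6-0.dll"
GLIB = "cygglib-2.0-0.dll"

# Both DLLs ask for the same ImageBase (from the objdump output above).
image_base = {PYGLIB: 0x6AA40000, GLIB: 0x6AA40000}
# Where the toy loader puts a DLL whose ImageBase is already taken.
# GLIB's value is from the parent; PYGLIB's value is hypothetical.
fallback = {GLIB: 0x18C40000, PYGLIB: 0x18C40000}


def place(load_order, image_base, fallback):
    """Assign an address to each DLL, in the order given."""
    layout, used = {}, set()
    for dll in load_order:
        addr = image_base[dll]
        if addr in used:          # preferred base already occupied
            addr = fallback[dll]
        used.add(addr)
        layout[dll] = addr
    return layout


# Parent: mapped from the top of the dependency chain downwards (the guess above).
parent = place([PYGLIB, GLIB], image_base, fallback)
# Child: replays the list in entry-point order, dependencies first.
child = place([GLIB, PYGLIB], image_base, fallback)
mismatches = [dll for dll in parent if parent[dll] != child[dll]]
```

In this model the child puts cygglib at 0x6AA40000 while the parent had it at
0x18C40000, which is the same mismatch the fatal error reports.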
> So the problem basically arises because dlls in the child are not actually
> loaded in the same order as in the parent? In this case I assume that
> cygpyglib depends on cygglib,
There's no need to assume it, because I wrote "cygpyglib-2.0-python2.6-0.dll
(glib bindings for python) depends on cygglib-2.0-0.dll".
> which suggests that we could avoid a lot of
> trouble by handling dependent children first.
That's the problem in the particular case I've looked at. I wouldn't assume that
all or even most remap failures are caused by that scenario.
> Also, it looks like the above is exactly the case I suspected -- the offending
> dll attempts to load *higher* than where we want it, so reserving space below
> does nothing for us.
>>> I assume there's a way to enumerate the dlls loaded in a given process; would
>>> it make sense to use a three-step algorithm?
>>> 1. Unload all currently-loaded dlls, complaining loudly to stderr or a log
>>> file (these are due to BLODA and deserve to be called out)
>>> 2. Load without deps every DLL and make sure it lands at the right address
>>> (using memory reservation tricks if needed)
>>> 3. Reload with deps every DLL. Presumably once it has landed correctly once it
>>> will do so thereafter (the current code assumes this, at least)
>> Doing 2 & 3 is an interesting idea, the first call to let you pin it at a
>> particular address and the second to make it executable.
>>
>> I've no idea what happens, but unfortunately, the comments in
>> dll_list::load_after_fork() seem to suggest this doesn't work, as the DLL's
>> entry point doesn't get called the second time it's loaded.
> The code currently unloads the library completely and then reloads it normally,
> which I assumed was to ensure entry points get called.
>
>>> In theory, the first step might allow cygwin to resist dll injection (maybe on
>>> an opt-out basis?), though I don't know what the consequences of that choice
>>> would be.
>>>
>>> The third step would be significantly easier if we had a dependency graph so
>>> that we could ensure dependencies always get processed before they're needed,
>>> but I don't know if that's feasible. How expensive/embeddable is cygcheck?
>> Another idea (assuming my guess about LoadLibrary() behaviour above is
>> correct) would be to have dlopen() rather than simply call LoadLibrary() on a
>> DLL, construct the dependency tree of the DLL it's been asked to open and load
>> the DLLs starting from the bottom, so that the order of loading into memory
>> matches the order which entry points are called (and hence the order in
>> dll_list)? (This would have the advantage of not making fork() even more
>> heavyweight)
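A minimal sketch of that bottom-up ordering, in Python rather than in the
Cygwin sources. The dependency map `deps` and the `load_one` callback are
hypothetical stand-ins for however dlopen() would discover imports and
actually load a single DLL:

```python
def load_bottom_up(root, deps, load_one):
    """Call load_one on every DLL reachable from root, dependencies first."""
    seen = set()

    def visit(name):
        if name in seen:          # already handled, or part of an import cycle
            return
        seen.add(name)
        for dep in deps.get(name, ()):
            visit(dep)
        load_one(name)            # runs only after all of its imports

    visit(root)
```

Passing `order.append` as `load_one` records the order instead of loading
anything, which is an easy way to check that each DLL comes after the DLLs it
imports.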
> Some variant of objdump -p $THE_DLL | grep 'DLL Name' ?
>
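For what it's worth, pulling the names out of that objdump output is a few
lines of Python. This is only a sketch: imported_dlls() is a made-up helper,
and it assumes the import tables are printed with lines of the form
"DLL Name: foo.dll".

```python
import re

# One "DLL Name: <file>" line per imported DLL in the objdump -p output.
_DLL_NAME = re.compile(r"^[ \t]*DLL Name:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def imported_dlls(objdump_text):
    """Return imported DLL names in order of appearance, without duplicates."""
    names = []
    for name in _DLL_NAME.findall(objdump_text):
        if name not in names:
            names.append(name)
    return names
```

The text itself could come from something like
subprocess.run(["objdump", "-p", path], capture_output=True, text=True).stdout.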
> It might also make sense for the parent process to record some ordering
> information at dlopen time in case it forks later. Given that the dlls are
> being opened anyway it would probably be cheap to do it then. Just build a tree of
> all dlls which the current dlopen() triggers dlopen() calls for. Alternatively
> (simpler?) just make dlopen() add dlls to its list just before it returns.
> That way, any recursive calls will add the dependencies to the list first. No
> special data structures needed. Only problem is, I can't see where in the
> source this magical list is generated in the first place :(
I suggest you read how-startup-shutdown-works.txt and then observe that
dll_list::alloc() is called by dll_dllcrt0_1().
>> Alternatively, maybe all that is needed is a slightly more complex approach to
>> forcing the DLL to load at a particular address? If reserve_upto() has been
>> called, but it loads higher than that, can we assume load order inversion has
>> occurred, and try to block it from loading at its preferred address by
>> VirtualAlloc()-ing there as well? I think I might even try to write a patch to
>> do that...
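The retry logic could look roughly like the following Python model. It is a
sketch of the idea only, not the patch: the four callbacks are hypothetical
stand-ins for the real load, unload, VirtualAlloc() reserve and release
steps, and ToyAddressSpace is an invented loader that simply takes the first
candidate base that has not been reserved.

```python
def load_at(wanted, try_load, unload, reserve, release, max_tries=8):
    """Reload until the DLL lands at `wanted`, blocking each wrong spot it picks."""
    blocked = []
    try:
        for _ in range(max_tries):
            got = try_load()
            if got == wanted:
                return True
            unload(got)
            blocked.append(reserve(got))   # stop the loader reusing this spot
        return False
    finally:
        for handle in blocked:             # free the blocks so other DLLs can use them
            release(handle)


class ToyAddressSpace:
    """Toy loader: a DLL lands on the first candidate base that is not reserved."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.reserved = set()
        self.loaded = None

    def try_load(self):
        for base in self.candidates:
            if base not in self.reserved:
                self.loaded = base
                return base
        raise MemoryError("no candidate base left")

    def unload(self, base):
        self.loaded = None

    def reserve(self, base):
        self.reserved.add(base)
        return base

    def release(self, base):
        self.reserved.discard(base)
```

With candidates 0x6AA40000 then 0x18C40000 and a wanted address of
0x18C40000, the first attempt lands at the preferred base, that base gets
blocked, and the second attempt lands where the parent had the DLL. Whether
Windows behaves like the toy loader is exactly the open question.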
> The second approach might be easier to hack together quickly, but the first
> would actually make fork() more efficient and eliminate a lot of code: it's
> likely that all the rebasing/remapping fallbacks could disappear.
>
> A third alternative would be to traverse the remaining list of dlls and find
> the one that we should have loaded first. This would have to be recursive to
> handle the case where several dlls map to the same base, but might otherwise
> be workable.
I look forward to reading your patches :-)