From: root AT jacob DOT remcomp DOT fr (root)
Subject: The bug in ld: some results
Date: 28 Mar 1997 19:22:35 -0800
Sender: daemon AT cygnus DOT com
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID:
Original-To: jqb AT netcom DOT com (Jim Balter)
Original-Cc: gnu-win32 AT cygnus DOT com
In-Reply-To: <333B7E1A.6654@netcom.com> from "Jim Balter" at Mar 28, 97 00:15:22 am
Content-Type: text
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Jim was kind enough to send me two executables, named GOOD.EXE and
BAD.EXE, that exhibit the now famous 'lkernel32' in the command line
bug of 'ld'. I decided to investigate this problem further and to
present these results to you. Maybe this information will be useful for
you and for the poor guy(s) trying to maintain 'ld'. I really feel
sorry for them! :-)

The main differences between the executables are the following:

1. GOOD.EXE has the import entries from cygwin.dll FIRST, followed by
   the kernel32.dll entries.
2. BAD.EXE has the import entries from kernel32.dll first, then those
   from cygwin.dll.

This explains the observed behaviour of command line/not command line
crashes. It seems that the order which ld uses is very significant.
Why?

In BAD.EXE the entry for the kernel32.dll imports is WRONG: it is
missing the import for GetCommandLine! Since that import is wrong, the
program will crash at startup, when that function is called. This
explains the behaviour Jim observed under the debugger.

The entry for GetModuleHandle is present, and is the only entry for
kernel32. The entry for GetCommandLine is not entirely missing: only
its address is missing from the import table. Its ASCII name is there,
but the loader will never find it, because after the entry for
'GetModuleHandle' there is a NULL.

I think a little explanation is required here:

The import table begins with an array of IMAGE_IMPORT_DESCRIPTORs, one
for each dll used by the program.
These IMAGE_IMPORT_DESCRIPTORs contain a field that points to the data
needed by the loader for each function in the corresponding dll. This
is OK in BAD.EXE. So far so good.

The information pointed to by each IMAGE_IMPORT_DESCRIPTOR is an array
of IMAGE_THUNK_DATA: one for each function imported from the
corresponding dll. These arrays are terminated by a NULL entry. The
IMAGE_THUNK_DATA structures contain an RVA (relative virtual address)
pointing to the names of the functions (another array).

IMAGE_IMPORT_DESCRIPTOR[]          IMAGE_THUNK_DATA             NAME/Hint
--------------------
| import descriptor |______________ Thunk data for function 1
|      dllNr1       |               of first dll --------------->Ascii name
--------------------
                                    Thunk data for function 2
                                    of first dll --------------->Ascii name
--------------------
| import descriptor |______________ Thunk data for function 1
|      dllNr2       |               of second dll -------------->Ascii name
--------------------

In BAD.EXE there is a correct entry for GetModuleHandle, and its
u1.AddressOfData field points correctly to an ASCII string
'GetModuleHandle'. The problem is that the next entry in the
IMAGE_THUNK_DATA array contains a NULL instead of an entry for the next
function imported from kernel32.dll: GetCommandLine. The loader
interprets this NULL as the end of the table, so it will never get to
the ASCII string 'GetCommandLine', which is there in the import table,
even if it is NOT WHERE IT SHOULD BE. That ASCII string is at the END
of the ASCII strings of the OTHER dll the program uses: cygwin.dll.
Instead of immediately following the string of GetModuleHandle, the
ASCII string somehow ended up at the end of the strings of another,
completely unrelated DLL!

This means that 'ld' mixes up the ASCII strings for the functions
imported from the dlls, which could spring nasty surprises on users who
call one function in their source code and end up calling something
completely different at run time!

Why does 'ld' crash?

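Before weighing the possible causes, the symptom described above can be
made concrete with a small sketch. It is a toy model in Python, not real
PE parsing: the function name names_seen_by_loader, the RVA values and
the placeholder cygwin_function are all made up for illustration. It
only shows how a NULL thunk in the wrong place hides a name that is
still present in the image.

```python
# Toy model of the loader's walk over one dll's IMAGE_THUNK_DATA array.
# Every RVA and name table below is hypothetical, chosen for illustration.

def names_seen_by_loader(thunks, names_by_rva):
    """Return the import names reachable before the first NULL thunk."""
    seen = []
    for rva in thunks:
        if rva == 0:  # a NULL entry is taken as the end of the array
            break
        seen.append(names_by_rva[rva])
    return seen

# The name strings exist in the image in both cases.
names_by_rva = {
    0x100: "GetModuleHandle",
    0x200: "cygwin_function",  # placeholder for some cygwin.dll import
    0x210: "GetCommandLine",   # string sitting after the cygwin.dll names
}

good_kernel32 = [0x100, 0x210, 0]  # both functions have a thunk
bad_kernel32 = [0x100, 0, 0]       # NULL where GetCommandLine's thunk belongs

print(names_seen_by_loader(good_kernel32, names_by_rva))
print(names_seen_by_loader(bad_kernel32, names_by_rva))
```

In the second case the walk stops after GetModuleHandle, so
GetCommandLine is never resolved even though its string is in the
table.
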
As somebody from Cygnus pointed out in this thread, I do not know much
about ld. But to write a linker I was forced to learn something about
this damned table, the most difficult part of the whole linking
process.

There are three possibilities:

1) The import library for kernel32.dll has a bug that confuses 'ld'.
2) 'ld' has a bug independent of the libraries.
3) Both 'ld' and the import libraries are buggy.

Let's examine the pros/cons of each possibility.

1) The import library is buggy. If we assume this, we would have to
   explain why the same import library works when it is not the first
   import library and is specified on the command line.
2) 'ld' has a bug that is library independent. This is highly probable
   since it would explain all observed behaviour.
3) Both DLLTOOL and ld are buggy. This is a REAL possibility.

Where to look in 'ld' sources?

'ld' sorts the sections (as my linker does) to accommodate the order of
the subsections (idata$2 should come before idata$3). Is this sorting
done before or after the default libraries are loaded? The mess in ld
could be the result of sorting after some command line libraries were
already loaded... I think this is the most promising avenue of
investigation.

----------------------------------------------------------------------------
The 'not owner' bug in 'ld'.

When I studied 'ld', I noticed the following problem. When I tried to
link an object file generated with MSVC using 'ld', I saw that the
executable image contained a HOLE, i.e. it wasn't contiguous.

Could somebody who has an executable with that 'not owner' bug look at
the sequence of the sections in memory? If they are NOT CONTIGUOUS, the
NT loader will refuse to load the program. I would be interested to
know if that is the case. I think that the Windows95 loader is more
liberal, and may load the executable with holes and all... This could
explain some things.

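For anybody who wants to check for holes, here is a minimal sketch in
Python of the test meant above. It assumes you have already read each
section's name, virtual address and virtual size from the section
headers with whatever dump tool you prefer. The helper names align_up
and find_holes, the example section list and the 0x1000 section
alignment are assumptions for illustration, not values taken from any
of the executables discussed here.

```python
# Report gaps between consecutive sections in the image's address space.
# Input tuples are (name, virtual_address, virtual_size); all hypothetical.

def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def find_holes(sections, section_alignment=0x1000):
    """Return (previous, next, expected_start, actual_start) for each gap."""
    ordered = sorted(sections, key=lambda s: s[1])
    holes = []
    for (prev_name, prev_va, prev_size), (name, va, _) in zip(ordered, ordered[1:]):
        expected = align_up(prev_va + prev_size, section_alignment)
        if va > expected:  # next section starts later than it should
            holes.append((prev_name, name, expected, va))
    return holes

# Made-up layout with one hole between .data and .idata.
sections = [
    (".text", 0x1000, 0x1800),
    (".data", 0x3000, 0x200),
    (".idata", 0x5000, 0x300),
]

for prev_name, name, expected, actual in find_holes(sections):
    print(f"hole between {prev_name} and {name}: "
          f"expected start {expected:#x}, found {actual:#x}")
```

An empty result means each section starts exactly where the previous
one ends, rounded up to the section alignment.
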
Sorry for this long message

-- 
Jacob Navia                     Logiciels/Informatique
41 rue Maurice Ravel            Tel 01 48.23.51.44
93430 Villetaneuse              Fax 01 48.23.95.39
France
-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".