Mailing-List: contact cygwin-apps-help AT sourceware DOT cygnus DOT com; run by ezmlm
Sender: cygwin-apps-owner AT sourceware DOT cygnus DOT com
Delivered-To: mailing list cygwin-apps AT sources DOT redhat DOT com
Message-ID: <3B68C879.1070809@ece.gatech.edu>
Date: Wed, 01 Aug 2001 23:26:49 -0400
From: Charles Wilson
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010713
X-Accept-Language: en-us
MIME-Version: 1.0
To: DJ Delorie
CC: binutils AT sources DOT redhat DOT com, cygwin-apps AT cygwin DOT com
Subject: Re: [RFA] pei386 dll: auto-import patch
References: <3B670087 DOT 7090102 AT ece DOT gatech DOT edu> <200108011735 DOT NAA32231 AT envy DOT delorie DOT com> <3B6846D2 DOT 9040206 AT ece DOT gatech DOT edu> <200108011847 DOT OAA32757 AT envy DOT delorie DOT com>
Content-Type: multipart/mixed; boundary="------------060507050606040700080303"

This is a multi-part message in MIME format.
--------------060507050606040700080303
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Okay, here's the revised patch and Changelog. Differences from earlier
version:

 1) --enable-gory-debug ==> --enable-extra-pe-debug
 2) added --enable-auto-import option
 3) made --disable-auto-import the default
 4) encapsulated pe_dll_auto_import into bfd_link_info structure:
    new field "pei386_auto_import"
 5) changed pe-dll.c (auto_export) filters to be table-based (*)
 6) moved detailed docs into ldint.texinfo
 7) removed "bugger"
 8) followed GNU coding style
 9) removed ldlang.c patch
10) removed C++ comments

(*) Actually, because of context, I had to use five different tables
(libs to exclude, object files to exclude, complete symbol names to
exclude, symbol prefixes to exclude, and symbol suffixes to exclude)

It builds, compiles, and passes my preliminary tests. I'll have a
binary package for others to test with up on my website soon.
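As an illustration of the table-based filtering mentioned in item 5 and its footnote, here is a minimal sketch. It is not the code from the patch: the names `filter_entry`, `prefix_table`, `suffix_table`, and `is_excluded` are hypothetical, the table contents are only samples, and just two of the five tables (prefixes and suffixes) are modeled.

```c
#include <string.h>

/* Hypothetical table entry: a literal and its precomputed length.  */
typedef struct
{
  const char *name;
  size_t len;
} filter_entry;

/* Sample tables (assumed contents), each terminated by a NULL name.  */
static const filter_entry prefix_table[] = {
  { "__rtti_", 7 },
  { "__builtin_", 10 },
  { NULL, 0 }
};

static const filter_entry suffix_table[] = {
  { "_iname", 6 },
  { NULL, 0 }
};

/* Return 1 if SYM starts with a listed prefix or ends with a listed
   suffix, 0 otherwise.  */
static int
is_excluded (const char *sym)
{
  size_t len = strlen (sym);
  const filter_entry *e;

  /* Prefix check: compare only the first e->len characters.  */
  for (e = prefix_table; e->name != NULL; e++)
    if (strncmp (sym, e->name, e->len) == 0)
      return 1;

  /* Suffix check: compare the tail of SYM, including the final NUL.  */
  for (e = suffix_table; e->name != NULL; e++)
    if (len >= e->len && strcmp (sym + len - e->len, e->name) == 0)
      return 1;

  return 0;
}
```

With these sample tables, `is_excluded ("__rtti_foo")` and `is_excluded ("mydll_iname")` both return 1, while `is_excluded ("my_variable")` returns 0. Adding a new exclusion then means adding one table row rather than another hand-written comparison.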
(Note that you'll have to explicitly specify --enable-auto-import now).

--Chuck

--------------060507050606040700080303
Content-Type: text/plain; name="ChangeLog"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="ChangeLog"

2001-08-01  Paul Sokolovsky

	* bfd/cofflink.c (coff_link_check_ar_symbols): search for
	__imp__symbol as well as _symbol.
	* bfd/linker.c (_bfd_generic_link_add_archive_symbols): search
	for __imp__symbol as well as _symbol.

2001-08-01  Charles Wilson

	* include/bfdlink.h (struct bfd_link_info): add new boolean
	field pei386_auto_import.
	* ld/ldmain.c (main): initialize link_info.pei386_auto_import.
	* ld/pe-dll.c: new tables for auto-export filtering.
	(auto_export): change API, pass abfd for contextual filtering.
	Loop thru tables of excluded symbols instead of comparing
	"by hand".

2001-08-01  Paul Sokolovsky

	* ld/pe-dll.c: new variable pe_dll_extra_pe_debug.  New static
	variable current_sec (static struct sec *).  Add forward
	declaration for add_bfd_to_link.
	(process_def_file): Don't export undefined symbols.  Do not
	export symbols starting with "_imp__".  Call auto_export() with
	new API.
	(pe_walk_relocs_of_symbol): New function.
	(generate_reloc): add optional extra debugging.
	(pe_dll_generate_def_file): eliminate extraneous initial blank
	line in output.
	(make_one): enlarge symtab to make room for __nm__ symbols
	(DATA auto-import support).
	(make_singleton_name_thunk): New function.
	(make_import_fixup_mark): New function.
	(make_import_fixup_entry): New function.
	(pe_create_import_fixup): New function.
	(add_bfd_to_link): make this function non-static.  Specify that
	name argument is a CONST char *.
	* ld/pe-dll.h: declare new variable pe_dll_extra_pe_debug;
	declare new functions pe_walk_relocs_of_symbol and
	pe_create_import_fixup.
	* ld/emultempl/pe.em: add new options --enable-auto-import,
	--disable-auto-import, and --enable-extra-pe-debug.
	(make_import_fixup): New function.
	(pe_find_data_imports): New function.
	(pr_sym): New function.
	(gld_${EMULATION_NAME}_after_open): Add optional extra pe
	debugging.  Call pe_find_data_imports.  Mark .idata as DATA,
	not CODE.

2001-08-01  Charles Wilson

	* ld/ld.texinfo: add additional documentation for
	--export-all-symbols.  Document --out-implib,
	--enable-auto-image-base, --disable-auto-image-base,
	--dll-search-prefix, --enable-auto-import, and
	--disable-auto-import.
	* ld/ldint.texinfo: Add detailed documentation on auto-import
	implementation.

--------------060507050606040700080303
Content-Type: text/plain; name="auto-import.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="auto-import.patch"

Index: bfd/cofflink.c
===================================================================
RCS file: /cvs/src/src/bfd/cofflink.c,v
retrieving revision 1.24
diff -u -r1.24 cofflink.c
--- cofflink.c 2001/07/03 15:49:46 1.24
+++ cofflink.c 2001/08/02 02:45:36
@@ -277,6 +277,16 @@
             return false;
           h = bfd_link_hash_lookup (info->hash, name, false, false, true);
+          /* auto import */
+          if (!h && info->pei386_auto_import)
+            {
+              if (!strncmp (name, "__imp_", 6))
+                {
+                  h =
+                    bfd_link_hash_lookup (info->hash, name + 6, false, false,
+                                          true);
+                }
+            }
 
           /* We are only interested in symbols that are currently undefined.
             If a symbol is currently known to be common, COFF linkers do not
             bring in an object file which defines
Index: bfd/linker.c
===================================================================
RCS file: /cvs/src/src/bfd/linker.c,v
retrieving revision 1.10
diff -u -r1.10 linker.c
--- linker.c 2001/07/05 22:40:16 1.10
+++ linker.c 2001/08/02 02:45:40
@@ -1003,10 +1003,20 @@
       arh = archive_hash_lookup (&arsym_hash, h->root.string, false, false);
       if (arh == (struct archive_hash_entry *) NULL)
         {
-          pundef = &(*pundef)->next;
-          continue;
+          /* If we haven't found the exact symbol, let's look for its
+             import thunk */
+          if (info->pei386_auto_import)
+            {
+              char *buf = alloca (strlen (h->root.string) + 10);
+              sprintf (buf, "__imp_%s", h->root.string);
+              arh = archive_hash_lookup (&arsym_hash, buf, false, false);
+            }
+          if (arh == (struct archive_hash_entry *) NULL)
+            {
+              pundef = &(*pundef)->next;
+              continue;
+            }
         }
-
       /* Look at all the objects which define this symbol. */
       for (l = arh->defs; l != (struct archive_list *) NULL; l = l->next)
         {
Index: include/bfdlink.h
===================================================================
RCS file: /cvs/src/src/include/bfdlink.h,v
retrieving revision 1.10
diff -u -r1.10 bfdlink.h
--- bfdlink.h 2001/06/15 12:57:02 1.10
+++ bfdlink.h 2001/08/02 02:45:57
@@ -274,6 +274,10 @@
   /* May be used to set DT_FLAGS_1 for ELF. */
   bfd_vma flags_1;
+
+  /* true if auto-import thunks for DATA items in pei386 DLLs
+     should be generated/linked against. */
+  boolean pei386_auto_import;
 };
 
 /* This structures holds a set of callback functions.
    These are
Index: ld/ld.texinfo
===================================================================
RCS file: /cvs/src/src/ld/ld.texinfo,v
retrieving revision 1.42
diff -u -r1.42 ld.texinfo
--- ld.texinfo 2001/07/30 18:12:07 1.42
+++ ld.texinfo 2001/08/02 02:46:05
@@ -1601,8 +1601,22 @@
 explicitly exported via DEF files or implicitly exported via function
 attributes, the default is to not export anything else unless this
 option is given. Note that the symbols @code{DllMain@@12},
-@code{DllEntryPoint@@0}, and @code{impure_ptr} will not be automatically
-exported.
+@code{DllEntryPoint@@0}, @code{DllMainCRTStartup@@12}, and
+@code{impure_ptr} will not be automatically
+exported. Also, symbols imported from other DLLs will not be
+re-exported, nor will symbols specifying the DLL's internal layout
+such as those beginning with @code{_head_} or ending with
+@code{_iname}. In addition, no symbols from @code{libgcc},
+@code{libstdc++}, @code{libmingw32}, or @code{crtX.o} will be exported.
+Symbols whose names begin with @code{__rtti_} or @code{__builtin_} will
+not be exported, to help with C++ DLLs. Finally, there is an
+extensive list of cygwin-private symbols that are not exported
+(obviously, this applies only when building DLLs for cygwin targets).
+These cygwin-excludes are: @code{_cygwin_dll_entry@@12},
+@code{_cygwin_crt0_common@@8}, @code{_cygwin_noncygwin_dll_entry@@12},
+@code{_fmode}, @code{_impure_ptr}, @code{cygwin_attach_dll},
+@code{cygwin_premain0}, @code{cygwin_premain1}, @code{cygwin_premain2},
+@code{cygwin_premain3}, and @code{environ}.
 
 @kindex --exclude-symbols
 @item --exclude-symbols @var{symbol},@var{symbol},...
@@ -1671,6 +1685,59 @@
 (which should be called @code{*.def}) may be used to create an import
 library with @code{dlltool} or may be used as a reference to
 automatically or implicitly exported symbols.
+
+@cindex DLLs, creating
+@kindex --out-implib
+@item --out-implib @var{file}
+The linker will create the file @var{file} which will contain an
+import lib corresponding to the DLL the linker is generating. This
+import lib (which should be called @code{*.dll.a} or @code{*.a})
+may be used to link clients against the generated DLL; this behavior
+makes it possible to skip a separate @code{dlltool} import library
+creation step.
+
+@cindex DLLs, creating
+@kindex --enable-auto-image-base
+@item --enable-auto-image-base
+Automatically choose the image base for DLLs, unless one is specified
+using the @code{--image-base} argument. By using a hash generated
+from the dllname to create unique image bases for each DLL, in-memory
+collisions and relocations which can delay program execution are
+avoided.
+
+@cindex DLLs, creating
+@kindex --disable-auto-image-base
+@item --disable-auto-image-base
+Do not automatically generate a unique image base. If there is no
+user-specified image base (@code{--image-base}) then use the platform
+default.
+
+@cindex DLLs, linking to
+@kindex --dll-search-prefix
+@item --dll-search-prefix @var{string}
+When linking dynamically to a dll without an import library,
+search for @code{<string><basename>.dll} in preference to
+@code{lib<basename>.dll}. This behavior allows easy distinction
+between DLLs built for the various "subplatforms": native, cygwin,
+uwin, pw, etc. For instance, cygwin DLLs typically use
+@code{--dll-search-prefix=cyg}.
+
+@cindex DLLs, linking to
+@kindex --enable-auto-import
+@item --enable-auto-import
+Do sophisticated linking of @code{_symbol} to @code{__imp__symbol} for
+DATA imports from DLLs, and create the necessary thunking symbols when
+building the DLLs with those DATA exports.
+
+@cindex DLLs, linking to
+@kindex --disable-auto-import
+@item --disable-auto-import
+Do not attempt to do sophisticated linking of @code{_symbol} to
+@code{__imp__symbol} for DATA imports from DLLs.
+
+@kindex --enable-extra-pe-debug
+@item --enable-extra-pe-debug
+Show additional debug info related to auto-import symbol thunking.
 
 @kindex --section-alignment
 @item --section-alignment
Index: ld/ldint.texinfo
===================================================================
RCS file: /cvs/src/src/ld/ldint.texinfo,v
retrieving revision 1.5
diff -u -r1.5 ldint.texinfo
--- ldint.texinfo 2001/03/13 06:14:27 1.5
+++ ldint.texinfo 2001/08/02 02:46:07
@@ -84,6 +84,7 @@
 * README:: The README File
 * Emulations:: How linker emulations are generated
 * Emulation Walkthrough:: A Walkthrough of a Typical Emulation
+* Architecture Specific:: Some Architecture Specific Notes
 * GNU Free Documentation License:: GNU Free Documentation License
 @end menu
 
@@ -570,6 +571,105 @@
 @item output bfd is written to disk
 
 @end itemize
+
+@node Architecture Specific
+@chapter Some Architecture Specific Notes
+
+This is the place for notes on the behavior of @code{ld} on
+specific platforms. Currently, only Intel x86 is documented (and
+of that, only the auto-import behavior for DLLs).
+
+@menu
+* ix86:: Intel x86
+@end menu
+
+@node ix86
+@section Intel x86
+
+@table @emph
+@code{ld} can create DLLs that operate with various runtimes available
+on a common x86 operating system. These runtimes include native (using
+the mingw "platform"), cygwin, and pw.
+
+@item auto-import from DLLs
+@enumerate
+@item
+With this feature on, DLL clients can import variables from DLL
+without any concern from their side (for example, without any source
+code modifications). Auto-import can be enabled using the
+@code{--enable-auto-import} flag, or disabled via the
+@code{--disable-auto-import} flag. Auto-import is disabled by default.
+
+@item
+This is done completely in bounds of the PE specification (to be fair,
+there's a minor violation of the spec at one point, but in practice
+auto-import works on all known variants of that common x86 operating
+system). So, the resulting DLL can be used with any other PE
+compiler/linker.
+
+@item
+Auto-import is fully compatible with the standard import method, in
+which variables are decorated using attribute modifiers. Libraries of
+either type may be mixed together.
+
+@item
+Overhead (space): 8 bytes per imported symbol, plus 20 for each
+reference to it; Overhead (load time): negligible; Overhead
+(virtual/physical memory): should be less than the effect of DLL
+relocation.
+@end enumerate
+
+Motivation
+
+The obvious and only way to get rid of dllimport insanity is
+to make the client access the variable directly in the DLL, bypassing
+the extra dereference imposed by ordinary DLL runtime linking.
+I.e., whenever the client contains something like
+
+@code{mov dll_var,%eax},
+
+the address of dll_var in the instruction should be relocated to point
+into the loaded DLL. The aim is to make the OS loader do so, and then
+make ld help with that. The import section of a PE file is laid out as
+follows: there's a vector of structures, each describing imports
+from a particular DLL. Each such structure points to two other
+parallel vectors: one holding imported names, and one which
+will hold the address of the corresponding imported name. So, the
+solution is to de-vectorize these structures, making import
+locations sparse and pointing directly into code.
+
+Implementation
+
+For each reference to a data symbol to be imported from a DLL (the
+set of which consists of symbols with name <sym>, if __imp_<sym> is
+found in the implib), an import fixup entry is generated. That
+entry is of type IMAGE_IMPORT_DESCRIPTOR and is stored in the .idata$3
+subsection.
+Each fixup entry contains a pointer to the symbol's address
+within the .text section (marked with a __fuN_<sym> symbol, where N is
+an integer), a pointer to the DLL name (so, the DLL name is referenced
+by multiple entries), and a pointer to the symbol name thunk. The symbol
+name thunk is a singleton vector (__nm_th_<sym>) pointing to an
+IMAGE_IMPORT_BY_NAME structure (__nm_<sym>) directly containing the
+imported name. Here comes that "on the edge" problem mentioned above:
+the PE specification states that the name vector (OriginalFirstThunk)
+should run in parallel with the address vector (FirstThunk), i.e. that
+they should have the same number of elements and be terminated with
+zero. We violate this, since FirstThunk points directly into machine
+code. But in practice, the OS loader is implemented the sane way: it
+goes thru OriginalFirstThunk and puts addresses into FirstThunk, not
+something else. It once again should be noted that the dll and symbol
+name structures are reused across fixup entries and should be there
+anyway to support standard import stuff, so the sustained overhead is
+20 bytes per reference. Another question is whether having several
+IMAGE_IMPORT_DESCRIPTORS for the same DLL is possible. The answer is
+yes, it is done even by the native compiler/linker (libth32's functions
+are in fact resident in windows9x kernel32.dll, so if you use it, you
+have two IMAGE_IMPORT_DESCRIPTORS for kernel32.dll). Yet another
+question is whether referencing the same PE structures several times
+is valid. The answer is why not: prohibiting that (detecting the
+violation) would require more work on behalf of the loader than not
+doing it.
+
+@end table
 
 @node GNU Free Documentation License
 @chapter GNU Free Documentation License
Index: ld/ldmain.c
===================================================================
RCS file: /cvs/src/src/ld/ldmain.c,v
retrieving revision 1.25
diff -u -r1.25 ldmain.c
--- ldmain.c 2001/07/11 19:50:56 1.25
+++ ldmain.c 2001/08/02 02:46:13
@@ -243,6 +243,7 @@
   link_info.new_dtags = false;
   link_info.flags = (bfd_vma) 0;
   link_info.flags_1 = (bfd_vma) 0;
+  link_info.pei386_auto_import = false;
 
   ldfile_add_arch ("");
 
Index: ld/pe-dll.c
===================================================================
RCS file: /cvs/src/src/ld/pe-dll.c,v
retrieving revision 1.23
diff -u -r1.23 pe-dll.c
--- pe-dll.c 2001/03/13 06:14:27 1.23
+++ pe-dll.c 2001/08/02 02:46:16
@@ -54,6 +54,85 @@
 
  ************************************************************************/
 
+/************************************************************************
+
+ Auto-import feature by Paul Sokolovsky
+
+ Quick facts:
+
+ 1. With this feature on, DLL clients can import variables from DLL
+ without any concern from their side (for example, without any source
+ code modifications).
+
+ 2. This is done completely in bounds of the PE specification (to be fair,
+ there's a place where it pokes nose out of, but in practise it works).
+ So, resulting module can be used with any other PE compiler/linker.
+
+ 3. Auto-import is fully compatible with standard import method and they
+ can be mixed together.
+
+ 4. Overheads: space: 8 bytes per imported symbol, plus 20 for each
+ reference to it; load time: negligible; virtual/physical memory: should be
+ less than effect of DLL relocation, and I sincerely hope it doesn't affect
+ DLL sharability (too much).
+
+ Idea
+
+ The obvious and only way to get rid of dllimport insanity is to make client
+ access variable directly in the DLL, bypassing extra dereference.
 I.e.,
+ whenever client contains something like
+
+ mov dll_var,%eax,
+
+ address of dll_var in the command should be relocated to point into loaded
+ DLL. The aim is to make OS loader do so, and then make ld help with that.
+ Import section of PE is made the following way: there's a vector of
+ structures each describing imports from particular DLL. Each such structure
+ points to two other parallel vectors: one holding imported names, and one
+ which will hold address of corresponding imported name. So, the solution is
+ to de-vectorize these structures, making import locations be sparse and
+ pointing directly into code. Before continuing, it is worth a note that,
+ while the author strives to make PE act ELF-like, some other people
+ make ELF act PE-like: elfvector, ;-) .
+
+ Implementation
+
+ For each reference of data symbol to be imported from DLL (to set of which
+ belong symbols with name <sym>, if __imp_<sym> is found in implib), the
+ import fixup entry is generated. That entry is of type
+ IMAGE_IMPORT_DESCRIPTOR and stored in .idata$3 subsection. Each
+ fixup entry contains pointer to symbol's address within .text section
+ (marked with __fuN_<sym> symbol, where N is integer), pointer to DLL name
+ (so, DLL name is referenced by multiple entries), and pointer to symbol
+ name thunk. Symbol name thunk is singleton vector (__nm_th_<sym>)
+ pointing to IMAGE_IMPORT_BY_NAME structure (__nm_<sym>) directly
+ containing imported name. Here comes that "on the edge" problem mentioned
+ above: PE specification states that name vector (OriginalFirstThunk)
+ should run in parallel with addresses vector (FirstThunk), i.e. that they
+ should have same number of elements and be terminated with zero. We violate
+ this, since FirstThunk points directly into machine code. But in practise,
+ OS loader is implemented the sane way: it goes thru OriginalFirstThunk and
+ puts addresses to FirstThunk, not something else.
 It once again should be
+ noted that dll and symbol name structures are reused across fixup entries
+ and should be there anyway to support standard import stuff, so sustained
+ overhead is 20 bytes per reference. Other question is whether having several
+ IMAGE_IMPORT_DESCRIPTORS for the same DLL is possible. Answer is yes, it is
+ done even by native compiler/linker (libth32's functions in fact reside
+ in windows9x kernel32.dll, so if you use it, you have two
+ IMAGE_IMPORT_DESCRIPTORS for kernel32.dll). Yet other question is whether
+ referencing the same PE structures several times is valid. The answer is why
+ not, prohibiting that (detecting violation) would require more work on
+ behalf of loader than not doing it.
+
+
+ See also: ld/emultempl/pe.em
+
+ ************************************************************************/
+
+void
+add_bfd_to_link (bfd *abfd, CONST char *name,
+                 struct bfd_link_info *link_info);
+
 /* for emultempl/pe.em */
 
 def_file *pe_def_file = 0;
@@ -63,6 +142,7 @@
 int pe_dll_stdcall_aliases = 0;
 int pe_dll_warn_dup_exports = 0;
 int pe_dll_compat_implib = 0;
+int pe_dll_extra_pe_debug = 0;
 
 /************************************************************************
 
@@ -86,6 +166,11 @@
   int underscored;
 } pe_details_type;
 
+typedef struct {
+  char *name;
+  int len;
+} autofilter_entry_type;
+
 #define PE_ARCH_i386 1
 #define PE_ARCH_sh 2
 #define PE_ARCH_mips 3
@@ -129,6 +214,50 @@
 
 static pe_details_type *pe_details;
 
+static autofilter_entry_type autofilter_symbollist[] = {
+  { "DllMain@12", 10 },
+  { "DllEntryPoint@0", 15 },
+  { "DllMainCRTStartup@12", 20 },
+  { "_cygwin_dll_entry@12", 20 },
+  { "_cygwin_crt0_common@8", 21 },
+  { "_cygwin_noncygwin_dll_entry@12", 30 },
+  { "impure_ptr", 10 },
+  { NULL, 0 }
+};
+/* Do not specify library suffix explicitly, to allow for dllized versions */
+static autofilter_entry_type autofilter_liblist[] = {
+  { "libgcc.", 7 },
+  { "libstdc++.", 10 },
+  { "libmingw32.", 11 },
+  { NULL,
 0 }
+};
+static autofilter_entry_type autofilter_objlist[] = {
+  { "crt0.o", 6 },
+  { "crt1.o", 6 },
+  { "crt2.o", 6 },
+  { NULL, 0 }
+};
+static autofilter_entry_type autofilter_symbolprefixlist[] = {
+/* { "__imp_", 6 }, */
+/* Do __imp_ explicitly to save time */
+  { "__rtti_", 7 },
+  { "__builtin_", 10 },
+  { "_head_", 6 }, /* don't export symbols specifying internal DLL layout */
+  { "_fmode", 6 },
+  { "_impure_ptr", 11 },
+  { "cygwin_attach_dll", 17 },
+  { "cygwin_premain0", 15 },
+  { "cygwin_premain1", 15 },
+  { "cygwin_premain2", 15 },
+  { "cygwin_premain3", 15 },
+  { "environ", 7 },
+  { NULL, 0 }
+};
+static autofilter_entry_type autofilter_symbolsuffixlist[] = {
+  { "_iname", 6 },
+  { NULL, 0 }
+};
+
 #define U(str) (pe_details->underscored ? "_" str : str)
 
 void
@@ -231,24 +360,100 @@
   free (local_copy);
 }
 
+/*
+  abfd is a bfd containing n (or NULL)
+  It can be used for contextual checks.
+*/
 static int
-auto_export (d, n)
+auto_export (abfd, d, n)
+     bfd *abfd;
      def_file *d;
      const char *n;
 {
   int i;
   struct exclude_list_struct *ex;
+  autofilter_entry_type *afptr;
+
+  /* we should not re-export imported stuff */
+  if (strncmp (n, "_imp__", 6) == 0)
+    return 0;
+
   for (i = 0; i < d->num_exports; i++)
     if (strcmp (d->exports[i].name, n) == 0)
       return 0;
   if (pe_dll_do_default_excludes)
     {
-      if (strcmp (n, "DllMain@12") == 0)
-        return 0;
-      if (strcmp (n, "DllEntryPoint@0") == 0)
-        return 0;
-      if (strcmp (n, "impure_ptr") == 0)
+      if (pe_dll_extra_pe_debug)
+        {
+          /* abfd may be NULL here, so do not dereference it blindly */
+          printf ("considering exporting: %s, abfd=%p, abfd->my_arc=%p\n",
+                  n, (void *) abfd,
+                  (void *) (abfd ? abfd->my_archive : NULL));
+        }
+
+      /* First of all, make context checks:
+         Don't export anything from libgcc */
+      if (abfd && abfd->my_archive)
+        {
+          afptr = autofilter_liblist;
+          while (afptr->name)
+            {
+              if (strstr (abfd->my_archive->filename, afptr->name))
+                return 0;
+              afptr++;
+            }
+        }
+
+      /* Next, exclude symbols from certain startup objects */
+      {
+        char *p;
+        afptr = autofilter_objlist;
+        while (afptr->name)
+          {
+            if (abfd &&
+                (p =
 strstr (abfd->filename, afptr->name)) &&
+                /* the match must end the filename: p[len] is the '\0' */
+                (*(p + afptr->len) == 0))
+              return 0;
+            afptr++;
+          }
+      }
+
+#if 0
+      /* Don't export any 'reserved' symbols */
+      if (*n && *n == '_' && n[1] == '_')
         return 0;
+#endif
+
+      /* Then, exclude specific symbols */
+      afptr = autofilter_symbollist;
+      while (afptr->name)
+        {
+          if (strcmp (n, afptr->name) == 0)
+            return 0;
+          afptr++;
+        }
+
+      /* Next, exclude symbols starting with ... */
+      afptr = autofilter_symbolprefixlist;
+      while (afptr->name)
+        {
+          if (strncmp (n, afptr->name, afptr->len) == 0)
+            return 0;
+          afptr++;
+        }
+
+      /* Finally, exclude symbols ending with ... */
+      {
+        int len = strlen (n);
+        afptr = autofilter_symbolsuffixlist;
+        while (afptr->name)
+          {
+            if ((len >= afptr->len) &&
+                /* add 1 to ensure match with trailing '\0' */
+                strncmp (n + len - afptr->len, afptr->name,
+                         afptr->len + 1) == 0)
+              return 0;
+            afptr++;
+          }
+      }
     }
   for (ex = excludes; ex; ex = ex->next)
     if (strcmp (n, ex->string) == 0)
@@ -302,20 +507,36 @@
       for (j = 0; j < nsyms; j++)
         {
           /* We should export symbols which are either global or not
-             anything at all. (.bss data is the latter) */
-          if ((symbols[j]->flags & BSF_GLOBAL)
-              || (symbols[j]->flags == BSF_NO_FLAGS))
+             anything at all.
 (.bss data is the latter)
+             We should not export undefined symbols
+          */
+          if (symbols[j]->section != &bfd_und_section
+              && ((symbols[j]->flags & BSF_GLOBAL)
+                  || (symbols[j]->flags == BFD_FORT_COMM_DEFAULT_VALUE)))
             {
               const char *sn = symbols[j]->name;
+
+              /* we should not re-export imported stuff */
+              {
+                char *name = (char *) xmalloc (strlen (sn) + 2 + 6);
+                sprintf (name, "%s%s", U("_imp_"), sn);
+                blhe = bfd_link_hash_lookup (info->hash, name,
+                                             false, false, false);
+                free (name);
+
+                if (blhe && blhe->type == bfd_link_hash_defined)
+                  continue;
+              }
+
               if (*sn == '_')
                 sn++;
-              if (auto_export (pe_def_file, sn))
-                {
-                  def_file_export *p;
-                  p=def_file_add_export (pe_def_file, sn, 0, -1);
-                  /* Fill data flag properly, from dlltool.c */
-                  p->flag_data = !(symbols[j]->flags & BSF_FUNCTION);
-                }
+              if (auto_export (b, pe_def_file, sn))
+                {
+                  def_file_export *p;
+                  p=def_file_add_export (pe_def_file, sn, 0, -1);
+                  /* Fill data flag properly, from dlltool.c */
+                  p->flag_data = !(symbols[j]->flags & BSF_FUNCTION);
+                }
             }
         }
     }
@@ -350,9 +571,10 @@
         {
           char *tmp = xstrdup (pe_def_file->exports[i].name);
           *(strchr (tmp, '@')) = 0;
-          if (auto_export (pe_def_file, tmp))
+          if (auto_export (NULL, pe_def_file, tmp))
             def_file_add_export (pe_def_file, tmp,
-                                 pe_def_file->exports[i].internal_name, -1);
+                                 pe_def_file->exports[i].internal_name,
+                                 -1);
           else
             free (tmp);
         }
@@ -731,6 +953,57 @@
     }
 }
 
+
+static struct sec *current_sec;
+
+void
+pe_walk_relocs_of_symbol (info, name, cb)
+     struct bfd_link_info *info;
+     CONST char *name;
+     int (*cb) (arelent *);
+{
+  bfd *b;
+  struct sec *s;
+
+  for (b = info->input_bfds; b; b = b->link_next)
+    {
+      arelent **relocs;
+      int relsize, nrelocs, i;
+
+      for (s = b->sections; s; s = s->next)
+        {
+          asymbol **symbols;
+          int nsyms, symsize;
+          int flags = bfd_get_section_flags (b, s);
+
+          /* Skip discarded linkonce sections */
+          if (flags & SEC_LINK_ONCE
+              && s->output_section == bfd_abs_section_ptr)
+            continue;
+
+          current_sec=s;
+
+          symsize =
 bfd_get_symtab_upper_bound (b);
+          symbols = (asymbol **) xmalloc (symsize);
+          nsyms = bfd_canonicalize_symtab (b, symbols);
+
+          relsize = bfd_get_reloc_upper_bound (b, s);
+          relocs = (arelent **) xmalloc ((size_t) relsize);
+          nrelocs = bfd_canonicalize_reloc (b, s, relocs, symbols);
+
+          for (i = 0; i < nrelocs; i++)
+            {
+              struct symbol_cache_entry *sym = *relocs[i]->sym_ptr_ptr;
+              if (!strcmp (name, sym->name))
+                cb (relocs[i]);
+            }
+          free (relocs);
+          /* Warning: the allocated symbols are remembered in BFD and reused
+             later, so don't free them! */
+          /* free (symbols); */
+        }
+    }
+}
+
 /************************************************************************
 
  Gather all the relocations and build the .reloc section
@@ -758,7 +1031,8 @@
     for (s = b->sections; s; s = s->next)
       total_relocs += s->reloc_count;
 
-  reloc_data = (reloc_data_type *) xmalloc (total_relocs * sizeof (reloc_data_type));
+  reloc_data =
+    (reloc_data_type *) xmalloc (total_relocs * sizeof (reloc_data_type));
   total_relocs = 0;
 
   bi = 0;
@@ -801,6 +1075,11 @@
 
       for (i = 0; i < nrelocs; i++)
         {
+          if (pe_dll_extra_pe_debug)
+            {
+              struct symbol_cache_entry *sym = *relocs[i]->sym_ptr_ptr;
+              printf ("rel: %s\n", sym->name);
+            }
           if (!relocs[i]->howto->pc_relative
               && relocs[i]->howto->type != pe_details->imagebase_reloc)
             {
@@ -1039,7 +1318,7 @@
 
   if (pe_def_file->num_exports > 0)
     {
-      fprintf (out, "\nEXPORTS\n\n");
+      fprintf (out, "EXPORTS\n");
       for (i = 0; i < pe_def_file->num_exports; i++)
         {
           def_file_export *e = pe_def_file->exports + i;
@@ -1445,7 +1724,7 @@
   bfd_set_arch_mach (abfd, pe_details->bfd_arch, 0);
 
   symptr = 0;
-  symtab = (asymbol **) xmalloc (10 * sizeof (asymbol *));
+  symtab = (asymbol **) xmalloc (11 * sizeof (asymbol *));
   tx = quick_section (abfd, ".text", SEC_CODE|SEC_HAS_CONTENTS, 2);
   id7 = quick_section (abfd, ".idata$7", SEC_HAS_CONTENTS, 2);
   id5 = quick_section (abfd, ".idata$5", SEC_HAS_CONTENTS, 2);
@@ -1455,6 +1734,9 @@
   quick_symbol (abfd, U (""), exp->internal_name, "", tx, BSF_GLOBAL, 0);
   quick_symbol (abfd, U ("_head_"), dll_symname, "", UNDSEC, BSF_GLOBAL, 0);
   quick_symbol (abfd, U ("_imp__"), exp->internal_name, "", id5, BSF_GLOBAL, 0);
+  /* symbol to reference ord/name of imported symbol, used to implement
+     auto-import */
+  quick_symbol (abfd, U("_nm__"), exp->internal_name, "", id6, BSF_GLOBAL, 0);
   if (pe_dll_compat_implib)
     quick_symbol (abfd, U ("__imp_"), exp->internal_name, "",
                   id5, BSF_GLOBAL, 0);
@@ -1553,6 +1835,170 @@
   return abfd;
 }
 
+static bfd *
+make_singleton_name_thunk (import, parent)
+     char *import;
+     bfd *parent;
+{
+  /* name thunks go to idata$4 */
+
+  asection *id4;
+  unsigned char *d4;
+  char *oname;
+  bfd *abfd;
+
+  oname = (char *) xmalloc (20);
+  sprintf (oname, "nmth%06d.o", tmp_seq);
+  tmp_seq++;
+
+  abfd = bfd_create (oname, parent);
+  bfd_find_target (pe_details->object_target, abfd);
+  bfd_make_writable (abfd);
+
+  bfd_set_format (abfd, bfd_object);
+  bfd_set_arch_mach (abfd, pe_details->bfd_arch, 0);
+
+  symptr = 0;
+  symtab = (asymbol **) xmalloc (3 * sizeof (asymbol *));
+  id4 = quick_section (abfd, ".idata$4", SEC_HAS_CONTENTS, 2);
+  quick_symbol (abfd, U ("_nm_thnk_"), import, "", id4, BSF_GLOBAL, 0);
+  quick_symbol (abfd, U ("_nm_"), import, "", UNDSEC, BSF_GLOBAL, 0);
+
+  bfd_set_section_size (abfd, id4, 8);
+  /* allocate all 8 bytes: the section size and the memset below use 8 */
+  d4 = (unsigned char *) xmalloc (8);
+  id4->contents = d4;
+  memset (d4, 0, 8);
+  quick_reloc (abfd, 0, BFD_RELOC_RVA, 2);
+  save_relocs (id4);
+
+  bfd_set_symtab (abfd, symtab, symptr);
+
+  bfd_set_section_contents (abfd, id4, d4, 0, 8);
+
+  bfd_make_readable (abfd);
+  return abfd;
+}
+
+char *
+make_import_fixup_mark (rel)
+     arelent *rel;
+{
+  /* we convert reloc to symbol, for later reference */
+  static int counter;
+  static char fixup_name[300];
+
+  struct symbol_cache_entry *sym = *rel->sym_ptr_ptr;
+
+  bfd *abfd = bfd_asymbol_bfd (sym);
+  struct coff_link_hash_entry *myh = NULL;
+
+  sprintf (fixup_name, "__fu%d_%s", counter++, sym->name);
+  bfd_coff_link_add_one_symbol (&link_info, abfd, fixup_name,
 BSF_GLOBAL,
+                                current_sec, /* sym->section, */
+                                rel->address, NULL, true, false,
+                                (struct bfd_link_hash_entry **) &myh);
+
+/*
+  printf("type:%d\n",myh->type);
+  printf("%s\n",myh->root.u.def.section->name);
+*/
+  return fixup_name;
+}
+
+
+/*
+ * .section .idata$3
+ * .rva __nm_thnk_SYM (singleton thunk with name of func)
+ * .long 0
+ * .long 0
+ * .rva __my_dll_iname (name of dll)
+ * .rva __fuNN_SYM (pointer to reference (address) in text)
+ *
+ */
+
+static bfd *
+make_import_fixup_entry (name, fixup_name, dll_symname,parent)
+     char *name;
+     char *fixup_name;
+     char *dll_symname;
+     bfd *parent;
+{
+  asection *id3;
+  unsigned char *d3;
+  char *oname;
+  bfd *abfd;
+
+  oname = (char *) xmalloc (20);
+  sprintf (oname, "fu%06d.o", tmp_seq);
+  tmp_seq++;
+
+  abfd = bfd_create (oname, parent);
+  bfd_find_target (pe_details->object_target, abfd);
+  bfd_make_writable (abfd);
+
+  bfd_set_format (abfd, bfd_object);
+  bfd_set_arch_mach (abfd, pe_details->bfd_arch, 0);
+
+  symptr = 0;
+  symtab = (asymbol **) xmalloc (6 * sizeof (asymbol *));
+  id3 = quick_section (abfd, ".idata$3", SEC_HAS_CONTENTS, 2);
+/*
+  quick_symbol (abfd, U("_head_"), dll_symname, "", id2, BSF_GLOBAL, 0);
+*/
+  quick_symbol (abfd, U ("_nm_thnk_"), name, "", UNDSEC, BSF_GLOBAL, 0);
+  quick_symbol (abfd, U (""), dll_symname, "_iname", UNDSEC, BSF_GLOBAL, 0);
+  quick_symbol (abfd, "", fixup_name, "", UNDSEC, BSF_GLOBAL, 0);
+
+  bfd_set_section_size (abfd, id3, 20);
+  d3 = (unsigned char *) xmalloc (20);
+  id3->contents = d3;
+  memset (d3, 0, 20);
+
+  quick_reloc (abfd, 0, BFD_RELOC_RVA, 1);
+  quick_reloc (abfd, 12, BFD_RELOC_RVA, 2);
+  quick_reloc (abfd, 16, BFD_RELOC_RVA, 3);
+  save_relocs (id3);
+
+  bfd_set_symtab (abfd, symtab, symptr);
+
+  bfd_set_section_contents (abfd, id3, d3, 0, 20);
+
+  bfd_make_readable (abfd);
+  return abfd;
+}
+
+void
+pe_create_import_fixup (rel)
+     arelent *rel;
+{
+  char buf[300];
+  struct symbol_cache_entry *sym = *rel->sym_ptr_ptr;
+  struct bfd_link_hash_entry
*name_thunk_sym; + CONST char *name = sym->name; + char *fixup_name = make_import_fixup_mark (rel); + + sprintf (buf, U ("_nm_thnk_%s"), name); + + name_thunk_sym = bfd_link_hash_lookup (link_info.hash, buf, 0, 0, 1); + + if (!name_thunk_sym || name_thunk_sym->type != bfd_link_hash_defined) + { + bfd *b = make_singleton_name_thunk (name, output_bfd); + add_bfd_to_link (b, b->filename, &link_info); + + /* If we ever use autoimport, we have to cast text section writable */ + config.text_read_only = false; + } + + { + extern char *data_import_dll; + bfd *b = make_import_fixup_entry (name, fixup_name, data_import_dll, + output_bfd); + add_bfd_to_link (b, b->filename, &link_info); + } +} + + void pe_dll_generate_implib (def, impfilename) def_file *def; @@ -1628,10 +2074,10 @@ } } -static void +void add_bfd_to_link (abfd, name, link_info) bfd *abfd; - char *name; + CONST char *name; struct bfd_link_info *link_info; { lang_input_statement_type *fake_file; Index: ld/pe-dll.h =================================================================== RCS file: /cvs/src/src/ld/pe-dll.h,v retrieving revision 1.3 diff -u -r1.3 pe-dll.h --- pe-dll.h 2001/03/13 06:14:27 1.3 +++ pe-dll.h 2001/08/02 02:46:16 @@ -33,6 +33,7 @@ extern int pe_dll_stdcall_aliases; extern int pe_dll_warn_dup_exports; extern int pe_dll_compat_implib; +extern int pe_dll_extra_pe_debug; extern void pe_dll_id_target PARAMS ((const char *)); extern void pe_dll_add_excludes PARAMS ((const char *)); @@ -45,4 +46,9 @@ extern void pe_dll_fill_sections PARAMS ((bfd *, struct bfd_link_info *)); extern void pe_exe_fill_sections PARAMS ((bfd *, struct bfd_link_info *)); +extern void pe_walk_relocs_of_symbol PARAMS ((struct bfd_link_info * info, + CONST char *name, + int (*cb) (arelent *))); + +extern void pe_create_import_fixup PARAMS ((arelent * rel)); #endif /* PE_DLL_H */ Index: ld/emultempl/pe.em =================================================================== RCS file: /cvs/src/src/ld/emultempl/pe.em,v retrieving 
revision 1.45 diff -u -r1.45 pe.em --- pe.em 2001/07/11 08:11:16 1.45 +++ pe.em 2001/08/02 02:46:22 @@ -146,7 +146,9 @@ ldfile_output_architecture = bfd_arch_${ARCH}; output_filename = "${EXECUTABLE_NAME:-a.exe}"; #ifdef DLL_SUPPORT + config.dynamic_link = true; config.has_shared = 1; +/* link_info.pei386_auto_import = true; */ #if (PE_DEF_SUBSYSTEM == 9) || (PE_DEF_SUBSYSTEM == 2) #if defined TARGET_IS_mipspe || defined TARGET_IS_armpe @@ -191,6 +193,9 @@ #define OPTION_DISABLE_AUTO_IMAGE_BASE (OPTION_ENABLE_AUTO_IMAGE_BASE + 1) #define OPTION_DLL_SEARCH_PREFIX (OPTION_DISABLE_AUTO_IMAGE_BASE + 1) #define OPTION_NO_DEFAULT_EXCLUDES (OPTION_DLL_SEARCH_PREFIX + 1) +#define OPTION_DLL_ENABLE_AUTO_IMPORT (OPTION_NO_DEFAULT_EXCLUDES + 1) +#define OPTION_DLL_DISABLE_AUTO_IMPORT (OPTION_DLL_ENABLE_AUTO_IMPORT + 1) +#define OPTION_ENABLE_EXTRA_PE_DEBUG (OPTION_DLL_DISABLE_AUTO_IMPORT + 1) static struct option longopts[] = { /* PE options */ @@ -228,6 +233,9 @@ {"disable-auto-image-base", no_argument, NULL, OPTION_DISABLE_AUTO_IMAGE_BASE}, {"dll-search-prefix", required_argument, NULL, OPTION_DLL_SEARCH_PREFIX}, {"no-default-excludes", no_argument, NULL, OPTION_NO_DEFAULT_EXCLUDES}, + {"enable-auto-import", no_argument, NULL, OPTION_DLL_ENABLE_AUTO_IMPORT}, + {"disable-auto-import", no_argument, NULL, OPTION_DLL_DISABLE_AUTO_IMPORT}, + {"enable-extra-pe-debug", no_argument, NULL, OPTION_ENABLE_EXTRA_PE_DEBUG}, #endif {NULL, no_argument, NULL, 0} }; @@ -313,6 +321,11 @@ fprintf (file, _(" --dll-search-prefix= When linking dynamically to a dll witout an\n")); fprintf (file, _(" importlib, use .dll \n")); fprintf (file, _(" in preference to lib.dll \n")); + fprintf (file, _(" --enable-auto-import Do sophistcated linking of _sym to \n")); + fprintf (file, _(" __imp_sym for DATA references\n")); + fprintf (file, _(" --disable-auto-import Do not auto-import DATA items from DLLs\n")); + fprintf (file, _(" --enable-extra-pe-debug Enable verbose debug output when building\n")); + 
fprintf (file, _(" or linking to DLLs (esp. auto-import)\n")); #endif } @@ -583,6 +596,15 @@ case OPTION_NO_DEFAULT_EXCLUDES: pe_dll_do_default_excludes = 0; break; + case OPTION_DLL_ENABLE_AUTO_IMPORT: + link_info.pei386_auto_import = true; + break; + case OPTION_DLL_DISABLE_AUTO_IMPORT: + link_info.pei386_auto_import = false; + break; + case OPTION_ENABLE_EXTRA_PE_DEBUG: + pe_dll_extra_pe_debug = 1; + break; #endif } return 1; @@ -733,6 +755,11 @@ static int gave_warning_message = 0; struct bfd_link_hash_entry *undef, *sym; char *at; + if (pe_dll_extra_pe_debug) + { + printf (__FUNCTION__"\n"); + } + for (undef = link_info.hash->undefs; undef; undef=undef->next) if (undef->type == bfd_link_hash_undefined) { @@ -791,11 +818,122 @@ } } } + +static int +make_import_fixup (rel) + arelent *rel; +{ + struct symbol_cache_entry *sym = *rel->sym_ptr_ptr; +/* + bfd *b; +*/ + + if (pe_dll_extra_pe_debug) + { + printf ("arelent: %s@%#x: add=%li\n", sym->name, + (int) rel->address, rel->addend); + } + pe_create_import_fixup (rel); + return 1; +} + +char *data_import_dll; + +static void +pe_find_data_imports () +{ + struct bfd_link_hash_entry *undef, *sym; + for (undef = link_info.hash->undefs; undef; undef=undef->next) + { + if (undef->type == bfd_link_hash_undefined) + { + /* C++ symbols are *long* */ + char buf[4096]; + if (pe_dll_extra_pe_debug) + { + printf (__FUNCTION__":%s\n", undef->root.string); + } + sprintf (buf, "__imp_%s", undef->root.string); + + sym = bfd_link_hash_lookup (link_info.hash, buf, 0, 0, 1); + if (sym && sym->type == bfd_link_hash_defined) + { + einfo (_("Warning: resolving %s by linking to %s (auto-import)\n"), + undef->root.string, buf); + { + bfd *b = sym->u.def.section->owner; + asymbol **symbols; + int nsyms, symsize, i; + + symsize = bfd_get_symtab_upper_bound (b); + symbols = (asymbol **) xmalloc (symsize); + nsyms = bfd_canonicalize_symtab (b, symbols); + + for (i = 0; i < nsyms; i++) + { + if (memcmp(symbols[i]->name, "__head_", + sizeof 
("__head_") - 1)) + continue; + if (pe_dll_extra_pe_debug) + { + printf ("->%s\n", symbols[i]->name); + } + data_import_dll = (char*) (symbols[i]->name + + sizeof ("__head_") - 1); + break; + } + } + + pe_walk_relocs_of_symbol (&link_info, undef->root.string, + make_import_fixup); + + /* let's differentiate it somehow from defined */ + undef->type = bfd_link_hash_defweak; + /* we replace original name with __imp_ prefixed, this + 1) may trash memory 2) leads to duplicate symbol generation. + Still, IMHO it's better than having name poluted. */ + undef->root.string = sym->root.string; + undef->u.def.value = sym->u.def.value; + undef->u.def.section = sym->u.def.section; + } + } + } +} #endif /* DLL_SUPPORT */ +static boolean +pr_sym (h, string) + struct bfd_hash_entry *h; + PTR string; +{ + if (pe_dll_extra_pe_debug) + { + printf("+%s\n",h->string); + } + return true; +} + + static void gld_${EMULATION_NAME}_after_open () { + + if (pe_dll_extra_pe_debug) + { + bfd *a; + struct bfd_link_hash_entry *sym; + printf (__FUNCTION__"()\n"); + + for (sym = link_info.hash->undefs; sym; sym=sym->next) + printf ("-%s\n", sym->root.string); + bfd_hash_traverse (&link_info.hash->table, pr_sym,NULL); + + for (a = link_info.input_bfds; a; a = a->link_next) + { + printf("*%s\n",a->filename); + } + } + /* Pass the wacky PE command line options into the output bfd. FIXME: This should be done via a function, rather than by including an internal BFD header. 
*/ @@ -810,6 +948,8 @@ if (pe_enable_stdcall_fixup) /* -1=warn or 1=disable */ pe_fixup_stdcalls (); + pe_find_data_imports (output_bfd, &link_info); + pe_process_import_defs(output_bfd, &link_info); if (link_info.shared) pe_dll_build_sections (output_bfd, &link_info); @@ -1251,6 +1391,16 @@ if (pe_out_def_filename) pe_dll_generate_def_file (pe_out_def_filename); #endif /* DLL_SUPPORT */ + + /* I don't know where .idata gets set as code, but it shouldn't be */ + { + asection *asec = bfd_get_section_by_name (output_bfd, ".idata"); + if (asec) + { + asec->flags &= ~SEC_CODE; + asec->flags |= SEC_DATA; + } + } } --------------060507050606040700080303--