Mail Archives: cygwin/2001/08/26/18:35:51
DJ Delorie wrote:
>>Well, that's interesting. Since arrays ARE pointers(*), then perhaps
>>it's enough to change gcc's behavior from
>>
>
> Not from gcc's perspective. From C's perspective, array symbols and
> pointer symbols are mostly interchangeable, but they are not the same.
> For example, these two declarations:
>
> extern char *foo;
> extern char foo[];
>
> are *not* the same, and using the wrong one results in a broken
> program.
Thanks for not ridiculing my thinko. I knew this.
> For our purposes, a pointer is a symbol referencing a four-byte range
> of memory that holds the address of a range of memory that holds a
> sequence of characters, and an array is a symbol referencing a range
> of memory that holds a sequence of characters. Because a pointer
> requires an extra indirection, gcc is limited in the optimizations it
> can do on it, but dealing with imports becomes simpler because the
> address occurs in exactly one place.
>
> Since a symbol is always a constant (regardless of what it refers to),
> offsetting it by a constant results in a sum that can always be
> computed at compile time (well, link time) and gcc will always do it
> that way. This is a fairly fundamental concept in gcc, and I doubt it
> would be practical to tell gcc to do it otherwise.
>
AHA! But the auto-import code replaces the extra indirection (for
DATA access into a DLL) with the actual address in the loaded DLL. (see
docs pasted below). Perhaps the auto-import needs to create additional
pseudo-symbols for index-array access. E.g.
hwstr
hwstr[1]
hwstr[2]
hwstr[12]
could each be mapped to *different* "fake" symbols. Then the runtime
loader would just replace them as before -- but this time, with the
correct (offset) address in the DLL.
Downside: could lead to an explosion of symbols, if there's a lot of
constant-offset indexing into arrays exported by the DLL. (Variable
offsets are computed at runtime, of course. No problem there. And it
seems that ONLY arrays are subject to this problem...if I understand
correctly)
Oh shoot. I just realized that the above is garbage. How will the DLL
know *which* fake symbols to export? It can't know how an external
client will access an array variable, so the DLL has to export fake
symbols for every conceivable constant index. This is *possible* --
since we're talking about arrays (e.g. with fixed length; these are
*not* pointers <g>) -- but not really practical. A simple array
foo[4096] leads to 4097 exported symbols. No, that's just silly.
I'm going back to square one on this problem. I'm out of ideas on this
one. Paul? Paaauuulll?
FWIW, this is what a disassembly of hello.exe looks like (no declspec
decorators, using the auto-import stuff). Notice the "fixup" labels
__fuN__symbol:
00401044 <_main>:
401044: 55 push %ebp
401045: 89 e5 mov %esp,%ebp
401047: 83 ec 18 sub $0x18,%esp
40104a: e8 8d 00 00 00 call 4010dc <___main>
40104f: c6 05 04 41 40 00 21 movb $0x21,0x404104
^^^^^^^^ the address operand above (0x404104) is off by 12
00401051 <__fu0__hwstr1>:
401051: 04 41 add $0x41,%al
401053: 40 inc %eax
401054: 00 21 add %ah,(%ecx)
401056: c7 45 fc fc 40 40 00 movl $0x4040fc,0xfffffffc(%ebp)
00401059 <__fu2__hwstr2>:
401059: fc cld
40105a: 40 inc %eax
40105b: 40 inc %eax
40105c: 00 8b 45 fc 83 c0 add %cl,0xc083fc45(%ebx)
401062: 0c c6 or $0xc6,%al
401064: 00 21 add %ah,(%ecx)
401066: 83 c4 f4 add $0xfffffff4,%esp
401069: 68 f8 40 40 00 push $0x4040f8
0040106a <__fu1__hwstr1>:
40106a: f8 clc
40106b: 40 inc %eax
40106c: 40 inc %eax
40106d: 00 e8 add %ch,%al
40106f: 71 00 jno 401071 <__fu1__hwstr1+0x7>
401071: 00 00 add %al,(%eax)
401073: 83 c4 10 add $0x10,%esp
401076: 83 c4 f4 add $0xfffffff4,%esp
401079: 68 fc 40 40 00 push $0x4040fc
0040107a <__fu3__hwstr2>:
40107a: fc cld
40107b: 40 inc %eax
40107c: 40 inc %eax
40107d: 00 e8 add %ch,%al
40107f: 61 popa
401080: 00 00 add %al,(%eax)
401082: 00 83 c4 10 31 c0 add %al,0xc03110c4(%ebx)
401088: eb 02 jmp 40108c <__fu3__hwstr2+0x12>
40108a: 89 f6 mov %esi,%esi
40108c: 89 ec mov %ebp,%esp
40108e: 5d pop %ebp
40108f: c3 ret
Funky, huh?
--Chuck
Quoting from the pe-dll.c:
------------------------------------
Auto-import feature by Paul Sokolovsky
Quick facts:
1. With this feature on, DLL clients can import variables from DLL
without any concern from their side (for example, without any source
code modifications).
2. This is done completely in bounds of the PE specification (to be
fair, there's a place where it pokes nose out of, but in practice it
works). So, resulting module can be used with any other PE compiler/linker.
3. Auto-import is fully compatible with standard import method and
they can be mixed together.
4. Overheads: space: 8 bytes per imported symbol, plus 20 for each
reference to it; load time: negligible; virtual/physical memory: should
be less than effect of DLL relocation, and I sincerely hope it doesn't
affect DLL sharability (too much).
Idea
The obvious and only way to get rid of dllimport insanity is to make
client access variable directly in the DLL, bypassing extra dereference.
I.e., whenever client contains something like
mov dll_var,%eax,
address of dll_var in the command should be relocated to point into
loaded DLL. The aim is to make OS loader do so, and then make ld help
with that. Import section of PE made following way: there's a vector of
structures each describing imports from particular DLL. Each such
structure points to two other parallel vectors: one holding imported
names, and one which will hold address of corresponding imported name.
So, the solution is de-vectorize these structures, making import
locations be sparse and pointing directly into code. Before continuing,
it is worth a note that, while authors strives to make PE act ELF-like,
there're some other people make ELF act PE-like: elfvector, ;-) .
Implementation
For each reference of data symbol to be imported from DLL (to set of
which belong symbols with name <sym>, if __imp_<sym> is found in
implib), the import fixup entry is generated. That entry is of type
IMAGE_IMPORT_DESCRIPTOR and stored in .idata$3 subsection. Each fixup
entry contains pointer to symbol's address within .text section (marked
with __fuN_<sym> symbol, where N is integer), pointer to DLL name (so,
DLL name is referenced by multiple entries), and pointer to symbol name
thunk. Symbol name thunk is singleton vector (__nm_th_<symbol>) pointing
to IMAGE_IMPORT_BY_NAME structure (__nm_<symbol>) directly containing
imported name. Here comes that "on the edge" problem mentioned above: PE
specification rambles that name vector (OriginalFirstThunk) should run
in parallel with addresses vector (FirstThunk), i.e. that they should
have same number of elements and terminated with zero. We violate this,
since FirstThunk points directly into machine code. But in practice, OS
loader implemented the sane way: it goes thru OriginalFirstThunk and
puts addresses to FirstThunk, not something else. It once again should
be noted that dll and symbol name structures are reused across fixup
entries and should be there anyway to support standard import stuff, so
sustained overhead is 20 bytes per reference. Other question is whether
having several IMAGE_IMPORT_DESCRIPTORS for the same DLL is possible.
Answer is yes, it is done even by native compiler/linker (libth32's
functions in fact reside in windows9x kernel32.dll, so if you use
it, you have two IMAGE_IMPORT_DESCRIPTORS for kernel32.dll). Yet other
question is whether referencing the same PE structures several times is
valid. The answer is why not, prohibiting that (detecting violation)
would require more work on behalf of loader than not doing it.
--------------------------------------------