Mail Archives: djgpp/2001/06/28/16:12:18
On Wed, 27 Jun 2001 13:57:24 -0400, Ron Natalie <ron AT sensor DOT com> sat
on a tribble, which squeaked:
>> > 00000000 55 push ebp
>> > 00000001 89E5 mov ebp,esp
>> > 00000003 83EC04 sub esp,byte +0x18
>> > 00000006 C745FC78563412 mov dword [ebp-0x4],0x12345678
>> > 0000000D C9 leave
>> > 0000000E C3 ret
>If you read before posting, you'll see the third line allocates 4
>bytes. 83 is the opcode for immediate subtract, where the EC indicates
>the register ESP operand and 04 is the value 4 being subtracted.
I suppose I'll be flamed on "extra crispy" for asking a second "silly
question" but ... is this disassembly output wrong or is it merely
misleading or confusing? Let's see that line again:
>> > 00000003 83EC04 sub esp,byte +0x18
It's doing something with 0x04, all right, but the disassembly says
something about 0x18. Where's the 0x18 coming from? It sure doesn't
seem to be in the actual machine code itself! We can certainly forgive
the original poster some confusion, given this ... peculiarity.
Another thing that's a bit puzzling is that the compiled code keeps a
frame pointer (ebp) and writes *behind* it, e.g. mov dword
[ebp-0x4], foo. This, it seems to me, is a bit like writing

    while (bar) {
        i++;
        quux(i-1);
    }

in place of

    while (bar) {
        quux(i);
        i++;
    }
Of course, it may well be (and almost certainly is) that there is a
good reason for the way it's done above -- with the weird tricks some
chips allow these days, it might even be because it's faster. It still
looks odd (and makes me wonder whether the chip designs of the present
day are the results of geniuses or idiot savants...) Also, maybe
turning on more optimizations would produce the "expected" behavior,
i.e. you stuff the box first and then increment/decrement the counter.
You may flame me here too, for "second-guessing the compiler",
although I'm not -- I'm just curious as to why it does things a
certain way, and why bass-ackward looking code sometimes is actually
faster on modern CPUs. Maybe it's a nonlinear thing... caches and
pipelining introduce a lot of weird dependencies that don't change
semantics but dramatically affect speed. Maybe we need to hand over
all machine code generation to software now -- and then to some
bottom-up system like a Darwin algorithm at that. (I can just see it now --
gcc 5.x compiling dozens of versions of a function with the same
semantics but various instruction orders, then benchmarking them
internally, and then mating the fastest versions... I guess it
would actually be mating some kind of tree-like representations that
account for order of side effects in their structural constraints...)
--
Bill Gates: "No computer will ever need more than 640K of RAM." -- 1980
"There's nobody getting rich writing software that I know of." -- 1980
"This antitrust thing will blow over." -- 1998
Combine neo, an underscore, and one thousand sixty-one to make my hotmail addy.