Mail Archives: djgpp/1998/08/19/15:58:49
On 19 Aug 98 at 9:08, GAMMELJL AT SLU DOT EDU wrote:
> The C++ compiler seems to have a slightly different programming
> style than george DOT foot AT merton DOT oxford DOT ac DOT uk.
>
> Here is how the gnu compiler programs a subroutine which has two
> arguments, both of which are arrays. The C++ version of sub is:
> sub(unsigned int *z1,unsigned int *z2)
> {i=666; //a line to mark beginning of code
> .C++ CODE
> return 0; //xorl %eax,%eax marks end of code
> }
>
> .globl _sub__FPUiT0
> _sub__FPUiT0:
> pushl %ebp            The first eight lines are straightforward
> movl %esp,%ebp        but I do not relate them well to the
> subl $40,%esp         lines after xorl %eax,%eax below.
So now ESP = EBP - 40...
> pushl %edi            Will be used for origin of one of the arrays.
> pushl %esi            Will be used for origin of the other array.
> pushl %ebx            Will be used to number elements of the arrays.
... and now ESP = EBP - 52. (3 pushls = 12 bytes; 40+12 = 52)
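To make that arithmetic checkable, here is a minimal sketch in C++ that tracks only what the quoted prologue does to ESP and EBP. The helper names (push32, sub_imm and so on) and the entry value 0x1000 are made up for illustration; nothing here is generated by gcc.

```cpp
#include <cstdint>

// Hypothetical helpers: each models the effect of one instruction on ESP only.
constexpr std::uint32_t push32(std::uint32_t esp) { return esp - 4; }   // pushl: ESP -= 4
constexpr std::uint32_t sub_imm(std::uint32_t esp, std::uint32_t n) {   // subl $n,%esp
    return esp - n;
}

// EBP after "pushl %ebp ; movl %esp,%ebp".
constexpr std::uint32_t ebp_after_prologue(std::uint32_t esp_at_entry) {
    return push32(esp_at_entry);
}

// ESP after the whole prologue:
// pushl %ebp ; movl %esp,%ebp ; subl $40,%esp ; pushl %edi ; pushl %esi ; pushl %ebx
constexpr std::uint32_t esp_after_prologue(std::uint32_t esp_at_entry) {
    return push32(push32(push32(sub_imm(push32(esp_at_entry), 40))));
}

// 40 bytes of locals plus three saved registers: ESP ends up 52 bytes below EBP.
static_assert(ebp_after_prologue(0x1000u) - esp_after_prologue(0x1000u) == 52u,
              "ESP = EBP - 52 after the prologue");
```

The gap is 52 whatever ESP was on entry, which is why the compiler can address everything in the frame relative to EBP.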
> movl 8(%ebp),%edi     Origin of z1 stored in edi.
> movl 12(%ebp),%esi    Origin of z2 stored in esi.
> movl $666,_i          Marks beginning of C++ code.
> .followed by lines of assembly language
> .easily recognizable as arising from CODE.
Be careful about this sort of assumption; the compiler sometimes
reorders instructions, combines several into one operation, or just
doesn't generate any code for a statement at all. When optimisation
is switched off you can usually see which instructions came from
which source statement, but with optimisation turned on at any level
it's not so easy to tell.
> . The stack is addressed frequently using N(%ebp)
> .where N is a negative multiple of 4 (-40<=N<=-4)
That data is on the stack, but it is the local variable space the
compiler reserved when it subtracted 40 from ESP earlier on: ten
4-byte slots, from -40(%ebp) up to -4(%ebp).
> .ebp is not changed during the program.
> .esp is not changed during the program.
> .
> xorl %eax,%eax        Marks the end of the C++ code.
> leal -52(%ebp),%esp   My question is: What is going on
`leal' = Load Effective Address (Long); it causes the processor to
work out which memory location would be referred to by the first
(memory) operand (in this case `-52(%ebp)'), and store that address
(not its contents, as `movl' would) in the register specified by the
second operand (in this case `%esp'). So it effectively means "set
ESP to EBP - 52". It's restoring ESP to the value it had just after
the three pushes, as I calculated above, so that the three `popl's
that follow pick up the saved EBX, ESI and EDI.
If ESP hasn't actually changed, then I can only assume that you
didn't use optimisations; if you don't optimise, gcc emits a lot of
redundant code like this.
I've never seen this before in gcc-produced code, but maybe it's only
produced by the C++ compiler. I wondered whether perhaps the C++
compiler subtracts from ESP on the fly when it needs small-scope
variables (e.g. when you declare them in the middle of a block), but
I don't think it does; the C compiler doesn't (space for local
variables in functions is allocated at the start of the function, for
all local variables including those in blocks within the function),
and this code clearly reserves 40 bytes for local variables on entry
to the function.
In this case the `leal' line could be an automatic line in the exit
code that the compiler inserts regardless of whether ESP has changed,
knowing that it will be optimised out later on if ESP hasn't actually
changed.
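The difference between `leal' and `movl' with the same operand can be sketched in C++. This is only a toy model: memory_words, effective_address, lea and mov_load are names I have invented, and the "memory" is four words at byte addresses 0, 4, 8 and 12.

```cpp
#include <cstdint>

// Toy memory (illustrative only): word i lives at byte address 4*i.
constexpr std::uint32_t memory_words[4] = {0xAAAAu, 0xBBBBu, 0xCCCCu, 0xDDDDu};

// Address named by the operand disp(%base): base + disp, with 32-bit wraparound.
constexpr std::uint32_t effective_address(std::uint32_t base, std::int32_t disp) {
    return base + static_cast<std::uint32_t>(disp);
}

// leal disp(%base),%dst : dst gets the address itself; memory is never read.
constexpr std::uint32_t lea(std::uint32_t base, std::int32_t disp) {
    return effective_address(base, disp);
}

// movl disp(%base),%dst : dst gets the word stored at that address.
constexpr std::uint32_t mov_load(std::uint32_t base, std::int32_t disp) {
    return memory_words[effective_address(base, disp) / 4];
}

static_assert(lea(12, -8) == 4u, "leal yields the address 12 - 8");
static_assert(mov_load(12, -8) == 0xBBBBu, "movl yields the word stored at address 4");
static_assert(lea(0x1000, -52) == 0x1000u - 52u, "leal -52(%ebp),%esp gives EBP - 52");
```

So `leal -52(%ebp),%esp' is just a subtraction that happens to be written as an address calculation; no memory is touched.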
> popl %ebx             in these remaining lines?
> popl %esi             I understand the 3 popl, but not
> popl %edi             the "leal -52 etc." and the "leave"
> leave                 or why the order of the commands is
> ret                   what it is. Notice that there is no
> .align 2              popl %ebp. What does "leave" do?
`leave' is meant to be paired with `enter'. `enter' pushes the old
value of EBP, copies ESP into EBP, and optionally subtracts from ESP
to allocate space for local variables. `leave' undoes that: it
restores ESP and EBP to their old values.
`leave' is a single-byte opcode (0xC9), so it's more compact than the
two instructions it replaces; I'm not sure about the timings, so I
won't claim it's faster to execute too.
I've never used them myself, but `leave' is equivalent to:
    movl %ebp, %esp
    popl %ebp
and "enter X" (with a nesting level of zero) is equivalent to:
    pushl %ebp
    movl %esp, %ebp
    subl $X, %esp
(in AT&T syntax "enter X" is written `enter $X,$0'; the second
operand is the nesting level).
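As a sanity check on those two expansions, here is a small C++14 sketch of a toy stack machine. Machine, its 64-word stack and enter_leave_round_trip are all invented for illustration, and only nesting level zero is modelled. It shows that running `leave' after `enter' puts both ESP and EBP back where they started.

```cpp
#include <cstdint>

// Toy 32-bit stack machine (illustrative only): word i lives at byte address 4*i.
struct Machine {
    std::uint32_t esp;
    std::uint32_t ebp;
    std::uint32_t stack[64];

    constexpr void push(std::uint32_t v) { esp -= 4; stack[esp / 4] = v; }
    constexpr std::uint32_t pop() {
        std::uint32_t v = stack[esp / 4];
        esp += 4;
        return v;
    }

    // enter $locals,$0  ==  pushl %ebp ; movl %esp,%ebp ; subl $locals,%esp
    constexpr void enter(std::uint32_t locals) { push(ebp); ebp = esp; esp -= locals; }

    // leave  ==  movl %ebp,%esp ; popl %ebp
    constexpr void leave() { esp = ebp; ebp = pop(); }
};

// True if the frame is laid out as expected and leave restores both registers.
// esp0 must be a multiple of 4 no larger than 256 to stay inside the toy stack.
constexpr bool enter_leave_round_trip(std::uint32_t esp0, std::uint32_t ebp0,
                                      std::uint32_t locals) {
    Machine m{esp0, ebp0, {}};
    m.enter(locals);
    bool frame_ok = (m.ebp == esp0 - 4) && (m.esp == esp0 - 4 - locals);
    m.leave();
    return frame_ok && m.esp == esp0 && m.ebp == ebp0;
}

static_assert(enter_leave_round_trip(200, 0x1234, 40), "leave undoes enter $40,$0");
```

Note that `leave' does not care how far ESP moved in between: it reloads ESP from EBP, so the pushes and the `subl' are all undone in one step.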
> A simpler example has the following source code:
>
> sub2(int a,int b)
> {i1=666; //int i1 is a global variable
> return a+b;}
>
> For sub2 gnu compiler produces:
>
> .globl _zadd__FUiUi
> _zadd__FUiUi:
> pushl %ebp
> movl %esp,%ebp
> movl $666,_i1         Marks beginning of C++ code.
> movl 8(%ebp),%eax
> addl 12(%ebp),%eax    Marks end of C++ code
> leave
> ret
>
> The gnu compiler does not restore esp (no reason to: it was never
> changed) or (unless leave does it) ebp. But if sub2 is used many
> times in the code so that ebp is pushed many times, isn't there a
> danger of stack overflow (or memory leak)?
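See the note above re the meaning of `leave': it pops EBP for you, so
the stack is balanced again by the time `ret' executes and nothing
accumulates, however many times sub2 is called.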
> George Foot is always
> careful to popl %ebp at the end of the subroutine after restoring
> esp (if it has been changed with a subl $4,%esp as on page 4 of
> his notes,for instance) with movl %ebp,%esp. George Foot's
> programming style is straightforward. Why the difference in
> the two styles?
Because I'm human and gcc is a computer program; also, gcc
understands the assembler a lot better than I do. ;) For me, the
most important thing is clarity -- the code must be understandable by
humans. As you've found, gcc-compiled code isn't always easy to
follow.
--
george DOT foot AT merton DOT oxford DOT ac DOT uk