Mail Archives: djgpp/2000/04/13/18:10:19
Alexei A. Frounze wrote:
>Dieter Buerssner wrote:
>> It would be interesting to know, what the performance difference
>> of this code and the code without the inline assembly was.
>
>Well, I don't think it's possible to write extremely fast 3d renderer
>w/o ASM at all (at least on i386+ CPUs). Don't you think Wolf3d, Doom,
>Descent and Quake would not be as fast as they are, if they were written
>in pure C (even Watcom C, which was one of the best compilers then)?
I did not say that the plain C code would be faster or slower.
I just asked a question that may not be too difficult for you to
answer, because the C code is already there, in comments.
>> But, why use this? Gcc will most probably produce exactly the
>> same code by
>>
>> du >>= SUB_BITS;
>> dv >>= SUB_BITS;
>
>It will load EAX, shift EAX and put the result back instead of plain
>shift using a memory reference. At least I saw this in disassembly.
Not here. When compiling your code with gcc -O -S (gcc 2.95.2),
for the interesting lines
du = u2 - u1;
dv = v2 - v1;
// sar du, SUB_BITS
// sar dv, SUB_BITS
__asm__ __volatile__ ("
sarl %2, (%0)
sarl %2, (%1)"
:
: "g" (&du), "g" (&dv), "g" (SUB_BITS)
);
I get the following assembler output:
[All other parts snipped]
movl -132(%ebp),%ebx
subl %edi,%ebx
movl %ebx,-184(%ebp)
movl -136(%ebp),%ebx
subl %esi,%ebx
leal -96(%ebp),%edx
leal -100(%ebp),%eax
/APP
sarl $4, (%edx)
sarl $4, (%eax)
/NO_APP
So this happens to produce correct code, even though the inline
assembly is wrong, as I explained in another post.
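
To make the point about the constraints concrete, here is a minimal
sketch of how the same shift could be written so that gcc is told which
memory it modifies. It is an illustration only, not a quote from the
other post: the helper name shift_both is hypothetical, SUB_BITS is
assumed to be 4 (matching the $4 in the output above), and the "+m" and
"I" constraints are my choice for x86, assuming a reasonably recent gcc.
On other targets the sketch falls back to plain C.

```c
#define SUB_BITS 4  /* assumed value; matches the $4 in the output above */

/* Hypothetical helper: arithmetic right shift of *du and *dv in place. */
static void shift_both(int *du, int *dv)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+m": the variable is a memory operand that is both read and
       written, so gcc knows its value changes.
       "I":  a compile-time constant in 0..31, usable as a shift count.
       "cc": sarl modifies the flags. */
    __asm__ ("sarl %1, %0" : "+m" (*du) : "I" (SUB_BITS) : "cc");
    __asm__ ("sarl %1, %0" : "+m" (*dv) : "I" (SUB_BITS) : "cc");
#else
    /* Fallback for non-x86 targets: let the compiler do the shift. */
    *du >>= SUB_BITS;
    *dv >>= SUB_BITS;
#endif
}
```

Called as shift_both(&du, &dv) with du = -35 and dv = 64, this leaves
du == -3 and dv == 4, since sar rounds toward negative infinity.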
If I use the C code quoted above instead, the output is
the following:
movl -132(%ebp),%ebx
subl %edi,%ebx
movl -136(%ebp),%edx
subl %esi,%edx
sarl $4,%ebx
sarl $4,%edx
So, which do you think is more efficient?
Even when the values to be shifted are not in registers, gcc will
usually produce code like
sarl $4, -4(%ebp)
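
The comparison is easy to repeat with a small stand-alone file. The
sketch below is a hypothetical reduction of the lines in question (the
function name deltas and its parameters are invented for illustration,
and SUB_BITS is assumed to be 4). Compiling it with gcc -O -S shows what
the compiler makes of the plain C shifts. Note that ISO C leaves >> on
negative signed values implementation-defined; gcc documents it as sign
extension, which is what sar does.

```c
#define SUB_BITS 4  /* assumed value; matches the $4 in the output above */

/* Hypothetical reduction: compute the two deltas and scale them down
   with plain C shifts, leaving instruction selection to the compiler. */
void deltas(int u1, int u2, int v1, int v2, int *du_out, int *dv_out)
{
    int du = u2 - u1;
    int dv = v2 - v1;

    du >>= SUB_BITS;  /* arithmetic shift for signed int under gcc */
    dv >>= SUB_BITS;

    *du_out = du;
    *dv_out = dv;
}
```

For example, deltas(100, 20, 16, 48, &du, &dv) gives du == -5 and
dv == 2 when built with gcc.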
--
Regards, Dieter