Mail Archives: pgcc/1999/09/27/15:07:04
Hello,
I've written a short routine. Never mind the
variable names; the names of the globals are picked at random.
What matters is the code and how it gets optimized into assembler!
int *board,*sweep,*Pindex,*snelbord;
or something like:
int board[64],sweep[20],Pindex[64],snelbord[64];
Which of the two definitions above is used (pointers or arrays)
should not matter for the optimization of 'tryalles'.
int tryalles(void){
  int ut,*va,ua,summation=0;
  for( ua = 0 ; ua < 16 ; ua++ ) {
    va = Pindex;
    ut = 0;
    if( !sweep[snelbord[ua]] )
      ut = board[ua];
    va += ut;
    summation += *va;
  }
  return(summation);
}
.align 4
.globl tryalles
.type tryalles,@function
tryalles:
pushl %ebp
movl %esp,%ebp
pushl %edi
pushl %esi
pushl %ebx
xorl %esi,%esi
movl $sweep,%edi
xorl %ecx,%ecx
movl $15,%ebx
.p2align 4,,7
.L6:
movl snelbord(%ecx),%eax
xorl %edx,%edx
sall $2,%eax
cmpl $0,(%eax,%edi)
jne .L7
movl board(%ecx),%edx
.L7:
addl Pindex(,%edx,4),%esi
addl $4,%ecx
decl %ebx
jns .L6
movl %esi,%eax
popl %ebx
popl %esi
popl %edi
movl %ebp,%esp
popl %ebp
ret
.Lfe1:
.size tryalles,.Lfe1-tryalles
.align 4
Suffering 10-15 clocks for a branch misprediction is a major cost;
a few extra instructions to prevent that penalty can get done
at a rate of 3 instructions a clock!
I would like to zoom in on the loop body.
First the loop body in C:
va = Pindex;
ut = 0;
if( !sweep[snelbord[ua]] )
ut = board[ua];
va += ut;
summation += *va;
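
As a rough sketch of what "a few instructions more" could look like at the
C level, the conditional assignment can be rewritten with a mask instead of
an 'if'. This is only an illustration and not what the compiler does by
itself: the names tryalles_mask and fill_example_data are made up here, the
array sizes are taken from the array variant of the declarations, and the
mask trick assumes two's complement integers.

```c
/* Same globals as in the post; sizes taken from the array variant. */
int board[64], sweep[20], Pindex[64], snelbord[64];

/* The original routine, with the conditional assignment. */
int tryalles(void)
{
    int ut, *va, ua, summation = 0;
    for (ua = 0; ua < 16; ua++) {
        va = Pindex;
        ut = 0;
        if (!sweep[snelbord[ua]])
            ut = board[ua];
        va += ut;
        summation += *va;
    }
    return summation;
}

/* Hypothetical variant: select board[ua] or 0 with a mask, no 'if'. */
int tryalles_mask(void)
{
    int ua, summation = 0;
    for (ua = 0; ua < 16; ua++) {
        /* (x == 0) yields 1 or 0; negating gives -1 (all bits set) or 0. */
        int mask = -(sweep[snelbord[ua]] == 0);
        /* board[ua] is read unconditionally, then kept or zeroed. */
        int ut = board[ua] & mask;
        summation += Pindex[ut];
    }
    return summation;
}

/* Hypothetical helper: fill the globals with arbitrary in-range data. */
void fill_example_data(void)
{
    int i;
    for (i = 0; i < 64; i++) {
        board[i] = (i * 7 + 3) % 64;   /* valid index into Pindex */
        Pindex[i] = i * i - 10;
        snelbord[i] = (i * 5) % 20;    /* valid index into sweep */
    }
    for (i = 0; i < 20; i++)
        sweep[i] = (i % 3 == 0);
}
```

After fill_example_data() both functions return the same sum. Whether the
comparison really ends up as a SETcc/NEG pair or still as a jump depends on
the compiler and its flags, so the assembler output has to be checked.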
Now here is how this is currently translated to 32-bit assembler:
.L6:
movl snelbord(%ecx),%eax
xorl %edx,%edx
sall $2,%eax <== what do we need this shift instruction for?
cmpl $0,(%eax,%edi)
jne .L7 <== We don't want to have this JNE!
movl board(%ecx),%edx
.L7:
addl Pindex(,%edx,4),%esi
addl $4,%ecx
decl %ebx
jns .L6
When I look at an example like this I immediately see a use for
more registers than Intel has... however, I would like to replace
the next 3 lines
with Pentium Pro instructions. I don't care how many lines they get replaced with,
because in my program I'm suffering a huge penalty in a lot of cases!
cmpl $0,(%eax,%edi)
jne .L7 <== We don't want to have this JNE!
movl board(%ecx),%edx
.L7:
Is it so hard to replace the above with Pentium Pro instructions?
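
For reference, the Pentium Pro instructions in question are the conditional
moves (CMOVcc). A possible way to give the compiler a shape it can map onto
a conditional move is to load board[ua] unconditionally and pick between it
and 0 with a conditional expression. This is a sketch only: the name
tryalles_select is made up, the array sizes come from the array variant of
the declarations, and there is no guarantee that a given compiler version
emits a CMOV for it.

```c
/* Same globals as in the post; sizes taken from the array variant. */
int board[64], sweep[20], Pindex[64], snelbord[64];

/* Hypothetical variant written so that the select can become a CMOV. */
int tryalles_select(void)
{
    int ua, summation = 0;
    for (ua = 0; ua < 16; ua++) {
        /* Safe to load up front: ua < 16 is always inside board[]. */
        int candidate = board[ua];
        /* Pure value selection, no store or load hidden behind the test. */
        int ut = sweep[snelbord[ua]] ? 0 : candidate;
        summation += Pindex[ut];
    }
    return summation;
}
```

The unconditional load is what makes the selection a plain register-to-register
choice; the result is the same as that of tryalles for the same data.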
Greetings,
Vincent
Vincent Diepeveen
diep AT xs4all DOT nl
---
...and furthermore I am of the opinion that Dap should be
beamed into outer space...