Mail Archives: pgcc/1999/09/27/15:07:04

Message-Id: <3.0.32.19990927171139.00c74200@pop.xs4all.nl>
X-Sender: diep AT pop DOT xs4all DOT nl
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Mon, 27 Sep 1999 17:11:42 +0200
To: pgcc AT delorie DOT com
From: Vincent Diepeveen <diep AT xs4all DOT nl>
Subject: a C routine to optimize GCC/PGCC for
Mime-Version: 1.0
Reply-To: pgcc AT delorie DOT com

Hello,

I've written a short routine. Ignore the
variable names; the names of the globals were picked at random.
What matters is the code and how it gets optimized into assembler!

  int *board,*sweep,*Pindex,*snelbord;

or something like:

  int board[64],sweep[20],Pindex[64],snelbord[64];

Whether these are declared as arrays or as pointers should
not matter for the optimization of 'tryalles'.

int tryalles(void){
   int ut,*va,ua,summation=0;

   for( ua = 0 ; ua < 16 ; ua++ ) {
     va = Pindex;
     ut = 0;
     if( !sweep[snelbord[ua]] )
       ut = board[ua];
     va += ut;
     summation += *va;
   }
   return(summation);
}


        .align 4
.globl tryalles
        .type    tryalles,@function
tryalles:
        pushl %ebp
        movl %esp,%ebp
        pushl %edi
        pushl %esi
        pushl %ebx
        xorl %esi,%esi
        movl $sweep,%edi
        xorl %ecx,%ecx
        movl $15,%ebx
        .p2align 4,,7
.L6:
        movl snelbord(%ecx),%eax
        xorl %edx,%edx
        sall $2,%eax
        cmpl $0,(%eax,%edi)
        jne .L7
        movl board(%ecx),%edx
.L7:
        addl Pindex(,%edx,4),%esi
        addl $4,%ecx
        decl %ebx
        jns .L6
        movl %esi,%eax
        popl %ebx
        popl %esi
        popl %edi
        movl %ebp,%esp
        popl %ebp
        ret
.Lfe1:
        .size    tryalles,.Lfe1-tryalles
        .align 4

Suffering 10-15 clocks for a branch misprediction is a major penalty,
while the few extra instructions needed to avoid it can execute
at a rate of 3 instructions per clock!

I would like to zoom in on the loop body:

First the loop body in C:
     va = Pindex;
     ut = 0;
     if( !sweep[snelbord[ua]] )
       ut = board[ua];
     va += ut;
     summation += *va;

Now how this is currently translated to 32-bit assembler:
.L6:
        movl snelbord(%ecx),%eax
        xorl %edx,%edx
        sall $2,%eax           <== what do we need this shift instruction for?
        cmpl $0,(%eax,%edi)             
        jne .L7                <== We don't want to have this JNE!
        movl board(%ecx),%edx
.L7:
        addl Pindex(,%edx,4),%esi
        addl $4,%ecx
        decl %ebx
        jns .L6
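
On the shift: snelbord[ua] is an element index, and the (%eax,%edi) addressing form carries no scale factor, so the index is first multiplied by 4 to turn it into a byte offset into sweep. A scaled form such as sweep(,%eax,4) would make the separate shift unnecessary. A minimal sketch in C of what the shift computes, assuming a 4-byte int as on this 32-bit target (element_by_shift is a hypothetical name, not from the original code):

```c
#include <stddef.h>

int sweep[20];

/* Hypothetical helper: address of sweep[i] computed the way the generated
   code does it, byte offset = i << 2. Assumes sizeof(int) == 4. */
int *element_by_shift(int i) {
    return (int *)((char *)sweep + ((size_t)i << 2));
}
```

For any valid index i, element_by_shift(i) should point at the same element as &sweep[i].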
  
Looking at an example like this, I immediately see a use for
more registers than Intel offers... However, I would like to replace
the next 3 lines
with Pentium Pro instructions. I don't care how many lines they get replaced with,
because in my program I suffer a huge penalty here in a lot of cases!

        cmpl $0,(%eax,%edi)             
        jne .L7                <== We don't want to have this JNE!
        movl board(%ecx),%edx
       .L7:

Is it so hard to replace the above with Pentium Pro instructions?
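
One way to express the intent at the C level is to remove the conditional assignment altogether, so that no jump is needed regardless of what the compiler does. The following is only a sketch under stated assumptions: tryalles_branchless and fill_example are hypothetical names, the array contents are made up for illustration, every board[] value is assumed to be a valid index into Pindex, and integers are assumed to be two's complement. The variant always loads board[ua], which is safe here because ua stays below 16. Whether a given GCC or PGCC version emits a conditional move for either form depends on the version and target options (in GCC, something like -march=pentiumpro is needed before cmov is allowed at all).

```c
/* Globals as in the post; the contents are hypothetical test data. */
int board[64], sweep[20], Pindex[64], snelbord[64];

/* Original routine from the post, with the conditional assignment. */
int tryalles(void) {
    int ut, *va, ua, summation = 0;

    for (ua = 0; ua < 16; ua++) {
        va = Pindex;
        ut = 0;
        if (!sweep[snelbord[ua]])
            ut = board[ua];
        va += ut;
        summation += *va;
    }
    return summation;
}

/* Hypothetical branch-free variant: the test result becomes an
   all-ones or all-zero mask that is ANDed with board[ua]. */
int tryalles_branchless(void) {
    int ua, summation = 0;

    for (ua = 0; ua < 16; ua++) {
        /* -1 (all bits set) if the sweep entry is zero, else 0 */
        int mask = -(sweep[snelbord[ua]] == 0);
        summation += Pindex[board[ua] & mask];
    }
    return summation;
}

/* Hypothetical helper: fill the arrays with in-range example values. */
void fill_example(void) {
    int i;

    for (i = 0; i < 64; i++) {
        board[i] = (i * 7) % 64;   /* valid index into Pindex */
        Pindex[i] = i * 3 + 1;
        snelbord[i] = i % 20;      /* valid index into sweep */
    }
    for (i = 0; i < 20; i++)
        sweep[i] = (i % 3 == 0) ? 0 : 1;
}
```

After calling fill_example(), both routines should return the same sum; comparing the assembler output of the two shows whether the jump has disappeared.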

Greetings,
Vincent




Vincent Diepeveen
diep AT xs4all DOT nl

---
...and furthermore I am of the opinion that Dap should be
beamed into outer space...
