Mail Archives: pgcc/2001/03/20/09:31:40
I have a straightforward piece of code that needs to be well
optimized. Since it's VERY straightforward, I would not expect gcc to
have problems with it. However, all versions I tested (egcs 1.1.2,
pgcc-2.95.2 19991024, AthlonGCC) do the same thing: they appear to
generate completely useless instructions.
Three lines of the code are shown below, interleaved with the generated
assembly. As you can see, the result of (a<<19)-(a<<7) is stored not only
to ntt_block_p[32*17] (which ESI seems to point into) but also to an
ESP-relative slot on the stack. Why this extra store?
It seems that GCC decides to cache the value on the stack, from which it
is later fetched, as if the stack were faster to access than the array
pointed to by ESI. Since I cannot see any good reason why that would be
true, it looks to me as if GCC is generating lots of completely useless
memory stores.
How can I get rid of them?
typedef int Ntt_type;
static void ntt_transform(unsigned char *block, int block_offset, Ntt_type *ntt_block) {
        unsigned char *block_p;
        Ntt_type *ntt_block_p;
        int rc;
        Ntt_type a, b;
        block_p = block;
        ntt_block_p = ntt_block;
a=block_p[block_offset*1];
      bb:       8b 84 24 94 04 00 00    mov    0x494(%esp,1),%eax
      c2:       0f b6 3c 18             movzbl (%eax,%ebx,1),%edi
ntt_block_p[32*1]=a;
      c6:       89 be 80 00 00 00       mov    %edi,0x80(%esi)
ntt_block_p[32*17]= (a<<19)-(a<<7);
      cc:       89 f8                   mov    %edi,%eax
      ce:       c1 e0 07                shl    $0x7,%eax
      d1:       c1 e7 13                shl    $0x13,%edi
      d4:       29 c7                   sub    %eax,%edi
      d6:       89 bc 24 70 04 00 00    mov    %edi,0x470(%esp,1)
      dd:       89 be 80 08 00 00       mov    %edi,0x880(%esi)
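
For anyone who wants to reproduce this, a reduced test case might look like the sketch below. It keeps only the two stores from the listing above. The function name ntt_transform_reduced is made up for illustration, and the function is made non-static so that the compiler emits it even without a caller. This is an assumption-laden reduction, not the original code: with so little register pressure the extra store to the stack may well disappear, in which case more of the surrounding function has to be kept.

```c
typedef int Ntt_type;

/* Hypothetical reduced version of ntt_transform: only the load of `a`
   and the two stores from the listing are kept. */
void ntt_transform_reduced(const unsigned char *block, int block_offset,
                           Ntt_type *ntt_block)
{
    /* Zero-extending byte load (the movzbl in the listing). */
    Ntt_type a = block[block_offset * 1];

    /* Store at byte offset 0x80, i.e. element 32*1. */
    ntt_block[32 * 1] = a;

    /* Store at byte offset 0x880, i.e. element 32*17. */
    ntt_block[32 * 17] = (a << 19) - (a << 7);
}
```

Compiling this with gcc -O2 -S and reading the generated assembly shows whether the value is still written to an ESP-relative slot in addition to the array.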