Mail Archives: pgcc/2000/12/07/07:42:55
Look at the following routine:
#define GETBITS(x,a,b) (((x)>>(a)) & ((1ULL<<((b)-(a)+1ULL)) - 1ULL))
#define Ntt_type unsigned int
#define Long_Ntt_type unsigned long long int
Ntt_type ntt_mul_1(Ntt_type a, Ntt_type b) {
Long_Ntt_type al, bl, rl, lo, hi;
Ntt_type r;
al = a;
bl = b;
rl = al * bl;
r = (Ntt_type)(rl>>1);
return r;
}
I compiled it with options desribed in www.athlonlinux.org pages:
-s -O3 -fomit-frame-pointer -Wall -mpentiumpro -march=pentiumpro
-malign-functions=4 -funroll-loops -fexpensive-optimizations
-malign-double -fschedule-insns2 -S (-mwide-multiply had no effect). I'm
using pgcc-2.95.2 (from --version).
The generated code is horrible. The major weird thing is that pgcc
generates _three_ multiplying instructions. It should be clear that just
one would be necessary since clearly both input values are just
32-bits. Ok, maybe the compiler is just stupid... but try removing the one
right shift at the end. Well, surprisingly, now pgcc is actually smart
enough to use just one multiplication!
The weirdness is that small change at _output_ changes how well pgcc
understands the _input_ numbers. Hmm, now that I think it probably
doesn't. But the bits 32..63 of input numbers have no effect on output
bits 0..31 unless you do the right shift. So that must be the reason.
[ For the original non-simplified version of this routine, pgcc generated
68 instructions with 3 multiplications and it is easily possible to do
in 16 instructions and one multiplication by hand. So GCC still
doesn't replace assembly programmer...]
Ps. I don't want to say that pgcc wouldn't be good, I'm just pointing out
a weakness in it. Althought the developers probably know already that, i
thought this is interesting.
Generated asm code follows:
ntt_mul_1:
subl $28,%esp
pushl %ebp
pushl %edi
pushl %esi
pushl %ebx
movl 48(%esp),%ecx
movl 52(%esp),%eax
xorl %edx,%edx
movl %eax,24(%esp)
movl %edx,28(%esp)
mull %ecx
movl 28(%esp),%esi
imull %ecx,%esi
xorl %ebx,%ebx
movl %eax,%edi
movl 24(%esp),%eax
imull %ebx,%eax
movl %edx,%ebp
addl %esi,%ebp
addl %eax,%ebp
shrdl $1,%ebp,%edi
shrl $1,%ebp
movl %edi,%eax
popl %ebx
popl %esi
popl %edi
popl %ebp
addl $28,%esp
ret
- Raw text -