delorie.com/archives/browse.cgi   search  
Mail Archives: pgcc/2000/12/07/07:42:55

Date: Thu, 7 Dec 2000 14:08:03 +0200
From: Tuukka Toivonen <tutoivon AT mail DOT student DOT oulu DOT fi>
X-Sender: tutoivon AT paju DOT oulu DOT fi
To: pgcc AT delorie DOT com
Subject: horrible code
Message-ID: <Pine.SGI.4.21.0012071329000.6451079-100000@paju.oulu.fi>
MIME-Version: 1.0
Reply-To: pgcc AT delorie DOT com

Look at the following routine:

#define GETBITS(x,a,b)  (((x)>>(a)) & ((1ULL<<((b)-(a)+1ULL)) - 1ULL))
#define Ntt_type unsigned int
#define Long_Ntt_type unsigned long long int
Ntt_type ntt_mul_1(Ntt_type a, Ntt_type b) {
        Long_Ntt_type al, bl, rl, lo, hi;
        Ntt_type r;
        al = a;
        bl = b;
        rl = al * bl;
        r  = (Ntt_type)(rl>>1);
        return r;
}

I compiled it with options desribed in www.athlonlinux.org pages:
-s -O3 -fomit-frame-pointer -Wall -mpentiumpro -march=pentiumpro
-malign-functions=4 -funroll-loops -fexpensive-optimizations
-malign-double -fschedule-insns2 -S (-mwide-multiply had no effect). I'm
using pgcc-2.95.2 (from --version).

The generated code is horrible. The major weird thing is that pgcc
generates _three_ multiplying instructions. It should be clear that just
one would be necessary since clearly both input values are just
32-bits. Ok, maybe the compiler is just stupid... but try removing the one
right shift at the end. Well, surprisingly, now pgcc is actually smart
enough to use just one multiplication!

The weirdness is that small change at _output_ changes how well pgcc
understands the _input_ numbers. Hmm, now that I think it probably
doesn't. But the bits 32..63 of input numbers have no effect on output
bits 0..31 unless you do the right shift. So that must be the reason.

[ For the original non-simplified version of this routine, pgcc generated
  68 instructions with 3 multiplications and it is easily possible to do
  in 16 instructions and one multiplication by hand. So GCC still
  doesn't replace assembly programmer...]

Ps. I don't want to say that pgcc wouldn't be good, I'm just pointing out
a weakness in it. Althought the developers probably know already that, i
thought this is interesting.

Generated asm code follows:

ntt_mul_1:
        subl $28,%esp
        pushl %ebp
        pushl %edi
        pushl %esi
        pushl %ebx
        movl 48(%esp),%ecx
        movl 52(%esp),%eax
        xorl %edx,%edx
        movl %eax,24(%esp)
        movl %edx,28(%esp)
        mull %ecx
        movl 28(%esp),%esi
        imull %ecx,%esi
        xorl %ebx,%ebx
        movl %eax,%edi
        movl 24(%esp),%eax
        imull %ebx,%eax
        movl %edx,%ebp
        addl %esi,%ebp
        addl %eax,%ebp
        shrdl $1,%ebp,%edi
        shrl $1,%ebp
        movl %edi,%eax
        popl %ebx
        popl %esi
        popl %edi
        popl %ebp
        addl $28,%esp
        ret

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019