Mail Archives: pgcc/2001/03/22/12:32:08
On Tue, 20 Mar 2001, Tuukka Toivonen wrote:
> I have a straightforward piece of code that needs to be well
> optimized. Since it's VERY straightforward, I'd suppose gcc not having
> problems with it. However, all versions I tested (egcs 1.1.2, pgcc-2.95.2
> 19991024, AthlonGCC) have the same thing that looks very much like it's
... I haven't been able to remove the useless memory accesses, but I found
something interesting (and bad) concerning all the compilers mentioned
above.
With -O2 the compiler generates code that runs 76% slower than with -O1.
More specifically, the culprit is -fregmove: whenever I add it after -O,
the code runs *much* slower.
I can't see any significant difference in the generated assembly code. The
instructions are roughly the same, just in a different order and using
different registers. I guess it hurts scheduling somehow.
This is on an AMD Athlon 800 MHz. I can provide the example source (~40k)
on request.
Here's a short piece of example code (cut from a longer piece; I'm not
sure it is actually a good example, since I haven't timed it separately,
but the rest of the code is very similar):
(the code marked with - is the fast version, + is the slow version)
a=ntt_block_p[3]; b=ntt_block_p[19]; ntt_block_p[3]=a+b; ntt_block_p[19]=ntt_2n(a-b,22)-ntt_2n(a-b,10);
- 804a7a3: 8b 71 0c mov 0xc(%ecx),%esi
- 804a7a6: 8b 79 4c mov 0x4c(%ecx),%edi
- 804a7a9: 8d 0c 37 lea (%edi,%esi,1),%ecx
- 804a7ac: 89 4d d8 mov %ecx,0xffffffd8(%ebp)
- 804a7af: 8b 5d f8 mov 0xfffffff8(%ebp),%ebx
- 804a7b2: 89 4b 0c mov %ecx,0xc(%ebx)
- 804a7b5: 89 f2 mov %esi,%edx
- 804a7b7: 29 fa sub %edi,%edx
- 804a7b9: 89 d6 mov %edx,%esi
- 804a7bb: c1 e6 16 shl $0x16,%esi
- 804a7be: 81 e6 ff ff ff 00 and $0xffffff,%esi
- 804a7c4: 89 d0 mov %edx,%eax
- 804a7c6: c1 f8 02 sar $0x2,%eax
- 804a7c9: 29 c6 sub %eax,%esi
- 804a7cb: 89 d0 mov %edx,%eax
- 804a7cd: c1 e0 0a shl $0xa,%eax
- 804a7d0: 25 ff ff ff 00 and $0xffffff,%eax
- 804a7d5: c1 fa 0e sar $0xe,%edx
- 804a7d8: 29 d0 sub %edx,%eax
- 804a7da: 29 c6 sub %eax,%esi
- 804a7dc: 89 75 d4 mov %esi,0xffffffd4(%ebp)
- 804a7df: 89 73 4c mov %esi,0x4c(%ebx)
+ 804a88d: 8b 73 0c mov 0xc(%ebx),%esi
+ 804a890: 8b 7b 4c mov 0x4c(%ebx),%edi
+ 804a893: 8d 0c 37 lea (%edi,%esi,1),%ecx
+ 804a896: 89 4d dc mov %ecx,0xffffffdc(%ebp)
+ 804a899: 89 4b 0c mov %ecx,0xc(%ebx)
+ 804a89c: 89 f2 mov %esi,%edx
+ 804a89e: 29 fa sub %edi,%edx
+ 804a8a0: 89 55 d8 mov %edx,0xffffffd8(%ebp)
+ 804a8a3: c1 65 d8 16 shll $0x16,0xffffffd8(%ebp)
+ 804a8a7: c6 45 db 00 movb $0x0,0xffffffdb(%ebp)
+ 804a8ab: 89 d0 mov %edx,%eax
+ 804a8ad: c1 f8 02 sar $0x2,%eax
+ 804a8b0: 29 45 d8 sub %eax,0xffffffd8(%ebp)
+ 804a8b3: 89 d0 mov %edx,%eax
+ 804a8b5: c1 e0 0a shl $0xa,%eax
+ 804a8b8: 25 ff ff ff 00 and $0xffffff,%eax
+ 804a8bd: c1 fa 0e sar $0xe,%edx
+ 804a8c0: 29 d0 sub %edx,%eax
+ 804a8c2: 29 45 d8 sub %eax,0xffffffd8(%ebp)
+ 804a8c5: 8b 4d d8 mov 0xffffffd8(%ebp),%ecx
+ 804a8c8: 89 4b 4c mov %ecx,0x4c(%ebx)