Mail Archives: pgcc/2000/02/25/00:04:09
This was my test routine:
short_add()
{
    static short int i, j, k, l;
    int loop;
    i = 0; j = 1; k = 2; l = 3;
    for (loop = 0; ++loop <= 10000;) {
        i = j + k;
        j += l;
        i += 2;
        j += i;
    }
}
----
So all the adds are on short ints.
I dump the assembly code and get these results on SuSE:
Reading specs from /usr/lib/gcc-lib/i486-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
In the default (486) version, I get .align 16 directives and use of movw;
for example, for i=j+k, I get:
movw j->ax
movw k->dx
addl edx+ecx->ecx
movw cx->i
So it performs long addition but uses a word move.
If I set -m386, I get .align 4 (expected) and the same
instructions.
If I set -mpentium, I get .align 4 (unexpected) and the
same instructions.
If I set -mpentiumpro, I get .align 4 (unexpected) and
movzwl in place of movw for the first 2 moves above.
--------
On Mandrake:
Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux/2.95.2/specs
gcc version 2.95.2 19991024 (release)
pentium: .align 4, movw
386: .align 4, movzwl (?!)
486: .align 16, and back to movw
pentiumpro: same as 386, .align 4 and movzwl
---
From RH:
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
386: .align 4, movw
486: .align 16, movw
pent: .align 4, movw
pentpro: .align 4, movzwl
---
My timing tests were all on a P-III. When optimizing for
the pentpro, the test *slowed* down by 25%. So it appears the "movzwl"
instruction is slower on a PIII and maybe a PII. It's hard to imagine
a movw, which does less, actually being slower in any circumstance.
So is the 486 the only processor that requires .align 16?
That "movzwl" for 386 on Mandrake would seem to be wrong.
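For anyone who wants to repeat the comparison, here is a minimal sketch of a timing harness around the same loop. It is not the aim9 code; time_short_add() and its sink variable are hypothetical names added for illustration, and the routine returns i only so that the compiler has a reason to keep the loop.

```c
#include <time.h>

/* Same loop body as the test routine above; returning i keeps the
   result live so the work is less likely to be optimized away. */
static short short_add(void)
{
    static short int i, j, k, l;
    int loop;

    i = 0; j = 1; k = 2; l = 3;
    for (loop = 0; ++loop <= 10000;) {
        i = j + k;
        j += l;
        i += 2;
        j += i;
    }
    return i;
}

/* Hypothetical helper: CPU seconds used by `reps` calls to short_add(). */
static double time_short_add(long reps)
{
    volatile short sink = 0;   /* forces each call's result to be stored */
    clock_t start = clock();
    long n;

    for (n = 0; n < reps; n++)
        sink = short_add();
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building this once per flag (-m386, -m486, -mpentium, -mpentiumpro) and calling time_short_add() with the same repetition count in each build should give numbers that can be compared directly. clock() is coarse, so the count needs to be large enough for each run to take a second or more.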
Martin Ockajak wrote:
> xorl %reg0,%reg0
> movw disp(%reg0),%reg1
>
> or single
>
> movzwl disp(%reg0),%reg1
---
So if you look at the code above, no xors are needed, so zeroing the high
half of the register would seem to be just a waste of time unless the movzwl
is actually faster than a movw (?).
> Surely not on Pentium.
---
But if the xor isn't needed?
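The reason the xor looks unnecessary can be sketched in C. This is an illustration only, assuming the result is stored back with a word move as in the listings above: whatever is left in the upper half of each register cannot reach the low 16 bits of the sum, because carries only travel upward. The helper names load_word_keep_high() and add_low_word() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: model a movw load, which replaces only the low
   word of a register and leaves the old upper half ("junk") in place. */
static uint32_t load_word_keep_high(uint32_t junk, uint16_t value)
{
    return (junk & 0xFFFF0000u) | value;
}

/* Hypothetical helper: a 32-bit add followed by a store of the low word
   only, as "addl" followed by "movw reg->i" does. */
static uint16_t add_low_word(uint32_t reg_a, uint32_t reg_b)
{
    return (uint16_t)(reg_a + reg_b);
}
```

With this model, add_low_word(load_word_keep_high(x, a), load_word_keep_high(y, b)) gives the same 16-bit result for any x and y, which is why clearing the upper half first adds nothing to the correctness of a short add. Whether it costs or saves time on a given processor is a separate question that this sketch does not answer.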
I'm just upset that the aim9 benchmark wasn't faster across the board
with pentiumpro optimization on a PIII or a PII -- I checked both and
the "-mpentium" optimization slowed down short integer addition
significantly over the -m486, so I'm just trying to track down the
source of the problem.
-linda
--
Linda Walsh @ SGI | Core Linux - Trust Technology
1200 Crittenden Lane MS:30-3-802 | Voice: (650) 933-5338
Mountain View, CA 94043 | Email: law AT sgi DOT com