Mail Archives: pgcc/2000/02/25/00:04:09

Sender: law AT sgi DOT com
Message-ID: <38B60A5F.D83E8C12@sgi.com>
Date: Thu, 24 Feb 2000 20:51:43 -0800
From: Linda Walsh <law AT sgi DOT com>
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: pgcc AT delorie DOT com
Subject: Re: short add stuff
References: <38B426AF DOT 280BF1C0 AT sgi DOT com> <20000224173809 DOT A32390 AT hq DOT alert DOT sk>
Reply-To: pgcc AT delorie DOT com

This was my test routine:
void short_add(void)
{
        static short int i,j,k,l;       /* every operand is a 16-bit short */
        int loop;
        i=0;j=1;k=2;l=3;
        for (loop=0; ++loop<=10000;){   /* 10000 iterations */
                i=j+k;
                j+=l;
                i+=2;
                j+=i;
        }
}
----
	So all the adds are on short ints.

I dump the assembly code and get these results on SuSE:
Reading specs from /usr/lib/gcc-lib/i486-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release) 

In the default (486) build, I get .align 16 and movw.
For example, for i=j+k, I get:

movw j->ax
movw k->dx
addl edx+ecx->ecx
movw cx->i

So it performs long addition but uses a word move.

If I set -m386, I get .align 4 (expected) and the same
instructions.

If I set -mpentium, I get .align 4 (unexpected) and the
same instructions.

If I set -mpentiumpro, I get .align 4 (unexpected) and
movzwl for the first two moves above.

--------
On Mandrake:
Reading specs from /usr/lib/gcc-lib/i586-mandrake-linux/2.95.2/specs
gcc version 2.95.2 19991024 (release)    
pentium: .align 4, movw
386: .align 4, movzwl (?!)
486: .align 16, movw
pentiumpro: .align 4, movzwl (same as 386)
---
On Red Hat:
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
386: .align 4, movw
486: .align 16, movw
pentium: .align 4, movw
pentiumpro: .align 4, movzwl

---
	My timing tests were all on a P-III.  When optimizing for
the pentiumpro, the code *slowed* down by 25%.  So it appears the "movzwl"
instruction is slower on a PIII and maybe a PII.  It's hard to imagine
a movw, which does less, actually being slower in any circumstance.
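
A minimal sketch of how such a timing comparison could be set up, not
the harness actually used for the numbers above. The function name
time_short_add and the repetition count are assumptions made for
illustration. The operands are moved to file scope here so that a
compiler cannot discard the stores as unused.

```c
#include <assert.h>
#include <time.h>

/* File scope rather than function-static (an assumption of this sketch),
   so the results stay observable and the stores are kept. */
short i, j, k, l;

void short_add(void)
{
        int loop;
        i = 0; j = 1; k = 2; l = 3;
        for (loop = 0; ++loop <= 10000;) {
                i = j + k;
                j += l;
                i += 2;
                j += i;
        }
}

/* Hypothetical helper: CPU seconds spent in `reps` calls of short_add(). */
double time_short_add(long reps)
{
        clock_t start, end;
        long n;

        assert(reps > 0);
        start = clock();
        for (n = 0; n < reps; n++)
                short_add();
        end = clock();
        return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_short_add() from a small main() and building the same file
once with -m486 and once with -mpentiumpro would give two numbers to
compare. Those flags are the ones accepted by the egcs and gcc 2.95.2
compilers listed above; the absolute times depend on the machine and the
optimization level.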

	So is the 486 the only processor that requires .align 16?

	That "movzwl" for 386 on Mandrake would seem to be wrong.


Martin Ockajak wrote:

> xorl %reg0,%reg0
> movw disp(%reg0),%reg1
> 
> or single
> 
> movzwl disp(%reg0),%reg1
---
	Note that no xors are needed in the code above, so zeroing the high
portion would seem to be just a waste of time unless movzwl is actually
faster than movw (?).
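
A small C model of why the result does not need the high half cleared,
offered as a sketch: the low 16 bits of a 32-bit sum depend only on the
low 16 bits of the operands. The helper names (load_movw, load_movzwl,
add_and_store, same_result) are invented for illustration and only mimic
what the instructions do to a register value. They say nothing about
speed.

```c
#include <assert.h>
#include <stdint.h>

/* Models "movw mem,%reg": only the low 16 bits are replaced,
   the upper 16 bits keep whatever was there before. */
static uint32_t load_movw(uint32_t stale_reg, uint16_t mem)
{
        return (stale_reg & 0xFFFF0000u) | mem;
}

/* Models "movzwl mem,%reg": the value is zero-extended to 32 bits. */
static uint32_t load_movzwl(uint16_t mem)
{
        return mem;
}

/* Models "addl" followed by a "movw" store: 32-bit add, low 16 bits kept. */
static uint16_t add_and_store(uint32_t a, uint32_t b)
{
        return (uint16_t)(a + b);
}

/* Returns 1 when both load styles give the same stored short. */
int same_result(uint16_t j, uint16_t k, uint32_t stale_a, uint32_t stale_b)
{
        uint16_t via_movw   = add_and_store(load_movw(stale_a, j),
                                            load_movw(stale_b, k));
        uint16_t via_movzwl = add_and_store(load_movzwl(j), load_movzwl(k));
        return via_movw == via_movzwl;
}

/* A few spot checks with arbitrary stale upper halves. */
void check_examples(void)
{
        assert(same_result(1, 2, 0xDEADBEEFu, 0x12345678u));
        assert(same_result(0xFFFF, 0xFFFF, 0xFFFFFFFFu, 0xABCD0000u));
}
```

So the choice between the two loads affects only timing, not the value
written back to the short.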


> Surely not on Pentium.
---
	But if the xor isn't needed?

I'm just upset that the aim9 benchmark wasn't faster across the board
with pentiumpro optimization on a PIII or a PII.  I checked both, and
the "-mpentium" optimization slowed down short integer addition
significantly compared with -m486, so I'm just trying to track down the
source of the problem.

-linda

-- 
Linda Walsh @ SGI                | Core Linux - Trust Technology 
1200 Crittenden Lane MS:30-3-802 | Voice: (650) 933-5338
Mountain View, CA 94043          | Email: law AT sgi DOT com
