Mail Archives: pgcc/2000/05/09/14:14:33

Sender: wolfi AT netsurf213 DOT neuss DOT netsurf DOT de
Message-ID: <391856CE.6BE0606B@neuss.netsurf.de>
Date: Tue, 09 May 2000 20:19:58 +0200
From: Wolfgang Formann <w DOT formann AT netsurf213 DOT neuss DOT netsurf DOT de>
X-Mailer: Mozilla 4.6 [en] (X11; I; Linux 2.2.8 i586)
X-Accept-Language: German, de, en
MIME-Version: 1.0
To: pgcc AT delorie DOT com
Subject: Re: pgcc and egcs alignment -- function, basic block and string
References: <20000130211158 DOT D641 AT cerebro DOT laendle> <Pine DOT LNX DOT 4 DOT 21 DOT 0002022017450 DOT 16833-100000 AT hq DOT alert DOT sk> <20000203131955 DOT D12247 AT atrey DOT karlin DOT mff DOT cuni DOT cz> <389C6000 DOT 5B79248 AT neuss DOT netsurf DOT de> <3917AF5A DOT FF5C82B2 AT neuss DOT netsurf DOT de> <20000509131501 DOT B27958 AT atrey DOT karlin DOT mff DOT cuni DOT cz>
Reply-To: pgcc AT delorie DOT com

Jan,

I don't think so. I changed only this sequence (output from gas)
 116 0080 0FB6CE        movzbl  %dh,        %ecx
 117
 118 0083 333C9D00      xorl    some_mem(,%ebx,4),%edi
 118      180000
 119 008a 333C8D00      xorl    some_mem(,%ecx,4),%edi
 119      0C0000
 120 0091 88D3          movb    %dl,        %bl
to
 116 0080 88F1          movb    %dh,        %cl
 117
 118 0082 333C9D00      xorl    some_mem(,%ebx,4),%edi
 118      180000
 119 0089 333C8D00      xorl    some_mem(,%ecx,4),%edi
 119      0C0000
 120 0090 0FB6DA        movzbl  %dl,        %ebx

Nothing else was changed, and this gave a speedup of about 2 CPU cycles.
Well, one of those cycles might be due to other things, like different
parallelism caused by the first change.
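
To make the difference between the two listings explicit, here is a small sketch that checks which instructions span a 16-byte boundary. The offsets and lengths are copied from the gas output above; the helper name `crossers` and the table names are made up for illustration and are not part of pgcc or gas.

```python
# Offsets and encoded lengths taken from the two gas listings above.
# Each entry is (start offset, length in bytes, instruction text).
BEFORE = [
    (0x80, 3, "movzbl %dh,%ecx"),
    (0x83, 7, "xorl some_mem(,%ebx,4),%edi"),
    (0x8A, 7, "xorl some_mem(,%ecx,4),%edi"),
    (0x91, 2, "movb %dl,%bl"),
]
AFTER = [
    (0x80, 2, "movb %dh,%cl"),
    (0x82, 7, "xorl some_mem(,%ebx,4),%edi"),
    (0x89, 7, "xorl some_mem(,%ecx,4),%edi"),
    (0x90, 3, "movzbl %dl,%ebx"),
]


def crossers(listing, boundary=16):
    """Return the instructions whose first and last byte lie in
    different `boundary`-sized blocks (hypothetical helper)."""
    return [
        text
        for start, length, text in listing
        if start // boundary != (start + length - 1) // boundary
    ]


print("before:", crossers(BEFORE))
print("after: ", crossers(AFTER))
```

In the first listing the second xorl occupies 0x8a..0x90 and so straddles the boundary at 0x90; in the second listing it ends at 0x8f and nothing straddles.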

I also tried inserting a single nop before the start of the loop,
causing every possible instruction to cross a 16-byte boundary.
Again, two additional cycles (the loop is executed about 400 times,
so I think I can safely ignore the additional time to decode/execute
the nop instruction).
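
A rough sketch of what the one-byte shift does to the excerpt shown earlier. It assumes the nop is the usual one-byte encoding, that it was applied to the rearranged sequence, and that no alignment directive sits between the nop and these instructions, so every offset simply moves up by one. The function name `count_crossings` is hypothetical.

```python
# (start offset, length in bytes) for the rearranged sequence,
# copied from the second gas listing in this message.
AFTER = [(0x80, 2), (0x82, 7), (0x89, 7), (0x90, 3)]


def count_crossings(listing, shift=0, boundary=16):
    """Count instructions that span a `boundary`-byte boundary once
    every offset is moved up by `shift` bytes (illustrative helper)."""
    count = 0
    for start, length in listing:
        first = start + shift               # first byte of the instruction
        last = first + length - 1           # last byte of the instruction
        if first // boundary != last // boundary:
            count += 1
    return count


# Assumption: one nop before the loop shifts this excerpt by exactly 1 byte.
print("no nop :", count_crossings(AFTER, shift=0))
print("one nop:", count_crossings(AFTER, shift=1))
```

With the shift, the second xorl again occupies 0x8a..0x90, the same straddling position it had in the original sequence.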

Since nothing else was changed, even the overall code length is
the same, and the timings are reproducible, the only explanation is
that Athlon has the same problems as my good old K6 :-( Or at least
similar problems.

Wolfgang

Jan Hubicka wrote:
> 
> > Jan,
> >
> > seems to be the same with Athlon, at least with this one
> > vendor_id       : AuthenticAMD
> > cpu family      : 6
> > model           : 2
> > model name      : AMD Athlon(tm) Processor
> > stepping        : 1
> > cpu MHz         : 698.660058
> >
> > here again, I got some speedups when I rearranged the code to have no
> > instructions crossing any 16-byte boundary.
> OK. I will ask AMD about this issue. Alex from AMD claims that Athlon doesn't
> have such problems. It is quite possible that the speedups are caused by some
> accidental change elsewhere...
> 
> Honya
> >
