Mail Archives: pgcc/2000/05/09/14:14:33
Jan,
surely not. I changed the sequence (output from gas) from
116 0080 0FB6CE movzbl %dh, %ecx
117
118 0083 333C9D00 xorl some_mem(,%ebx,4),%edi
118 180000
119 008a 333C8D00 xorl some_mem(,%ecx,4),%edi
119 0C0000
120 0091 88D3 movb %dl, %bl
to
116 0080 88F1 movb %dh, %cl
117
118 0082 333C9D00 xorl some_mem(,%ebx,4),%edi
118 180000
119 0089 333C8D00 xorl some_mem(,%ecx,4),%edi
119 0C0000
120 0090 0FB6DA movzbl %dl, %ebx
Nothing else was changed, and this gave a speedup of about 2 CPU cycles.
Well, one cycle might be due to other effects, such as different
parallelism resulting from the first change.
I also tried inserting a single nop before the start of the loop,
shifting the code so that as many instructions as possible cross a
16-byte boundary. Again, this cost two additional cycles (the loop is
executed about 400 times, so I think I can safely ignore the additional
time to decode/execute the nop instruction).
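
The boundary arithmetic behind both experiments can be checked with a short script. This is only a sketch: the offsets and instruction lengths are read off the gas listings above, the function name `crossings` is made up for illustration, and the third call assumes the nop was placed directly in front of the rearranged sequence, which the message does not state explicitly. It says nothing about how the Athlon decoder actually behaves; it only shows which of the four listed instructions straddle a 16-byte boundary.

```python
# Offsets and encoded lengths are taken from the gas listings in this message.
BOUNDARY = 16


def crossings(start, lengths, boundary=BOUNDARY):
    """Return the start offsets of instructions whose bytes straddle a boundary."""
    crossed = []
    offset = start
    for length in lengths:
        first, last = offset, offset + length - 1
        # An instruction crosses when its first and last byte fall in different blocks.
        if first // boundary != last // boundary:
            crossed.append(offset)
        offset += length
    return crossed


ORIGINAL = [3, 7, 7, 2]    # movzbl, xorl, xorl, movb
REARRANGED = [2, 7, 7, 3]  # movb, xorl, xorl, movzbl

print([hex(o) for o in crossings(0x80, ORIGINAL)])    # second xorl spans 0x8a..0x90
print([hex(o) for o in crossings(0x80, REARRANGED)])  # nothing crosses 0x90
# Assumption: a one-byte nop shifts the rearranged sequence to start at 0x81.
print([hex(o) for o in crossings(0x81, REARRANGED)])
```

With the original ordering the second xorl occupies 0x8a through 0x90 and so straddles the boundary at 0x90; after the rearrangement it ends at 0x8f and the movzbl starts exactly at 0x90.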
Since nothing else was changed, even the overall code length is
the same, and the timings are reproducible, the only explanation is that
the Athlon has the same problems as my good old K6 :-( Or at least
similar problems.
Wolfgang
Jan Hubicka wrote:
>
> > Jan,
> >
> > seems to be the same with Athlon, at least with this one
> > vendor_id : AuthenticAMD
> > cpu family : 6
> > model : 2
> > model name : AMD Athlon(tm) Processor
> > stepping : 1
> > cpu MHz : 698.660058
> >
> > here again, I got some speedups when I rearranged the code to have no
> > instructions crossing any 16-byte boundary.
> OK. I will ask AMD about this issue. Alex from AMD claims that the Athlon doesn't
> have such problems. It is quite possible that the speedups are caused by some
> accidental change elsewhere...
>
> Honya
> >