Mail Archives: djgpp/2000/03/08/22:30:41
salvador) wrote:
>K6 CPUs have a "bug" related to alignment. If some memory address is of
>the form 0xNNNNNC, you'll pay a big penalty to read it. 0, 4 and 8 are OK,
>but C is the worst case (by far); double-check you are not hitting this
>limitation.
Do you mean code alignment, data alignment or both?
Anyway, I edited the gcc -O2 -S output of the slower-running version
of my program (with const), changed the .p2align 2 statements to
.p2align 4 (16-byte alignment) for zseed, mul and mwc32 (I think these
are all the data and code alignments that could contribute to the
large performance difference), and recompiled. The program
ran faster, but there was still an order of magnitude difference
between the const and the non-const versions.
I also double checked the alignments with fsdb and objdump (Thanks to
Hans-Bernhard Broeker, for pointing the objdump method out to me).
zseed, mul and mwc32 were all 16-byte aligned.
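As a complement to checking with fsdb and objdump, the same test could be done at run time by looking at the low bits of an address. The following is only a minimal sketch in C: the helper names is_aligned and low_nibble_offset are hypothetical and are not part of the original program.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of p within its 16-byte block (0..15).
   A result of 0xC corresponds to the 0xNNNNNC case described above. */
unsigned low_nibble_offset(const void *p)
{
    return (unsigned)((uintptr_t)p & 0xFu);
}

/* Hypothetical helper: nonzero if p is aligned to `align` bytes.
   Assumes `align` is a power of two (4, 8, 16, ...). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}
```

For example, calling is_aligned(&zseed, 16) from main() would be expected to return nonzero once the .p2align 4 change has taken effect, assuming zseed is visible at that point in the source.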
If you have the time and the interest, please try to compile the
source I sent and run the executable. It should take less than five
minutes.
Regards,
Dieter