Mail Archives: pgcc/2001/02/20/14:59:09
On Tue, 20 Feb 2001, Nick Kurshev wrote:
> Well, I did my own investigation and results say that you are wrong, please see below:
Just for your amusement, I also made some tests using my own timing code (with
rdtsc and rdpmc). The CPU is an AMD Athlon 800 MHz (a cool CPU, but not very
well documented).
This is on Linux 2.4.0, though that doesn't really matter...
The compiler is AthlonGCC with the options
-O3 -fomit-frame-pointer -mathlon -mcpu=athlon -march=athlon
-malign-functions=4 -funroll-loops -fexpensive-optimizations
-malign-double -fschedule-insns2 -mwide-multiply
> This code tested PADDB instruction
> a) non MMX version of code:
> "movb (%2), %%dl\n"
> "addb %%dl, (%2)\n"
This could be done 4 bytes in parallel using the ordinary 32-bit registers, but
then no byte may overflow, or its carry spills into the next byte...
My thoughts:
xor eax,eax
mov [var],eax
is better for the code cache but worse for register pressure than
mov dword[var],0
so which one is better probably depends on the context.
> P.S.: All tests I did with using of my own project BIEW that can be found at http://biew.sourceforge.net.
I made my tests using my own ugly code. It is available on request, along with a
patch against Linux 2.4.0 that enables the rdpmc instruction in user mode (the PCE bit in CR4...)
And the results:
/* A function must save registers: EBX,ESI,EDI
* Arguments are passed in: EAX,EDX,ECX
*/
/* Empty call: 5 clocks. This is subtracted from each benchmark below */
void benchtest(void) {
}
/* 1 clock */
void benchtest(void) {
asm volatile(
"movd %%eax, %%mm0\n"
"movd %%mm0, %%eax\n"
: : : "eax", "mm0"
);
}
/* 0 clocks. Perfect parallelism! */
void benchtest(void) {
asm volatile(
"movl %%edi, %%eax\n"
"movl %%eax, %%edi\n"
: : : "eax"
);
}
/* 2 clocks */
void benchtest(void) {
asm volatile(
"pushl %edi\n"
"popl %edi\n"
);
}
/* 1 clock */
int x1,x2,x3,x4,x5;
void benchtest(void) {
x1 = 0; /* Generates: movl $0,x1 */
x2 = 0; /* movl $0,x2 */
x3 = 0; /* movl $0,x3 */
x4 = 0; /* movl $0,x4 */
x5 = 0; /* movl $0,x5 */
}
/* 1 clock, as fast as the version above */
int x1,x2,x3,x4,x5;
void benchtest(void) {
asm volatile(
"xorl %%eax, %%eax\n"
"movl %%eax, x1\n"
"movl %%eax, x2\n"
"movl %%eax, x3\n"
"movl %%eax, x4\n"
"movl %%eax, x5\n"
: : : "eax", "cc", "memory"
);
}