Date: Tue, 20 Feb 2001 21:58:44 +0200 (EET)
From: Tuukka Toivonen
To: Nick Kurshev
cc: "pgcc AT delorie DOT com"
Subject: Re: Re: Probably pgcc-2.95.2.1 does not optimized propertly?
Reply-To: pgcc AT delorie DOT com

On Tue, 20 Feb 2001, Nick Kurshev wrote:

> Well, I did my own investigation and results say that you are wrong, please see below:

Just for your amusement, I made some tests using my own timing code
(with rdtsc and rdpmc) too. The CPU is an AMD Athlon 800 MHz (cool CPU,
but the documentation is not too good). This is on Linux 2.4.0, even if
it doesn't really matter...

The compiler is AthlonGCC with arguments

  -O3 -fomit-frame-pointer -mathlon -mcpu=athlon -march=athlon
  -malign-functions=4 -funroll-loops -fexpensive-optimizations
  -malign-double -fschedule-insns2 -mwide-multiply

> This code tested PADDB instruction
> a) non MMX version of code:
> "movb (%2), %%dl\n"
> "addb %%dl, (%2)\n"

Could be done 4 bytes in parallel using the usual 32-bit registers, but
then the individual byte sums must not overflow...

My thoughts:

  xor eax,eax
  mov [var],eax

is better for the code cache but worse for register pressure than

  mov dword[var],0

so it probably depends on context which one is better.

> P.S.: All tests I did with using of my own project BIEW that can be found at http://biew.sourceforge.net.

I made my tests using my own ugly code. It is available on request,
along with a patch against Linux 2.4.0 to enable the rdpmc instruction
(that bit in cr4...)

And the results:

/* A function must save registers: EBX,ESI,EDI
 * Arguments are passed in: EAX,EDX,ECX */

/* Empty call: 5 clocks. This is subtracted from all the benches below */
void benchtest(void) {
}

/* 1 clock */
void benchtest(void) {
	asm volatile(
		"movd %eax, %mm0\n"
		"movd %mm0, %eax\n"
	);
}

/* 0 clocks. Perfect parallelism! */
void benchtest(void) {
	asm volatile(
		"movl %edi, %eax\n"
		"movl %eax, %edi\n"
	);
}

/* 2 clocks */
void benchtest(void) {
	asm volatile(
		"pushl %edi\n"
		"popl %edi\n"
	);
}

/* 1 clock */
int x1,x2,x3,x4,x5;
void benchtest(void) {
	x1 = 0;		/* Generates: movl $0,x1 */
	x2 = 0;		/*            movl $0,x2 */
	x3 = 0;		/*            movl $0,x3 */
	x4 = 0;		/*            movl $0,x4 */
	x5 = 0;		/*            movl $0,x5 */
}

/* 1 clock, equally fast as the one above */
int x1,x2,x3,x4,x5;
void benchtest(void) {
	asm volatile(
		"xorl %eax, %eax\n"
		"movl %eax, x1\n"
		"movl %eax, x2\n"
		"movl %eax, x3\n"
		"movl %eax, x4\n"
		"movl %eax, x5\n"
	);
}
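
A side note on the "4 bytes in parallel" remark above: a plain 32-bit add only gives correct per-byte results when no byte sum overflows into its neighbour. One common way around that (a minimal sketch, not something that was benchmarked here) is to add the low 7 bits of each byte separately and patch the top bits back in with xor. The function name add_bytes_swar is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from the tests above):
 * adds four packed bytes with per-byte wraparound, i.e. the same result
 * PADDB would give for each of the four byte lanes. */
static uint32_t add_bytes_swar(uint32_t a, uint32_t b)
{
	/* Add only the low 7 bits of every byte. The largest lane sum is
	 * 0x7f + 0x7f = 0xfe, so no carry can cross into the next byte. */
	uint32_t low = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);

	/* Bit 7 of each lane now holds the carry out of bit 6. The true
	 * top bit is a7 ^ b7 ^ carry, so xor the original top bits in. */
	return low ^ ((a ^ b) & 0x80808080u);
}
```

For example, with the bytes 01 ff 7f 80 in one register and 01 01 01 80 in the other, each lane wraps on its own and the result is 02 00 80 00. Whether this beats four byte adds or MMX on a given CPU would have to be measured.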