Message-ID: <368A195D.F315167E@gmx.de>
Date: Wed, 30 Dec 1998 13:15:25 +0100
From: Christian Hofrichter
X-Mailer: Mozilla 4.5 [de]C-CCK-MCD QXW03201 (Win95; I)
X-Accept-Language: de,en
MIME-Version: 1.0
To: djgpp AT delorie DOT com
Subject: pairable instructions much faster than the string operations on a Pentium and above ?!
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Reply-To: djgpp AT delorie DOT com

For a long time I believed that the string operations (rep stosl;
rep movsl) were the fastest way to write to memory blocks, until I
heard that a Pentium can execute two instructions simultaneously.
So I realized that there are better ways to fill and move memory
blocks!

"rep stosl" takes 3 clock cycles per dword on a Pentium.

    unsigned long n = (40*1024*1024) >> 2;    /* number of dwords */
    char *p = memory;                         /* start of the buffer */

    asm volatile("1:\n\t"
        "movl %%eax,(%%ebx)\n\t"   /* pairable in U-pipe */
        "addl $4,%%ebx\n\t"        /* pairable in V-pipe */
        "decl %%ecx\n\t"           /* pairable in U-pipe */
        "jnz 1b"                   /* pairable in V-pipe */
        : "+c"(n), "+b"(p)
        : "a"(55 /* any value */)
        : "memory", "cc");

This loop takes only 2 clock cycles per dword, because the four
instructions issue as two pairs.

To test that, I allocated a buffer of 40 MB. First I used memset; it
took 690000 microseconds to fill the memory block. Then I wrote the
fill in assembler (just to be sure) with stosl, and it took the same
time (how surprising). Then I wrote the code above, and it took only
approximately 426000 microseconds to fill the memory block!! That is
approximately the same ratio as 3 clock cycles to 2 clock cycles.

So how about a new optimization switch in djgpp that prefers pairable
instructions? After all, it can often double the speed of a program.
It could also be used to improve graphics performance, couldn't it?
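
The timing comparison described in the post can be approximated in portable C. What follows is only a minimal sketch under stated assumptions: the names fill_dwords, time_memset_us, time_fill_dwords_us and speedup are hypothetical helpers, not part of djgpp or libc, and clock() is much coarser than a cycle counter. A modern optimizing compiler may also turn the plain loop into a memset call or vector stores, so the numbers it produces say nothing about Pentium U/V pairing; it only shows the shape of the measurement.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Fill `bytes` bytes at `dst` with a 32-bit pattern, one dword per
   iteration (store, advance pointer, count down, branch). This mirrors
   the four-instruction loop from the post in C; the compiler decides
   which instructions are actually emitted. */
void fill_dwords(void *dst, uint32_t value, size_t bytes)
{
    uint32_t *p = dst;
    size_t n = bytes / 4;

    assert(bytes % 4 == 0);    /* whole dwords only, as in the post */
    while (n--)
        *p++ = value;
}

/* Hypothetical helper: CPU time of one memset call, in microseconds. */
double time_memset_us(void *dst, int byte, size_t bytes)
{
    clock_t t0 = clock();
    memset(dst, byte, bytes);
    return (double)(clock() - t0) * 1e6 / CLOCKS_PER_SEC;
}

/* Hypothetical helper: CPU time of one fill_dwords call, in microseconds. */
double time_fill_dwords_us(void *dst, uint32_t value, size_t bytes)
{
    clock_t t0 = clock();
    fill_dwords(dst, value, bytes);
    return (double)(clock() - t0) * 1e6 / CLOCKS_PER_SEC;
}

/* Ratio of two timings; a value above 1.0 means the second was faster. */
double speedup(double slow_us, double fast_us)
{
    return slow_us / fast_us;
}
```

To repeat the experiment, one would allocate a 40 MB buffer with malloc, call time_memset_us and time_fill_dwords_us on it several times, and compare the results with speedup. With the figures quoted in the post, speedup(690000.0, 426000.0) comes out at roughly 1.62, a little above the 1.5 predicted by the 3-to-2 cycle count.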