Mail Archives: djgpp/1998/12/30/07:17:19

Message-ID: <368A195D.F315167E@gmx.de>
Date: Wed, 30 Dec 1998 13:15:25 +0100
From: Christian Hofrichter <ChristianHofrichter AT gmx DOT de>
X-Mailer: Mozilla 4.5 [de]C-CCK-MCD QXW03201 (Win95; I)
X-Accept-Language: de,en
MIME-Version: 1.0
To: djgpp AT delorie DOT com
Subject: pairable instructions much faster than the string operations on a
Pentium and above ?!
Reply-To: djgpp AT delorie DOT com

For a long time I believed that string operations (rep stosl; rep movsl)
were the fastest way to write to memory blocks, until I heard that a
Pentium can execute two instructions simultaneously. So I realized that
there are better methods to move memory blocks!

"rep stosl": takes 3 clock cycles per stored doubleword on a Pentium


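unsigned long count = (40*1024*1024)>>2;  /* number of 32-bit words */
void *dst = memory;

asm volatile("1:\n\t"
       "movl %%eax,(%%ebx)\n\t"  /* store EAX; pairable in U-pipe */
       "addl $4,%%ebx\n\t"       /* pairable in V-pipe */
       "decl %%ecx\n\t"          /* pairable in U-pipe */
       "jnz 1b"                  /* pairable in V-pipe */
       : "+c"(count), "+b"(dst)  /* both registers are modified by the loop */
       : "a"(55 /* any value */)
       : "memory", "cc");
This loop takes only 2 clock cycles per iteration!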
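
For readers without a 32-bit x86 target, the same loop can be sketched in plain C. This is only an illustration of the loop structure (one store, one pointer increment, one counter decrement, one branch). The name fill_words is hypothetical, and whether a compiler turns it into pairable instructions depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `count` 32-bit words starting at `dst` with `value`.
 * Mirrors the assembly loop: store, advance pointer, count down, branch.
 * Unlike the assembly version, a count of 0 stores nothing. */
static void fill_words(uint32_t *dst, uint32_t value, size_t count)
{
    assert(dst != NULL || count == 0);
    while (count != 0) {
        *dst = value;   /* movl %eax,(%ebx) */
        dst++;          /* addl $4,%ebx     */
        count--;        /* decl %ecx; jnz   */
    }
}
```

For the 40 MB buffer from this post, the call would be fill_words(buffer, 55, (40*1024*1024) >> 2), with buffer pointing at suitably aligned memory.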


To test that, I allocated a buffer of 40 MB. First I used memset; it
took 690000 microseconds to fill the memory block.
Then I wrote it in assembler (just to be sure) with stosl, and it took
the same time (how surprising).
Then I ran the code above, and it took only approximately
426000 microseconds to fill the memory block!
That is approximately the same ratio as 3 clock cycles to 2 clock
cycles.
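
A measurement like this could be set up roughly as follows. This is a sketch only: the post does not say which timer was used, so the standard clock() function is an assumption here, and the helper names elapsed_us, time_memset and speedup are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Convert a pair of clock() readings to microseconds of CPU time. */
static double elapsed_us(clock_t start, clock_t end)
{
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC;
}

/* Time one memset() over `bytes` bytes.
 * Returns microseconds, or -1.0 if the buffer cannot be allocated. */
static double time_memset(size_t bytes)
{
    assert(bytes > 0);
    unsigned char *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;

    clock_t t0 = clock();
    memset(buf, 55, bytes);
    clock_t t1 = clock();

    /* Read one byte back so the fill cannot be optimized away. */
    volatile unsigned char sink = buf[bytes - 1];
    (void)sink;

    free(buf);
    return elapsed_us(t0, t1);
}

/* Ratio of two timings, slower one first. */
static double speedup(double slow_us, double fast_us)
{
    assert(fast_us > 0.0);
    return slow_us / fast_us;
}
```

With the numbers above, speedup(690000, 426000) is about 1.62, compared with 3/2 = 1.5 from the cycle counts. The resolution of clock() varies between systems, so a single run over a small buffer may well report 0.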

So how about a new optimization switch in djgpp for pairable
instructions? After all, pairing can in the best case double the speed
of a program. It can also be used to improve graphics performance,
can't it?

