Mail Archives: djgpp/1998/12/30/07:17:19
For a long time I believed that string operations (rep stosl; rep movsl)
were the fastest way to write to memory blocks, until I heard that a
Pentium can execute two instructions simultaneously. So I realized that
there are better methods to move memory blocks!
"rep stosl": takes 3 clock cycles per stored dword on a Pentium
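unsigned long *ptr = (unsigned long *)memory; /* destination; name added for this listing */
unsigned long count = (40*1024*1024)>>2; /* number of dwords; name added for this listing */
asm volatile("1:\n\t"
"movl %%eax,(%%ebx)\n\t" /* store the value; pairable in U-pipe */
"addl $4,%%ebx\n\t" /* pairable in V-pipe */
"decl %%ecx\n\t" /* pairable in U-pipe */
"jnz 1b" /* pairable in V-pipe */
:"=c"(count),"=b"(ptr) /* the loop changes both registers */
:"0"(count),"1"(ptr),"a"(55 /* any value */)
:"memory","cc");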
This takes only 2 clock cycles per iteration!
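For readers who do not use AT&T syntax, here is a rough C equivalent of the loop, with each statement matched to the instruction it stands for. This is only a sketch: the name fill_dwords is made up for illustration, and a compiler is free to emit different instructions than the hand-written ones. Unlike the assembler loop, it also returns immediately when the count is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): store `value`
   into `count` consecutive 32-bit words starting at `dst`. */
static void fill_dwords(uint32_t *dst, size_t count, uint32_t value)
{
    assert(dst != NULL || count == 0);
    while (count != 0) {
        *dst = value;   /* movl %eax,(%ebx) */
        dst++;          /* addl $4,%ebx     */
        count--;        /* decl %ecx        */
    }                   /* jnz 1b           */
}
```

Calling fill_dwords(buffer, (40*1024*1024)>>2, 55) would correspond to the operands used in the assembler version.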
To test that, I allocated a 40 MB buffer. First I used memset; it
took 690000 microseconds to fill the memory block.
Then I wrote it in assembler (just to be sure) with stosl, and it took
the same time (how surprising).
And then I ran the code above, and it took only approximately
426000 microseconds to fill the memory block!
That is approximately the same ratio (690000/426000 is about 1.6) as
3 clock cycles to 2 clock cycles (1.5).
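A timing test along these lines could be sketched in portable C as follows. The post does not say how the time was measured, so the use of clock() is an assumption, and the names fill_fn, fill_memset, fill_loop and time_fill_us are hypothetical. An optimizing compiler may turn the plain loop into something quite different, so treat the numbers it gives with caution.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* All names below are illustrative, not from the original post. */
typedef void (*fill_fn)(void *buf, size_t bytes);

/* Fill with the library routine; every byte becomes 55 (0x37). */
static void fill_memset(void *buf, size_t bytes)
{
    memset(buf, 55, bytes);
}

/* Fill one 32-bit word at a time with the same byte pattern. */
static void fill_loop(void *buf, size_t bytes)
{
    uint32_t *p = buf;
    size_t n = bytes / 4;
    while (n-- != 0)
        *p++ = 0x37373737u;
}

/* Time one fill of a freshly allocated buffer.
   Returns CPU time in microseconds, or -1.0 if malloc fails. */
static double time_fill_us(fill_fn fill, size_t bytes)
{
    assert(bytes > 0 && bytes % 4 == 0);
    unsigned char *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;

    clock_t start = clock();
    fill(buf, bytes);
    clock_t end = clock();

    /* Read one byte back so the fill is less likely to be optimized out. */
    volatile unsigned char sink = buf[bytes - 1];
    (void)sink;

    free(buf);
    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC;
}
```

Calling time_fill_us(fill_memset, 40*1024*1024) and time_fill_us(fill_loop, 40*1024*1024) and comparing the two results would repeat the comparison described above. The resolution of clock() is coarse on many systems, so averaging several runs is advisable.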
So how about a new optimization switch in djgpp for pairable
instructions? After all, it can often double the speed of the program.
It can also be used to improve graphics performance, can't it?