Mail Archives: djgpp/1994/04/07/06:29:08
Write a static inline function that uses asm operations. The functions in
libc.a won't be inlined.
memcpy will usually be inlined, using rep movsl
(plus a movsw and a movsb for any leftover bytes), when compiling with
gcc -O2 -fbuiltin
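A minimal sketch of the case described above, assuming a hypothetical 23-byte struct record (23 = 5*4 + 2 + 1, so the dword loop plus one word move and one byte move). Whether gcc actually expands the call inline, and with which instructions, depends on the compiler version and target. Compiling with gcc -O2 -fbuiltin -S and reading the .s file is the way to check.

```c
#include <string.h>

/* Hypothetical 23-byte record: 23 = 5*4 + 2 + 1, the shape the post
   describes (rep movsl plus one movsw and one movsb). */
struct record {
    unsigned char bytes[23];
};

/* The length is the compile-time constant sizeof *dst, so with
   gcc -O2 -fbuiltin the call may be expanded inline instead of calling
   the libc memcpy. The exact instructions depend on compiler and target. */
static inline void copy_record(struct record *dst, const struct record *src)
{
    memcpy(dst, src, sizeof *dst);
}
```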
The generated code looks optimal to me. memmove and memset are not
inlined. I believe a memset-like function should be rather easy to write. I
once wrote code for a 'memclear' function, which from memory looks like this
(it doesn't check for a valid n, which must be >= 4 here).
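__inline__ static void
memclear(void *x, int n)
{
  /* Byte if bit 0 of n is set, word if bit 1 is set, then n/4
     dwords with rep stosl. EDI and ECX are changed by the asm, so
     they are in/out operands instead of clobbers; "memory" tells gcc
     the buffer is written, and __volatile__ keeps the asm from being
     dropped because its outputs are never used. */
  __asm__ __volatile__ ("shrl $1, %%ecx \n\t"
                        "jnc 1f \n\t"
                        "stosb \n"
                        "1: \n\t"
                        "shrl $1, %%ecx \n\t"
                        "jnc 2f \n\t"
                        "stosw \n"
                        "2: \n\t"
                        "rep; stosl"
                        : "+D" (x), "+c" (n)
                        : "a" (0)
                        : "memory", "cc");
}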
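For readers who do not read AT&T asm, here is a portable C sketch of the same byte / word / dword split that memclear performs. The name memclear_c is hypothetical, and memcpy is used for the 2- and 4-byte stores to avoid alignment and aliasing problems; it only mirrors the logic and is not meant to be faster than the asm.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable counterpart of memclear: same byte / word /
   dword split as the asm, written in plain C. */
static inline void memclear_c(void *x, size_t n)
{
    unsigned char *p = x;
    const uint16_t zero16 = 0;
    const uint32_t zero32 = 0;

    if (n & 1) {                 /* bit 0 set: one byte, like stosb */
        *p++ = 0;
    }
    if (n & 2) {                 /* bit 1 set: one 16-bit word, like stosw */
        memcpy(p, &zero16, 2);
        p += 2;
    }
    for (size_t k = n >> 2; k > 0; k--) {   /* n/4 dwords, like rep stosl */
        memcpy(p, &zero32, 4);
        p += 4;
    }
}
```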
When memcpy is used in a similar situation and the length is a compile-time
constant, gcc will produce code without the shifts and jump instructions.
DJ, why is -fno-builtin the default in the specs file?
I found that using -fbuiltin can speed up code a lot. The builtin
memcpy is much faster than all the tricks with manual loop unrolling.
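For comparison, a sketch of the kind of manual loop unrolling mentioned above. copy_unrolled is a hypothetical name; it copies four bytes per iteration and then the 0 to 3 leftover bytes one at a time. No timings are claimed here; measure with your own compiler and flags before choosing between it and memcpy.

```c
#include <stddef.h>

/* Hypothetical hand-unrolled byte copy: four bytes per iteration,
   then the remaining 0..3 bytes one at a time. */
static inline void copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 4) {
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
        d += 4;
        s += 4;
        n -= 4;
    }
    while (n > 0) {              /* tail: fewer than 4 bytes left */
        *d++ = *s++;
        n--;
    }
}
```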
Dieter