Date: Thu, 7 Apr 94 11:50:14 +0100
From: buers AT dg1 DOT chemie DOT uni-konstanz DOT de (Dieter Buerssner)
To: dj AT ctron DOT com
Cc: eliz AT is DOT elta DOT co DOT il, djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: memxxx() library functions


   Write a static inline function that uses asm operations.  The ones in
   libc.a won't be inlined.

memcpy is usually inlined, using rep movsl (plus one movsw and one movsb
instruction for the remainder), when compiling with

    gcc -O2 -fbuiltin

The generated code looks optimal to me. memmove and memset are not
inlined. I believe a memset-like function should be fairly easy to write.
I once wrote code for a 'memclear' function which, from memory, looks like
this (it does not check that n is valid; n must be >= 4 here).

__inline__ static void 
memclear(void *x, int n)
{
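    /* %edi = x, %ecx = n, %eax = 0; assumes the direction flag is clear */
    __asm__ __volatile__ (
             "shrl $1, %%ecx \n"   /* bit 0 of n -> carry */
             "jnc 1f         \n"
             "stosb          \n"   /* store the odd byte */
             "1:             \n"
             "shrl $1, %%ecx \n"   /* bit 1 of n -> carry */
             "jnc 2f         \n"
             "stosw          \n"   /* store the odd word */
             "2:             \n"
             "rep; stosl       "   /* store the remaining n/4 longwords */
             : "+D" (x), "+c" (n)  /* both registers are modified */
             : "a" (0)
             : "memory", "cc");    /* writes through x, changes flags */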
} 
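
For anyone who does not read i386 assembler, the control flow of the asm
above corresponds roughly to the portable C below. This is only a sketch
for illustration (memclear_c is a made-up name), and it is not the code
that gcc generates.

```c
/* Portable sketch of the same control flow as the asm version.
   memclear_c is a hypothetical name used only for this illustration. */
static void
memclear_c(void *x, int n)
{
    unsigned char *p = x;
    int longs;

    if (n & 1) {                /* first shrl/jnc: bit 0 set -> one stosb */
        *p++ = 0;
    }
    if (n & 2) {                /* second shrl/jnc: bit 1 set -> one stosw */
        p[0] = 0;
        p[1] = 0;
        p += 2;
    }
    /* rep stosl: the remaining n/4 units of four bytes each */
    for (longs = n >> 2; longs > 0; longs--) {
        p[0] = p[1] = p[2] = p[3] = 0;
        p += 4;
    }
}
```

For n = 11, for example, this stores one byte, then two bytes, then two
groups of four bytes, which adds up to 11.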

When memcpy is used in similar situations and the length is a compile-time
constant, gcc produces code without the shift and jump instructions.
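
A minimal sketch of the constant-length case, assuming a small fixed-size
record (struct rec and copy_rec are made-up names for illustration). The
length passed to memcpy is known at compile time, so with -O2 and builtins
enabled gcc is free to expand the copy inline instead of calling libc.
Compiling with -S and reading the assembler output is the way to check
what your gcc version actually does.

```c
#include <string.h>

/* Hypothetical fixed-size record, used only for illustration. */
struct rec {
    int a, b, c, d;
};

/* The length is the compile-time constant sizeof(struct rec), so the
   compiler does not need run-time tests on the low bits of the count. */
static void
copy_rec(struct rec *dst, const struct rec *src)
{
    memcpy(dst, src, sizeof(struct rec));
}
```
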
DJ, why does the specs file default to -fno-builtin? I have found that
-fbuiltin can speed up code a lot. The builtin memcpy is much faster
than all the tricks with manual loop unrolling.


   Dieter