Date: Thu, 7 Apr 94 11:50:14 +0100
From: buers AT dg1 DOT chemie DOT uni-konstanz DOT de (Dieter Buerssner)
To: dj AT ctron DOT com
Cc: eliz AT is DOT elta DOT co DOT il, djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: memxxx() library functions

Write a static inline function that uses asm operations. The ones in
libc.a won't be inlined.

memcpy will usually be inlined, using rep movsl (plus one movsw and one
movsb instruction), when compiling with

    gcc -O2 -fbuiltin

The generated code looks optimal to me. memmove and memset are not
inlined. I believe a memset-alike should be rather easy to write. I once
wrote code for a 'memclear' function, which from memory looks like this
(it doesn't check for a valid n, which must be >= 4 here).

    __inline__ static void memclear(void *x, int n)
    {
      __asm__ ("shrl $1, %%ecx \n"   /* bit 0 of n -> carry */
               "jnc 1f \n"
               "stosb \n"            /* odd count: clear one byte */
               "1: \n"
               "shrl $1, %%ecx \n"   /* bit 1 of n -> carry */
               "jnc 2f \n"
               "stosw \n"            /* clear two bytes */
               "2: \n"
               "rep; stosl "         /* clear the remaining n/4 longs */
               : : "D" (x), "c" (n), "a" (0) : "di", "cx");
    }

When memcpy is used in similar situations and the length is a
compile-time constant, gcc will produce code without the shift and jump
instructions.

DJ, why is -fno-builtin the default in the specs file? I found that
using -fbuiltin can speed code up a lot. The builtin memcpy is much
faster than all the tricks with manual loop unrolling.

Dieter
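
For readers who do not read x86 string instructions, the order of operations in memclear can be sketched in portable C. This is only an illustration, not code from the original message: the names memclear_c and memclear_selftest are hypothetical, and plain byte stores stand in for stosb, stosw and rep stosl. Unlike the asm version as described, the sketch accepts any n, including values below 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical portable counterpart of memclear(): zero n bytes at x,
   in the same order as the asm version (1 byte, 2 bytes, then 4-byte units). */
static inline void memclear_c(void *x, size_t n)
{
    unsigned char *p = (unsigned char *)x;
    size_t words;

    assert(x != NULL || n == 0);

    if (n & 1) {                      /* first shrl/jnc: stosb */
        *p++ = 0;
    }
    if (n & 2) {                      /* second shrl/jnc: stosw */
        p[0] = 0;
        p[1] = 0;
        p += 2;
    }
    for (words = n >> 2; words > 0; words--) {   /* rep stosl */
        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;
    }
}

/* Hypothetical self-check: returns 1 if, for every n from 0 to 32,
   exactly the n requested bytes are cleared and the guard bytes
   around them are left untouched; returns 0 otherwise. */
static int memclear_selftest(void)
{
    unsigned char buf[40];
    size_t n, i;

    for (n = 0; n <= 32; n++) {
        memset(buf, 0xAA, sizeof buf);
        memclear_c(buf + 4, n);
        for (i = 0; i < sizeof buf; i++) {
            unsigned char want = (i >= 4 && i < 4 + n) ? 0 : 0xAA;
            if (buf[i] != want)
                return 0;
        }
    }
    return 1;
}
```

When n is a compile-time constant, an optimizing compiler can usually fold the two bit tests away in a routine like this, which is the same effect described above for the builtin memcpy.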