Mail Archives: djgpp/1994/04/07/06:29:08

Date: Thu, 7 Apr 94 11:50:14 +0100
From: buers AT dg1 DOT chemie DOT uni-konstanz DOT de (Dieter Buerssner)
To: dj AT ctron DOT com
Cc: eliz AT is DOT elta DOT co DOT il, djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: memxxx() library functions

   Write a static inline function that uses asm operations.  The ones in
   libc.a won't be inlined.

memcpy will usually be inlined, using rep movsl (plus a movsw and a
movsb instruction for the leftover bytes), when compiling with 

    gcc -O2 -fbuiltin

The generated code looks optimal to me. memmove and memset are not 
inlined. I believe a memset-like function should be rather easy to write.
I once wrote a 'memclear' function which, from memory, looks like this 
(it doesn't check that n is valid; n must be >= 4 here):

__inline__ static void 
memclear(void *x, int n)
{
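    /* The string instructions modify EDI and ECX, so x and n are
       read-write operands; "memory" tells gcc the buffer is written. */
    __asm__ __volatile__ (
             "shrl $1, %%ecx \n"
             "jnc 1f         \n" 
             "stosb          \n"
             "1:             \n" 
             "shrl $1, %%ecx \n"
             "jnc 2f         \n"
             "stosw          \n"
             "2:             \n"
             "rep; stosl       "
             : "+D" (x), "+c" (n)
             : "a" (0)
             : "memory", "cc");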
} 
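
For readers less familiar with the string instructions, the bit tests in
the asm above correspond roughly to the following plain C. This is only a
sketch for illustration: the name memclear_c is made up here, and byte
stores stand in for the 16-bit and 32-bit stores to sidestep alignment
questions.

```c
#include <stddef.h>

/* Hypothetical portable counterpart of memclear: bit 0 of n selects one
   byte store, bit 1 selects one 16-bit store, and n >> 2 counts the
   remaining 32-bit stores. */
static void memclear_c(void *x, size_t n)
{
    unsigned char *p = x;
    size_t words = n >> 2;

    if (n & 1) { p[0] = 0; p += 1; }            /* stosb */
    if (n & 2) { p[0] = 0; p[1] = 0; p += 2; }  /* stosw */
    while (words--) {                           /* rep; stosl */
        p[0] = p[1] = p[2] = p[3] = 0;
        p += 4;
    }
}
```

The three steps together clear exactly n bytes, since
n = (n & 1) + (n & 2) + 4 * (n >> 2).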

When memcpy is used in similar situations and the length is a compile-time
constant, gcc will produce code without the shift and jump instructions.
DJ, why is -fno-builtin the default in the specs file?
I found that using -fbuiltin can speed up code a lot. The builtin
memcpy is much faster than all the tricks with manual loop unrolling.
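
As a small illustration of the constant-length case mentioned above, here
is a sketch with a made-up record type (struct point3 and copy_point are
hypothetical names, not from any library). Because sizeof *dst is known at
compile time, gcc with builtins enabled is free to expand the memcpy
inline; one way to check is to compile with -O2 -S and look for a call to
memcpy in the assembly output.

```c
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct point3 { int x, y, z; };

/* The length argument is a compile-time constant, so the compiler can
   replace this call with inline move instructions when the memcpy
   builtin is enabled. */
static void copy_point(struct point3 *dst, const struct point3 *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Whether the call is actually expanded depends on the compiler version and
flags, so the assembly is worth inspecting rather than assuming.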



   Dieter

