Mail Archives: djgpp/1994/04/08/17:38:02
Clearly, I have too much spare time. :-)
I ran a quick-and-skanky benchmark comparing the library memxxx() routines,
naive byte-by-byte implementations of them, and Duff's Device (unrolled)
versions of them.
The library version of memcpy() is 60% faster than the Duff's Device
implementation, which is in turn about 17% faster than naive byte-by-byte.
That gives you some idea of how good the library routines are. :-)
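For anyone who hasn't seen the two hand-written variants, here is a minimal
sketch of what was being compared. The names naive_copy() and duff_copy()
are made up for illustration; this is not the actual benchmark code, and
the timing loop is left out.

```c
#include <stddef.h>

/* Naive version: one byte per loop iteration. */
void *naive_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* Duff's Device: the loop body is unrolled 8 times, and the switch
 * jumps into the middle of the first pass to handle the n % 8
 * leftover bytes. */
void *duff_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t passes;

    if (n == 0)
        return dst;              /* the do-while below runs at least once */

    passes = (n + 7) / 8;        /* number of trips through the loop */
    switch (n % 8) {
    case 0: do { *d++ = *s++;
    case 7:      *d++ = *s++;
    case 6:      *d++ = *s++;
    case 5:      *d++ = *s++;
    case 4:      *d++ = *s++;
    case 3:      *d++ = *s++;
    case 2:      *d++ = *s++;
    case 1:      *d++ = *s++;
            } while (--passes > 0);
    }
    return dst;
}
```

Both still move one byte per access; the unrolled version only saves
loop overhead, which fits with the modest gain over the naive loop.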
On a related note, I'm curious about the Intel architecture. Specifically,
I'd like to know:
a) Does it have odd-address access restrictions?
b) Are accesses on longword (4 byte) boundaries faster than accesses on
word (2 byte) boundaries?
If the answer to (b) is yes, it would probably be worth writing somewhat
more complex, optimized versions of memset() and memcpy(), assuming the
library versions of those routines currently work by byte accesses. If
they already use some block-move instruction, then never mind; that's
presumably already optimal. :-)
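To make the idea concrete, here is a rough sketch of the kind of
"somewhat more complex" memset() I have in mind, assuming aligned
longword stores do turn out to be faster. The name word_fill() is
hypothetical, and this is not how the library does it. It fills single
bytes up to a 4-byte boundary, then stores a longword at a time, then
mops up the tail. Each 4-byte store is written as a fixed-size memcpy()
so the code stays valid C; a compiler would normally turn that into a
single store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of an alignment-aware fill; word_fill is a made-up name. */
void *word_fill(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    unsigned char byte = (unsigned char)c;
    uint32_t word = byte * 0x01010101u;   /* fill byte repeated 4 times */

    /* Head: single bytes until d sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)d % sizeof word) != 0) {
        *d++ = byte;
        n--;
    }

    /* Body: one aligned 4-byte store per iteration. */
    while (n >= sizeof word) {
        memcpy(d, &word, sizeof word);
        d += sizeof word;
        n -= sizeof word;
    }

    /* Tail: whatever is left, at most 3 bytes. */
    while (n > 0) {
        *d++ = byte;
        n--;
    }
    return dst;
}
```

A memcpy() along the same lines is messier, since the source and
destination may not share the same alignment.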
-- chris tate
fixer AT faxcsl DOT dcrt DOT nih DOT gov