Mail Archives: djgpp/1995/03/22/02:06:08
>>>>> Stephen Turnbull <turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp> writes:
> /*-----*//* fast move s[0:n-1]=t[0:n-1] */
> void str_cpy(void*s,void*t,int n){
> asm("pushl %esi"); asm("pushl %edi"); asm("cld");
> asm("movl 8(%ebp),%edi"); asm("movl 12(%ebp),%esi");
> asm("movl 16(%ebp),%ecx"); asm("rep"); asm("movsb"); asm("popl %edi");
> asm("popl %esi");}
> /*-----*/
> /* This has given me good service and should run a bit quicker than a C */
> /* version, as it uses the `rep' repeat instruction */
>
>This looks remarkably like memcpy.s in the standard DJGPP library, but
>it doesn't take advantage of a couple of optimizations included in the
>DJGPP distribution version. Why are we reinventing the wheel?
When optimizations are enabled, gcc outputs the following inline code for
memcpy(void *dest, const void *src, size_t cnt) if cnt is a constant:
-----------------------------------------------------------------------
#include <memory.h>
#define COUNT 47
void foo(void){
    char dest[COUNT], src[COUNT];
    memcpy(dest, src, COUNT);
}

.file "constant.c"
gcc2_compiled.:
___gnu_compiled_c:
.text
.align 4
.globl _foo
_foo:
subl $96,%esp
pushl %edi
pushl %esi
leal 56(%esp),%edi
leal 8(%esp),%esi
cld
movl $11,%ecx
rep
movsl
movsw
movsb
popl %esi
popl %edi
addl $96,%esp
ret
-----------------------------------------------------------------------
I think this output is smart enough: rep movsl copies 11 dwords (44
bytes), then one movsw and one movsb copy the remaining 3 bytes.
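The split in that listing follows directly from the bits of the count.
As a rough sketch only (plan_copy and struct copy_plan are names made up
for this illustration, not anything in gcc or DJGPP), the arithmetic is:

```c
#include <assert.h>   /* only needed if you add checks like the one below */
#include <stddef.h>

/* Hypothetical helper: splits a byte count into the pieces used by the
   inline sequence (rep movsl, optional movsw, optional movsb). */
struct copy_plan {
    size_t dwords;  /* value loaded into %ecx for rep movsl */
    int    word;    /* 1 if a trailing movsw is needed */
    int    byte;    /* 1 if a trailing movsb is needed */
};

static struct copy_plan plan_copy(size_t n)
{
    struct copy_plan p;
    p.dwords = n / 4;        /* whole 4-byte units */
    p.word   = (n & 2) != 0; /* bit 1 of the count: one 2-byte move */
    p.byte   = (n & 1) != 0; /* bit 0 of the count: one 1-byte move */
    return p;
}
```

For n = 47 this gives dwords = 11, word = 1, byte = 1, which matches the
movl $11,%ecx / rep movsl / movsw / movsb sequence above.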
But if cnt is not a constant but a variable:
-----------------------------------------------------------------------
#include <memory.h>
#define COUNT 47
void foo(void){
    char dest[COUNT], src[COUNT];
    int cnt = COUNT;
    memcpy(dest, src, cnt);
}

.file "variable.c"
gcc2_compiled.:
___gnu_compiled_c:
.text
.align 4
.globl _foo
_foo:
subl $96,%esp
leal 48(%esp),%edx
movl %esp,%eax
pushl $47
pushl %eax
pushl %edx
call _memcpy
addl $12,%esp
addl $96,%esp
ret
-----------------------------------------------------------------------
memcpy() is no longer compiled as inline code; gcc emits a call to
_memcpy instead.
This means that optimizing the memory/string functions in the standard
library is still an effective way to improve the performance of
executables built with djgpp, even though the current version of gcc can
generate smart inline code for such functions when the number of bytes
to be processed is a constant.
The following is my own memcpy(), written for gas:
---------------------------------------------------------------
.data
.text
.globl _memcpy
.align 4,144
_memcpy:
pushl %esi /* save esi */
movl %edi,%edx /* save edi in edx */
movl 8(%esp),%edi /* dst */
movl 12(%esp),%esi /* src */
movl 16(%esp),%ecx /* cnt */
movl %ecx,%eax /* keep cnt for the remainder */
shrl $2,%ecx /* ecx = cnt / 4 (DWORD count) */
andl $3,%eax /* eax = cnt % 4 */
cld
rep
movsl /* DWORD move */
movl %eax,%ecx /* copy remainder */
rep
movsb
popl %esi /* restore esi */
movl %edx,%edi /* restore edi */
movl 4(%esp),%eax /* return value (dst) */
ret
---------------------------------------------------------------
This code uses movsl, and is thus somewhat faster than the original
memcpy.s.
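For readers who do not read gas, the strategy of the routine above can be
sketched in C. This is only an illustration, assuming a compiler that
provides <stdint.h>; the name copy_dwords_then_bytes is made up here and
is not part of DJGPP. The 4-byte memcpy() calls have a constant count, so
an optimizing gcc can expand them inline as in the first listing.

```c
#include <assert.h>   /* only needed if you add checks of your own */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of the gas memcpy above: copy cnt / 4 dwords
   first, then the cnt % 4 leftover bytes. Like the gas version, it copies
   forward only and is not meant for overlapping objects. */
void *copy_dwords_then_bytes(void *dst, const void *src, size_t cnt)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = cnt >> 2;  /* shrl $2,%ecx */
    size_t rest   = cnt & 3;   /* andl $3,%eax */

    while (dwords--) {         /* rep movsl */
        uint32_t w;
        memcpy(&w, s, 4);      /* constant-size load, safe for any alignment */
        memcpy(d, &w, 4);      /* constant-size store */
        d += 4;
        s += 4;
    }
    while (rest--)             /* rep movsb */
        *d++ = *s++;

    return dst;                /* same return value as memcpy() */
}
```

Whether this is any faster than the library memcpy() depends on the
compiler and CPU; the hand-written gas version avoids the loop overhead
by letting rep do the counting.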
The drawback of my memcpy() is that it might not work correctly if the
objects overlap. But in ANSI C, memcpy() is not required to behave
correctly with overlapping objects. In such a case, one should use
memmove() instead (I suppose DJ's original memcpy.s is in fact memmove()
code).
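To see why overlap matters for a forward-only copy, here is a small
sketch. forward_copy is a made-up name for this illustration; it is
simply the C equivalent of cld / rep movsb, not code from DJGPP.

```c
#include <assert.h>   /* only needed if you add checks of your own */
#include <stddef.h>
#include <string.h>

/* Hypothetical forward-only byte copy: always walks from low addresses
   to high ones, exactly like rep movsb after cld. */
static void *forward_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;  /* may read a byte that an earlier step overwrote */
    return dst;
}
```

With char buf[] = "abcdef", the call forward_copy(buf + 1, buf, 5) turns
the buffer into "aaaaaa", because each byte is overwritten before it is
read. memmove(buf + 1, buf, 5) on a fresh copy of the same buffer gives
the intended "aabcde".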
I once sent gas sources for such memory/string functions to DJ a long
time ago (possibly in Fall 1991), but he didn't seem to want them.
If anybody would like the gas sources of my memory/string functions
(14 functions in all), please let me know.
----
raraki(Ryuichiro Araki)
raraki AT human DOT waseda DOT ac DOT jp