From: "Salvador Eduardo Tropea (SET)"
To: djgpp-workers AT delorie DOT com
Subject: memmove v.s. memcpy, trying to solve
Date: Wed, 16 Oct 1996 16:23:12 +0300 (GMT)
Message-ID: <9610161623.aa08018@ailin.inti.edu.ar>

Hi Workers, here SET from Argentina:

I'm working with Robert to enhance RHIDE (or to drive Robert crazy, I
don't know). Anyway, yesterday Robert pointed out to me that memmove is very
slow in comparison to memcpy. I replied (approx.): "Why? The only difference
between memmove and memcpy is a comparison at the start to select the
direction of the copy." But he showed me a test in which memmove is 2.5 times
slower than memcpy on a 486 (I tested this on my 5x86 and there it is 1.74
times slower).

I looked at the sources and discovered the reason: memmove ALWAYS moves
single bytes, whereas memcpy uses the smart routine ___dj_movedata.

I don't know if that's fixed in the 2.01 alpha, but if not, here is an attempt
to make memmove similar to memcpy in terms of speed:

------------------- memmove.s -------------------------
/* Copyright (C) 1995 DJ Delorie, see COPYING.DJ for details */
        .file "memmove.s"
        .globl _memmove
_memmove:
        pushl %ebp
        movl %esp,%ebp
        pushl %esi
        pushl %edi
        movl 8(%ebp),%edi
        movl 12(%ebp),%esi
        movl 16(%ebp),%ecx
        jecxz L2
        cmpl %esi,%edi
        jb L3

        call ___dj_movedata_rev
        jmp L2
L3:
        call ___dj_movedata

L2:
        cld
        popl %edi
        popl %esi
        movl 8(%ebp),%eax
        leave
        ret
------------------- end of memmove.s -------------------------

------------------- djmdr.s -------------------
/* Copyright (C) 1995 DJ Delorie, see COPYING.DJ for details */
/* Modified by SET to copy in reverse order */
# This routine moves %ecx bytes from %ds:%esi to %es:%edi. It clobbers
# %eax, %ecx, %esi, %edi, and eflags.
        .file "djmdr.s"
        .text
        .align 4
        .globl ___dj_movedata_rev
___dj_movedata_rev:
        std
        # Add the counter to the indexes so they point at the last byte
        addl %ecx,%edi
        addl %ecx,%esi
        decl %esi
        decl %edi
        cmpl $15,%ecx
        jle small_move
        jmp mod_4_check

        # Transfer bytes until either %esi or %edi points at the last byte
        # of an aligned dword (address % 4 == 3)
align_mod_4:
        movsb
        decl %ecx
mod_4_check:
        movl %esi,%eax
        andl $3,%eax
        cmpl $3,%eax
        jz big_move
        movl %edi,%eax
        andl $3,%eax
        cmpl $3,%eax
        jnz align_mod_4

big_move:
        movb %cl,%al # We will store leftover count in %al
        shrl $2,%ecx
        andb $3,%al
        # Now move the indexes back 3 positions, to the start of the dword
        subl $3,%edi
        subl $3,%esi
        rep
        movsl
        # %ecx known to be zero here, so insert the leftover count in %al
        movb %al,%cl
        # Advance the indexes by 3, back to the last remaining byte
        addl $3,%edi
        addl $3,%esi

small_move:
        rep
        movsb
        ret
------------------- end of djmdr.s -------------------

I tested it and it *seems* to work fine, but:
a) I'm not sure.
b) Maybe it can be optimized.

Hope this helps and can be included in 2.01.

SET
********************************************************************************
Salvador Eduardo Tropea (SET) - salvador AT inti DOT edu DOT ar
Work: INTI (National Institute of Industrial Technology)
Sector: ICE (Electronic Control & Instrumentation)
Post (Home): Curapaligue 2124 - Caseros (1678)- Buenos Aires - Argentina
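
For readers who prefer C, the logic of the two assembly listings in this
message can be sketched roughly as below. This is only an illustration, not
the DJGPP source: the names sketch_memmove, copy_backward and
sketch_memmove_check are hypothetical, the forward path is a plain byte loop
standing in for ___dj_movedata, and comparing pointers through uintptr_t
assumes a flat address space.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes starting from the highest address,
   byte-wise until a pointer is 4-byte aligned, then 4 bytes at a time. */
static void copy_backward(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char *d = dst + n;        /* one past the last destination byte */
    const unsigned char *s = src + n;  /* one past the last source byte */

    if (n > 15) {                      /* short copies skip the alignment work */
        /* Single bytes until either pointer is aligned (at most 3 steps). */
        while (((uintptr_t)s & 3) != 0 && ((uintptr_t)d & 3) != 0) {
            *--d = *--s;
            n--;
        }
        /* Bulk of the copy: whole 4-byte words, highest word first. */
        while (n >= 4) {
            uint32_t w;
            s -= 4;
            d -= 4;
            memcpy(&w, s, 4);          /* read the whole word before writing */
            memcpy(d, &w, 4);
            n -= 4;
        }
    }
    /* Leftover bytes (0..3 after the word loop, or all of a short copy). */
    while (n > 0) {
        *--d = *--s;
        n--;
    }
}

/* Choose the copy direction with one comparison, as the assembly does. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    if (n == 0)
        return dst;
    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination below source: ascending copy is safe.
           (Stand-in for ___dj_movedata, which is not reproduced here.) */
        for (i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination at or above source: copy from the end backwards. */
        copy_backward(d, s, n);
    }
    return dst;
}

/* Returns 1 if an overlapping move towards higher addresses is correct. */
int sketch_memmove_check(void)
{
    char buf[32] = "0123456789abcdefghijklmnopqrstu";
    sketch_memmove(buf + 4, buf, 20);
    return memcmp(buf + 4, "0123456789abcdefghij", 20) == 0;
}
```

The word loop reads each 4-byte group completely before storing it, so it
stays correct even when source and destination are less than 4 bytes apart.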