Mail Archives: djgpp/1995/03/22/03:03:29
raraki writes:
The following code is my own memcpy() written in gas:
---------------------------------------------------------------
	.data
	.text
	.globl _memcpy
	.align 4,144
_memcpy:
	pushl %esi
	movl %edi,%edx		/* save caller's %edi */
	movl 8(%esp),%edi	/* dst */
	movl 12(%esp),%esi	/* src */
	movl 16(%esp),%ecx	/* cnt */
	movl %ecx,%eax
	shrl $2,%ecx		/* dword count = cnt / 4 */
	andl $3,%eax		/* remainder  = cnt % 4 */
	cld
	rep
	movsl			/* copy dwords */
	movl %eax,%ecx
	rep
	movsb			/* copy remaining bytes */
	popl %esi
	movl %edx,%edi		/* restore caller's %edi */
	movl 4(%esp),%eax	/* return value = dst */
	ret
This is much better, but what if %esi and %edi are not 4-byte aligned?
Then every dword transfer may involve an unaligned load and an unaligned
store, which is slow.
I fixed this in the memcpy and movedata for the current V2 alpha.
They do movsb's until either %esi or %edi is long-aligned before doing
movsl's (and hopefully both are aligned by then). The code checks for
small moves right away and just uses movsb for them, skipping the
alignment overhead.
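In C terms, the strategy above looks roughly like the sketch below. This is
not the actual V2 alpha code; the `my_memcpy` name and the small-move
threshold of 16 are assumptions for illustration. Byte copies run until the
destination is long-aligned (the movsb prologue), then 4-byte chunks are
copied (the movsl loop), then the tail bytes (the final movsb's):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch, not the real djgpp code. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= 16) {                      /* threshold is a guess */
        /* byte copies until dst is long-aligned (movsb prologue) */
        while (((uintptr_t)d & 3) && n) {
            *d++ = *s++;
            n--;
        }
        /* dword copies, like rep movsl after shrl $2 */
        while (n >= 4) {
            uint32_t w;
            memcpy(&w, s, 4);           /* portable unaligned load */
            memcpy(d, &w, 4);
            d += 4;
            s += 4;
            n -= 4;
        }
    }
    while (n--)                         /* tail bytes, like rep movsb */
        *d++ = *s++;
    return dst;
}
```

Note that only the destination is guaranteed aligned here; if src and dst
differ mod 4, the loads stay unaligned, which matches the "hopefully both
are aligned then" caveat above.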
For what it's worth, I also modified memset to do aligned stosl's when
possible.
-Mat