Mail Archives: djgpp/1997/11/30/21:31:11


From: leathm AT solwarra DOT gbrmpa DOT gov DOT au (Leath Muller)

Message-Id: <199712010225.MAA16617@solwarra.gbrmpa.gov.au>

Subject: Re: 32bit memcpy function? _NEW_ Tried FPU memcpy (problem with CWSDPMI)

To: xmerm05 AT manes DOT vse DOT cz (Michal Mertl)

Date: Mon, 1 Dec 1997 12:25:19 +1000 (EST)

Cc: djgpp AT delorie DOT com

In-Reply-To: <Pine.ULT.3.95.971127181022.831A-100000@dec5.vse.cz> from "Michal Mertl" at Nov 27, 97 06:17:05 pm

> Other thing is that I tried to write memcpy using 64bit FPU registers as
> someone here suggested. It's about _20% faster_!!

If you know what the src values are, and know that loading them as doubles
won't raise FPU errors, you can speed the code up even more by using the
normal FP loads and stores, i.e.:

	fldl	src
	fldl	src + 8
	fldl	src + 16
		...
	fxch	%st(7)
	fstpl	dest + ...
	fstpl	dest + ...

etc...

Which is 3 cycles per iteration...
 
> _LoopPoint:
>         fildq    (%%eax,%%ecx)
>         fistpq   (%%ebx,%%ecx)

Have you tried unrolling this more? The fistpq right after the fildq
(IIRC) causes a stall which can be prevented by unrolling out...

	fildq   src
	fildq   src + 8
	   ...
	fildq   src + 56
	fxch    %st(7)
	fistpq  dest
	fistpq  dest + 56

etc
	
Note: the rest of your code becomes simpler too as you don't have to worry
about adding registers to attain offsets etc...
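
For reference, here is a minimal sketch of the unrolled fildq/fistpq idea as GCC inline assembly in C. The name fpu_copy64 and the fixed 64-byte block size are assumptions for illustration, not taken from Michal's code. The sketch stores in reverse (stack) order instead of using fxch, asks GCC for an empty x87 stack through the clobber list, and falls back to memcpy on non-x86 targets. It has not been timed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `blocks` 64-byte blocks through the x87 stack.
 * Eight 64-bit integer loads fill st(0)..st(7); eight stores pop them again. */
void fpu_copy64(void *dst, const void *src, size_t blocks)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (blocks--) {
        __asm__ volatile(
            "fildq    0(%1)\n\t"
            "fildq    8(%1)\n\t"
            "fildq   16(%1)\n\t"
            "fildq   24(%1)\n\t"
            "fildq   32(%1)\n\t"
            "fildq   40(%1)\n\t"
            "fildq   48(%1)\n\t"
            "fildq   56(%1)\n\t"
            /* The last value loaded is on top, so store highest offset first. */
            "fistpq  56(%0)\n\t"
            "fistpq  48(%0)\n\t"
            "fistpq  40(%0)\n\t"
            "fistpq  32(%0)\n\t"
            "fistpq  24(%0)\n\t"
            "fistpq  16(%0)\n\t"
            "fistpq   8(%0)\n\t"
            "fistpq   0(%0)\n\t"
            : /* no outputs; memory is written through the pointer */
            : "r"(d), "r"(s)
            : "memory", "st", "st(1)", "st(2)", "st(3)",
              "st(4)", "st(5)", "st(6)", "st(7)");
        d += 64;
        s += 64;
    }
#else
    /* Non-x86 fallback so the sketch still builds elsewhere. */
    memcpy(dst, src, blocks * 64);
#endif
}
```

A call such as fpu_copy64(dst, src, n / 64) covers n bytes when n is a multiple of 64; any tail still needs an ordinary memcpy. The integer load/store pair round-trips every 64-bit pattern exactly, because the extended format has a 64-bit significand.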

> Interesting thing is that it runs only 10-12% faster with cwsdpmi r3 and r4 but
> with pmode (1.2), cwsdpr0 (both r3 and r4), qdpmi (1.1 from QEMM 8.0) run the
> cpu code faster. The normal memcpy is about the same.

Using proper fld/fstp instructions you can do something like 64-byte moves
in around 24 (I think) cycles (not considering cache effects). I used this to
clear memory buffers (such as floating-point Z-buffers), which was very fast
in software. It just means keeping a small 64-byte zeroed memory region to
fld from, and then fstp'ing to the frame buffer memory location...
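
A rough sketch of that buffer-clearing trick, again as GCC inline assembly in C. The names zero_block and fpu_clear64 are made up for illustration, and the loop clears whole 64-byte blocks only. It uses fldl/fstpl on doubles, which is safe here because the only value loaded is +0.0. On non-x86 targets it falls back to memset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 64-byte zeroed region: eight doubles, all +0.0. */
static const double zero_block[8] = {0};

/* Hypothetical helper: clear `blocks` 64-byte blocks by loading zeros
 * from zero_block onto the x87 stack and storing them to the buffer. */
void fpu_clear64(void *dst, size_t blocks)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned char *d = (unsigned char *)dst;

    while (blocks--) {
        __asm__ volatile(
            "fldl    0(%1)\n\t"
            "fldl    8(%1)\n\t"
            "fldl   16(%1)\n\t"
            "fldl   24(%1)\n\t"
            "fldl   32(%1)\n\t"
            "fldl   40(%1)\n\t"
            "fldl   48(%1)\n\t"
            "fldl   56(%1)\n\t"
            /* All eight stack slots hold 0.0, so store order is irrelevant. */
            "fstpl   0(%0)\n\t"
            "fstpl   8(%0)\n\t"
            "fstpl  16(%0)\n\t"
            "fstpl  24(%0)\n\t"
            "fstpl  32(%0)\n\t"
            "fstpl  40(%0)\n\t"
            "fstpl  48(%0)\n\t"
            "fstpl  56(%0)\n\t"
            : /* no outputs; memory is written through the pointer */
            : "r"(d), "r"(zero_block)
            : "memory", "st", "st(1)", "st(2)", "st(3)",
              "st(4)", "st(5)", "st(6)", "st(7)");
        d += 64;
    }
#else
    /* Non-x86 fallback so the sketch still builds elsewhere. */
    memset(dst, 0, blocks * 64);
#endif
}
```

For a float Z-buffer of n bytes, with n a multiple of 64, the call would be fpu_clear64(zbuf, n / 64). All-zero bytes read back as 0.0f, so this only helps when zero is the wanted clear value.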

Leathal.
