Mail Archives: djgpp/2000/05/20/07:15:18
Thomas J. Hruska wrote:
>Hello, I am doing that inline ASM thing...again. The situation is that I
>am trying to speed up screen dumps from a buffer to the video buffer using
>far pointers.
It seems that you are mostly moving big chunks of memory at a time.
For this, the function movedata() (prototype in <sys/movedata.h>) or
_movedatal() should be fast enough.
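
A minimal sketch of the movedata() route, assuming the destination is a
linear address below 1 MB that is reachable through _dos_ds. The names
blit_to_video and fake_dos_memory are hypothetical and exist only for
illustration. So that the sketch can also be compiled outside DJGPP, the
non-DJGPP branch copies into an ordinary array that stands in for
conventional memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef __DJGPP__
#include <go32.h>          /* _dos_ds */
#include <sys/movedata.h>  /* movedata() */
#include <sys/segments.h>  /* _my_ds() */
#else
/* Hypothetical stand-in for the first megabyte (non-DJGPP builds only). */
static unsigned char fake_dos_memory[0x100000];
#endif

/* Copy `bytes` bytes from an ordinary buffer to the linear address
   `video_offset` in conventional memory (0xB8000 is just an example). */
static void blit_to_video(const void *src, unsigned long video_offset,
                          size_t bytes)
{
    /* This sketch only handles targets inside the first megabyte. */
    assert(video_offset <= 0x100000 && bytes <= 0x100000 - video_offset);
#ifdef __DJGPP__
    /* Source: our own data segment. Destination: the DOS memory selector. */
    movedata(_my_ds(), (unsigned)src, _dos_ds, (unsigned)video_offset, bytes);
#else
    memcpy(fake_dos_memory + video_offset, src, bytes);
#endif
}
```

A call such as blit_to_video(frame, 0xB8000, sizeof frame) would then
replace the hand-written copy loop. Whether this is fast enough for the
target frame rate has to be measured.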
>The idea here is to perform the buffer copy using only one
>far pointer reference and a rep.
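I might be wrong here, but I thought you cannot override the
destination segment of the movs opcodes. As far as I know, the
destination is always %es:%edi, and a segment prefix only affects
the source.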
>So, I loaded esi, edi, and ecx with the
>appropriate values (I hope).
Perhaps ;)
>After clearing the direction flag, I followed
><sys/farptr.h>'s example for moving data (hence, the .byte 0x64).
I think the segment override should come after the rep. But this
is a moot point anyway, see below.
>However, the problem comes in that rep movsl (or movsd, movs, movll, movb,
>movsb, mobsbb, etc.) does not assemble.
I think this is the least of your problems. With gcc/gas you
have to write "rep; movsl" or "rep\n movsl".
>The objective is to get the framerate up
>from 48 fps to 60 fps (maybe 70 fps) with this code. NOTE: The current
>selector is _dos_ds when the inline ASM executes
Having the selector in _dos_ds won't help you. The intention seems
to be that it is in %fs. But even that would probably not help.
> __asm__ __volatile__ ("
> pushl %%esi
> pushl %%edi
> movl %0, %%esi
> movl %1, %%edi
> movl %2, %%ecx
> cld
> .byte 0x64
> rep movsl
> popl %%edi
> popl %%esi"
> :
> : "g" (&CurrMode.Buffer[x2]), "g" (0xB0000 - y), "g" ((x - x2) %
>0x10000));
And here comes a more subtle point of extended inline assembly.
You feel secure because you save the registers esi and edi yourself
by pushing and popping them. This is not enough. With the "g" constraint,
you may get your input in any register (or in memory, or as an
immediate). So assume that gcc decides that "%1" really is in esi.
Then your "movl %1, %%edi" will load the wrong value, because esi has
already been overwritten by "movl %0, %%esi". The problem is that gcc
has no way to detect this. It may work with one compiler version and
break with the next. It may even break with the next set of compiler
switches you try.
Your code also clobbers the ecx register without telling gcc.
The following code might do what you want. I assume your
inputs are correct.
/* Untested, not even tried to compile it. */
/* These are needed to tell gcc that the input registers
   get clobbered. Yes, this is not too intuitive. */
int unused1, unused2, unused3;
__asm__ volatile(
    "pushw %%es\n"
    "movw %w3, %%es\n"
    "cld; rep; movsl\n"
    "popw %%es"
    : "=c" (unused1), "=S" (unused2), "=D" (unused3)
    : "rm" (_dos_ds), "0" ((x - x2) % 0x10000), "1" (&CurrMode.Buffer[x2]),
      "2" (0xB0000 - y)
    : "memory", "cc");
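
With reasonably recent gcc versions, the dummy outputs can alternatively
be written as read-write ("+") operands. The following is only a sketch
of that technique for a copy inside the flat data segment: it does not
switch %es, so it is not a replacement for the far copy discussed here.
The function name copy_dwords is hypothetical, and on non-x86 targets the
sketch falls back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `count` 32-bit words from src to dst (regions must not overlap). */
static void copy_dwords(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__(
        "cld\n\t"
        "rep; movsl"
        /* "+" marks each operand as both read and modified, so gcc
           loads the registers itself and knows they change. */
        : "+D" (dst), "+S" (src), "+c" (count)
        :
        : "memory", "cc");
#else
    memcpy(dst, src, count * sizeof *dst);  /* portable fallback */
#endif
}
```

Because gcc places the operands in edi, esi and ecx itself, no manual
movl, pushl or popl is needed, and the "g" hazard described above cannot
occur.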
I suggest you read the gcc manual sections on extended inline assembly.
The FAQ shows how to find them. In this case, unfortunately, even this
won't be enough. AFAIK the "%w3" is not explained there. Recently,
Eli Zaretskii sent a URL to this list with an IMHO very good description
of extended inline assembly. I currently don't have the URL handy.
You can find the message at www.delorie.com.
A related question: I have seen quite a few low-level "video writing"
problems here. Allegro and Grx both have fast frame buffer
access methods, including fast bitblt functions. Why do so many
people go to the trouble of reinventing the wheel?
--
Regards, Dieter Buerssner