Mail Archives: djgpp/2000/05/20/07:15:18

From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Inline ASM question...
Date: 20 May 2000 11:01:54 GMT
Lines: 101
Message-ID: <8g62mf.3vs4i2t.0@buerssner-17104.user.cis.dfn.de>
References: <8empao$5k6$1 AT nnrp02 DOT primenet DOT com> <390ef9f9$0$72098 AT SSP1NO17 DOT highway DOT telekom DOT at> <8emvhq$7mn$1 AT nnrp03 DOT primenet DOT com> <3 DOT 0 DOT 6 DOT 32 DOT 20000505015633 DOT 007b2210 AT pop DOT crosswinds DOT net> <3 DOT 0 DOT 6 DOT 32 DOT 20000510204858 DOT 007b6e40 AT pop DOT crosswinds DOT net> <3 DOT 0 DOT 6 DOT 32 DOT 20000511021045 DOT 007af4a0 AT pop DOT crosswinds DOT net> <3 DOT 0 DOT 6 DOT 32 DOT 20000519211524 DOT 007c7290 AT pop DOT crosswinds DOT net>
NNTP-Posting-Host: pec-142-79.tnt9.s2.uunet.de (149.225.142.79)
Mime-Version: 1.0
X-Trace: fu-berlin.de 958820514 789859 149.225.142.79 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Thomas J. Hruska wrote:

>Hello, I am doing that inline ASM thing...again.  The situation is that I
>am trying to speed up screen dumps from a buffer to the video buffer using
>far pointers.  

It seems that you are mostly moving big chunks of memory at a time.
For this, the function movedata() (prototype in <sys/movedata.h>) or
_movedatal() should be fast enough.
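
A minimal sketch of the movedata() route, for illustration only. The helper name blit_buffer and its fallback parameter are hypothetical and not from the original post; the fallback branch exists only so the sketch can be compiled and tried outside DJGPP, where a plain buffer stands in for video memory.

```c
#include <stddef.h>
#include <string.h>

#ifdef __DJGPP__
#include <go32.h>          /* _dos_ds */
#include <sys/movedata.h>  /* movedata() */
#include <sys/segments.h>  /* _my_ds() */
#endif

/* Hypothetical helper: copy `len` bytes from a buffer in our own data
   segment to the linear address `video_linear` in conventional memory.
   Outside DJGPP, `fallback` stands in for video memory. */
static void blit_buffer(const void *src, unsigned long video_linear,
                        size_t len, void *fallback)
{
#ifdef __DJGPP__
    (void)fallback;
    /* source selector/offset, destination selector/offset, byte count */
    movedata(_my_ds(), (unsigned)src, _dos_ds, (unsigned)video_linear, len);
#else
    (void)video_linear;
    memcpy(fallback, src, len);
#endif
}
```

Under DJGPP the call needs no inline assembly and no far pointer handling of your own; the library takes care of the segment registers.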

>The idea here is to perform the buffer copy using only one
>far pointer reference and a rep.  

I might be wrong here, but I thought you cannot override the
destination segment of movs opcodes.

>So, I loaded esi, edi, and ecx with the
>appropriate values (I hope).  

Perhaps ;)

>After clearing the direction flag, I followed
><sys/farptr.h>'s example for moving data (hence, the .byte 0x64).  

I think the segment override should come after the rep. But this
is a moot point anyway, see below.

>However, the problem comes in that rep movsl (or movsd, movs, movll, movb, 
>movsb, mobsbb, etc.) does not assemble.  

I think this is the least of your problems. With gcc/gas you
have to write "rep; movsl" or "rep\n movsl".

>The objective is to get the framerate up
>from 48 fps to 60 fps (maybe 70 fps) with this code.  NOTE:  The current
>selector is _dos_ds when the inline ASM executes 

The current selector being _dos_ds won't help you. The intention seems
to be that it is in %fs. But even that would probably not help.

>      __asm__ __volatile__ ("
>        pushl %%esi
>        pushl %%edi
>        movl %0, %%esi
>        movl %1, %%edi
>        movl %2, %%ecx
>        cld
>        .byte 0x64
>        rep movsl
>        popl %%edi
>        popl %%esi"
>        :
>        : "g" (&CurrMode.Buffer[x2]), "g" (0xB0000 - y), "g" ((x - x2) %
>0x10000));

And here comes a more subtle point of extended inline assembly.
You feel secure when you save the registers esi and edi yourself
by pushing and popping them. This is not enough. With the "g" constraint,
you may get your input in any register (and more). So assume that
gcc decides that "%1" really is in esi. Then your "movl %1, %%edi"
will fail, because esi has already been overwritten. The problem
is that gcc has no way to detect this. It also may work with
one compiler version and break with the next. It may even break
with the next compiler switches you try.

Also, your code clobbers the ecx register, and you didn't tell
gcc about it.

The following code might do what you want. I assume your
inputs are correct.

/* Untested, not even tried to compile it. */

/* These dummy outputs are needed to tell gcc that the input registers
   get clobbered. Yes, this is not too intuitive. */
int unused1, unused2, unused3;

__asm__ volatile(
  "pushw %%es\n"           /* save es */
  "movw %w3, %%es\n"       /* es = _dos_ds (destination segment) */
  "cld; rep; movsl\n"      /* copy ecx dwords from ds:esi to es:edi */
  "popw %%es"              /* restore es */
  : "=c" (unused1), "=S" (unused2), "=D" (unused3)
  : "rm" (_dos_ds), "0" ((x - x2) % 0x10000), "1" (&CurrMode.Buffer[x2]),
    "2" (0xB0000 - y)
  : "memory", "cc");
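
To try the constraint idea without any segment handling, here is a sketch of a plain same-segment copy. It is not the code from the post: the function name copy_dwords is hypothetical, and it uses read-write "+" operands (as accepted by current gcc versions) in place of the dummy outputs above. On non-x86 targets it falls back to memcpy so it still compiles.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `count` 32-bit words from src to dst (regions must not overlap). */
static void copy_dwords(uint32_t *dst, const uint32_t *src, size_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "S", "D" and "c" pin the operands to the source index, destination
       index and count registers that rep movsl uses. "+" tells gcc the
       instruction modifies them, so it will not assume the old values
       are still there afterwards. */
    __asm__ volatile(
        "cld\n\t"
        "rep; movsl"
        : "+S" (src), "+D" (dst), "+c" (count)
        :
        : "memory", "cc");
#else
    memcpy(dst, src, count * sizeof *dst);
#endif
}
```

Because gcc places the values in the right registers itself, there are no movl instructions left that could overwrite one input while loading another.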

I suggest you read the manual sections on gcc extended inline assembly.
The FAQ shows how to find them. In this case, unfortunately, even this
won't be enough. AFAIK the "%w3" is not explained there. Recently,
Eli Zaretskii sent a URL to this list, with an IMHO very good description
of extended inline assembly. I currently don't have the URL handy.
You can find the message at www.delorie.com.
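
As a small illustration of the "w" operand modifier: it makes gcc print the 16-bit name of a register operand (for example %ax instead of %eax), which a 16-bit instruction such as movw needs. The function low_word below is a hypothetical example, not part of the code above, and falls back to plain C on non-x86 targets.

```c
#include <stdint.h>

/* Return the low 16 bits of `value`, using movw on x86. */
static uint16_t low_word(uint32_t value)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t out = 0;
    /* %w0 and %w1 expand to the 16-bit names of the chosen registers.
       movw only writes the low half of `out`; the upper half keeps
       the 0 it was initialized with, hence the "+r" constraint. */
    __asm__("movw %w1, %w0" : "+r" (out) : "r" (value));
    return (uint16_t)out;
#else
    return (uint16_t)value;
#endif
}
```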

A related question: I have seen quite a few low-level "video writing"
problems here. Allegro and Grx both have fast frame buffer
access methods, including fast bitblt functions. Why do so many
people go to the trouble of reinventing the wheel?

-- 
Regards, Dieter Buerssner
