Mail Archives: djgpp/1998/12/04/19:22:09.1

Sender: nate AT cartsys DOT com
Message-ID: <366879EB.6C322602@cartsys.com>
Date: Fri, 04 Dec 1998 16:10:19 -0800
From: Nate Eldredge <nate AT cartsys DOT com>
X-Mailer: Mozilla 4.05 [en] (X11; I; Linux 2.0.35 i486)
MIME-Version: 1.0
To: djgpp AT delorie DOT com
Subject: Re: Extended ASM (Was: misc questions)
References: <Pine DOT SUN DOT 3 DOT 91 DOT 981202164845 DOT 27255C-100000 AT is> <ranla DOT 148 DOT 0003E15E AT post DOT tau DOT ac DOT il>
Reply-To: djgpp AT delorie DOT com

Tal Lavi wrote:
> 
> >If you have already studied those, you will
> >have to ask further questions here.
> 
> Here's one.
> I need a FlushPage routine (macro?) for a 640x480x16bpp mode written in asm.
> Until now, I used a C(++) routine which uses _farsetsel & _farnspokel.
> All my experiments in writing such a routine with the extended ASM failed.
> Here's one that SEEMS to be alright to me, the inexperienced ASM programmer.
> Please, show me a better one (that works).
> And, PLEASE, mention what the not so obvious parts of your code do.
> 
> void FlushPage(unsigned short C)
> {
>   __asm__ __volatile__ ("
>     movw %0,%%fs\n
>     .byte 0x64\n
>     movl $0,%%edi\n

You have an FS override prefix on an instruction that doesn't even
access memory, so it has no effect.  What you probably meant to do was
to apply it to the `stosl'.  However, that won't work either: the
destination of `stos' is always ES:EDI, and a segment override prefix
cannot change it (only the source segment of string instructions such
as `movs' and `lods' can be overridden).  The solution is to load your
selector into `%es' itself.  Remember, however, that GCC expects ES to
be preserved, so save and restore it.  (Adding it to the clobber list
won't work; GCC doesn't even know about the existence of segment
registers.)

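movw %%es, %%dx        # save ES (list "dx" as a clobber)
movw %w0, %%es         # load the selector; %w0 forces the 16-bit register name
...
rep
stosl
movw %%dx, %%es        # restore ES, which GCC assumes is unchanged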

>     movl $153600, %%ecx\n
>     rep\n
>     stosl"
>   :
>   :"r"(LFBSelector[ScreenNum]),
>    "a"((long(C)<<16)+C)
>   :"ax","cx","di","memory"
>   );
> }
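
For reference, here is a minimal sketch of the `rep stosl' fill with
operand constraints that a current GCC accepts.  It is not the code from
the quoted post: the names pack_pixel_pair and fill_dwords are made up
for illustration, it writes through an ordinary flat pointer (so the ES
save/restore is left out), and it falls back to plain C on non-x86
targets.  The instruction modifies EDI and ECX, so they are declared as
read-write operands rather than clobbers; EAX is an input, so it must
not appear in the clobber list at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Duplicate one 16bpp pixel into both halves of a dword, so that each
   32-bit store writes two pixels (same idea as (long(C)<<16)+C). */
static uint32_t pack_pixel_pair(uint16_t c)
{
    return ((uint32_t)c << 16) | c;
}

/* Store `fill' into `count' consecutive dwords starting at `dst'.
   Hypothetical helper: flat memory only, no segment register games. */
static void fill_dwords(uint32_t *dst, uint32_t fill, size_t count)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__(
        "cld\n\t"            /* make sure stos counts upwards */
        "rep stosl"          /* store EAX at [EDI], ECX times */
        : "+D"(dst), "+c"(count)   /* both are changed by rep stos */
        : "a"(fill)                /* input only */
        : "memory", "cc");
#else
    /* Portable fallback so the sketch builds on other targets. */
    while (count--)
        *dst++ = fill;
#endif
}
```

For the 640x480x16bpp page, count would be 640*480*2/4 = 153600, which
matches the constant in the quoted code.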
> 
> I also have a couple of questions:
> 
> 1) Should I use a rep stos, or maybe movl with a branch?

For simplicity, `rep stosl' is good.  Using `movl' will require the
actual move, an add to update the index, a subtract (or possibly
compare) to decide whether you're done, and a conditional jump which
will almost always be taken.  This will be significantly slower,
especially since many processors must flush their prefetch queue on a
jump.

If you do the `mov's in bulk, it can lead to a speedup on some newer
processors (the string instructions have received less optimization as
processors evolved).  This would probably look something like:

movl count, %ecx
movl start, %edi
movl fill, %eax

loop:
movl %eax, (%edi)
movl %eax, 4(%edi)
movl %eax, 8(%edi)
...
movl %eax, 28(%edi)
addl $32, %edi
subl $32, %ecx
jnz loop

But this needs extra care to handle the leftover bytes, is more
complicated, is larger, and isn't universally better (and even when it
is faster, the speedup is small).  At least in the short run, stick
with `stos'.
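
To make the "leftover" handling concrete, here is a rough C sketch of
the same unrolling idea.  The function name fill_unrolled is
hypothetical, and unlike the assembly outline above (which subtracts 32
per pass and so appears to count bytes) this version takes its count in
dwords.  Whether it beats `rep stosl' depends on the processor and the
compiler, so treat it as an illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `count' dwords at `dst' with `fill', eight stores per loop pass. */
static void fill_unrolled(uint32_t *dst, uint32_t fill, size_t count)
{
    size_t blocks = count / 8;   /* full 32-byte blocks */
    size_t rest   = count % 8;   /* dwords left over at the end */

    while (blocks--) {
        dst[0] = fill;
        dst[1] = fill;
        dst[2] = fill;
        dst[3] = fill;
        dst[4] = fill;
        dst[5] = fill;
        dst[6] = fill;
        dst[7] = fill;
        dst += 8;
    }

    /* The tail that the unrolled body cannot cover. */
    while (rest--)
        *dst++ = fill;
}
```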
 
> 2)
> > The first are constraints again, and ".byte 0x64" causes the assembler to
> > emit 0x64 into the binary code.  0x64 is the op-code for FS: prefix
> > override (meaning the next instruction uses offsets into the segment
> > whose selector is in the FS register).  sys/farptr.h uses a byte constant
> > because early versions of Gas didn't support prefixes (I'm not sure how
> > the things are with Binutils 2.8.1).
> 
> Why should you write (and know) the opcode? Isn't there an ASM instruction for that purpose?
> (movl %something, %%fs(%something))?

Yes, but some people prefer to write their prefixes explicitly.  It does
give one a better sense of what is being generated.  So

movl %reg, %fs:mem

is equivalent to

fs
movl %reg, mem

(Note that these really are prefixes since they can be applied to most
instructions, not just `mov'.)

These would be the clearest ways to write those instructions.  But since
GAS was buggy and would ignore the prefixes under certain, ill-defined
circumstances, many people would write the hex opcode with `.byte'
instead, to be safe.  This especially applied to the DJGPP headers,
which might be used with almost any GAS version.
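
As a small illustration of what `.byte 0x64' actually emits, the sketch
below lists the six segment override prefix bytes and shows the 32-bit
encoding of one store with and without the FS prefix.  The function name
segment_prefix_name and the two arrays are made up for this example;
the byte values are the standard 32-bit protected-mode ones.

```c
#include <stddef.h>

/* Return the segment register selected by an override prefix byte,
   or NULL if the byte is not a segment override prefix. */
static const char *segment_prefix_name(unsigned char byte)
{
    switch (byte) {
    case 0x26: return "es";
    case 0x2e: return "cs";
    case 0x36: return "ss";
    case 0x3e: return "ds";
    case 0x64: return "fs";
    case 0x65: return "gs";
    default:   return NULL;
    }
}

/* movl %eax, (%edi)      : opcode 0x89, ModRM 0x07 */
static const unsigned char mov_plain[] = { 0x89, 0x07 };

/* movl %eax, %fs:(%edi)  : the same bytes with 0x64 in front */
static const unsigned char mov_fs[] = { 0x64, 0x89, 0x07 };
```

The only difference between the two encodings is the leading 0x64,
which is why writing it by hand with `.byte' gives the same result as
the `%fs:' syntax.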

-- 

Nate Eldredge
nate AT cartsys DOT com
