Mail Archives: djgpp/1999/06/18/15:53:21

Sender: nate AT cartsys DOT com
Message-ID: <376AA3D1.AACEABBF@cartsys.com>
Date: Fri, 18 Jun 1999 12:53:53 -0700
From: Nate Eldredge <nate AT cartsys DOT com>
X-Mailer: Mozilla 4.08 [en] (X11; I; Linux 2.2.10 i586)
MIME-Version: 1.0
To: djgpp AT delorie DOT com
Subject: Re: Why dosn't my asm getpixel() work?
References: <3766f436 DOT 5083819 AT nntpserver DOT swip DOT net> <3768394B DOT 4C9E319E AT cartsys DOT com> <376981c9 DOT 8709864 AT nntpserver DOT swip DOT net>
Reply-To: djgpp AT delorie DOT com

Gathers wrote:

> >4. This code will run faster if you use the 32-bit registers and
> >instructions.  In protected mode, there is a 1-cycle penalty on each
> >16-bit instruction.
> >
> But then I'll have to use a long int instead of short? Will it still
> be faster?

Yes, for the same reason.  (16-bit instructions are slower even when the
compiler generates them.)  You should generally only use `short' when
you're storing a lot of them and need to save the memory.

Also, your 16-bit instructions won't be able to handle the 32-bit
addresses.
 
> >5. It's probably simpler to use an indirect move instead of stosb/lodsb.
> >
> And how do I do that? :)

/* calculate address in, say, %ebx and value in %al */

movb %al, (%ebx)

The string instructions are only useful in loops (and often not even
then).
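
In C terms, an indirect move is just a load or store through a pointer.
Here is a minimal sketch, assuming a flat 320x200 8-bit buffer in
ordinary memory. The names `put_pixel_flat`, `get_pixel_flat`,
`SCREEN_W` and `SCREEN_H` are made up for illustration, and the sketch
ignores the segment selector that real video memory needs under DJGPP.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W 320  /* assumed layout: 320 bytes per row */
#define SCREEN_H 200

/* Store one byte at base + y*320 + x. Once the address is computed,
   the store itself is typically a single indirect byte move. */
void put_pixel_flat(uint8_t *base, uint32_t x, uint32_t y, uint8_t c)
{
    assert(x < SCREEN_W && y < SCREEN_H);
    base[y * SCREEN_W + x] = c;
}

/* Load one byte from the same address. */
uint8_t get_pixel_flat(const uint8_t *base, uint32_t x, uint32_t y)
{
    assert(x < SCREEN_W && y < SCREEN_H);
    return base[y * SCREEN_W + x];
}
```

Compiling something like this with -O2 -S is an easy way to see what
the indirect move looks like in the compiler's own output.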
 
> >6. The multiply can be optimized better; this is left as an exercise for
> >the reader.
> >
> mov ax,y
> mov bx,ax
> shl ax,8
> shl bx,6
> add ax,bx
> ax==y*320?

That works (y*256 + y*64 = y*320).  Something that will never cease to
amaze me, though, is that GCC discovered it could do the equivalent of:

mov eax, y
lea eax, [eax + eax * 4]  ; multiply eax by 5
shl eax, 6  ; and then by 64-- 5 * 64 = 320

saving two instructions and a register.
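
Both decompositions can be checked against a plain multiply in C. This
is only a sketch; the function names are invented for the example, and
the loop covers just the 200 rows of a 320x200 screen.

```c
#include <assert.h>
#include <stdint.h>

/* Two shifts and an add: y*256 + y*64 */
uint32_t mul320_shift_add(uint32_t y)
{
    return (y << 8) + (y << 6);
}

/* The lea/shl form: (y + y*4) * 64 */
uint32_t mul320_lea(uint32_t y)
{
    uint32_t t = y + y * 4;  /* lea eax, [eax + eax * 4] */
    return t << 6;           /* shl eax, 6 */
}

/* Compare both against y * 320 for every row of the screen. */
void check_mul320(void)
{
    for (uint32_t y = 0; y < 200; y++) {
        assert(mul320_shift_add(y) == y * 320);
        assert(mul320_lea(y) == y * 320);
    }
}
```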

> >In fact, you could let the compiler do all this:
> >
> >#include <sys/farptr.h>
> >#define mygetpixel(seg, add, x, y) (_farpeekb((seg), (add) + (x) + ((y)
> >* 320)))
> >#define myputpixel(seg, add, x, y, c) (_farpokeb((seg), (add) + (x) +
> >((y) * 320), (c)))
> >
> >The quality of the code, if optimization is on (you can find it by using
> >-S), may surprise you.
> >
> Yes, but I don't want to replace the pixel functions, I just want the
> asm code for it so I can make asm of my blur function, then speeding it
> up is the last step. I think even I could manage to speed it up if I
> just could get it working.. :)
> 
> Anyway, now the code works, even with -O3, unless I use both functions
> at the same time..then it only works without optimization. I test with
> this loop
> for(x=0;x<320;x++)
>    for(y=0;y<200;y++){
>       myputpixel(screenseg,screenadd,x,y,5);
> //      _putpixel(screen,x,y,5);
> //      if(_getpixel(screen,x,y)!=5){
>       if(mygetpixel(screenseg,screenadd,x,y)!=5){
>          textprintf(screen,font,20,20,55,"x=%d y=%d",x,y);
>          while (!keypressed()) {}
>          exit(0);
>       }
>    }
> Here is the code pieces again, and I'd be really grateful if someone
> could point out the problem :)

The `mul' instruction is a widening multiply: `mul %bx' leaves the
32-bit product in dx:ax, so dx is overwritten with the high half.  Your
clobber list doesn't mention dx; try adding it.
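
To make the dx business concrete, here is a small C model of what the
16-bit `mul` does. It is a sketch of the instruction's effect, not
code from the original post; `struct dx_ax` and `mul16` are names
invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The register pair written by a 16-bit mul. */
struct dx_ax { uint16_t dx; uint16_t ax; };

/* Model of `mul %bx': dx:ax = ax * bx. Both halves are written. */
struct dx_ax mul16(uint16_t ax, uint16_t bx)
{
    uint32_t product = (uint32_t)ax * bx;
    struct dx_ax r;
    r.dx = (uint16_t)(product >> 16);  /* high half lands in dx */
    r.ax = (uint16_t)product;          /* low half stays in ax */
    return r;
}

void check_mul16(void)
{
    struct dx_ax r = mul16(199, 320);     /* last row: 63680 = 0xF8C0 */
    assert(r.ax == 0xF8C0 && r.dx == 0);  /* fits, but dx is still zeroed */
    r = mul16(205, 320);                  /* 65600 = 0x10040 */
    assert(r.dx == 1 && r.ax == 0x0040);
}
```

Even when the product fits in 16 bits, dx is set to zero, so whatever
the compiler was keeping there is gone unless the clobber list says so.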

Also try switching to the 32-bit instructions; there are a few places
where a carry out of the low 16 bits would be lost (for example, when
the offset is added to the 32-bit `add').
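
A rough C illustration of the lost carry, with the 16-bit arithmetic
modeled by truncating to uint16_t after each step. The function names
and the base value 0x1000 are arbitrary choices for the example, picked
only so that the sum crosses 64K.

```c
#include <assert.h>
#include <stdint.h>

/* Address computed the way 16-bit registers would do it: every
   intermediate result is cut down to 16 bits. */
uint16_t offset16(uint32_t add, uint16_t x, uint16_t y)
{
    uint16_t ax = (uint16_t)(y * 320u);
    ax = (uint16_t)(ax + (uint16_t)add);  /* high half of add is dropped */
    ax = (uint16_t)(ax + x);
    return ax;
}

/* The same computation in 32 bits. */
uint32_t offset32(uint32_t add, uint32_t x, uint32_t y)
{
    return add + y * 320 + x;
}

void check_offsets(void)
{
    assert(offset32(0x1000, 0, 199) == 0x108C0);
    assert(offset16(0x1000, 0, 199) == 0x08C0);  /* carry out of bit 15 lost */
}
```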

> screenseg=screen->seg;
> screenadd=bmp_write_line(screen,0);
> 
> unsigned char mygetpixel(unsigned short seg,unsigned long add,unsigned
> short x,unsigned short y)
> {
> unsigned char c;
> asm("push %%ds\n\t"
>     "movw %1,%%ax\n\t"
>     "movw %%ax,%%ds\n\t"
>     "movw %2,%%ax\n\t"
>     "xor %%bx,%%bx\n\t"
>     "movw $0x140,%%bx\n\t"
>     "mul %%bx\n\t"
>     "addl %3,%%ax\n\t"
>     "addw %4,%%ax\n\t"
>     "movl %%ax,%%si\n\t"
>     "lodsb\n\t"
>     "movb %%al,%0\n\t"
>     "pop %%ds"
>     :"g="(c)
>     :"g"(seg),"g"(y),"g"(add),"g"(x)
>     :"ax","bx","si","memory"
> );
> return c;
> }
> 
> void myputpixel(unsigned short seg,unsigned long add,unsigned short
> x,unsigned short y,unsigned char c)
> {
> asm("push %%es\n\t"
>     "movw %0,%%ax\n\t"
>     "movw %%ax,%%es\n\t"
>     "movw %1,%%ax\n\t"
>     "xor %%bx,%%bx\n\t"
>     "movw $0x140,%%bx\n\t"
>     "mul %%bx\n\t"
>     "addl %2,%%ax\n\t"
>     "addw %3,%%ax\n\t"
>     "movl %%ax,%%di\n\t"
>     "movb %4,%%al\n\t"
>     "stosb\n\t"
>     "pop %%es"
>     :
>     :"g"(seg),"g"(y),"g"(add),"g"(x),"g"(c)
>     :"ax","bx","di","memory"
> );
> }
> 
> /Gathers

-- 

Nate Eldredge
nate AT cartsys DOT com
