Mail Archives: djgpp/1999/06/18/15:53:21
Gathers wrote:
> >4. This code will run faster if you use the 32-bit registers and
> >instructions. In protected mode, there is a 1-cycle penalty on each
> >16-bit instruction.
> >
> But then I'll have to use a long int instead of short? Will it still
> be faster?
Yes, for the same reason. (16-bit instructions are slower even when the
compiler generates them.) You should generally only use `short' when
you're storing a lot of them and need to save the memory.
Also, your 16-bit instructions won't be able to handle the 32-bit
addresses.
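A rough C sketch of that rule of thumb, with made-up names (`samples`, `sum_samples`): keep bulk data in 16-bit elements to save memory, but do the arithmetic and loop counting in full-width types so the compiler has no reason to emit 16-bit instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data: stored as 16-bit values only to save memory. */
static const int16_t samples[4] = { 1000, -2000, 30000, 30000 };

/* The accumulator and loop index are full-width; each element is
   widened to 32 bits when it is loaded from the array. */
int32_t sum_samples(const int16_t *data, size_t count)
{
    int32_t total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += data[i];
    return total;
}
```

The array costs two bytes per element, while the sum is free to exceed the 16-bit range.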
> >5. It's probably simpler to use an indirect move instead of stosb/lodsb.
> >
> And how do I do that? :)
/* calculate address in, say, %ebx and value in %al */
movb %al, (%ebx)
The string instructions are only useful in loops (and often not even
then).
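In C terms an indirect move is just a store or load through a pointer. A small sketch, assuming a plain in-memory 320x200 buffer (the names `fb`, `put_byte`, and `get_byte` are invented for illustration, and nothing here touches real video memory or segment registers):

```c
#include <stdint.h>

enum { WIDTH = 320, HEIGHT = 200 };

/* Stand-in for the screen: an ordinary array, not real video memory. */
static uint8_t fb[WIDTH * HEIGHT];

/* Compute the address, then store one byte through it.
   This is the C counterpart of `movb %al, (%ebx)`. */
void put_byte(uint8_t *base, unsigned x, unsigned y, uint8_t c)
{
    uint8_t *p = base + y * WIDTH + x;
    *p = c;
}

/* The matching indirect load, like `movb (%ebx), %al`. */
uint8_t get_byte(const uint8_t *base, unsigned x, unsigned y)
{
    const uint8_t *p = base + y * WIDTH + x;
    return *p;
}
```

Calling `put_byte(fb, x, y, 5)` and then `get_byte(fb, x, y)` should give back 5, with no string instructions or direction flag involved.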
> >6. The multiply can be optimized better; this is left as an exercise for
> >the reader.
> >
> mov ax,y
> mov bx,ax
> shl ax,8
> shl bx,6
> add ax,bx
> ax==y*320?
Something that will never cease to amaze me is that GCC discovered it
could do the equivalent of:
mov eax, y
lea eax, [eax + eax * 4] ; multiply eax by 5
shl eax, 6 ; and then by 64-- 5 * 64 = 320
saving two instructions and a register.
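Both forms do compute y*320, since 256 + 64 = 320 and 5 * 64 = 320. A quick C check of the arithmetic (the function names are invented for this sketch; it shows the identity, not what any particular compiler emits):

```c
#include <stdint.h>

/* Two shifts and an add: y*256 + y*64, as in the quoted 16-bit version. */
uint32_t mul320_shifts(uint32_t y)
{
    return (y << 8) + (y << 6);
}

/* The lea/shl form: y + y*4 is y*5, then a shift by 6 multiplies by 64. */
uint32_t mul320_lea(uint32_t y)
{
    uint32_t t = y + y * 4;
    return t << 6;
}

/* Returns 1 if both forms match a plain multiply for every y below limit. */
int mul320_forms_agree(uint32_t limit)
{
    uint32_t y;

    for (y = 0; y < limit; y++) {
        if (mul320_shifts(y) != y * 320u || mul320_lea(y) != y * 320u)
            return 0;
    }
    return 1;
}
```

For a 200-line screen, `mul320_forms_agree(200)` covers every row.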
> >In fact, you could let the compiler do all this:
> >
> >#include <sys/farptr.h>
> >#define mygetpixel(seg, add, x, y) \
> >    (_farpeekb((seg), (add) + (x) + ((y) * 320)))
> >#define myputpixel(seg, add, x, y, c) \
> >    (_farpokeb((seg), (add) + (x) + ((y) * 320), (c)))
> >
> >The quality of the code, if optimization is on (you can find it by using
> >-S), may surprise you.
> >
> Yes, but I don't want to replace the pixel functions, I just want the
> asm code for it so I can make asm of my blur function, then speeding it
> up is the last step. I think even I could manage to speed it up if I
> just could get it working.. :)
>
> Anyway, now the code works, even with -O3, unless I use both functions
> at the same time..then it only works without optimization. I test with
> this loop
> for(x=0;x<320;x++)
> for(y=0;y<200;y++){
> myputpixel(screenseg,screenadd,x,y,5);
> // _putpixel(screen,x,y,5);
> // if(_getpixel(screen,x,y)!=5){
> if(mygetpixel(screenseg,screenadd,x,y)!=5){
> textprintf(screen,font,20,20,55,"x=%d y=%d",x,y);
> while (!keypressed()) {}
> exit(0);
> }
> }
> Here is the code pieces again, and I'd be really grateful if someone
> could point out the problem :)
The `mul' instruction is a widening multiply: `mul %bx' leaves the low
half of the product in ax and the high half in dx, so dx gets
overwritten. Try adding "dx" to your clobber list.
Also try switching to the 32-bit instructions; there are a few places
where the carry out of a 16-bit result would be lost.
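To make both points concrete, here is a small C model of the arithmetic. It is only a sketch: `mul16`, `offset16`, and `offset32` are invented names, and the base address used with them is an arbitrary value, not anything taken from Allegro.

```c
#include <stdint.h>

/* What a 16-bit `mul` leaves behind: low half in ax, high half in dx. */
struct mul16_result {
    uint16_t ax;  /* low 16 bits of the product */
    uint16_t dx;  /* high 16 bits of the product */
};

struct mul16_result mul16(uint16_t a, uint16_t b)
{
    uint32_t product = (uint32_t)a * b;
    struct mul16_result r;

    r.ax = (uint16_t)(product & 0xFFFFu);
    r.dx = (uint16_t)(product >> 16);
    return r;
}

/* Offset truncated to 16 bits, which is all a 16-bit register can hold. */
uint16_t offset16(uint32_t base, uint16_t x, uint16_t y)
{
    return (uint16_t)(base + (uint32_t)y * 320u + x);
}

/* The same offset with full 32-bit arithmetic. */
uint32_t offset32(uint32_t base, uint16_t x, uint16_t y)
{
    return base + (uint32_t)y * 320u + x;
}
```

With a made-up base of 0xF000, the last pixel (319, 199) lands at 125439 in 32-bit arithmetic but wraps around to 59903 in 16 bits, so the two versions end up pointing at different bytes.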
> screenseg=screen->seg;
> screenadd=bmp_write_line(screen,0);
>
> unsigned char mygetpixel(unsigned short seg,unsigned long add,unsigned
> short x,unsigned short y)
> {
> unsigned char c;
> asm("push %%ds\n\t"
> "movw %1,%%ax\n\t"
> "movw %%ax,%%ds\n\t"
> "movw %2,%%ax\n\t"
> "xor %%bx,%%bx\n\t"
> "movw $0x140,%%bx\n\t"
> "mul %%bx\n\t"
> "addl %3,%%ax\n\t"
> "addw %4,%%ax\n\t"
> "movl %%ax,%%si\n\t"
> "lodsb\n\t"
> "movb %%al,%0\n\t"
> "pop %%ds"
> :"g="(c)
> :"g"(seg),"g"(y),"g"(add),"g"(x)
> :"ax","bx","si","memory"
> );
> return c;
> }
>
> void myputpixel(unsigned short seg,unsigned long add,unsigned short
> x,unsigned short y,unsigned char c)
> {
> asm("push %%es\n\t"
> "movw %0,%%ax\n\t"
> "movw %%ax,%%es\n\t"
> "movw %1,%%ax\n\t"
> "xor %%bx,%%bx\n\t"
> "movw $0x140,%%bx\n\t"
> "mul %%bx\n\t"
> "addl %2,%%ax\n\t"
> "addw %3,%%ax\n\t"
> "movl %%ax,%%di\n\t"
> "movb %4,%%al\n\t"
> "stosb\n\t"
> "pop %%es"
> :
> :"g"(seg),"g"(y),"g"(add),"g"(x),"g"(c)
> :"ax","bx","di","memory"
> );
> }
>
> /Gathers
--
Nate Eldredge
nate AT cartsys DOT com