Sender: nate AT cartsys DOT com
Message-ID: <376AA3D1.AACEABBF@cartsys.com>
Date: Fri, 18 Jun 1999 12:53:53 -0700
From: Nate Eldredge
X-Mailer: Mozilla 4.08 [en] (X11; I; Linux 2.2.10 i586)
MIME-Version: 1.0
To: djgpp AT delorie DOT com
Subject: Re: Why dosn't my asm getpixel() work?
References: <3766f436 DOT 5083819 AT nntpserver DOT swip DOT net> <3768394B DOT 4C9E319E AT cartsys DOT com> <376981c9 DOT 8709864 AT nntpserver DOT swip DOT net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Reply-To: djgpp AT delorie DOT com

Gathers wrote:

> >4. This code will run faster if you use the 32-bit registers and
> >instructions. In protected mode, there is a 1-cycle penalty on each
> >16-bit instruction.
> >
> But then I'll have to use a long int instead of short? Will it still
> be faster?

Yes, for the same reason.  (16-bit instructions are slower even when the
compiler generates them.)  You should generally only use `short' when
you're storing a lot of them and need to save the memory.

Also, your 16-bit instructions won't be able to handle the 32-bit
addresses.

> >5. It's probably simpler to use an indirect move instead of stosb/lodsb.
> >
> And how do I do that? :)

/* calculate address in, say, %ebx and value in %al */
movb %al, (%ebx)

The string instructions are only useful in loops (and often not even
then).

> >6. The multiply can be optimized better; this is left as an exercise for
> >the reader.
> >
> mov ax,y
> mov bx,ax
> shl ax,8
> shl bx,6
> add ax,bx
> ax==y*320?

Something that will never cease to amaze me is that GCC discovered it
could do the equivalent of:

mov eax, y
lea eax, [eax + eax * 4]  ; multiply eax by 5
shl eax, 6                ; and then by 64 -- 5 * 64 = 320

saving two instructions and a register.
> >In fact, you could let the compiler do all this:
> >
> >#include
> >#define mygetpixel(seg, add, x, y) (_farpeekb((seg), (add) + (x) + ((y)
> >* 320)))
> >#define myputpixel(seg, add, x, y, c) (_farpokeb((seg), (add) + (x) +
> >((y) * 320), (c)))
> >
> >The quality of the code, if optimization is on (you can find it by using
> >-S), may surprise you.
> >
> Yes, but I don't want to replace the pixel functions, I just want the
> asm code for it so I can make asm of my blur funtion, then speeding it
> up is the last step. I think even I could manage to speed it up if I
> just could get it working.. :)
>
> Anyway, now the code works, even with -O3, unless I use both functions
> at the same time..then it only works without optimizion. I test with
> this loop
> for(x=0;x<320;x++)
>   for(y=0;y<200;y++){
>     myputpixel(screenseg,screenadd,x,y,5);
>     // _putpixel(screen,x,y,5);
>     // if(_getpixel(screen,x,y)!=5){
>     if(mygetpixel(screenseg,screenadd,x,y)!=5){
>       textprintf(screen,font,20,20,55,"x=%d y=%d",x,y);
>       while (!keypressed()) {}
>       exit(0);
>     }
>   }
> Here is the code pieces again, and I'd be really grateful if someone
> could point out the problem :)

The `mul' instruction is the expanding multiply and will store the high
half of the result into dx.  Try adding that to your clobber list.

Also try switching to use of the 32-bit instructions; there are a few
places where carries would kill you.

> screenseg=screen->seg;
> screenadd=bmp_write_line(screen,0);
>
> unsigned char mygetpixel(unsigned short seg,unsigned long add,unsigned
> short x,unsigned short y)
> {
>   unsigned char c;
>   asm("push %%ds\n\t"
>       "movw %1,%%ax\n\t"
>       "movw %%ax,%%ds\n\t"
>       "movw %2,%%ax\n\t"
>       "xor %%bx,%%bx\n\t"
>       "movw $0x140,%%bx\n\t"
>       "mul %%bx\n\t"
>       "addl %3,%%ax\n\t"
>       "addw %4,%%ax\n\t"
>       "movl %%ax,%%si\n\t"
>       "lodsb\n\t"
>       "movb %%al,%0\n\t"
>       "pop %%ds"
>       :"g="(c)
>       :"g"(seg),"g"(y),"g"(add),"g"(x)
>       :"ax","bx","si","memory"
>   );
>   return c;
> }
>
> void myputpixel(unsigned short seg,unsigned long add,unsigned short
> x,unsigned short y,unsigned char c)
> {
>   asm("push %%es\n\t"
>       "movw %0,%%ax\n\t"
>       "movw %%ax,%%es\n\t"
>       "movw %1,%%ax\n\t"
>       "xor %%bx,%%bx\n\t"
>       "movw $0x140,%%bx\n\t"
>       "mul %%bx\n\t"
>       "addl %2,%%ax\n\t"
>       "addw %3,%%ax\n\t"
>       "movl %%ax,%%di\n\t"
>       "movb %4,%%al\n\t"
>       "stosb\n\t"
>       "pop %%es"
>       :
>       :"g"(seg),"g"(y),"g"(add),"g"(x),"g"(c)
>       :"ax","bx","di","memory"
>   );
> }
>
> /Gathers

--

Nate Eldredge
nate AT cartsys DOT com