From: gathers AT cyberdude DOT com (Gathers)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Why dosn't my asm getpixel() work?
Organization: Green Dragon
Message-ID: <376981c9.8709864@nntpserver.swip.net>
References: <3766f436 DOT 5083819 AT nntpserver DOT swip DOT net> <3768394B DOT 4C9E319E AT cartsys DOT com>
X-Newsreader: Forte Agent 1.0/32.354
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 132
Date: Thu, 17 Jun 1999 23:49:57 GMT
NNTP-Posting-Host: 130.244.97.29
X-Complaints-To: news-abuse AT swip DOT net
X-Trace: nntpserver.swip.net 929663696 130.244.97.29 (Fri, 18 Jun 1999 01:54:56 MET DST)
NNTP-Posting-Date: Fri, 18 Jun 1999 01:54:56 MET DST
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

On Wed, 16 Jun 1999 16:54:51 -0700, Nate Eldredge wrote:

>lodsb acts relative to ds, not es. You should load your segment into
>%ds.
>
>A few other problems:
>
>1. You need to restore es and ds after changing them. GCC expects them
>not to change. Adding them to the clobber list will NOT work, so either
>push/pop or use another register.
>
>2. You don't tell the compiler that %si or %di are being clobbered.
>

Hey, now it works until I turn on optimization :)

>And some efficiency issues, if you're interested:
>
>3. It's pointless to zero a register before overwriting it.
>

I know, but I only added that while trying to locate the error, to make
sure nothing was left in the high part.

>4. This code will run faster if you use the 32-bit registers and
>instructions. In protected mode, there is a 1-cycle penalty on each
>16-bit instruction.
>

But then I'll have to use a long int instead of a short? Will it still
be faster?

>5. It's probably simpler to use an indirect move instead of stosb/lodsb.
>

And how do I do that? :)

>6. The multiply can be optimized better; this is left as an exercise for
>the reader.
>

mov ax,y
mov bx,ax
shl ax,8
shl bx,6
add ax,bx

ax==y*320?

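The shift-and-add guess above can be checked in plain C before it goes into asm. This is only a sketch: the names `mul320_shift` and `check_all_rows` are made up for illustration, and the 16-bit casts are an assumption meant to mimic what would happen in ax and bx. Since 256 + 64 = 320, the two shifts should add up to y*320, and for rows 0..199 the largest result is 63680, which still fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the post): y*320 computed as
   (y<<8) + (y<<6), truncated to 16 bits like ax and bx would be. */
static uint16_t mul320_shift(uint16_t y)
{
    uint16_t ax = (uint16_t)(y << 8);   /* y * 256 */
    uint16_t bx = (uint16_t)(y << 6);   /* y * 64  */
    return (uint16_t)(ax + bx);         /* y * 256 + y * 64 = y * 320 */
}

/* Asserts that the shift version agrees with a plain multiply
   for every row of a 320x200 screen (y = 0..199). */
static void check_all_rows(void)
{
    for (uint32_t y = 0; y < 200; y++)
        assert(mul320_shift((uint16_t)y) == y * 320u);
}
```

Calling `check_all_rows()` from a small `main` should pass silently. Note that the check only covers the row offset; it says nothing about whether the full address (the base plus x plus y*320) fits in a 16-bit register.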
>In fact, you could let the compiler do all this:
>
>#include
>#define mygetpixel(seg, add, x, y) (_farpeekb((seg), (add) + (x) + ((y)
>* 320)))
>#define myputpixel(seg, add, x, y, c) (_farpokeb((seg), (add) + (x) +
>((y) * 320), (c)))
>
>The quality of the code, if optimization is on (you can find it by using
>-S), may surprise you.
>

Yes, but I don't want to replace the pixel functions. I just want the
asm code for them so I can write my blur function in asm; speeding it
up is the last step. I think even I could manage to speed it up if I
could just get it working.. :)

Anyway, now the code works, even with -O3, unless I use both functions
at the same time. Then it only works without optimization.

I test with this loop:

for(x=0;x<320;x++)
  for(y=0;y<200;y++){
    myputpixel(screenseg,screenadd,x,y,5);
//    _putpixel(screen,x,y,5);
//    if(_getpixel(screen,x,y)!=5){
    if(mygetpixel(screenseg,screenadd,x,y)!=5){
      textprintf(screen,font,20,20,55,"x=%d y=%d",x,y);
      while (!keypressed()) {}
      exit(0);
    }
  }

Here are the code pieces again, and I'd be really grateful if someone
could point out the problem :)

screenseg=screen->seg;
screenadd=bmp_write_line(screen,0);

unsigned char mygetpixel(unsigned short seg,unsigned long add,unsigned short x,unsigned short y)
{
  unsigned char c;
  asm("push %%ds\n\t"
      "movw %1,%%ax\n\t"
      "movw %%ax,%%ds\n\t"
      "movw %2,%%ax\n\t"
      "xor %%bx,%%bx\n\t"
      "movw $0x140,%%bx\n\t"
      "mul %%bx\n\t"
      "addl %3,%%ax\n\t"
      "addw %4,%%ax\n\t"
      "movl %%ax,%%si\n\t"
      "lodsb\n\t"
      "movb %%al,%0\n\t"
      "pop %%ds"
      :"g="(c)
      :"g"(seg),"g"(y),"g"(add),"g"(x)
      :"ax","bx","si","memory"
     );
  return c;
}

void myputpixel(unsigned short seg,unsigned long add,unsigned short x,unsigned short y,unsigned char c)
{
  asm("push %%es\n\t"
      "movw %0,%%ax\n\t"
      "movw %%ax,%%es\n\t"
      "movw %1,%%ax\n\t"
      "xor %%bx,%%bx\n\t"
      "movw $0x140,%%bx\n\t"
      "mul %%bx\n\t"
      "addl %2,%%ax\n\t"
      "addw %3,%%ax\n\t"
      "movl %%ax,%%di\n\t"
      "movb %4,%%al\n\t"
      "stosb\n\t"
      "pop %%es"
      :
      :"g"(seg),"g"(y),"g"(add),"g"(x),"g"(c)
      :"ax","bx","di","memory"
     );
}

/Gathers