Mail Archives: djgpp/2000/05/20/12:00:15
I'm not sure you really need to use assembler. There was an interesting
thread ("inefficiency of GCC output code & -O problem") some time ago. I
brought up some inline ASM problems there and mentioned that my inline ASM
was much faster than plain C. However, optimized C can be as fast as ASM, or
even a little faster on some CPUs, because different CPUs actually need
different optimizations.
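To make that concrete: the job of `rep movsl` (copy ECX 32-bit words from ESI to EDI) is a three-line loop in C. A minimal sketch follows; the name `copy_dwords` is hypothetical. With `gcc -O2`, GCC may compile such a loop into a memcpy call, a string move or vector moves depending on the compiler version and target, so look at the generated code (`gcc -S`) rather than assume which one you get.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain-C equivalent of "rep movsl": copy ndwords 32-bit words
 * from src to dst. restrict tells the compiler the buffers do not
 * overlap, which leaves it free to pick the fastest copy sequence. */
static void copy_dwords(uint32_t *restrict dst,
                        const uint32_t *restrict src, size_t ndwords)
{
    while (ndwords--)
        *dst++ = *src++;
}
```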
I don't know whether that thread can be found in the mail archive (which I
don't use) or at http://www.deja.com. Can anyone explain how to search them?
I don't think you can speed up your program as much as you expect by doing
the bitblitting in ASM. If you optimize your algorithm and compile with GCC's
optimization switches (e.g. -O2), you can get a very good result.
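For the screen dump itself, a sketch of the C route, assuming a 320x200 256-color mode (VGA memory at linear address 0xA0000, matching 0xB0000 - y with y = 0x10000 in the quoted code). Under DJGPP it uses `dosmemput()` from `<sys/movedata.h>`, which copies a near buffer to a conventional-memory address in one call. The names `present` and `fake_vram` are hypothetical; the `#else` branch only exists so the sketch also compiles and runs outside DJGPP.

```c
#include <stddef.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200
#define SCREEN_BYTES ((size_t)SCREEN_W * SCREEN_H)

#ifdef __DJGPP__
#include <sys/movedata.h>   /* dosmemput() */

/* DJGPP: copy the whole back buffer to VGA memory at linear 0xA0000
 * with one library call instead of hand-written inline asm. */
static void present(const unsigned char *backbuf)
{
    dosmemput(backbuf, SCREEN_BYTES, 0xA0000);
}
#else
/* Elsewhere: hypothetical stand-in for video memory so the sketch runs. */
static unsigned char fake_vram[SCREEN_BYTES];

static void present(const unsigned char *backbuf)
{
    memcpy(fake_vram, backbuf, SCREEN_BYTES);
}
#endif
```

The frame is drawn into the back buffer by your own (optimized) C code and then handed to `present()` once per frame. DJGPP's `movedata()`, in the same header, does the same job with explicit selectors such as `_my_ds()` and `_dos_ds` if you prefer the far-pointer style.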
My three-dimensional texture mapper written in plain C is almost as fast as
the same implementation in inline ASM on my computer. Dieter Buerssner even
achieved a higher frame rate with the C version than with the initial
version, which had lots of inline ASM. GCC and I make somewhat different
optimizations, although both seem to be very efficient.
Good Luck
Alexei A. Frounze
-----------------------------------------
Homepage: http://alexfru.chat.ru
Mirror: http://members.xoom.com/alexfru
"Thomas J. Hruska" wrote:
>
> Hello, I am doing that inline ASM thing...again. The situation is that I
> am trying to speed up screen dumps from a buffer to the video buffer using
> far pointers. The idea here is to perform the buffer copy using only one
> far pointer reference and a rep. So, I loaded esi, edi, and ecx with the
> appropriate values (I hope). After clearing the direction flag, I followed
> <sys/farptr.h>'s example for moving data (hence, the .byte 0x64). However,
> the problem comes in that rep movsl (or movsd, movs, movll, movb, movsb,
> mobsbb, etc.) does not assemble. The objective is to get the framerate up
> from 48 fps to 60 fps (maybe 70 fps) with this code. NOTE: The current
> selector is _dos_ds when the inline ASM executes (also, assume that y =
> 0x10000, x = screen_width * screen_height, x2 = 0).
>
> __asm__ __volatile__ ("
> pushl %%esi
> pushl %%edi
> movl %0, %%esi
> movl %1, %%edi
> movl %2, %%ecx
> cld
> .byte 0x64
> rep movsl
> popl %%edi
> popl %%esi"
> :
> : "g" (&CurrMode.Buffer[x2]), "g" (0xB0000 - y), "g" ((x - x2) %
> 0x10000));
>
> Thanks for any help in advance!
>
> Thomas J. Hruska -- shinelight AT crosswinds DOT net
> Shining Light Productions -- "Meeting the needs of fellow programmers"
> http://www.shininglightpro.com/
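If you do want to keep the assembler, here is a sketch of how the quoted routine might be corrected. Two points are architectural: a segment override prefix such as .byte 0x64 (%fs) on a string move only changes the source segment (DS:ESI), while the destination of movs is always ES:EDI and cannot be overridden, so ES itself must hold _dos_ds. And rep movsl counts 32-bit words, so ECX should be the byte count divided by 4, not the byte count. The "S", "D" and "c" constraints let GCC load ESI, EDI and ECX and know they change; the original clobbers ECX without telling GCC. The prefix is spelled `rep; movsl`, a form GNU as has long accepted; whether the one-line spelling is what broke the original is only a guess. The names `blit_to_dos_mem` and `fake_dos_mem` are hypothetical, and the `__DJGPP__` branch is untested here; the `#else` branch is a plain-C stand-in so the sketch compiles elsewhere.

```c
#include <stddef.h>
#include <string.h>

#ifdef __DJGPP__
#include <go32.h>            /* _dos_ds */

/* Copy nbytes (a multiple of 4) from a near buffer to a linear
 * address in conventional memory, e.g. 0xA0000 for VGA. */
static void blit_to_dos_mem(const void *src, unsigned long linear,
                            size_t nbytes)
{
    size_t ndwords = nbytes / 4;        /* rep movsl moves 4 bytes per step */
    unsigned short dos_sel = _dos_ds;

    __asm__ __volatile__ (
        "pushw %%es\n\t"
        "movw  %w3, %%es\n\t"           /* destination ES:EDI must be _dos_ds */
        "cld\n\t"
        "rep; movsl\n\t"                /* source stays DS:ESI, no prefix */
        "popw  %%es"
        : "+S" (src), "+D" (linear), "+c" (ndwords)
        : "r" (dos_sel)
        : "memory");
}
#else
/* Hypothetical stand-in for conventional memory so the sketch runs
 * outside DJGPP: linear addresses index into this array. */
static unsigned char fake_dos_mem[0x100000];

static void blit_to_dos_mem(const void *src, unsigned long linear,
                            size_t nbytes)
{
    /* Round down to whole dwords, as rep movsl would. */
    memcpy(fake_dos_mem + linear, src, nbytes & ~(size_t)3);
}
#endif
```

With the quoted values (x2 = 0, x = 320 * 200), the call would be `blit_to_dos_mem(&CurrMode.Buffer[0], 0xA0000, 64000)`, which moves 16000 dwords.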