Mail Archives: djgpp/1997/08/04/17:37:44
I have a strange problem with a loop that calls a NASM function.
Here's the NASM code first:
BITS 32
GLOBAL _ppix
EXTERN _mouse_x
EXTERN _mouse_y
SECTION .text
_ppix:  mov eax, [_mouse_y]
        mov ebx, [_mouse_x]         ; note: ebx is overwritten and never restored
        shl eax, 5                  ; mouse_y * 32
        lea eax, [eax + 4*eax]      ; * 5 -> mouse_y * 160
        lea eax, [eax + 4*eax]      ; * 5 again -> mouse_y * 800
        add eax, ebx                ; offset = mouse_y * 800 + mouse_x
        fs mov BYTE [eax], 11       ; put color 11 on screen (fs = LFB selector)
        ret
fs is a selector that points to the VESA 2.0 LFB (linear frame buffer).
_mouse_x and _mouse_y are from Allegro.
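The shift-and-lea sequence above is just a multiplication by 800 (32 * 5 * 5) followed by adding the x coordinate. As a minimal sketch, assuming one byte per pixel and the 800-byte virtual width mentioned in the post, the same arithmetic can be written in plain C and checked against the obvious formula. The function names here are made up for illustration and are not part of the original program.

```c
/* Virtual scanline width in bytes, taken from the post
   (assumes an 8-bit mode, i.e. one byte per pixel). */
#define VIRTUAL_WIDTH 800UL

/* Mirrors the assembly: shl by 5 (times 32), then two
   lea eax, [eax + 4*eax] steps (times 5 each), then add x. */
static unsigned long offset_shift_lea(unsigned long x, unsigned long y)
{
    unsigned long off = y << 5;   /* y * 32  */
    off = off + 4 * off;          /* y * 160 */
    off = off + 4 * off;          /* y * 800 */
    return off + x;
}

/* The straightforward formula, for comparison. */
static unsigned long offset_plain(unsigned long x, unsigned long y)
{
    return y * VIRTUAL_WIDTH + x;
}

/* Returns 1 if both formulas agree for every pixel of a 640x480 screen. */
static int offsets_agree(void)
{
    unsigned long x, y;
    for (y = 0; y < 480; y++)
        for (x = 0; x < 640; x++)
            if (offset_shift_lea(x, y) != offset_plain(x, y))
                return 0;
    return 1;
}
```

This only checks the address arithmetic. It says nothing about register usage or the far-pointer write itself.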
Now if I put the following code in my C program, everything works fine at first:
/* VESA mode 640*480 already set, virtual width = 800 */
...
_farsetsel (LFBsel); /* sets fs */
time = rawclock ();
for (co = 0; co < 100000000; co++)
    ppix ();
time = rawclock () - time;
...
I can compile and run this program, I can move the pixel by moving the mouse,
and I get good performance (393 clock ticks, about 4.6 million pixels per
second) on a Pentium 100... BUT...
if I compile with -O1, -O2 or -O3, the loop doesn't work! I can still move the
pixel, but the loop never ends. (I inserted a little 'break' routine that stops
when I press a mouse button. The result: co, which is an unsigned long, always
had a value of about 400 (random), no matter how many pixels had actually been
drawn or how long the program had already been running.)
I've already found a workaround using inline assembly (almost the same code as
above), which is slower without optimization (510 clock ticks) but faster with
-O3 (330 ticks per 100 million pixels). But I'd still like to know what's
happening here, since I prefer Intel syntax.
BTW, why can't GAS handle segment register prefixes? '.byte 0x64' works, but it
could be better.