Message-Id: <m0wvUhg-0003GiC@fwd02.btx.dtag.de>
Date: Mon, 4 Aug 97 23:30 MET DST
To: djgpp AT delorie DOT com
Subject: Strange loop with optimize
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8BIT
From: Georg DOT Kolling AT t-online DOT de (Georg Kolling)
Precedence: bulk

I have a strange problem with a loop that calls a NASM function.
Here's the NASM code first:

  BITS 32
  GLOBAL _ppix
  EXTERN _mouse_x
  EXTERN _mouse_y
  SECTION .text
  _ppix:  mov eax, [_mouse_y]      
          mov ebx, [_mouse_x]
          shl eax, 5                 ; eax = mouse_y * 32
          lea eax, [eax + 4*eax]     ; * 5 -> mouse_y * 160
          lea eax, [eax + 4*eax]     ; * 5 -> mouse_y * 800 (virtual width)
          add eax, ebx                
          fs mov BYTE [eax], 11      ; put color 11 on screen 
          ret
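
For reference, the address arithmetic in the routine can be written out in C.
This is only a sketch for checking the shift/lea steps against a plain
multiplication; the names lfb_offset and check_lfb_offset are made up for
illustration and are not part of the program, and the 800-byte virtual width
and the 640*480 range are taken from the comment in the C code:

```c
#include <assert.h>
#include <stdint.h>

/* Virtual scanline width in bytes (assumed from the "virtual width = 800" comment). */
#define VIRTUAL_WIDTH 800u

/* Hypothetical helper: mirrors the NASM steps, shl by 5 then two "times 5" lea's. */
uint32_t lfb_offset(uint32_t x, uint32_t y)
{
    uint32_t off = y << 5;   /* y * 32  */
    off = off + 4u * off;    /* y * 160 */
    off = off + 4u * off;    /* y * 800 */
    return off + x;          /* byte offset into the frame buffer */
}

/* Hypothetical self-check over the visible 640*480 area. */
void check_lfb_offset(void)
{
    uint32_t x, y;
    for (y = 0; y < 480u; y++)
        for (x = 0; x < 640u; x++)
            assert(lfb_offset(x, y) == y * VIRTUAL_WIDTH + x);
}
```

Calling check_lfb_offset() should pass silently if the three steps really add
up to a multiplication by 800.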

fs is a selector that points to the VESA 2.0 LFB (linear frame buffer).
_mouse_x and _mouse_y are from Allegro.
Now if I put this code in my C program, everything works fine at first:

  /* VESA mode 640*480 already set, virtual width = 800 */
  ...
  _farsetsel (LFBsel);       /* sets fs */
  time = rawclock ();
  for (co = 0; co < 100000000; co++)
      ppix ();
  time = rawclock () - time;
  ...


I can compile and run this program, I can move the pixel by moving the mouse,
and I'm getting good performance on a Pentium 100 (393 clock ticks, about
4.6 million pixels per second)... BUT...
if I compile with -O1, -O2 or -O3, the loop doesn't work! I can still move the
pixel, but the loop never terminates. I inserted a little 'break' routine that
stops when I press a mouse button; the result: co (which is unsigned long)
always had a value of about 400 (random), no matter how many pixels it had
actually put or how long the program had already been running.

I've already found a workaround using inline assembly (almost the same code as
above), which is slower without optimization (510 clock ticks) but faster with
-O3 (330 ticks per 100 million pixels). But I want to know what's happening
there anyway, since I prefer Intel syntax.

BTW, why can't GAS handle segment register prefixes? '.byte 0x64' works, but it
could be better.
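
For anyone who wants to check the throughput figures: the tick counts convert
to pixel rates roughly as in the sketch below. It assumes that rawclock ()
counts the standard PC timer ticks at about 18.2 per second; the helper name
pixels_per_second is made up for illustration:

```c
#include <assert.h>

/* Assumed tick rate of rawclock (): the PC timer, about 18.2 ticks per second. */
#define TICKS_PER_SECOND 18.2

/* Hypothetical helper: pixel rate for `pixels` writes that took `ticks` ticks. */
double pixels_per_second(double pixels, double ticks)
{
    double seconds = ticks / TICKS_PER_SECOND;
    return pixels / seconds;
}
```

With 100 million pixels this gives about 4.6 million pixels per second for
393 ticks, about 3.6 million for 510 ticks and about 5.5 million for 330 ticks.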