Mail Archives: djgpp/2000/05/03/11:56:08

From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Assembly code doesn't work properly.
Date: 3 May 2000 15:05:54 GMT
Lines: 109
Message-ID: <8epma2.3vvq7at.0@buerssner-17104.user.cis.dfn.de>
References: <39103287 DOT 92B19ACF AT htsol DOT com>
NNTP-Posting-Host: pec-145-106.tnt10.s2.uunet.de (149.225.145.106)
Mime-Version: 1.0
X-Trace: fu-berlin.de 957366354 10004382 149.225.145.106 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Yoram Hofman wrote:

>Assembly code doesn't work properly.
>I need to transfer an image from frame memory controller (far physical
>address - "src" in my program) to DRAM.
>To optimize the transfer I wrote simple assembly code (my first)
>The problem is that it works well in little application. But when I put
>this module (.c file as is) to our project it stops to work.

This is not too surprising. Whether your assembly works or not depends
on many factors. It looks like you just edited the gcc -S output and
converted it into inline assembly. As a result, your inline code
contains assumptions that depend on, e.g., the optimization options.

>int Read_image_directly(unsigned long src, unsigned char * dest)
>{
> unsigned long dest_index = 0;
> unsigned long src_index  = src;
> src_index += IMAGE_START;
> src_index += 1L;
>
> _farsetsel(mem_sel);
>
>/*      this is original code I want to optimize
> while( dest_index < IMAGE_SIZE )  //IMAGE_SIZE = 154560 bytes
> {
>  *(dest + dest_index) = _farnspeekb( src_index );
>  dest_index++;
>  src_index = src_index + 4;
> }
>*/

You may be able to speed this up in C.

/* Untested code */
int Read_image_directly(unsigned long src, unsigned char * dest)
{
 unsigned long src_index, n;

 _farsetsel(mem_sel);
 src_index = src + IMAGE_START + 1L;
 n = IMAGE_SIZE; /* assume IMAGE_SIZE > 0 */
 do 
 {
   *dest++ = _farnspeekb( src_index );
   src_index += 4;
 }
 while (--n != 0);
 return 1;
}

If this is still too slow, you may try the gcc switch -funroll-loops
or unroll the loop manually, like this (assuming IMAGE_SIZE%2==0):

  n = IMAGE_SIZE/2;
  do
  {
    dest[0] = _farnspeekb( src_index );
    src_index += 4;
    dest[1] = _farnspeekb( src_index );
    src_index += 4;
    dest += 2;
  }
  while (--n != 0);
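
To check that the unrolled loop copies the same bytes as the simple one,
both can be run against an ordinary array. This is only a sketch: peekb()
and fake_far_mem are hypothetical stand-ins for _farnspeekb() and the
memory selected by _farsetsel(), so that the code builds without DJGPP.

```c
/* Hypothetical stand-in for the far memory behind _farsetsel(mem_sel). */
static const unsigned char *fake_far_mem;

/* Hypothetical stand-in for _farnspeekb(): reads from a normal array. */
static unsigned char peekb(unsigned long offset)
{
    return fake_far_mem[offset];
}

/* Simple loop: one byte per iteration, source advances by 4.
   Requires n > 0. */
static void copy_simple(unsigned long src_index, unsigned char *dest,
                        unsigned long n)
{
    do {
        *dest++ = peekb(src_index);
        src_index += 4;
    } while (--n != 0);
}

/* Same copy, unrolled by two. Requires n > 0 and n even. */
static void copy_unrolled2(unsigned long src_index, unsigned char *dest,
                           unsigned long n)
{
    n /= 2;
    do {
        dest[0] = peekb(src_index);
        src_index += 4;
        dest[1] = peekb(src_index);
        src_index += 4;
        dest += 2;
    } while (--n != 0);
}
```

Point fake_far_mem at a test buffer, call both functions with the same
arguments, and compare the two destination buffers.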


>/* my assembly */
> asm("m_loop:
>            cmpl $154559, -4(%ebp)
>            jle m_code
>            jmp m_end
>          m_code:
>            movl 12(%ebp),%eax
>            movl -4(%ebp),%edx

This tries to put dest_index into edx. When gcc decides that
dest_index can be kept in a register, it won't allocate space
for it on the stack, and this will fail. It also won't work when you
use -fomit-frame-pointer. And the whole loop looks entirely
inefficient, as if it were the output of gcc -S without -O. I think
the C code should be faster when compiled with -O. If you really do
need inline assembly, the FAQ has quite a few pointers to
documentation. But I doubt that you can get much faster with inline
assembly.
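
One way to avoid hard-coded offsets like -4(%ebp) is gcc's extended asm,
where the compiler substitutes the operands itself. A minimal sketch,
assuming gcc (or a compatible compiler) on x86: load_byte() and
copy_strided() are hypothetical names, and the load here is an ordinary
one, without the segment override that the far-memory access needs.

```c
/* Hypothetical helper: load one byte with inline assembly. The compiler
   picks the register and the addressing mode, so nothing depends on the
   stack layout or on -fomit-frame-pointer. */
static unsigned char load_byte(const unsigned char *p)
{
    unsigned char v;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %0: output in a byte-addressable register, %1: memory operand */
    __asm__ ("movb %1, %0" : "=q" (v) : "m" (*p));
#else
    v = *p;  /* plain C fallback for other compilers or targets */
#endif
    return v;
}

/* Copy n bytes, taking every fourth byte from src. */
static void copy_strided(const unsigned char *src, unsigned char *dest,
                         unsigned long n)
{
    while (n-- != 0) {
        *dest++ = load_byte(src);
        src += 4;
    }
}
```

The loop control stays in C, so gcc is free to keep the counters in
registers at any optimization level.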

>One more question for what ".byte 0x64" do I need? 

This is the FS segment override prefix.

            .byte 0x64
            movb (%edx),%cl

is the same as
  
            movb %fs:(%edx), %cl

At least some versions of gas would (sometimes) produce wrong
opcodes for the latter line. So people got used to hardcoding
the segment override with .byte.
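
To make the equivalence concrete, here is how the three bytes of
"movb %fs:(%edx),%cl" are put together in 32-bit code. This is only an
illustration; modrm() and encode_fs_load() are hypothetical helpers,
not part of any library.

```c
#include <stddef.h>

#define FS_OVERRIDE_PREFIX 0x64  /* the byte that ".byte 0x64" emits */

/* Build a ModRM byte from its mod, reg and r/m fields. */
static unsigned char modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (unsigned char)((mod << 6) | (reg << 3) | rm);
}

/* Store the encoding of "movb %fs:(%edx),%cl" in out (at least 3 bytes)
   and return the number of bytes written. */
static size_t encode_fs_load(unsigned char *out)
{
    out[0] = FS_OVERRIDE_PREFIX;  /* use %fs instead of the default %ds */
    out[1] = 0x8A;                /* opcode: MOV r8, r/m8 */
    out[2] = modrm(0, 1, 2);      /* mod=00, reg=1 (%cl), r/m=2 ((%edx)) */
    return 3;
}
```

Without the first byte, the remaining two are just "movb (%edx),%cl",
which is why prepending .byte 0x64 has the same effect as writing %fs:
in the operand.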

One last suggestion (with some speculation): I guess that compiling
your posted code with gcc -Wall -O will produce some warnings about
unused variables. This would suggest that gcc didn't allocate
space for those variables at all, and may hint that accessing
them via the frame pointer will be wrong.

-- 
Regards, Dieter
