Mail Archives: djgpp/2000/05/03/11:56:08
From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: Assembly code doesn't work properly.
Date: 3 May 2000 15:05:54 GMT
Lines: 109
Message-ID: <8epma2.3vvq7at.0@buerssner-17104.user.cis.dfn.de>
References: <39103287 DOT 92B19ACF AT htsol DOT com>
NNTP-Posting-Host: pec-145-106.tnt10.s2.uunet.de (149.225.145.106)
Mime-Version: 1.0
X-Trace: fu-berlin.de 957366354 10004382 149.225.145.106 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Yoram Hofman wrote:
>Assembly code doesn't work properly.
>I need to transfer an image from frame memory controller (far physical
>address - "src" in my program) to DRAM.
>To optimize the transfer I wrote simple assembly code (my first)
>The problem is that it works well in little application. But when I put
>this module (.c file as is) to our project it stops to work.
This is not too surprising. Whether your assembly works or not
depends on many factors. It looks like you just edited the
gcc -S output and converted it into inline assembly. That way,
your inline code contains assumptions that depend, e.g., on the
optimization options.
>int Read_image_directly(unsigned long src, unsigned char * dest)
>{
> unsigned long dest_index = 0;
> unsigned long src_index = src;
> src_index += IMAGE_START;
> src_index += 1L;
>
> _farsetsel(mem_sel);
>
>/* this is original code I want to optimize
> while( dest_index < IMAGE_SIZE ) //IMAGE_SIZE = 154560 bytes
> {
> *(dest + dest_index) = _farnspeekb( src_index );
> dest_index++;
> src_index = src_index + 4;
> }
>*/
You may be able to speed this up in C.
/* Untested code */
int Read_image_directly(unsigned long src, unsigned char * dest)
{
  unsigned long src_index, n;
  _farsetsel(mem_sel);
  src_index = src + IMAGE_START + 1L;
  n = IMAGE_SIZE; /* assume IMAGE_SIZE > 0 */
  do
  {
    *dest++ = _farnspeekb( src_index );
    src_index += 4;
  }
  while (--n != 0);
  return 1;
}
If this is still too slow, you may try the gcc switch -funroll-loops
or unroll the loop manually like this (assuming IMAGE_SIZE%2==0):
  n = IMAGE_SIZE/2;
  do
  {
    dest[0] = _farnspeekb( src_index );
    src_index += 4;
    dest[1] = _farnspeekb( src_index );
    src_index += 4;
    dest += 2;
  }
  while (--n != 0);
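To check the loop logic away from the hardware, here is a minimal
sketch that runs on any host. It is only an illustration: ordinary
memory stands in for the far segment, so plain loads replace
_farnspeekb(), and the names copy_strided and copy_strided_unrolled2
are made up for this example. The unrolled variant copies one leading
byte when the count is odd, so it does not need the IMAGE_SIZE%2==0
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy n bytes, taking every stride-th byte of src.
   Ordinary memory stands in for the far segment in this sketch. */
static void copy_strided(const unsigned char *src, unsigned char *dest,
                         size_t n, size_t stride)
{
    assert(n > 0);              /* the do/while form needs at least one byte */
    do {
        *dest++ = *src;
        src += stride;
    } while (--n != 0);
}

/* Same copy, unrolled by two. A single leading copy handles an odd n. */
static void copy_strided_unrolled2(const unsigned char *src,
                                   unsigned char *dest,
                                   size_t n, size_t stride)
{
    assert(n > 0);
    if (n & 1) {                /* odd count: peel off one byte first */
        *dest++ = *src;
        src += stride;
        n--;
    }
    n /= 2;                     /* number of remaining pairs */
    while (n != 0) {
        dest[0] = src[0];
        dest[1] = src[stride];
        src += 2 * stride;
        dest += 2;
        n--;
    }
}
```

Both functions should fill dest identically. For the real DJGPP build
you would put _farnspeekb() back in place of the plain loads.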
>/* my assembly */
> asm("m_loop:
> cmpl $154559, -4(%ebp)
> jle m_code
> jmp m_end
> m_code:
> movl 12(%ebp),%eax
> movl -4(%ebp),%edx
This tries to load dest_index into edx. When gcc decides that
dest_index can be kept in a register, it won't allocate space
for it on the stack, and this will fail. It also won't work when
you use -fomit-frame-pointer. And the whole loop looks quite
inefficient, as if it were the output of gcc -S without -O.
I think the C code should be faster when compiled with -O. If you
really do need inline assembly, the FAQ has quite a few pointers to
documentation. But I doubt that you can get much faster with inline
assembly.
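For illustration only, here is a small sketch of gcc's extended asm,
where operand constraints let the compiler pick the registers and
memory operands, so nothing depends on a frame-pointer offset like
-4(%ebp). The helper names load_byte and copy_strided_asm are made up,
ordinary memory is used instead of the far segment, and on non-x86
targets the sketch simply falls back to plain C.

```c
#include <stddef.h>

/* Hypothetical helper: load one byte through extended inline asm.
   "=q" asks for a byte-addressable register for the result and
   "m" passes *p as a memory operand; gcc chooses both. */
static unsigned char load_byte(const unsigned char *p)
{
    unsigned char v;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movb %1, %0"
             : "=q" (v)         /* output: any byte register */
             : "m" (*p));       /* input: the byte in memory */
#else
    v = *p;                     /* fallback for other targets */
#endif
    return v;
}

/* Copy n bytes, reading every stride-th byte of src. */
static void copy_strided_asm(const unsigned char *src, unsigned char *dest,
                             size_t n, size_t stride)
{
    size_t i;
    for (i = 0; i < n; i++)
        dest[i] = load_byte(src + i * stride);
}
```

Because the operands are declared, this keeps working with -O and
with -fomit-frame-pointer. It is not expected to beat the plain C
loop; it only shows the mechanism.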
>One more question for what ".byte 0x64" do I need?
This is a segment override prefix.
  .byte 0x64
  movb (%edx),%cl
is the same as
  movb %fs:(%edx), %cl
At least some versions of gas would sometimes produce wrong
opcodes for the latter line. So people got used to hardcoding
the segment override with .byte.
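As a small aside, the sketch below lists the x86 segment override
prefix bytes and the three-byte encoding of the instruction above in
32-bit code. The function name segment_override_name and the array
name mov_fs_edx_cl are made up for this illustration.

```c
#include <stddef.h>

/* Map an x86 segment override prefix byte to its segment register name.
   Returns NULL if the byte is not a segment override prefix. */
static const char *segment_override_name(unsigned char byte)
{
    switch (byte) {
    case 0x26: return "es";
    case 0x2e: return "cs";
    case 0x36: return "ss";
    case 0x3e: return "ds";
    case 0x64: return "fs";
    case 0x65: return "gs";
    default:   return NULL;
    }
}

/* movb %fs:(%edx), %cl in 32-bit code:
   0x64 = fs override, 0x8a = MOV r8, r/m8,
   0x0a = ModRM (mod=00, reg=cl, rm=edx). */
static const unsigned char mov_fs_edx_cl[] = { 0x64, 0x8a, 0x0a };
```

So the hardcoded .byte 0x64 is just the first byte of that encoding,
written out by hand.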
One last suggestion (with some speculation). I guess that compiling
your posted code with gcc -Wall -O will produce some warnings about
unused variables. This would suggest that gcc didn't allocate
space for those variables at all, and may hint that accessing
them via the frame pointer will be wrong.
--
Regards, Dieter