Mail Archives: djgpp/1998/10/05/19:27:23

Date: Tue, 6 Oct 1998 00:26:44 +0100 (BST)
From: George Foot <george DOT foot AT merton DOT oxford DOT ac DOT uk>
To: BDozer <parsec AT nat DOT bg>
cc: djgpp AT delorie DOT com
Subject: Re: Need for speed
In-Reply-To: <6vbb5d$1fc$1@equila.wfpa.acad.bg>
Message-ID: <Pine.OSF.4.05.9810060008580.4537-100000@sable.ox.ac.uk>
MIME-Version: 1.0
Reply-To: djgpp AT delorie DOT com

On Mon, 5 Oct 1998, BDozer wrote:

>     I need a really fast way to copy 640x480 bytes from one place to another.
> Something like movedata(), but I need a bit more... It has to copy only the
> bytes not equal to 1 (the transparent pixel). Can you help me write an
> assembler routine or whatever for that?
> I tried:
> for(i = 0; i < 640*480; i++)
> --then check every byte...
> but it's too slow
> 
> Note: I don't need to copy to the video RAM. Just from buffer to buffer

I'll assume you know some assembler.  What I'm posting here is
untested but it should give you an idea of what to do.

  .global _your_masked_blit
    .p2align 3
  _your_masked_blit:
    pushl %ebp
    movl %esp, %ebp
    pushl %esi
    pushl %edi
    
    cld                   /* copy from low addresses to high */
    movl $640*480, %ecx   /* number of bytes to copy */
    movl 8(%ebp),  %esi   /* source address in ESI */
    movl 12(%ebp), %edi   /* dest address in EDI */
  1:
    lodsb          /* read a byte from DS:ESI and increase ESI */
    cmpb $1, %al   /* is it transparent? */
    je 2f          /* if it is, jump to label `2' forwards */
    stosb          /* store the byte */
    loop 1b        /* decrease ECX and loop if not zero */
    jmp 1f         /* skip next few instructions if loop has ended */
  2:
    incl %edi      /* skip destination pixel */
    loop 1b        /* decrease ECX and loop if not zero */
  1:
    
    popl %edi
    popl %esi
    popl %ebp
    ret

Prototype:

    void your_masked_blit (void *source, void *dest);

We don't need to set DS and ES because they're already set,
according to gcc's calling conventions.  If you call this
routine from your own assembler code make sure you haven't
clobbered them.
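
Since the assembler above is untested, it may help to have a plain C
version with the same intended behaviour to compare its output against.
This is only a sketch: the name `masked_blit_ref` and the `count`
parameter are made up for illustration (the assembler routine hard-codes
640*480 instead), and 1 is the transparent value from the question.

```c
#include <stddef.h>

/* Hypothetical reference routine, not part of the original post.
   Copies `count` bytes from source to dest, leaving dest untouched
   wherever the source byte is 1 (the transparent pixel). */
void masked_blit_ref(const void *source, void *dest, size_t count)
{
    const unsigned char *s = source;
    unsigned char *d = dest;
    size_t i;

    for (i = 0; i < count; i++) {
        if (s[i] != 1)      /* not transparent: copy it */
            d[i] = s[i];
        /* transparent: skip, keeping whatever dest already holds */
    }
}
```

Running both routines on the same pair of buffers and comparing the two
destinations with memcmp() should show quickly whether the assembler
does what it is meant to.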

This could probably be better optimised by reading four bytes at
a time -- if they're all transparent we can skip them all in one
go.  I think the effect of this would depend upon the sort of
image you draw.
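
To make the four-bytes-at-a-time idea concrete, here is a rough sketch
in C rather than assembler. The name `masked_blit4`, the `n` parameter
and the TRANSPARENT macro are my own inventions for illustration. It
also copies a whole word in one go when none of its four bytes is
transparent, which goes a step beyond what is described above; whether
either shortcut pays off still depends on the image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRANSPARENT 1u   /* transparent pixel value, as in the question */

/* Hypothetical sketch: copy n bytes, skipping transparent ones,
   looking at four source bytes at a time. */
void masked_blit4(const unsigned char *src, unsigned char *dst, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, 4);       /* 32-bit load, safe if unaligned */

        if (w == 0x01010101u)         /* all four transparent: skip them */
            continue;

        /* XOR makes every transparent byte zero; the expression below
           is the usual "does this word contain a zero byte" test. */
        uint32_t x = w ^ 0x01010101u;
        if (((x - 0x01010101u) & ~x & 0x80808080u) == 0) {
            memcpy(dst + i, &w, 4);   /* no transparent byte: copy all */
            continue;
        }

        /* Mixed word: fall back to checking each byte. */
        for (size_t k = 0; k < 4; k++)
            if (src[i + k] != TRANSPARENT)
                dst[i + k] = src[i + k];
    }

    /* Leftover bytes when n is not a multiple of four. */
    for (; i < n; i++)
        if (src[i] != TRANSPARENT)
            dst[i] = src[i];
}
```

For the 640x480 buffers in the question this would be called as
masked_blit4(source, dest, 640*480). An image with large transparent or
large solid areas should benefit most; one with transparent pixels
scattered everywhere will mostly end up in the byte-by-byte fallback.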

As it is, play with alignment to see if it helps -- maybe
putting ".p2align 3, 0x90" before all the labels (though
aligning the last label wouldn't help much really).

-- 
george DOT foot AT merton DOT oxford DOT ac DOT uk

xu do tavla fo la lojban  --  http://xiron.pc.helsinki.fi/lojban/lojban.html
