Mail Archives: djgpp/1995/06/23/21:02:08

Xref: news-dnh.mv.net comp.os.msdos.djgpp:546
Path: news-dnh.mv.net!mv!news10.sprintlink.net!news.sprintlink.net!howland.reston.ans.net!agate!library.ucla.edu!news.bc.net!unixg.ubc.ca!freenet.vancouver.bc.ca!rdc
From: rdc AT freenet DOT vancouver DOT bc DOT ca (Robert Clark)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: SSSsPPPpEEEeDDDd !!!
Date: 23 Jun 1995 11:57:50 GMT
Organization: Vancouver Regional FreeNet
Lines: 79
References: <1995Jun19 DOT 134819 DOT 15176 AT ludens>
Nntp-Posting-Host: localhost
To: djgpp AT sun DOT soe DOT clarkson DOT edu
Dj-Gateway: from newsgroup comp.os.msdos.djgpp

 I'm giving you the benefit of the doubt that this is a djGCC/GAS question
and not a post to the wrong group ...

xxx AT ludens DOT elte DOT hu wrote:
: I'm an asm-programmer, who has some problem.
: 			mov	edx,xxx
: 			mov	ebx,yyy		;abs(xxx-yyy) is big (i.e.>5000)
: 			mov	ecx,100000
: 			mov	al,1
: 		align	16
: 		c1:
: 				mov	[edx],al
: 				mov	[ebx],al	;xxxx
: 				inc	edx
: 				inc	ebx
: 			dec	ecx
: 			jnz	c1

: 	This code runs very slow.
: 	Remove that line which marked ;xxxx! (sorry, I do NOT speak...)
: 	Run this! 	It's fast.
: 	It's OK, but what's the matter with the original code?
: 	Why does it run so slowly????

 gcc optimization can only get so tricky, then it's your turn ...

 You are over-accessing the bus ...


 Try: [use '386 (not '86) code and un-roll (and split-up) loop]
               mov     esi,xxx         ;abs(xxx-yyy) is big (i.e.>5000)
               mov     ecx,100000 / 4  ; 1/4 the amount
               mov     eax,01010101h   ; combine four 01h bytes into one dword
;               mov     eax,00000001h  ; OR _is_ this what you really wanted?
       align   32
       c1:
                       mov     [esi],eax  ; move _4_ bytes at once
                       add     esi,4      ; step one dword, not one byte
                       dec     ecx
               jnz     c1
               sub     esi,4           ; esi -> last dword written at xxx
               mov     edi,yyy + 100000 - 4  ; last dword of yyy
               mov     ecx,100000 / 4  ; 1/4 the amount
               std
               rep     movsd           ; move esi[] -> edi[] ecx times (backward)
               cld                     ; restore the direction flag for C code

 This optimization is 'off the top of my head' and IS _untested_ !
 It is only a few lines longer than yours and uses DOUBLEWORDS
(since you DID say your program was 'p-mode')
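
 For anyone who would rather stay in C, the same split-up idea (fill one
buffer a doubleword at a time, then copy it to the second buffer in a
separate pass) might look roughly like the sketch below. The names
fill_split and COUNT are made up for illustration, and this says nothing
about how much faster it really is on any given machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COUNT 100000  /* byte count used in the original question */

/* Hypothetical helper: write 01h bytes into dst four at a time,
   then copy the finished block to dst2 in a second pass, so the two
   destinations are never written alternately inside one loop. */
void fill_split(unsigned char *dst, unsigned char *dst2)
{
    const uint32_t pattern = 0x01010101u;  /* four 01h bytes combined */
    size_t i;

    assert(COUNT % 4 == 0);                /* whole doublewords only */

    for (i = 0; i < COUNT; i += 4)
        memcpy(dst + i, &pattern, 4);      /* one 4-byte store per step */

    memcpy(dst2, dst, COUNT);              /* second pass: xxx[] -> yyy[] */
}
```

 Call it as fill_split(xxx, yyy) with two buffers of at least COUNT bytes
each; the buffers must not overlap.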

 _YOU_ (and I don't know why GCC did not do this for you) might FIRST wish
to try the OBVIOUS re-write that avoids accessing the bus too often;
I've ONLY included the relevant lines, NOT the 'whole' section, this time.


       c1:
                mov     [edx],al
                inc     edx
                mov     [ebx],al
                inc     ebx
        dec     ecx


 The second example is simpler and should only take 1 second to do using
QEDIT (CTRL-Y, DNARROW, CTRL-U); the first should be faster (remember, the
first IS NOT tested, the second IS obvious {"can't fail"}) ...


 You could also write code using the loopnz instruction but I'll spare you.


 Since I'm posting anyways (a FAMOUS quote of Bill !) let me mention
that I'm pleased to only get 10-15 e-mails a day now since I left the
djgpp-l and moved over to the news ...

--
Robert Clark
RDC AT freenet DOT vancouver DOT bc DOT ca  UNIX(r) System V Release 4.0  [142.103.106.2]
http://www.freenet.vancouver.bc.ca    incoming:// 49 15 00 N   123 07 00 W
                                    .
