Mail Archives: djgpp/2000/03/08/22:30:41

From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: [long] gcc performance and possible bug
Date: 9 Mar 2000 02:02:18 GMT
Lines: 27
Message-ID: <8a70n9$34mhh$1@fu-berlin.de>
References: <Pine DOT SUN DOT 3 DOT 91 DOT 1000307103019 DOT 21628J-100000 AT is> <8a65uu$39fkt$1 AT fu-berlin DOT de> <38C6B414 DOT 2D67E404 AT inti DOT gov DOT ar>
NNTP-Posting-Host: pec-44-99.tnt3.s2.uunet.de (149.225.44.99)
Mime-Version: 1.0
X-Trace: fu-berlin.de 952567338 3299889 149.225.44.99 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

salvador) wrote:

>K6 CPUs have a "bug" related to alignment. If some memory address is of
>the form 0xNNNNNC, you'll have a big penalty to read it. 0, 4 and 8 are ok,
>but C is the worst case (by far), double check you are not hitting this
>limitation.

Do you mean code alignment, data alignment or both?

Anyway, I edited the gcc -O2 -S output of the slower-running version
of my program (with const), changed the .p2align 2 directives to
.p2align 4 (16-byte alignment) for zseed, mul and mwc32 (I think these
are all the data and code alignments that could contribute to the
large performance difference), and recompiled. The program
ran faster, but there was still an order of magnitude difference
between the const and the non-const version.

I also double checked the alignments with fsdb and objdump (thanks to
Hans-Bernhard Broeker for pointing the objdump method out to me).
zseed, mul and mwc32 were 16-byte aligned.
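
For anyone who wants to repeat the check from inside a program rather
than with objdump, here is a minimal sketch in C. It only looks at the
low four bits of an address: 0 means 16-byte aligned, and 0xC is the
case described in the quoted message. The names low_nibble,
is_aligned16, report and demo_seed are hypothetical and are not taken
from the program under discussion. The sketch assumes a compiler that
provides <stdint.h> with uintptr_t and the GCC aligned attribute, which
asks for the alignment in the C source instead of editing .p2align in
the assembly output.

```c
#include <stdint.h>
#include <stdio.h>

/* Low four bits of an address. 0 means 16-byte aligned; 0xC is the
   case the quoted message warns about. */
static unsigned low_nibble(const void *p)
{
    return (unsigned)((uintptr_t)p & 0xFu);
}

/* Nonzero if the address is a multiple of 16. */
static int is_aligned16(const void *p)
{
    return low_nibble(p) == 0;
}

/* Hypothetical stand-in for a data object such as zseed or mul.
   The GCC aligned attribute requests 16-byte alignment in the source. */
static unsigned long demo_seed __attribute__((aligned(16))) = 1;

/* Print one line per object: name, address, low nibble. */
static void report(const char *name, const void *p)
{
    printf("%-10s %p  low nibble 0x%X%s\n",
           name, (void *)p, low_nibble(p),
           is_aligned16(p) ? "  (16-byte aligned)" : "");
}
```

Calling report("demo_seed", &demo_seed) from main prints the address
and its low nibble. Function addresses can be inspected the same way
on flat-memory targets, although converting a function pointer to an
object pointer is not strictly portable C.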

If you have the time and the interest, please try to compile the 
source I sent and run the executable. It should take less than five 
minutes.

Regards,
Dieter
