Mail Archives: pgcc/1998/01/25/12:15:00

X-POP3-Rcpt: mlehmann AT universe DOT sgh-net DOT de
25 Jan 1998 12:15:00 +0100 (CET) :
From: Ronald Wahl <Ronald DOT Wahl AT Informatik DOT TU-Chemnitz DOT DE>
X-Sender: rwa AT goliath DOT csn DOT tu-chemnitz DOT de
To: Marc Lehmann <pcg AT goof DOT com>
cc: beastium-list AT Desk DOT nl
Subject: Re: PGCC optimizing AMD K6?
In-Reply-To: <19980125021449.38760@cerebro.laendle>
Message-ID: <Pine.LNX.3.96.980125115906.11117A-100000@goliath.csn.tu-chemnitz.de>
MIME-Version: 1.0
Sender: Marc Lehmann <pcg AT goof DOT com>
Status: RO
X-Status: A
Lines: 55

On Sun, 25 Jan 1998, Marc Lehmann wrote:
> On Sat, Jan 24, 1998 at 11:50:49PM +0100, Ronald Wahl wrote:
> > Since pgcc-980122 is out, can you verify that -ffast-math
> > (w/o funroll-loops) slows down some integer benches? The neural net bench
> > still doesn't return if -funroll-loops or -funroll-all-loops is used. Has
> > anybody checked if this is a problem of egcs or only pgcc? Maybe we shoul
>
> I haven't checked it myself, but it seems to work under egcs..
>
> It might be an egcs bug, or maybe a simple incompatibility between egcs &
> pgcc; as you know, I've been debugging that /&$/$% unrolling code for a
> long time..

keep on hacking ;-)

> > PPS (for Marc): Since I've seen many fxch instructions in the assembly
> >                 output of nbench I have to note that these will not
> >                 improve performance like on a pentium. If it's possible
> >                 we should remove these. Minimizing the number of fpu
> >                 instructions should be one of the goals on a K6 since
> >                 most of these have a latency of 2 cycles and need two
> >                 cycles to execute.
>
> hmm.. that probably makes loop unrolling useless (doing two calculations
> independently requires fxch, due to the §%&$%§$%E$ x86 fpu architecture)

yes, but actually the code produced by -funroll-loops is faster. Maybe
nbench's fp benches include enough integer code so that loop unrolling
will be a win.
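
To make the fxch point concrete, here is a minimal C sketch of a dot product
written once with a single accumulator and once hand-unrolled by two with
independent accumulators. The function names are hypothetical and this is not
code from nbench, nor necessarily what -funroll-loops emits. The x87 arithmetic
instructions work on the top of the register stack, so keeping two running sums
alive usually forces the compiler to emit fxch to switch between them.

```c
#include <stddef.h>

/* One accumulator: every addition depends on the previous one. */
double dot_single(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-unrolled by two with independent accumulators s0 and s1.
 * With a flat register file the two chains just use two registers;
 * on the x87 stack the compiler typically needs fxch to bring the
 * other accumulator to the top of stack before each operation. */
double dot_unrolled2(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)                  /* odd element left over */
        s0 += a[i] * b[i];
    return s0 + s1;
}
```

Compiling both with -S and counting the fxch instructions in the second
function would show how much stack shuffling the unrolled form costs on a
given compiler version.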

> We should be able to get rid of them by defining no parallelism for the
> fp unit in the .md file,

...but I hope this doesn't mean that integer code cannot run in parallel
with fp code...

> but since no instructions are marked with an attribute to do this, this
> won't have much of an effect.

Then we should marc^Hk the relevant instructions. Is there anybody here
who will have a look at it? My time is limited and the .md file is too
huge.
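
What describing the fp unit without parallelism would mean for scheduling can
be sketched with a toy cost model. This is only an illustration: the function
name and the model are assumptions, not anything from the pgcc .md file, and
the only figures taken from this thread are the 2-cycle latency and 2-cycle
execution time mentioned for most K6 fpu instructions.

```c
/* Cycles until the last of n independent operations has finished on a
 * single function unit.
 *   ready_delay: latency of one operation (cycles until its result is ready)
 *   issue_delay: cycles the unit stays busy before accepting the next one
 * Everything else (decode, memory, other units) is ignored. */
unsigned unit_cycles(unsigned n, unsigned ready_delay, unsigned issue_delay)
{
    if (n == 0)
        return 0;
    return (n - 1) * issue_delay + ready_delay;
}
```

With ready_delay 2 and issue_delay 2, unit_cycles(4, 2, 2) is 8, the same as
running the four operations strictly one after another. A fully pipelined unit
(issue_delay 1) would need 5. Under these assumptions, interleaving independent
fp chains, which is what fxch is there for, gains nothing inside the fp unit.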

ron

-- 
\ Ronald Wahl --- rwa AT informatik DOT tu-chemnitz DOT de   \
 \ WWW: http://www.tu-chemnitz.de/~row             \
  \ Talk: rwa AT goliath DOT csn DOT tu-chemnitz DOT de            \
   \ PGP key available by finger to my email address \

