Date: 25 Jan 1998 12:15:00 +0100 (CET)
From: Ronald Wahl
X-Sender: rwa AT goliath DOT csn DOT tu-chemnitz DOT de
To: Marc Lehmann
cc: beastium-list AT Desk DOT nl
Subject: Re: PGCC optimizing AMD K6?
In-Reply-To: <19980125021449.38760@cerebro.laendle>

On Sun, 25 Jan 1998, Marc Lehmann wrote:

> On Sat, Jan 24, 1998 at 11:50:49PM +0100, Ronald Wahl wrote:
> > Since pgcc-980122 is out, can you verify that -ffast-math
> > (w/o funroll-loops) slows down some integer benches? The neural net bench
> > still doesn't return if -funroll-loops or -funroll-all-loops is used. Has
> > anybody checked if this is a problem of egcs or only pgcc? Maybe we shoul
>
> I haven't checked it myself, but it seems to work under egcs..
>
> It might be a egcs bug, or maybe a simple incompatibility between egcs &
> pgcc, as you know, I'm debugging that /&$/$% unrolling code since a long
> time..

keep on hacking ;-)

> > PPS (for Marc): Since I've seen many fxch instructions in the assembly
> > output of nbench I have to note that these will not
> > improve performance like on a pentium. If it's possible
> > we should remove these. Minimizing the number of fpu
> > instructions should be one of the goals on a K6 since
> > most of these have a latency of 2 cycles and need two
> > cycles to execute.
>
> hmm.. that probably makes loop unrolling useless (doing two calculations
> independently requires fxch, due to the §%&$%§$%E$ x86 fpu architecture)

yes, but actually the code produced by -funroll-loops is faster. Maybe
nbench's fp benches include enough integer code so that loop unrolling
will be a win.

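To make the point about "two calculations independently" concrete, here is a minimal C sketch. It is not taken from nbench, and the function names dot_simple and dot_unrolled2 are made up for illustration. The second function is unrolled by hand so that it keeps two independent accumulators; on the stack-based x87 fpu most arithmetic works on the top of the stack, so a compiler that holds both accumulators in fpu registers generally has to swap them into place with fxch. Whether -funroll-loops produces code of exactly this shape is not something this sketch claims.

```c
#include <stddef.h>

/* One dependency chain: each addition has to wait for the previous one. */
double dot_simple(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Two independent chains (unrolled by two, by hand). The updates of s0 and
 * s1 do not depend on each other, so they can overlap, but on x87 both
 * accumulators compete for the top of the register stack. */
double dot_unrolled2(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)                  /* leftover element when n is odd */
        s0 += a[i] * b[i];
    return s0 + s1;
}
```

Compiling both functions with -S, with and without -funroll-loops, and counting the fxch instructions in the output would be one way to check how much swapping a given compiler version really emits.
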
> We should be able to get rid of them by defining no parallelity for the
> fp unit in the .md file,

...but I hope this doesn't mean that integer code cannot run in parallel
with fp code...

> but since no instructions are marked with an attribute to do this, this
> won't have much of an effect.

Then we should marc^Hk the relevant instructions. Is there anybody here
who will have a look at it? My time is limited and the .md file is too huge.

ron

-- 
\ Ronald Wahl --- rwa AT informatik DOT tu-chemnitz DOT de \
\ WWW: http://www.tu-chemnitz.de/~row \
\ Talk: rwa AT goliath DOT csn DOT tu-chemnitz DOT de \
\ PGP key available by finger to my email address \