Mail Archives: pgcc/1998/01/25/02:14:49

X-POP3-Rcpt: mlehmann AT universe DOT sgh-net DOT de
Message-ID: <19980125021449.38760@cerebro.laendle>
Date: 25 Jan 1998 02:14:49 +0100
From: Marc Lehmann <pcg AT goof DOT com>
To: beastium-list AT Desk DOT nl
Subject: Re: PGCC optimizing AMD K6?
References: <Pine DOT LNX DOT 3 DOT 96 DOT 980121222144 DOT 998A-100000 AT uggae DOT rhein-neckar DOT de> <Pine DOT LNX DOT 3 DOT 96 DOT 980124232801 DOT 6485A-100000 AT goliath DOT csn DOT tu-chemnitz DOT de>
Mime-Version: 1.0
X-Mailer: Mutt 0.88
In-Reply-To: <Pine.LNX.3.96.980124232801.6485A-100000@goliath.csn.tu-chemnitz.de>; from Ronald Wahl on Sat, Jan 24, 1998 at 11:50:49PM +0100
X-Operating-System: Linux version 2.1.79 (root AT cerebro) (gcc version pgcc-2.91.03 971225 (gcc-2.8.0))
Status: RO
Lines: 52

On Sat, Jan 24, 1998 at 11:50:49PM +0100, Ronald Wahl wrote:
> On Wed, 21 Jan 1998, Holger Burbach wrote:
> 
> > On Wed, 21 Jan 1998, Ronald Wahl wrote:
> > 
> > > No test w/o -funroll-loops? I've discovered that it results in
> > > slower code. Maybe this has changed with pgcc/egcs-980115 but it would be
> > > nice to see the results.
> > 
> > Okay, here they are!
> > [...]
> 
> Thanx...
> 
> Since pgcc-980122 is out, can you verify that -ffast-math
> (w/o funroll-loops) slows down some integer benches? The neural net bench
> still doesn't return if -funroll-loops or -funroll-all-loops is used. Has
> anybody checked if this is a problem of egcs or only pgcc? Maybe we shoul

I haven't checked it myself, but it seems to work under egcs..

It might be an egcs bug, or maybe a simple incompatibility between egcs
& pgcc. As you know, I've been debugging that /&$/$% unrolling code for a
long time..

> PS: If you send benches it would be nice to see how -ffast-math influences
>     the results. At the moment it's not generally a win and seems to slow
>     down integer code (how can that be?).

good question ;)

> PPS (for Marc): Since I've seen many fxch instructions in the assembly
>                 output of nbench I have to note that these will not
>                 improve performance like on a pentium. If it's possible
>                 we should remove these. Minimizing the number of fpu
>                 instructions should be one of the goals on a K6 since
>                 most of these have a latency of 2 cycles and need two
>                 cycles to execute.

hmm.. that probably makes loop unrolling useless for fp code on the K6
(interleaving two independent calculations requires fxch, due to the
§%&$%§$%E$ stack-based x86 fpu architecture)
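
To make the point about independent calculations concrete, here is a
minimal sketch of a loop unrolled by hand. The function name and the
multiply-add operation are hypothetical, chosen only for illustration,
and are not taken from nbench. The two statements in the loop body do not
depend on each other, so on the x87 register stack a compiler that wants
to interleave them has to bring the right operand to the top of the stack,
which is where fxch typically comes in. Whether a given pgcc or egcs
version emits fxch for this exact code is an assumption, not something
measured here.

```c
#include <stddef.h>

/* Hypothetical example: out[i] = a[i] * b[i] + c[i], unrolled by two.
 * The two statements in the loop body are independent calculations;
 * on a stack-based FPU, overlapping them needs operand shuffling. */
void mul_add_unrolled2(double *out, const double *a, const double *b,
                       const double *c, size_t n)
{
    size_t i = 0;

    /* Main loop: two independent multiply-adds per iteration. */
    for (; i + 1 < n; i += 2) {
        out[i]     = a[i]     * b[i]     + c[i];
        out[i + 1] = a[i + 1] * b[i + 1] + c[i + 1];
    }

    /* Remainder: handle the last element when n is odd. */
    if (i < n)
        out[i] = a[i] * b[i] + c[i];
}
```

Compiling something like this with -S, with and without -funroll-loops,
and comparing the number of fxch instructions in the output would be one
way to check how much the scheduler relies on them.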

We should be able to get rid of them by defining no parallelism for the
fp unit in the .md file, but since no instructions are marked with
an attribute to do this, it won't have much of an effect.

      -----==-                                              |
      ----==-- _                                            |
      ---==---(_)__  __ ____  __       Marc Lehmann       +--
      --==---/ / _ \/ // /\ \/ /       pcg AT goof DOT com       |e|
      -=====/_/_//_/\_,_/ /_/\_\                          --+
    The choice of a GNU generation                        |
                                                          |
