X-POP3-Rcpt: mlehmann AT universe DOT sgh-net DOT de
Message-ID: <19980125021449.38760@cerebro.laendle>
Date: 25 Jan 1998 02:14:49 +0100
From: Marc Lehmann
To: beastium-list AT Desk DOT nl
Subject: Re: PGCC optimizing AMD K6?
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Mailer: Mutt 0.88
In-Reply-To: ; from Ronald Wahl on Sat, Jan 24, 1998 at 11:50:49PM +0100
X-Operating-System: Linux version 2.1.79 (root AT cerebro) (gcc version pgcc-2.91.03 971225 (gcc-2.8.0))
Status: RO
Content-Length: 2365
Lines: 52

On Sat, Jan 24, 1998 at 11:50:49PM +0100, Ronald Wahl wrote:
> On Wed, 21 Jan 1998, Holger Burbach wrote:
>
> > On Wed, 21 Jan 1998, Ronald Wahl wrote:
> >
> > > No test w/o -funroll-loops? I've had discovered that it will result in
> > > slower code. Maybe this has changed with pgcc/egcs-980115 but it would be
> > > nice to see the results.
> >
> > Okay, here they are!
> > [...]
>
> Thanx...
>
> Since pgcc-980122 is out, can you verify that -ffast-math
> (w/o funroll-loops) slows down some integer benches? The neural net bench
> still doesn't return if -funroll-loops or -funroll-all-loops is used. Has
> anybody checked if this is a problem of egcs or only pgcc? Maybe we shoul

I haven't checked it myself, but it seems to work under egcs.. It might be
an egcs bug, or maybe a simple incompatibility between egcs & pgcc. As you
know, I've been debugging that /&$/$% unrolling code for a long time..

> PS: If you send benches it would be nice to see how -ffast-math influences
> the results. At the moment it's not generally a win and seem to slow
> down integer code (how this?).

good question ;)

> PPS (for Marc): Since I've seen many fxch instructions in the assembly
> output of nbench I have to note that these will not
> improve performance like on a pentium. If it's possible
> we should remove these. Minimizing the number of fpu
> instructions should be one of the goals on a K6 since
> most of these have a latency of 2 cycles and need two
> cycles to execute.

hmm.. that probably makes loop unrolling useless (doing two calculations
independently requires fxch, due to the §%&$%§$%E$ x86 fpu architecture)

We should be able to get rid of them by defining no parallelism for the
fp unit in the .md file, but since no instructions are marked with an
attribute to do this, this won't have much of an effect.

      -----==-                                              |
      ----==-- _                                            |
      ---==---(_)__  __ ____  __       Marc Lehmann       +--
      --==---/ / _ \/ // /\ \/ /       pcg AT goof DOT com |e|
      -=====/_/_//_/\_,_/ /_/\_\                          --+
    The choice of a GNU generation                        |
                                                          |
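
To illustrate the unrolling point above, here is a minimal C sketch (the function names sum_single and sum_unrolled2 are hypothetical and do not come from nbench). It shows a reduction loop written once as a single dependency chain and once unrolled by hand into two independent chains. On the x87 register stack most arithmetic works on the stack top, so a compiler that keeps both accumulators in FPU registers typically has to alternate between them with fxch. That is the trade-off discussed in the thread.

```c
#include <stddef.h>

/* One dependency chain: every add waits on the previous one. */
double sum_single(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent chains (a hand-written 2-way unroll, for illustration).
 * s0 and s1 do not depend on each other, so their adds could overlap,
 * but on x87 each switch between them may cost an fxch. */
double sum_unrolled2(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        s0 += a[i];

    return s0 + s1;
}
```

Compiling both functions with something like gcc -O2 -S and comparing the assembly is one way to see whether fxch shows up. On targets that default to SSE arithmetic, -mfpmath=387 would be needed to get x87 code at all.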