Mail Archives: pgcc/1999/03/12/18:20:04

Date: Fri, 12 Mar 1999 18:25:07 -0500 (EST)
From: "Dan Melomedman (free video eden)" <danm AT recomnet DOT recomnet DOT net>
To: Henrik Berglund SdU <pgcc AT delorie DOT com>
Subject: Re: AMDK6 optimized kernel and others
In-Reply-To: <Pine.GSO.4.05.9903121429040.12114-100000@legolas.mdh.se>
Message-ID: <Pine.BSF.3.96.990312182057.6560A-100000@recomnet.recomnet.net>
MIME-Version: 1.0
Reply-To: pgcc AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com


On Fri, 12 Mar 1999, Henrik Berglund  SdU wrote:

> On 12 Mar 1999, Michael Hanke wrote:
> 
> > On Thu, 11 Mar 1999, Dan Melomedman (free video eden) wrote:
> > 
> > > without size optimization for K6. Also I noted that gzip compiled for
> > > Pentium is slower than gzip compiled for amdk6 on an AMD machine; this kinda
> > > shows that amdk6 optimization actually works quite nicely. I use Stampede
> > This note gives me the opportunity to ask about the real gain of pgcc
> > on AMD chips. I have an old K5 processor. Since I am mainly
> > interested in scientific computing, I would like to know the possible
> > gain for fpu intense applications (e.g. BLAS). And the best possible
> > flags (IEEE arithmetic is essential!). Currently, I am using gcc 2.7.2
> > with -m486. Moreover, most of my programs are
> > written in FORTRAN. Is there a pg77 available or should I resort to
> > f2c?
> 
> I have noticed that the 1.1.1 release of pgcc optimises floating-point code
> better than the snapshot, but may be a bit slower on integer code.
> 
> I think the best flags for good float performance are
> -O6 -march=amdk6 -funroll-all-loops -fforce-addr
> 
> -----------------------------------------------------------------------------
> Henrik DOT Berglund AT mds DOT mdh DOT se 
> http://www.mds.mdh.se/~adb94hbd/
> 
> 
> 

If you are using a K5, you probably can't get much out of it anyway. Older
Intel Pentiums are much better at FPU work, and Cyrix chips totally suck at
it. BTW, the gcc manual says that -funroll-all-loops usually makes programs
run more slowly; I haven't tested that myself. I also wonder whether you know
what difference -mcpu=amdk6 makes when combined with -march=amdk6, and
whether -DCPU=586 or -DCPU=686 combined with the amdk6 switches helps at all.
