Mail Archives: pgcc/1999/05/20/04:49:37

Message-ID: <19990520104723.18724@atrey.karlin.mff.cuni.cz>
Date: Thu, 20 May 1999 10:47:23 +0200
From: Jan Hubicka <hubicka AT atrey DOT karlin DOT mff DOT cuni DOT cz>
To: pgcc AT delorie DOT com
Subject: Re: Benchmark PGCC vs EGCS on a K6-2
References: <373F3AA2 DOT A446D611 AT informatik DOT hu-berlin DOT de> <Pine DOT LNX DOT 4 DOT 10 DOT 9905181826020 DOT 1284-100000 AT data DOT mandrakesoft DOT com> <19990519105631 DOT 40676 AT atrey DOT karlin DOT mff DOT cuni DOT cz> <3743ADE8 DOT C938ADBB AT informatik DOT hu-berlin DOT de>
Mime-Version: 1.0
X-Mailer: Mutt 0.84
In-Reply-To: <3743ADE8.C938ADBB@informatik.hu-berlin.de>; from Jens-Uwe Rumstich on Thu, May 20, 1999 at 06:38:32AM +0000
Reply-To: pgcc AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

> Hi!
> 
> First, please don't trust the numbers I posted at all. The switches
> "pgcc -mk6 -O3" and "pgcc -mk6 -O4" produce the same executable, but the
> results differed by 3 seconds. Too much to call these results reliable :-((
> 
> > About a year ago I did some tuning of egcs for the K6-2. I removed some of
> > the K6-2 specific optimizations, because they seemed to produce slower code.
> > There seems to be an important problem in the K6 documentation: it recommends
> > things that often cause a performance loss. The author of the original K6
> > stuff for egcs just blindly followed its recommendations, so many of his
> > changes were performance losses (especially changing xor reg,reg to mov reg,0).
> 
> ooops... The mov is not faster?
> 
The only advantage of mov is the reduced dependency on the flags. But this
advantage is not large enough to offset the code size increase and the decoder
slowdown caused by the very large opcode.
> > Many (not all) of these changes are in recent egcs snapshots (aka gcc 2.95.0). Because
> > I don't have any access to this CPU anymore, I would love to hear about your results with
> > this version of gcc.
> 
> I'll try them out and write about them.
> Do you know a way to get exact numbers? I still don't know why my
> results are that wrong :-(
Hmm... I don't know. You might also try out the egcs benchmark suite. It
produces lots of results and they are pretty exact (at least for me) and usable
for tuning the compiler. Take a look at the egcs homepage to get it...

> 
> > The K6 seems to have serious problems with decoding speed. I've made new haifa scheduler hooks for
> > decoding that worked quite well (I also have versions for Pentium and PPro available; the PPro
> > version is untested),
> 
> It seems to me that the decoders of the K6 are not strong enough to feed
> all the execution units, so this is the bottleneck. One should probably
> try to output instructions which result in 4 Risc-Ops per cycle. That means
> 2 short instructions, where each one is broken into 2 Risc-Ops, or a long
> instruction, which is broken into 4 Risc-Ops.
Well, this is quite hard to reach. IMO it is OK just to schedule code in
such a way that more complex (first-decoder-only) instructions land on the
first decoder, and vector-decoded instructions are placed at points where
decoding of the previous instructions was faster than their execution, so the
2-cycle delay does not cause a performance loss.
> In the PGCC-FAQ I read about a "recombining" optimization, which seems
> to be intended to do exactly this. But it was marked as disabled,
> because it may slow down some code...

Recombining, as far as I can remember, only attempts to reverse the riscify
optimization for Pentium CPUs. This change caused a performance loss due to
reduced pairing opportunities.
It should not affect non-riscified code anyway.
> 
> > On the K6 it brought quite large speedups (-10 - 500%, usually about 10%), but the changes necessary
> > to i386.md are quite large, so it would take lots of time to add them into gcc.
> 
> And it would make it look even uglier, right?? ;-)
Well, I personally think it made it look a bit better, because I've rewritten
many patterns in a way that makes instruction selection clear to the compiler
(and lets me decide which instruction will be output using attributes).

Honza
> 
> > Honza
> 
> cu
> 	Jens-Uwe

-- 
                       OK. Lets make a signature file.
+-------------------------------------------------------------------------+
|        Jan Hubicka (Jan Hubi\v{c}ka in TeX) hubicka AT freesoft DOT cz         |
|         Czech free software foundation: http://www.freesoft.cz          |
|AA project - the new way for computer graphics - http://www.ta.jcu.cz/aa |
|  homepage: http://www.paru.cas.cz/~hubicka/, games koules, Xonix, fast  |
|  fractal zoomer XaoS, index of Czech GNU/Linux/UN*X documentation etc.  | 
+-------------------------------------------------------------------------+
