Mail Archives: pgcc/1999/05/20/04:49:37
> Hi!
>
> First, please don't trust the numbers I posted at all. The switches
> "pgcc -mk6 -O3" and "pgcc -mk6 -O4" produce the same executable, but
> the results differed by 3 seconds. That is too much to call these
> results reliable :-((
>
> > About a year ago I did some tuning of egcs for the K6-2. I removed some of
> > the K6-2 specific optimizations, because they seemed to produce slower code. There
> > seems to be an important problem in the K6 documentation: it recommends things that often
> > cause performance loss. The author of the original K6 code for egcs just blindly followed
> > its recommendations, so many of his changes were performance losses (especially changing
> > xor reg,reg to mov reg,0).
>
> ooops... The mov is not faster?
>
The only advantage of mov is a reduced dependency on flags. But this advantage
is not large enough to offset the code size increase and the decoder slowdown
caused by the much longer instruction encoding.
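
To make the size argument concrete, here is a small sketch comparing the two encodings for the 32-bit EAX register. The byte sequences are the standard IA-32 encodings; the helper name `code_size_growth` is only an illustration and is not part of egcs or pgcc.

```python
# Standard IA-32 encodings for zeroing EAX in 32-bit mode.
XOR_EAX_EAX = bytes([0x31, 0xC0])                    # xor eax, eax
MOV_EAX_0 = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])    # mov eax, 0


def code_size_growth(count):
    """Extra bytes emitted if `count` register clears use mov instead of xor.

    Hypothetical helper for illustration only.
    """
    return count * (len(MOV_EAX_0) - len(XOR_EAX_EAX))


if __name__ == "__main__":
    print("xor eax,eax:", len(XOR_EAX_EAX), "bytes")
    print("mov eax,0:  ", len(MOV_EAX_0), "bytes")
    print("growth for 100 clears:", code_size_growth(100), "bytes")
```

The mov form carries a full 32-bit immediate, so every register clear costs three more bytes of code and gives the decoders more bytes to fetch.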
> > Many (not all) of these changes are in recent egcs snapshots (aka gcc 2.95.0). Because
> > I no longer have access to this CPU, I would love to hear about your results with
> > this version of gcc.
>
> I'll try them out and write about them.
> Do you know a way to get exact numbers? I still don't know why my
> results are that wrong :-(
Hmm... I don't know. You might also try the egcs benchmark suite. It produces
lots of results, and they are pretty exact (at least for me) and usable for
tuning the compiler. Take a look at the egcs homepage to get it...
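
Independent of any benchmark suite, one common way to see how noisy a measurement is: run the same binary several times and compare the minimum, the median and the spread. The sketch below is a generic illustration, not how the egcs benchmark suite works; `time_command` and `summarize` are hypothetical names, and `./a.out` stands in for whatever program is being measured.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run `argv` `runs` times and return the wall-clock time of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce a list of timings to min, median and max-min spread."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "spread": max(samples) - min(samples),
    }


if __name__ == "__main__":
    # Placeholder command; pass the real benchmark on the command line.
    command = sys.argv[1:] or ["./a.out"]
    print(summarize(time_command(command)))
```

If the spread between runs of the same executable is as large as the difference between two sets of compiler switches, the comparison says nothing about the switches.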
>
> > The K6 seems to have serious problems with decoding speed. I've made new Haifa scheduler hooks for
> > decoding that worked quite well (I also have versions for Pentium and PPro available; the PPro
> > version is untested),
>
> It seems to me that the decoders of the K6 are not strong enough to
> feed all the execution units, so this is the bottleneck. One should
> probably try to output instructions which result in 4 RISC ops per
> cycle. That means 2 short instructions, each of which is broken into
> 2 RISC ops, or one long instruction, which is broken into 4 RISC ops.
Well, this is quite hard to reach. IMO it is OK just to schedule code in
such a way that the more complex (first decoder only) instructions reach the
first decoder, and vector decoded instructions are placed at points where
decoding of the previous instructions was faster than execution, so the
2 cycle delay does not cause a performance loss.
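
A toy model can show why instruction order matters for decoding. It assumes, following the description in this thread, that one cycle decodes either two adjacent short instructions or one long instruction, and that a vector decoded instruction costs 2 cycles. The function `decode_cycles` and the exact costs are illustrative assumptions, not the real K6 decoder rules and not the actual scheduler hooks.

```python
def decode_cycles(kinds, vector_cost=2):
    """Count decode cycles for a sequence of 'short', 'long', 'vector' kinds.

    Toy model: two adjacent short instructions share one cycle, a long
    instruction takes one cycle alone, a vector decoded instruction
    takes `vector_cost` cycles alone.
    """
    cycles = 0
    i = 0
    while i < len(kinds):
        kind = kinds[i]
        if kind == "short":
            # Pair with the next instruction only if it is also short.
            if i + 1 < len(kinds) and kinds[i + 1] == "short":
                i += 2
            else:
                i += 1
            cycles += 1
        elif kind == "long":
            i += 1
            cycles += 1
        elif kind == "vector":
            i += 1
            cycles += vector_cost
        else:
            raise ValueError("unknown instruction kind: %r" % (kind,))
    return cycles


if __name__ == "__main__":
    # Same three instructions, two different orders.
    print(decode_cycles(["short", "long", "short"]))
    print(decode_cycles(["long", "short", "short"]))
```

In this model the first order needs 3 decode cycles and the second needs 2, because only the second lets the two short instructions pair. A scheduler that knows the decoder class of each instruction can prefer the second order when dependencies allow it.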
> In the PGCC-FAQ I read about a "recombining" optimization, which seems
> to be intended to do exactly this. But it was marked as disabled,
> because it may slow down some code...
As far as I can remember, recombining only attempts to reverse the riscify
optimization for the Pentium CPU. This change caused a performance loss due to
reduced pairing opportunities.
It should not affect non-riscified code anyway.
>
> > On the K6 it brought quite large speedups (-10 - 500%, usually about 10%), but the changes necessary
> > to i386.md are quite large, so it would take lots of time to add them into gcc.
>
> And it would make it look even uglier, right ?? ;-)
Well, I personally think it made it look a bit better, because I rewrote
many patterns in a way that makes instruction selection clear to the compiler (and
lets me decide what instruction will be output, using attributes).
Honza
>
> > Honza
>
> cu
> Jens-Uwe
--
OK. Lets make a signature file.
+-------------------------------------------------------------------------+
| Jan Hubicka (Jan Hubi\v{c}ka in TeX) hubicka AT freesoft DOT cz |
| Czech free software foundation: http://www.freesoft.cz |
|AA project - the new way for computer graphics - http://www.ta.jcu.cz/aa |
| homepage: http://www.paru.cas.cz/~hubicka/, games koules, Xonix, fast |
| fractal zoomer XaoS, index of Czech GNU/Linux/UN*X documentation etc. |
+-------------------------------------------------------------------------+