Message-Id:
Date: Fri, 4 Jun 99 10:38
From: strasbur AT chkw386 DOT ch DOT pwr DOT wroc DOT pl (Krzysztof Strasburger)
To: pgcc AT delorie DOT com
Subject: Re: Pgcc 1.1.3 - bad performance on P6
Reply-To: pgcc AT delorie DOT com

Marc Lehmann wrote:
>On Wed, Jun 02, 1999 at 09:14:00AM +0000, Krzysztof Strasburger wrote:
>> The obvious remark is: the code produced by pgcc for P6 is suboptimal,
>> but why high optimizations kill the performance instead of improving it?
>Tuning pgcc for ppro is not yet finished. But I think the bigger effect
>you see is that pgcc is tuned for integer performance. You might want
>to try out the hints in the pgcc faq on improving fp-performance (Yes,
>unfortunately you can not have both at the same time yet).

Double precision variables are already double aligned, and there is nothing
more to unroll in the function "gausil". I repeated the test under different
conditions to remove the side effect of the function "main". The double
variables in main were declared static, and main.c was compiled with
gcc 2.7.2.3 -malign-double. Gausil.c was compiled for _pentium_, and the
different versions were run on a _pentium_ 166 with 2000000 steps (times
averaged over three runs each, on an idle machine). The options

  -malign-double -mstack-align-double (for pgcc) -malign-jumps=0
  -malign-loops=0 -malign-functions=0 -ffast-math

were used everywhere. -O5 = -O6 (same code).

1. gcc 2.7.2.3 (-m486, of course ;) -O2 : t=7.21s
2. pgcc 1.1.3 -O4                       : t=7.16s
3. pgcc 1.1.3 -O6                       : t=7.26s

So, I repeat, -O5/6 kills the performance on P5, not only on P6.

Let us look at an earlier version of pgcc (1.0.3a). It gave only two
different codes: -O2 = -O3 = -O4 and -O5 = -O6.

4. pgcc 1.0.3a -O(2,3,4)                : t=7.05s
5. pgcc 1.0.3a -O(5,6)                  : t=7.16s

Hmmm... High optimizations always killed FP performance. The old pgcc gave
better FP code than the new one, and this is sad.

Let us look at the latest snapshot. Again, -O2 = -O3 = -O4 and -O5 = -O6
(of course, this is not a general rule).

6. pgcc 2.93.03 -O(2,3,4)               : t=7.15s
7. pgcc 2.93.03 -O(5,6)                 : t=7.26s

Eh... It isn't better (in this case only, of course; I had other programs
which were faster with pgcc 2.93.03 than with pgcc 1.1.1/2 or 1.0.3).
The clear winner is the old version of pgcc. I'm going back to it. I have a
cluster of pentiums which spend about 25% of their time in the function
"gausil".

I really appreciate the work which the EGCS/PGCC teams do _for free_. Please
don't treat my words as flames or complaining, but I think that an important
part of the compiler is going in the wrong direction. Many programs benefit
from good FP performance (not only scientific software).

>also, you could try a snapshot (i.e. from cvs). 1.1.x was made more for
>stableness than for performance (Yes, I know 1.1.3 is not the most stable
>release we had).

I tried the cvs server, but the transmission breaks very often, so I still
don't have the cvs version. And pgcc 1.1.3 is the first acceptable release
of 1.1.x for me, because -ffast-math didn't work correctly with earlier
versions (or with the latest snapshot).

Krzysztof
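
For readers who prefer to see the timings above as relative differences, here is a minimal sketch in Python. It is not part of the original measurements: the labels and the function name slowdown_percent are made up for illustration, and only the times themselves are taken from the message.

```python
# Times in seconds, copied from the message above.
# The dictionary keys are illustrative labels, not exact command lines.
timings = {
    "gcc 2.7.2.3 -O2": 7.21,
    "pgcc 1.1.3 -O4": 7.16,
    "pgcc 1.1.3 -O6": 7.26,
    "pgcc 1.0.3a -O2/3/4": 7.05,
    "pgcc 1.0.3a -O5/6": 7.16,
    "pgcc 2.93.03 -O2/3/4": 7.15,
    "pgcc 2.93.03 -O5/6": 7.26,
}


def slowdown_percent(times):
    """Return each entry's slowdown relative to the fastest entry, in percent."""
    best = min(times.values())
    return {name: round(100.0 * (t - best) / best, 2) for name, t in times.items()}


slowdowns = slowdown_percent(timings)

# Print the builds from fastest to slowest.
for name, pct in sorted(slowdowns.items(), key=lambda item: item[1]):
    print(f"{name:22s} {timings[name]:.2f}s  +{pct:.2f}%")
```

On these numbers the whole spread between the fastest and the slowest build is roughly 3%, so the differences are small but consistent: within each compiler version, the -O5/6 build is slower than the lower optimization levels.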