Message-Id: <m0yhYr0-000205C@chkw386.ch.pwr.wroc.pl>
Date: Thu, 4 Jun 98 12:10 
From: strasbur AT chkw386 DOT ch DOT pwr DOT wroc DOT pl (Krzysztof Strasburger)
To: beastium-list AT Desk DOT nl
Subject: Performance and code size - part 2 (still FP)
Sender: Marc Lehmann <pcg AT goof DOT com>

Hmm... I decided to test higher optimization levels for GAMESS, too.
Some of the results are surprising. Ah, and the -fno-omit-frame-pointer
flag was not needed: -fomit-frame-pointer is enabled by -O5, not by -O3.
Read the FAQ!

8. Pgcc 1.0.2 without haifa, full -O3
   -mpentium -O3 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-exceptions
9. Pgcc 1.0.2 without haifa, full -O4
   -mpentium -O4 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-exceptions
10. Pgcc 1.0.2 without haifa, -O4 without -finline-functions
   -mpentium -O4 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-inline-functions -fno-exceptions
11. Pgcc 1.0.2 without haifa, -O4 without -finline-functions, loop unrolling enabled
   -mpentium -O4 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-inline-functions -fno-exceptions
   -funroll-all-loops
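Each variant differs from the previous one by exactly one change. The small Python sketch below makes that explicit; the flag lists are copied from above, while the names COMMON, VARIANTS and flag_diff are purely illustrative.

```python
# Flag sets for variants 8-11, copied from the list above (names are illustrative).
COMMON = ["-mpentium", "-malign-jumps=0", "-malign-loops=0",
          "-malign-functions=0", "-malign-double", "-ffast-math",
          "-fno-exceptions"]

VARIANTS = {
    8: COMMON + ["-O3"],
    9: COMMON + ["-O4"],
    10: COMMON + ["-O4", "-fno-inline-functions"],
    11: COMMON + ["-O4", "-fno-inline-functions", "-funroll-all-loops"],
}


def flag_diff(a, b):
    """Return (flags added, flags removed) when going from variant a to variant b."""
    fa, fb = set(VARIANTS[a]), set(VARIANTS[b])
    return sorted(fb - fa), sorted(fa - fb)


for a, b in [(8, 9), (9, 10), (10, 11)]:
    added, removed = flag_diff(a, b)
    print(f"{a} -> {b}: added {added}, removed {removed}")
```

So 8 to 9 isolates -O3 versus -O4, 9 to 10 isolates inlining, and 10 to 11 isolates loop unrolling.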

Variant                 8        9       10       11

Executable size (bytes)
                  2524444  2569980  2556428  3365932

Execution times (s)
  test 1            67.55    66.67    66.60    65.19
  test 2           498.77   433.39   430.01   425.43
  test 3            31.00    30.85    29.78    29.71
  test 4            38.25    35.55    36.08    36.03
  test 5           418.16   402.57   398.31   397.86
  test 6            23.78    22.19    21.92    22.53
  test 7           299.64   287.61   285.46   280.84
  sum 1-7         1377.15  1278.83  1268.16  1257.59

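The "sum 1-7" row can be re-checked, and each column expressed relative to variant 8, with a few lines of Python. The numbers are copied from the table; the helper name total is just for illustration.

```python
# Execution times in seconds for tests 1-7, copied from the table.
TIMES = {
    8: [67.55, 498.77, 31.00, 38.25, 418.16, 23.78, 299.64],
    9: [66.67, 433.39, 30.85, 35.55, 402.57, 22.19, 287.61],
    10: [66.60, 430.01, 29.78, 36.08, 398.31, 21.92, 285.46],
    11: [65.19, 425.43, 29.71, 36.03, 397.86, 22.53, 280.84],
}


def total(v):
    """Sum of tests 1-7 for variant v, rounded like the table."""
    return round(sum(TIMES[v]), 2)


base = total(8)
for v in TIMES:
    t = total(v)
    saving = 100.0 * (1.0 - t / base)  # percent less total time than variant 8
    print(f"variant {v}: sum {t:8.2f} s, {saving:5.2f}% less time than variant 8")
```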

The -O5 option produces incorrect code (no segfaults, but wrong results).
With -O5 -fno-omit-frame-pointer, the executable is identical to the one
produced by -O4.
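One way to confirm that two builds are byte-for-byte identical is a full content comparison. The sketch below uses Python's standard filecmp and hashlib modules; the function names and the example file names are hypothetical, not taken from the original test setup.

```python
import filecmp
import hashlib


def same_binary(path_a, path_b):
    """True if the two files have byte-identical contents (not just equal stat info)."""
    return filecmp.cmp(path_a, path_b, shallow=False)


def sha256_of(path):
    """Hex SHA-256 digest of a file, handy for quoting in a mailing-list post."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, same_binary("gamess.O4", "gamess.O5-nofp") with two hypothetical executable names would return True for the case described above.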

Function inlining _decreases_ performance (compare with column 3 of my
previous posting). -O4 is again better than -O3, and -O4 -fno-inline-functions
gives the fastest code (though not for all of the short runs).
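To see which short run is the exception, one can pick the fastest of variants 8-10 for each test (variant 11 is left out because it also adds unrolling). The data come from the table; fastest_per_test is an illustrative helper.

```python
# Times for variants 8-10 only, copied from the table (seconds, tests 1-7).
TIMES = {
    8: [67.55, 498.77, 31.00, 38.25, 418.16, 23.78, 299.64],
    9: [66.67, 433.39, 30.85, 35.55, 402.57, 22.19, 287.61],
    10: [66.60, 430.01, 29.78, 36.08, 398.31, 21.92, 285.46],
}


def fastest_per_test(times):
    """Map each test number (1-based) to the variant with the lowest time."""
    n_tests = len(next(iter(times.values())))
    return {t + 1: min(times, key=lambda v: times[v][t]) for t in range(n_tests)}


print(fastest_per_test(TIMES))
```

Variant 10 wins every test except test 4, where plain -O4 (variant 9) is slightly faster.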

The code bloat for -O4 is not as disastrous as I expected; I would even call
it acceptable. An increase in executable size of less than 2% for a 2% speedup
is still a fair trade. I would even venture that "code bloat" isn't the right
term here.
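The size/speed trade-off can be computed directly from this posting's table, taking variant 8 (-O3) as the baseline. This baseline is an assumption: the 2% speedup mentioned above may refer to a different comparison, for instance one against the previous posting.

```python
# Sizes (bytes) and sum of tests 1-7 (seconds), copied from the table.
SIZE = {8: 2524444, 9: 2569980}
SUM = {8: 1377.15, 9: 1278.83}


def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


size_growth = pct_change(SIZE[9], SIZE[8])   # positive = larger executable
time_saving = -pct_change(SUM[9], SUM[8])    # positive = less total run time
print(f"-O4 vs -O3: size {size_growth:+.1f}%, total run time -{time_saving:.1f}%")
```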

And finally, loop unrolling... Hmm... It gives faster code (1-2%), but the
executable is sooooo much larger than without it (over 30%)...
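The cost of -funroll-all-loops can be put in numbers by comparing variant 11 with variant 10 test by test. The figures are copied from the table; the variable names are illustrative.

```python
# Variant 10 (no unrolling) vs variant 11 (-funroll-all-loops), from the table.
SIZE_10, SIZE_11 = 2556428, 3365932
T10 = [66.60, 430.01, 29.78, 36.08, 398.31, 21.92, 285.46]
T11 = [65.19, 425.43, 29.71, 36.03, 397.86, 22.53, 280.84]

size_growth = 100.0 * (SIZE_11 - SIZE_10) / SIZE_10
# Per-test speedup in percent; positive means the unrolled build is faster.
speedups = [100.0 * (a - b) / a for a, b in zip(T10, T11)]
total_speedup = 100.0 * (sum(T10) - sum(T11)) / sum(T10)

print(f"size: +{size_growth:.1f}%")
for i, s in enumerate(speedups, 1):
    print(f"test {i}: {s:+.2f}%")
print(f"total: {total_speedup:+.2f}%")
```

Per test the effect varies. Test 6 actually runs a little slower with unrolling, while the total over tests 1-7 improves by less than 1%.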

Krzysztof