Message-Id: <m0yhYr0-000205C@chkw386.ch.pwr.wroc.pl>
Date: Thu, 4 Jun 98 12:10
From: strasbur AT chkw386 DOT ch DOT pwr DOT wroc DOT pl (Krzysztof Strasburger)
To: beastium-list AT Desk DOT nl
Subject: Performance and code size - part 2 (still FP)
Sender: Marc Lehmann <pcg AT goof DOT com>

Hmm... I decided to test higher optimization levels for GAMESS, too.
Some results are surprising. Ah, the -fno-omit-frame-pointer was not needed.
It is enabled by -O5, not -O3. Read the FAQ!

8. Pgcc 1.0.2 without haifa, full -O3
   -mpentium -O3 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-exceptions

9. Pgcc 1.0.2 without haifa, full -O4
   -mpentium -O4 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-exceptions

10. Pgcc 1.0.2 without haifa, -O4 without -finline-functions
   -mpentium -O4 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-inline-functions -fno-exceptions

11. Pgcc 1.0.2 without haifa, -O4 without -finline-functions,
    loop unrolling enabled
   -mpentium -O4 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-inline-functions -fno-exceptions
   -funroll-all-loops

Variant                      8         9        10        11
Executable size        2524444   2569980   2556428   3365932  (bytes)

Execution times (s)
test 1                   67.55     66.67     66.60     65.19
test 2                  498.77    433.39    430.01    425.43
test 3                   31.00     30.85     29.78     29.71
test 4                   38.25     35.55     36.08     36.03
test 5                  418.16    402.57    398.31    397.86
test 6                   23.78     22.19     21.92     22.53
test 7                  299.64    287.61    285.46    280.84
sum 1-7                1377.15   1278.83   1268.16   1257.59

The -O5 option gives incorrect code (no segfaults, but bad results).
For -O5 -fno-omit-frame-pointer the executable doesn't differ from the
executable produced by -O4.

Function inlining _decreases_ performance (compare it with column 3 from my
previous posting). -O4 is again better, and -O4 -fno-inline-functions gives
the fastest code (though not for short runs).

The code bloat for -O4 is not as disastrous as expected; I would even say
that it is acceptable. An increase of the executable size by less than 2%
for a 2% speedup is still fair. I would even risk the opinion that
"code bloat" isn't the right word here.

And finally, the loop unrolling... Hmm... It gives faster code (1-2%), but
the executable is sooooo much larger than without the trick (over 30%)...

Krzysztof
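
For anyone who wants to redo the size/speed arithmetic, here is a minimal
Python sketch that works only from the figures in the table of this message.
The helper names (percent_change, compare) are made up for illustration.
Comparisons against the previous posting cannot be reproduced this way,
since those numbers are not included here.

```python
# Figures copied from the table in this post; keys are the post's variant numbers.
sizes = {8: 2524444, 9: 2569980, 10: 2556428, 11: 3365932}        # bytes
total_times = {8: 1377.15, 9: 1278.83, 10: 1268.16, 11: 1257.59}  # s, sum of tests 1-7


def percent_change(old, new):
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


def compare(base, other):
    """Return (size growth %, total run time reduction %) of `other` vs `base`."""
    size_growth = percent_change(sizes[base], sizes[other])
    time_saved = -percent_change(total_times[base], total_times[other])
    return size_growth, time_saved


if __name__ == "__main__":
    # -O3 vs -O4, -O4 with vs without inlining, then the effect of loop unrolling.
    for base, other in [(8, 9), (9, 10), (10, 11)]:
        growth, saved = compare(base, other)
        print(f"variant {other} vs {base}: "
              f"size change {growth:+.2f}%, total run time reduced by {saved:.2f}%")
```

Comparing variant 11 with variant 10 isolates -funroll-all-loops, since the
two flag sets differ only in that option; the other pairs can be read the
same way.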