Mail Archives: pgcc/1998/06/04/11:23:54
Hmm... I decided to test higher optimization levels for GAMESS, too.
Some results are surprising. Ah, the -fno-omit-frame-pointer
was not needed: frame-pointer omission is enabled by -O5, not -O3. Read the FAQ!
8. Pgcc 1.0.2 without haifa, full -O3
-mpentium -O3 -malign-jumps=0 -malign-loops=0 -malign-functions=0
-malign-double -ffast-math -fno-exceptions
9. Pgcc 1.0.2 without haifa, full -O4
-mpentium -O4 -malign-jumps=0 -malign-loops=0 -malign-functions=0
-malign-double -ffast-math -fno-exceptions
10. Pgcc 1.0.2 without haifa, -O4 without -finline-functions
-mpentium -O4 -malign-jumps=0 -malign-loops=0 -malign-functions=0
-malign-double -ffast-math -fno-inline-functions -fno-exceptions
11. Pgcc 1.0.2 without haifa, -O4 without -finline-functions, loop unrolling enabled
-mpentium -O4 -malign-jumps=0 -malign-loops=0 -malign-functions=0
-malign-double -ffast-math -fno-inline-functions -fno-exceptions
-funroll-all-loops
Variant                        8         9        10        11

Executable size (bytes)  2524444   2569980   2556428   3365932

Execution times (s)
  test 1                   67.55     66.67     66.60     65.19
  test 2                  498.77    433.39    430.01    425.43
  test 3                   31.00     30.85     29.78     29.71
  test 4                   38.25     35.55     36.08     36.03
  test 5                  418.16    402.57    398.31    397.86
  test 6                   23.78     22.19     21.92     22.53
  test 7                  299.64    287.61    285.46    280.84
  sum 1-7                1377.15   1278.83   1268.16   1257.59
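For anyone who wants the relative changes rather than the raw numbers, here is a minimal sketch that computes them from the table. It takes variant 8 (full -O3) from this posting as the baseline. The names `results`, `percent_change` and `relative_to` are made up for this illustration, and the figures from the previous posting are not included.

```python
# Figures copied from the table above:
# variant -> (executable size in bytes, sum of tests 1-7 in seconds).
results = {
    8: (2524444, 1377.15),
    9: (2569980, 1278.83),
    10: (2556428, 1268.16),
    11: (3365932, 1257.59),
}


def percent_change(new, base):
    """Signed change of `new` relative to `base`, in percent."""
    return (new - base) / base * 100.0


def relative_to(baseline, table):
    """Return {variant: (size change %, time change %)} against `baseline`."""
    base_size, base_time = table[baseline]
    return {
        variant: (percent_change(size, base_size), percent_change(time, base_time))
        for variant, (size, time) in table.items()
    }


if __name__ == "__main__":
    # Negative time change means the variant is faster than the baseline.
    for variant, (d_size, d_time) in relative_to(8, results).items():
        print(f"variant {variant:2d}: size {d_size:+6.2f}%  time {d_time:+6.2f}%")
```

Passing a different variant number as the first argument of `relative_to` changes the baseline, e.g. `relative_to(10, results)` to look at loop unrolling alone.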
The -O5 option gives incorrect code (no segfaults, but bad results).
With -O5 -fno-omit-frame-pointer the executable doesn't differ from the
executable produced by -O4.
Function inlining _decreases_ performance (compare it with column 3
from my previous posting). -O4 is again better, and -O4 -fno-inline-functions
gives the fastest code (though not for short runs).
The code bloat for -O4 is not as disastrous as expected; I would even say
that it is acceptable. An increase of the executable size by less than 2%
for a 2% speedup is still fair. I would even risk the opinion that "code bloat"
isn't the right word here.
And finally, the loop unrolling... Hmm... It gives faster code (just under 1%
on the sum, up to 2% on single tests, and test 6 even gets slower), but the
executable is sooooo much larger than without the trick (over 30%)...
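The unrolling trade-off can be checked test by test with a few lines of Python. This is only a sketch over the numbers from variants 10 and 11 in the table; the names `times`, `speedup_percent`, `per_test`, `total` and `size_growth` are invented for the illustration.

```python
# Execution times in seconds, copied from the table:
# test number -> (variant 10, variant 11 with -funroll-all-loops).
times = {
    1: (66.60, 65.19),
    2: (430.01, 425.43),
    3: (29.78, 29.71),
    4: (36.08, 36.03),
    5: (398.31, 397.86),
    6: (21.92, 22.53),
    7: (285.46, 280.84),
}
SIZE_NO_UNROLL = 2556428  # bytes, variant 10
SIZE_UNROLL = 3365932     # bytes, variant 11


def speedup_percent(before, after):
    """Positive when `after` is faster than `before`."""
    return (before - after) / before * 100.0


# Gain per test, gain on the summed run time, and growth of the executable.
per_test = {test: speedup_percent(a, b) for test, (a, b) in times.items()}
total = speedup_percent(
    sum(a for a, _ in times.values()),
    sum(b for _, b in times.values()),
)
size_growth = (SIZE_UNROLL - SIZE_NO_UNROLL) / SIZE_NO_UNROLL * 100.0

for test, gain in per_test.items():
    print(f"test {test}: {gain:+.2f}%")
print(f"sum 1-7: {total:+.2f}%, executable size: {size_growth:+.1f}%")
```

A negative value in the per-test list means that test ran slower with unrolling enabled.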
Krzysztof