Mail Archives: pgcc/1998/06/02/13:10:02
Here is a comparison of executable sizes and execution times for an old
version of GAMESS (General Atomic and Molecular Electronic Structure System)
on a Pentium 166 MMX (Linux 2.0.34-pre12, libc 5.4.38, etc.). I tried different
compilers from the gcc/pgcc family with different options. I focused on
compilation options that give rather small executables, so -O4 and higher
have not been tested.
The program is very FPU intensive, written in FORTRAN. Everything has been
translated to C using "f2c -a". All executables are linked with the same
libgcc.a (from gcc 2.7.2.3). Only user times are given in the table below.
What has been tested?
1. Gcc version 2.7.2.3.
   -m386 -O2 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math
2. Old pgcc (gcc 2.7.2 based).
   -mpentium -O3 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-inline-functions -fno-omit-frame-pointer
3. Pgcc 1.0.2 without haifa.
   -mpentium -O3 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-inline-functions -fno-omit-frame-pointer
   -fno-exceptions
4. Pgcc 1.0.2 with haifa.
   -mpentium -O3 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-inline-functions -fno-omit-frame-pointer
   -fno-exceptions
5. Gcc version 2.7.2.3 (data aligning disabled).
   -m386 -O2 -malign-jumps=0 -malign-loops=0 -malign-functions=0 -ffast-math
6. Pgcc 1.0.2 without haifa (code aligning enabled).
   -mpentium -O3 -malign-jumps=2 -malign-loops=2 -malign-functions=2
   -malign-double -ffast-math -fno-inline-functions -fno-omit-frame-pointer
   -fno-exceptions
7. Pgcc 1.0.2 without haifa (strength-reduce disabled).
   -mpentium -O3 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -malign-double -ffast-math -fno-inline-functions -fno-omit-frame-pointer
   -fno-exceptions -fno-strength-reduce
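To see which options actually separate two variants, the option sets above can be compared mechanically. This is only a minimal sketch in Python: the names `COMMON_PGCC`, `VARIANTS` and `flag_diff` are invented for this illustration, and the option strings are copied from the list above. Note that variants 3 and 4 use identical options and differ only in how the compiler itself was built (without or with haifa).

```python
# Option sets copied from the list of tested variants.
# All names here (COMMON_PGCC, VARIANTS, flag_diff) are illustrative only.
COMMON_PGCC = (
    "-mpentium -O3 -malign-jumps=0 -malign-loops=0 -malign-functions=0 "
    "-malign-double -ffast-math -fno-inline-functions -fno-omit-frame-pointer"
)

VARIANTS = {
    1: "-m386 -O2 -malign-jumps=0 -malign-loops=0 -malign-functions=0 "
       "-malign-double -ffast-math",
    2: COMMON_PGCC,
    3: COMMON_PGCC + " -fno-exceptions",
    4: COMMON_PGCC + " -fno-exceptions",
    5: "-m386 -O2 -malign-jumps=0 -malign-loops=0 -malign-functions=0 "
       "-ffast-math",
    6: "-mpentium -O3 -malign-jumps=2 -malign-loops=2 -malign-functions=2 "
       "-malign-double -ffast-math -fno-inline-functions "
       "-fno-omit-frame-pointer -fno-exceptions",
    7: COMMON_PGCC + " -fno-exceptions -fno-strength-reduce",
}


def flag_diff(a, b):
    """Return (options only in variant a, options only in variant b)."""
    flags_a = set(VARIANTS[a].split())
    flags_b = set(VARIANTS[b].split())
    return sorted(flags_a - flags_b), sorted(flags_b - flags_a)


if __name__ == "__main__":
    # Pairs that isolate one change each: data aligning, code aligning,
    # strength reduction, and the haifa build (no option difference).
    for a, b in [(1, 5), (3, 6), (3, 7), (3, 4)]:
        only_a, only_b = flag_diff(a, b)
        print(f"variant {a} vs {b}: only in {a}: {only_a}, only in {b}: {only_b}")
```

Comparing variants 1 and 5 this way leaves -malign-double as the only difference, so any timing gap between those two columns can be attributed to data aligning alone.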
Variant               1        2        3        4        5        6        7
Executable
size (bytes)    2425144  2437764  2510988  2521100  2394536  2529196  2476332
Execution times (s); the fastest variant for each test is in parentheses
test 1 (7)        66.04    72.10    66.96    67.49    86.20    66.95    65.85
test 2 (4)       481.31   453.52   450.19   447.64   574.45   458.02   455.63
test 3 (7)        34.53    34.17    31.23    31.62    45.05    31.33    30.90
test 4 (6)        37.26    38.27    36.30    36.77    51.02    36.11    37.54
test 5 (3)       445.65   445.30   402.96   408.32   555.43   404.08   417.13
test 6 (3)        24.27    24.48    22.83    23.20    29.60    22.85    22.93
test 7 (4)       312.41   323.19   294.18   294.15   406.39   296.46   302.80
sum 1-7         1401.47  1391.03  1304.65  1309.19  1748.14  1315.80  1332.78
rank by sum         (6)      (5)      (1)      (2)      (7)      (3)      (4)
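The derived rows of the table (fastest variant per test, sums, and the ranking of the sums) can be recomputed from the raw timings. Below is a minimal sketch in Python; the names `TIMES`, `fastest_per_test`, `totals` and `ranks` are made up for this illustration, and the numbers are the user times from the table.

```python
# User times in seconds, copied from the table.
# Rows are tests 1-7, columns are variants 1-7.
TIMES = [
    [66.04, 72.10, 66.96, 67.49, 86.20, 66.95, 65.85],
    [481.31, 453.52, 450.19, 447.64, 574.45, 458.02, 455.63],
    [34.53, 34.17, 31.23, 31.62, 45.05, 31.33, 30.90],
    [37.26, 38.27, 36.30, 36.77, 51.02, 36.11, 37.54],
    [445.65, 445.30, 402.96, 408.32, 555.43, 404.08, 417.13],
    [24.27, 24.48, 22.83, 23.20, 29.60, 22.85, 22.93],
    [312.41, 323.19, 294.18, 294.15, 406.39, 296.46, 302.80],
]


def fastest_per_test(times):
    """Variant number (1-based) with the lowest time in each test row."""
    return [row.index(min(row)) + 1 for row in times]


def totals(times):
    """Sum over all tests for each variant, rounded to hundredths."""
    return [round(sum(column), 2) for column in zip(*times)]


def ranks(values):
    """Rank of each value, 1 = smallest. Assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


if __name__ == "__main__":
    sums = totals(TIMES)
    print("fastest per test:", fastest_per_test(TIMES))
    print("sums:            ", sums)
    print("ranks:           ", ranks(sums))
```

The sum treats every test equally in seconds, so the two long tests (2 and 5) dominate the ranking; that is a property of the table as given, not something this sketch adds.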
Some general conclusions (valid only for FPU intensive, f2c translated
code, of course):
1. The old gcc (2.7.2.3) gives smaller executables than any pgcc variant,
even with -malign-double alignment of "double precision" variables.
2. This data aligning is critical for the efficiency of the code (as pointed
out in the pgcc FAQ). Look at column 5: without -malign-double the same
compiler (gcc 2.7.2.3) is about 25% slower in total than in column 1
(1748.14 s versus 1401.47 s). On the other hand, programs which are not
CPU intensive could be compiled with the column 5 options plus
-fno-strength-reduce (gcc 2.7.2.3 of course). You will get small
executables, smaller than with egcs/pgcc.
3. It is not true that the old pgcc (2.7.2 based) gives faster code than the
new one. In some cases it can be even slower than the code produced by
gcc 2.7.2.3 with -malign-double.
4. The haifa scheduler does not give more efficient code (columns 3 and 4).
5. Code aligning brings no benefit here. Compare columns 3 and 6. Maybe the
Big Theory of Programming says "it is the most important thing for
performance". I got code bloat only.
6. Strength reduction (-fstrength-reduce) gives code bloat, but the
performance is slightly better with it (column 3 has it enabled, column 7
uses -fno-strength-reduce).
7. There is no single optimal set of compilation options for all code.
There are execution paths where other variants run faster than variant 3
(which is best overall).
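The size and speed trade-offs in the conclusions above can be put into percentages. This is a minimal sketch in Python, assuming the executable sizes and "sum 1-7" values from the table; the names `SIZES`, `SUMS`, `percent_change` and `compare` are invented for this illustration.

```python
# Executable sizes (bytes) and total user times (seconds, "sum 1-7"),
# copied from the table. Keys are variant numbers.
SIZES = {1: 2425144, 2: 2437764, 3: 2510988, 4: 2521100,
         5: 2394536, 6: 2529196, 7: 2476332}
SUMS = {1: 1401.47, 2: 1391.03, 3: 1304.65, 4: 1309.19,
        5: 1748.14, 6: 1315.80, 7: 1332.78}


def percent_change(new, base):
    """Relative change of new versus base, in percent."""
    return 100.0 * (new - base) / base


def compare(variant, baseline):
    """(size change %, total time change %) of variant versus baseline."""
    return (percent_change(SIZES[variant], SIZES[baseline]),
            percent_change(SUMS[variant], SUMS[baseline]))


if __name__ == "__main__":
    pairs = [
        (5, 1, "data aligning disabled"),
        (4, 3, "haifa enabled"),
        (6, 3, "code aligning enabled"),
        (7, 3, "strength-reduce disabled"),
    ]
    for variant, baseline, label in pairs:
        size_pct, time_pct = compare(variant, baseline)
        print(f"{label:26s} size {size_pct:+6.2f}%  time {time_pct:+6.2f}%")
```

A positive time change means the variant is slower than its baseline. Only the totals are compared here, so per-test differences such as those mentioned in conclusion 7 are not visible in this output.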
I will make a similar comparison for a program which doesn't use the FPU.
Bzip2 seems to be a good candidate.
Krzysztof