Message-Id: <m0yjP4R-000208C@chkw386.ch.pwr.wroc.pl>
Date: Tue, 9 Jun 98 14:08 
From: strasbur AT chkw386 DOT ch DOT pwr DOT wroc DOT pl (Krzysztof Strasburger)
To: beastium-list AT Desk DOT nl
Subject: Bzip2 - comparison of gcc vs pgcc
Sender: Marc Lehmann <pcg AT goof DOT com>


Comparison of the executable sizes and compression times for a large tar
file (32624640 bytes - linux sources + object files) for bzip2-0.1pl1.
Every variant of the executable has been linked with libgcc.a from gcc 2.7.2.3.
1. Gcc 2.7.2.3, -m386 -O2 -malign-jumps=0 -malign-loops=0 -malign-functions=0
   -fno-strength-reduce
2. Gcc 2.7.2.3, -m386 -O2 -malign-jumps=0 -malign-loops=0 -malign-functions=0
3. Pgcc 1.0.2, no haifa, -mpentium -O2 -malign-jumps=0 -malign-loops=0
   -malign-functions=0
4. Pgcc 1.0.2, no haifa, -mpentium -O2 -malign-jumps=0 -malign-loops=0
   -malign-functions=0 -fno-strength-reduce
5. Pgcc 1.0.2, no haifa, -mpentium -O3 -malign-jumps=0 -malign-loops=0
   -malign-functions=0
6. Pgcc 1.0.2, no haifa, -mpentium -O3 -malign-jumps=0 -malign-loops=0
   -malign-functions=0 -fno-inline-functions
6a.Pgcc 1.0.2, no haifa, -mpentium -O3 -malign-jumps=0 -malign-loops=0
   -malign-functions=0 -fno-inline-functions -fno-strength-reduce
7. Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
   -malign-functions=0
8. Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
   -malign-functions=0 -fno-inline-functions
8a.Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
   -malign-functions=0 -fno-inline-functions -fno-strength-reduce
9. Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
   -malign-functions=0 -fno-inline-functions -funroll-all-loops
10.Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
   -malign-functions=0 -funroll-all-loops
11.Pgcc 1.0.2 with haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
   -malign-functions=0 -funroll-all-loops
12.Pgcc 1.0.2 with haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
   -malign-functions=0 -funroll-all-loops -fno-inline-functions

Machine: Pentium 166 MMX
OS: linux-2.0.34
libc: 5.4.38


   Executa- |             execution times (seconds) - user only
   ble size |      level -1           level -5             level -9
   (bytes)  |  compr    decompr    compr     decompr    compr     decompr
1.  41808   | 167.66    40.33     184.87     45.13     199.93     45.66
2.  42304   | 173.75    40.68     189.03     45.43     203.85     45.94
3.  44420   | 168.78    42.87     179.12     47.14     193.21     47.66
4.  44036   | 166.72    41.39     180.00     47.08     192.85     47.48
5.  53844   | 163.09    42.48     173.02     47.28     186.83     47.33
6.  44324   | 167.09    42.69     178.59     46.76     190.71     47.33
6a. 43908   | 166.30    41.95     177.54     46.52     190.67     47.05
7.  56596   | 157.20    41.12     165.30     46.41     179.09     46.38
8.  46308   | 162.70    41.41     172.48     45.75     184.93     45.92
8a. 45764   | 164.42    41.23     172.53     45.54     186.06     45.47
9.  75428   | 151.59    40.18     161.68     45.13     176.30     45.27
10. 99508   | 150.21    40.86     163.66     45.44     177.69     45.48
11. 99476   | 147.11    39.38     162.86     44.13     176.66     44.68
12. 75396   | 149.11    39.35     161.22     44.10     175.99     44.34
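
For anyone who wants to repeat the measurement: the table lists user-mode CPU
time only. The message does not say which tool produced the numbers, so the
following is just a minimal sketch of one way to collect such a figure on a
Unix-like system. The helper name user_seconds and the command in the comment
are hypothetical.

```python
import os
import subprocess


def user_seconds(argv):
    """Run argv to completion and return the user-mode CPU seconds it used.

    os.times().children_user accumulates the user time of child processes
    that have been waited for, so the difference around one run is the
    user time of that run (on Unix-like systems).
    """
    before = os.times().children_user
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    after = os.times().children_user
    return after - before


# Hypothetical usage (file names are assumptions, not from the message):
#   print(user_seconds(["./bzip2", "-9", "-c", "linux.tar"]))
```

System time and wall-clock time are deliberately left out, matching the
"user only" column heading.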

The conclusions are quite different from those for the FPU-intensive program
GAMESS. The option -fno-strength-reduce improves the performance of the
program (at lower optimization levels). Unfortunately, -fno-inline-functions
makes the program slower (unlike with GAMESS). Loop unrolling gives a
large speedup, too.
Decompression is pretty fast with gcc 2.7.2.3 - faster than with most
pgcc-compiled executables.
I hate to say this, but those damned, bloated "9" and "10" things are the
fastest of the no-haifa builds. Even worse - all bloated variants are
_noticeably_ faster than the non-bloated ones when compressing. On the other
hand, combining -finline-functions and -funroll-all-loops (variant 10 versus
9) gives slightly faster code only at compression level -1.
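
To put rough numbers on the size-versus-speed trade-off, here is a small
sketch that compares a few variants at level -9 compression. The sizes and
times are copied from the table; taking variant 2 (plain gcc -O2) as the
baseline is an assumption made only for this illustration, and the function
names are hypothetical.

```python
# (executable size in bytes, compression user seconds at level -9),
# copied from the results table. Variant 2 is used as the baseline here;
# that choice is an assumption for illustration only.
VARIANTS = {
    "2": (42304, 203.85),
    "7": (56596, 179.09),
    "9": (75428, 176.30),
    "10": (99508, 177.69),
}
BASELINE = "2"


def time_saved_percent(name):
    """Percentage of compression time saved relative to the baseline."""
    base_time = VARIANTS[BASELINE][1]
    return (base_time - VARIANTS[name][1]) / base_time * 100.0


def size_growth_percent(name):
    """Percentage by which the executable grows relative to the baseline."""
    base_size = VARIANTS[BASELINE][0]
    return (VARIANTS[name][0] - base_size) / base_size * 100.0


for name in VARIANTS:
    print(f"variant {name:>2}: size {size_growth_percent(name):+6.1f}%, "
          f"time saved {time_saved_percent(name):5.1f}%")
```

The same two helpers work for any other column of the table if the numbers
in VARIANTS are swapped out.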
And what about haifa?
The code (tried for the fastest variants only) is a bit smaller - again a
difference in comparison with GAMESS. The program runs slightly faster - yet
another difference (but a different set of compilation options was used)...
Eh, life isn't perfect...
I'm going back to my FPU intensive programs.
Krzysztof