Date: Tue, 9 Jun 98 14:08
From: strasbur AT chkw386 DOT ch DOT pwr DOT wroc DOT pl (Krzysztof Strasburger)
To: beastium-list AT Desk DOT nl
Subject: Bzip2 - comparison of gcc vs pgcc
Sender: Marc Lehmann

Comparison of the executable sizes and compression times for a large tar
file (32624640 bytes - linux sources + object files) for bzip2-0.1pl1.
Every variant of the executable has been linked with libgcc.a from
gcc 2.7.2.3.

1.  Gcc 2.7.2.3, -m386 -O2 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -fno-strength-reduce
2.  Gcc 2.7.2.3, -m386 -O2 -malign-jumps=0 -malign-loops=0
    -malign-functions=0
3.  Pgcc 1.0.2, no haifa, -mpentium -O2 -malign-jumps=0 -malign-loops=0
    -malign-functions=0
4.  Pgcc 1.0.2, no haifa, -mpentium -O2 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -fno-strength-reduce
5.  Pgcc 1.0.2, no haifa, -mpentium -O3 -malign-jumps=0 -malign-loops=0
    -malign-functions=0
6.  Pgcc 1.0.2, no haifa, -mpentium -O3 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -fno-inline-functions
6a. Pgcc 1.0.2, no haifa, -mpentium -O3 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -fno-inline-functions -fno-strength-reduce
7.  Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
    -malign-functions=0
8.  Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -fno-inline-functions
8a. Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -fno-inline-functions -fno-strength-reduce
9.  Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -fno-inline-functions -funroll-all-loops
10. Pgcc 1.0.2, no haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -funroll-all-loops
11. Pgcc 1.0.2 with haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -funroll-all-loops
12. Pgcc 1.0.2 with haifa, -mpentium -O4 -malign-jumps=0 -malign-loops=0
    -malign-functions=0 -funroll-all-loops -fno-inline-functions

Machine: Pentium 166 MMX
OS:      linux-2.0.34
libc:    5.4.38

    Executable |  execution times (seconds) - user only
    size       |    level -1        level -5        level -9
    (bytes)    |   compr decompr   compr decompr   compr decompr
1.    41808    |  167.66   40.33  184.87   45.13  199.93   45.66
2.    42304    |  173.75   40.68  189.03   45.43  203.85   45.94
3.    44420    |  168.78   42.87  179.12   47.14  193.21   47.66
4.    44036    |  166.72   41.39  180.00   47.08  192.85   47.48
5.    53844    |  163.09   42.48  173.02   47.28  186.83   47.33
6.    44324    |  167.09   42.69  178.59   46.76  190.71   47.33
6a.   43908    |  166.30   41.95  177.54   46.52  190.67   47.05
7.    56596    |  157.20   41.12  165.30   46.41  179.09   46.38
8.    46308    |  162.70   41.41  172.48   45.75  184.93   45.92
8a.   45764    |  164.42   41.23  172.53   45.54  186.06   45.47
9.    75428    |  151.59   40.18  161.68   45.13  176.30   45.27
10.   99508    |  150.21   40.86  163.66   45.44  177.69   45.48
11.   99476    |  147.11   39.38  162.86   44.13  176.66   44.68
12.   75396    |  149.11   39.35  161.22   44.10  175.99   44.34

The conclusions are pretty different from those for the FPU intensive
program GAMESS. The option -fno-strength-reduce increases the performance
of the program (for lower optimization levels). Unfortunately,
-fno-inline-functions makes the program slower (unlike with GAMESS).
Unrolling of loops gives a large speedup, too.

Decompression is pretty fast with gcc 2.7.2.3 - faster than with most
pgcc compiled executables.

I hate to say this, but those damned, bloated "9" and "10" things are
fastest. Even worse - all bloated variants are _noticeably_ faster than
the non-bloated ones when compressing.
On the other hand, mixing -finline-functions and -funroll-all-loops gives
slightly faster code only for compression level -1.

And what about haifa? The code (tried for the fastest variants only) is
a bit smaller - again a difference in comparison with GAMESS. The program
runs slightly faster - yet another difference (but a different set of
compilation options was used)...

Eh, life isn't perfect... I'm going back to my FPU intensive programs.

Krzysztof
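
For anyone who wants to put numbers on the comparisons above, here is a
minimal Python sketch that computes the speedup and size growth of each
variant relative to variant 1. The sizes and timings are copied from the
table; the names (SIZES, TIMES, speedup, size_growth, fastest) and the
choice of variant 1 as the baseline are assumptions made for illustration
only, not part of the original measurements.

```python
# Executable sizes in bytes, copied from the table in this message.
SIZES = {
    "1": 41808, "2": 42304, "3": 44420, "4": 44036, "5": 53844,
    "6": 44324, "6a": 43908, "7": 56596, "8": 46308, "8a": 45764,
    "9": 75428, "10": 99508, "11": 99476, "12": 75396,
}

# User times in seconds, copied from the table in this message.
# Column order: (compr -1, decompr -1, compr -5, decompr -5, compr -9, decompr -9)
TIMES = {
    "1":  (167.66, 40.33, 184.87, 45.13, 199.93, 45.66),
    "2":  (173.75, 40.68, 189.03, 45.43, 203.85, 45.94),
    "3":  (168.78, 42.87, 179.12, 47.14, 193.21, 47.66),
    "4":  (166.72, 41.39, 180.00, 47.08, 192.85, 47.48),
    "5":  (163.09, 42.48, 173.02, 47.28, 186.83, 47.33),
    "6":  (167.09, 42.69, 178.59, 46.76, 190.71, 47.33),
    "6a": (166.30, 41.95, 177.54, 46.52, 190.67, 47.05),
    "7":  (157.20, 41.12, 165.30, 46.41, 179.09, 46.38),
    "8":  (162.70, 41.41, 172.48, 45.75, 184.93, 45.92),
    "8a": (164.42, 41.23, 172.53, 45.54, 186.06, 45.47),
    "9":  (151.59, 40.18, 161.68, 45.13, 176.30, 45.27),
    "10": (150.21, 40.86, 163.66, 45.44, 177.69, 45.48),
    "11": (147.11, 39.38, 162.86, 44.13, 176.66, 44.68),
    "12": (149.11, 39.35, 161.22, 44.10, 175.99, 44.34),
}

COLUMNS = ("compr -1", "decompr -1", "compr -5",
           "decompr -5", "compr -9", "decompr -9")

# Assumption: variant 1 (gcc 2.7.2.3 with -fno-strength-reduce) is the reference.
BASELINE = "1"


def speedup(variant, column, baseline=BASELINE):
    """Baseline time divided by variant time; above 1.0 means faster."""
    return TIMES[baseline][column] / TIMES[variant][column]


def size_growth(variant, baseline=BASELINE):
    """Variant size divided by baseline size; above 1.0 means larger."""
    return SIZES[variant] / SIZES[baseline]


def fastest(column):
    """Label of the variant with the lowest time in the given column."""
    return min(TIMES, key=lambda v: TIMES[v][column])


if __name__ == "__main__":
    for variant in TIMES:
        ratios = " ".join(f"{speedup(variant, c):5.3f}" for c in range(6))
        print(f"{variant:>3}  size x{size_growth(variant):4.2f}  {ratios}")
    for index, name in enumerate(COLUMNS):
        print(f"fastest for {name}: variant {fastest(index)}")
```

Running the script prints one row of ratios per variant followed by the
fastest variant for each column. These are single user-time measurements,
so small differences between neighbouring variants should be read with
some caution.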