Mail Archives: pgcc/1998/07/06/15:40:25
On Sun, 5 Jul 1998, Andrea Arcangeli wrote:
>I suggest you learn and use the gcc inline asm. The way gcc implements
>inline asm is so far the best. It allows gcc to optimize everything as
>well as possible.
Yes, except that I happen to hate AT&T syntax ;)
>true since for example the eax register does not have to be preserved at
>all. It would be nice to pass the last parameter of the function call in
>the eax register and the other parameters on the stack as usual. I think it
>would help performance a lot. I'll try to measure the improvement.
>Fast latency: 1007, normal latency 1307
[ not using EAX ] [ using EAX for arg pass ]
Interesting. I made some experiments too.
Test program: bzip2 0.1pl2
I added function prototypes for all functions in the program
(and removed those that already existed). I told the compiler
to use different numbers of register parameters, then
compiled the program and measured how long it took to
compress the uncompressed LyX 0.12.0 source tar file (7997440 bytes)
to /dev/null.
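As a rough sketch of what such a prototype looks like: the function
name mix3 and the REGPARM macro below are made up for illustration and
are not taken from bzip2 or from the patch. On 32-bit x86, gcc documents
regparm as passing the first arguments in EAX, EDX and ECX; the guard is
only there so the example also compiles on other targets.

```c
/* regparm is only meaningful on 32-bit x86, so hide it elsewhere. */
#if defined(__i386__)
#define REGPARM(n) __attribute__((regparm(n)))
#else
#define REGPARM(n) /* no effect on this target */
#endif

/* Hypothetical function: the prototype carries the attribute, so every
 * caller that sees it passes a, b and c in registers (on i386). */
int REGPARM(3) mix3(int a, int b, int c);

int REGPARM(3) mix3(int a, int b, int c)
{
    return a * 100 + b * 10 + c;
}
```

Changing the macro argument to 1 or 2 corresponds to cases 3 and 4 below;
the prototype and the definition have to agree, otherwise caller and
callee use different conventions.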
My test system: Pentium 120 MHz, 24 MB main memory, 32 MB
swap, Linux 2.0.34, gcc version 2.7.2. No other active
programs were eating CPU time in the background, but the
hard disk was accessed a few times, which shows that not
everything fit in the disk cache.
The tests show no significant speedup until I use all
3 registers, in which case it's about 6% faster than case 1
(about 5% faster than case 2).
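The percentage can be checked from the run times listed further down.
A small sketch of the arithmetic (the helper names mean and
speedup_percent are my own, nothing standard):

```c
#include <stddef.h>

/* Arithmetic mean of n run times. */
double mean(const double *xs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum / (double)n;
}

/* How many percent shorter the improved time is than the baseline. */
double speedup_percent(double baseline, double improved)
{
    return 100.0 * (baseline - improved) / baseline;
}
```

Fed with the four times of case 1 and case 5 this gives roughly 6.2%;
with case 2 as the baseline it gives roughly 5.3%.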
Question: why doesn't gcc allow more than 3 registers
to be used? The x86 has 7, or at least 6, free registers.
Each case first shows the compiler flags used, followed by
4 test runs. The times (labelled "clock count") are in real-time
seconds, measured with my own program that uses the RDTSC instruction.
The last number is the size in bytes of the stripped ELF executable
(so case 4 gives the smallest executable).
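The timing program itself is in the tarball, not shown here. A minimal
sketch of the general idea, under my own assumptions: the names read_tsc
and cycles_to_seconds are invented, the clock frequency has to be
supplied by the caller (120e6 for the Pentium above), and on non-x86
targets the counter read is stubbed out.

```c
#include <stdint.h>

/* Read the CPU time-stamp counter (x86 only; returns 0 elsewhere). */
uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    /* RDTSC puts the low 32 bits in EAX and the high 32 bits in EDX. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* Convert a cycle count to seconds for a CPU running at cpu_hz. */
double cycles_to_seconds(uint64_t cycles, double cpu_hz)
{
    return (double)cycles / cpu_hz;
}
```

One would read the counter before and after the bzip2 run and convert
the difference. This only measures real time correctly if the counter
runs at a constant, known rate.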
A patch for bzip2 and some more information are in the file
http://www.ee.oulu.fi/~tuukkat/regpass-test.tar.gz
Considerations:
- All libc calls used the conventional stack parameter passing
convention. This could be changed by breaking compatibility.
- Why doesn't the kernel use register parameters? It would be
ideal since it wouldn't break compatibility!
Does this test close the case? I don't think so. In particular,
the fact that case 5 performs so much better than any other
case makes me suspect that a lot more testing (of different
real-life programs) is needed.
Surprise, surprise: case 2 is faster than case 1!
CASE 1: no register parameter passing. Compiler-selected
inline functions.
-O3 -fomit-frame-pointer -funroll-loops -g
clock count: 100.54
clock count: 100.46
clock count: 100.77
clock count: 100.64
total clock count: 402.41 / 4
65544
CASE 2: no register parameter passing. No inline functions.
-O2 -fomit-frame-pointer -funroll-loops -g
clock count: 99.609
clock count: 99.731
clock count: 99.508
clock count: 99.617
total clock count: 398.46 / 4
54200
CASE 3: 1 register argument. No inline functions.
__attribute__ (( regparm(1) ))
-O2 -fomit-frame-pointer -funroll-loops -g
clock count: 100.14
clock count: 99.742
clock count: 100.12
clock count: 99.701
total clock count: 399.7 / 4
54040
CASE 4: 2 register arguments. No inline functions.
__attribute__ (( regparm(2) ))
-O2 -fomit-frame-pointer -funroll-loops -g
clock count: 99.725
clock count: 99.698
clock count: 99.44
clock count: 99.209
total clock count: 398.07 / 4
53896
CASE 5: 3 register arguments. No inline functions.
__attribute__ (( regparm(3) ))
-O2 -fomit-frame-pointer -funroll-loops -g
clock count: 94.509
clock count: 94.295
clock count: 94.171
clock count: 94.328
total clock count: 377.3 / 4
53912
(I'm CCing this to the pgcc list since I think those people
could be interested; maybe they could implement automatic
register passing for static functions?)
--
| Tuukka Toivonen <tuukkat AT ee DOT oulu DOT fi> [PGP public key
| Homepage: http://www.ee.oulu.fi/~tuukkat/ available]
| Try also finger -l tuukkat AT ee DOT oulu DOT fi
| Studying information engineering at the University of Oulu
+-----------------------------------------------------------