Mail Archives: djgpp/2000/03/06/18:54:02
From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: [long] gcc performance and possible bug
Date: 6 Mar 2000 22:25:39 GMT
Lines: 152
Message-ID: <8a1b91$33j7m$1@fu-berlin.de>
NNTP-Posting-Host: u-214.frankfurt3.ipdial.viaginterkom.de (62.180.18.214)
Mime-Version: 1.0
X-Trace: fu-berlin.de 952381539 3263734 62.180.18.214 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com
With the attached code, I get very weird performance results.
I tested the code with gcc 2952 and binutils 295 (djgpp203),
with gcc 2952 and binutils 281 (djgpp202) and with
gcc 260 and binutils 251 (djgpp1x) under plain DOS and in a WIN98
DOS window with the compiler options -O, -O2 and -O3.
In the following table, the first number is for function mwc32,
the second number for function mwc32c.
usec/call (plain DOS)
               -O              -O2              -O3
          mwc32  mwc32c   mwc32  mwc32c   mwc32  mwc32c
djgpp203  0.027  0.027    0.023  0.193    0.030  0.030
djgpp202  0.027  0.224    0.026  0.224    0.030  0.029
djgpp1x   0.070  0.250    0.080  0.236    0.053  0.239
usec/call (WIN98)
               -O              -O2              -O3
          mwc32  mwc32c   mwc32  mwc32c   mwc32  mwc32c
djgpp203  0.027  0.027    0.023  0.197    0.030  0.030
djgpp202  0.027  0.227    0.027  0.227    0.030  0.030
djgpp1x   0.070  0.250    0.081  0.240    0.053  0.244
You will note that there is sometimes almost an order of
magnitude difference between the performance of mwc32 and
mwc32c. The only difference between these functions is
the type of the variable mul (static unsigned long vs.
static const unsigned long). Whenever there is a significant
performance difference, mwc32c is the slower one.
I tested djgpp203 more thoroughly. In this case, -O and
-O3 seem to result in the same performance. But with minor
changes in the source code, I also got this order of magnitude
difference with -O and -O3.
On Linux, with gcc 2952 and binutils 295, I consistently get
0.027 usec/call for both mwc32 and mwc32c.
This code also seems to trigger a bug in gcc 2952.
Please look at the following sample output:
D:\RAND>gcc -O -Wall mwc32tst.c
D:\RAND>a
     mwc32: s=3051870873, used 3.626 CPU seconds 0.02702 usec/call
    mwc32c: s=3051870873, used 3.571 CPU seconds 0.02661 usec/call
D:\RAND>gcc -O2 -Wall mwc32tst.c
D:\RAND>a
    (null): s=3051870873, used 3.077 CPU seconds 0.02292 usec/call
    (null): s=3051870873, used 25.934 CPU seconds 0.19322 usec/call
    ^^^^^^
With -O3, everything works again. I get the (null) under Linux, too.
I do not get the (null) when compiling with gcc260.
All of this was tested with an AMD K6-2.
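For reference, the usec/call figures in the sample output follow directly from the CPU time and the call count (1UL << 27 in the attached program). A minimal sketch of that conversion, where the helper name usec_per_call is hypothetical and not part of the test program:

```c
/* Hypothetical helper: convert a measured CPU time in seconds into
   microseconds per call. */
double usec_per_call(double cpu_seconds, unsigned long calls)
{
    return 1e6 * cpu_seconds / (double)calls;
}
```

With 3.626 CPU seconds and 1UL << 27 calls this gives about 0.02702 usec/call, and with 25.934 CPU seconds about 0.19322 usec/call, matching the output above.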
Can you reproduce my weird results? Is there some stupid bug
in my code?
Regards,
Dieter
/* mwc32tst.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
unsigned long speed_loop(unsigned long (*tr)(void), unsigned long n)
{
  unsigned long s;
  s = 0;
  do
    s += tr();
  while (--n != 0);
  return s;
}
/* test the speed of function tr, take function call and loop
   overhead into account */
void speed(unsigned long (*tr)(void), unsigned long (*dummy)(void),
           unsigned long n, const char *description)
{
  clock_t anf, anfdum;
  unsigned long s;
  anfdum = clock();
  speed_loop(dummy, n);
  anfdum = clock() - anfdum;
  anf = clock();
  s = speed_loop(tr, n);
  anf = clock() - anf;
  anf -= anfdum;
  printf("%10s: s=%lu, used %.3f CPU seconds %.5f usec/call\n",
         description,
         s, (double)anf/CLOCKS_PER_SEC, 1e6/n*(double)anf/CLOCKS_PER_SEC);
}
#define CALLS (1UL << 27) /* Tune this as appropriate */
/* avoid inlining of these functions */
unsigned long dum_rand(void);
unsigned long mwc32(void);
unsigned long mwc32c(void);
int main(void)
{
  speed(mwc32, dum_rand, CALLS, "mwc32");
  speed(mwc32c, dum_rand, CALLS, "mwc32c");
  return 0;
}
/* dummy function, for comparison */
unsigned long dum_rand(void)
{
  return 0UL;
}
typedef unsigned long long ul64;
/* Two implementations of the multiply with carry RNG.
   The only difference is the type of mul */
static ul64 zseed = ((ul64)0x12345678UL<<32) | 0x87654321UL;
unsigned long mwc32(void)
{
  unsigned long l1, l2;
  ul64 res;
  static unsigned long mul=999996864UL;
  l1 = (unsigned long)(zseed & 0xffffffffUL);
  l2 = zseed>>32;
  res = l2+l1*(ul64)mul;
  zseed = res;
  return (unsigned long)(res & 0xffffffffUL);
}
static ul64 zseedc = ((ul64)0x12345678UL<<32) | 0x87654321UL;
unsigned long mwc32c(void)
{
  unsigned long l1, l2;
  ul64 res;
  static const unsigned long mul=999996864UL;
  l1 = (unsigned long)(zseedc & 0xffffffffUL);
  l2 = zseedc>>32;
  res = l2+l1*(ul64)mul;
  zseedc = res;
  return (unsigned long)(res & 0xffffffffUL);
}