Mail Archives: djgpp/2000/03/06/18:54:02

From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: [long] gcc performance and possible bug
Date: 6 Mar 2000 22:25:39 GMT
Lines: 152
Message-ID: <8a1b91$33j7m$1@fu-berlin.de>
NNTP-Posting-Host: u-214.frankfurt3.ipdial.viaginterkom.de (62.180.18.214)
Mime-Version: 1.0
X-Trace: fu-berlin.de 952381539 3263734 62.180.18.214 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

With the attached code, I get very weird performance results.
I tested the code with gcc 2952 and binutils 295 (djgpp203),
with gcc 2952 and binutils 281 (djgpp202) and with
gcc 260 and binutils 251 (djgpp1x) under plain DOS and in a WIN98
DOS window with the compiler options -O, -O2 and -O3.

In the following table, the first number is for function mwc32,
the second number for function mwc32c.

                    usec/call  (plain DOS)

                 -O               -O2              -O3
            mwc32  mwc32c    mwc32  mwc32c    mwc32  mwc32c
djgpp203    0.027   0.027    0.023   0.193    0.030   0.030
djgpp202    0.027   0.224    0.026   0.224    0.030   0.029
djgpp1x     0.070   0.250    0.080   0.236    0.053   0.239

                    usec/call  (WIN98)

                 -O               -O2              -O3
            mwc32  mwc32c    mwc32  mwc32c    mwc32  mwc32c
djgpp203    0.027   0.027    0.023   0.197    0.030   0.030
djgpp202    0.027   0.227    0.027   0.227    0.030   0.030
djgpp1x     0.070   0.250    0.081   0.240    0.053   0.244

Note that there is sometimes almost an order of magnitude
difference between the performance of mwc32 and mwc32c.
The only difference between these functions is the type of
the variable mul (static unsigned long vs. static const
unsigned long). Whenever there is a significant performance
difference, mwc32c is the slower one.

I tested djgpp203 more thoroughly. In this case, -O and
-O3 seem to result in the same performance. But with minor
changes in the source code, I also got this order of magnitude
difference with -O and -O3.

On Linux, with gcc 2952 and binutils 295, I consistently get
0.027 usec/call for mwc32 and mwc32c.

This code also seems to trigger a bug in gcc 2952.

Please look at the following sample output:

D:\RAND>gcc -O -Wall mwc32tst.c
D:\RAND>a
     mwc32: s=3051870873, used 3.626 CPU seconds 0.02702 usec/call
    mwc32c: s=3051870873, used 3.571 CPU seconds 0.02661 usec/call
D:\RAND>gcc -O2 -Wall mwc32tst.c
D:\RAND>a
    (null): s=3051870873, used 3.077 CPU seconds 0.02292 usec/call
    (null): s=3051870873, used 25.934 CPU seconds 0.19322 usec/call
    ^^^^^^

With -O3, everything works again. I also get the (null) under Linux.
I do not get the (null) when compiling with gcc 260.
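
As a cross-check of the usec/call figures in the sample output, the
conversion done in speed() can be sketched as below. The helper
usec_per_call is a hypothetical name that is not part of mwc32tst.c, and
it assumes the time passed in is already corrected for the dummy loop.

```c
#include <assert.h>

/* Hypothetical helper (not part of mwc32tst.c): converts a net CPU time
   in seconds, already corrected for the dummy loop, into microseconds
   per call. */
double usec_per_call(double net_seconds, unsigned long calls)
{
  assert(calls != 0);  /* a zero call count would divide by zero */
  return 1e6 * net_seconds / (double)calls;
}
```

With CALLS = 1UL << 27 (134217728 calls), 3.626 CPU seconds gives roughly
0.0270 usec/call and 25.934 CPU seconds gives roughly 0.1932 usec/call,
which agrees with the printed values.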

All of this was tested on an AMD K6-2.

Can you reproduce my weird results? Is there some stupid bug
in my code?

Regards,
Dieter


/* mwc32tst.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

unsigned long speed_loop(unsigned long (*tr)(void), unsigned long n)
{
  unsigned long s;
  s = 0;
  do
    s+=tr();
  while (--n != 0);
  return s;
}

/* test the speed of function tr, take function call and loop
   overhead into account */
void speed(unsigned long (*tr)(void), unsigned long (*dummy)(void),
	   unsigned long n, const char *description)
{
  clock_t anf, anfdum;
  unsigned long s;
  anfdum = clock();
  speed_loop(dummy, n);
  anfdum = clock() - anfdum;
  anf = clock();
  s = speed_loop(tr, n);
  anf = clock() - anf;
  anf -= anfdum;
  printf("%10s: s=%lu, used %.3f CPU seconds %.5f usec/call\n",
         description, s, (double)anf/CLOCKS_PER_SEC,
         1e6/n*(double)anf/CLOCKS_PER_SEC);
}


#define CALLS (1UL << 27)  /* Tune this as appropriate */

/* avoid inlining of these functions */
unsigned long dum_rand(void);
unsigned long mwc32(void);
unsigned long mwc32c(void);

int main(void)
{
  speed(mwc32, dum_rand, CALLS, "mwc32");
  speed(mwc32c, dum_rand, CALLS, "mwc32c");
  return 0;
}

/* dummy function, for comparison */
unsigned long dum_rand(void)
{
  return 0UL;
}

typedef unsigned long long ul64;

/* Two implementations of the multiply with carry RNG.
   The only difference is the type of mul */

static ul64 zseed = ((ul64)0x12345678UL<<32) | 0x87654321UL;

unsigned long mwc32(void)
{
  unsigned long l1, l2;
  ul64 res;
  static unsigned long mul=999996864UL;
  l1 = (unsigned long)(zseed & 0xffffffffUL);
  l2 = zseed>>32;
  res = l2+l1*(ul64)mul;
  zseed = res;
  return (unsigned long)(res & 0xffffffffUL);
}

static ul64 zseedc = ((ul64)0x12345678UL<<32) | 0x87654321UL;

unsigned long mwc32c(void)
{
  unsigned long l1, l2;
  ul64 res;
  static const unsigned long mul=999996864UL;
  l1 = (unsigned long)(zseedc & 0xffffffffUL);
  l2 = zseedc>>32;
  res = l2+l1*(ul64)mul;
  zseedc = res;
  return (unsigned long)(res & 0xffffffffUL);
}
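
For readers who want the recurrence spelled out, here is a minimal sketch
of one multiply-with-carry step using fixed-width types. It is not a
replacement for the test case above: the names mwc32_step and MWC_MUL are
hypothetical, and the state is passed by pointer rather than kept in a
static variable. The low 32 bits of the state hold the last output and the
high 32 bits hold the carry, as in mwc32 and mwc32c.

```c
#include <stdint.h>

/* Multiplier taken from mwc32tst.c. */
#define MWC_MUL UINT32_C(999996864)

/* Hypothetical helper: performs one multiply-with-carry step.
   *state keeps the carry in its high 32 bits and the previous
   output in its low 32 bits. */
uint32_t mwc32_step(uint64_t *state)
{
  uint32_t x = (uint32_t)(*state & 0xffffffffu);  /* previous output */
  uint32_t carry = (uint32_t)(*state >> 32);      /* previous carry */
  uint64_t res = (uint64_t)x * MWC_MUL + carry;   /* cannot overflow 64 bits */
  *state = res;                                   /* new carry and output */
  return (uint32_t)res;                           /* low 32 bits */
}
```

Using uint32_t and uint64_t keeps the arithmetic the same on platforms
where unsigned long is wider than 32 bits.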
