Mail Archives: djgpp/2000/03/08/14:26:00

From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: Re: [long] gcc performance and possible bug
Date: 8 Mar 2000 18:25:35 GMT
Lines: 120
Message-ID: <8a65uu$39fkt$1@fu-berlin.de>
References: <Pine DOT SUN DOT 3 DOT 91 DOT 1000307103019 DOT 21628J-100000 AT is>
NNTP-Posting-Host: pec-1-96.tnt1.s2.uunet.de (149.225.1.96)
Mime-Version: 1.0
X-Trace: fu-berlin.de 952539935 3456669 149.225.1.96 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Eli Zaretskii wrote:

>Did you look at the generated assembly?  That could provide important
>clues.

I slightly changed my source, to make the difference even more
obvious.

This is the context diff of the gcc -O2 -S output for the two versions.

*** const.s     Wed Mar  8 19:03:10 2000
--- nonconst.s  Wed Mar  8 19:04:52 2000
***************
*** 99,108 ****
  _zseed:
        .long -2023406815
        .long 305419896
- .text
        .p2align 2
  _mul.12:
        .long 999996864
        .p2align 2
  .globl _mwc32
  _mwc32:
--- 99,108 ----
  _zseed:
        .long -2023406815
        .long 305419896
        .p2align 2
  _mul.12:
        .long 999996864
+ .text
        .p2align 2
  .globl _mwc32
  _mwc32:

The only difference is that the variable mul (_mul.12) is placed in
the text segment in the const version and in the data segment
otherwise (as you would expect). Yet the const version is much
slower (by a factor of ten!).

To rule out a (hardware) problem with my system: could anybody
please try to reproduce my results by compiling the following
program with

gcc -O2 mwc32tst.c

and running a.exe, then uncommenting the const near the end of the
listing, recompiling, and rerunning? Please post or mail your
results, ideally including your processor and your versions of gcc
and binutils (I have an AMD K6-2 and tried various versions of gcc
and binutils, including gcc 2.95.2 and binutils 2.9.5).

Regards,
Dieter

/* mwc32tst.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define CALLS (1UL << 27)  /* Tune this as appropriate */

/* Call function pointed to by tr n times */
unsigned long speed_loop(unsigned long (*tr)(void), unsigned long n)
{
  unsigned long s;
  s = 0;
  do
    s+=tr();
  while (--n != 0);
  return s;
}

/* avoid inlining of these functions */
unsigned long dum_rand(void);
unsigned long mwc32(void);

/* test the speed of function mwc32, take function call and loop
   overhead into account */
int main(void)
{
  clock_t anf, anfdum;
  unsigned long s, n = CALLS;

  anfdum = clock();
  speed_loop(dum_rand, n);
  anfdum = clock() - anfdum;
  anf = clock();
  s = speed_loop(mwc32, n);
  anf = clock() - anf;
  anf -= anfdum;
  printf("s=%lu, used %.5f usec/call (w.o call overhead)\n",
         s, 1e6/n*(double)anf/CLOCKS_PER_SEC);
  return 0;
}

unsigned long dum_rand(void)
{
  return 0UL;
}

typedef unsigned long long ul64;

static ul64 zseed = ((ul64)0x12345678UL<<32) | 0x87654321UL;

/* Multiply with carry RNG */
unsigned long mwc32(void)
{
  unsigned long l1, l2;
  ul64 res;
  /* Uncommenting the const can make this function much slower,
     depending on compiler switches and the phase of the moon :-) */
  static /* const */ unsigned long mul=999996864UL;
  l1 = (unsigned long)(zseed & 0xffffffffUL);
  l2 = zseed>>32;
  res = l2+l1*(ul64)mul;
  zseed = res;
  return (unsigned long)(res & 0xffffffffUL);
}
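
For readers who want to check the arithmetic of mwc32() on its own,
here is a minimal sketch of the same multiply-with-carry step written
with fixed-width types. It assumes a C99 compiler with <stdint.h>;
the name mwc32_step and the idea of passing the state and multiplier
as parameters are illustrative only and are not part of the test
program above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One multiply-with-carry step (hypothetical helper, not from the
   listing).  The 64-bit state holds the current value in its low
   32 bits and the carry in its high 32 bits, like zseed above. */
static uint32_t mwc32_step(uint64_t *state, uint32_t mul)
{
  uint32_t x, carry;
  uint64_t res;

  assert(state != NULL);
  x = (uint32_t)(*state & 0xffffffffu);   /* l1 in the listing */
  carry = (uint32_t)(*state >> 32);       /* l2 in the listing */
  /* (2^32-1)*(2^32-1) + (2^32-1) < 2^64, so this cannot overflow */
  res = (uint64_t)x * mul + carry;
  *state = res;
  return (uint32_t)res;                   /* low 32 bits of res */
}
```

Calling mwc32_step(&s, 999996864u) with s initialised to
((uint64_t)0x12345678u << 32) | 0x87654321u should follow the same
sequence as mwc32(). Because the multiplier is an ordinary argument
here, this sketch says nothing about the const/non-const timing
difference; it only isolates the arithmetic.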
