Mail Archives: pgcc/1999/03/27/00:29:27

Sender: reimer AT schwan DOT e-technik DOT tu-ilmenau DOT de
Message-ID: <36FC6B66.D36CD9CE@e-technik.tu-ilmenau.de>
Date: Sat, 27 Mar 1999 06:23:50 +0100
From: Wolfgang Reimer <reimer AT e-technik DOT tu-ilmenau DOT de>
Organization: Technical Univ. of Ilmenau
X-Mailer: Mozilla 4.5 [en] (X11; I; Linux 2.1.125 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: pgcc AT delorie DOT com
Subject: [Fwd: Re: Aligning stack variables [8-byte operands]]
Reply-To: pgcc AT delorie DOT com

This is a multi-part message in MIME format.
--------------FABEDD86F58C5281C28DCE19
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

FYI

-------- Original Message --------
Subject: Re: Aligning stack variables [8-byte operands]
Date: Sat, 27 Mar 1999 06:19:22 +0100
From: Wolfgang Reimer <reimer AT e-technik DOT tu-ilmenau DOT de>
Organization: Technical Univ. of Ilmenau
To: John Wehle <john AT feith DOT com>
CC: holzloehner AT umbc DOT edu,Bernd Schmidt
<crux AT pool DOT informatik DOT rwth-aachen DOT de>,VP Developers
<devel AT virtualphotonics DOT com>,fftw AT theory DOT lcs DOT mit DOT edu
References: <199903270013 DOT TAA00443 AT jwlab DOT FEITH DOT COM>

John Wehle wrote:
> 
> >  I have tried compiling the FFTW package (a package from MIT for fast
> > Fourier transforms) with a custom option provided by the software
> > authors, that allowed to also align stack variables on 8-byte boundaries
> > -- the speed improvement was dramatic. I have been wondering for a long
> > time why it is so hard to align stack variables, when an option such as
> > -malign-double already can align globals?
> 
> Actually egcs always aligns constants, globals, statics for the x86 (assuming
> the object file format supports it) so it isn't necessary to use -malign-double
> for this purpose.  -malign-double changes the alignment rules for doubles
> which can improve the performance though it breaks binary compatibility.
> 
> The problem of aligning stack variables on 64 bit (or 128 bit) boundary is
> that there is no clear approach which is a guaranteed win.  The alignment
> can be accomplished by:
> 
>   1) Always keeping the stack aligned.
> 
>      a) Either by always preallocating the stack frame (including outgoing
>         argument space) in multiples of 64 bits (or 128 bits).  This means
>         that call arguments aren't pushed onto the stack, instead they are
>         moved to the stack.
> 
>           Negatives:  Move is bigger opcode than push.
> 
>           Pluses: Don't require adjustment at each call site.
> 
>      b) Or by adjusting the stack at each call site.
> 
>            Negatives: More instructions.
> 
>            Pluses: Can use push.
> 
>   2) Use a register to align the stack when necessary.
> 
>        Negatives: Burns a register.
> 
>        Pluses: Less instructions.
> 
> Keep in mind that integer code doesn't benefit from more alignment and
> that it's desirable to avoid impacting integer performance.  As with
> many things it is a question of tradeoffs.
> 
[ lines deleted ]
> 
> -- John
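
Option 2 in the list above comes down to rounding the stack pointer down
to the next 8-byte boundary on function entry while a spare register
keeps the original value. Below is a minimal sketch of just that
arithmetic in portable C, assuming a downward-growing stack. The names
align_down and align_padding are hypothetical (not taken from egcs), and
the addresses are plain integers, not a real stack pointer:

```c
#include <stdint.h>

/* Hypothetical helper, not taken from egcs: round addr down to a
 * multiple of align, which must be a power of two. */
uintptr_t align_down(uintptr_t addr, uintptr_t align) {
  return addr & ~(align - 1);
}

/* Bytes skipped below the incoming stack pointer by the adjustment. */
uintptr_t align_padding(uintptr_t addr, uintptr_t align) {
  return addr - align_down(addr, align);
}
```

For a made-up incoming stack pointer of 0xbffff7a4, align_down(sp, 8)
gives 0xbffff7a0, so 4 bytes are skipped.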

Hi John,

I created a small test program which shows impressively how important
8-byte stack alignment of 8-byte operands is to Intel Pentium
floating-point performance. The speedup of aligned versus unaligned code
is not just some ten percent (as is usual for other Pentium-specific
optimizations). On my dual PentiumPro/200 Linux box, properly
aligned code runs about 150% faster (a speed ratio of about 2.5!) than
misaligned code (I mean the alignment of doubles on the stack). And this
is true not only of my specially designed test program but also of
the FFTW (http://theory.lcs.mit.edu/~fftw) code, which is our most
essential application when doing split-step simulations of optical
fibers. So for me (and other number-crunching guys), aligning
8-byte operands (doubles) on the stack would improve Intel P6 performance
much more than any other sophisticated Pentium optimization strategy.
That's why it should be "Number ONE" on the TODO list of the egcs
compiler development group, IMHO.

With pgcc-2.91.60 19981201 (egcs-1.1.1 release) and previous versions
there is a compiler flag "-mstack-align-double", but unfortunately it
does not work properly. Sometimes all doubles are aligned properly and
sometimes all doubles are misaligned.
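
Since the flag cannot be relied on, one can check the alignment at run
time and, where it matters, carve an aligned block out of slightly
oversized storage by hand. The following is only a sketch of that idea:
the names is_aligned8 and aligned_doubles are hypothetical, and the
caller is assumed to supply storage (for example from alloca or malloc)
that is at least 7 bytes larger than the doubles it needs:

```c
#include <stdint.h>

/* Returns 1 if p sits on an 8-byte boundary, 0 otherwise. */
int is_aligned8(const void *p) {
  return ((uintptr_t) p & 0x7) == 0;
}

/* Hypothetical workaround: return the first 8-byte-aligned address at
 * or above raw. The caller must have reserved n * sizeof(double) + 7
 * bytes at raw to use the result as an array of n doubles. */
double *aligned_doubles(void *raw) {
  uintptr_t addr = ((uintptr_t) raw + 7) & ~(uintptr_t) 7;
  return (double *) addr;
}
```

The returned pointer is never more than 7 bytes above raw, so the cost
is a few wasted bytes per block rather than a slow path on every access.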
  
I attached my small test program, a tiny makefile, and the log file of
the output from the test run on my computer. The program can be built
and run by "make test". Additionally, I built a statically linked binary
(gzipped about 45k) which should run under all types of ELF Linux
systems. If somebody is interested, just let me know and I will send it.

Best regards,
-- 
Wolfgang Reimer (Dr.-Ing.)
 
T U I  --  Technical University of Ilmenau,  GERMANY, Thuringia
Address: TU Ilmenau, FEI/IKM, PF 100565, 98684 Ilmenau, GERMANY
http://ikmcip1.e-technik.tu-ilmenau.de  Phone: +49-3677-69-2619
mailto:reimer AT e-technik DOT tu-ilmenau DOT de   Fax  : +49-3677-69-1195
--------------FABEDD86F58C5281C28DCE19
Content-Type: text/plain; charset=us-ascii;
 name="stackalign.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="stackalign.c"

/* This small test program shows how strongly the run time on
 * Intel P5 and P6 depends on correct 8-byte alignment of 8-byte operands
 * (doubles) on the stack. On my P6 the speed ratio (misaligned/aligned)
 * is up to 2.5!
 *
 * When compiling with -O2 or higher you must compile this code with 
 *   -fno-inline-functions (or otherwise the compiler will expand
 *      the function test_loop() inline) and
 *   -ffloat-store (or otherwise the compiler will use registers only
 *      instead of accessing the stack).
 *
 * Author: Wolfgang M. Reimer, mailto:reimer AT e-technik DOT tu-ilmenau DOT de
 *         Idea of stack alignment manipulation and checking is
 *         stolen from FFTW code (http://theory.lcs.mit.edu/~fftw)
 * Date  : 99/03/27
 */

#include <sys/time.h>
#include <unistd.h>
#include <stdio.h>

#define LOOPS 10000000L

double test_loop(int *aligned) {
  double a=1, b=2, c=2, d=1, e=0.5, f=0.5, g=1, h=2, i=2, j=1, k=0.5;
  int z;

  /* check double alignment */
  *aligned = ((((long) &k) & 0x7) == 0);
  for (z = 0; z < LOOPS; z++) {
    a *= k; b *= a; c *= b; d *= c; e *= d;
    f *= e; g *= f; h *= g; i *= h; j *= i; k *= j;
  }

  return k;
}

double empty_loop(int *aligned) {
  double a=1, b=2, c=2, d=1, e=0.5, f=0.5, g=1, h=2, i=2, j=1, k=0.5;
  int z;

  /* check double alignment */
  *aligned = ((((long) &k) & 0x7) == 0);
  for (z = 0; z < LOOPS; z++) { /* empty loop */ }

  return k;
}

double time_diff(struct timeval t1, struct timeval t2)
{
     struct timeval diff;

     diff.tv_sec = t1.tv_sec - t2.tv_sec;
     diff.tv_usec = t1.tv_usec - t2.tv_usec;
     /* normalize */
     while (diff.tv_usec < 0) {
          diff.tv_usec += 1000000L;
          diff.tv_sec -= 1;
     }

     return diff.tv_usec * 1e-6 + diff.tv_sec;
}

#define GET_TIME(timex,alignedx)  \
  printf("Running ... "); fflush(stdout); \
  \
  /* time the test loop */ \
  gettimeofday(&t1, 0); \
  d = test_loop(&(alignedx)); \
  gettimeofday(&t2, 0); \
  time = time_diff(t2, t1); \
  \
  /* time the empty loop */ \
  gettimeofday(&t1, 0); \
  d = empty_loop(&a); \
  gettimeofday(&t2, 0); \
  (timex) = time - time_diff(t2, t1); \
  printf("with %s aligned doubles, run time was %g seconds.\n", \
         alignment[(alignedx)], (timex)); \
  if ((alignedx) != a) \
    printf("Alignment between test and empty loop differs!\n");

int main(void) {
  struct timeval t1, t2;
  double d, time, time1, time2, ratio;
  int    aligned1, aligned2, a;
  char   alignment[2][7] = {{" oddly"}, {"evenly"}};

  /* hack to align stack oddly */
  if (!(((long) (__builtin_alloca(0))) & 0x7)) __builtin_alloca(4);
  GET_TIME(time1, aligned1);

  /* hack to align stack evenly */
  if (((long) (__builtin_alloca(0))) & 0x7) __builtin_alloca(4);
  GET_TIME(time2, aligned2);

  if (aligned1) ratio = time2 / time1; else ratio = time1 / time2;
  printf("The speed ratio (odd/even) is %g!\n", ratio);

  return 0;
}


--------------FABEDD86F58C5281C28DCE19
Content-Type: text/plain; charset=us-ascii;
 name="makefile"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="makefile"

# Author: Wolfgang M. Reimer, mailto:reimer AT e-technik DOT tu-ilmenau DOT de
# Date  : 99/03/27
# Run "make test" to build and run the alignment test

#OPTIMIZER	= -O6 -mcpu=pentiumpro -malign-double -fomit-frame-pointer
OPTIMIZER	= -O0 -mcpu=pentiumpro -malign-double -fomit-frame-pointer

TARGET	= stackalign
CC	= gcc
CFLAGS	= -Wall -Wno-unused $(OPTIMIZER) -fno-inline-functions -ffloat-store

all:	$(TARGET)

test:	$(TARGET)
	@echo
	@echo "*********************  System info  **************************"
	uname -a
	@echo
	@echo "********************** $(CC) Version ***************************"
	$(CC) -v
	@echo
	@echo "***************** Running $(TARGET)  ************************"
	./$(TARGET)

clean:
	$(RM) -rf $(TARGET).s $(TARGET).o

distclean: clean
	$(RM) -rf $(TARGET)


--------------FABEDD86F58C5281C28DCE19
Content-Type: text/plain; charset=us-ascii;
 name="stackalign.log"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="stackalign.log"

[reimer AT schwan stackalign]$ make test
gcc -Wall -Wno-unused -O0 -mcpu=pentiumpro -malign-double -fomit-frame-pointer -fno-inline-functions -ffloat-store    stackalign.c   -o stackalign 
*********************  System info  **************************
uname -a
Linux schwan.e-technik.tu-ilmenau.de 2.1.125 #1 SMP Fri Nov 6 20:46:08 CET 1998 i686 unknown


********************** gcc Version ***************************
gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release)

***************** Running stackalign  ************************
./stackalign
Running ... with  oddly aligned doubles, run time was 14.2632 seconds.
Running ... with evenly aligned doubles, run time was 5.35753 seconds.
The speed ratio (odd/even) is 2.66227!


--------------FABEDD86F58C5281C28DCE19--
