delorie.com/archives/browse.cgi
Mail Archives: pgcc/2000/01/29/20:42:36

Date: Sun, 30 Jan 2000 01:14:44 +0100
From: Jan Hubicka <hubicka AT atrey DOT karlin DOT mff DOT cuni DOT cz>
To: Chris Sears <cbsears AT ix DOT netcom DOT com>
Cc: pgcc AT delorie DOT com
Subject: Re: pgcc and egcs alignment -- function, basic block and string
Message-ID: <20000130011444.A32728@atrey.karlin.mff.cuni.cz>
References: <38921CD6 DOT 2A725779 AT ix DOT netcom DOT com> <20000129032101 DOT A25630 AT atrey DOT karlin DOT mff DOT cuni DOT cz> <38927310 DOT 2033EED4 AT ix DOT netcom DOT com>
Mime-Version: 1.0
X-Mailer: Mutt 1.0i
In-Reply-To: <38927310.2033EED4@ix.netcom.com>; from cbsears@ix.netcom.com on Fri, Jan 28, 2000 at 08:56:48PM -0800
Reply-To: pgcc AT delorie DOT com
Errors-To: dj-admin AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

> 
> Is there a switch to turn this alignment off so that I could test it?
> -mcode-align?  Or does this turn off alignment of entry points
> as well?
There are the switches -malign-jumps and -malign-loops, which may do what you want.
> 
> > > In pgcc strings are being aligned to cache lines.
> > > But is alignment even necessary for strings?
> > It is. Consider the memset/memcpy/strlen expanders. These can work
> > much better when they know that the destination is word-size aligned.
> 
> I didn't quite understand this.  The string alignment now is to a
> cache line.
> 
>         .file "ioport.c"
>         .version "01.01"
> gcc2_compiled.:
>         .section .rodata
> .LC0:
>         .string "eip: %p\n"
>         .align 32
> .LC1:
>         .string "/home/chris/linux/include/asm/spinlock.h"
> 
> Admittedly, a cache line is word aligned as well,
> but wouldn't .align 4 suffice to align to a word boundary?
Yes. We were discussing this recently with Richard, and we will probably change this bit.
The rationale behind it is to place each string into as few cache lines as possible
(when a string starts near the end of a cache line, it may cross into one extra line).
But this needs some tuning.
> 
> If possible could you send me email telling me what happened.
I haven't had time to test it yet (I am preparing for an exam, and I've done another
100Kb patch to the function calling code, ehh...). I will do that tomorrow :)
and let you know.
> 
> > > So in summary, I think that functions should be aligned to cache lines
> > > and that basic blocks and strings should not be aligned at all.
> > Gcc doesn't align every basic block. It aligns the tops of loops, where
> > alignment to the ifetch block is necessary. A loop top appearing at the very
> > end of an ifetch block may cause stalls in the decoding process, IMO.
> > The second kind of alignment is done after barriers, where the situation is in
> > many respects equivalent to a function entry point.
> 
> The .p2align 4,,7 is misleading. It could probably be better
> read as .align 8, as the 7 represents a limit of 7 nops, which gas usually
> replaces with a do-nothing leal and a nop.
> 
> So given that, this can happen in four cases in a 32-byte cache line:
> 
>     bytes 0-7   + 7 gets aligned to bytes 7-15   -- alignment not done
>     bytes 8-15  + 7 gets aligned to 16           -- alignment to 16
>     bytes 16-23 + 7 gets aligned to 23-31        -- alignment not done
>     bytes 24-31 + 7 gets aligned to 32           -- alignment to 32
> 
> So half of the time it isn't being aligned anyway. In the second case,
> it seems a waste since the icache line will be in the buffer. No point.
> In the fourth case, I can see a point, especially if there is a jmp
> instruction and no nops will be executed.
The second case brings similar speedups to the fourth case, at least on CPUs
without out-of-order execution. I will do some tests of how this behaves on the Pentium.
But I believe that code starting near the end of the 16-byte prefetch buffer will cause
stalls too.
> 
> > Aligning to a 16-byte boundary can be quite a good tradeoff between code size
> > and cache-line fetching efficiency. While a function starting near the end of a
> > cache line is catastrophic, a function starting in the middle of it is not
> > so bad.
> > Again, the Intel optimization manual recommends this. I believe Intel did some
> > experiments before saying so.
> 
> 16 byte alignment for functions trades memory against cache footprint.
> I would strongly prefer cache and I would urge someone to look at this.
> In this case, I wouldn't take Intel's word.
The problem is that you need to align functions to the largest alignment used
in the code. If you use 32-byte alignment for loops, you need 32-byte
alignment for functions as well. Gas sets the alignment of the whole section
based on the .align directive at its start, and the other alignments are done
relative to this value.
At least this is my understanding of things. I may be mistaken here.

Honza



  Copyright © 2019   by DJ Delorie     Updated Jul 2019