Mail Archives: pgcc/2000/01/29/02:13:06

Sender: chris AT mindspring DOT com
Message-ID: <38927310.2033EED4@ix.netcom.com>
Date: Fri, 28 Jan 2000 20:56:48 -0800
From: Chris Sears <cbsears AT ix DOT netcom DOT com>
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.13-7mdk i686)
X-Accept-Language: en
MIME-Version: 1.0
To: hubicka AT atrey DOT karlin DOT mff DOT cuni DOT cz
CC: pgcc AT delorie DOT com
Subject: Re: pgcc and egcs alignment -- function, basic block and string
References: <38921CD6 DOT 2A725779 AT ix DOT netcom DOT com> <20000129032101 DOT A25630 AT atrey DOT karlin DOT mff DOT cuni DOT cz>
Reply-To: pgcc AT delorie DOT com

Jan,

Thanks for your reply.

> > In pgcc some basic blocks (loops?) are being aligned.
> > These 16 byte blocks are ifetch blocks.
> > Quoting Agner Fog, "While aligning data is always important,
> > aligning code is not necessary on the PPlain and PMMX."
>
> The alignment (4,,7) is consistent with Intel Optimizing Manual's
> recommendation. Changing this value might require quite extensive testing to
> prove your statement. For Pentium, the alignment 4,,7 seems to be a win
> according to my (simple) tests.

Is there a switch to turn this alignment off so that I could test it?
-mcode-align?  Or does this turn off alignment of entry points
as well?

> > In pgcc strings are being aligned to cache lines.
> > But is alignment even necessary for strings?
> It is. Consider memset/memcpy/strlen expanders. These can work
> much better when they know that destination is word size aligned.

I didn't quite understand this.  The strings are currently aligned to a
cache line, not just to a word:

            .file "ioport.c"
            .version "01.01"
        gcc2_compiled.:
            .section .rodata
        .LC0:
            .string "eip: %p\n"
            .align 32
        .LC1:
            .string "/home/chris/linux/include/asm/spinlock.h"

Admittedly, a cache line boundary is a word boundary as well,
but wouldn't .align 4 suffice to align to a word boundary?
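
To put a number on the difference, here is a minimal sketch in C of the
padding the assembler has to insert before .LC1 under each directive.
It assumes .LC0 sits at offset 0 of .rodata, which the listing does not
actually guarantee; pad_to and pad_after_first_string are hypothetical
helper names, not anything from gas or gcc.

```c
#include <stddef.h>

/* Bytes of padding needed so that `offset` becomes a multiple of
   `align`.  `align` must be a power of two. */
size_t pad_to(size_t offset, size_t align)
{
    return (align - (offset & (align - 1))) & (align - 1);
}

/* Padding inserted before the second string in the listing, ASSUMING
   .LC0 starts at offset 0 of .rodata (an assumption for illustration). */
size_t pad_after_first_string(size_t align)
{
    /* sizeof counts the terminating NUL, which .string also emits. */
    size_t end_of_lc0 = sizeof("eip: %p\n");
    return pad_to(end_of_lc0, align);
}
```

Under that assumption, pad_after_first_string(32) gives 23 bytes of
padding after a 9 byte string, while pad_after_first_string(4) gives 3.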

>
> I will verify this tomorrow and in case you are correct, I will fix this bug.
>
> (in both gas and gcc).

If possible, could you send me an email telling me what happened?

> > So in summary, I think that functions should be aligned to cache lines
> > and that basic blocks and strings should not be aligned at all.
> Gcc doesn't align every basic block. It uses alignments for top of loops, where
> the alignment to ifetch block is necessary. Top of loop appearing at the very
> end of ifetch blocks may cause stalls in the decoding process IMO.
> Second alignment is done after barriers, where situation is in many points
> of view equivalent to function entry point.

The .p2align 4,,7 is misleading.  It is probably better read as .align 8,
since the 7 is a limit of 7 bytes of padding (7 nops, which gas usually
replaces with a do-nothing leal and a nop).

So there are four cases for a label within a 32 byte cache line
(offsets 0 and 16 are already aligned):

    offsets 1-8   need 8-15 bytes of padding  -- alignment not done
    offsets 9-15  need 1-7 bytes of padding   -- alignment to 16
    offsets 17-24 need 8-15 bytes of padding  -- alignment not done
    offsets 25-31 need 1-7 bytes of padding   -- alignment to 32

So half of the time it isn't being aligned anyway.  In the second case,
it seems a waste since the icache line will already be in the buffer.
No point.  In the fourth case, I can see a point, especially if the
padding follows a jmp instruction so that no nops will be executed.
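
The case analysis can be checked with a small model in C.  This is only
a sketch of how I understand the gas max-skip argument to behave (no
padding at all when more than `max` bytes would be needed);
p2align_pad and aligned_after are hypothetical names for illustration.

```c
/* Model of `.p2align log2,,max`: number of padding bytes emitted when
   the location counter is at `offset`.  If more than `max` bytes would
   be needed, nothing is emitted. */
unsigned p2align_pad(unsigned offset, unsigned log2, unsigned max)
{
    unsigned align = 1u << log2;
    unsigned need = (align - (offset & (align - 1))) & (align - 1);
    return need <= max ? need : 0;
}

/* Over every starting offset in a 32 byte line, count how many labels
   sit on a 16 byte boundary after .p2align 4,,7. */
unsigned aligned_after(void)
{
    unsigned count = 0;
    for (unsigned off = 0; off < 32; off++) {
        unsigned end = off + p2align_pad(off, 4, 7);
        if (end % 16 == 0)
            count++;
    }
    return count;
}
```

In this model aligned_after() returns 16, so half of the 32 starting
offsets are left unaligned.  The exact boundaries are that offsets 9-15
and 25-31 get padded, while 1-8 and 17-24 do not.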

> Aligning to 16 byte boundary can be a quite good tradeoff between code size
> and cache line fetching efficiency. While function starting near end of
> cache line is catastrophic, function starting in the middle of it is not
> so bad.
> Again Intel Optimizing Manual recommends this. I believe Intel did some experiments
> before saying so.

16 byte alignment for functions trades memory against cache footprint.
I would strongly prefer to save cache, and I would urge someone to look
at this.  In this case, I wouldn't take Intel's word for it.
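
As a rough illustration of the cache footprint side of that trade, the
following C sketch counts how many 32 byte cache lines a function
touches depending on where it starts.  lines_touched is a hypothetical
helper, and the 20 byte function size used below is made up for the
example, not measured from any real code.

```c
/* Number of cache lines touched by `size` bytes of code starting at
   byte address `start`, with cache lines of `line` bytes.
   Requires size > 0. */
unsigned lines_touched(unsigned start, unsigned size, unsigned line)
{
    unsigned first = start / line;               /* line holding first byte */
    unsigned last  = (start + size - 1) / line;  /* line holding last byte  */
    return last - first + 1;
}
```

For a 20 byte function, lines_touched(0, 20, 32) is 1 but
lines_touched(16, 20, 32) is 2: starting in the middle of a line makes
a function that would fit in one cache line occupy two.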

To summarize:

    word alignment for strings -- .align 4, not .align 32
    cache line alignment for functions -- .align 32, not .align 4 (egcs)
        or .align 16 (pgcc)
    change loop body alignment to only the fourth quarter of a cache line,
        not .p2align 4,,7 -- .p2align probably can't do this
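
One candidate that might be worth testing for the last item is
.p2align 5,,7.  The C sketch below models it under the assumption that
the max-skip argument works the same way for a 32 byte boundary as for a
16 byte one; I have not tried it against gas, and pad_with_limit and
first_padded_offset are hypothetical names.

```c
/* Model of `.p2align log2,,max`: padding bytes emitted at `offset`,
   or 0 when more than `max` bytes would be required. */
unsigned pad_with_limit(unsigned offset, unsigned log2, unsigned max)
{
    unsigned align = 1u << log2;
    unsigned need = (align - (offset & (align - 1))) & (align - 1);
    return need <= max ? need : 0;
}

/* First offset within a 32 byte line at which .p2align 5,,7 would
   emit any padding; returns 32 if no offset qualifies. */
unsigned first_padded_offset(void)
{
    for (unsigned off = 1; off < 32; off++)
        if (pad_with_limit(off, 5, 7) != 0)
            return off;
    return 32;
}
```

In this model first_padded_offset() is 25, so only labels in the last
7 bytes of the cache line would be padded, and always up to the start
of the next line.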


Chris Sears
cbsears AT ix DOT netcom DOT com
