Date: Sun, 30 Nov 1997 10:57:57 -0800 (PST)
Message-Id: <199711301857.KAA28236@adit.ap.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: Michael Bukin , djgpp AT delorie DOT com
From: Nate Eldredge
Subject: Re: binutils 2.8.1 performance = slow
Precedence: bulk

At 07:59 11/29/1997 +0600, Michael Bukin wrote:
> Different versions of binutils produce different instructions
>for .align directive. E.g. for the following function:
>
>int test (void) {
>  __asm__ __volatile__ (".align 4\n\tmovl $0, %eax\n\t.align 4");
>  return 0;
>}
>
>bnu2.7:
>leal 0x0(%esi),%esi
>leal 0x0(%esi,1),%esi
>
>bnu2.8.1:
>leal 0x0(%esi),%esi
>leal 0x0(%edi,1),%edi
>
>This difference may have effects on instructions pipelining or something.
>I think, speed may depend on instructions near .align.

I would not expect this to be a problem in GCC-compiled code (as opposed to
inline asm). Looking at GCC assembler output, whenever GCC wants to align a
particular piece of code, it writes `.align N,0x90'. This pads with the NOP
instruction, which should be sufficiently fast. This avoids wondering what
the assembler will decide to pad with.

If the program which experienced the slowdown used inline asm, this could
explain it. The fix is to explicitly say to pad with NOPs, like GCC does.

Nate Eldredge
eldredge AT ap DOT net
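
To make the suggested fix concrete, here is a minimal sketch of the quoted test function with the fill byte given explicitly, as GCC does in its own output. It is only an illustration under some assumptions: a GCC-compatible compiler, GNU as, and an x86 target. The function name `test_nop_padded` is made up for this example, and the asm is written in extended form with an `eax` clobber (not in the original snippet) so the compiler knows the register is overwritten. On other targets the asm is skipped entirely.

```c
/* Sketch only: assumes GCC-style inline asm and GNU as on an x86 target.
   The name test_nop_padded is hypothetical, chosen for this example. */
int test_nop_padded(void)
{
#if defined(__i386__) || defined(__x86_64__)
    /* The second argument to .align is the fill byte; 0x90 is the
       one-byte NOP, so the assembler no longer chooses the padding.
       %%eax is doubled because this is extended asm, and eax is
       listed as clobbered since the movl overwrites it. */
    __asm__ __volatile__ (".align 4,0x90\n\t"
                          "movl $0, %%eax\n\t"
                          ".align 4,0x90"
                          : /* no outputs */
                          : /* no inputs */
                          : "eax");
#endif
    return 0;
}
```

Disassembling the result (for instance with objdump -d) should show runs of nop around the movl rather than the leal forms quoted above, though how many bytes of padding appear depends on where the code happens to fall and on how the object format interprets the `.align` argument.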