Mail Archives: djgpp/1997/12/22/10:16:11

Message-Id: <m0xk3uw-000S2aC@inti.gov.ar>
Comments: Authenticated sender is <salvador AT natacha DOT inti DOT gov DOT ar>
From: "Salvador Eduardo Tropea (SET)" <salvador AT inti DOT gov DOT ar>
Organization: INTI
To: djgpp AT delorie DOT com
Date: Mon, 22 Dec 1997 12:18:13 +0000
MIME-Version: 1.0
Subject: Bad limitation in BNU Alignment

/*
Hi All:

  I think there's a strong limitation (bug?) in djgpp's BNU 2.8.1 (I think 2.7
is the same):

Problem:
  BNU can't align to a 64-bit (or larger) boundary.

How to see it:
  Just ask for .align 3 or .balign 8 and you'll get only 32-bit alignment.
(Of course you may be lucky enough to get 64 bits, but not in my example/case.)

How it affects the code:
  My inline assembler shows that my CPU executes the same piece of code at
3 different speeds depending on whether I:
1) Align to 32 bits in the second 32-bit half of a 64-bit boundary.
2) Align to 32 bits in the first 32-bit half of a 64-bit boundary.
3) Align to a 64-bit boundary.
  As an example, one of my routines gives:
1) 367 ticks 2) 311 ticks 3) 300 ticks (a 22% difference!!!)

Note: I even suspect that the right alignment for the Cx5x86 is in fact 128
bits, because the internal bus (cache to CPU) is 128 bits wide.

How I found it:
  I was testing 3 versions of the routines and the speeds were totally
crazy: the better-optimized routines reported worse speeds. After figuring out
that the speed changed just by commenting out one of the routines, I started
to look into what the hell was going on.

The following code shows the misalignment: (Pepe==foo in my language ;-)

*/
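#include <stdio.h>

int main(int argc, char *argv[])
{
 unsigned char *pp;

 /* Ask for 8-byte alignment (.align 3 is 2^3 on this target), then load
    the address of the aligned label into EAX. */
 asm (".align 3\n"
      "Pepe:\n"
      " movl $Pepe,%%eax\n"
      : "=a" (pp) );

 /* Print the address and its offset inside an 8-byte block (0 = aligned). */
 printf("%X (%u)\n",(unsigned)pp,((unsigned)pp) & 7);
 return 0;
}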
/*
  The bug(?) is in the linker and not in AS.
  I tried .balign 16 in a .s file, assembled it with as and finally
disassembled it with objdump -d. The .o file is correctly aligned. But if I
make an exe with this file the alignment is totally broken.
  Looking deeper I saw the source of the problem:
  
*LD aligns to 32 bits when it joins .o files*

  That's the problem: GAS lays out every .o file as if it will start at
address 0 (fully aligned for anything), BUT since ld aligns each .o file
only to 32 bits (like adding .align 2 at the end of the .s file) you can't
get more than that.
  Now: Is there any way to configure that?
  (This is a serious problem because it can destroy Pentium optimizations.)

  Currently I'm using a workaround that is a little tricky:
  
1) I declare all the functions that need alignment in a section (.setali).
For that we need the section attribute enabled.
2) After each function that I send to this section I add a macro that
expands to asm(".balign 16");
3) I tweaked djgpp.djl to put the .setali section in the code segment,
128-bit aligned with respect to the last section inside the code.

  That works very well but needs a modified gcc and special specs and
djgpp.djl files (specs to be sure that ld isn't using the built-in script).
  As an advantage, it wastes memory only in the special section and not in
the whole program.

SET
*/
------------------------------------ 0 --------------------------------
Visit my home page: http://www.geocities.com/SiliconValley/Vista/6552/
Salvador Eduardo Tropea (SET). (Electronics Engineer)
Alternative e-mail: set-sot AT usa DOT net - ICQ: 2951574
Address: Curapaligue 2124, Caseros, 3 de Febrero
Buenos Aires, (1678), ARGENTINA
TE: +(541) 759 0013
