Mail Archives: djgpp/1997/12/22/10:16:11
/*
Hi All:
I think there's a serious limitation (bug?) in djgpp's BNU 2.8.1 (I think 2.7
is the same):
Problem:
BNU can't align code to a 64-bit (or larger) boundary.
How to see it:
Just ask for .align 3 or .balign 8 (both request 8 bytes here) and you'll get
only 32-bit alignment. (Of course you may be lucky enough to get 64 bits, but
not in my example/case.)
How it affects the code:
Timing my inline assembler code shows that my CPU executes the same piece of
code at 3 different speeds if I:
1) Align to 32 bits in the second 32-bit half of a 64-bit block.
2) Align to 32 bits in the first 32-bit half of a 64-bit block.
3) Align to a 64-bit boundary.
As an example, one of my routines gives:
1) 367 ticks 2) 311 ticks 3) 300 ticks (case 1 is 22% slower than case 3!!!)
Note: I even suspect that the right alignment for the Cx5x86 is actually 128
bits, because its internal bus (cache to CPU) is 128 bits wide.
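As a small illustration, a hypothetical helper can classify a code address by the 32-, 64- and 128-bit boundaries discussed above; only the low bits of the address matter. The enum and function names are invented for this sketch, and the 128-bit class only reflects the suspicion about the Cx5x86 bus, not a measured fact.

```c
#include <assert.h>
#include <stdint.h>

// Alignment classes for a code address (names invented for this sketch).
enum code_alignment {
  ALIGN_NONE,    // not even on a 32-bit boundary
  ALIGN_32_ODD,  // 32-bit aligned, second 32-bit half of a 64-bit block
  ALIGN_64,      // 64-bit aligned, but not 128-bit aligned
  ALIGN_128      // 128-bit aligned
};

// Looks only at the low 4 bits of the address.
enum code_alignment classify(uintptr_t addr)
{
  if (addr & 3)
    return ALIGN_NONE;    // addr % 4 != 0
  if (addr & 7)
    return ALIGN_32_ODD;  // addr % 8 == 4
  if (addr & 15)
    return ALIGN_64;      // addr % 16 == 8
  return ALIGN_128;       // addr % 16 == 0
}

// Spot checks with made-up addresses.
void classify_selftest(void)
{
  assert(classify(0x1244) == ALIGN_32_ODD);
  assert(classify(0x1250) == ALIGN_128);
}
```

Feeding it the address printed by the test program distinguishes "only 32-bit aligned" from the 64- and 128-bit cases at a glance.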
How I found it:
I was testing 3 versions of the routines and the timings were totally
crazy: the better-optimized routines reported worse speeds. After figuring
out that the speed changed just by commenting out one of the routines, I
started to dig into what the hell was going on.
The following code shows the misalignment (Pepe == foo in my language ;-):
*/
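#include <stdio.h>

int main(void)
{
  unsigned char *pp;

  /* Request 64-bit alignment (.align 3 = 2^3 bytes in djgpp's GAS), then
     load the address of the label that follows into EAX. */
  asm (".align 3\n"
       "Pepe:\n\t"
       "movl $Pepe,%%eax"
       : "=a" (pp));
  /* Print the address and its offset from a 64-bit boundary
     (0 means the requested alignment was honored). */
  printf("%X (%u)\n", (unsigned)pp, ((unsigned)pp) & 7);
  return 0;
}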
/*
The bug(?) is in the linker, not in AS.
I tried .balign 16 in a .s file, assembled it with as and then disassembled
it with objdump -d. The .o file is correctly aligned, but if I make an exe
from this file the alignment is totally broken.
Looking deeper, I found the source of the problem:
*LD aligns to 32 bits when it joins .o files*
That's the problem: GAS assembles every .o file as if it will start at
address 0 (fully aligned for anything), BUT since ld only aligns each .o
file to 32 bits (like adding .align 2 at the end of the .s file), you can't
get more than 32-bit alignment.
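The effect can be reproduced with a toy model of the linker's placement step. This is a minimal sketch, assuming (as described above) that the linker rounds each input .text section up to a fixed boundary and otherwise just concatenates them; the section sizes are made up and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Round addr up to the next multiple of align (align must be a power of two).
unsigned long align_up(unsigned long addr, unsigned long align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}

// Toy model of concatenating the .text sections of n .o files: each
// section starts at the current address rounded up to section_align.
// base[i] receives the address where file i's code starts.
void place_sections(const unsigned long *size, size_t n,
                    unsigned long start, unsigned long section_align,
                    unsigned long *base)
{
  unsigned long addr = start;
  for (size_t i = 0; i < n; i++) {
    addr = align_up(addr, section_align);
    base[i] = addr;
    addr += size[i];
  }
}
```

With two hypothetical objects whose first .text is 0x1234 bytes long, a label at offset 0x10 in the second file (16-byte aligned inside its own .o) lands at 0x1244 when sections are only 4-byte aligned, which is 4 bytes past a 16-byte boundary. With a 16-byte section alignment the same label lands at 0x1250. The alignment GAS computed is only relative to the start of the .o, so it survives linking only if the section start keeps at least that alignment.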
Now: is there any way to configure that?
(The problem is serious because it can destroy Pentium optimizations.)
Currently I'm using a workaround that is a little tricky:
1) I declare all the functions that need alignment in a special section
(.setali). For that we need the section attribute enabled.
2) After each function that I put in this section I add a macro that
expands to asm(".balign 16");
3) I tweaked djgpp.djl to put the .setali section in the code segment,
128-bit aligned with respect to the last section inside the code.
That works very well, but it needs a modified gcc and special specs and
djgpp.djl files (the specs make sure that ld isn't using its built-in script).
As an advantage, it wastes memory only in the special section and not in
the whole program.
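For comparison, here is a minimal sketch of the same idea with a current GCC on an x86 ELF target. That target is an assumption: this is not the djgpp setup described above, and it needs no modified gcc. The section attribute places two hypothetical hot routines in .setali, and GCC's aligned attribute on a function requests 16-byte (128-bit) alignment for its first instruction. SETALI, sum_bytes, max_int and check_setali_alignment are invented names.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical macro: put a function in a section named .setali and ask
// GCC to align its entry point to 16 bytes (128 bits).
#define SETALI __attribute__((section(".setali"), aligned(16), noinline))

// Hypothetical routines standing in for the speed-critical functions.
SETALI int sum_bytes(const unsigned char *p, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += p[i];
  return s;
}

SETALI int max_int(int a, int b)
{
  return a > b ? a : b;
}

// Checks that each entry point ended up on a 16-byte boundary in the
// linked program. Converting a function pointer to an integer is
// implementation-defined in ISO C; GCC supports it.
void check_setali_alignment(void)
{
  assert(((uintptr_t)sum_bytes & 15) == 0);
  assert(((uintptr_t)max_int & 15) == 0);
}
```

This only works because a modern GNU ld honors each input section's own alignment when it merges sections, which is exactly the behavior the djgpp ld above lacks. GCC's -falign-functions option is a global alternative, but like a plain .balign it pads every function instead of only the special section.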
SET
*/
------------------------------------ 0 --------------------------------
Visit my home page: http://www.geocities.com/SiliconValley/Vista/6552/
Salvador Eduardo Tropea (SET). (Electronics Engineer)
Alternative e-mail: set-sot AT usa DOT net - ICQ: 2951574
Address: Curapaligue 2124, Caseros, 3 de Febrero
Buenos Aires, (1678), ARGENTINA
TE: +(541) 759 0013