Mail Archives: cygwin/1999/10/10/18:57:50
I've been experimenting in the hope of clearing up some of the confusion
about alignments. I'm running the cygwin 19990819 snapshot, but I don't
think that makes a difference. Here are my observations:
Cygwin does not provide a way to get 64-bit data alignment, except through
Fortran COMMON blocks or the equivalent. Some commercial compilers share this
"feature." I believe some people claim that the Microsoft ABI forbids any
attempt to improve on this, even though it is a significant performance problem.
128-bit alignment is required to get satisfactory performance with C long
double, but proposals to support any such storage in g77 have been shot down
for now.
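
As an illustration only: when the toolchain won't align data for you, one
common workaround from C is to over-allocate and round the pointer up by hand.
The helper names align_up and alloc_aligned_doubles below are hypothetical,
and the sketch assumes a compiler that provides <stdint.h>. It does nothing
for statics or stack variables, which is where the real problem lies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round p up to the next multiple of align (align must be a power of two). */
static void *align_up(void *p, size_t align)
{
    uintptr_t u = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)align - 1;
    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)((u + mask) & ~mask);
}

/* Allocate n doubles whose first element sits on an align-byte boundary.
   *raw receives the pointer that must later be passed to free(). */
static double *alloc_aligned_doubles(size_t n, size_t align, void **raw)
{
    /* Over-allocate by align-1 bytes so rounding up never runs off the end. */
    *raw = malloc(n * sizeof(double) + align - 1);
    if (*raw == NULL)
        return NULL;
    return (double *)align_up(*raw, align);
}
```

Calling alloc_aligned_doubles(n, 8, &raw) would give 64-bit alignment and
alloc_aligned_doubles(n, 16, &raw) 128-bit alignment; free(raw), not the
returned pointer, releases the storage.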
The combination of gcc-2.95.1 and cygwin binutils did not configure code
alignments properly. Installing a recent binutils snapshot on cygwin and
re-configuring produces the expected .p2align 4,,7 scheme in the code
generated by 2.95.1, while 2.96 generates .align 16 instead. Either is an
improvement over the code I obtained with the cygwin binutils. However,
since cygwin does not support 128-bit alignments, neither takes full effect
at link time. On my test cases, the p2align 4,,7 code shows a net performance
deficit of 5% relative to the same code on the same P II under Linux, which I
attribute to improper code alignment. Individual
loops are affected by up to 30%, but some of what can be gained by changing
alignment in one place is nearly always lost somewhere else. The effect
seems smaller when running on a P III Xeon, but the Xeon box doesn't have a
decent timer under Linux. For timing I use the QueryPerformance..() calls in
NT, as I normally do in W95, since clock() is not useful on that box. The
results I get under NT, W2K, and W95 are consistent, provided that all
possible background processes have been shut off.
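
For readers unfamiliar with the directive, .p2align 4,,7 asks the assembler
to pad to a 16-byte (2^4) boundary, but only if that takes 7 bytes or fewer;
otherwise no padding is emitted at all. A minimal C sketch of that rule
follows. The function name p2align_padding is hypothetical, and offset is the
position within the section, so the final address is only 16-byte aligned if
the linker also places the section on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of padding that ".p2align log2_align,,max_skip" would emit at the
   given offset within a section: pad up to the next 2^log2_align boundary,
   unless that would need more than max_skip bytes, in which case pad 0. */
static size_t p2align_padding(size_t offset, unsigned log2_align,
                              size_t max_skip)
{
    size_t align, pad;
    assert(log2_align < 8 * sizeof(size_t));
    align = (size_t)1 << log2_align;
    pad = (align - (offset & (align - 1))) & (align - 1);
    return pad <= max_skip ? pad : 0;
}
```

For example, at offset 0x19 the directive pads 7 bytes to reach 0x20, while
at offset 0x18 it would need 8 bytes and therefore emits none.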
One aspect of .p2align has been acknowledged as a bug in gcc: the .p2align
directive is not placed at the top of the loop body for those loops which have
1 to 4 stack adjustment instructions above the point where the loop is entered
the first time. This produces a
significant performance hit when it causes a loop to occupy an extra cache
line. I corrected this by hand-editing the .s file in each case, so as to
isolate the differences leading to the conclusions stated above.
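
To make the "extra cache line" point concrete, here is a small sketch that
counts how many cache lines a block of code touches given its start address
and length. The function name cache_lines_spanned is hypothetical, and the
line size is a parameter; 32 bytes is the figure I would assume for the P II.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines touched by `length` bytes of code starting at
   address `start`, for a cache with `line_size`-byte lines. */
static size_t cache_lines_spanned(size_t start, size_t length,
                                  size_t line_size)
{
    size_t first, last;
    assert(line_size != 0 && length != 0);
    first = start / line_size;                 /* line holding first byte */
    last = (start + length - 1) / line_size;   /* line holding last byte  */
    return last - first + 1;
}
```

A 30-byte loop starting at address 25 touches two 32-byte lines, whereas the
same loop starting at address 32 fits in one, which is the kind of difference
a misplaced alignment directive can cause.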
I have also supplied some of my own math functions in order to eliminate
differences caused by the different libraries in cygwin (newlib) and linux
(glibc-2.1). It is somewhat of a sore point for me that all the common
libraries continue to carry various deficiencies in the math functions
(mainly performance problems as far as newlib is concerned). I note that
certain commercial compilers provide their own math libraries, not because
theirs are better (they aren't necessarily), but because doing so isolates
the compiler from changes in the operating environment.
Is there any possibility of cygwin addressing the problems associated with
lack of 64- or 128-bit alignment, or is this simply one of the performance
deficits we must accept?
Tim
tprince AT computer DOT org