From: N8TM AT aol DOT com
Date: Sun, 10 Oct 1999 18:56:49 EDT
Subject: default alignments restricted to 32 bits?
To: cygwin AT sourceware DOT cygnus DOT com

I've been experimenting in the hope of clearing up some of the confusion about alignments. I'm running the cygwin 19990819 snapshot, but I don't think that makes a difference. Here are my observations:

Cygwin does not provide a way to get 64-bit data alignment, except through Fortran COMMON blocks or the equivalent. Some commercial compilers share this "feature." I believe some people claim that the Microsoft ABI forbids any attempt to improve on this, even though the restriction carries a significant performance cost. 128-bit alignment is required to get satisfactory performance with C long double, but proposals to support any such storage in g77 have been shot down for now.

The combination of gcc-2.95.1 and the cygwin binutils did not configure code alignments properly. Installing a recent binutils snapshot on cygwin and reconfiguring 2.95.1 produces the expected .p2align 4,,7 scheme in the code generated by 2.95.1. That directive pads to a 16-byte boundary only when doing so takes no more than 7 bytes. With 2.96, the same setup generates .align 16 instead. Either is an improvement over the code I obtained with the cygwin binutils. However, since cygwin does not support 128-bit alignments, neither takes full effect at link time. On my test cases, I used the .p2align 4,,7 code and compared it with the same code running under linux on the same P II. Improper code alignment costs a net 5% in performance.
Individual loops are affected by up to 30%, but some of what can be gained by changing alignment in one place is nearly always lost somewhere else. The effect seems smaller on a P III Xeon, but the Xeon box doesn't have a decent timer under linux. Under NT I use the QueryPerformance..() functions, as I normally do under W95, since clock() is not useful on that box. The results I get under NT, W2K, and W95 are consistent, provided that all possible background processes have been shut off.

One aspect of .p2align has been acknowledged as a bug in gcc. The .p2align directive is not placed at the top of the loop body for loops that have 1 to 4 stack adjustment instructions above the point where the loop is first entered. This produces a significant performance hit when it causes a loop to occupy an extra cache line. I have corrected this by hand-editing the .s file in each case, so as to isolate the differences leading to the conclusions I have stated above.

I have also supplied some of my own math functions in order to eliminate differences caused by the different libraries in cygwin (newlib) and linux (glibc-2.1). This is something of a sore point for me: all the common libraries still carry various deficiencies in their math functions. As far as newlib is concerned, these are mainly performance problems. I note that certain commercial compilers provide their own math libraries. They do this not because their libraries are better (they aren't necessarily), but because doing so isolates them from changes in the operating environment.

Is there any possibility of cygwin addressing the problems associated with the lack of 64- or 128-bit alignment, or is this simply one of the performance deficits we must accept?

Tim
tprince AT computer DOT org