X-pop3-spooler: POP3MAIL 2.1.0 b 4 980420 -bs-
Message-ID: <19980714004206.16665@cerebro.laendle>
Date: Tue, 14 Jul 1998 00:42:06 +0200
From: Marc Lehmann
To: Misha
Cc: beastium
Subject: Re: PGCC's lack of optimizations... (slightly lengthy)
Mail-Followup-To: Misha , beastium
References: <35A9E060 DOT 34A50938 AT netvision DOT net DOT il>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <35A9E060.34A50938@netvision.net.il>; from Misha on Mon, Jul 13, 1998 at 01:24:32PM +0300
X-Operating-System: Linux version 2.1.108 (root AT cerebro) (gcc version pgcc-2.91.43 19980628 (gcc2 ss-980502 experimental))
Status: RO
Content-Length: 3663
Lines: 77

On Mon, Jul 13, 1998 at 01:24:32PM +0300, Misha wrote:
>
> I am trying to compile some number-crunching stuff on my Linux
> (PentiumII). I have both gcc-2.7.2.1 and pgcc-1.0.3.
> The point is that pgcc produces consistently WORSE code than gcc-2.7.2.1
> on both floating point and integer issues.
> In all cases it produces code that is approx. 5% to 25% slower on the PentiumII.
> I have read the entire pgcc documentation, so I believe I use all the appropriate

I guess the number crunching code is fpu-intensive? in that case, the
double alignment is absolutely essential, otherwise performance is
absolutely random and might well be much slower than with gcc.

which libc are you using? if you use libc5 or an earlier version of
glibc2.0.6, consider upgrading.

Have you used the -malign-double flag (and maybe -mstack-align-double)?
without these flags, fp-performance is random as well. (newer snapshots
align many static variables automatically, so it might be worth giving
them a try)

you might also want to try -mpentiumpro and -march=pentiumpro to ensure
egcs/pgcc actually produces code for your cpu.

(a final tip, independent of this issue: it often helps to use
-funroll-all-loops and/or -fschedule-insns)

The reason for all this is that the default x86 ABI specifies a highly
suboptimal alignment for doubles.
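To make the alignment point concrete, here is a minimal sketch in C. The struct and function names are made up for this illustration, and the 32-byte cache line used in the discussion is an assumption about the Pentium family rather than something stated in this mail. The idea: under the default 32-bit x86 ABI a double inside a struct only has to sit on a 4-byte boundary, so it can land at an offset such as 28 and span two cache lines, whereas an 8-byte-aligned double never does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical example struct: a char followed by a double.  Where the
   double lands depends on the ABI's alignment rule for doubles. */
struct sample {
    char   tag;
    double value;
};

/* Returns 1 if address p is a multiple of n (n must be a power of two). */
int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Returns 1 if an object of `size` bytes starting at byte offset `off`
   crosses a boundary of `line` bytes (for example a cache line). */
int straddles(size_t off, size_t size, size_t line)
{
    assert(line != 0 && size != 0);
    return off / line != (off + size - 1) / line;
}

/* Prints where the compiler actually placed the double in struct sample. */
void report_layout(void)
{
    printf("offsetof(struct sample, value) = %zu\n",
           offsetof(struct sample, value));
    printf("sizeof(struct sample)          = %zu\n",
           sizeof(struct sample));
}
```

With an assumed 32-byte line, straddles(28, 8, 32) is true while straddles(24, 8, 32) is false, which is the difference between a double that is merely 4-byte aligned and one that is 8-byte aligned. Calling report_layout() from a program built for 32-bit x86 with and without -malign-double should show the offset of value moving from 4 to 8; on other targets the numbers may differ.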
Changing this alignment breaks the ABI and _might_ require that you
re-compile all code including all libraries you use, so this can't be on
by default.

> I can't send you the code, but I can tell you that it is some sort of a DSP-kind

if you are sure you have the correct alignment, and the problem still
persists, I'd really like to get this problem fixed.

> It is a bit sad that the compiler that produces i486 code, produces better code than
> the compiler that produces Pentium code. I still hope I might be doing something wrong...

it's an interesting question who actually made an error. at the time the
x86 ABI was created, double alignment was not a problem. With modern cpus
(pentium and above) it is.

> 1. Is the problem known?

probably.

> 2. Are there any tools like SGI's "perfex" available for Linux?
> The "perfex" tool executes the code and then reports the statistics from the
> CPU internal event counters, so you have a picture of, say, how many L1 and L2
> cache misses were, the FPU unit utilization, mispredicted branches, etc...

there are various patches floating around that make use of the
performance monitoring registers under linux, but I do not have a
pointer ;(

> maximum optimization option, it produces the assembly code, but it also
> places some statistics on the success of optimizations in the code! For
> instance, in tight loops it gives you software pipelining success,
> parallelization success and CPU unit utilization in %.

If gcc only had this information itself... doing this for x86 is much
more difficult than for a sane architecture. The performance of code on
pentiums depends on such things as stepping or # of bugs fixed :()

> such a tool be available whether as a part of pgcc or otherwise.
> I would

Hmm, that would be way cool, yet I do not know what this might be good
for, except for tracking down bugs or similar tasks ;)

      -----==-                                              |
      ----==-- _                                            |
      ---==---(_)__  __ ____  __       Marc Lehmann       +--
      --==---/ / _ \/ // /\ \/ /       pcg AT goof DOT com |e|
      -=====/_/_//_/\_,_/ /_/\_\                          --+
    The choice of a GNU generation                        |
                                                          |