Message-ID: <35A9E060.34A50938@netvision.net.il>
Date: Mon, 13 Jul 1998 13:24:32 +0300
From: Misha
To: beastium-list AT Desk DOT nl
Subject: PGCC's lack of optimizations... (slightly lengthy)
Sender: Marc Lehmann

Hi.

I am new to the list, so I haven't yet read all the possible FAQs. I apologize in advance if this question has been raised thousands of times.

I am trying to compile some number-crunching code on my Linux box (Pentium II). I have both gcc-2.7.2.1 and pgcc-1.0.3. The point is that pgcc produces consistently WORSE code than gcc-2.7.2.1 for both floating-point and integer work. In all cases its code is approx. 5% to 25% slower on the Pentium II. I have read the entire pgcc documentation, so I believe I am using all the appropriate flags. I can't send you the code, but I can tell you that it is DSP-style code.

I have recently started to learn the P6 architecture and to browse through the generated assembly, to see exactly why it does worse. It is a bit sad that the compiler that produces i486 code produces better code than the compiler that produces Pentium code. I still hope I might be doing something wrong...

So finally I have two questions and one suggestion:

1. Is the problem known?

2. Are there any tools like SGI's "perfex" available for Linux? The "perfex" tool executes the code and then reports statistics from the CPU's internal event counters, so you get a picture of, say, how many L1 and L2 cache misses there were, the FPU utilization, mispredicted branches, etc.

And the suggestion:

The SGI compiler has a very cool compiler option.
When you ask it to produce assembly language code (-S) in conjunction with the maximum optimization option, it produces the assembly code, but it also places some statistics on the success of the optimizations in the code! For instance, in tight loops it gives you software pipelining success, parallelization success and CPU unit utilization in %.

It would be nice if such a tool were available, whether as part of pgcc or otherwise. I would gladly write it myself, but unfortunately my work schedule does not permit it.

Sincere regards, and keep up the great work!

Misha

Michael (Misha) Pak
Tel-Aviv
Israel.