Message-ID: <35A9E060.34A50938@netvision.net.il>
Date: Mon, 13 Jul 1998 13:24:32 +0300
From: Misha <vulcao AT netvision DOT net DOT il>
X-Mailer: Mozilla 4.04 [en] (X11; I; IRIX 5.3 IP22)
MIME-Version: 1.0
To: beastium-list AT Desk DOT nl
Subject: PGCC's lack of optimizations... (slightly lengthy)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: Marc Lehmann <pcg AT goof DOT com>

Hi.

I am new to the list, so I haven't yet read all the possible FAQs. I apologize
in advance if this question has been raised thousands of times.

I am trying to compile some number-crunching code on my Linux box
(Pentium II). I have both gcc-2.7.2.1 and pgcc-1.0.3.
The point is that pgcc produces consistently WORSE code than gcc-2.7.2.1
for both floating-point and integer work.
In all cases its code is approx. 5% to 25% slower on the Pentium II.
I have read the entire pgcc documentation, so I believe I am using all the
appropriate flags.
I can't send you the code, but I can tell you that it is DSP-style code.
I have recently started to learn the P6 architecture and to browse through the
generated assembly, to see exactly why pgcc does worse.
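
A comparison of this kind can be reproduced with a small timing harness. The
following is only a sketch: the FIR kernel is a hypothetical stand-in for the
DSP code (which is not shown here), and the names fir and time_fir are invented
for illustration.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in kernel: a direct-form FIR filter.
 * out[i] = sum over k < ntaps of taps[k] * in[i + k],
 * for every i with i + ntaps <= n. Does nothing if n < ntaps. */
static void fir(const double *in, size_t n,
                const double *taps, size_t ntaps, double *out)
{
    size_t i, k;

    for (i = 0; i + ntaps <= n; i++) {
        double acc = 0.0;
        for (k = 0; k < ntaps; k++)
            acc += taps[k] * in[i + k];
        out[i] = acc;
    }
}

/* Run the kernel `reps` times and return the CPU time used, in seconds,
 * as reported by clock(). A running sum of out[0] is stored in *sink so
 * that the optimizer cannot discard the filtering work as dead code. */
static double time_fir(const double *in, size_t n,
                       const double *taps, size_t ntaps,
                       double *out, int reps, double *sink)
{
    clock_t t0, t1;
    double sum = 0.0;
    int r;

    t0 = clock();
    for (r = 0; r < reps; r++) {
        fir(in, n, taps, ntaps, out);
        if (n >= ntaps)
            sum += out[0];
    }
    t1 = clock();

    *sink = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling time_fir from a main() that fills the input buffer, building the same
source file once with each compiler, and running both binaries on an otherwise
idle machine gives directly comparable numbers. With a large enough reps the
coarse resolution of clock() stops mattering.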

It is a bit sad that the compiler that produces i486 code produces better code than
the compiler that produces Pentium code. I still hope I might be doing something wrong...

So finally I have two questions and one suggestion:
1.  Is the problem known?
2.  Are there any tools like SGI's "perfex" available for Linux?
     The "perfex" tool executes the code and then reports statistics from the
     CPU's internal event counters, so you get a picture of, say, how many L1 and L2
     cache misses there were, the FPU utilization, mispredicted branches, etc...

And the suggestion:
The SGI compiler has a very cool compiler option. When you ask it
to produce assembly language code (-S) in conjunction with the maximum
optimization option, it produces the assembly code, but it also places some statistics
on the success of optimizations in the code!
For instance, in tight loops it gives you software pipelining success, parallelization
success and CPU unit utilization in %.
It would be nice if such a tool were available, whether as part of pgcc or otherwise.
I would gladly write it myself, but unfortunately my work schedule does not
permit it.

Sincere regards, and keep up the great work!
Misha

Michael (Misha) Pak
Tel-Aviv
Israel.