Message-Id: <3.0.32.19980719215256.0098e2c0@xs4all.nl>
X-Sender: diep AT xs4all DOT nl
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Sun, 19 Jul 1998 21:54:08 +0100
To: Marc Lehmann <pcg AT goof DOT com>, beastium-list AT desk DOT nl
From: Vincent Diepeveen <diep AT xs4all DOT nl>
Subject: Re: speed PGCC vs GCC for DIEP
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: Marc Lehmann <pcg AT goof DOT com>

At 03:17 PM 7/20/98 +0200, you wrote:
>On Mon, Jul 20, 1998 at 01:33:58PM +0000, Vincent Diepeveen wrote:
>> Hello,
 
>> I'd be glad to receive additional optimizations I can try on it. Also,
>> how can I disassemble the code of pgcc, so that I can report what can be
>> improved during optimization (assuming you would like to receive this)?

>you can use objdump to disassemble the object files or binaries, or, even
>better, just instruct gcc to skip the assembler and output assembly code.

I'll try that.
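
For reference, a minimal sketch of both routes. The file name count.c and the
function count_nonzero are hypothetical stand-ins, not DIEP code; the commands
in the comment use gcc's -S and -c options and objdump's -d option.

```c
/* count.c: hypothetical stand-in file, not part of DIEP.
 *
 * Stop after compilation and write assembly instead of an object file:
 *     gcc -O2 -S count.c -o count.s
 *
 * Or build the object file first and disassemble it afterwards:
 *     gcc -O2 -c count.c -o count.o
 *     objdump -d count.o
 */
#include <stddef.h>

/* Count the nonzero entries in a table of 8-bit values. */
int count_nonzero(const unsigned char *table, size_t n)
{
    size_t i;
    int count = 0;

    for (i = 0; i < n; i++) {
        if (table[i] != 0)
            count++;
    }
    return count;
}
```

Comparing the .s files produced with different option strings should show
where the generated loops differ.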

Right now I'm trying several optimizations.

>> Here is the makefile; notes about speed are written by the CFLAGS.

>this is very interesting, I can't explain it. Can you send me the source,
>or part of it which I can use as a benchmark? Without source to check
>I can't help you.
>

Nice try, but my program is not GNU.

I'll study the assembler output of pgcc and the optimization failures, and
will email you personally the modified C code and the labels that are causing
problems and compile to that horrible assembler.

As I pointed out, the main difference in optimizing is the 32-bit versus
8-bit data structure in the loop I already gave you, and I don't see why
you need a mov???? instruction to convert unsigned char to int; that's another
clock cycle wasted. Visual luckily doesn't do that. It XORs EAX,EAX to do it.

Perhaps it works better if I write it in capitals:
it's UNSIGNED CHAR, so you don't
have problems with sign. You can put it directly in EAX.
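
To make the point concrete, here is a minimal sketch. The function names and
tables are hypothetical, not the DIEP loop. Reading an unsigned char into an
int only ever needs zero extension, which a compiler can emit either as one
zero-extending load or as clearing the full register first and then loading
the low byte; which of the two is cheaper depends on the processor.

```c
#include <stddef.h>

/* Hypothetical example, not DIEP code: sum a table of 8-bit entries. */
int sum_bytes(const unsigned char *table, size_t n)
{
    size_t i;
    int total = 0;

    for (i = 0; i < n; i++) {
        /* unsigned char -> int: the value is 0..255, so the upper bits
           are always zero and no sign handling is needed. */
        total += table[i];
    }
    return total;
}

/* The same loop over int entries: no widening step at all. Assuming a
   target where int is 32 bits, each entry takes four times the memory. */
int sum_words(const int *table, size_t n)
{
    size_t i;
    int total = 0;

    for (i = 0; i < n; i++)
        total += table[i];
    return total;
}
```

Both functions return the same result for the same values; only the load in
the inner loop differs.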

New results for the option strings:

-O6   this is 15% slower than gcc 2.7.2.3
-O6   -malign-double -funroll-all-loops -malign-functions=2 -malign-jumps=2
      -mamdk6
      this is 2% slower than gcc
-O2   with the same string (I hope I typed it over correctly from my Pro;
      anyway, I cut and pasted it from an email from a guy called
      Ph. Elbaz Vincent)
      this is 5% slower than gcc

That was expected to be slower, because I have a Pentium Pro and not a
cheap K6, which performs well for old programs but sucks for the new 32-bit
compiled programs.

Old compiles of my program run quite fast on the K6; after I made 32-bit
code out of my program it became slower, because the K6 breaks down 8-bit
code quicker than 32-bit code. It needs fewer micro-ops for it.

Now a K6-350MHz with SDRAM on a 112MHz bus is slower than a PII-300 with
EDO RAM.

A few months ago the K6-200 used to be as fast as the Pentium Pro 200...
...right now nothing can stop the PII, especially because I can soon run in
parallel on it. The AMD/IBM K6 and Cyrix M2 regrettably can't be put on a
dual or quad motherboard.

Anyway, the K6 is fast considering its price, but is it smart for a
compiler to make optimizations for an outdated socket 7 clone?

Assuming I have the choice when selling software, I'll NEVER deliver a
K6-optimized version, but always a PII/Pro-optimized one that also runs on
the Pentium (so without incompatible instructions like cmov).

I'll do that because I think socket 7 is outdated. No future.
Look at what they did to the level 2 cache. They put it on the mainboard!!
Awful! So socket 7 has no future: the faster your processor, the less
you profit from it, as level 2 cache speed kills you.

That's what happens to my program when running on a 350+ MHz K6: it runs
slow on it. Slower than on a PII-266 with SDRAM even, and one of the reasons
is the level 2 cache.

Vincent