Mail Archives: pgcc/1998/07/20/19:55:14
At 03:17 PM 7/20/98 +0200, you wrote:
>On Mon, Jul 20, 1998 at 01:33:58PM +0000, Vincent Diepeveen wrote:
>> Hello,
>> I'd be glad to receive additional optimizations I can try on it, and also:
>> how can I disassemble the code of pgcc, so that I can report what can be
>> improved during optimization (I assume you'd like to receive this)?
>you can use objdump to disassemble the object files or binaries, or, even
>better, just instruct gcc to skip the assembler and output assembly code.
I'll try that.
Right now I'm trying several optimizations.
>> Here is the makefile; the notes about speed are written next to the CFLAGS.
>this is very interesting, I can't explain it. Can you send me the source,
>or part of it which I can use as a benchmark? Without source to check
>I can't help you.
>
Nice try, but my program is not GNU.
I'll study the assembler output of pgcc and the optimization failures, and
will email you personally the changed C code and the labels that cause
problems and compile to that horrible assembler.
As I pointed out, the main difference in optimizing is the 32-bit versus
8-bit data structure in the loop I already gave you, and I don't see why
you need a mov???? instruction to convert unsigned char to int; that's another
clock cycle wasted. Visual C++, luckily, doesn't do that. It uses XOR EAX,EAX
to do it.
Perhaps writing it in capitals will help:
it's UNSIGNED CHAR, so you don't
have problems with sign. You can put it directly in EAX.
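
A minimal sketch of the kind of loop under discussion. This is not the loop
from the program (that source is not shown here); the function names
sum_bytes and sum_words and the table layout are hypothetical, chosen only
to illustrate the 8-bit versus 32-bit difference. Which instructions a given
compiler emits for the widening step is an assumption to be checked in its
assembler output.

```c
#include <stddef.h>

/* Hypothetical 8-bit version: each element is an unsigned char that must be
 * widened to 32 bits before it can be added to the total. Because the type
 * is unsigned, the widening is a zero-extension; no sign handling is needed.
 * A compiler may do this with a dedicated zero-extending move, or by
 * clearing the full register first and then loading the low byte. */
unsigned int sum_bytes(const unsigned char *table, size_t n)
{
    unsigned int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += table[i];      /* unsigned char -> int, zero-extended */

    return total;
}

/* Hypothetical 32-bit version of the same loop: the elements are already
 * register-sized, so no widening step is needed at all. */
unsigned int sum_words(const unsigned int *table, size_t n)
{
    unsigned int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += table[i];      /* plain 32-bit load and add */

    return total;
}
```

Compiling a file like this with each set of options and comparing the
assembler output for the two loops should show where the extra conversion
instruction appears.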
New results for the option strings:

-O6
    15% slower than gcc 2.7.2.3

-O6 -malign-double -funroll-all-loops -malign-functions=2 -malign-jumps=2
-mamdk6
    2% slower than gcc

-O2 plus the same string (I hope I typed it over correctly from my Pro;
anyway, I cut and pasted it from an email by a guy called Ph. Elbaz Vincent)
    5% slower than gcc
It was to be expected that this would be slower, because I have a Pentium Pro
and not a cheap K6, which performs well for old programs but sucks for the
new 32-bit compiled programs.
Old compiles of my program run quite fast on the K6. After I made 32-bit
code out of my program it became slower, because the K6 breaks down 8-bit
code quicker than 32-bit code. It needs fewer micro-ops for it.
Now a K6-350MHz with SDRAM on a 112MHz bus is slower than a PII-300 with EDO RAM.
A few months ago the K6-200 used to be as fast as the Pentium Pro 200...
...right now nothing can stop the PII, especially since I can soon
run in parallel on it. The AMD/IBM K6 and Cyrix M2 regrettably can't be
put on a dual or quad motherboard.
Anyway, the K6 is fast considering its price, but is it smart for a
compiler to make optimizations for an outdated socket 7 clone?
Assuming I have the choice when selling software, I'll
NEVER deliver a K6-optimized version, but always a PII/Pro-optimized one
which also runs on a Pentium (so without incompatible instructions like cmove).
I'll do that because I think socket 7 is outdated. No future.
Look at what they did to the level 2 cache. They put it on the mainboard!!
Awful! So socket 7 has no future: the faster your processor, the less
you profit from it, as the level 2 cache speed kills you.
That's what happens to my program when running on a 350+ MHz K6: it runs
slowly on it. Slower than on a PII-266 with SDRAM even, and one of the
reasons is the level 2 cache.
Vincent