X-pop3-spooler: POP3MAIL 2.1.0 b 4 980420 -bs-
Message-Id: <3.0.32.19980719215256.0098e2c0@xs4all.nl>
X-Sender: diep AT xs4all DOT nl
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Sun, 19 Jul 1998 21:54:08 +0100
To: Marc Lehmann, beastium-list AT desk DOT nl
From: Vincent Diepeveen
Subject: Re: speed PGCC vs GCC for DIEP
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: Marc Lehmann
Status: RO
X-Status: A
Content-Length: 3294
Lines: 84

At 03:17 PM 7/20/98 +0200, you wrote:
>On Mon, Jul 20, 1998 at 01:33:58PM +0000, Vincent Diepeveen wrote:
>> Hello,
>> I'd be glad to receive additional optimizations I can try on it, and also
>> how can I disassemble the code of pgcc, so that I can report what can be
>> improved during optimization (I assume you'd like to receive this)?

>you can use objdump to disassemble the object files or binaries, or, even
>better, just instruct gcc to skip the assembler and output assembly code.

I'll try that. Right now I'm trying several optimizations.

>> Here the makefile and notes about speed are written by CFLAGS.

>this is very interesting, I can't explain it. Can you send me the source,
>or part of it which I can use as a benchmark? Without source to check
>I can't help you.
>

Nice try, but my program is not GNU. I'll study the assembler output of
pgcc and the optimization failures, and will email you personally the
changed C code and the labels that are causing problems and compile to
that horrible assembler.

As I pointed out, the main difference in optimizing is the 32-bit versus
8-bit data structure, in the loop I already gave you, and I don't see why
you need a mov???? instruction to convert unsigned char to int; that's
another clock cycle wasted. Visual luckily doesn't do that. It XORs
EAX,EAX to do it.

Perhaps writing it in capitals might work: it's UNSIGNED CHAR, so you
don't have problems with sign.
You can put it directly in eax.

New results for strings:

-O6
    is 15% slower than gcc 2.7.2.3

-O6 -malign-double -funroll-all-loops -malign-functions=2 -malign-jumps=2 -mamdk6
    this is 2% slower than gcc

-O2 the same string (hope I typed it over correctly from my Pro; anyway,
I cut'n'pasted that email from a guy called Ph. Elbaz Vincent)
    this is 5% slower than gcc.

That was to be expected to be slower, because I have a Pentium Pro and
not a cheap K6, which performs well for old programs but sucks for the
new 32-bit compiled programs.

Old compiles of my program run quite fast on the K6. After I made 32-bit
code out of my program it has become slower, because the K6 breaks down
8-bit code quicker than 32-bit code. It needs fewer micro-ops for it.

Now a K6-350 MHz with SDRAM on a 112 MHz bus is slower than a PII-300
with EDO RAM. A few months ago the K6-200 used to be as fast as the
Pentium Pro 200...

...right now nothing can stop the PII, especially because I can soon run
in parallel on it. The AMD/IBM K6 and Cyrix M2 regrettably can't be put
on a dual or quad motherboard.

Anyway, the K6 is fast considering its price, but is it smart for a
compiler to make optimizations for an outdated socket 7 clone?

Assuming I have the choice when selling software, I'll NEVER deliver a
K6-optimized version, but always a PII/Pro-optimized one which also runs
on a Pentium (so without incompatible instructions like cmove).

I'll do that because I think socket 7 is outdated. No future. Look at
what they did to the level 2 cache. They put it on the mainboard!!
Awful!

So socket 7 has no future: the faster your processor, the less you
profit from it, as level 2 cache speed kills you. That's what happens to
my program when running on a 350+ MHz K6: it runs slow on it. Slower
even than on a PII-266 with SDRAM, and one of the reasons is the level 2
cache.

Vincent
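
For anyone who wants to reproduce the unsigned char to int question without the DIEP source (which is not public), here is a minimal sketch of the kind of loop being discussed. It is NOT the actual loop from the program; the function names sum_bytes and sum_words are hypothetical and only illustrate the 8-bit versus 32-bit data structure difference.

```c
#include <stddef.h>

/* Hypothetical stand-in for the 8-bit data structure case:
 * every element is widened from unsigned char to int before the add.
 * Because the source type is unsigned, the widening is a plain
 * zero-extension; no sign handling is needed. A compiler may emit a
 * zero-extending load for this, or clear the register first and then
 * load only the low byte. */
unsigned int sum_bytes(const unsigned char *p, size_t n)
{
    unsigned int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += p[i];      /* unsigned char -> int conversion happens here */
    return total;
}

/* Hypothetical stand-in for the 32-bit data structure case:
 * the elements are already register-sized, so no widening step exists. */
unsigned int sum_words(const unsigned int *p, size_t n)
{
    unsigned int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

Compiling a file like this with the -S option, as Marc suggested, makes gcc stop before the assembler and write the assembly code, so the conversion each compiler picks for sum_bytes can be compared directly.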