Newsgroups: comp.os.msdos.djgpp
From: "e.oti"
Subject: Re: Speed Optimization is getting worse with V2.01
Sender: usenet AT fys DOT ruu DOT nl (News system Tijgertje)
Message-ID: <3275129C.1FC5@stud.warande.ruu.nl>
Date: Mon, 28 Oct 1996 20:07:56 GMT
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
References: <551i8f$2dv AT sjx-ixn9 DOT ix DOT netcom DOT com>
Mime-Version: 1.0
Organization: Physics and Astronomy, University of Utrecht, The Netherlands
Lines: 56
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

William D. Kirby wrote:
>
> I have a test program for timing tests, and it shows that DJGPP v2.01
> produces slower executables than v2.0 which are slower than v1.0. We
> are giving up a lot of speed with the improvements being made. Presently
> DJGPP v2.01 exe's are about 10% slower than targets made with Borland 4.5
> with Power Pack a 32 DPMI extension.
>
> --

I recently downloaded v2.01 too and noticed a couple of things (probably
already mentioned here in other threads):

* The executables are more bloated. This seems to be due to the symbol
  table, because stripping the COFF output produces an executable of the
  same size as v2.00.
* The optimisation flags are switched on differently: -fforce-mem is now
  part of -O2; it wasn't before.

I spent a couple of hours playing around with optimisation switches to
get the "feel" of how it works. The end result is that it optimises just
as well as v2.00, but it takes a different combination of switches to
achieve the same result. Gcc allows you to fiddle with the nature of
optimisation quite a bit, and it helps to know what your code is doing.
Here are a couple of tips:

1. Profile your code and check which routines are time critical.
   Compile your code with the ordinary optimisation switches:

     -O2 -m486 -fomit-frame-pointer -ffast-math

   Disassemble it or compile with -S to get the gas input file.
   Examine the assembler code.

2. If there aren't too many memory accesses within the inner loop, try
   adding -fforce-addr to the optimisation switch list. This copies
   addresses into registers for pointer arithmetic. It helps a lot if a
   couple of pointers are used heavily within one single loop. It
   doesn't help if you're referencing dozens of different addresses in
   the inner loop.

3. If there are a lot of memory references, try adding -fno-force-mem,
   because repeated copying of memory variables into registers causes a
   lot of bloat and, naturally, slows the code down.

4. Profile the effect of -funroll-loops, -funroll-all-loops and
   -fstrength-reduce. Try compiling the different source files with
   different optimisation flags.

Finally, may I add that I've always succeeded in getting gcc executables
to run faster than the Borland-compiled version (version 5.0 excepted; I
don't have access to it) and certainly faster than djgpp v1. Not that
that's an objective criterion or anything, but let that be a stimulus to
optimise further.

Elliott
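
To make tips 1 and 2 above concrete, here is a minimal sketch of the kind of routine they apply to: an inner loop that leans heavily on just two pointers, plus a small timing helper. The names scale_add, time_scale_add and kernel.c are made up for illustration and are not from the original test program. The switches in the comment are the ones named in the post. It assumes the gcc that shipped with DJGPP v2.01; later gcc releases may not accept -fforce-addr or -fforce-mem at all, so treat the command lines as a starting point only.

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical build lines (kernel.c is an assumed file name):
 *   gcc -O2 -m486 -fomit-frame-pointer -ffast-math -S kernel.c
 *   gcc -O2 -m486 -fomit-frame-pointer -ffast-math -fforce-addr -S kernel.c
 * Compare the two .s files, then time both builds.
 */

/* Inner loop that uses two pointers (dst and src) heavily. */
void scale_add(float *dst, const float *src, float k, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] += k * src[i];
}

/* Call the kernel `reps` times and return the elapsed CPU time in seconds. */
double time_scale_add(float *dst, const float *src, float k,
                      size_t n, int reps)
{
    clock_t start = clock();
    int r;
    for (r = 0; r < reps; r++)
        scale_add(dst, src, k, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_scale_add from a small main() on arrays large enough that the run takes a second or more, since clock() is coarse. Whether -fforce-addr wins or loses on a loop like this is exactly what the timing is meant to show; no result is claimed here.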