From: "John S. Fine"
Newsgroups: comp.os.msdos.djgpp
Subject: Optimizations
Date: Thu, 17 Sep 1998 14:34:13 -0400
Message-ID: <36015625.62D2@erols.com>

I am working on some code that needs to be fast and fairly small on a 486. The system has slow DRAM and no L2 cache, so in many cases small will be the best way to achieve fast (more of the code fits in the L1 cache).

While debugging, I noticed many places where gcc has generated crude code that is both larger and slower than I would have expected. I am using -O2 and no other optimization switches. Are there other switches that are appropriate to this project?

I am using gcc 2.7.2.1. Would a newer version produce better 486 code, or do the improvements only help Pentium and later CPUs?

The C code needs to be very portable, but I am only concerned about the performance of the gcc-based version. I could use conditionals to support nonportable optimizations for the gcc version, but I really want to avoid confusing other people who must read the C code.

My code frequently has expressions of the form ( A << ( (B) & 31 ) ), where B is a subexpression. GCC always computes B in some poorly chosen register, then moves it to ecx, ANDs cl with 0x1F, and then does the shift. On an x86 there is no need to AND cl with 0x1F before a shift: the CPU uses only the low five bits of cl as the shift count anyway. However, I can't remove the "& 31" from the source code and have it remain portable.

I understand that gcc includes templates that control the generation of instructions. Can a template describe something like ( A << (B & 31) )? How hard would it be for me (I have never recompiled any part of djgpp) to add that template and recompile?
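A minimal sketch of the two forms discussed above, for illustration only. `shl_masked` is the portable expression from the post; the `& 31u` is required in ISO C because shifting a 32-bit value by 32 or more is undefined. `shl_x86` is a hypothetical example of the kind of gcc-only conditional the poster mentions wanting to avoid: it hands the count to the SHL instruction in cl and relies on the hardware masking it. Both function names are invented here, and nothing in this sketch shows whether a given gcc version drops the redundant AND by itself.

```c
#include <stdint.h>

/* Portable form: the mask keeps the shift count in 0..31, so the
   shift is always well defined in ISO C. */
static uint32_t shl_masked(uint32_t a, uint32_t b)
{
    return a << (b & 31u);
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Hypothetical gcc/x86-only variant: "c" places the count in ecx, and
   SHL uses only the low five bits of cl, so no explicit AND is emitted. */
static uint32_t shl_x86(uint32_t a, uint32_t b)
{
    __asm__("shll %%cl, %0"
            : "=r"(a)          /* output: shifted value */
            : "0"(a), "c"(b)   /* input a in the same register; count in ecx */
            : "cc");           /* SHL modifies the flags */
    return a;
}
#else
/* Any other compiler or target falls back to the portable form. */
static uint32_t shl_x86(uint32_t a, uint32_t b)
{
    return shl_masked(a, b);
}
#endif
```

With either version, a count of 33 behaves like a count of 1, and a count of 32 leaves the value unchanged. That is the hardware behavior the post relies on.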
GCC also adds NOPs to align many branch targets to dword boundaries. In my project that usually slows the code down, because the harm done by extra cache misses outweighs the benefit of alignment. Can I turn off individual optimizations like that while still generally optimizing for speed rather than space?

The most common form of bad code seems to be computing a value in one register and then moving it to the register where it is needed. In all of these cases, nothing prevented it from computing the value in the correct register in the first place. Are there any options to make it spend more compile time on register selection, so that it won't get those wrong?

--
http://www.erols.com/johnfine/
http://www.geocities.com/SiliconValley/Peaks/8600/