Date: Fri, 3 Sep 1999 22:02:13 +0200
From: Marc Lehmann
To: pgcc AT delorie DOT com
Subject: [xomicron AT chat DOT ru: May be new peephole optimizations in pgcc.]
Message-ID: <19990903220213.D610@cerebro.laendle>
Mail-Followup-To: pgcc AT delorie DOT com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Operating-System: Linux version 2.2.12 (root AT cerebro) (gcc driver version pgcc-2.95.1 19990816 (release) executing gcc version 2.7.2.3)
Sender: Marc Lehmann
Reply-To: pgcc AT delorie DOT com

Hi all!

Vadim Suhomlinov sent me the following suggestions, most of them relatively easy to implement. If anybody wants to have a look at pgcc or gcc and maybe write a patch, this would be a start!

----- Forwarded message from vadim suhomlinov -----

Subject: May be new peephole optimizations in pgcc.

1) -fschedule-insns may be worthwhile on the PII. On bzip2 it improves performance by 9%.
2) Running out of registers can be avoided by using the MMX registers as general-purpose registers with the -mmx option.
3) Putting emms right after the MMX code is faster than putting it in the function epilogue.

-------------------------------------------------------------

These are peephole optimizations:

1) sin(arg) & cos(arg) -> fsincos
2) Unroll strlen as shown in www.announce.com/agner (Agner Fog's Pentium optimization manual). Also for MMX targets.
3) fild mem / fop -> fiop mem on PPro, K6, Cyrix
4) fstp st / fstp st -> fucompp on Pentium / PPro
5) Anti-AGI (address generation interlock) transformation, implemented as a peephole that handles shl/add/sub/inc/dec feeding a complex address operand. Like this:
   sal eax,2 / lea ecx,[eax*2+ebx+3] -> lea ecx,[eax*8+ebx+3] / lea eax,[eax*4]
6) fsqrt / fabs -> fsqrt
7) fldz / fucompp -> (?) ftst
8) op reg,imm1 / op reg,imm2 -> op reg,(imm1 op imm2)   (add esp,-8 / add esp,-16 -> add esp,-24)

This needs serious work (I think): like Intel C, do not push function parameters onto the stack with push; instead, patch the prologue of the calling function to reserve enough stack space and set the parameters with mov instructions.
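To make the push-versus-mov suggestion concrete, here is a minimal sketch in Python (a hypothetical illustration, not pgcc code) that generates both call sequences, in NASM-style Intel syntax, for a list of cdecl calls. It assumes every argument is a 4-byte register or immediate. The helper names push_style and mov_style are made up for this example, and a real compiler would fold the reservation into the prologue's existing sub esp for locals rather than emit a separate one.

```python
# Hypothetical sketch: two ways to pass cdecl arguments on i386.
# Assumption: every argument is a 4-byte register name or immediate.

def push_style(calls):
    """Classic sequence: push args right to left, call, then pop with add esp."""
    out = []
    for name, args in calls:
        for arg in reversed(args):          # cdecl pushes the last argument first
            out.append(f"push {arg}")
        out.append(f"call {name}")
        if args:
            out.append(f"add esp, {4 * len(args)}")  # caller cleans up after each call
    return out


def mov_style(calls):
    """Intel-C-like sequence: reserve the largest argument area once,
    then store each argument with mov relative to esp."""
    frame = 4 * max((len(args) for _, args in calls), default=0)
    out = [f"sub esp, {frame}"] if frame else []   # done once, in the prologue
    for name, args in calls:
        for i, arg in enumerate(args):             # first argument at [esp+0]
            out.append(f"mov dword [esp+{4 * i}], {arg}")
        out.append(f"call {name}")                 # no add esp after the call
    if frame:
        out.append(f"add esp, {frame}")            # released once, in the epilogue
    return out


if __name__ == "__main__":
    calls = [("f", ["eax", "1"]), ("g", ["ebx"])]
    print("\n".join(push_style(calls)))
    print()
    print("\n".join(mov_style(calls)))
```

For calls = [("f", ["eax", "1"]), ("g", ["ebx"])], push_style emits an add esp after each call, while mov_style emits a single sub esp, 8 / add esp, 8 pair around both calls, sized for the call with the most arguments.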
Using mov also makes the add esp after each function call unnecessary. Reserve the maximum space any call may need, determined by analysing all function calls in the target function.

----- End forwarded message -----

The last hint should be read with care: pgcc once implemented this for the AMD K6, but it was a loss on every CPU that was tested, including the K6. Most of the code to do this, however, should still be available.

-- 
Marc Lehmann
pcg AT goof DOT com
XX11-RIPE
The choice of a GNU generation