From: "Alexei A. Frounze"
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: Sun, 16 Apr 2000 19:07:03 +0400
Organization: MTU-Intel ISP
Message-ID: <38F9D717.9438A3F6@mtu-net.ru>
Reply-To: djgpp AT delorie DOT com

Kalum Somaratna aka Grendel wrote:
>
> On Sun, 16 Apr 2000, Alexei A. Frounze wrote:
>
> > Just make either plain C code (which is slower) or a huge external ASM
> > subroutine
>
> Well Alexei, programmers are particularly bad at finding where
> the bottleneck in a program really is... say for example if a character
> move generation algorithm takes up 90% of your program's time and the
> blitting takes up only 10% then what use will writing the blitting in
> assembly have.. the move generation part is what you should optimise...
> since it takes the most time..

Nice story. :)

> So what I would suggest would be to write the entire program (or as much
> code as possible) in C.. then you can run gprof and see which routines
> are taking up the most CPU time... and believe me you will be surprised...
> *then* you can decide which routines to optimise...

No, I won't be surprised. There are only 2 subroutines that really slow
down the performance:
1st - tmapping (texture mapping) routine
2nd - polygon edge scanning routine
These are the most expensive subroutines and by the way the 1st one is
almost fully optimized.
I need to optimize the second one or invent another algorithm for it,
since it takes 4...15% of the time of the first one and that's not a very
good thing for my 3D engine. The other subroutines do nothing slow. Just a
little work with vertices (rotation, translation), keyboard I/O, screen
double buffering. What else??? I mentioned almost everything.

>
> And also I find that surprisingly enough a good optimizing compiler
> produces faster code than a handwritten assembly sequence. Because the
> compiler can optimize the output of the C code taking advantage of the
> CPU characteristics of various x86 architectures...

Not really. The inner loop in my tmapper cannot be written in pure C.
Believe me. No compiler would figure out such a trick as the one used in
my ASM module.

> And also the assembly you *think* is fast may sometimes be very slow (ie
> there may be a faster way of doing it) on different x86 CPUs....

Btw, I have a spinning pyramid demo. It renders a spinning perspective
tmapped pyramid on the screen at 320x200 resolution (the pyramid fills
the screen entirely). It runs at 20 FPS on a 486DX2-66. I think it's a
very good proof of the efficiency of my tmapper. But I'm not saying it's
the limit. :) Plain C will give fewer FPS, really.

Btw, I try to use pairable simple instructions and handle CSE (common
subexpression elimination) myself, plus I replace some code with a faster
equivalent (i.e. multiplications instead of divisions)... And I keep away
from reloading FPU registers/stack. This really speeds up the engine.

I'm sure an advanced programmer is much more clever than a compiler. At
least I consider myself an advanced one and usually this is proved by the
achieved results.

bye.
Alexei A. Frounze
-----------------------------------------
Homepage: http://alexfru.chat.ru
Mirror: http://members.xoom.com/alexfru
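
To illustrate the "multiplications instead of divisions" and hand-done CSE
points in the post, here is a minimal C sketch. It is not the poster's
texture mapper (his inner loop is in assembly and is not shown in the
thread); the function name interp_span and its parameters are hypothetical,
chosen only for this example. The idea is to pay for one division per span
and keep only a multiply and an add inside the per-pixel loop.

```c
#include <assert.h>

/* Hypothetical helper (not from the original post): linearly interpolate
   a texture coordinate from u0 to u1 across a span of `len` pixels. */
static void interp_span(float u0, float u1, int len, float *out)
{
    float inv, du;
    int i;

    assert(len >= 1);

    /* One division per span instead of one per pixel.
       A single-pixel span has no step, so avoid dividing by zero. */
    inv = (len > 1) ? 1.0f / (float)(len - 1) : 0.0f;

    /* Common subexpression hoisted out of the loop by hand. */
    du = (u1 - u0) * inv;

    /* Inner loop: one multiply and one add per pixel, no division. */
    for (i = 0; i < len; i++)
        out[i] = u0 + du * (float)i;
}
```

Calling interp_span(0.0f, 8.0f, 5, buf) fills buf with 0, 2, 4, 6, 8. An
optimizing compiler may well perform the same hoisting on its own, so
whether the manual version is any faster is something to measure (with
gprof, for instance) rather than assume.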