Mail Archives: djgpp/2000/04/16/12:41:02

From: "Alexei A. Frounze" <alex DOT fru AT mtu-net DOT ru>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: inefficiency of GCC output code & -O problem
Date: Sun, 16 Apr 2000 19:07:03 +0400
Organization: MTU-Intel ISP
Lines: 66
Message-ID: <38F9D717.9438A3F6@mtu-net.ru>
References: <Pine DOT LNX DOT 4 DOT 10 DOT 10004161837540 DOT 1138-100000 AT darkstar DOT grendel DOT net>
NNTP-Posting-Host: ppp101-4.dialup.mtu-net.ru
Mime-Version: 1.0
X-Trace: gavrilo.mtu.ru 955897651 32522 212.188.101.4 (16 Apr 2000 15:07:31 GMT)
X-Complaints-To: usenet-abuse AT mtu DOT ru
NNTP-Posting-Date: 16 Apr 2000 15:07:31 GMT
X-Mailer: Mozilla 4.72 [en] (Win95; I)
X-Accept-Language: en,ru
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Kalum Somaratna aka Grendel wrote:
> 
>  On Sun, 16 Apr 2000, Alexei A. Frounze wrote:
> 
> > Just write either plain C code (which is slower) or a huge external ASM
> > subroutine
> 
> Well Alexei, programmers are notoriously bad at finding where the
> bottleneck in a program really is. Say, for example, a character move
> generation algorithm takes up 90% of your program's time and the
> blitting takes up only 10%: what use is writing the blitting in
> assembly? The move generation part is what you should optimise, since
> it takes the most time.

Nice story. :)

> So what I would suggest is to write the entire program (or as much of it as
> possible) in C. Then you can run gprof and see which routines are
> taking up the most CPU time, and believe me, you will be surprised.
> *Then* you can decide which routines to optimise.
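
The workflow suggested above can be tried on a toy program. The sketch below
is only an illustration: generate_moves(), blit() and run_frames() are
hypothetical stand-ins named after the example in the quoted text, and the
file name profile_demo.c is a placeholder. The -pg option and the gprof
invocation in the comment are the standard GNU ones; exact executable names
depend on the platform.

```c
/* profile_demo.c: hypothetical stand-ins for the two routines in the example.
 *
 * Build and profile with the GNU tools (file names are placeholders):
 *   gcc -pg profile_demo.c -o profile_demo
 *   profile_demo                  (writes gmon.out when it exits)
 *   gprof profile_demo gmon.out
 * At higher -O levels the compiler may inline small routines, which
 * hides them from the per-function profile.
 */
#include <string.h>

#define SCREEN_BYTES (320 * 200)

static unsigned char back_buffer[SCREEN_BYTES];
static unsigned char screen[SCREEN_BYTES];

/* Stand-in for an expensive "move generation" step: deliberately heavy. */
unsigned long generate_moves(unsigned long seed)
{
    unsigned long acc = seed;
    int i;
    for (i = 0; i < 200000; i++)
        acc = acc * 1103515245UL + 12345UL;   /* arbitrary busy work */
    return acc;
}

/* Stand-in for the blit: one copy of the back buffer to the "screen". */
void blit(void)
{
    memcpy(screen, back_buffer, SCREEN_BYTES);
}

/* Run n frames and return a checksum so the work is actually used. */
unsigned long run_frames(int n)
{
    unsigned long sum = 0;
    int f;
    for (f = 0; f < n; f++) {
        sum += generate_moves((unsigned long)f);
        back_buffer[f % SCREEN_BYTES] = (unsigned char)sum;
        blit();
        sum += screen[f % SCREEN_BYTES];
    }
    return sum;
}
```

To profile it, call run_frames() from a small main() (for example with a few
hundred frames) and compare the time gprof attributes to each routine.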

No, I won't be surprised. There are only 2 subroutines that really slow down
performance:
  1st - the texture-mapping (tmapping) routine
  2nd - the polygon edge scanning routine
These are the most expensive subroutines and, by the way, the 1st one is almost
fully optimized. I need to optimize the second one or invent another algorithm
for it, since it takes 4...15% as much time as the first one, and that's not a
very good thing for my 3D engine.
The other subroutines do nothing slow: just a little work with vertices
(rotation, translation), keyboard I/O, and screen double buffering. What else?
I have mentioned almost everything.
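
For readers unfamiliar with the second routine, polygon edge scanning is often
done by walking each edge one scanline at a time and recording the leftmost and
rightmost x per scanline. The sketch below shows that common approach in plain
C with 16.16 fixed point. It is an assumption about the general technique, not
the code from this engine; all names (scan_edge, spans_reset, span_get) are
hypothetical, and the coordinates are assumed to be non-negative and already
clipped in x.

```c
#include <limits.h>

#define SCREEN_H 200          /* 320x200 mode, as in the demo */
#define FP_ONE   65536L       /* 1.0 in 16.16 fixed point */

static int span_left[SCREEN_H];
static int span_right[SCREEN_H];

/* Mark every scanline as empty before scanning a new polygon. */
void spans_reset(void)
{
    int y;
    for (y = 0; y < SCREEN_H; y++) {
        span_left[y]  = INT_MAX;
        span_right[y] = INT_MIN;
    }
}

/* Walk one edge, top scanline inclusive and bottom scanline exclusive,
 * widening the recorded span on each scanline it crosses. */
void scan_edge(int x0, int y0, int x1, int y1)
{
    long x, dx;
    int y, t, xi;

    if (y0 == y1)
        return;                       /* horizontal edges add no scanlines */
    if (y0 > y1) {                    /* always walk from top to bottom */
        t = x0; x0 = x1; x1 = t;
        t = y0; y0 = y1; y1 = t;
    }
    dx = (long)(x1 - x0) * FP_ONE / (y1 - y0);   /* x step per scanline */
    x  = (long)x0 * FP_ONE;

    for (y = y0; y < y1; y++) {
        if (y >= 0 && y < SCREEN_H) {
            xi = (int)(x / FP_ONE);
            if (xi < span_left[y])  span_left[y]  = xi;
            if (xi > span_right[y]) span_right[y] = xi;
        }
        x += dx;                      /* one addition per scanline */
    }
}

/* Fetch the span for scanline y; returns 0 if the scanline is empty. */
int span_get(int y, int *left, int *right)
{
    if (y < 0 || y >= SCREEN_H || span_left[y] > span_right[y])
        return 0;
    *left  = span_left[y];
    *right = span_right[y];
    return 1;
}
```

The only division is done once per edge; the per-scanline work is an addition
and two comparisons, which is why this step is usually much cheaper than the
texture mapper that fills the spans.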

> 
> And also I find that, surprisingly enough, a good optimizing compiler
> produces faster code than a handwritten assembly sequence, because the
> compiler can optimize the output of the C code to take advantage of the
> CPU characteristics of the various x86 architectures.

Not really. The inner loop in my tmapper cannot be written in pure C, believe
me. No compiler would figure out a trick like the one used in my ASM module.
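
For comparison, a plain C inner loop for affine texture mapping commonly looks
something like the sketch below. This is only the kind of baseline being
argued about, not the ASM module (its trick is not described in this post).
The function name draw_span, the 64x64 texture size and the 16.16 fixed-point
layout are all assumptions made for the illustration.

```c
#define TEX_SIZE 64   /* assumed 64x64 texture; power of two so we can mask */

/* Fill `count` pixels of one scanline.
 * u, v   : starting texture coordinates in 16.16 fixed point
 * du, dv : per-pixel steps in 16.16 fixed point
 * Unsigned arithmetic is used so that wrap-around is well defined. */
void draw_span(unsigned char *dest, int count,
               const unsigned char *texture,
               unsigned long u, unsigned long v,
               unsigned long du, unsigned long dv)
{
    while (count-- > 0) {
        unsigned tu = (unsigned)(u >> 16) & (TEX_SIZE - 1);  /* integer part */
        unsigned tv = (unsigned)(v >> 16) & (TEX_SIZE - 1);
        *dest++ = texture[tv * TEX_SIZE + tu];
        u += du;
        v += dv;
    }
}
```

Each pixel costs two shifts, two masks, a multiply (or shift) for the row
offset, a load, a store and two additions. Hand-written assembly usually tries
to shrink exactly this per-pixel overhead.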

> And also the assembly you *think* is fast may sometimes be very slow (i.e.
> there may be a faster way of doing it) on different x86 CPUs.

Btw, I have a spinning pyramid demo. It renders a spinning, perspective
texture-mapped pyramid on the screen at 320x200 resolution (the pyramid fills
the screen entirely). It runs at 20 FPS on a 486DX2-66. I think that's very good
evidence of the efficiency of my tmapper. But I'm not saying it's the limit. :)
Plain C will give fewer FPS, really.

Btw, I try to use simple, pairable instructions and handle CSE (common
subexpression elimination) myself, plus I replace some code with faster
equivalents (e.g. multiplications instead of divisions). And I keep away from
reloading FPU registers/the FPU stack. This really speeds up the engine.
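
Two of these ideas, manual CSE and multiplying instead of dividing, can be
shown in plain C on a perspective projection. The sketch below is an
illustration under assumptions, not code from the engine: the Vec3 and Point2
types, the function names and the projection formula (distance d, screen
centre cx, cy) are all hypothetical.

```c
typedef struct { double x, y, z; } Vec3;
typedef struct { double sx, sy; } Point2;

/* Straightforward version: two divisions per vertex. */
Point2 project_naive(Vec3 v, double d, double cx, double cy)
{
    Point2 p;
    p.sx = cx + d * v.x / v.z;
    p.sy = cy - d * v.y / v.z;
    return p;
}

/* Hand-optimized version: the shared factor d / z is computed once
 * (manual common subexpression elimination), and the two divisions
 * become two multiplications. */
Point2 project_fast(Vec3 v, double d, double cx, double cy)
{
    double scale = d / v.z;      /* the only division */
    Point2 p;
    p.sx = cx + v.x * scale;
    p.sy = cy - v.y * scale;
    return p;
}
```

The two versions can differ in the last bits of the result because the
rounding happens in a different order. That is one reason a compiler is
generally not free to make this change on its own unless it is told that
relaxed floating-point rules are acceptable.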

I'm sure an advanced programmer is much cleverer than a compiler.
At least I consider myself an advanced one, and this is usually borne out by
the results I achieve.

bye.
Alexei A. Frounze
-----------------------------------------
Homepage: http://alexfru.chat.ru
Mirror:   http://members.xoom.com/alexfru
