Mail Archives: djgpp/2000/04/16/12:41:02
Kalum Somaratna aka Grendel wrote:
>
> On Sun, 16 Apr 2000, Alexei A. Frounze wrote:
>
> > Just write either plain C code (which is slower) or a huge external ASM
> > subroutine
>
> Well Alexei, programmers are particularly bad at guessing where
> the bottleneck in a program really is... say for example a character
> move generation algorithm takes up 90% of your program's time and the
> blitting takes up only 10%, then what use will writing the blitting in
> assembly be? The move generation part is what you should optimise, since
> it takes the most time.
Nice story. :)
> So what I would suggest is to write the entire program (or as much of it
> as possible) in C. Then you can run gprof and see which routines are
> taking up the most CPU time... and believe me, you will be surprised.
> *Then* you can decide which routines to optimise and which to leave alone.
No, I won't be surprised. There are only 2 subroutines that really slow down
the performance:
1st - tmapping routine
2nd - polygon edge scanning routine
These are the most expensive subroutines, and by the way the 1st one is almost
fully optimized. I need to optimize the second one or invent another algorithm
for it, since it takes 4...15% of the time of the first one, and that's not a
very good thing for my 3d engine.
The other subroutines do nothing slow: just a little work with vertices
(rotation, translation), keyboard I/O, and screen double buffering. What else?
I've mentioned almost everything.
>
> And also I find that, surprisingly enough, a good optimizing compiler
> produces faster code than a handwritten assembly sequence, because the
> compiler can optimize the output of the C code to take advantage of the
> CPU characteristics of various x86 architectures...
Not really. The inner loop in my tmapper cannot be written in pure C. Believe
me. No compiler can figure out a trick like the one used in my ASM module.
> And also the assembly you *think* is fast may sometimes be very slow (i.e.
> there may be a faster way of doing it) on different x86 CPUs....
Btw, I have a spinning pyramid demo. It renders a spinning perspective tmapped
pyramid on the screen at 320x200 resolution (the pyramid fills the screen
entirely). It runs at 20 FPS on a 486DX2-66. I think that's very good proof of
the efficiency of my tmapper. But I'm not saying it's the limit. :) Plain C
will give fewer FPS, really.
Btw, I try to use simple pairable instructions and handle CSE (common
subexpression elimination) myself, plus I replace some code with faster
equivalents (e.g. multiplications instead of divisions)... And I keep away from
REloading FPU registers/stack. This really speeds up the engine.
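
To illustrate the CSE and multiply-instead-of-divide idea, here is a minimal
sketch in C. It is not code from the engine: the names interp_naive and
interp_fast and their parameters are hypothetical, picked only for this
example. It linearly interpolates a value across a span, first with one
division per element, then with the reciprocal computed once outside the loop.

```c
#include <stddef.h>

/* Hypothetical example, not taken from the engine.
 * Naive version: (u1 - u0) and (x1 - x0) are recomputed on every
 * iteration, and every element costs one floating-point division. */
void interp_naive(float *out, size_t n,
                  float u0, float u1, float x0, float x1)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = u0 + (u1 - u0) * (float)i / (x1 - x0);
}

/* Hand-optimized version: the common subexpressions are hoisted out of
 * the loop, and the per-element division becomes a multiplication by a
 * step that is computed with a single division up front. */
void interp_fast(float *out, size_t n,
                 float u0, float u1, float x0, float x1)
{
    float step = (u1 - u0) * (1.0f / (x1 - x0)); /* one division in total */
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = u0 + step * (float)i;
}
```

Both functions give the same results up to floating-point rounding. A compiler
normally keeps the division in the first form, because multiplying by a
reciprocal can change the rounding, unless it is told that this is acceptable
(e.g. GCC's -ffast-math).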
I'm sure an advanced programmer is much more clever than a compiler.
At least I consider myself an advanced one, and usually the results I
achieve prove it.
bye.
Alexei A. Frounze
-----------------------------------------
Homepage: http://alexfru.chat.ru
Mirror: http://members.xoom.com/alexfru