Mail Archives: djgpp/2000/04/26/12:14:23
Could I get a copy of your 3d engine?
"Alexei A. Frounze" wrote:
>
> Hello guys!
>
> I beg your pardon for the delay. I had no Internet access for a couple of days
> and I was thinking over the conclusions and the tests we ran.
>
> Well, it's time to tell you where we now stand on the problem I raised some
> time ago.
>
> I had a nice 3d engine developed with GCC (2.95.2) and some assembly (both
> inline and an external routine). The program compiled and worked properly
> until I tried to optimize it with GCC's -O2 switch.
>
> GCC started to complain that my inline assembly was faulty or, as some of you
> said, buggy. Either way, my code was no longer accepted by GCC and AS. That was
> the actual problem I raised.
> Btw, some time before that I had looked at GCC's output code (GCC invoked
> without any -O switches), and I concluded that inline assembly was needed,
> since GCC generated a lot of redundant code that should have been optimized
> away. At the time I didn't know that GCC outputs slow code when no command-line
> switches ask it to optimize.
>
> Btw, I made one funny mistake. I used SAR (an assembly instruction) for an
> arithmetic shift right instead of >>. I did so because I didn't know that C
> generates different instructions depending on whether the number being shifted
> is signed or unsigned. ;)
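
A minimal sketch of the signed/unsigned point, for illustration only. The
function names are hypothetical. It assumes GCC's documented behaviour, where >>
on a negative signed value is an arithmetic shift (typically SAR on x86); the C
standard itself leaves that case implementation-defined. On unsigned values >>
is always a logical shift (SHR on x86).

```c
#include <stdint.h>

/* Signed operand: with GCC, >> copies the sign bit in from the left
   (arithmetic shift, typically SAR on x86). ISO C leaves the result for
   negative values implementation-defined, so this relies on GCC's choice. */
static int32_t shift_signed(int32_t x, unsigned n)
{
    return x >> n;
}

/* Unsigned operand: >> always shifts zeros in from the left
   (logical shift, SHR on x86). */
static uint32_t shift_unsigned(uint32_t x, unsigned n)
{
    return x >> n;
}
```

Compiling this with gcc -S and comparing the two function bodies is one way to
see which shift instruction each one gets.
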
>
> Thus I had a lot of incorrect inline assembly code at the beginning and I
> didn't know what to do, since I had written it relying on a manual about GCC
> inline ASM. It seems that article was either incorrect or badly outdated. As
> far as I know, inline assembly has changed a bit in newer versions of GCC. I
> have an old program with a lot of inline ASM, written in 1997. It no longer
> compiles with the current GCC without patching the source code.
>
> To make things a bit clearer... I've always used the "g" constraint for passing
> parameters to inline assembly blocks. Now AFAIK that's wrong. "g" may be used
> for eax, ebx, ecx, edx or a variable in memory. So if I want to pass parameters
> to a block, I must take into account that I can't use "g" if there are not
> enough spare registers, and that I can't use the esi and edi registers. Just
> eax, ebx, ecx and edx plus memory references.
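
For reference, a small sketch of GCC extended asm with explicit constraints.
This is not Alexei's code; add_ints is a hypothetical name. As the GCC manual
describes the constraints, "g" allows any general register, memory or immediate
integer operand, "r" allows any general register, and the x86-specific "q" is
the one limited to the a, b, c and d registers. The sketch assumes GCC-style
asm on an x86 target and falls back to plain C anywhere else.

```c
/* Adds two ints. On x86 with a GCC-compatible compiler the addition is done
   by one inline ADD instruction; elsewhere plain C is used instead. */
static int add_ints(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result = a;
    __asm__ ("addl %1, %0"
             : "+r" (result)   /* %0: read-write, must be in a register   */
             : "g" (b)         /* %1: register, memory or immediate is ok */
             : "cc");          /* ADD modifies the condition flags        */
    return result;
#else
    return a + b;              /* portable fallback, no inline assembly */
#endif
}
```

Declaring every input, output and clobber lets the compiler pick registers
itself, so the same block should assemble both with and without -O2.
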
>
> It was a bit of a shock, because the code compiled normally before I tried the
> -O2 switch. So I posted a message titled "insufficiency of GCC output code and
> the -O problem".
>
> Now you know what really happened.
>
> Then some people in the NG told me that my inline assembly code was causing all
> the problems and that this was not a bug in the compiler. I still doubt that
> GCC behaves well here. It should either compile my inline assembly regardless
> of the optimization switches, or fail with the same error messages regardless
> of those switches. That is still an open question.
>
> Then Dieter and some of you suggested that I rewrite my inline ASM using
> something other than just the "g" constraint.
>
> Dieter also wanted to know whether my inline ASM was needed at all, i.e. what
> would happen with plain C and GCC's optimization switch.
> I said that my inline ASM greatly improved the performance (*greatly* because I
> had compared my inline ASM with the plain C source compiled without any
> optimization switches. I also thought that GCC had no efficient optimizer...
> But that's in the past.). He also asked for some numerical results of the
> comparison. So we started our bet. :))
>
> Dieter was very lucky (and so was I) because I had left the plain C version of
> my main functions commented out between /* */. Each comment block was followed
> by its inline assembly replacement. Without those comment blocks, we wouldn't
> have had anything serious to talk about. :)
>
> Then Dieter sent me the results of his tests and I ran some tests on my
> computer. The plain C version was faster (in percentage terms) on Dieter's
> computer, while the version with inline ASM was faster on mine. I think that's
> due to our different CPUs. They work differently, so we got different results.
>
> I'm not going into the actual parts of the code and the tricks I used to
> increase the overall performance of my engine. Some of them are really good
> (a replacement for ceil() and a parallel division that works faster for me.
> Btw, GCC can also generate such tricky code.). Anyway, our initial results
> differed.
>
> Then Dieter implemented the innermost loop of the texture mapper in plain C,
> with the loop unrolled just like in my external ASM implementation of the same
> inner loop. He also inlined that function and replaced the (int) cast with an
> inline equivalent. His engine ran faster than before.
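
To illustrate what an unrolled, inlined inner loop in plain C can look like,
here is a rough sketch. It is not Dieter's or Alexei's code: the names, the
256x256 8-bit texture and the 16.16 fixed-point coordinates are all assumptions
made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define TEX_SIZE 256  /* assumed texture width and height (power of two) */

/* Fetch one texel. u and v are 16.16 fixed point; the integer part is
   wrapped to the texture size. Small enough for the compiler to inline. */
static inline uint8_t fetch_texel(const uint8_t *tex, uint32_t u, uint32_t v)
{
    uint32_t tu = (u >> 16) & (TEX_SIZE - 1);
    uint32_t tv = (v >> 16) & (TEX_SIZE - 1);
    return tex[tv * TEX_SIZE + tu];
}

/* Fill `count` pixels of one horizontal span, stepping u and v by du and dv
   per pixel. The main loop is unrolled by hand, four pixels per iteration. */
static void draw_span_unrolled(uint8_t *dst, size_t count, const uint8_t *tex,
                               uint32_t u, uint32_t v, uint32_t du, uint32_t dv)
{
    while (count >= 4) {
        dst[0] = fetch_texel(tex, u, v); u += du; v += dv;
        dst[1] = fetch_texel(tex, u, v); u += du; v += dv;
        dst[2] = fetch_texel(tex, u, v); u += du; v += dv;
        dst[3] = fetch_texel(tex, u, v); u += du; v += dv;
        dst += 4;
        count -= 4;
    }
    /* Leftover pixels when count is not a multiple of four. */
    while (count > 0) {
        *dst++ = fetch_texel(tex, u, v); u += du; v += dv;
        count--;
    }
}
```

Unsigned coordinates are used so that wrap-around on overflow is well defined.
Whether manual unrolling helps at all depends on the compiler and the CPU, so
it is worth timing against the simple loop.
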
>
> After that I improved my code a little (replaced SHR with >> and eliminated
> some redundant code from my inner loop). Then I compared my program, which has
> a lot of ASM (both inline and one external subroutine, the inner loop), with
> Dieter's plain C implementation. I was surprised... Dieter's version ran almost
> as fast as mine. Just *a bit* slower.
>
> Thus we showed that GCC has a very good optimizer. If you want to make your
> program faster, you don't really need to put a lot of assembly code into the
> source. IMHO that's great!!! :)
>
> I'm not sure I should post test tables with values for FPS and details about
> which parts of the code were C and which were ASM. Posting those tables means
> that I need to explain what exactly Dieter and I were working on in all the
> details.
>
> So, anyone may learn from this.
>
> Btw, I recently changed the implementation of my original texture mapping
> algorithm and gained some extra FPS. That means your code's performance depends
> greatly on the actual algorithm, since neither the compiler nor its optimizer
> can figure out your algorithm and improve it the way it optimizes code. :)
>
> About two days ago I generated an .S file from Dieter's C version of the
> tmapper. Then I manually replaced Dieter's inner loop with the code from my
> external ASM implementation of the same inner loop. And it became faster once
> more. :)
>
> So actually, my assembly code is not that bad. :)
>
> Well, some concluding words should go here.
>
> What we have now:
> - fixed inline assembly
> - yet another pretty efficient optimizing compiler :)
> - faster 3d engine
> - some real experience we all can learn from
>
> Thanks to Dieter, everyone else and me for coming up with such a problem.
>
> Dieter, what do you think of this conclusion? Do you want to correct something,
> or is everything in the above text all right?
>
> Thanks,
> Later
> - Alexei A. Frounze
> -----------------------------------------
> Homepage: http://alexfru.chat.ru
> Mirror: http://members.xoom.com/alexfru
--
Robin Hugh Johnson
"Robbat2"
QTOD: "I used to be an idealist, but I got mugged by reality."
E-Mail : robbat2 AT orbis-terrarum DOT net
ICQ# : 30269588 or 41961639
Home Page : http://www.orbis-terrarum.net
Time Zone : Pacific Daylight (GMT - 8)