Message-ID: <3906FA5C.D77BC997@home.com>
From: Robin Johnson
Organization: Orbit Computers
X-Mailer: Mozilla 4.7 [en] (Win98; U)
X-Accept-Language: en,af,es
MIME-Version: 1.0
Newsgroups: comp.os.msdos.djgpp
Subject: Re: THE CONCLUSION
References: <38F20E7A DOT 3330E9A4 AT mtu-net DOT ru> <3906D238 DOT 888D65F7 AT mtu-net DOT ru>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 145
Date: Wed, 26 Apr 2000 14:17:55 GMT
NNTP-Posting-Host: 24.113.36.103
X-Complaints-To: abuse AT home DOT net
X-Trace: news1.rdc1.bc.home.com 956758675 24.113.36.103 (Wed, 26 Apr 2000 07:17:55 PDT)
NNTP-Posting-Date: Wed, 26 Apr 2000 07:17:55 PDT
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Could I get a copy of your 3d engine?

"Alexei A. Frounze" wrote:
>
> Hello guys!
>
> I beg your pardon for the delay. I had no inet for a couple of days and I was
> thinking of the conclusion and the tests we have taken.
>
> Well, it's time to tell you what we have now around the problem I came up with
> some time before.
>
> I had a nice 3d engine developed with use of GCC (2.95.2) and some assembly
> (both inline and an external routine). The program compiled and worked properly
> until I wanted to optimize it using GCC with the -O2 switch.
>
> GCC started to complain that my inline assembly is faulty or, as some of you
> said, buggy. That doesn't make any difference; my code was just no longer
> accepted by GCC and AS. This was the actual problem I came up with.
> Btw, some time before that I had a look at the GCC output code (GCC had been
> invoked without any -O switches). And I noticed that inline assembly is needed
> since GCC generated quite a lot of redundant code that should be optimized.
> Actually I didn't know that GCC really outputs slow code if there are no
> command-line switches that make GCC optimize the code.
>
> Btw, I made one fun mistake.
> I used SAR (assembly instruction) for an arithmetic
> shift right instead of >>. I used it because I didn't know that C generates
> different instructions for the case when a signed number needs to be shifted
> and the case with an unsigned integer. ;)
>
> Thus I had a lot of incorrect inline assembly code at the beginning and I didn't
> know what to do, since my inline code had been written relying on the manual
> about GCC inline ASM. Seems that article was either incorrect or pretty
> outdated. As far as I know inline assembly has been changed a bit in newer
> versions of GCC. I have an old program with a lot of inline ASM that was made
> in 1997. It doesn't compile with current GCC anymore w/o patching the source
> code.
>
> To make things a bit clear... I've always used the "g" thing for passing
> parameters to inline assembly blocks. Now AFAIK it's wrong. "g" may be used for
> eax, ebx, ecx, edx or a variable in memory. So if I want to pass some parameters
> to the block, I must take into account that I can't use "g" if there are not
> enough spare registers, and I can't use the esi and edi registers. Just eax,
> ebx, ecx and edx plus memory references.
>
> It was a bit shocking to discover because the code compiled normally before I
> tried that -O2 switch. So I came up with a message with title "insufficiency of
> GCC output code and the -O problem".
>
> Seems now you know what really happened.
>
> Then some of the people in the NG told me that my inline assembly code causes
> all the problems and that it is not a bug in the compiler. I still doubt that
> GCC behaves well here. It must either compile my inline assembly normally
> regardless of the optimization switches, or fail with the same error messages,
> again regardless of those switches. It's still an open question w/o an
> answer.
>
> Then Dieter and some of you suggested that I rewrite my inline ASM with
> something other than just the "g" stuff.
>
> Dieter also was interested in whether my inline ASM is needed. I.e.
> what would happen
> if we used plain C and an optimization switch for GCC.
> I said that my inline ASM greatly improves the performance here (*greatly*
> because I compared my inline ASM with plain C source compiled w/o any
> optimization switches. I also thought that GCC has no efficient optimizer... But
> that was the past.). He also asked for some numerical results of the comparison.
> So we started out our bet. :))
>
> Dieter was very lucky (and me too) because I left the plain C version of my main
> functions commented out between /* */. Each comment block was followed by its
> inline assembly replacement. If there were no such comment blocks, we wouldn't
> have something serious to talk about. :)
>
> Then Dieter sent me some results of his test and I performed some tests on my
> computer. The plain C version worked faster (in percent) on Dieter's computer
> while the version with inline ASM worked faster on mine. I think that's due to
> different CPUs. They work differently so we have different results.
>
> I'm not speaking here about actual parts of the code and tricks I used in order
> to increase the total performance of my engine. Some of them are really good
> (a replacement for ceil() and parallel division that works faster for me. Btw,
> GCC also can generate such tricky code.). Anyway we have different primary
> results.
>
> Then Dieter made an implementation of the innermost loop of the texture mapper
> in plain C with an unrolled loop just like in my external ASM implementation of
> the same inner loop. He also _inline_d that function and replaced the (int) cast
> with an inline equivalent. He got the engine running faster than before.
>
> After that I improved my code a little (replaced SHR with >> and eliminated some
> redundant code from my inner loop). And then I compared my program that has a
> lot of ASM (both inline and one external subroutine -- inner loop) with Dieter's
> plain C implementation. I was surprised... Dieter's version ran almost as fast
> as mine.
> Just *a bit* slower.
>
> Thus we proved that GCC has a very good optimizer. And if you want to make your
> program faster, it's not really necessary to put a lot of assembly code into the
> source. IMHO that's great!!! :)
>
> I'm not sure I should post test tables with values for FPS and details about
> which parts of the code were C and which were ASM. Posting those tables means
> that I need to explain what exactly Dieter and I were working on in all the
> details.
>
> So, anyone may learn from this.
>
> Btw, recently I changed the implementation of my original texture mapping
> algorithm and won some extra FPS. That means your code's performance greatly
> depends on the actual algorithm implementation, since neither the compiler nor
> the optimizer can figure out your algorithm and improve it the way they
> optimize code. :)
>
> About two days ago I generated an .S file out of Dieter's C version of the
> tmapper. Then I manually replaced Dieter's inner loop with the code from my
> external ASM implementation of the same inner loop. And it became faster once
> more. :)
>
> So actually, my assembly code is not very bad. :)
>
> Well, here must go some kind of concluding words now.
>
> What we have now:
> - fixed inline assembly
> - yet another pretty efficient optimizing compiler :)
> - faster 3d engine
> - some real experience we all can learn from
>
> Thanks to Dieter, everyone else and me for coming up with such a problem.
>
> Dieter, what do you think of this conclusion? Wanna correct something, or is
> everything alright in the above text?
>
> Thanks,
> Later
> - Alexei A. Frounze
> -----------------------------------------
> Homepage: http://alexfru.chat.ru
> Mirror: http://members.xoom.com/alexfru

--
Robin Hugh Johnson "Robbat2"
QTOD: "I used to be an idealist, but I got mugged by reality."
E-Mail : robbat2 AT orbis-terrarum DOT net
ICQ# : 30269588 or 41961639
Home Page : http://www.orbis-terrarum.net
Time Zone : Pacific Daylight (GMT - 8)