Mail Archives: djgpp/2000/04/18/14:19:42
[This thread should be dead by now, but I really cannot leave some things
uncorrected]
Alexei A. Frounze wrote:
>Dieter Buerssner wrote:
[The >> > is from Alexei in a reply to Kalum Somaratna aka Grendel]
>> >Yes, we have proved. We also haven't trow away all my inline ASM. The
>> >FIDIVRL trick is still alive. :)
>>
>> Wrong.
[In the same reply Alexei has written]
\begin{quote}
You've forgot (in fact, Dieter haven't mentioned) about the
FIDIVRL instruction executed in parallel to the span() function.
This is a real trick that makes difference. Even Dieter didn't
change it and left this piece of my inline ASM AS-IS.
\end{quote}
I did change this, and I mentioned everything. In particular, I
mentioned that for one test I changed part of the inline assembly
to C code. (I did this in places where it seemed to me that
the inline assembly would not have much impact on performance.)
I also mentioned that for the other test I got rid of all your inline
assembly (and added one new line of inline assembly). So the
quotes are just plain wrong.
>It's not wrong, since I don't get your results with (USEC=USEC2=1 and -O
>switch). I get it *slower*. And I have no idea what's up.
Don't you see that these sentences say something totally
different from the quotes? I never stated that you would be
able to reproduce my numbers. "It's not wrong, since ..."
doesn't make any sense.
Alexei, reread the thread. I think I have always tried to write
exactly what I have done. Your statements make me look like a liar.
They are often out of context. I have reported the numbers exactly
as I told you in my post about this stupid bet. Without
any of your inline assembly, I got exactly the same performance
here. I have no doubt that you might measure something different.
I don't call you a liar. It really doesn't surprise me that the
results are highly machine dependent. But from looking at the
asm output (I use fsdb after compiling with -g; it nicely shows
C source and asm together, but other means exist), it seems to me
that there shouldn't be a big difference at all for T_Map() with
and without inline assembly (besides the rounding to int, which
I coded as one inline function). I explained that you use the
FPU stack efficiently. Some of this advantage you lose through all
those references. Count the FPU instructions in the .s output,
and you will see that the C version needs as many
fmul/fdiv etc. instructions. It needs quite a few fxch instructions
that you don't need. It needs to discard the top of the floating
point stack a few times, where you don't. These things can be
very CPU dependent. The C code avoids many address calculations,
to make up for it.
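
For readers who have not seen one, a one-line inline-assembly rounding helper of the kind mentioned above could look roughly like the sketch below. This is not the code from the thread: the name round_to_int is hypothetical, the x86 branch assumes gcc-style inline assembly with the x87 FPU in its default round-to-nearest mode, and the fallback branch is an assumption added only so the sketch compiles elsewhere.

```c
/* Sketch only: round a double to int. round_to_int is a hypothetical name,
   not the helper used in the original program. */
static inline int round_to_int(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int r;
    /* fistpl stores st(0) to memory as a 32-bit integer and pops the x87
       stack. "t" places x in st(0); the "st" clobber tells gcc it was popped.
       The FPU's current rounding mode applies (round-to-nearest by default). */
    __asm__ ("fistpl %0" : "=m" (r) : "t" (x) : "st");
    return r;
#else
    /* Assumed portable fallback: rounds halves away from zero, so it can
       differ from the x87 default in halfway cases. */
    return (int)(x < 0.0 ? x - 0.5 : x + 0.5);
#endif
}
```

A plain (int) cast truncates toward zero instead, which is why a helper like this tends to be the one piece of assembly that survives a rewrite in C.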
Also, if you think that pairing the fidivr with span is really
important, you *might* be able to get it with the C code as well.
I delayed that part of the C code until after the span, because
it was just a tiny bit faster here. The C code is still
there in comments. Gcc will not use fidivr; it will use
fdivr instead. Evidently gcc decided to trade an inverse
division by an integer (compile time constant) for an
inverse division by a floating-point constant.
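
The two orderings discussed here can be sketched in C as follows. Everything in the sketch is an assumption made for illustration: the span() body is a stand-in that only does integer work, the constant 65536.0 is invented, and the function names are hypothetical. Whether the divide really overlaps the span work depends on the compiler's scheduling and on the CPU.

```c
/* Hypothetical stand-in for span(): integer-only work that an FPU divide
   could, in principle, run alongside. */
static void span(unsigned char *row, int x0, int x1, unsigned char color)
{
    int x;
    for (x = x0; x < x1; x++)
        row[x] = color;
}

/* Ordering A: write the divide first and use its result after the span,
   giving the compiler the chance to issue it early. */
static double divide_before_span(unsigned char *row, int n, double z)
{
    double inv = 65536.0 / z;   /* 65536.0 is an assumed constant */
    span(row, 0, n, 1);
    return inv;
}

/* Ordering B: do the span first and divide afterwards. */
static double divide_after_span(unsigned char *row, int n, double z)
{
    span(row, 0, n, 1);
    return 65536.0 / z;
}
```

Both functions return the same value; only the order in which the work is written differs, so any speed difference has to be measured on the machine in question.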
You might have optimized your code exactly for your processor.
The C code isn't optimized for any processor; it is just what
usual coding principles suggest. (And you obviously thought
the same about this, because much of it is almost unchanged from
your comments.) The "optimization" of saving two divisions
for three multiplies, at least IMHO, is not allowed to be done
by the compiler, but that is a whole different issue.
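
One likely reason a compiler may not make that trade on its own (my reading, the post does not spell it out) is that multiplying by a reciprocal rounds twice, so the result can differ from a true division in the last bit. The sketch below is not the transformation from the program under discussion; it only compares x / 3.0 with x * (1.0 / 3.0) in double arithmetic, and count_mismatches is a hypothetical name.

```c
/* Count how many integers x in 1..n give different double results for
   x / 3.0 and x * (1.0 / 3.0). Sketch only; the name is hypothetical. */
static int count_mismatches(int n)
{
    /* volatile forces each value to be stored as a double, so extra
       x87 register precision does not hide the difference. */
    volatile double third = 1.0 / 3.0;  /* reciprocal, already rounded once */
    int i, count = 0;

    for (i = 1; i <= n; i++) {
        volatile double q = i / 3.0;    /* division: a single rounding */
        volatile double p = i * third;  /* multiply: rounds a rounded value */
        if (q != p)
            count++;
    }
    return count;
}
```

If count_mismatches(100) is non-zero on your machine, the two forms are not interchangeable there, which is why gcc only makes this substitution when asked to relax floating-point rules (for example with -ffast-math).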
The numbers I have written are true; they are for the first
screen of your program. I have not bothered to find any MIN/MAX,
but playing around a little bit, I can see essentially no difference
between the C code and the inline assembly.