Mail Archives: pgcc/2000/02/03/11:21:02
> On Sat, Jan 29, 2000 at 03:21:01AM +0100, Jan Hubicka <hubicka AT atrey DOT karlin DOT mff DOT cuni DOT cz> wrote:
> > It is. Consider the memset/memcpy/strlen expanders. These can work
> > much better when they know that the destination is word-size aligned.
>
> source you meant ;)
Well, currently the expanders align the destination, except for strlen
(where a destination does not make sense).
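To illustrate what "aligning the destination" means for a memset/memcpy expander: the expanded code first handles a few single bytes until the destination reaches a word boundary, then does full-word stores, then finishes the remaining tail bytes. When the destination is already known to be aligned, the head part disappears. The Python sketch below only computes that split. The helper name `split_for_alignment` and the 4-byte word (as on i386) are assumptions for illustration. This is not gcc's expander code.

```python
def split_for_alignment(dest_addr: int, length: int, word: int = 4) -> tuple[int, int, int]:
    """Split a block operation on [dest_addr, dest_addr + length) into
    (head_bytes, full_words, tail_bytes) so that every word-sized store
    starts on a `word`-aligned address. `word` must be a power of two.
    (Hypothetical helper, for illustration only.)"""
    head = (-dest_addr) & (word - 1)   # bytes needed to reach the next word boundary
    head = min(head, length)           # a very short block may never reach it
    remaining = length - head
    return head, remaining // word, remaining % word
```

For example, a 10-byte memset at address 0x1003 becomes 1 byte, then 2 aligned words, then 1 tail byte, while the same call at an aligned address needs no head bytes at all.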
> > Again, the Intel optimization manual recommends this. I believe Intel did some experiments
>
> Well, doing some experiments yourself (as you said) does not hurt
> either (if you have a Pentium). While Intel recommends relatively large
> alignments, "common knowledge" (Linus for example ;) recommends no
> alignments at all.
>
> In _my_ tests large alignment is a very very slight win, but in the real
> world, the increased code size might not be worth it (cache effect, long
> nops, AGI because of lea-nops).
>
> It's a must on 486, though, and a bit better on ppro and later.
Yes, that's the problem.
I've measured small wins from alignment in my benchmarks. But the problem is
that changing the policy needs extensive testing to prove that Intel
is wrong, and I don't think I have enough knowledge and time to do that.
The current alignment scheme (4,,7) doesn't add much padding on average
(it is 28/16 = 1.75 bytes on average, assuming uniformly distributed offsets), so I believe it
is not too expensive. The problem is with chained alignments, where one alignment
forces another one. This happens in switches, where the code is slightly larger
than 7 bytes, and in loops containing tiny inner loops, where padding of the inner loop
hurts, etc.
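The "(4,,7)" scheme corresponds to the GNU assembler directive `.p2align 4,,7`: align to a 16-byte boundary, but skip the alignment entirely if it would take more than 7 bytes of padding. Assuming the code before a label ends at a uniformly random offset modulo 16 (an assumption, real offsets are not uniform), the expected padding can be checked by simple enumeration. The function name below is hypothetical.

```python
from fractions import Fraction

def expected_padding(align_log2: int, max_skip: int) -> Fraction:
    """Expected padding bytes of `.p2align align_log2,,max_skip`, assuming
    the current offset modulo 2**align_log2 is uniformly distributed."""
    boundary = 1 << align_log2
    total = 0
    for offset in range(boundary):
        pad = (-offset) % boundary      # bytes needed to reach the next boundary
        if pad <= max_skip:             # otherwise the assembler does not align
            total += pad
    return Fraction(total, boundary)
```

Under this assumption `expected_padding(4, 7)` is 28/16 = 1.75 bytes, whereas unconditional 16-byte alignment, `expected_padding(4, 15)`, costs 7.5 bytes on average, which is why the max-skip form is so much cheaper.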
I've made an experiment with code that shortens alignments that are too large
(i.e. when the loop or block they precede is shorter than the suggested alignment,
the alignment is reduced), and I've got very good results with even more aggressive
alignments combined with this optimization on the AMD Athlon (but it has large
caches, so the situation on Pentiums may be different).
The code is available in the egcs mailing list archives under the "shorten alignments"
keyword.
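One way to read the "shorten alignments" idea: if the block following a label is only L bytes long, aligning it to the smallest power of two that is at least L already keeps it inside a single larger aligned chunk (an 8-aligned block of at most 8 bytes can never cross a 16-byte boundary), so any stronger alignment only wastes padding. The function below is a hypothetical illustration of that rule, not the patch posted to the egcs list.

```python
def shortened_align_log2(requested_log2: int, block_len: int) -> int:
    """Reduce a requested alignment (given as log2 of bytes) so that it is no
    larger than needed for a block of `block_len` bytes to sit inside one
    aligned chunk. Blocks at least as long as the request keep it unchanged.
    (Hypothetical heuristic for illustration.)"""
    if block_len <= 0:
        return 0
    needed = (block_len - 1).bit_length()   # log2 of smallest power of two >= block_len
    return min(requested_log2, needed)
```

With a requested 16-byte alignment (log2 = 4), a 5-byte block is aligned only to 8 bytes, a 1-byte block is not aligned at all, and a 40-byte loop keeps the full 16-byte alignment.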
Another experiment I did is on the K6 (which is very sensitive to overly large alignments),
where I've implemented a strategy of aligning loops so that the loop start is at least two
instructions away from the boundary (to avoid stalling decoding). This patch is also available
in the mailing list archives from around August and brings interesting improvements
on that platform. Possibly a similar strategy can apply to the PPro/Athlon as well,
but I don't know the exact penalties there for decoding near the end of a boundary.
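The K6 strategy can be sketched as follows, under the assumption that the goal is for the first two instructions of a loop to fit before the next decode-block boundary: pad to the boundary only when they would not fit, and otherwise emit no padding at all. The function name is hypothetical, and the 16-byte block size is a placeholder parameter, not a documented K6 figure.

```python
def k6_loop_padding(offset: int, first_two_lens: tuple[int, int], block: int = 16) -> int:
    """Return padding bytes to insert before a loop head at `offset`.
    Pad to the next `block` boundary only if the first two instructions
    of the loop would not fit before that boundary; otherwise pad nothing.
    (Hypothetical heuristic; `block` is an assumed parameter.)"""
    pos = offset % block
    if pos + sum(first_two_lens) <= block:
        return 0                      # both instructions fit before the boundary
    return (-offset) % block          # move the loop head to the next boundary
```

Unlike a fixed `.p2align`, this pads only when the loop head actually sits too close to a boundary, so many loops get no padding at all.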
Also, egcs now has fresh code for static branch prediction.
I would like to extend it to predict the number of executions of
each basic block (and thus the number of loop iterations).
(This can be done easily when we are driven by profiler output, but I am
not sure how to implement it using the prediction algorithms. I may use
Markov chains to compute the expected number of iterations from the branch
probabilities gcc generates, but I doubt it is useful. References to papers
and other docs are welcome.
In the end I would like to converge on a flowgraph with "expected number of
repetitions" values for each edge and basic block.)
This may then be used to decide whether aligning a basic block is a good
idea or not, and to emit far fewer alignments than we do currently.
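A minimal sketch of the Markov-chain idea: treat the flowgraph with predicted branch probabilities as a Markov chain and solve for the expected number of executions of each block per entry into the function. For a single loop whose back edge is taken with probability p, this reduces to the geometric series 1/(1 - p). The sketch simply iterates the flow equations to a fixed point and assumes every loop exits with nonzero probability, so the iteration converges. The CFG encoding, the function names, and the alignment threshold are all hypothetical, not gcc's data structures.

```python
def block_frequencies(succs, entry, iters=1000, tol=1e-12):
    """succs maps block -> list of (successor, probability).
    Returns the expected executions of each block per entry into `entry`,
    i.e. the fixed point of freq[b] = [b == entry] + sum(freq[p] * prob(p -> b))."""
    blocks = set(succs) | {s for edges in succs.values() for s, _ in edges}
    freq = {b: 0.0 for b in blocks}
    for _ in range(iters):
        new = {b: (1.0 if b == entry else 0.0) for b in blocks}
        for b, edges in succs.items():
            for s, p in edges:
                new[s] += freq[b] * p      # flow arriving at s along edge b -> s
        done = max(abs(new[b] - freq[b]) for b in blocks) < tol
        freq = new
        if done:
            break
    return freq

def should_align(freq, block, threshold=4.0):
    """Hypothetical policy: align only blocks expected to execute often."""
    return freq[block] >= threshold
```

For a loop predicted to repeat with probability 0.9, the loop block gets an expected count of about 10 and would be aligned, while the exit block (count 1) would not:

    cfg = {"entry": [("loop", 1.0)], "loop": [("loop", 0.9), ("exit", 0.1)], "exit": []}
    f = block_frequencies(cfg, "entry")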
Honza
>
> --
> -----==- |
> ----==-- _ |
> ---==---(_)__ __ ____ __ Marc Lehmann +--
> --==---/ / _ \/ // /\ \/ / pcg AT opengroup DOT org |e|
> -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE --+
> The choice of a GNU generation |
> |