Mail Archives: pgcc/2000/02/02/16:00:19
On Sun, 30 Jan 2000, Marc Lehmann wrote:
> > 10% is really a lot, inside a loop, which takes (about) 25 * 35 cycles.
>
> That's very much. I doubt it really is the three nops, but...
Well, AFAIK the K6 family (especially the K6-1) is pretty sensitive to
insns being split across a cache line boundary. Such cases slow down
instruction decoding. Considering how important decoder performance is
on the K6, the loop length (only 25-35 cycles, as was said), and
assuming some longer insns were split this way, a 10% difference
is IMHO possible.
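
To make the "split over a cache line boundary" condition concrete, here is a minimal sketch in C. The helper name straddles_cache_line and the 32-byte line size are assumptions made for illustration, not something stated in this thread; check the line size against the documentation for the CPU in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines. Adjust for the CPU at hand. */
#define CACHE_LINE_BYTES 32u

/* Hypothetical helper: returns true if an instruction of `len` bytes
   starting at address `addr` has its first and last byte in
   different cache lines. */
static bool straddles_cache_line(uintptr_t addr, unsigned len)
{
    assert(len > 0);
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line = (addr + len - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

With these assumptions, a 3-byte insn at offset 30 is split (bytes 30, 31 and 32), while the same insn at offset 29 is not. Moving a loop by a few nops can therefore change which of its insns get split.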
BTW: On my K6-2, I get the best performance when loops and functions are
aligned to an 8-byte boundary. But this (as well as the cache line end issues)
deserves more testing, so I will do that over the weekend.
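
As a rough illustration of what aligning a loop start costs in padding, the sketch below computes how many filler bytes (e.g. one-byte nops) are needed to reach the next boundary. The function name padding_to_align is hypothetical, and the sketch assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes needed so that `addr`
   reaches the next multiple of `align`. Returns 0 if `addr` is
   already aligned. Assumes `align` is a nonzero power of two. */
static unsigned padding_to_align(uintptr_t addr, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    unsigned offset = (unsigned)(addr % align);
    return (align - offset) % align;
}
```

For example, a loop that would otherwise start at offset 5 needs three bytes of padding to land on an 8-byte boundary, and one that already starts at offset 8 needs none.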
Have a nice day
------------------------------------------------------------------------------
Martin Ockajak a.k.a. Mandos <mandos AT hq DOT alert DOT sk> http://hq.alert.sk/~mandos
"The goal of Computer Science is to build something that will last at
least until we've finished building it."