Mail Archives: djgpp/1998/02/13/05:15:34
From: Till Harbaum <harbaum AT ibr DOT cs DOT tu-bs DOT de>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: cmpl takes 14 clk cycles on a Pentium ???
Date: 13 Feb 1998 10:05:11 +0100
Organization: TU Braunschweig, Informatik (Bueltenweg), Germany
Lines: 30
Distribution: world
Message-ID: <yks3ehneuyg.fsf@flens.ibr.cs.tu-bs.de>
References: <199802130328 DOT TAA12256 AT adit DOT ap DOT net>
NNTP-Posting-Host: flens.ibr.cs.tu-bs.de
Mime-Version: 1.0 (generated by tm-edit 7.106)
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

Nate Eldredge <eldredge AT ap DOT net> writes:
>
> At 04:46 2/12/1998 -0500, Mario Deschenes wrote:
> >Hi everyone,
> >
> > I'm using RDTSC to profile a routine and I got something strange. My
> >routine looks like:
> [snipped]
> This is somewhat of a shot in the dark, since I am no hardware guru. (Btw,
> you know that `align X' aligns to the nearest 2^X boundary, right?) What
> seems most likely to me is some kind of caching or prefetch issue. Perhaps
> when the target of the jump is close enough, it is already in a cache and is
> fetched faster. But when it's farther away, a new chunk has to be fetched
> from real memory, which is slower.
>
I think this is the right idea. Most modern CPUs do some kind of
burst read-ahead of their code. This means: if the CPU reads an
instruction word, it initiates a burst transfer from RAM to cache and
reads some data it will likely need in the future (the 68040, for
example, always reads 4 longwords, i.e. a full 16-byte cache line,
even if it doesn't need them).
The behaviour of the code also depends on the implementation
of the branch prediction unit of the CPU (which covers a big area
of the Pentium die, I think, so it should be very good :-).
Switch off all caching and see if you still measure those differences.
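
A quick way to see whether caching is involved, without touching the
cache-disable bits, is to time the same routine several times in a row
and compare the first (cold) run with the fastest (warm) one. The
following is only a rough sketch, assuming GCC-style inline asm on x86.
The names read_ticks, time_runs, min_ticks and sample_work are made up
for illustration, and sample_work is just a stand-in for the routine
being profiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read a tick counter: RDTSC on x86 (GCC-style inline asm), otherwise
   fall back to clock(), which is much coarser. RDTSC is not a
   serializing instruction, so readings are approximate. */
static inline uint64_t read_ticks(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)clock();
#endif
}

/* Call fn n times and store the elapsed ticks of each call. */
void time_runs(void (*fn)(void), uint64_t *ticks, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        uint64_t start = read_ticks();
        fn();
        ticks[i] = read_ticks() - start;
    }
}

/* Smallest value in ticks[0..n-1]; a rough estimate of the warm cost. */
uint64_t min_ticks(const uint64_t *ticks, size_t n)
{
    size_t i;
    uint64_t best;
    assert(n > 0);
    best = ticks[0];
    for (i = 1; i < n; i++)
        if (ticks[i] < best)
            best = ticks[i];
    return best;
}

/* Hypothetical workload standing in for the routine under test. */
volatile unsigned sink;
void sample_work(void)
{
    unsigned i, s = 0;
    for (i = 0; i < 1000; i++)
        s += i;
    sink = s;
}
```

Call time_runs(sample_work, t, 8) with an array of 8 counters and
compare t[0] against min_ticks(t, 8). If the first run is clearly
slower than the rest, the extra cycles are probably cache or prefetch
misses rather than the instruction itself. Interrupts can still disturb
single measurements, so take the numbers with a grain of salt.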
Ciao,
Till