Mail Archives: pgcc/1999/03/05/12:03:56
> > I cannot see what is so difficult about it[1] ... I think it is
> > just a special case of loop unrolling.
> >
> > char *p;
> > int i;
> >
> > for(i=0; i<4; i++)
> >     p[i] |= 0x80;
> >
> > should become a 32-bit OR ... once we can do that, the rest of
> > SIMD should be trivial[2]
>
> What you write is trivial if it is allowed. I am no compiler expert but I don't think
> that a compiler is allowed to unroll that loop. It isn't obvious that p and i are
> independent.
As long as I never take the address of i (declaring i as "register" guarantees that,
since the address of a register variable cannot be taken) it is allowed, IIRC.
> Or p[0] and p[1] etc.
p[0] and p[1] have to be independent (they have different addresses after all).
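A minimal sketch of the transformation under discussion, written by hand rather than
produced by any compiler. The function names are made up for illustration. The memcpy
calls are there so that the word access makes no alignment or aliasing assumptions; a
compiler doing this on its own would have to prove or check the same things.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise version, equivalent to the quoted loop. */
static void set_high_bits_bytewise(unsigned char *p)
{
    int i;

    for (i = 0; i < 4; i++)
        p[i] |= 0x80;
}

/* Word-wise version (hypothetical name): the same effect with one 32-bit OR.
 * The mask has 0x80 in every byte, so the result does not depend on
 * byte order. */
static void set_high_bits_wordwise(unsigned char *p)
{
    uint32_t w;

    memcpy(&w, p, sizeof w);      /* load four bytes as one word */
    w |= UINT32_C(0x80808080);    /* OR all four bytes at once   */
    memcpy(p, &w, sizeof w);      /* store the word back         */
}
```

Both functions leave the same four bytes in memory, which is the whole point: the
second one is what the unrolled and merged loop would look like.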
> > The macro approach has additional advantages though, I really
> > would not like to get
> > 11 bits precision for a normal float though I probably would not
> > mind sometimes.
>
> This is often enough, e.g. for sound processing or simple geometry.
Sure. "short float" ?
> > [2] - Well, it could be a bit difficult to ensure a float * is
> > 128-bit aligned ...
>
> Just align all memory on 128-bit boundaries when compiling, or what about a
> new type like Randy Fisher's libmmx?
> http://min.ecn.purdue.edu/~rfisher/Research/Libmmx/libmmx.html
I think the basic idea is to compile unchanged normal C code to optimized code using
the SIMD features of the architecture, either those explicitly provided (MMX, 3dN, KNI,
MVI and so on) or those implicitly existing (almost all bitwise operators can be done
using the full word size).
What you propose is bound to one architecture and requires rewriting code. As soon as
you do that, you might as well write assembly.
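On the alignment worry in footnote [2]: a sketch of how generated code could test for a
128-bit boundary at run time and handle the leading elements one at a time. It assumes
that converting a pointer to uintptr_t exposes its address bits, which is true on common
flat-memory targets but not guaranteed by the C standard. Both helper names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p lies on a 16-byte (128-bit) boundary, else 0.
 * Assumption: the integer value of the pointer is its address. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Number of leading floats (at most n) that would have to be processed
 * one by one before p + k reaches a 128-bit boundary. */
static size_t scalar_prefix_len(const float *p, size_t n)
{
    size_t k = 0;

    while (k < n && !is_aligned_16(p + k))
        k++;
    return k;
}
```

With something like this the source stays plain C: the first scalar_prefix_len()
elements go through the ordinary loop, and only the aligned remainder is a candidate
for the wide instructions.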