Message-ID: <19990305170528.43643@insula.local>
Date: Fri, 5 Mar 1999 17:05:28 +0000
From: Philipp Rumpf
To: pgcc AT delorie DOT com
Subject: Re: SSI/KNI support (was RE: Intel/Cygnus)
References: <19990304152121 DOT 42144 AT insula DOT local> <000001be66ff$5c17a660$3bd16482 AT ellemtel DOT se>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.89.1
In-Reply-To: <000001be66ff$5c17a660$3bd16482@ellemtel.se>; from David Jonsson on Fri, Mar 05, 1999 at 12:57:34PM +0100
X-Accept-Language: en,de,se
Reply-To: pgcc AT delorie DOT com

> > I cannot see what is so difficult about it[1] ... I think it is
> > just a special case of loop unrolling.
> >
> > char *p;
> > int i;
> >
> > for(i=0; i<4; i++)
> >     p[i] |= 0x80;
> >
> > should become a 32-bit OR ... once we can do that, the rest of
> > SIMD should be trivial[2]
>
> What you write is trivial if it is allowed. I am no compiler expert but I don't think
> that a compiler is allowed to unroll that loop. It isn't obvious that p and i are
> independent.

As long as I do not take the address of i (or declare i as "register") it is allowed IIRC.

> Or p[0] and p[1] etc.

p[0] and p[1] have to be independent (they have different addresses after all).

> > The macro approach has additional advantages though, I really
> > would not like to get
> > 11 bits precision for a normal float though I probably would not
> > mind sometimes.
>
> This is enough many times like for sound-processing or simple geometry.

Sure. "short float" ?

> > [2] - Well, it could be a bit difficult to ensure a float * is
> > 128-bit aligned ...
>
> Just align all memory on 128-bit boundaries when compiling or what about a
> new type like Randy Fisher's libmmx
> http://min.ecn.purdue.edu/~rfisher/Research/Libmmx/libmmx.html

I think the basic idea is to compile unchanged, normal C code to optimized code
using the SIMD features of the architecture, either those explicitly provided
(MMX, 3DNow!, KNI, MVI and so on) or those implicitly existent (almost all
bitwise operators can be done using the full word size). What you propose is
bound to one architecture and requires rewriting code. As soon as you do that,
you could write assembly as well.
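
To make the "implicit" case concrete, here is a rough sketch of the transformation the quoted loop would need, written out by hand. It is only an illustration, not what any particular compiler emits. The function names are made up for this example, and I use unsigned char instead of plain char so the OR with 0x80 does not depend on whether char is signed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time version, as in the quoted loop.
 * (Hypothetical name, chosen for this sketch.) */
void set_high_bits_bytes(unsigned char *p)
{
    int i;

    assert(p != NULL);
    for (i = 0; i < 4; i++)
        p[i] |= 0x80;
}

/* Hand-written equivalent: one 32-bit OR covering the same four bytes.
 * memcpy is used for the load and the store so that p does not have to
 * be 4-byte aligned. The mask holds the same byte in every lane, so the
 * result does not depend on byte order.
 * (Hypothetical name, chosen for this sketch.) */
void set_high_bits_word(unsigned char *p)
{
    uint32_t w;

    assert(p != NULL);
    memcpy(&w, p, sizeof w);
    w |= UINT32_C(0x80808080);
    memcpy(p, &w, sizeof w);
}
```

Both functions leave the same four bytes in memory. The second one only works this simply because OR never carries between byte lanes, which is the property that makes bitwise operators usable at the full word size.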