Mail Archives: pgcc/1999/03/05/12:03:56

Message-ID: <19990305170528.43643@insula.local>
Date: Fri, 5 Mar 1999 17:05:28 +0000
From: Philipp Rumpf <prumpf AT jcsbs DOT lanobis DOT de>
To: pgcc AT delorie DOT com
Subject: Re: SSI/KNI support (was RE: Intel/Cygnus)
References: <19990304152121 DOT 42144 AT insula DOT local> <000001be66ff$5c17a660$3bd16482 AT ellemtel DOT se>
Mime-Version: 1.0
X-Mailer: Mutt 0.89.1
In-Reply-To: <000001be66ff$5c17a660$3bd16482@ellemtel.se>; from David Jonsson on Fri, Mar 05, 1999 at 12:57:34PM +0100
X-Accept-Language: en,de,se
Reply-To: pgcc AT delorie DOT com

> > I cannot see what is so difficult about it[1] ... I think it is 
> > just a special case of loop unrolling.
> > 
> > char *p;
> > int i;
> > 
> > for(i=0; i<4; i++)
> > 	p[i] |= 0x80;
> > 
> > should become a 32-bit OR ... once we can do that, the rest of 
> > SIMD should be trivial[2]
> 
> What you write is trivial if it is allowed. I am no compiler expert but I don't think
> that a compiler is allowed to unroll that loop. It isn't obvious that p and i are
> independent.

As long as I do not take the address of i (or if I declare i as "register", which
rules that out) it is allowed, IIRC.

> Or p[0] and p[1] etc.

p[0] and p[1] have to be independent (they have different addresses after all).
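
To make the claim concrete, here is a hand-written sketch of the transformation I
would like the compiler to do by itself. The function names are made up for this
mail, and I use memcpy for the word access so the sketch does not depend on p being
aligned; it does assume a C99 <stdint.h> with uint32_t.

```c
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time version, same loop as quoted above. */
static void set_high_bits_bytewise(char *p)
{
    int i;

    for (i = 0; i < 4; i++)
        p[i] |= 0x80;
}

/* Hand-written equivalent: a single 32-bit OR.
 * The mask has the same value in every byte, so the result does not
 * depend on byte order. memcpy is used instead of a pointer cast to
 * stay clear of alignment and aliasing trouble (an assumption of this
 * sketch, not something a compiler would need to emit). */
static void set_high_bits_wordwise(char *p)
{
    uint32_t w;

    memcpy(&w, p, sizeof w);
    w |= UINT32_C(0x80808080);
    memcpy(p, &w, sizeof w);
}
```

Both functions should leave the same four bytes behind for any input.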

> > The macro approach has additional advantages though, I really 
> > would not like to get
> > 11 bits precision for a normal float though I probably would not 
> > mind sometimes.
> 
> This is often enough, e.g. for sound processing or simple geometry.

Sure. "short float" ?

> > [2] - Well, it could be a bit difficult to ensure a float * is 
> > 128-bit aligned ...
> 
> Just align all memory on 128-bit boundaries when compiling or what about a
> new type like Randy Fisher's libmmx
> http://min.ecn.purdue.edu/~rfisher/Research/Libmmx/libmmx.html

I think the basic idea is to compile unchanged, normal C code to optimized code using
the SIMD features of the architecture, either those explicitly provided (MMX, 3dN, KNI,
MVI and so on) or those implicitly present (almost all bitwise operators can be applied
across the full word size).
What you propose is bound to one architecture and requires rewriting code. Once you
do that, you might as well write assembly.
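
As an illustration of the "implicit" SIMD in bitwise operators, here is a rough
sketch in portable C that ORs a byte mask into a buffer one machine word at a time.
The name or_bytes is invented for this example, and it assumes unsigned long has no
padding bits (true on every machine I know of). Ideally the compiler would derive
something like this from the plain byte loop.

```c
#include <stddef.h>
#include <string.h>

/* OR `mask` into each of the n bytes at p, one machine word at a time. */
static void or_bytes(unsigned char *p, size_t n, unsigned char mask)
{
    unsigned long wide;
    size_t i;

    /* Replicate the byte mask into every byte of a word. */
    memset(&wide, mask, sizeof wide);

    /* Bulk: one word-sized OR per iteration. */
    for (i = 0; i + sizeof wide <= n; i += sizeof wide) {
        unsigned long w;

        memcpy(&w, p + i, sizeof w);
        w |= wide;
        memcpy(p + i, &w, sizeof w);
    }

    /* Tail: bytes that do not fill a whole word. */
    for (; i < n; i++)
        p[i] |= mask;
}
```

The same shape works for AND, XOR and NOT; it is arithmetic (carries between bytes)
that needs real SIMD instructions.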
