Mail Archives: cygwin/1997/03/14/20:09:46

From: root AT jacob DOT remcomp DOT fr (root)
Subject: Tiled memory
Date: 14 Mar 1997 20:09:46 -0800
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Message-ID: <m0w5gNY-000AK1C.cygnus.gnu-win32@jacob.remcomp.fr>
Original-To: gnu-win32 AT cygnus DOT com
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Discussions in this group are really boring, and limit themselves 
to some obscure bugs in bash or so. Let's talk about something else.
Something new for a change.

I am adding MMX support to lcc-win32.
As you may know, MMX introduces SIMD parallelism to the x86
architecture. Besides the obvious benefits of 8-byte memory moves
and other goodies, the parallelism of the new instruction set
will be a challenge for compiler writers.

I will try to introduce the concept of a 'tiled' vector, using a
special datatype. These vectors will be handled in parallel by the
compiler, i.e. if you declare

	_tiled int vector1[1024],vector2[1024],vector3[1024];

you will be able to write something like:

	vector3 = vector1+vector2;

and the compiler will add those vectors two elements at a time,
since two 32-bit ints fit in one 64-bit MMX register. The
dimensions must match, of course, and be known at compile time.
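
As a rough illustration, here is a minimal sketch in portable C of
what the compiler might emit for such a vector add. It emulates one
64-bit packed add of two 32-bit lanes (the job of the MMX PADDD
instruction) with ordinary integer code. The names add_2x32 and
tiled_add_int are hypothetical, and the sketch assumes an even
element count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emulates one packed add of two 32-bit lanes held in 64 bits.
   Each lane wraps around on its own; no carry crosses into the
   neighbouring lane. */
static uint64_t add_2x32(uint64_t a, uint64_t b)
{
    uint64_t lo = (uint32_t)((uint32_t)a + (uint32_t)b);
    uint64_t hi = (uint32_t)((uint32_t)(a >> 32) + (uint32_t)(b >> 32));
    return (hi << 32) | lo;
}

/* Hypothetical expansion of "dst = x + y" for _tiled int vectors. */
static void tiled_add_int(int32_t *dst, const int32_t *x,
                          const int32_t *y, size_t n)
{
    assert(n % 2 == 0);               /* two ints per 64-bit tile */
    for (size_t i = 0; i < n; i += 2) {
        uint64_t a, b, r;
        memcpy(&a, &x[i], sizeof a);  /* load one tile per operand */
        memcpy(&b, &y[i], sizeof b);
        r = add_2x32(a, b);           /* two adds in one step */
        memcpy(&dst[i], &r, sizeof r);
    }
}
```

A real MMX code generator would replace the body of the loop with a
load, one packed add and a store per tile.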

If you declare:

	_tiled short vector1[2048],vector2[2048];

you will add the 16-bit numbers four at a time. With byte
operations the number goes up to eight operations in parallel. You
will be able to obtain a vector of bits by comparing two strings
8 bytes at a time (using a _tiled char).
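
To make the string comparison concrete, here is a small sketch in
portable C. The hardware instruction for this (PCMPEQB) produces
0xFF or 0x00 in each byte; the sketch condenses that to one bit per
byte, which is one possible reading of "a vector of bits". The names
equal_mask8 and first_mismatch are hypothetical, and the length is
assumed to be a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>

/* Compares 8 bytes of a and b. Bit i of the result is set when
   a[i] == b[i], so 0xFF means all 8 bytes are equal. */
static unsigned equal_mask8(const unsigned char *a, const unsigned char *b)
{
    unsigned mask = 0;
    for (int i = 0; i < 8; i++)
        mask |= (unsigned)(a[i] == b[i]) << i;
    return mask;
}

/* Returns the index of the first differing byte, or n if the two
   buffers are equal. Works on one 8-byte tile per iteration. */
static size_t first_mismatch(const char *s, const char *t, size_t n)
{
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        unsigned m = equal_mask8((const unsigned char *)s + i,
                                 (const unsigned char *)t + i);
        if (m != 0xFFu) {
            size_t k = 0;
            while (m & 1u) {          /* skip the leading equal bytes */
                m >>= 1;
                k++;
            }
            return i + k;
        }
    }
    return n;
}
```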

Another new concept is saturated operations. With the _saturated
keyword, adds, subtracts, etc. will be done using saturation
arithmetic instead of the normal wraparound. For instance

	_saturated unsigned char a = 150,b = 150,c;
	c = a + b;

'c' now contains 255 instead of the 300-256=44 that wraparound
gives today. These keywords can be combined, of course.
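
For comparison, here is a minimal sketch in plain C of what a
saturated 8-bit add has to compute, next to the ordinary wraparound
add. The function names are hypothetical; the unsigned and signed
variants correspond to what the MMX instructions PADDUSB and PADDSB
do for each byte.

```c
#include <stdint.h>

/* Unsigned saturating add: results above 255 are clamped to 255. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Ordinary add: the result is reduced modulo 256. */
static uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Signed saturating add: results are clamped to [-128, 127]. */
static int8_t add_sat_s8(int8_t a, int8_t b)
{
    int sum = a + b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}
```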

Special variables will let you use the MMX registers directly.
_mm0 to _mm7 denote the MMX registers and are 64 bits wide. These
registers, although aliased to the FPU registers, are NOT organized
as a stack and can be addressed individually. The datatype can be
described in C as:
typedef union {
	/* x86 is little-endian: the low-order element comes first */
	struct {
		int low_0_31;
		int high_32_63;
	} int32;
	struct {
		short low_0_15;
		short low_16_31;
		short high_32_47;
		short high_48_63;
	} int16;
	struct {
		char	low_0_7;
		char	low_8_15;
		char	low_16_23;
		char	low_24_31;
		char	high_32_39;
		char	high_40_47;
		char	high_48_55;
		char	high_56_63;
	} int8;
} _mmxData;

Individual bytes/shorts/ints must be separately addressable so that
the pack/unpack operations can be controlled.
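
As an example of a pack operation, here is a sketch in portable C
of packing eight signed 16-bit values into eight unsigned bytes with
saturation, which is what the MMX instruction PACKUSWB does. The
function name and the array-based interface are assumptions made for
illustration only.

```c
#include <stdint.h>

/* Clamps a signed 16-bit value into the range of an unsigned byte. */
static uint8_t clamp_u8(int16_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Packs the four shorts of a into the low four bytes of dst and the
   four shorts of b into the high four bytes, saturating each one. */
static void pack_s16_to_u8(uint8_t dst[8], const int16_t a[4],
                           const int16_t b[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i]     = clamp_u8(a[i]);
        dst[i + 4] = clamp_u8(b[i]);
    }
}
```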

To come back to parallelism, I will borrow many concepts from the
once famous but now forgotten programming language APL. I will
introduce the vector operations as an extension of the normal
operations, along with many of the APL goodies like the inner
product, the outer product, the reduce operator (+/), etc. For
instance:
	int sum = +/ vector;
This will add up the elements of the vector in parallel, 2, 4 or 8
at a time depending on the element width. The algorithm should be
something like:

	_tiled int vector[16];

	_mm0 = 0;
	_mm0 += vector[0] + vector[8];
	_mm0 += vector[1] + vector[9];
	...
	_mm0 += vector[7] + vector[15];

To maximize the pipeline effect, we can use a separate accumulator
for each step:

	_mm0 = _mm1 = _mm2 ... = 0;
	_mm0 += vector[0] + vector[8];
	_mm1 += vector[1] + vector[9];
	...

The 8 MMX registers are then added together into _mm0 at the
end of the operation. Because the eight accumulations do not
depend on each other, this allows a theoretical 8-stage pipeline.
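
The idea of independent accumulators can be sketched in portable C
as follows. This is only an illustration of the technique, not the
code lcc-win32 would generate; the name reduce_add and the choice
of four accumulators (NACC) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define NACC 4   /* number of independent accumulators (arbitrary) */

/* Sums v[0..n-1] using NACC partial sums that do not depend on each
   other, then combines them at the end. Unsigned arithmetic is used
   internally so that overflow wraps instead of being undefined. */
static int32_t reduce_add(const int32_t *v, size_t n)
{
    uint32_t acc[NACC] = {0};
    uint32_t total = 0;
    size_t i = 0;

    for (; i + NACC <= n; i += NACC)
        for (size_t k = 0; k < NACC; k++)
            acc[k] += (uint32_t)v[i + k];

    for (; i < n; i++)                /* leftover elements */
        acc[0] += (uint32_t)v[i];

    for (size_t k = 0; k < NACC; k++) /* final combine step */
        total += acc[k];
    return (int32_t)total;
}
```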

Similar to the reduce operator, we have the +\ (scan) operator,
which yields the running totals.

Suppose we have

	_tiled int vector1[] = { 1, 2, 3, 4, 5 }, vector2[5];
	vector2 = +\vector1;

This gives:

	1       3        6          10          15
	(0+1) (0+1+2) (0+1+2+3) (0+1+2+3+4) (0+1+2+3+4+5)
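
In plain C, the result of the +\ operator on an int vector can be
sketched as a running sum. The function name scan_add is
hypothetical, and this sequential version only shows what is
computed, not how it would be parallelized.

```c
#include <stddef.h>

/* Writes the running totals of src into dst:
   dst[i] = src[0] + src[1] + ... + src[i].
   dst and src may point to the same array. */
static void scan_add(int *dst, const int *src, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; i++) {
        running += src[i];
        dst[i] = running;
    }
}
```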
---------------------------------------------------------------

Well, I will stop here, I am wasting bandwidth, that would be
better used discussing /groff/termcap/vi/bash/ls/less/old.

P.S. I still see mail about 'less'. It still exists somehow, and
so does termcap, even though there have been no terminals around
for ages...

What is 'less'?
Its goal is to display a text file, isn't it?

Imagine this:

Several years ago, Xerox (who else) researchers published the
results of playing with a graphical control that presented text
to the user as a ROLL. You rolled the text slowly into view. The
eye has been trained by millions of years of evolution to see
objects in 3 dimensions, so text that rolled from the back left of
the screen to the center and on to the right gave the eye cues
that eased the recognition of the text.

A control that does that would be easy to write using the graphic
3D libraries that are everywhere...

Yes but how about the termcap file for that??? :-)

Have fun guys, and stop bashing bash!

-- 
Jacob Navia	Logiciels/Informatique
41 rue Maurice Ravel			Tel 01 48.23.51.44
93430 Villetaneuse 			Fax 01 48.23.95.39
France
-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".
