From: root AT jacob DOT remcomp DOT fr (root)
Subject: Tiled memory
Date: 14 Mar 1997 20:09:46 -0800
Approved: cygnus DOT gnu-win32 AT cygnus DOT com
Distribution: cygnus
Content-Type: text
Original-To: gnu-win32 AT cygnus DOT com
Original-Sender: owner-gnu-win32 AT cygnus DOT com

Discussions in this group are really boring, and limit themselves to
some obscure bugs in bash or so. Let's talk about something else.
Something new for a change.

I am adding MMX support to lcc-win32. As you may know, MMX introduces
SIMD parallelism to the x86 architecture. Besides the obvious benefits
of 8-byte memory moves and other goodies, this parallelism feature of
the new instruction set will be a challenge for compiler writers.

I will try to introduce the concept of a 'tiled' vector, using a
special datatype. These vectors will be handled in parallel by the
compiler, i.e. if you declare

    _tiled int vector1[1024],vector2[1024],vector3[1024];

you will be able to write something like:

    vector3 = vector1+vector2;

and the compiler will add those vectors two additions at a time. The
dimensions must match, of course, and be known at compile time.

If you declare:

    _tiled short vector1[2048],vector2[2048];

the 16-bit numbers will be added four at a time. With byte operations
the number goes to eight operations in parallel.

You will be able to obtain a vector of bits by comparing two strings
8 bytes at a time (using a _tiled char).

Another new concept is saturation arithmetic. Using the _saturated
keyword, adds, subtracts, etc. will be done using saturation arithmetic
instead of the normal wraparound. For instance

    _saturated unsigned char a = 150,b = 150,c;
    c = a + b;

'c' now contains 255 instead of 300-256=44 as it does now. These
keywords can be combined, of course.

Special variables will allow you to use the MMX registers directly.
_mm0 to _mm7 denote the MMX registers and are 64 bits wide.
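
Going back to saturation for a moment: the difference between the two
kinds of add can be tried out in portable C today, without any compiler
extension. What follows is only a minimal sketch under my own
assumptions, not the lcc-win32 implementation. The names sat_add_u8,
wrap_add_u8 and sat_add_8lanes are hypothetical and invented here for
illustration, and the 8-lane loop is a scalar stand-in for what a single
packed byte add would do in hardware.

```c
#include <stdint.h>

/* Hypothetical helper (not an lcc-win32 function): add two unsigned
   bytes and clamp the result at 255 instead of wrapping modulo 256. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;   /* widen to keep the carry */
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Ordinary wraparound add, for comparison: the sum is reduced
   modulo 256 when it is converted back to 8 bits. */
static uint8_t wrap_add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Scalar emulation of one saturated add over eight byte lanes,
   i.e. one 64-bit register's worth of unsigned chars. */
static void sat_add_8lanes(uint8_t dst[8],
                           const uint8_t x[8],
                           const uint8_t y[8])
{
    int i;
    for (i = 0; i < 8; i++)
        dst[i] = sat_add_u8(x[i], y[i]);
}
```

With these definitions sat_add_u8(150, 150) gives 255, while
wrap_add_u8(150, 150) gives 44, which is the same pair of results as in
the _saturated example above.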
The MMX registers, aliased to the FPU registers, are NOT organized as a
stack and can be addressed individually. The datatype can be described
in C as:

    typedef union {
        /* x86 is little-endian, so the low-order field comes first */
        struct {
            int low_0_31;
            int high_32_63;
        } int32;
        struct {
            short low_0_15;
            short low_16_31;
            short high_32_47;
            short high_48_63;
        } int16;
        struct {
            char low_0_7;
            char low_8_15;
            char low_16_23;
            char low_24_31;
            char high_32_39;
            char high_40_47;
            char high_48_55;
            char high_56_63;
        } int8;
    } _mmxData;

Individual bytes/shorts/ints must be individually addressable to be able
to control the pack/unpack operations.

To come back to parallelism, I will borrow many concepts from the once
famous but now forgotten programming language APL. I will introduce the
vector operations as an extension of the normal operations, along with
many of the APL goodies like the inner product, the outer product, the
reduce (+/ operator) etc.

For instance:

    int sum = +/ vector;

This will sum the vector in parallel, 2/4/8 elements at a time. The
algorithm should be something like:

    _tiled vector[16];
    _mm0 = 0;
    _mm0 += vector[0] + vector[8];
    _mm0 += vector[1] + vector[9];
    .....
    _mm0 += vector[7] + vector[15];

To maximize the pipeline effect, we can use:

    _mm0 = _mm1 = _mm2 ... = 0;
    _mm0 += vector[0] + vector[8];
    _mm1 += vector[1] + vector[9];
    ... etc.

The 8 MMX registers are then added together into _mm0 at the end of the
operation. This will allow a theoretical 8-stage pipeline.

Similar to the reduce operator we have the +\ (scan) operator, which
produces the running sums. Suppose we have

    _tiled vector2[] = { 1, 2, 3, 4, 5 };
    vector1 = +\vector2;

gives:

      1       3         6          10            15
    (0+1)  (0+1+2)  (0+1+2+3)  (0+1+2+3+4)  (0+1+2+3+4+5)

---------------------------------------------------------------

Well, I will stop here, I am wasting bandwidth that would be better
used discussing /groff/termcap/vi/bash/ls/less/old.

P.S. I still see mail about 'less'. It still exists somehow, and so
does termcap, even though there have been no terminals around for
ages... What is 'less'? Its goal is to display a text file, isn't it?
Imagine this: Several years ago, Xerox (who else) researchers published
the results of playing with a graphical control that presented text to
the user as a ROLL. You rolled text slowly into view. The eye has been
trained by millions of years of evolution to see objects in 3
dimensions, so this text that rolled from the back left of the screen
to the center and again to the right gave the eye cues that eased the
recognition of text.

A control that does that would be easy to write using the graphic 3D
libraries that are everywhere... Yes, but how about the termcap file
for that??? :-)

Have fun guys, and stop bashing bash!

--
Jacob Navia             Logiciels/Informatique
41 rue Maurice Ravel    Tel 01 48.23.51.44
93430 Villetaneuse      Fax 01 48.23.95.39
France

-
For help on using this list, send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".