From: George Marsaglia <geo AT stat DOT fsu DOT edu>
Newsgroups: comp.lang.c,comp.os.msdos.djgpp
Subject: An inline assembler RNG for C
Date: Wed, 17 Feb 1999 13:35:29 -0500
Organization: Florida State University
Lines: 108
Message-ID: <36CB0BF1.23FF9E2@stat.fsu.edu>
NNTP-Posting-Host: dial209.acns.fsu.edu
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 4.08 [en] (Win95; I)
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

After a number of requests for random number generators
in C, I thought it worthwhile to learn enough C to suggest
a few methods.   A posting of the methods in sci.stat.math,
sci.math and sci.math.num-analysis led to a number of responses
and improvements.  The various postings may be found in those newsgroups

or via deja news (a search on LFIB4 may be the most effective).

But one of my most promising methods cannot be implemented
directly in C, and the main purpose of this posting is to
seek help on how it might be done, via inline assembler.

The method is Multiply-with-Carry for 32-bit integers.
This is the idea.  Suppose you have
          a current 32-bit integer x
          a current 32-bit carry   c
          a fixed multiplier, say, a=2083801278
Now form  a*x+c in 64 bits.
Return the bottom 32 bits as the new x,
and keep the top 32 bits as the new carry, c.

The sequence of x's produced by repeating this process
has many desirable properties.  A listing of some of them
is deferred to the end of this posting, in order to
get directly to my appeal for help in C.

The problem is to form a*x+c in 64 bits, then extract the
bottom and top halves. This is easily done in assembler.
For Intel chips, these instructions do the job:
           mov      eax,x
           mov      edx,2083801278
           mul      edx           ; a*x in edx:eax
           mov      ecx,c         ; get carry
           add      eax,ecx       ; right half of a*x+c in eax
           adc      edx,0         ; if overflow, increment edx
           mov      c,edx         ; store new carry
           mov      x,eax         ; store new x
           ret

According to various posts to this group, many C compilers
seem to permit inline assembler calls, but details vary.

If you are familiar with how inline assembler works
for C compilers you use, and can provide a version of
the above for those compilers, a posting might be of
interest to many potential users.

George Marsaglia


Advantages of the 32-bit multiply-with-carry RNG:

It exploits the built-in arithmetic modulo 2^32
that almost all modern CPU's provide.

It is very fast.

It uses multiplication, which experience has shown
does a better job of mixing bits than do other
RNG methods based on +,- or xor.

It has period "billions and billions" of times longer
than those of ordinary congruential generators.

It produces the trailing 32-bits of a congruential
generator modulo a very large prime.
If integers z(n)=a*z(n-1) mod a*2^32-1 are formed,
the sequence z(n) mod 2^32 is the sequence of x's
produced by the 32-bit multiply-with-carry method.

The trailing bits of the x's do not have the regularity
that is present in standard congruential generators
for modulus 2^32.   Indeed, if only the last bits
of successive x's are used to form a large binary file,
that file will pass all the tests in the Diehard Battery of
tests of randomness.  (http://stat.fsu.edu/pub/diehard/cdrom),
as do the full set of 32-bit integers from MWC.
The theory behind MWC is in the file mwc1.ps of the CDROM.

Any choice of multiplier 'a' from this list:
1791398085 1929682203 1683268614 1965537969 1675393560
1967773755 1517746329 1447497129 1655692410 1606218150
2051013963 1075433238 1557985959 1781943330 1893513180
1631296680 2131995753 2083801278 1873196400 1554115554
can serve as multiplier, or any 'a' for which both
 a*2^32-1 and a*2^31-1 are prime.

The method seems ideally suited for modern CPU's.
I have predicted that it will be the system
generator in  future platforms.  Because it is so
well suited for implementation directly in the
CPU, it has been adopted by several manufacturers
in the gaming industry.  If you play a slot
machine in Las Vegas, chances are  your luck---
good or bad---will be determined by the x's in a
32-bit MWC sequence. The second manufacturer to
use the method was sued by the first, on the
grounds that they were the first to provide a
hardware implementation.   Since a mathematical
algorithm cannot be patented, I don't profit from
its use.

But I, and perhaps many followers of this group,  might
profit intellectually,  should any experts provide  inline assembler
versions for C compilers we use.

gm