Overview First, a few thoughts on random number generation. It's hard to do right! The root cause, of course, is that computer algorithms themselves are not truly random. (Hence this library contains pseudo-random generators only.) There are many problems in implementing algorithms correctly and efficiently, and in coming up with good tests for generators and distributions. The history of pseudorandom number generation in simulation work is mostly embarrassing. This library attempts to do a decent job of generating random numbers, as well as documenting how things work and what shortcomings there are. If you want to learn more about random number generation, the bibliography in the Swarm User Guide has useful notes. Knuth is the main reference in this realm, but too old to describe most of the particular generators used here, many of which are drawn from recent literature. |
Following are the other header files imported by <random.h>:
#import <objectbase.h> #import <random/generators.h> #import <random/distributions.h> #import <random/randomdefs.h> #import <random/randomvars.h> |
The objectbase library interface is included to provide the basic object support. randomdefs.h contains some C preprocessor macros and typedefs used in the library.
This reference guide contains the object definitions for generators and distributions (see the list above) and also encodes the inheritance structure through the "Protocols that this protocol uses" section of each protocol. Just click on a (sub-)protocol name to see what methods it implements. (You may want to review the section on Protocols in the Objective-C book here!)
In the protocols described, any protocol that ultimately inherits from CREATABLE defines an object that you can use in your program. (This is part of the Swarm defobj machinery.) In other words, while 'InternalState' is a normal protocol (a list of method definitions), the name `ACGgen' refers to both a protocol and a class that implements that protocol. Similarly, 'GammaDist' defines both a protocol and a class that implements that protocol.
All generators and distributions ultimately inherit from SwarmObject.
1.0.2 -> 1.0.3. Note: The new random library does not work in the same way as the old one. This means that some applications that used the random library provided with the 1.0.2 release will be broken. However, porting these applications to the new random library will be fairly easy since large efforts were made to adhere to the standard set with the last version and some backwards compatibility hooks were incorporated.
1.0, 1.1, 1.2, 1.3, 1.3.1, 1.4. There were no major compatibility issues in these releases.
The random library contains two kinds of objects, the generators which implement different pseudo-random-number algorithms, and the distributions which transform the (uniform) output from the generators into the desired simulated statistical distributions. (The Swarm random library does not implement any true random number generators at this time.)
All these sections have been relocated to the Swarm User Guide
Random library objects do not do anything exotic during the create phase. The competent programmer may subclass these objects in the normal manner.
This section provides implementation details for the current version of the random library.
This is release 1.4.1 of the Swarm libraries. It contains version 0.8 of the random-number library and version 0.81 of the random library documentation.
Look at the Random Library Reference for the objects defined in this library.
`Fat' vs. `Thin' doubles. Note that distributions which use floating point variates from their generators by default draw `fat' doubles (-getDoubleSample) which use two calls to the basic (32-bit) unsigned int sample method (-getUnsignedSample) in order to fill the 53-bit mantissa of a double. If you don't need this much precision, or want to speed up the distributions, you can make the distributions use `thin' samples (-getThinDoubleSample) instead. See the note at the top of random/distributions.h for how to change this behavior. If you do, be sure to remake Swarm (make, make install).
Version 0.8: Changes since version 0.75.
The code was rearranged to conform to create-phase protocol (CREATING-SETTING-USING) ordering.
Some for-loop indices were changed to unsigned integers to eliminate compiler warnings.
A few objects (C2LCGXgen, C4LCGXgen, SWBgen, TGFSRgen) were given their own -drop methods to drop internally allocated arrays properly.
A new -reset method was added to all generators. This method resets the state of the generator to what it was at creation, or at the point when -setStateFromSeed(s) was last used. Counters are also reset.
Version 0.75: Changes since version 0.7.
The method '-getDoubleSample' was redefined to use only double variables in its implementation (instead of long doubles).
The macros used for starting seed generation were changed to avoid a situation where many new generators would be created the same starting seed (if '--varyseed' was not specified.) See the Generator Usage Guide and the Reference Guide for details.
Version 0.7: Improvements over version 0.6.
A host of new generators, located on the web or in the literature, have been added since the last version of Random. There is now a total of 36 different generators defined! Some of these have immense periods, some are very fast, and some have much better statistical properties than the old generators.
A new *type* of generator, the `split' generator, has been introduced in the form of L'Ecuyer's C2LCGXgen and C4LCGXgen generators.
A `split' generator is a long-period generator for which we are able to split the period into arbitrary sub-periods, which we can access quickly. We then configure the generator as having a number (A) of `virtual generators', each of which can address a number (2^v) of sub-segments of length 2^w. These parameters (A,v,w) are user selectable when the generator is created. (As an example, for C4LCGXgen the default values are A=128, v=31, w=41.) The advantage is that the subsegments act as statistically independent streams of random numbers.
In addition to the -getUnsignedSample method, generators now also supply floating point output in the range [0.0,1.0), in the form of these methods:
-(float) getFloatSample; // using 1 unsigned value -(double) getThinDoubleSample; // using 1 unsigned value -(double) getDoubleSample; // using 2 unsigned values -(long double) getLongDoubleSample; // using 2 unsigned values |
Generators may now be started with a single seed, *or* with a vector of seeds whose length is generator dependent. (PMMLCG requires 1 integer for a seed, while MT19937 needs 624 of them.)
Generators now remember what seed values they were started with. They also count how many variates they have delivered (i.e., how many calls to -getUnsignedSample they have serviced.)
There are a few arbitrary seed values, DEFAULTSEED, DEFAULTSEED1, DEFAULTSEED2, DEFAULTSEED3, DEFAULTSEED4 defined. There is also the value FIRSTSEED, which returns the value that the default generator `randomGenerator' was started with.
The macro NEXTSEED will generate a deterministic
sequence of seed values, using and inline LCG and starting with
FIRSTSEED. There is the macro RANDOMSEED, which will be different
every time it is invoked because it depends on program time. And
there is value STARTSEED, which will by default equal NEXTSEED, but
will instead be equal to RANDOMSEED if you start your program with the
--varyseed
or -s
command line
parameter.
The generators have gained a new creation method, '+createWithDefaults: aZone', which creates the generator and initializes it with STARTSEED. Split generators get default values for A,v,w.
Version 0.7: Changes since version 0.6.
The generator classes have changed names to where they all end in '-gen'. A simple search-and-replace in your code will get you up and running again.(Or perhaps you'll want to try one of the new generators?)
A bug in SWBgen was corrected. Code for ACG and SCG was also changed.
The -verifySelf
method is
gone. Instead see the test program located in
/random/testR0. (Available in a separate tarball
at the SFI ftp site.)
The `getState:' method has been named `putStateInto: (void *) buffer', and the `setState:' method is now `setStateFrom: (void *) buffer'. A quick search-and-replace fixes things in your code.
Note: these methods have also changed somewhat, as has the size of the data being saved. As a result, v. 0.7 generators will refuse to `setStateFrom' data saved by v. 0.6 objects.
There should be fewer changes like this in the next release.
Testing Generators. Since v. 0.6 we have done some rudimentary statistical testing of the implemented generators, using Marsaglia's Diehard tests and the ENT tests. The results of these tests are summarized in Generator quality table (now found in the Swarm User Guide), where test results as well as period length, state size and execution times are listed. You can use these data to select a generator that suits your simulation. Some brief comments:
the tests show that old generators SCG and LCG are of poor quality and should be avoided.
the lagged-Fibonacci based generators (ACG, SWB, PSWB) all fail Diehard's `Birthday spacings test', for reasons having to do with their lattice structure. These generators are only conditionally recommended.
The rest of the 32-bit generators (i.e. generators that fill all 32 bits of an unsigned int) pass all tests, and are recommended at this time. (Note that while a test may show that a generator is bad, passing a number of tests does not prove that a generator is good!)
The 31-bit generators all fail the same set of tests. Some of these cannot be passed by a generator whose output has a `stuck' bit. Until I clear up with Prof. Marsaglia how to interpret these results, I believe all the 31-bit generators are in the `recommended' category.
However, a cautionary note: while the PMMLCG generators pass the tests, they have a very short period ( less than 2^31 ) and should only be used for `toy' simulations. You don't want your generator(s) to `go around' and start repeating themselves !
For what it's worth, Professor L'Ecuyer recommends his own C4LCGX and C2MRG3 generators as well as Matsumoto's TT800 (the monster MT19937 hadn't been released yet), and Prof. Marsaglia recommends his own Multiply-With-Carry generators (MWCA, MWCB, C3MWC, RWC2, RWC8="Mother").
Version 0.8: Changes over version 0.75. No functional changes were made. Code was rearranged to conform to create-phase protocol (CREATING-SETTING-USING) ordering.
Version 0.75: Changes over version 0.7. No functional changes were made.
Version 0.7: Improvements over version 0.6.
One new distribution class, BernoulliDist, has been added. It returns binary values (yes/true/1) with a given probability (while the old RandomBitDist has a fixed 50% probability, a fair coin toss.)
Distributions now have a new create method,
'+createWithDefaults: aZone'. This method creates the distribution
object, and also a new generator object for its exclusive use. Each
distribution class has a different default generator class
assigned. These generators are initialized with STARTSEED, which by
default equals the fixed value DEFAULTSEED, but will be equal to the
varying RANDOMSEED if you start your program with the command line
parameter --varyseed
or
-s
.
All distributions have code to interact with the new `split' generators.
UniformIntegerDist and UniformUnsignedDist now allow you to set parameter minValue equal to maxValue. In this case that value is returned every time.
UniformDoubleDist also allows this, even if the set [x,x) is mathematically suspect ...
NormalDist and LogNormalDist now allow you to specify zero Variance, in which case the values returned are the Mean and exp(Mean) respectively.
Version 0.7: Changes since version 0.6.
The distribution classes have changed names to where they all end in `Dist'. A simple search-and-replace in your code will get you back up and running.
The strong distinction between `frozen' and `un-frozen' distribution objects in v. 0.6 has been softened considerably. You may now set and reset the default parameters as often as you wish, and you may make calls for variates with given parameters even if different default parameters have been set.
The generation of uniform(0,1) floating point values has been moved from the distribution objects into the generator objects. Thus, if all you need is a uniform(0,1) double, you have no need of a distribution but can get what you desire from a generator.
Note that the generators fill the mantissa of a double from two 32-bit unsigned values in a different manner from v. 0.6 distributions, so output will be a bit different in the new version.
A bug in LogNormalDist has been fixed.
The `getState:' method has been named `putStateInto: (void *) buffer', and the `setState:' method is now `setStateFrom: (void *) buffer. A quick search-and-replace fixes things in your code.
But note: these methods have also changed somewhat, as has the size of the data being saved. As a result, v. 0.7 distributions will refuse to `setStateFrom' data saved by v. 0.6 objects.
Utility Objects Provided. The following objects have been defined in random/random.m. They may be accessed and used from anywhere in your program.
id <SimpleRandomGenerator> randomGenerator; id <UniformIntegerDist> uniformIntRand; id <UniformUnsignedDist> uniformUnsRand; id <UniformDoubleDist> uniformDblRand; |
Like many Open Source projects, this random-number library is a work in process. Further developments still on the to-do list are detailed below.
The following changes and additions are contemplated for the next release of Random for Swarm (though I don't promise they'll all make it in -- nor when that next release will be):
ADD a few more generators. It's good to have many different types of generators, so you can test your model's results with different generators and ensure that the results aren't artifacts of the generator used. And people seem to insist on inventing new, better ones with longer periods!
ADD more distributions. NOTE: if you have any strong opinions about what distributions or generators need to be added, please e-mail me!
TEST the generators, using Marsaglia's Diehard battery, or L'Ecuyer's tests when/if those become available. This will allow us (a) to make a choice between the implemented generators on the basis of their statistical quality, (b) to decide what old and bad generators to remove, and (c) to detect any bugs in my implementation.
TEST the distributions, to make sure they actually put out numbers according to the probability distribution and parameters used.
ELIMINATE 'bad' old generators on the basis of statistical tests
RETAIN PMMLCG as the only short-period generator, for convenience
ADD an 'empirical' distribution, whose f is defined by a set of user- supplied data
IMPLEMENT a version of getState/setState that is portable across machine architectures, so that simulations may be moved to or duplicated on other machines. (The problem: integers and doubles are stored in different byte orders on different systems.)
IMPLEMENT a proper -drop method for generators that allocate their state vectors dynamically, freeing the state vector memory to avoid `memory leakage'
REVIEW all objects for ways to make the crucial methods run faster
ADD code to make all objects meter their own usage and send the author monthly e-mails in a stealthy manner, so he can monitor usage and perhaps start collecting a usage fee for his efforts ... ;-) (Suggested by Rick Riolo. Thanks, Rick!)
<sthomme@humsci.auburn.edu>
Documentation and Implementation Status
This is version 0.8 of Random. It was donated by Sven Thommesen. Version 0.6 was a reimplementation of most of Nelson Minar's original random with many changes and a new interface. Versions 0.7 and 0.75 added many more generators and distributions and changed the interface somewhat. This version cleaned up the protocol interface definitions and fixed a few small bugs. The documentation was also improved a bit.
We are reasonably sure that the generators and distributions included here have been correctly implemented. The generators have been subjected to a battery of statistical tests, and the results are described in the documentation. The distributions have not been subjected to statistical tests yet. As with any pseudo-random number generation library, the results obtained should be examined closely. A set of test programs which exercises the objects is available on the Swarm web site, and the statistical tests are also available on the web.