Simulation of biological evolution

by Agner Fog

stocc.zip contains two simple C++ programs showing examples of how to simulate biological evolution using a library of non-uniform random number generators. 

File list

The archive stocc.zip contains several files, of which the following are relevant to simulation of evolution:

evolc.htm
This file. Instructions
randomc.htm
Description of C++ class library of uniform random number generators used by the programs.
stocc.htm
Description of C++ class library of non-uniform random number generators used by the programs.
ex-evol1.cpp
Simulation program where selection is based on competition for a limited resource or differential predation.
ex-evol2.cpp
Simulation program where selection is based on differential fertility.
ex-stoc.cpp
Example showing how to use the non-uniform random number generators.
mersenne.cpp
A random number generators to use.
stoc1.cpp stoc3.cpp
Generators of non-uniform random variates with various distributions.
wnchyppr.cpp
Definition of Wallenius' noncentral hypergeometric distribution
fnchyppr.cpp
Definition of Fisher's noncentral hypergeometric distribution
userintf.cpp
Platform-dependent user interface functions
randomc.h stocc.h
C++ header files containing function prototypes and class definitions.

Instructions

Unpack the files randomc.zip and stocc.zip into a new directory. Make sure all the source files are in the same directory.

Compile the file ex-evol1.cpp or ex-evol2.cpp in console mode using a C++ compiler.

Run the program.

Change the parameters as you like by editing the file ex-evol1.cpp or ex-evol2.cpp at the place marked  // define parameters,  and try again.

Additional description of the uniform and non-uniform random number generators used by these programs can be found in randomc.htm and stocc.htm, respectively. Theoretical description of Wallenius' noncentral hypergeometric probability distribution, used for simulating Darwinian selection, is given at www.agner.org/random/theory.

Portability

These programs should work with all modern C++ compilers in all operating systems.

See randomc.htm and stocc.htm for more portability issues.

Theory

These programs show examples of how to make Monte Carlo simulation of biological evolution.

The algorithm works on gene pools rather than on individuals. It does not follow the fate of each individual. Instead, it calculates the probability distribution of possible outcomes and then uses a function which returns a random variable with the desired distribution. This simulation method is much faster than the traditional simulation method based on individuals.

The program assumes discrete non-overlapping generations in a panmictic (perfectly mixing) population. Inheritance is Mendelian. Selection is considered at only one locus, which can hold the genes a or b.

Two different models with different selection mechanisms are provided here. The first model assumes that selection is based on competition for a limited resource or selective predation. The second model involves selection in the form of differential breeding success.

The simulation of model 1 goes through the following steps for each generation:

  1. Mutation. The number of genes of each type that mutate in one generation follows the binomial distribution:
    mutations of gene x = binomial (number of x, mutation rate).
    It is assumed that the mutation rate is the same in both directions.
  2. Breeding. The total number of offspring follows a poisson or normal distribution.
    The number of genes of each type in the gene pool of the child generation follows a binomial distribution, where the probability of each gene is the frequency of this gene in the parent generation.
  3. Split into genotypes. Each individual has two genes at the locus in question. This makes three possible genotypes: (a,a), (a,b) and (b,b). The number of individuals with each genotype is calculated for a random combination of all the a and b genes in the child gene pool. This process requires two calls to the hypergeometric generator, one for each allele.
  4. Limitation. If the carrying capacity of the habitat is exceeded then some individuals must die. The carrying capacity may fluctuate with a normal distribution. Similar fluctuations in the population size may be caused by predation.
    The number of deaths of each genotype follows a multivariate Wallenius' noncentral hypergeometric distribution, where the odds parameter is the reciprocal of the relative fitness of the genotypes.
  5. Generation shift. The child generation becomes the parent generation of the next iteration.

Model 2 has the following steps:

  1. Mutation. Same as above.
  2. Split into genotypes. Same as above, but applied to parent generation.
  3. Split into males and females. The number of individuals of each genotype is split into males and females. This is a binomial distribution with the sex-ratio as probability parameter.
  4. Breeding potential. The maximum number of children that the parent generation can produce follows a poisson or normal distribution. If this number exceeds the carrying capacity of the habitat then it is reduced to the carrying capacity. The carrying capacity may fluctuate with a normal distribution.
  5. Differential fertility. The probability of each child genotype is calculated as follows:
    For all possible combinations of male genotype i and female genotype j, the probability of an i x j mating is calculated from the product of the number of i males and the number of j females. For each of these i x j combinations, the probability of each child genotype is calculated according to Mendel's laws. These probabilities are multiplied by the probability of an i x j mating and by the fitness of the pair, and summated for all combinations of i and j. The fitness of the pair can be either the sum or the product of the fitness contribution of each parent. The number of children of each genotype actually generated is calculated as a multivariate binomial distribution with these probabilities as parameters.
  6. Generation shift. Same as above.

These programs may form the basis of your own simulation experiments. A lot of refinements and extensions to the models are possible, and the library of non-uniform random number generators is at your disposition for the modeling of different selection mechanisms.

A more advanced simulation program, called Altruist, allows structured or viscous populations, group selection, epistatis, punctuated equilibria, etc. You can download it from the link below.

References

More details about the genetics of evolution can be found in a textbook on population genetics.

The theory of probability distributions such as the poisson, binomial, hypergeometric and normal distributions can be found in a textbook on statistics. The Wallenius' noncentral hypergeometric distribution is defined at www.agner.org/random/theory.

Further information on the uniform and non-uniform random number generators used here can be found on www.agner.org/random/.

A more advanced simulation program using these methods is the Altruist program, available from www.agner.org/evolution. This program can simulate models with structured or viscous populations with group selection, epistasis, punctuated equilibria, etc.

Copyright and licence

© 2002-2007 by Agner Fog. General public license statement.