Assignment 6. Wright-Fisher Model

You are a graduate student in Professor Sue Dohyphal's lab, working on a gene to make "superyeast". From a Martian strain you have cloned a gene, called Bio5488, that you believe dramatically increases fitness. You insert Bio5488 into a chromosome of a lab strain. When you place one cell of the Bio5488 lab strain in a culture of 99 wild-type cells, the Bio5488 allele fixes 95% of the time within 1000 generations. Being an expert in population genetics (thanks to your first-year genomics class), you decide to estimate the approximate fitness of the Bio5488 allele. To do this, you will use the Wright-Fisher model.

Question 1: What assumptions does the Wright-Fisher model make?

For this assignment, we want you to find the lowest integer fitness value (i.e., 1, 2, 3, ..., 1000000) that gives a 95% fixation rate for the Bio5488 allele. Assume the Bio5488 allele is completely dominant, meaning:

fitness(heterozygote) = fitness(homozygous Bio5488), and fitness(homozygous non-Bio5488) = 1.
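The dominance rule above can be expressed as a small fitness function. This is a minimal sketch that assumes a hypothetical encoding in which each individual is an array reference holding two alleles, 1 for Bio5488 and 0 for non-Bio5488; the name `genotype_fitness` and its `$mode` argument are illustrative, not required by the assignment. The recessive case (needed later for Question 3) is included only for contrast.

```perl
use strict;
use warnings;

# Genotype fitness for the Bio5488 model (a sketch).
# Assumed encoding (hypothetical): an individual is an array ref of two
# alleles, 1 = Bio5488 and 0 = non-Bio5488. $w is the Bio5488 fitness value.
sub genotype_fitness {
    my ($individual, $w, $mode) = @_;
    my $copies = $individual->[0] + $individual->[1];   # 0, 1, or 2 copies
    if ($mode eq 'dominant') {
        return $copies >= 1 ? $w : 1;    # one copy is enough
    }
    if ($mode eq 'recessive') {
        return $copies == 2 ? $w : 1;    # two copies are needed
    }
    die "unknown dominance mode '$mode'\n";
}

print "dominant heterozygote: ",  genotype_fitness([1, 0], 5, 'dominant'),  "\n";   # 5
print "recessive heterozygote: ", genotype_fitness([1, 0], 5, 'recessive'), "\n";   # 1
```

Keeping fitness in one function means the rest of the program does not need to know which dominance model is in use.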

Before you start to code, WRITE OUT what you want to do (including whether you want to use hashes, arrays, or some other data structure). Trust me, this will help a lot.

Here is an outline of the Wright-Fisher Model:

Step I: Initialize a population in which each individual has two alleles. In our case, everyone starts homozygous for the non-Bio5488 allele except one individual, who is heterozygous.
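One possible data structure for Step I is an array of array references, one two-allele pair per individual. This sketch assumes that encoding (1 = Bio5488, 0 = non-Bio5488); `initial_population` is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch of Step I. Assumed encoding: allele 1 = Bio5488, 0 = non-Bio5488;
# the population is an array of [allele, allele] array refs.
sub initial_population {
    my ($pop_size) = @_;
    my @population = map { [0, 0] } 1 .. $pop_size;   # everyone homozygous non-Bio5488
    $population[0] = [1, 0];                            # ...except one heterozygote
    return @population;
}

my @pop = initial_population(100);
print scalar(@pop), " individuals, ", 2 * @pop, " alleles\n";   # 100 individuals, 200 alleles
```

With a population size of 100, the single heterozygote carries 1 of the 200 alleles.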

Step II: Randomly pick individuals from the population based on their fitness values. For example, if individual A has fitness 3 and individual B has fitness 1, A should be three times as likely to be picked as B. Once you pick individual A, you need to pick only one of its alleles. Both of these steps require random numbers. To generate them in Perl, use the built-in function rand. For instance:

    $random_number = rand 20;       # a uniformly distributed number from 0 up to, but not including, 20
    $random_number = int rand 20;   # a random integer from 0 to 19

Another related function is srand, which seeds the random number generator; you may want to look it up, but it is not necessary. To summarize this step: for the first individual of the next generation, use random numbers to pick a parent from the current population based on fitness, then pick one of that parent's alleles. THEN pick the first individual's second allele separately, using new random numbers (that is, draw a parent again). Repeat this for every individual until the next generation is the same size as the current one.
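A common way to pick individuals in proportion to fitness is "roulette-wheel" selection: draw a uniform number below the total fitness and walk along the population subtracting each fitness until the number drops below zero. The sketch below assumes the same hypothetical [allele, allele] encoding and a caller-supplied code ref `$fitness_of`; `pick_index` and `next_generation` are illustrative names.

```perl
use strict;
use warnings;

# Sketch of Step II. Assumed encoding: each individual is an array ref of two
# alleles (1 = Bio5488, 0 = non-Bio5488); $fitness_of is a code ref that
# returns an individual's fitness.

# Return an index chosen with probability proportional to its weight
# ("roulette-wheel" selection).
sub pick_index {
    my ($weights, $total) = @_;
    my $r = rand $total;                 # uniform in [0, $total)
    for my $i (0 .. $#$weights) {
        $r -= $weights->[$i];
        return $i if $r < 0;
    }
    return $#$weights;                   # fallback if floating-point round-off leaves $r >= 0
}

# Build the next generation. Each of a child's two alleles comes from its own
# fitness-weighted parent draw, then a coin flip chooses one parental allele.
sub next_generation {
    my ($population, $fitness_of) = @_;
    my @weights = map { $fitness_of->($_) } @$population;
    my $total = 0;
    $total += $_ for @weights;
    my @next;
    for (1 .. scalar @$population) {
        my @alleles;
        for (1 .. 2) {
            my $parent = $population->[ pick_index(\@weights, $total) ];
            push @alleles, $parent->[ int rand 2 ];
        }
        push @next, \@alleles;
    }
    return @next;
}
```

For the dominant case, `$fitness_of` could be `sub { $_[0][0] + $_[0][1] > 0 ? $w : 1 }`. Computing the weights once per generation avoids re-summing the population's fitness for every draw, and calling `srand(1)` (any fixed seed) before a run makes results reproducible while debugging.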

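Step III: Check whether either allele has fixed. If one has, record which allele fixed, then start over with the original starting population. If neither has fixed, return to Step II to produce the next generation, until the maximum number of generations is reached.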
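A fixation check only needs to count Bio5488 copies: fixation means either every allele is Bio5488 or none is. This sketch assumes the hypothetical 1/0 allele encoding; `fixed_allele` is an illustrative name.

```perl
use strict;
use warnings;

# Sketch of Step III. Assumed encoding: each individual is an array ref of
# two alleles, 1 = Bio5488 and 0 = non-Bio5488.
# Returns 1 if Bio5488 has fixed, 0 if non-Bio5488 has fixed, and undef
# (in scalar context) if both alleles are still segregating.
sub fixed_allele {
    my @population = @_;
    my $copies = 0;
    $copies += $_->[0] + $_->[1] for @population;
    return 1 if $copies == 2 * @population;   # every allele is Bio5488
    return 0 if $copies == 0;                 # Bio5488 has been lost
    return;                                   # no fixation yet
}
```

Inside the generation loop, something like `my $f = fixed_allele(@pop); last if defined $f;` stops a run as soon as either allele fixes.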

Step IV: Repeat Steps I-III 100 times and report the percentage of runs in which the Bio5488 allele fixed.
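The replicate loop is independent of how a single run is simulated. This sketch assumes a hypothetical code ref `$run_once` that performs Steps I-III once and returns true if Bio5488 fixed; `percent_fixed` is an illustrative name.

```perl
use strict;
use warnings;

# Sketch of Step IV. Assumption: $run_once is a hypothetical code ref that
# performs Steps I-III once and returns true if Bio5488 fixed, false otherwise.
sub percent_fixed {
    my ($replicates, $run_once) = @_;
    my $fixed = 0;
    for (1 .. $replicates) {
        $fixed++ if $run_once->();
    }
    return 100 * $fixed / $replicates;
}

# Usage with a stand-in replicate that "fixes" on every other call:
my $call = 0;
printf "%.1f%%\n", percent_fixed(100, sub { $call++ % 2 == 0 });   # 50.0%
```

Passing the single-run logic in as a code ref keeps the counting code easy to test before the full simulation works.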

Write a program that takes three inputs: the population size, the maximum number of generations, and the Bio5488 fitness value. The program should output the percentage of runs in which the Bio5488 allele fixed, and should run on the command line as follows:

    perl your_program.pl Population_Size Max_#_Generations Fitness_value
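The three command-line arguments arrive in Perl's @ARGV array. This sketch validates them before the simulation starts; the function name `parse_args` and the specific checks (positive integers for size and generations, a positive number for fitness) are assumptions, not requirements stated in the assignment.

```perl
use strict;
use warnings;

# Sketch of reading the three command-line arguments.
# Assumption: all three must be positive; names are illustrative.
sub parse_args {
    my @args = @_;
    die "usage: perl your_program.pl Population_Size Max_#_Generations Fitness_value\n"
        unless @args == 3;
    my ($pop_size, $max_gen, $fitness) = @args;
    for ($pop_size, $max_gen) {
        die "Population size and generations must be positive integers\n"
            unless /^\d+$/ && $_ > 0;
    }
    die "Fitness must be a positive number\n"
        unless $fitness =~ /^\d+(?:\.\d+)?$/ && $fitness > 0;
    return ($pop_size, $max_gen, $fitness);
}
```

At the top of the main program, `my ($pop_size, $max_gen, $fitness) = parse_args(@ARGV);` prints the usage message and exits if the arguments are missing or malformed.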

Question 2: Using a maximum of 1000 generations, what fitness value gives 95% fixation of the Bio5488 allele? Is this what you expected?
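One way to automate the search for the lowest integer fitness is a linear scan over 1, 2, 3, and so on. This minimal sketch assumes a hypothetical code ref `$percent_at` that runs the full 100-replicate simulation for a given fitness and returns the percent fixed; `lowest_fitness` and its arguments are illustrative names.

```perl
use strict;
use warnings;

# Sketch of the fitness search. Assumption: $percent_at is a hypothetical
# code ref that runs the full simulation for fitness $w and returns the
# percent of replicates in which Bio5488 fixed.
sub lowest_fitness {
    my ($percent_at, $target, $max_w) = @_;
    for my $w (1 .. $max_w) {
        return $w if $percent_at->($w) >= $target;
    }
    return;    # no fitness up to $max_w reached the target
}
```

Because each estimate comes from only 100 stochastic runs, percentages near the 95% threshold can fluctuate between runs, so it may be worth rerunning fitness values close to the cutoff.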

Question 3: Now do the same for Bio5488 as a recessive allele, so that fitness(heterozygote) = fitness(homozygous non-Bio5488) = 1. What fitness value gives 95% fixation of the Bio5488 allele? What happened, and why?


Your scripts and answers are due Friday, Feb 27th, 2009 at midnight.