You are a graduate student in Professor Sue Dohyphal's lab, working on a gene to make "superyeast". You have cloned a gene, called Bio5488, from a Martian strain; you believe it dramatically increases fitness. You insert Bio5488 into a chromosome of a lab strain. When you take one cell of the Bio5488 lab strain and place it in a culture of 99 wild-type cells, the Bio5488 allele fixes 95% of the time after 1000 generations. Being an expert in population genetics (because of your first-year genomics class), you decide to estimate the approximate fitness of the Bio5488 allele. To do this, you're going to use the Wright-Fisher model.
Question 1: What assumptions does the Wright-Fisher model make?
For this assignment, we want you to find the lowest integer fitness value (that is,
one of 1, 2, 3, ..., 1000000) that gives a 95% fixation rate for the mutant allele.
Assume the Bio5488 allele is completely dominant, meaning:
fitness(Het) = fitness(homozygous Bio5488) and fitness(homozygous non-Bio5488) = 1.
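A minimal sketch of the fitness rule in Perl, assuming alleles are coded as 1 for Bio5488 and 0 for non-Bio5488 (an assumption of this sketch, not part of the assignment). The helper name `individual_fitness` and its `$mode` argument are hypothetical; the `'recessive'` branch is one reading of Question 3, in which only Bio5488 homozygotes get the increased fitness.

```perl
use strict;
use warnings;

# Hypothetical helper: fitness of one diploid individual.
# Assumed allele coding: 1 = Bio5488, 0 = non-Bio5488.
# $s is the Bio5488 fitness value; $mode is 'dominant' or 'recessive'.
sub individual_fitness {
    my ($a1, $a2, $s, $mode) = @_;
    my $copies = $a1 + $a2;              # number of Bio5488 alleles (0, 1 or 2)
    return 1 if $copies == 0;            # homozygous non-Bio5488 always has fitness 1
    return $s if $mode eq 'dominant';    # Het and homozygous Bio5488 both get $s
    return $copies == 2 ? $s : 1;        # recessive: only homozygous Bio5488 gets $s
}
```

For example, `individual_fitness(1, 0, 5, 'dominant')` returns 5, while `individual_fitness(1, 0, 5, 'recessive')` returns 1.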
Before you start to code, WRITE OUT what you want to do (including whether you want
to use hashes, arrays, or something else). Trust me, this will help a lot.
Here is an outline of the Wright-Fisher Model:
Step I: Initialize a population where each individual gets 2 alleles. In our case, we'll start with everyone homozygous for the non-Bio5488 allele except one individual, who is heterozygous.
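One way to hold the Step I population, sketched under the assumption that it is stored as an array of two-element array references (hashes would also work), with the same assumed 0/1 allele coding. The sub name `init_population` is hypothetical.

```perl
use strict;
use warnings;

# Sketch: build a starting population of $n diploid individuals.
# Each individual is an array ref [allele1, allele2]; 0 = non-Bio5488, 1 = Bio5488.
# Everyone is homozygous non-Bio5488 except individual 0, who is heterozygous.
sub init_population {
    my ($n) = @_;
    my @pop = map { [0, 0] } 1 .. $n;    # $n wild-type homozygotes
    $pop[0] = [1, 0];                    # replace one with the heterozygous carrier
    return \@pop;
}
```

`init_population(100)` matches the scenario described above: one heterozygous Bio5488 cell among 99 wild-type cells.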
Step II: Randomly pick individuals from the population based on their fitness values.
So if the fitness of individual A is 3 and that of individual B is 1, A should be 3 times
more likely to be picked. Once you pick individual A, you need to pick only one of its
alleles. To do both of these steps you're going to need to generate random numbers. To do
that in Perl, use the rand function. For instance:
$random_number = rand 20;
will give a uniformly distributed random number from 0 up to, but not including, 20.
$random_number = int rand 20;
will give a random integer from 0 to 19.
Another related function is srand, which seeds the random number generator; you may want
to look it up, but it is not necessary. To summarize this step: for the first individual of
the next generation, use random numbers to pick an individual from the current population
based on fitness, and then pick one of that individual's alleles. THEN pick the second allele
of that first next-generation individual separately (using different random numbers).
Repeat this for every individual in the next generation.
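A sketch of Step II, assuming the array-of-pairs population and a parallel array of per-individual fitness values (for example, computed from the dominance rule stated earlier). It uses cumulative-sum ("roulette wheel") selection, which is one common way to make picks proportional to fitness; the sub names `pick_parent` and `next_generation` are hypothetical.

```perl
use strict;
use warnings;

# Pick one parent index with probability proportional to fitness.
# $fit: array ref of per-individual fitness; $total: sum of @$fit.
# A uniform draw in [0, $total) is compared against the running sum, so an
# individual with fitness 3 is 3 times as likely to be picked as one with fitness 1.
sub pick_parent {
    my ($fit, $total) = @_;
    my $r   = rand $total;
    my $cum = 0;
    for my $i (0 .. $#$fit) {
        $cum += $fit->[$i];
        return $i if $r < $cum;
    }
    return $#$fit;    # guard against floating-point round-off
}

# Build the next generation. For each offspring, each of its two alleles comes
# from an independently chosen parent, then one of that parent's alleles at random.
sub next_generation {
    my ($pop, $fit) = @_;
    my $total = 0;
    $total += $_ for @$fit;
    my @new;
    for my $k (0 .. $#$pop) {                 # one offspring per slot
        my @child;
        for my $copy (1, 2) {                 # separate random draws per allele
            my $parent = pick_parent($fit, $total);
            push @child, $pop->[$parent][int rand 2];
        }
        push @new, \@child;
    }
    return \@new;
}
```

Summing the fitness once per generation and passing it in avoids recomputing the total for every draw; the linear scan costs O(N) per pick, which is fine for a population of 100.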
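Step III: Check for fixation of either allele. If there is fixation, record which allele fixed, then start over with the original starting population. If neither allele has fixed, repeat Step II on the new generation, up to the maximum number of generations.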
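A sketch of the Step III check under the same assumed 0/1 coding: it counts Bio5488 copies among the 2N alleles in the population. The sub name `fixation_status` and its return values are hypothetical choices for this sketch.

```perl
use strict;
use warnings;

# Count Bio5488 alleles (coded 1) across all individuals.
# Returns 'Bio5488' if every copy is Bio5488, 'wild-type' if none is,
# and undef (in scalar context) while both alleles are still segregating.
sub fixation_status {
    my ($pop) = @_;
    my $count = 0;
    $count += $_->[0] + $_->[1] for @$pop;
    return 'Bio5488'   if $count == 2 * @$pop;    # 2 alleles per individual
    return 'wild-type' if $count == 0;
    return;
}
```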
Step IV: Repeat Steps I-III 100 times and report the percentage of runs in which the Bio5488 allele fixed.
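A sketch of the Step IV bookkeeping. To keep it self-contained, `$run_replicate` is a hypothetical code reference standing in for Steps I to III; it is assumed to return `'Bio5488'`, `'wild-type'`, or `undef` when neither allele fixed within the generation limit. The assignment does not say how to count such runs, so this sketch simply counts them as runs in which Bio5488 did not fix.

```perl
use strict;
use warnings;

# Run one simulated replicate $reps times and return the percentage
# of replicates in which the Bio5488 allele fixed.
# $run_replicate: hypothetical code ref returning 'Bio5488', 'wild-type' or undef.
sub percent_bio5488_fixed {
    my ($reps, $run_replicate) = @_;
    my $fixed = 0;
    for (1 .. $reps) {
        my $outcome = $run_replicate->();
        $fixed++ if defined $outcome && $outcome eq 'Bio5488';
    }
    return 100 * $fixed / $reps;
}
```

With only 100 replicates the percentage is itself noisy: near 95%, the binomial standard deviation is about 2.2 percentage points, so repeated runs at the same fitness value can differ by a few points.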
Write a program that takes three inputs: population size, maximum number of generations,
and Bio5488 fitness. It should output the percentage of runs in which the Bio5488 allele
fixed. Have the program run on the command line as follows:
perl your_program.pl Population_Size Max_#_Generations Fitness_value
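A minimal sketch of argument checking for the usage line above. The sub name `parse_args` is hypothetical; a full script would call `parse_args(@ARGV)` and pass the results on to the simulation. The validation rules (positive integers for the first two arguments, a positive number for the fitness value) are assumptions of this sketch.

```perl
use strict;
use warnings;

# Validate the three positional arguments:
#   perl your_program.pl Population_Size Max_#_Generations Fitness_value
# Dies with a message on bad input; otherwise returns the three values.
sub parse_args {
    my @args = @_;
    die "Usage: perl $0 Population_Size Max_Generations Fitness_value\n"
        unless @args == 3;
    my ($pop_size, $max_gen, $fitness) = @args;
    die "Population_Size must be a positive integer\n"
        unless $pop_size =~ /\A\d+\z/ && $pop_size > 0;
    die "Max_Generations must be a positive integer\n"
        unless $max_gen =~ /\A\d+\z/ && $max_gen > 0;
    die "Fitness_value must be a positive number\n"
        unless $fitness =~ /\A\d+(?:\.\d+)?\z/ && $fitness > 0;
    return ($pop_size, $max_gen, $fitness);
}
```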
Question 2: What fitness value gives 95% fixation of the Bio5488 allele (use a maximum of 1000 generations)? Is this what you expected?
Question 3: Now do the same for Bio5488 as a recessive allele. What fitness value gives 95% fixation of the Bio5488 allele? What happened, and why?