
Appendix

All data consist of polymorphism within one species and divergence to an outgroup. The most parsimonious explanation is that a single mutation resulted in the polymorphism. The probability of mis-inference is calculated as the probability that two mutations resulted in the observed data: one that produced the polymorphism and one that produced a substitution, which causes the derived and ancestral variants to be mis-inferred. We do not consider the probability of mis-inference caused by more than two mutations.

DNA sequence data: The probability of a back-mutation, BM, for DNA sequence data is the probability that a substitution occurs during divergence between the two species multiplied by the probability that the substitution is a back-mutation. If all mutations are equally likely, a substitution is a back-mutation with probability 1/3, so P{BM} = d/3, where d is the divergence between the two species expressed as a proportion. If transitions occur at twice the rate of transversions, then P{BM} can be calculated as follows:

P{ts} = the probability that a mutation is a transition: A-G or C-T.
P{tv} = the probability that a mutation is a transversion: A-C, A-T, G-C or G-T.
P{ts} = P{tv} = 1/2, empirically.

P{BM|ts} = d/2
P{BM|tv} = d/4
P{BM} = P{ts}*P{BM|ts}+P{tv}*P{BM|tv} = 3d/8
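
The calculation above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original analysis: the function name and the p_ts parameter are assumptions introduced here so that the same function covers both the equal-rates case (p_ts = 1/3) and the empirical case (p_ts = 1/2).

```python
def p_back_mutation(d, p_ts=0.5):
    """Probability of a back-mutation for DNA sequence data.

    d    : divergence between the two species, as a proportion
    p_ts : probability that a mutation is a transition
           (1/2 in the text; 1/3 if all mutations are equally likely)
    """
    p_tv = 1.0 - p_ts
    # After a transition, the only back-mutation is the reverse transition.
    p_bm_given_ts = d * p_ts
    # After a transversion, the back-mutation is one specific transversion
    # out of the two that are possible from that nucleotide.
    p_bm_given_tv = d * p_tv / 2.0
    return p_ts * p_bm_given_ts + p_tv * p_bm_given_tv
```

With the default p_ts = 1/2 the function reduces to 3d/8, and with p_ts = 1/3 it reduces to d/3.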

RFLP data: The probability of a back-mutation for four-cutter restriction fragment length polymorphism (RFLP) data can be calculated using the following probabilities:

Let ANC be the probability that the ancestral sequence is one mutational step away from a given restriction enzyme cut sequence and AC be the probability that the ancestral sequence is cut by a given restriction enzyme.

ANC = (1/4)^3*(3/4)*4
AC = (1/4)^4
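
As a quick check, ANC and AC can also be obtained by brute-force enumeration of all four-base sequences. The sketch below assumes equal base frequencies, as the formulas above do; the recognition sequence SITE is an arbitrary choice made for illustration and does not affect the counts.

```python
from fractions import Fraction
from itertools import product

SITE = "GATC"  # arbitrary four-base recognition sequence (illustrative assumption)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

# All 4^4 = 256 possible four-base sequences, assumed equally likely.
all_4mers = ["".join(p) for p in product("ACGT", repeat=4)]

n_cut = sum(1 for s in all_4mers if hamming(s, SITE) == 0)
n_one_step = sum(1 for s in all_4mers if hamming(s, SITE) == 1)

AC = Fraction(n_cut, len(all_4mers))        # ancestral sequence is cut
ANC = Fraction(n_one_step, len(all_4mers))  # one mutational step from a cut sequence
```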

Let NS be the probability that no substitution occurs in a sequence of four nucleotides along one lineage from the time the two species split until the present. We take d/2 to be the per-nucleotide probability of a substitution along one of the lineages, where d is the proportion of nucleotide differences between the two species. Let SL be the probability that a substitution causes a loss of a restriction site. Let SG be the probability that a substitution causes a gain of a cut site given that the sequence is one mutational step away from a cut sequence.

NS = (1-d/2)^4
SL = (d/2)*4*(1-d/2)^3
SG = (d/2)*(1/3)*(1-d/2)^3

Similarly, let PL be the probability that a polymorphism causes a loss of a restriction site given the ancestral state is a cut sequence. Let PG be the probability that a polymorphism causes a gain of a restriction site given an ancestral sequence that is one mutational step away from a cut sequence. If S is the probability a site is polymorphic in a sample:

PL = 4*S*(1-S)^3
PG = (S/3)*(1-S)^3

Let P1 be the probability that the ancestral state of two species is one step away from a cut sequence and no substitution occurs on either lineage leading to the two species and that the restriction site is polymorphic.

P1 = ANC*PG*NS^2

Let P2 be the probability that the ancestral state does not cut and that a cut site is gained in the lineage leading to the species for which there is polymorphism data and that the restriction site is polymorphic.

P2 = ANC*PL*SG*NS

Let P3 be the probability that the ancestral state is a cut sequence and that it is lost in the lineage leading to the species which does not have polymorphism data and the restriction site is polymorphic.

P3 = AC*PL*SL*NS

Note that P2=P3. The probability of a back-mutation given the cut sequence is absent in the outgroup is (P2+P3)/(P1+P2+P3) = (2d+2d)/(1-d+2d+2d) = 4d/(1+3d).
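
The ratio can also be evaluated numerically straight from the definitions of P1, P2 and P3. The following is a minimal sketch; the function names are assumptions introduced here, and d and S are taken to be proportions.

```python
def rflp_terms(d, S):
    """Building blocks defined in the text, for divergence d and polymorphism S."""
    q = 1 - d / 2  # per-nucleotide probability of no substitution on one lineage
    return {
        "ANC": (1 / 4) ** 3 * (3 / 4) * 4,
        "AC": (1 / 4) ** 4,
        "NS": q ** 4,
        "SL": (d / 2) * 4 * q ** 3,
        "SG": (d / 2) * (1 / 3) * q ** 3,
        "PL": 4 * S * (1 - S) ** 3,
        "PG": (S / 3) * (1 - S) ** 3,
    }

def p_bm_site_absent(d, S):
    """(P2 + P3) / (P1 + P2 + P3): cut sequence absent in the outgroup."""
    t = rflp_terms(d, S)
    p1 = t["ANC"] * t["PG"] * t["NS"] ** 2
    p2 = t["ANC"] * t["PL"] * t["SG"] * t["NS"]
    p3 = t["AC"] * t["PL"] * t["SL"] * t["NS"]
    return (p2 + p3) / (p1 + p2 + p3)
```

S cancels from the ratio, so the result depends only on d. For small d the value should closely track the closed form 4d/(1+3d) given above.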

Let P4 be the probability that the ancestral state is a cut sequence and that no substitution occurs on either lineage and the restriction site is polymorphic.

P4 = AC*PL*NS^2

Let P5 be the probability that the ancestral state does not cut and that a cut site is gained in the lineage leading to the species without polymorphism data and the restriction site is polymorphic.

P5 = ANC*PG*SG*NS

Let P6 be the probability that the ancestral state is a cut sequence and that the restriction site is lost on the lineage leading to the species for which there is polymorphism data and the restriction site is polymorphic.

P6 = AC*PG*SL*NS

Note that P5 = P6. The probability of a back-mutation given the cut sequence is present in the outgroup is (P5+P6)/(P4+P5+P6) = (d/6+d/6)/(1-d+d/6+d/6) = d/(3-2d).
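
The case in which the outgroup carries the cut sequence can be checked in the same way, directly from the definitions of P4, P5 and P6. This is again a sketch under the same assumptions (d and S as proportions, function name chosen here for illustration).

```python
def p_bm_site_present(d, S):
    """(P5 + P6) / (P4 + P5 + P6): cut sequence present in the outgroup."""
    anc = (1 / 4) ** 3 * (3 / 4) * 4   # ancestor one step from a cut sequence
    ac = (1 / 4) ** 4                  # ancestor is a cut sequence
    q = 1 - d / 2                      # no substitution at one nucleotide, one lineage
    ns = q ** 4                        # no substitution in the four-base sequence
    sl = (d / 2) * 4 * q ** 3          # substitution loses the site
    sg = (d / 2) * (1 / 3) * q ** 3    # substitution gains the site
    pl = 4 * S * (1 - S) ** 3          # polymorphism loses the site
    pg = (S / 3) * (1 - S) ** 3        # polymorphism gains the site

    p4 = ac * pl * ns ** 2
    p5 = anc * pg * sg * ns
    p6 = ac * pg * sl * ns
    return (p5 + p6) / (p4 + p5 + p6)
```

As before, S cancels, and for small d the result should stay close to the closed form d/(3-2d) given above.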
 
