Up to the Phylogenetics main page

## Compute the JC likelihood

The python program below will be modified by you to compute the likelihood of a 4-taxon tree using data from 2 sites. You will check your results using PAUP*.

Be sure to ask for help if you get stuck.

Login to your student account on the cluster and create a file named hw6.py containing the text below:

from math import exp, log

# You should *not* modify this function
# same_list holds number of edges with the same state at both ends
# v is the edge length (same for all 5 edges in the tree)
# r is the relative rate
def sitelike(same_list, v, r):
probsame = 0.25 + 0.75*exp(-4.0*r*v/3.0)
probdiff = 0.25 - 0.25*exp(-4.0*r*v/3.0)
like = 0.0
for n in same_list:
nsame=float(n)
ndiff=float(5-n)
like += 0.25*(probsame**nsame)*(probdiff**ndiff)
return like

same1 = [5, 2, 2, 2, 2, 1, 0, 0, 2, 0, 1, 0, 2, 0, 0, 1]
same2 = [2, 1, 2, 2, 1, 2, 2, 2, 0, 0, 2, 1, 0, 0, 1, 2]

# Results for JC model
v = 0.1
like1 = sitelike(same1, v, 1.0)
like2 = sitelike(same2, v, 1.0)
print('JC model (brlens = %.1f):' % v)
print(' log-likelihood for site 1 =', log(like1))
print(' log-likelihood for site 2 =', log(like2))
print(' total log-likelihood =', log(like1)+log(like2))
print()

# Results for JC+I model
v = 0.1
pinvar = 0.4
r1 = 0.0
r2 = 1.0/(1.0 - pinvar)
#like1 = pinvar*sitelike(same1, v, r1) + (1.0 - pinvar)*sitelike(same1, v, r2)
#like2 =
#print('JC+I model (brlens = %.1f, pinvar = %.1f):' % (v,pinvar))
#print(' log-likelihood for site 1 =', log(like1))
# print(' log-likelihood for site 2 =', log(like2))
# print(' total log-likelihood =', log(like1)+log(like2))
#print()

# Results for JC+G model
# v     =
# shape =
# r1    =
# r2    =
# r3    =
# r4    =
# like1 =
# like2 =
# print('JC+G model (brlens = %.1f, shape = %.1f):' % (v,shape))
# print(' log-likelihood for site 1 =', log(like1))
# print(' log-likelihood for site 2 =', log(like2))
# print(' total log-likelihood =', log(like1)+log(like2))
# print()


module load python/3.10.1


Run this program as follows to compute the site log-likelihoods for the two sites whose data is shown on the tree in Figure 1:

python3 hw6.py


You should see output like that shown below:

JC model (brlens = 0.1):
log-likelihood for site 1 = -1.8775282342232047
log-likelihood for site 2 = -9.887165347740401
total log-likelihood = -11.764693581963606


### The sitelike function

Near the top of your hw6.py file is the definition of a function named sitelike. The function sitelike has 3 parameters. The first, same_list, is a list of 16 numbers representing the number of branches (of the 5 total) in which the state is the same at both ends. There are 16 such values because there are 16 possible ancestral state combinations (this should make sense given the homework assignment you did last week).

The second parameter is v, which is the length to be used for all 5 edges (always enter 0.1 for v).

The third and final parameter is r, which is the relative rate to use when computing the site likelihood. Note that r modifies the value of v inside the transition probability formulas. So, to compute the log likelihood for site 1, just pass the list same1 as the first argument when you call the sitelike function. To compute the log likelihood for site 2, pass the list same2 into the function. Note that sitelike function computes the site likelihood, not the logarithm of the site likelihood.

## Compute the JC+I likelihood

Now modify your Python program to compute the site log-likelihoods under a JC+I model (proportion of invariable sites = $p_{\mbox{invar}}$ = 0.4).The likelihood for each site must take account of two relative rates: $r_1 = 0.0$ and $r_2 = 1/(1 − p_{\mbox{invar}})$. Note that the function sitelike does all the hard work for you and you will not need to modify it! Pass in the rate you want to use as the last argument to compute the site likelihood given that rate. Here is the general formula for a mixture model with two categories:

Note that I’ve done like1 for you. Compare that line with the above formula to see how to compute each of the four quantities needed for the site likelihood calculation. You will need to uncomment the lines I’ve commented out and provide the rest of the line beginning like2 =. Your program should print out the two site log-likelihoods and the total log-likelihood as was done for the JC model.

## Compute the JC+G likelihood

Modify the Python program again to uncomment the remaining lines in the section labeled “Results for JC+G model” and fill in the code needed to compute the site log-likelihoods under a JC+G model (gamma shape=0.2, 4 categories).

You can use PAUP* to obtain the 4 relative rates. Start PAUP* interactively (i.e. do not specify a file name when starting PAUP*)

module load paup/4.0a-166
paup


and then issue the command

gammaplot shape=0.2 ncat=4


The relative rates that you need are in the right-most column of the table at the end of the output. PAUP* also shows you the boundaries of the 4 categories, but you do not need to use those values.

lscores 1 / sitelike userbrlen;