Homework 6 (Bloodroot problem)

Background

You will analyze a (real) data set of five flowering plant sequences. Three sequences are from plants in the poppy family: Sanguinaria (Bloodroot), Eschscholzia (California Poppy), and Bocconia (Tree Poppy). The remaining two sequences are from monocots: Oryza (Rice, grass family) and Disporum (Fairy Bells, autumn crocus family). The two monocot taxa are quite distantly related to the three dicot taxa in the poppy family, so we expect the phylogeny to have an edge that separates Sanguinaria, Eschscholzia, and Bocconia from Oryza and Disporum.

What to do

Download the following data file (sequence alignment) to your account on the cluster:

curl -O https://gnetum.eeb.uconn.edu/courses/phylogenetics/bloodroot.nex

Carry out the following analyses using PAUP*:

use the alltrees command to perform an exhaustive search using maximum likelihood (using the default likelihood model, which is HKY85 with a transition:transversion ratio of 2 and empirical nucleotide frequencies);
use the lscores command to obtain the log likelihood of the best tree; and
use the bootstrap command to perform a bootstrap analysis (1000 replicates).

You should create a PAUP* nexus-formatted command file that carries out all three analyses (like the run.nex file we created in the first lab).

Do all of the above for each of these three sets of sites:

Left half: Include only sites 1-180
Right half: Include only sites 181-402
Concatenated: Include all sites (1-402)

To include only sites 181-402, for example, use the exclude all command followed by the include 181-402 command in PAUP*. You can either do everything in one PAUP* command file or create three separate command files: one each for the left half, the right half, and all sites.
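As a rough illustration of how these pieces might fit together, here is a minimal sketch of a command file for the right half only. The log file name (right-half.log) is a made-up placeholder, and the sketch assumes that bloodroot.nex sits in the same directory and that the exhaustive search retains a single best tree; adapt it to match the run.nex file from the first lab rather than treating it as the required layout.

```
#nexus

begin paup;
    [hypothetical log file name; choose your own]
    log file=right-half.log start replace;

    [read the alignment downloaded with curl]
    execute bloodroot.nex;

    [switch from the default parsimony criterion to likelihood]
    set criterion=likelihood;

    [keep only sites 181-402]
    exclude all;
    include 181-402;

    [exhaustive search, then the log likelihood of the best tree]
    alltrees;
    lscores 1;

    [bootstrap with 1000 replicates]
    bootstrap nreps=1000;

    log stop;
    quit;
end;
```

For the left half or the concatenated data, only the include line (and the log file name) would need to change.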

What to turn in

1. For each of the three analyses, draw (by hand, not with FigTree!) the bootstrap consensus as an unrooted tree and show the bootstrap frequencies on the internal edges. To save writing, you can abbreviate the taxa as follows: S (Sanguinaria), E (Eschscholzia), B (Bocconia), O (Oryza), and D (Disporum).
2. Add together the log-likelihoods for the “left half” and “right half” analyses and compare the sum to the log-likelihood from the “concatenated” analysis. Is the data less surprising if you allow each half of the gene to have its own tree topology or is the data less surprising if all sites share the same topology?
3. Which of the 3 bootstrap analyses yields the highest confidence (sum the bootstrap values on the two internal edges)? Does this agree or disagree with your answer for question 2?
4. To make it easier for me to determine what you did, please send me your PAUP* command file. If you have created more than one command file, please put them in a directory and create a zip file of that directory.
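
The comparison asked for in question 2 can be written out as below. The subscripted symbols are notation introduced here for illustration, not output labels from PAUP*. Because sites are treated as independent, the likelihood of the two halves analyzed separately is the product of their individual likelihoods, so their log-likelihoods add:

```latex
\[
\ln L_{\text{separate}} = \ln L_{\text{left}} + \ln L_{\text{right}}
\qquad \text{versus} \qquad
\ln L_{\text{concat}}
\]
```

Whichever quantity is larger (that is, less negative) corresponds to the scenario under which the data are less surprising.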

Food for thought

Think about how you might go about explaining these results. You do not need to tell me your thoughts: we will discuss this in lecture after everyone has finished it.