Homework 6 (Bloodroot problem)
Background
You will analyze a real data set of five flowering plant sequences. Three sequences are from plants in the poppy family: Sanguinaria (Bloodroot), Eschscholzia (California Poppy), and Bocconia (Tree Poppy). The remaining two sequences are from monocots: Oryza (Rice, grass family) and Disporum (Fairy Bells, autumn crocus family). The two monocot taxa are only distantly related to the three dicot taxa in the poppy family, so we expect the phylogeny to contain an edge that separates Sanguinaria, Eschscholzia, and Bocconia from Oryza and Disporum.
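For five taxa, every fully resolved unrooted tree has exactly two internal edges, and each internal edge splits two taxa from the other three. The edge expected above is therefore the split {O, D} | {S, E, B} (using the one-letter abbreviations from the "What to turn in" section). As an illustration only, with made-up function names, the Python sketch below enumerates all 15 unrooted trees for five taxa, which is why an exhaustive search is practical for this data set, and counts the trees that contain the expected split.

```python
# Illustrative sketch: enumerate unrooted binary trees for the five taxa.
# Function names are hypothetical; taxa use the one-letter abbreviations.
TAXA = ("S", "E", "B", "O", "D")


def num_unrooted_trees(n):
    """Number of unrooted binary trees for n >= 3 taxa: (2n - 5)!!."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def unrooted_five_taxon_trees(taxa):
    """Enumerate every unrooted binary tree for exactly five taxa.

    Each such tree has the shape ((a,b),c,(d,e)): two cherries joined through
    a central taxon c. A tree is returned as the frozenset of its two
    internal-edge splits, each split stored as its 2-taxon side.
    """
    if len(taxa) != 5:
        raise ValueError("this sketch handles exactly five taxa")
    trees = set()
    for center in taxa:
        rest = [t for t in taxa if t != center]
        first = rest[0]
        # Pair the first remaining taxon with each of the other three;
        # the two taxa left over form the second cherry.
        for partner in rest[1:]:
            cherry1 = frozenset((first, partner))
            cherry2 = frozenset(rest) - cherry1
            trees.add(frozenset((cherry1, cherry2)))
    return trees


if __name__ == "__main__":
    trees = unrooted_five_taxon_trees(TAXA)
    monocot_split = frozenset({"O", "D"})  # the {O, D} | {S, E, B} edge
    print(len(trees), num_unrooted_trees(len(TAXA)))  # 15 15
    print(sum(monocot_split in tree for tree in trees))  # 3
```

Only 3 of the 15 trees contain the monocot/poppy split; they differ only in how S, E, and B are arranged on the poppy side.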
What to do
Download the following data file (sequence alignment) to your account on the cluster:
curl -O https://gnetum.eeb.uconn.edu/courses/phylogenetics/bloodroot.nex
Carry out the following analyses using PAUP*:
- use the alltrees command to perform an exhaustive search using maximum likelihood (PAUP*'s default optimality criterion is parsimony, so issue set criterion=likelihood first; keep the default likelihood model, which is HKY85 with a transition/transversion ratio of 2 and empirical nucleotide frequencies);
- use the lscores command to obtain the log-likelihood of the best tree; and
- use the bootstrap command to perform a bootstrap analysis (1000 replicates).
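PAUP* performs the resampling itself, so nothing here is needed for the assignment. As a rough sketch of what each bootstrap replicate involves, the Python below draws alignment columns (sites) with replacement to build a pseudo-replicate data set of the same length, and tallies how often each split appears among the best trees found for the replicates. The tree search on each replicate is left out, the function names are hypothetical, and trees are assumed to be represented as frozensets of splits.

```python
import random
from collections import Counter


def bootstrap_columns(alignment, rng):
    """Build one bootstrap pseudo-replicate of an alignment.

    alignment maps taxon name -> aligned sequence (all the same length).
    Column indices are drawn uniformly with replacement, and the same
    indices are used for every taxon so that each site stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same length")
    nsites = lengths.pop()
    picks = [rng.randrange(nsites) for _ in range(nsites)]
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}


def split_frequencies(replicate_trees):
    """Percentage of replicate trees that contain each split.

    Assumption: each tree is a frozenset of splits, and each split is a
    frozenset of the taxa on one side of its edge (always the same side
    for a given split, e.g. the smaller one).
    """
    counts = Counter(split for tree in replicate_trees for split in tree)
    n = len(replicate_trees)
    return {split: 100.0 * c / n for split, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    # Made-up toy alignment for illustration; it is NOT bloodroot.nex.
    toy = {"S": "ACGTAC", "E": "ACGTTC", "B": "ACCTAC", "O": "GCGAAT", "D": "GCGAAC"}
    for taxon, seq in bootstrap_columns(toy, rng).items():
        print(taxon, seq)
```

The percentages returned by split_frequencies correspond to the bootstrap frequencies you will label on the internal edges of each consensus tree.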
You should create a PAUP* nexus-formatted command file that carries out all three analyses (like the run.nex file we created in the first lab).
Do all of the above for each of these three sets of sites:
- Left half: Include only sites 1-180
- Right half: Include only sites 181-402
- Concatenated: Include all sites (1-402)
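The two "halves" are not the same length: sites 1-180 contain 180 sites and sites 181-402 contain 222, and together they cover all 402 sites exactly once. The short Python check below (hypothetical names; not needed for the PAUP* work) confirms this and shows how a 1-based, inclusive range such as 181-402 translates into a 0-based Python slice, in case you ever extract the columns yourself.

```python
# Hypothetical check of the three site sets used in this assignment.
NSITES = 402

# 1-based, inclusive ranges, written the way PAUP* include commands use them.
SITE_RANGES = {
    "left": (1, 180),
    "right": (181, 402),
    "concatenated": (1, 402),
}


def sites_in(first, last):
    """Set of 1-based site numbers in the inclusive range first-last."""
    return set(range(first, last + 1))


def to_slice(first, last):
    """0-based Python slice selecting the 1-based inclusive range first-last."""
    return slice(first - 1, last)


def halves_partition_all(ranges, nsites):
    """True if the left and right halves are disjoint and together give all sites."""
    left = sites_in(*ranges["left"])
    right = sites_in(*ranges["right"])
    everything = sites_in(*ranges["concatenated"])
    return left.isdisjoint(right) and (left | right) == everything == set(range(1, nsites + 1))


if __name__ == "__main__":
    for name, (first, last) in SITE_RANGES.items():
        print(name, len(sites_in(first, last)))  # left 180, right 222, concatenated 402
    print(halves_partition_all(SITE_RANGES, NSITES))  # True
```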
To include only sites 181-402, for example, use the exclude all command followed by the include 181-402 command in PAUP*. You can either do everything in one PAUP* command file or create three separate command files, one each for the left half, the right half, and all sites.
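One way to organize a single command file is to repeat the same three analyses once per site set. The Python sketch below simply prints such a file so its structure is easy to see. The script is a hypothetical convenience, and it uses only the commands named on this page plus execute, set criterion=likelihood, and quit; depending on your PAUP* version and settings you may also need set options that suppress interactive prompts, so check the PAUP* command reference rather than relying on this sketch.

```python
# Sketch: print a PAUP* command file that repeats the same analyses for
# each site set. Prompt-suppressing set options that your PAUP* setup
# may need are deliberately left out (an assumption to verify yourself).
SITE_SETS = [
    ("left half", "1-180"),
    ("right half", "181-402"),
    ("concatenated", "1-402"),
]


def paup_command_file(data_file, site_sets, nreps=1000):
    """Return the text of a NEXUS file containing a single paup block."""
    lines = [
        "#nexus",
        "",
        "begin paup;",
        f"  execute {data_file};",
        "  set criterion=likelihood;",  # PAUP*'s default criterion is parsimony
    ]
    for label, sites in site_sets:
        lines += [
            f"  [ {label}: sites {sites} ]",  # NEXUS comment, ignored by PAUP*
            "  exclude all;",
            f"  include {sites};",
            "  alltrees;",
            "  lscores 1;",
            f"  bootstrap nreps={nreps};",
        ]
    lines += ["  quit;", "end;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(paup_command_file("bloodroot.nex", SITE_SETS), end="")
```

Redirecting the output to a file (for example, python make_run.py > run.nex, where the script name is made up) would give a command file in the same spirit as the run.nex file from the first lab.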
What to turn in
1. For each of the three analyses, draw by hand (not with FigTree!) the bootstrap consensus tree as an unrooted tree and show the bootstrap frequencies on the internal edges. To save writing, you can abbreviate the taxa as follows: S (Sanguinaria), E (Eschscholzia), B (Bocconia), O (Oryza), and D (Disporum).
2. Add together the log-likelihoods for the "left half" and "right half" and compare the sum to the log-likelihood from the "concatenated" analysis. Are the data less surprising if you allow each half of the gene to have its own tree topology, or are they less surprising if all sites share the same topology? (Recall that a higher log-likelihood means the data are less surprising.)
3. Which of the three bootstrap analyses yields the highest confidence (use the sum of the bootstrap values on the two internal edges)? Does this agree or disagree with your answer to question 2?
4. To make it easier for me to determine what you did, please send me your PAUP* command file. If you have created more than one command file, please put them in a directory and create a zip file of that directory.
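The arithmetic for questions 2 and 3 is simple, but one detail is easy to trip over: PAUP*'s lscores output labels its scores -ln L, the negative of the log-likelihood, so check the column heading in your own output and negate if necessary before adding. The Python helpers below (hypothetical names; no real results are included) convert a -ln L score, compare the summed halves with the concatenated analysis, and pick the analysis whose two internal-edge bootstrap values have the largest sum.

```python
# Hypothetical helpers for the arithmetic in the turn-in questions.


def lnl_from_paup(neg_lnl):
    """Convert a score reported as -ln L into a log-likelihood (ln L)."""
    return -neg_lnl


def compare_log_likelihoods(lnl_left, lnl_right, lnl_concat):
    """Return (left + right, concatenated, difference) on the ln L scale.

    A positive difference means the separate-halves analysis gives the
    data a higher log-likelihood, i.e., makes them less surprising.
    """
    separate = lnl_left + lnl_right
    return separate, lnl_concat, separate - lnl_concat


def most_confident(bootstrap_values):
    """Name of the analysis whose two internal-edge bootstrap values sum highest.

    bootstrap_values maps analysis name -> (edge1_percent, edge2_percent).
    """
    return max(bootstrap_values, key=lambda name: sum(bootstrap_values[name]))
```

Plug in the three log-likelihoods and the six bootstrap percentages from your own PAUP* output.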
Food for thought
Think about how you might go about explaining these results. You do not need to tell me your thoughts: we will discuss this in lecture after everyone has finished the assignment.