PHYLIP (the PHYLogeny Inference Package) provides a set of "classic" phylogeny programs that have been available since 1980 (see the PHYLIP Home Page).
Unfortunately, in part because the programs were written in the 1980s, the user interface is quite primitive and in some ways hostile. Fortunately, the PHYLIP programs have been repackaged as part of the EMBOSS software package, which provides a much more modern command-line interface to them. In addition, EMBOSS provides other very helpful programs for producing files in the correct format.
This workshop will use the EMBOSS programs on interactive.hpc to construct evolutionary trees using protein and DNA sequences. It is possible to run the workshop on hpc, but you will NOT be able to use the EMBOSS versions of the programs.
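Because the EMBOSS versions are not available on every machine, it may help to check what is installed before starting. The following is a minimal sketch, not part of the original exercise; the function name check_tools is hypothetical, and the program list is simply the set of commands used in this workshop.

```shell
# Report which of the workshop programs are on the PATH of the current machine.
check_tools() {
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog"
        else
            echo "missing: $prog"
        fi
    done
}

check_tools seqret tranalign muscle fprotdist fdnadist ffitch fconsense
```

Any program reported as missing will need a different machine or a different version of the tool.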
This series of exercises will be your homework for Wednesday, March 14. Please do the exercises in a new biol4230/hwk6 directory. Though we will do this exercise interactively today, please create a phylip.sh shell script file that shows exactly the steps you used to do the analyses.
seqret -help -verbose
All of the EMBOSS programs have a -help option, which you will need to use to learn how to specify the program input and output file names and other options.
muscle -stable -in gstm.alib -out gstm.a_aln
(The -stable option ensures that the output alignment is in the same sequence order as the input.)
By default, muscle writes out the result in FASTA format, which you can use to produce the DNA alignment. You may also want to write out the alignment in ClustalW format (option -clw) to look at alignment conservation.
Looking at either the FASTA or ClustalW format multiple sequence alignment, how many gaps do you see? Do you think a different alignment program would produce a different multiple sequence alignment?
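Counting gaps by eye is error-prone, so a rough count from the command line can serve as a cross-check. This is a sketch that assumes the FASTA-format alignment marks gaps with '-' characters; the function name count_gaps and the toy file toy.a_aln are made up for illustration and are not part of the workshop data. It counts individual gap characters, not the number of separate gap regions.

```shell
# Count gap characters ('-') in the sequence lines of a FASTA alignment.
# Header lines (starting with '>') are skipped so dashes in names are not counted.
count_gaps() {
    grep -v '^>' "$1" | tr -cd '-' | wc -c | tr -d ' '
}

# Toy alignment, made up for illustration only (not the gstm data).
cat > toy.a_aln <<'EOF'
>seq-1
MK-LV
>seq-2
M--LV
EOF

count_gaps toy.a_aln    # for the workshop data: count_gaps gstm.a_aln
```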
tranalign -asequence gstm.nlib -bsequence gstm.a_aln -outseq gstm.n_aln
Look at the gstm.n_aln file. Is it in PHYLIP format?
Use seqret to reformat the gstm.a_aln and gstm.n_aln alignments from FASTA format into PHYLIP format (gstm.a_phy, gstm.n_phy). For the protein alignment:
seqret -osformat2 phylip -sequence gstm.a_aln -outseq gstm.a_phy
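One quick way to confirm that a conversion worked is to look at the first line of the output. A PHYLIP alignment begins with a header line holding two integers (the number of sequences and the alignment length), whereas a FASTA file begins with '>'. The check below is only a heuristic sketch; the function name looks_like_phylip and the toy files are hypothetical.

```shell
# Heuristic: a PHYLIP alignment starts with a header line holding two integers,
# the number of sequences and the alignment length; FASTA starts with '>'.
looks_like_phylip() {
    head -n 1 "$1" | grep -Eq '^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]*$'
}

# Toy files, made up for illustration only.
printf ' 2 5\nseq1      MK-LV\nseq2      M--LV\n' > toy.a_phy
printf '>seq1\nMK-LV\n>seq2\nM--LV\n' > toy.fasta

for f in toy.a_phy toy.fasta; do
    if looks_like_phylip "$f"; then
        echo "$f: PHYLIP-style header"
    else
        echo "$f: no PHYLIP header"
    fi
done
```

The same function could be pointed at gstm.a_phy or gstm.n_phy after running seqret.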
Use the fdnadist program to build a matrix of DNA distances from gstm.n_phy.
fprotdist -sequence gstm.a_phy -outfile gstm.a_dist
fdnadist -sequence gstm.n_phy -outfile gstm.n_dist -method f
ffitch -datafile gstm.a_dist -outtreefile gstm.a_dist_tree -outfile gstm.a_dist_log -outgrno 19
When you run the program, it will ask for an (optional) -intreefile, which you do not need (or have). Just hit return, or create a file with a blank line in it (not empty; it must have one newline). If you call it "blank-line.txt", you can run:
ffitch -datafile gstm.a_dist -outtreefile gstm.a_dist_tree -outfile gstm.a_dist_log -outgrno 19 < blank-line.txt
and the program will run properly.
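A file containing exactly one newline can be created from the shell without opening an editor. The sketch below uses printf, since touch would produce an empty file, which is not what is wanted here.

```shell
# Create a file holding exactly one newline character (a single blank line).
printf '\n' > blank-line.txt

# Confirm the file is 1 byte long, i.e. not empty.
wc -c < blank-line.txt
```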
cat gst_m.pdist_tree gst_m.ddist_tree gst_m.ppars_tree gst_m.dpars_tree gst_m.dml_tree > gst_m.all_trees
fconsense -intreefile gst_m.all_trees
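Before building a consensus, it can be worth confirming how many trees actually ended up in the combined file. Each Newick tree ends with a semicolon, so counting semicolons gives a rough tree count. This is a sketch under the assumption that no semicolons appear inside quoted labels; the function name count_trees and the toy tree files are hypothetical.

```shell
# Count the trees in a Newick file by counting the terminating semicolons.
# Assumes no semicolons appear inside quoted labels.
count_trees() {
    tr -cd ';' < "$1" | wc -c | tr -d ' '
}

# Toy trees, made up for illustration only.
printf '(A,(B,C));\n' > toy1_tree
printf '((A,B),C);\n' > toy2_tree
cat toy1_tree toy2_tree > toy.all_trees

count_trees toy.all_trees    # for the workshop data: count_trees gst_m.all_trees
```

If each of the five methods wrote a single tree, the combined workshop file should contain five; a larger number may simply mean that one of the programs reported several trees.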
3. multiple alignment and gaps
7. are the trees the same, which are the orthologs
10. which parts of the tree are consistent, which method identifies more orthologs