Traditionally, students have done these exercises from the command line, either on the Macs or on compserve1.cshl.edu.
However, we have just extended the FASTA WWW pages to support HMM searching, so you may be able to do your searches by combining CHAPS (to do the multiple alignment and to build and calibrate the HMM) with the new HMM site.
| On the PCs: | On the Macs: |
|---|---|
| Double-click the "SSH Secure Shell Client" icon found on the lower left of your desktop. | Open the "Terminal" application (the black screen icon in the dock). |
To check that you are connected, type a command such as pwd, hostname, or more /ecg/slib/pir1.
HMMER 2 is installed in your path, in /ecg/seqprg/bin.
The files mentioned in the tutorial are in /ecg/data/hmmer/demos/.
Copy any/all of these files to your home directory.
cp /ecg/data/hmmer/demos/globins* .
(The trailing "." is important: it names the current directory as the destination.)
Use hmmbuild to build an HMM from an alignment (for example, the alignment globins50.msf in demos/):
hmmbuild globins.hmm globins50.msf
Search the /ecg/slib/blast/wormpep and/or /ecg/slib/pir1 database with your globin HMM:
hmmsearch globins.hmm /ecg/slib/pir1 > glob_src1.pir_out
Check to see how many globins the HMM can find in pir1. View the file by typing:
more glob_src1.pir_out
Then do a search against the C. elegans wormpep database:
hmmsearch globins.hmm /ecg/slib/wormpep > glob_src1.worm_out
Look at the E() values for the high-scoring worm globins and non-globins.
Use hmmcalibrate to determine some statistics for your new HMM, so that HMMER can estimate E-values fairly accurately in any subsequent searches you do with the HMM. Then repeat the wormpep search:
hmmcalibrate globins.hmm
hmmsearch globins.hmm /ecg/slib/wormpep > glob_src2.worm_out
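The build, search, calibrate, and re-search steps above can be collected into one small shell function. This is only a sketch: the function name run_globin_searches and its two arguments are illustrative assumptions, not part of HMMER. It assumes globins50.msf is in the current directory and that globins.hmm has not been built yet.

```shell
# Hypothetical helper (not part of HMMER): search one database with the
# globin HMM before and after calibration, using the file names from the
# steps above.
run_globin_searches() {
    db="$1"     # sequence database to search
    tag="$2"    # short label used in the output file names, e.g. "worm"
    if [ -z "$db" ] || [ -z "$tag" ]; then
        echo "usage: run_globin_searches DATABASE TAG" >&2
        return 2
    fi
    # Start from a fresh model so the first search is truly uncalibrated.
    if [ -e globins.hmm ]; then
        echo "globins.hmm already exists; remove or rename it first" >&2
        return 1
    fi

    hmmbuild globins.hmm globins50.msf                    || return 1
    hmmsearch globins.hmm "$db" > "glob_src1.${tag}_out"  || return 1
    hmmcalibrate globins.hmm                              || return 1
    hmmsearch globins.hmm "$db" > "glob_src2.${tag}_out"  || return 1
}
```

For example, run_globin_searches /ecg/slib/wormpep worm would write glob_src1.worm_out and glob_src2.worm_out. The two files can then be compared side by side, for instance with diff glob_src1.worm_out glob_src2.worm_out | more, to see how the E() values change after calibration.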
Can the C. elegans globins (worm_glob.html) found by
hmmsearch be identified by a single-sequence search
(blast, PSI-blast, fasta, ssearch)? What is the highest-scoring
unrelated sequence? Is the worm globin NP_495806 (gi|17536653) found with the globin HMM?
Use hmmalign to align a large set of globins (globins630.fa) using the model you've built from the smaller set of 50:
hmmalign globins.hmm globins630.fa > globins630.out
more globins630.out
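Before paging through the alignment, it can help to confirm how many sequences went into it. A minimal sketch, assuming a FASTA file in which every record begins with a ">" header line; the function name count_fasta is hypothetical.

```shell
# Hypothetical helper: count the records in a FASTA file by counting
# the header lines, which start with ">".
count_fasta() {
    grep -c '^>' "$1"
}
```

For example, count_fasta globins630.fa prints the number of sequences in the large globin set.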
Use hmmsearch to start the search with some known globins (demos/globins630.fa), and then maybe "parse" a multidomain globin (the Artemia globin is in demos/Artemia.fa):
cp /ecg/data/hmmer/demos/Artemia.fa .
We will do the example described by Sean Eddy in the handout: Eddy (1998) "Multiple alignment and multiple sequence based searches", Trends Guide to Bioinformatics, 15-18.
Copy the library containing sra4_caeel, /ecg/data/eddy/sra.lib, to your unix directory:
cp /ecg/data/eddy/sra.lib sra.lib
Use clustalw to build a multiple alignment of the sra.lib sequences:
clustalw sra.lib
Alternatively, copy the multiple alignment:
cp /ecg/data/eddy/sra.aln sra.aln
Use hmmbuild to build an HMM from the sra.aln alignment:
hmmbuild sra.hmm sra.aln
Use hmmcalibrate to calibrate sra.hmm:
hmmcalibrate sra.hmm
(This takes a few minutes.)
hmmsearch to search SwissProt (this will take a long time -
please do it in groups).
hmmsearch sra.hmm /ecg/slib/swissprot > sra_hmm.results
more sra_hmm.results
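Because the SwissProt search is slow, a group may prefer to start the whole sra sequence of steps once and leave it running. The sketch below strings together the commands above; the function name run_sra_search and its database argument are illustrative assumptions. It assumes sra.lib is in the current directory and that sra.hmm has not been built yet, and it reuses sra.aln if that file was copied instead of being made with clustalw.

```shell
# Hypothetical wrapper for the sra exercise: align (if needed), build,
# calibrate, and search, using the file names from the steps above.
run_sra_search() {
    db="$1"     # sequence database to search, e.g. /ecg/slib/swissprot
    if [ -z "$db" ]; then
        echo "usage: run_sra_search DATABASE" >&2
        return 2
    fi
    # Skip the alignment step when a non-empty sra.aln is already present.
    if [ ! -s sra.aln ]; then
        clustalw sra.lib || return 1
    fi
    hmmbuild sra.hmm sra.aln                   || return 1
    hmmcalibrate sra.hmm                       || return 1
    hmmsearch sra.hmm "$db" > sra_hmm.results  || return 1
}
```

For example: run_sra_search /ecg/slib/swissprot, followed by more sra_hmm.results once it finishes.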
Compare a few of your results to what you can find at the PFAM website's SwissPFAM repository.
You might also try running PSI-BLAST via the CHAPS interface using the sra.lib files for input. Also,
compare the form of the PSI-BLAST profile (PSSM) with the HMM model
file.