Workshop IV - Exercises with profile HMM methods

1. Practical use of a profile HMM software package

Traditionally, students have done these exercises from the command line, either on the Mac's or on compserve1.cshl.edu.

However, we have just extended the FASTA WWW pages to support HMM searching, so you may be able to do your searches by combining CHAPS (to do the multiple alignment and build and calibrate the HMM, and the new HMM site.
On the PCs:On the Macs:
double click the "SSH Secure Shell Client" icon found on the lower left of your desktop;
  • click "Quick Connect" in the upper left corner
  • hostname: compserve1.cshl.org or courseXX.cshl.org
  • user: courseXX
  • password: [ask us]
Open the "Terminal" (the black screen icon in the dock) application.
In both cases the login is "courseXX" and the password is "[ask us]". You should now get a new window with a UNIX command line prompt. Try typing a few simple UNIX commands such as pwd or hostname or more /ecg/slib/pir1.

HMMER 2 is installed in your path, in /ecg/seqprg/bin. The files mentioned in the tutorial are in /ecg/data/hmmer/demos/. Copy any/all of these files to your home directory.
cp /ecg/data/hmmer/demos/globins* . (<<-- the "." is important)

I. Core functionality

  1. Use the program hmmbuild to build an HMM from an alignment (for example, the alignment globins50.msf in demos/).
    hmmbuild globins.hmm globins50.msf

  2. Search the /ecg/slib/blast/wormpep and/or /ecg/slib/pir1 database with your globin hmm:
    hmmsearch globins.hmm /ecg/slib/pir1 > glob_src1.pir_out
    Check to see how many globins the hmm can find in pir1. (View the file by typing:
    more glob_src1.pir_out
    Then do a search against the C. elegans wormpep database:
    hmmsearch globins.hmm /ecg/slib/wormpep > glob_src1.worm_out
    Look at the E() values for the high scoring worm globins and non-globins.

  3. Use the program hmmcalibrate to determine some statistics for your new HMM, so that HMMER can estimate E-values fairly accurately in any subsequent searches you do with the HMM.
    hmmcalibrate globins.hmm

  4. Do the wormpep search again with the calibrated globin.hmm for new globins.
    hmmsearch globins.hmm /ecg/slib/wormpep > glob_src2.worm_out

    Can the C. elegans globins (worm_glob.html) found by hmmsearch be identified by single sequence search (blast, PSI-blast, fasta, ssearch) ? What is the higest scoring unrelated sequences? Is the worm globin NP_495806 (gi|17536653) found with the globin HMM?

  5. Use the program hmmalign to align a large set of globins (globins630.fa) using the model you've built from the smaller set of 50.

    hmmalign globins.hmm globins630.fa > globins630.out
    more globins630.out

  6. Use the program hmmsearch to start the search with some known globins (demos/globins630.fa), and then maybe "parse" a multidomain globin (Artemia globin is in demos/Artemia.fa,
    cp /ecg/data/hmmer/demos/Artemia.fa .
    the results are in (demos/Artemia.html).

II. Comparison of HMM searching with other approaches

We will do the example described by Sean Eddy in the handout entitled Eddy (1998) "Multiple alignment and multiple sequence based searches" Trends Guide to Bioinformatics, 15-18

  1. Use the FASTA search page and/or PSI-BLAST to compare sra4_caeel (gi 1176649) to the SwissProt database.

  2. Using PSI-Blast, identify the protein family.

  3. Login to the unix server.

  4. Copy the library of 32 sequences clearly related to sra4_caeel in /ecg/data/eddy/sra.lib to your unix directory:
    cp /ecg/data/eddy/sra.lib sra.lib

  5. Use the clustalw program to align the sra.lib sequences.
    clustalw sra.lib
    (Alternatively, copy the multiple alignment
    cp /ecg/data/eddy/sra.aln sra.aln

  6. Use hmmbuild to build a multiple alignment of the sra.aln alignment.
    hmmbuild sra.hmm sra.aln

  7. Use hmmcalibrate to calibrate the sra.hmm.
    hmmcalibrate sra.hmm
    (This takes a few minutes.)

  8. Use hmmsearch to search SwissProt (this will take a long time - please do it in groups).
    hmmsearch sra.hmm /ecg/slib/swissprot > sra_hmm.results
    (You can then view the results by typing more sra_hmm.results

    Compare a few of your results to what you can find at the PFAM website's SwissPFAM repository.


    You might also try running PSI-BLAST via the CHAPS interface using the sra.lib files for input. Also, compare the form of the PSI-BLAST profile (PSSM) with the HMM model file.