Computational Genomics, November 2011

Workshop III - BLAST, PSI-BLAST, and HMMER3

NCBI BLAST WWW site

CHAPS lets you enter a set of sequences, generate a multiple alignment, and use that alignment for a PSI-BLAST search.

Additional information on the CHAPS program can be found here.


    Looking at profiles/PSSMs -- the effect of diversity
  1. Using the CHAPS WWW page [pgm], make a multiple alignment and generate a PSSM from two sequences: gstm1_human and gstm2_human (run CHAPS). After generating the alignment with "Do multiple alignment now", select "Generate PSSM Now".

    Examine the PSSM (position-specific scoring matrix) and compare its values to BLOSUM62.

    The weights of each residue are shown on the right half of the PSSM.
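    A PSSM gives each residue a score that depends on its alignment column, whereas BLOSUM62 gives one score per residue pair at every position. The Python sketch below builds a simple log-odds PSSM, score = log2(q/p), from aligned sequences. The uniform 1/20 background, the pseudocount of 1, and the function name `log_odds_pssm` are assumptions for illustration; PSI-BLAST also applies sequence weights, BLOSUM62-derived pseudocounts, and its own score scaling, so its values will differ.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_pssm(aligned_seqs, pseudocount=1.0):
    """Return one {residue: score} dict per alignment column.

    score = log2(q / p), where q is the residue's pseudocount-smoothed frequency
    in the column and p is an assumed uniform background of 1/20.
    Gaps ('-') and other non-amino-acid characters are not counted.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(aligned_seqs[0])):
        residues = [s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((residues.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm

# Two toy aligned fragments (hypothetical, not real GST sequence)
pssm = log_odds_pssm(["ACDK", "ACEK"])
print(round(pssm[0]["A"], 2), round(pssm[0]["W"], 2))
```

    Conserved residues get positive scores and absent residues get negative ones; with only two sequences the pseudocounts dominate, which is one reason the values can look quite different from BLOSUM62.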

  2. Try the same process with four sequences: gstm1_human, gstm2_human, gstm3_human, gstm1_mouse (run CHAPS). Does the scoring matrix or the weighting change much?
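    Adding a near-duplicate such as gstm1_mouse, which is very similar to gstm1_human, contributes mostly redundant information, so sequence-weighting schemes give such sequences less credit. The sketch below implements position-based sequence weights in the style of Henikoff and Henikoff; treating gaps as an ordinary symbol and the name `henikoff_weights` are simplifications, and PSI-BLAST's own weighting differs in detail.

```python
from collections import Counter

def henikoff_weights(aligned_seqs):
    """Position-based sequence weights normalized to sum to 1.

    In each column, a sequence carrying residue a gets 1 / (r * n_a), where r is the
    number of distinct symbols in the column and n_a is how many sequences carry a.
    Gaps are treated as an ordinary symbol here (a simplification).
    """
    weights = [0.0] * len(aligned_seqs)
    for col in range(len(aligned_seqs[0])):
        column = [s[col] for s in aligned_seqs]
        counts = Counter(column)
        distinct = len(counts)
        for i, symbol in enumerate(column):
            weights[i] += 1.0 / (distinct * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]

# Two identical toy sequences plus one different one (hypothetical data)
print(henikoff_weights(["AAAA", "AAAA", "CCCC"]))  # → [0.25, 0.25, 0.5]
```

    The two identical sequences split the credit that a single copy would have received, so the distinct sequence ends up with the largest weight.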

  3. Try the seven sequences gstm1_human, gstm3_human, gstp1_human, gsta1_human, gstt1_human, gsto1_human, ptgd2_human (run CHAPS). Now look at the scoring matrix and weighting.
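    With the more diverse set in problem 3, fewer columns stay conserved, and the profile carries less information per position. One way to quantify this is the relative entropy of each column against a background distribution, sketched below in Python under an assumed uniform background and without pseudocounts; the example columns are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_relative_entropy(column):
    """Relative entropy (bits) of one alignment column versus a uniform background.

    Uses raw observed frequencies (no pseudocounts); gaps and other
    non-amino-acid characters are skipped. The uniform 1/20 background is an assumption.
    """
    residues = [c for c in column if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    n = len(residues)
    p = 1.0 / len(AMINO_ACIDS)
    return sum((k / n) * math.log2((k / n) / p) for k in Counter(residues).values())

print(column_relative_entropy("CCCCCCC"))  # fully conserved column
print(column_relative_entropy("CSAYCTK"))  # variable column
```

    A fully conserved column reaches the maximum of log2(20), about 4.32 bits, and the value falls as the column becomes more variable.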


    Iterative searching with PSI-BLAST
  4. Do a BLASTP [pgm] search of PIR1 using gstt1_drome (gi|121694). What are the scores for GSTA1_RAT, GSTA4_RAT, and GSTM2_RAT?
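    BLASTP reports each hit with a bit score and an E()-value, and the two are linked through the size of the search space, so the same alignment can receive different E()-values in different searches. The sketch below shows the Karlin-Altschul conversion, S' = (lambda S - ln K) / ln 2 and E = m n 2^(-S'), where m is the query length and n the database length. Real BLAST uses effective (edge-corrected) lengths and its own lambda and K; the parameter values and lengths in the usage lines are assumptions.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def evalue_from_bits(bits, query_len, db_len):
    """Expected number of chance alignments scoring >= bits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)

# Assumed parameters (roughly gapped BLOSUM62, gap open 11 / extend 1)
LAMBDA, K = 0.267, 0.041
bits = bit_score(150, LAMBDA, K)
# Doubling the database size doubles the E()-value for the same bit score
print(evalue_from_bits(bits, 200, 10_000_000))
print(evalue_from_bits(bits, 200, 20_000_000))
```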

  5. Using the ECG2011 PSI-BLAST page, search the PIR1 database using gstt1_drome (gi|121694).

    1. Set Iterations to 10 and the E() cutoff to 1e-4. Are the E()-values for GSTA1_RAT and GSTA4_RAT the same as the ones you saw in problem 4 above? Does PSI-BLAST ever include a sequence that is not a glutathione transferase homolog?

    2. Repeat the search with composition statistics turned off. Check the E()-values for GSTA1_RAT and GSTA4_RAT.

    3. Search PIR1 with gstt1_drome (gi|121694), setting the E() cutoff to 0.01. Do any non-homologs obtain E()-values better (lower) than 0.01?

    4. Try the same search with the E() cutoff set to 0.2. What is the final E()-value for SYEP_HUMAN (bifunctional aminoacyl-tRNA synthetase)?
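    The questions above probe how the inclusion threshold steers PSI-BLAST: every sequence whose E()-value passes the threshold is folded into the next profile, so a single false positive can pull in its own relatives in later rounds. The loop below sketches that iteration. `search` is a hypothetical stand-in for a database search with the current profile, and `toy_search` is an invented model whose E()-values simply shrink as the profile grows; neither reflects real PSI-BLAST statistics.

```python
from typing import Callable, Dict, FrozenSet, List

def iterate_search(
    query: str,
    search: Callable[[FrozenSet[str]], Dict[str, float]],
    inclusion_e: float = 1e-4,
    max_iterations: int = 10,
) -> List[FrozenSet[str]]:
    """PSI-BLAST-style iteration sketch; returns the included set after each round.

    `search` is a hypothetical callable: given the sequences in the current profile,
    it returns {sequence_name: E-value} for one database search.
    """
    included = frozenset([query])
    history = []
    for _ in range(max_iterations):
        hits = search(included)
        passed = {name for name, e in hits.items() if e <= inclusion_e}
        new_included = frozenset([query]) | passed
        history.append(new_included)
        if new_included == included:  # converged: the profile would not change
            break
        included = new_included
    return history

def toy_search(profile: FrozenSet[str]) -> Dict[str, float]:
    """Invented model: every E-value shrinks tenfold per sequence added to the profile."""
    base = {"homolog1": 1e-6, "homolog2": 5e-4, "unrelated": 0.1}
    factor = 10.0 ** (-(len(profile) - 1))
    return {name: e * factor for name, e in base.items()}

print(iterate_search("query", toy_search)[-1])                   # strict threshold
print(iterate_search("query", toy_search, inclusion_e=0.2)[-1])  # loose threshold
```

    With the strict threshold the toy profile converges on the two homologs; with the loose threshold "unrelated" is included in the first round and stays in the profile from then on.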


    Searching with Hidden Markov Models - HMMER3

  6. Do a PHMMER [pgm] search of PIR1 using gstt1_drome (gi|121694). What are the scores for GSTA1_RAT, GSTA4_RAT, and GSTM2_RAT?

    Notice that PHMMER reports two sets of E()-values and scores for each library sequence: a "full sequence" score and a "best 1 domain" score. Which alignments have more than one domain?
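    If you run phmmer locally with the `--tblout` option, each target line lists the full-sequence E-value and score, the best-1-domain E-value and score, and a set of domain-number estimates. The sketch below assumes the HMMER3 column order (target, accession, query, accession, three full-sequence columns, three best-domain columns, then exp, reg, clu, ov, env, dom, rep, inc, description) and reports targets whose `dom` count is above 1. The function name and the sample lines are invented for illustration.

```python
def multi_domain_hits(tblout_text):
    """Return (target, full-seq E-value, best-1-domain E-value, dom) for targets with dom > 1."""
    hits = []
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):  # skip blanks and comment headers
            continue
        fields = line.split(None, 18)  # field 19 (description) may contain spaces
        n_domains = int(fields[15])   # 'dom' column of the domain number estimation
        if n_domains > 1:
            hits.append((fields[0], float(fields[4]), float(fields[7]), n_domains))
    return hits

# Synthetic example lines (not real phmmer output)
sample = """\
# target query ...
SEQ_A - query - 1e-20 70.0 0.1 1e-10 35.0 0.1 2.1 2 0 0 2 2 2 2 two-domain toy hit
SEQ_B - query - 1e-15 50.0 0.2 1e-15 50.0 0.2 1.0 1 0 0 1 1 1 1 one-domain toy hit
"""
print(multi_domain_hits(sample))
```

    A large gap between the full-sequence and best-1-domain E-values is another hint that more than one domain contributes to the score.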

    Compare the alignment lengths for GSTF3_MAIZE and SSPA_ECO57 using PHMMER, BLASTP, and SSEARCH [pgm].
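    SSEARCH computes a full Smith-Waterman local alignment, so its alignment boundaries come from optimal local scoring rather than from BLAST's heuristic gapped extension or HMMER's domain envelopes. The sketch below is a minimal Smith-Waterman that also returns the aligned ranges, from which the alignment length follows. Its match, mismatch, and linear gap scores are assumptions; SSEARCH uses a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score and aligned ranges (0-based, half-open) in a and b.

    Simplified scoring (match/mismatch constants, linear gap penalty) is an
    assumption; SSEARCH itself uses a substitution matrix and affine gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # origin[i][j]: cell where the best local alignment ending at (i, j) started
    origin = [[(i, j) for j in range(cols)] for i in range(rows)]
    best, best_end = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            options = [
                (0, (i, j)),                                         # start a new alignment
                (score[i - 1][j - 1] + sub, origin[i - 1][j - 1]),   # align a[i-1] with b[j-1]
                (score[i - 1][j] + gap, origin[i - 1][j]),           # gap in b
                (score[i][j - 1] + gap, origin[i][j - 1]),           # gap in a
            ]
            score[i][j], origin[i][j] = max(options, key=lambda o: o[0])
            if score[i][j] > best:
                best, best_end = score[i][j], (i, j)
    (a_start, b_start), (a_end, b_end) = origin[best_end[0]][best_end[1]], best_end
    return best, (a_start, a_end), (b_start, b_end)

score, a_range, b_range = smith_waterman("TTACGTTT", "GGACGGG")
print(score, a_range, b_range, "alignment length in a:", a_range[1] - a_range[0])
```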

  7. Try searching with the PSSMs you generated in the CHAPS/PSSM section. Search the swissprot database, which has been annotated to indicate most GST homologs.
    1. Search with two sequences: gstm1_human, gstm2_human (run CHAPS [pgm]).
    2. Search with four sequences: gstm1_human, gstm2_human, gstm3_human, gstm1_mouse (run CHAPS [pgm]).
    3. Search with seven sequences: gstm1_human, gstm3_human, gstp1_human, gsta1_human, gstt1_human, gsto1_human, ptgd2_human (run CHAPS [pgm]).
    In each of the searches, try to determine how broad the initial search was, and watch out for high-scoring unrelated sequences.
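    Searching with a PSSM means scoring each library sequence position by position with column-specific scores instead of a single substitution matrix, so a profile built from a narrow set of sequences rewards only residues typical of that set. The sketch below slides a PSSM along one sequence and keeps the best ungapped window; PSI-BLAST itself builds gapped alignments, and the function name, the toy PSSM, and scoring unlisted residues as 0 are assumptions.

```python
def best_ungapped_pssm_hit(pssm, sequence):
    """Slide a PSSM (list of {residue: score} dicts) along a sequence.

    Returns (best_score, start_offset) for the highest-scoring ungapped window,
    or None if the sequence is shorter than the PSSM. Residues missing from a
    column's dict score 0.0 (an assumption, e.g. for 'X').
    """
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = sum(column.get(res, 0.0) for column, res in zip(pssm, window))
        if best is None or total > best[0]:
            best = (total, start)
    return best

# A hypothetical two-column PSSM and a toy sequence
toy_pssm = [{"A": 2.0, "C": -1.0}, {"C": 3.0, "A": -1.0}]
print(best_ungapped_pssm_hit(toy_pssm, "GACAC"))  # → (5.0, 1)
```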


  8. Go to the hmmsearch compare two sequences page and search for fibronectin type III (FN3) domains in the Drosophila sevenless protein. You can download the fibronectin domain HMM from Pfam (PF00041) or get it here. Paste or upload it into the upper box, and use it to compare to 7LESS_DROME.

    1. How many domains are found?
    2. What are their lengths?
    3. How many are significant?
    4. How does the mapping of domains compare to the annotation in SwissProt?
    5. Which parts are most reliably aligned?
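    If you run hmmsearch locally with `--domtblout`, the per-domain table answers several of these questions directly: the `#` and `of` columns count domains, the alignment coordinates give lengths, the i-Evalue measures each domain's significance, and `acc` is the mean posterior probability of the aligned residues, a rough guide to alignment reliability. The sketch assumes the HMMER3 column order; the function name, the 0.01 i-Evalue cutoff, and the sample lines are assumptions for illustration, not HMMER's own inclusion rule or real output.

```python
def domains_for_target(domtblout_text, target, i_evalue_cutoff=0.01):
    """Per-domain summary for one target from HMMER3 --domtblout text.

    Returns tuples (domain_number, ali_from, ali_to, length, i_evalue, significant, acc).
    The default i-Evalue cutoff is an assumed threshold for illustration.
    """
    domains = []
    for line in domtblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split(None, 22)  # field 23 (description) may contain spaces
        if f[0] != target:
            continue
        i_evalue = float(f[12])
        ali_from, ali_to = int(f[17]), int(f[18])
        domains.append((int(f[9]), ali_from, ali_to, ali_to - ali_from + 1,
                        i_evalue, i_evalue <= i_evalue_cutoff, float(f[21])))
    return domains

# Synthetic example lines (not real hmmsearch output)
sample = """\
TOY_SEQ - 900 toy_hmm - 85 1e-30 110.0 5.0 1 2 1e-10 2e-07 30.0 0.5 1 84 100 180 98 182 0.85 toy protein
TOY_SEQ - 900 toy_hmm - 85 1e-30 110.0 5.0 2 2 0.05 0.5 8.0 0.3 10 70 400 455 395 460 0.60 toy protein
"""
for domain in domains_for_target(sample, "TOY_SEQ"):
    print(domain)
```

    Envelope coordinates (columns 20 and 21) are an alternative measure of domain extent; comparing both with the SwissProt feature table is a useful check on question 4.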

  9. Use the same HMM to search PIR1 for FN3-containing proteins with hmmsearch. How do the individual domain E-values compare across other FN3-containing homologs?
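    For the PIR1 search, grouping the domain table by target makes it easier to compare per-domain E-values across homologs. The sketch below, again assuming `--domtblout` output in the HMMER3 column order, maps each target to its domains' i-Evalues in sequence order; the function name and the sample lines are invented for illustration.

```python
from collections import defaultdict

def domain_evalues_by_target(domtblout_text):
    """Map each target name to its domains' i-Evalues, ordered by alignment start."""
    per_target = defaultdict(list)
    for line in domtblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split(None, 22)
        per_target[f[0]].append((int(f[17]), float(f[12])))  # (ali_from, i-Evalue)
    return {target: [e for _, e in sorted(doms)] for target, doms in per_target.items()}

# Synthetic example lines (not real hmmsearch output)
sample = """\
TOY_A - 500 toy_hmm - 85 1e-12 45.0 1.0 2 2 0.001 0.01 12.0 0.2 1 80 300 380 298 382 0.70 toy
TOY_A - 500 toy_hmm - 85 1e-12 45.0 1.0 1 2 1e-08 1e-05 25.0 0.1 1 84 100 182 99 183 0.88 toy
TOY_B - 300 toy_hmm - 85 1e-04 20.0 0.5 1 1 1e-04 1e-04 20.0 0.5 5 80 50 126 45 130 0.75 toy
"""
print(domain_evalues_by_target(sample))
```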

