Computational Genomics, November 2006

Workshop II - BLAST, PSI-BLAST, and PSI-SEARCH

NCBI BLAST WWW site


These exercises use a variety of servers, both at the course and remotely: list servers.

CHAPS allows you to enter a set of sequences, generate a multiple alignment, and use that multiple alignment for a PSI-BLAST search. Additional information on the CHAPS program can be found here.


    Looking at profiles/PSSMs -- the effect of diversity
  1. Using the CHAPS WWW page, make a multiple alignment and generate a PSSM using the two sequences gstm1_human and gstm2_human (run CHAPS). After generating the alignment with Run ClustalW Now, select Generate PSSM Now.

    Examine the PSSM (position specific scoring matrix). Compare the values to BLOSUM62.

    The weights of each residue are shown on the right half of the PSSM.

  2. Try the same process with four sequences: gstm1_human, gstm2_human, gstm3_human, gstm1_mouse (run CHAPS). Does the scoring matrix or weighting change much?

  3. Try the sequences gstm1_human, gstm3_human, gstp1_human, gsta1_human, gstt1_human, gsto1_human, ptgd2_human (run CHAPS). Now look at the scoring matrix and weighting.
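
To get a feel for why diversity changes both the sequence weights and the column scores, here is a minimal sketch. It is not the calculation CHAPS or PSI-BLAST performs: it assumes an ungapped toy alignment, a uniform background frequency (1/20) in place of real amino-acid frequencies, and a single fixed pseudocount in place of BLOSUM62-derived pseudocounts. The function names and the toy sequences are made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_based_weights(alignment):
    """Henikoff-style position-based sequence weights, normalized to sum to 1.

    `alignment` is a list of equal-length, ungapped strings (toy assumption).
    """
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        distinct = len(counts)  # number of different residue types in the column
        for i, residue in enumerate(column):
            # Residues shared by many sequences contribute less weight each.
            weights[i] += 1.0 / (distinct * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]


def column_log_odds(column, weights, pseudocount=1.0):
    """Log-odds scores (bits) for one column, against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform, not real frequencies
    observed = dict.fromkeys(AMINO_ACIDS, 0.0)
    for residue, weight in zip(column, weights):
        observed[residue] += weight
    n = len(column)
    scores = {}
    for aa in AMINO_ACIDS:
        # Mix the weighted observed frequency with a background pseudocount.
        freq = (n * observed[aa] + pseudocount * background) / (n + pseudocount)
        scores[aa] = math.log2(freq / background)
    return scores


# Toy alignments (hypothetical peptides, not the GST sequences).
similar = ["MKL", "MKL"]
diverse = ["MKL", "MKL", "MRV", "LHI"]
for name, alignment in (("similar", similar), ("diverse", diverse)):
    weights = position_based_weights(alignment)
    first_column = [seq[0] for seq in alignment]
    scores = column_log_odds(first_column, weights)
    print(name, [round(w, 3) for w in weights], round(scores["M"], 2))
```

In this sketch, identical sequences share their weight equally while a divergent sequence receives more, and a column with several residue types gives a flatter set of scores than a fully conserved one. Keep that in mind when comparing the three CHAPS PSSMs with BLOSUM62.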


    Iterative searching with PSI-BLAST
  4. Using the ECG2006 PSI-BLAST page, search the PIR1 database using gstt1_drome (gi|121694).
    1. Set Iterations to 10 and E() cutoff to 1e-4. Are the E()-values for XURTG and GSTA4_RAT the same as the ones you saw in problem 3? Does PSI-BLAST ever include a non-glutathione transferase homolog?

    2. Do the same search with composition-based statistics turned off. Check the E()-values for XURTG and GSTA4_RAT.

    3. Search with gstt1_drome (gi|121694) against PIR1, setting the E() cutoff to 0.01. Do any non-homologs obtain E()-values better than 0.01?

    4. Try the same search, setting the E() cutoff to 0.2. What is the final E()-value for SYEP_HUMAN Bifunctional aminoacyl-tRNA synthetase?

  5. Do the same series of searches using OPSD_HUMAN (gi|129207). Set the E() cutoff to 1e-4 and search the PIR1 database for 10 iterations. Compare the converged results for a search with and without composition-based statistics.

  6. Try searching using the PSSMs you generated in the CHAPS/PSSM section. Search the swissprot database, which has been annotated to indicate most GST homologs.
    1. Search with two sequences: gstm1_human, gstm2_human (run CHAPS).
    2. Search with four sequences: gstm1_human, gstm2_human, gstm3_human, gstm1_mouse (run CHAPS).
    3. Search with seven sequences: gstm1_human, gstm3_human, gstp1_human, gsta1_human, gstt1_human, gsto1_human, ptgd2_human (run CHAPS).
    In each of the searches, try to determine how broad the initial search was, and watch out for high-scoring unrelated sequences.
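
The effect of the E() cutoff in the searches above can be sketched as a simple inclusion loop. This is only an illustration of the iteration logic: `toy_search` is a hypothetical stand-in for a real PSI-BLAST iteration, and every sequence name and E()-value in it is invented so that the example runs without a server.

```python
def iterate_search(search, query_id, e_cutoff, max_iterations=10):
    """Repeat a profile search until no new sequence passes the E() cutoff.

    `search` is a hypothetical callable: given the set of sequences currently
    in the profile, it returns a dict mapping hit id -> E()-value.
    Returns the final set of included sequences and the iteration count.
    """
    included = {query_id}
    for iteration in range(1, max_iterations + 1):
        hits = search(frozenset(included))
        newly_included = {h for h, e in hits.items() if e <= e_cutoff} - included
        if not newly_included:
            return included, iteration  # converged: the profile stops changing
        included |= newly_included
    return included, max_iterations


def toy_search(included):
    """Invented E()-values that depend on what is already in the profile."""
    hits = {"query": 1e-60, "close_homolog": 1e-30, "unrelated": 0.05}
    # A distant homolog only becomes significant once the profile is broader.
    hits["distant_homolog"] = 1e-6 if "close_homolog" in included else 0.5
    if "unrelated" in included:
        # Once an unrelated sequence is in the profile, it reinforces itself
        # and pulls in its own relatives.
        hits["unrelated"] = 1e-20
        hits["unrelated_relative"] = 1e-8
    return hits


strict, strict_rounds = iterate_search(toy_search, "query", e_cutoff=1e-4)
loose, loose_rounds = iterate_search(toy_search, "query", e_cutoff=0.2)
print(sorted(strict), strict_rounds)
print(sorted(loose), loose_rounds)
```

With the strict cutoff the toy profile converges on the homologs only. With the loose cutoff a single marginal hit enters in the first round and its E()-value improves in later rounds, which is the behavior to watch for in the searches with cutoffs of 0.01 and 0.2.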

    Looking at Profiles/PSSMs -- Statistics
  7. We have set up a special version of PSI-BLAST, called PSI-SEARCH, that provides three statistical estimates: the normal PSI-BLAST statistics, statistics from the distribution of unrelated scores calculated by SSEARCH using the PSI-BLAST PSSM, and statistics calculated using PRSS. The program first does a search at the NCBI, and then uses the PSSM from that search to run a Smith-Waterman search and PRSS against the same library.

    Try using PSI-SEARCH with the sequence GSTM1_HUMAN (121735) against the SwissProt database. At the end of each iteration, look at the E() values calculated in the three different ways for new sequences about to be included (or not) in the next iteration. Pay particular attention to the discrepancies between BLAST and SSEARCH/PRSS after the second iteration. Examine some of the sequences where the E()-values differ substantially, and consider whether the "homologies" are genuine.
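
The idea behind a shuffle-based estimate can be sketched in a few lines. This is a simplified illustration, not the PRSS program: it assumes a plain match/mismatch score with a linear gap penalty instead of a scoring matrix or PSSM, and it counts shuffled scores directly rather than fitting an extreme-value distribution to them. The function names, scoring parameters, and test peptide are all chosen for illustration.

```python
import random


def smith_waterman_score(a, b, match=5, mismatch=-4, gap=-8):
    """Best local alignment score with a linear gap penalty (toy scoring)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def shuffle_p_value(query, subject, shuffles=200, seed=0):
    """Empirical p-value: how often a shuffled subject scores >= the real one.

    Shuffling keeps the length and composition of `subject` but destroys
    its order, so the shuffled scores sample the "unrelated" distribution.
    """
    rng = random.Random(seed)
    real_score = smith_waterman_score(query, subject)
    residues = list(subject)
    at_least_as_good = 0
    for _ in range(shuffles):
        rng.shuffle(residues)
        if smith_waterman_score(query, "".join(residues)) >= real_score:
            at_least_as_good += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return real_score, (at_least_as_good + 1) / (shuffles + 1)


peptide = "MPMILGYWDIRGLAHAIRLLLEYTDSSYEE"  # toy 30-residue sequence
score, p_value = shuffle_p_value(peptide, peptide)
print(score, p_value)
```

Because the shuffled sequences have the same composition as the real one, an estimate of this kind is less easily fooled by biased composition. That is one reason the SSEARCH/PRSS values can disagree with the BLAST E()-values for some of the sequences you examine.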

