Computational Genomics, Oct 27, 2016

PSI-BLAST and PSSMs

NCBI BLAST WWW site


These exercises use the ECG2016 BLAST, ECG2016 CHAPS, and ECG2016 PSI-BLAST WWW pages.

ECG2016 CHAPS allows you to enter a set of sequences, generate a multiple alignment, and use that multiple alignment for a PSI-BLAST search.

Additional information on the CHAPS program, which takes a set of sequences, produces a multiple alignment, and then uses the multiple alignment with PSI-BLAST, can be found here.


    Looking at profiles/PSSMs -- the effect of diversity
  1. Using the CHAPS WWW page, make a multiple alignment and generate a PSSM from two sequences: gstm1_human, gstm2_human (run CHAPS). After generating the alignment with Run ClustalW Now, select Generate PSSM Now.

    Examine the PSSM (position specific scoring matrix). Compare the values to BLOSUM62 by identifying some highly conserved positions (':'), and look at the matrix at those positions.

    The weights of each residue are shown on the right half of the PSSM.

  2. Try the same process with four sequences: gstm1_human, gstm2_human, gstm3_human, gstm1_mouse (run CHAPS). Does the scoring matrix or weighting change much?

  3. Try the sequences gstm1_human, gstm3_human, gstp1_human, gsta1_human, gstt1_human, hpgds_human (run CHAPS). Now look at the scoring matrix and weighting. (Again, look at highly conserved sites and compare to BLOSUM62.)
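
    A minimal Python sketch may help show why the number and diversity of aligned sequences changes the scores in a PSSM. It is not the calculation CHAPS or PSI-BLAST performs. It assumes ungapped toy columns, unweighted sequences, uniform background frequencies of 1/20, an arbitrary pseudocount weight, and half-bit units. All names and values below are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies (real programs use observed frequencies).
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
# Assumption: arbitrary pseudocount weight chosen for illustration only.
PSEUDOCOUNT_WEIGHT = 10.0


def column_scores(column):
    """Return half-bit log-odds scores for one ungapped alignment column."""
    counts = Counter(column)
    n = len(column)
    scores = {}
    for aa in AMINO_ACIDS:
        # Mix observed counts with background pseudocounts to estimate the
        # frequency of this residue at this position.
        q = (counts[aa] + PSEUDOCOUNT_WEIGHT * BACKGROUND[aa]) / (n + PSEUDOCOUNT_WEIGHT)
        scores[aa] = 2.0 * math.log2(q / BACKGROUND[aa])
    return scores


def pssm(alignment):
    """Build a toy PSSM: one score dictionary per column of the alignment."""
    return [column_scores("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    examples = {
        "2 sequences, conserved W": "WW",
        "6 sequences, conserved W": "WWWWWW",
        "6 sequences, mixed W/F/Y": "WWWFFY",
    }
    for label, column in examples.items():
        s = column_scores(column)
        print(f"{label}: W={s['W']:.2f} F={s['F']:.2f} A={s['A']:.2f}")
```

    In this toy model, a column conserved across more sequences gets a higher score for the conserved residue and a lower score for unobserved residues, while a column with mixed aromatic residues spreads positive scores over the residues that were seen. This is the kind of change to look for when comparing the three CHAPS runs.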


    Iterative searching with PSI-BLAST
  4. Using the ECG2016 PSI-BLAST [pgm] page, compare Honey bee glutathione transferase D1 NP_001171499/ H9KLY5_APIME [seq] (gi|295842263) to the PIR1 Annotated protein sequence database. Set Iterations to 5 and the E() cutoff to 1e-4. Are the E()-values for GSTA1_RAT and GSTA4_RAT the same as the ones you saw in problem 3?

  5. Do the same search turning composition statistics off. Check the E()-values for GSTA1_RAT and GSTA4_RAT. Again, look at the Pfam structure of the most distant significant homolog.

  6. Using the ECG2016 PSI-BLAST [pgm] page, compare Honey bee glutathione transferase D1 NP_001171499/ H9KLY5_APIME to the PIR1 database with the E() cutoff set to 0.01. Does PSI-BLAST ever include a non-glutathione transferase homolog?

    The honey bee GST has the same domain structure as gstt1_drome at Pfam. Compare that to the domain structure of the most distant statistically significant sequence you found (again at Pfam).

    Try the same search with the E() cutoff set to 0.2. What is the final E()-value for SYEP_HUMAN Bifunctional glutamate/proline--tRNA ligase?

  7. Do the same series of searches using OOHU (gi|476517). Set the E() cutoff to 1e-4 and search the PIR1 database for 10 iterations. Compare the converged results for a search with and without composition-based statistics.

  8. Try searching using the PSSMs you generated in the CHAPS/PSSM section. Search the swissprot database, which has been annotated to indicate most GST homologs.

    1. Search with two sequences: gstm1_human, gstm2_human (run CHAPS).

    2. Search with four sequences: gstm1_human, gstm2_human, gstm3_human, gstm1_mouse (run CHAPS).

    3. Try the sequences gstm1_human, gstm3_human, gstp1_human, gsta1_human, gstt1_human, gsto1_human, hpgds_human (run CHAPS).

    In each of the searches, try to determine how broad the initial search was, and watch out for high-scoring unrelated sequences.
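
    If a local NCBI BLAST+ installation is available, the settings used in the searches above roughly correspond to psiblast command-line options. The sketch below only builds the command line and does not run it. Several things are assumptions: the file and database names are hypothetical, the "E() cutoff" on the course pages is taken to mean the PSSM inclusion threshold (-inclusion_ethresh), composition statistics "on" is mapped to mode 2, and the ECG2016 pages may use different software or defaults.

```python
import shlex


def psiblast_command(db, query=None, msa=None, iterations=5,
                     inclusion_e=1e-4, composition_stats=True):
    """Build (but do not run) an NCBI BLAST+ psiblast command line.

    Exactly one of `query` (a FASTA file) or `msa` (a multiple alignment
    file, passed with -in_msa) must be given.
    """
    if (query is None) == (msa is None):
        raise ValueError("give exactly one of query or msa")
    cmd = ["psiblast", "-db", db]
    cmd += ["-query", query] if query is not None else ["-in_msa", msa]
    cmd += [
        "-num_iterations", str(iterations),
        # Assumption: the course's "E() cutoff" is the inclusion threshold.
        "-inclusion_ethresh", str(inclusion_e),
        # 0 turns composition-based statistics off; 2 is assumed for "on".
        "-comp_based_stats", "2" if composition_stats else "0",
    ]
    return cmd


if __name__ == "__main__":
    # Hypothetical file and database names, for illustration only.
    searches = [
        psiblast_command("pir1", query="h9kly5_apime.fa"),
        psiblast_command("pir1", query="h9kly5_apime.fa", composition_stats=False),
        psiblast_command("pir1", query="h9kly5_apime.fa", inclusion_e=0.01),
        psiblast_command("swissprot", msa="gstm_two_seqs.aln"),
    ]
    for cmd in searches:
        print(shlex.join(cmd))
```

    Building the argument list separately from running it makes it easy to compare searches that differ in a single setting, such as composition statistics on versus off.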

