Biol4230 - PSI-BLAST and PSSMs

fasta.bioch.virginia.edu/labs/biol4230/psiblast_demo.html


These exercises use the UVa BLAST, UVa CHAPS, and UVa PSI-BLAST WWW pages.

UVa CHAPS allows you to enter a set of sequences, generate a multiple alignment, and use that multiple alignment for a PSI-BLAST search.

Additional information on the CHAPS program can be found here.


I. Watching PSI-BLAST (PSI-SEARCH2) work iteratively
  1. Compare GSTM1_HUMAN against the PIR1 database using the PSI-SEARCH2 [pgm] WWW page.
    1. How many glutathione transferase homologs do you see after the first iteration?
    2. What is the E()-value of the most distant glutathione transferase? What is the most distant homolog with significant (E()<0.001) similarity? In what organism is it found? What is its percent identity?
  2. Continue the search through a second and third iteration.
    1. After each iteration, note the number of significant homologs (E()<0.001) and the percent identity of the most distant significant homolog.
    2. Based on the percent identity, guess-timate the equivalent PAM/VT matrix after the first iteration, the second iteration, and the third iteration.
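
One rough way to turn a percent identity into an equivalent PAM distance is to interpolate in a table of identity/PAM pairs. The sketch below assumes the rounded correspondence usually quoted from the classic Dayhoff table (for example, about 50% identity near PAM80 and about 20% identity near PAM250). The table values and the function name `estimate_pam` are illustrative assumptions, not part of the course software, and the VT matrix series uses its own scale, so treat the result as a starting point only.

```python
# Approximate (percent identity, PAM distance) pairs.
# ASSUMPTION: rounded values from the commonly quoted Dayhoff correspondence.
IDENTITY_TO_PAM = [
    (99.0, 1), (90.0, 11), (80.0, 23), (70.0, 38), (60.0, 56),
    (50.0, 80), (40.0, 112), (30.0, 159), (20.0, 246),
]


def estimate_pam(percent_identity):
    """Linearly interpolate a PAM distance for an observed percent identity."""
    points = sorted(IDENTITY_TO_PAM)  # ascending percent identity
    # Clamp values that fall outside the table.
    if percent_identity <= points[0][0]:
        return points[0][1]
    if percent_identity >= points[-1][0]:
        return points[-1][1]
    # Find the two bracketing table rows and interpolate between them.
    for (lo_id, lo_pam), (hi_id, hi_pam) in zip(points, points[1:]):
        if lo_id <= percent_identity <= hi_id:
            frac = (percent_identity - lo_id) / (hi_id - lo_id)
            return round(lo_pam + frac * (hi_pam - lo_pam))


if __name__ == "__main__":
    for identity in (75, 50, 35, 25):
        print(identity, "% identity ~ PAM", estimate_pam(identity))
```

Applying it to the percent identity of the most distant significant homolog after each iteration gives a feel for how far the effective scoring matrix moves as the PSSM is refined.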

II. Looking at profiles/PSSMs -- the effect of diversity
  1. Searching with one sequence - Compare GSTM1_HUMAN against the PIR1 database with BLASTP, using the BLAST [pgm] WWW page.
    1. How many glutathione transferase homologs do you see?
    2. What is the E()-value of the most distant glutathione transferase? What is the most distant homolog with significant (E()<0.001) similarity? In what organism is it found?
  2. Building a PSSM (2 sequences) - Using the CHAPS [pgm] WWW page, make a multiple alignment and generate a PSSM from the two sequences gstm1_human and gstm2_human.
    1. Build a multiple sequence alignment by selecting and
    2. Next, select
    3. Now run a BLAST search against PIR1 with GSTM1_HUMAN and the Multiple Sequence alignment you just built.
    4. How many glutathione transferase homologs do you see?
    5. What is the E()-value of the most distant glutathione transferase? What organism is it from?
    6. What is the percent identity of the most distant statistically significant glutathione transferase?
    7. Go back to the BLASTP/PSIBLAST search page with the multiple sequence alignment and search the shuffled SwissProt database. Record the alignment length, percent identity, and E()-value for the highest scoring shuffled sequence.

      Based on this result, are PSI-BLAST statistical estimates likely to be accurate? Why or why not?
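
The idea behind the shuffled-database control can be sketched in a few lines. A shuffled sequence keeps the length and amino-acid composition of the original but cannot be a homolog, so any alignment to it is a chance match. Since an E()-value is the number of matches with that score expected by chance in one search, the best shuffled hit should have an E()-value of roughly 1 when the statistics are accurate; a much smaller value suggests the estimates are too optimistic. The function name and the example peptide below are made up for illustration, and this is not how the shuffled SwissProt database on the server was necessarily built.

```python
import random
from collections import Counter


def shuffle_sequence(seq, seed=None):
    """Return a random permutation of seq (same length, same composition)."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


# Arbitrary peptide used only for illustration (not a real database entry).
example = "MKTAYIAKQRQISFVKSHFSRQ"
shuffled = shuffle_sequence(example, seed=1)

if __name__ == "__main__":
    print(example)
    print(shuffled)
    # Composition is preserved even though the order is scrambled.
    print(Counter(example) == Counter(shuffled))
```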

  3. Searching with a diverse PSSM - Using the CHAPS [pgm] WWW page, try the sequences gstm1_human, gstm3_human, gstp1_human, gsta1_human, gstt1_human, gsto1_human, and hpgds_human.
    Repeat the previous steps for building a Multiple Sequence Alignment and sending it to PSI-BLAST.
    1. Build a multiple sequence alignment by selecting and .
      Take a quick glance at the multiple sequence alignment; how many large gapped regions do you see?

      Copy/paste the multiple sequence alignment to another file/document, so that you can compare it to the next multiple alignment.

    2. Next, select
    3. Now run a BLAST search against PIR1 with GSTM1_HUMAN and the Multiple Sequence alignment you just built.
    4. How many glutathione transferase homologs do you see?
    5. What is the E()-value of the most distant glutathione transferase? What is the percent identity of the most distant GST with E()<0.001?
    6. Look at the near-significant alignment with SYEP_HUMAN. What is the percent identity? The alignment length? Do you think SYEP_HUMAN is likely to be a homolog?
      Examine the domain content of SYEP_HUMAN at Pfam. Does it contain a homologous domain? Use the "show" domain scores to confirm your conclusion. Is the domain you found statistically significant? What is another strategy for confirming the relationship?
    7. Go back to the BLASTP/PSIBLAST search page with the multiple sequence alignment and search the shuffled SwissProt database. Record the alignment length, percent identity, and E()-value for the highest scoring shuffled sequence for this more diverse PSSM.
  4. Effects of non-homologs - Using the CHAPS [pgm] WWW page, try the sequences gstm1_human, gstp1_human, gstt1_human, narj_eco57, dyr_bpt4, and tpis_rabit (the last three are not glutathione transferases).
    Repeat the previous steps for building a Multiple Sequence Alignment and sending it to PSI-BLAST.
    1. Build a multiple sequence alignment by selecting and .

      Comparing this multiple alignment to the previous one you saved, does one look dramatically better than the other?

    2. Next, select
    3. Now run a BLAST search against PIR1 with GSTM1_HUMAN and the Multiple Sequence alignment you just built.
    4. How many glutathione transferase homologs do you see?
    5. Can you tell the glutathione S-transferase homologs from non-homologs?
    6. What is the E()-value of the most distant glutathione transferase? What is the percent identity of the most distant GST with E()<0.001?
    7. Go back to the BLASTP/PSIBLAST search page with the multiple sequence alignment and search the shuffled SwissProt database. Record the alignment length, percent identity, and E()-value for the highest scoring shuffled sequence for this very diverse PSSM.
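
To see why the diversity of the input alignment matters, it can help to compute a toy PSSM by hand. The sketch below is a deliberately simplified assumption: it uses unweighted residue counts, a flat pseudocount, and a uniform background frequency, whereas PSI-BLAST weights the sequences and derives its pseudocounts from a substitution matrix. The function `build_pssm` and the three-column toy alignments are hypothetical and are not produced by CHAPS.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column.

    SIMPLIFICATION: unweighted counts, flat pseudocounts, uniform background.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):  # iterate over alignment columns
        # Count residues in this column; gap characters are ignored.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(scores)
    return pssm


# Toy alignments (hypothetical): identical rows versus rows that disagree.
conserved = ["MKW", "MKW", "MKW"]
diverse = ["MKW", "MRF", "LKY"]

pssm_conserved = build_pssm(conserved)
pssm_diverse = build_pssm(diverse)

if __name__ == "__main__":
    print("M in column 1, conserved:", round(pssm_conserved[0]["M"], 2))
    print("M in column 1, diverse:  ", round(pssm_diverse[0]["M"], 2))
```

In the conserved alignment the observed residue scores well and everything else is penalized; in the diverse alignment the scores are spread over more residues. Adding non-homologs pushes the columns further toward this flatter pattern, which is one way to think about the results for the last alignment.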
