Biol4230 - PSI-BLAST and PSSMs

fasta.bioch.virginia.edu/labs/biol4230/psiblast_demo.html


These exercises use the UVa BLAST, UVa CHAPS, and UVa PSI-BLAST WWW pages.

UVa CHAPS lets you enter a set of sequences, generate a multiple alignment, and use that multiple alignment for a PSI-BLAST search.

Additional information on the CHAPS program can be found here.
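A PSSM stores a score for every amino acid at each column of the multiple alignment. The Python sketch below shows the basic idea, but its simplifications are assumptions and not how PSI-BLAST builds its matrix: raw residue counts plus a flat pseudocount, a uniform 1/20 background, gaps ignored, and no sequence weighting (PSI-BLAST uses sequence weights and substitution-matrix-derived pseudocounts). The names build_pssm and score are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (bits) per alignment column.

    Simplified sketch: counts + flat pseudocount, uniform background,
    gap characters skipped, every sequence weighted equally.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and non-standard letters.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Positive score: residue seen more often than background.
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm, segment):
    """Score an ungapped segment against the PSSM, position by position."""
    return sum(column[aa] for column, aa in zip(pssm, segment))
```

For a made-up toy alignment such as build_pssm(["PILGY", "PMLGY", "PILGF"]), the first column rewards P and penalizes every other residue. When the aligned sequences are very similar, many columns contain a single residue, so the PSSM behaves much like the query alone; adding diverse homologs spreads positive scores over the residues observed at each position.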


Looking at profiles/PSSMs -- the effect of diversity
  1. Searching with one sequence: Using the BLASTP program on the BLAST [pgm] WWW page, compare GSTM1_HUMAN against the PIR1 database.
    1. How many glutathione transferase homologs do you see?
    2. What is the E()-value of the most distant glutathione transferase?
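The E()-value BLAST reports depends on both the score and the size of the search. In bit units the Karlin-Altschul relationship is E = m n 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The sketch below ignores BLAST's effective-length (edge) corrections, so it will not reproduce reported E()-values exactly; the function names are hypothetical.

```python
import math


def expect_from_bits(bit_score, query_length, database_length):
    """E = m * n * 2**(-S'), with no effective-length correction."""
    return query_length * database_length * 2.0 ** (-bit_score)


def bits_needed(query_length, database_length, evalue):
    """Bit score required to reach a given E(): S' = log2(m * n / E)."""
    return math.log2(query_length * database_length / evalue)
```

Because E() grows linearly with n, the same bit score is less significant in a larger database, and each additional bit of score halves E().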
  2. Building a PSSM (2 sequences): Using the CHAPS WWW page, make a multiple alignment and generate a PSSM from two sequences, gstm1_human and gstm2_human (run CHAPS [pgm]).
    1. Build a multiple sequence alignment by selecting and
    2. Next, select
    3. Now run a BLAST search against PIR1 with GSTM1_HUMAN and the multiple sequence alignment you just built.
    4. How many glutathione transferase homologs do you see?
    5. What is the E()-value of the most distant glutathione transferase?
    6. What is the percent identity of the most distant statistically significant glutathione transferase?
    7. Go back to the BLASTP/PSIBLAST search page with the multiple sequence alignment and search the shuffled SwissProt database. Record the alignment length, percent identity, and E()-value for the highest scoring shuffled sequence.
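Two quantities recorded above can be made concrete in a few lines. Percent identity below divides identical positions by the full alignment length, gap columns included, which is the convention of the "Identities" line in BLAST output (other programs may use a different denominator). A shuffled database presumably keeps each sequence's length and amino-acid composition while destroying any real similarity; the simple uniform permutation shown here is an assumption, and the course database may have been built with a different shuffling method. Both function names are hypothetical.

```python
import random


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of a pairwise alignment given as two gapped strings.

    Denominator is the full alignment length, including gap columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have the same length")
    if not aligned_a:
        return 0.0
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * identical / len(aligned_a)


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Random permutation of the residues: same length and composition."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)
```

Because shuffled sequences share no ancestry with the query, the best-scoring shuffled hit shows roughly how long and how identical an alignment can be by chance alone; compare it with the most distant real GST you found.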
  3. Searching with a diverse PSSM: Try the sequences gstm1_human, gstm3_human, gstp1_human, gsta1_human, gstt1_human, gsto1_human, and hpgds_human (run CHAPS [pgm]).
    Repeat the previous steps for building a multiple sequence alignment and sending it to PSI-BLAST.
    1. Build a multiple sequence alignment by selecting and .
      Take a quick glance at the multiple sequence alignment; how many large gapped regions do you see?
    2. Next, select
    3. Now run a BLAST search against PIR1 with GSTM1_HUMAN and the multiple sequence alignment you just built.
    4. How many glutathione transferase homologs do you see?
    5. What is the E()-value of the most distant glutathione transferase, and what is the percent identity of the most distant GST with E() < 0.001?
    6. Look at the near-significant alignment with SYEP_HUMAN. What is the percent identity? The alignment length? Do you think SYEP_HUMAN is likely to be a homolog?
      Examine the domain content of SYEP_HUMAN at Pfam. Does it contain a homologous domain? Use the "show" domain scores to confirm your conclusion. Is the domain you found statistically significant? What is another strategy for confirming the relationship?
    7. Go back to the BLASTP/PSIBLAST search page with the multiple sequence alignment and search the shuffled SwissProt database. Record the alignment length, percent identity, and E()-value for the highest scoring shuffled sequence for this more diverse PSSM.
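Question 5 asks for two different hits: the glutathione transferase with the largest E()-value, and the lowest-identity GST that still has E() < 0.001. Once you have copied the hits you judge to be GSTs into a small table, picking them out is mechanical. The sketch below assumes you enter the values by hand (it does not parse BLAST output); the Hit fields and the summarize function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    """One library hit, entered by hand from the search results."""
    name: str
    evalue: float
    percent_identity: float
    is_gst: bool  # your own judgment of whether the hit is a glutathione transferase


def summarize(
    hits: List[Hit], threshold: float = 0.001
) -> Tuple[Optional[Hit], Optional[Hit]]:
    """Return (GST with the largest E(), lowest-identity GST with E() < threshold)."""
    gsts = [h for h in hits if h.is_gst]
    weakest = max(gsts, key=lambda h: h.evalue, default=None)
    significant = [h for h in gsts if h.evalue < threshold]
    most_divergent = min(significant, key=lambda h: h.percent_identity, default=None)
    return weakest, most_divergent
```

The two answers are often different hits: the weakest GST may fall above the E() < 0.001 cutoff, while the most divergent significant GST is identified by percent identity rather than by score.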
  4. Effects of non-homologs: Try the sequences gstm1_human, gstp1_human, gstt1_human, narj_eco57, dyr_bpt4, and tpis_rabit (the last three are not glutathione transferases) (run CHAPS [pgm]).
    Repeat the previous steps for building a multiple sequence alignment and sending it to PSI-BLAST.
    1. Build a multiple sequence alignment by selecting and .
      Looking at the multiple sequence alignment, does it look very different from the alignment built from the previous set of sequences?
    2. Next, select
    3. Now run a BLAST search against PIR1 with GSTM1_HUMAN and the multiple sequence alignment you just built.
    4. How many glutathione transferase homologs do you see?
    5. Can you tell the glutathione S-transferase homologs from non-homologs?
    6. What is the E()-value of the most distant glutathione transferase? What is the percent identity of the most distant GST with E() < 0.001?
    7. Go back to the BLASTP/PSIBLAST search page with the multiple sequence alignment and search the shuffled SwissProt database. Record the alignment length, percent identity, and E()-value for the highest scoring shuffled sequence for this very diverse PSSM.
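One way to see why non-homologs hurt a PSSM is to measure how much information each alignment column carries, as the relative entropy of the column's residue frequencies against background. Columns where unrelated sequences have been forced into alignment hold a mixture of residues, so their relative entropy, and the PSSM scores derived from them, drift toward zero. The sketch below uses a uniform background and a flat pseudocount (both simplifying assumptions), and column_information is a hypothetical name.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column: str, pseudocount: float = 0.5) -> float:
    """Relative entropy (bits) of one alignment column vs. a uniform background."""
    residues = [aa for aa in column if aa in AMINO_ACIDS]  # drop gaps
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    background = 1.0 / len(AMINO_ACIDS)
    info = 0.0
    for aa in AMINO_ACIDS:
        p = (counts[aa] + pseudocount) / total
        info += p * math.log2(p / background)
    return info
```

For example, a column reading "YYYYYY" (all homologs conserve Y) carries more information than "YYYKWA" (three unrelated sequences contribute arbitrary residues), which is the dilution you may see when comparing this PSSM with the one from exercise 3.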
