Computational Genomics, November 2006
Workshop II - BLAST, PSI-BLAST, and PSI-SEARCH
NCBI BLAST WWW site
These exercises use a variety of servers, some at the course and some remote: list servers.
CHAPS
allows you to enter a set of sequences, generate a multiple alignment,
and use that multiple alignment for a PSI-BLAST
search.
Additional information on the CHAPS program can be found here.
Looking at profiles/PSSMs -- the effect of diversity
- Using the CHAPS WWW
page, make a multiple alignment and generate a PSSM using
the two sequences gstm1_human and gstm2_human (run CHAPS). After generating the alignment with
Run ClustalW Now, select Generate PSSM Now.
Examine the PSSM (position-specific scoring matrix). Compare the values to BLOSUM62.
The weights of each residue are shown on the right half of the PSSM.
- Try the same process with: gstm1_human, gstm2_human, gstm3_human,
gstm1_mouse (run CHAPS). Does the scoring matrix or weighting
change much?
- Try the sequences gstm1_human, gstm3_human, gstp1_human, gsta1_human,
gstt1_human, gsto1_human, ptgd2_human
(run CHAPS). Now look at the scoring matrix
and weighting.
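To see why adding more diverse sequences changes both the weights and the PSSM values, here is a minimal Python sketch. It computes position-based sequence weights in the style of Henikoff and Henikoff, then weighted log-odds scores for one alignment column. The toy alignment, the uniform background frequency, and the flat pseudocount are assumptions made for illustration; PSI-BLAST's own weighting and pseudocount scheme is more elaborate, so the numbers will not match the CHAPS output.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def henikoff_weights(alignment):
    """Position-based sequence weights, normalized to sum to 1.

    In each column, a residue type shared by n sequences, in a column with
    k distinct residue types, adds 1 / (k * n) to each of those sequences.
    """
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        k = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (k * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]


def column_scores(column, weights, pseudocount=1.0):
    """Weighted log-odds scores (in bits) for one ungapped column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    observed = {aa: 0.0 for aa in AMINO_ACIDS}
    for residue, weight in zip(column, weights):
        observed[residue] += weight
    n = len(column)
    scores = {}
    for aa in AMINO_ACIDS:
        # Mix the weighted observed frequency with the background frequency.
        q = (n * observed[aa] + pseudocount * background) / (n + pseudocount)
        scores[aa] = math.log2(q / background)
    return scores


# Hypothetical toy alignment (not real GST sequences): two identical
# sequences and one divergent sequence.
toy_alignment = ["MPMIL", "MPMIL", "MAKLV"]
toy_weights = henikoff_weights(toy_alignment)
second_column = [seq[1] for seq in toy_alignment]
toy_scores = column_scores(second_column, toy_weights)

print([round(w, 3) for w in toy_weights])
print({aa: round(toy_scores[aa], 2) for aa in "PAW"})
```

The divergent sequence receives the largest weight, and residues never seen in the column score below zero. The exercises above probe the same behavior as the alignment grows from two closely related sequences to seven diverse ones.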
Iterative searching with PSI-BLAST
- Using the ECG2006 PSI-BLAST page, search the PIR1
database using gstt1_drome (gi|121694).
- Set Iterations to
10 and E() cutoff to 1e-4. Are the E()-values for
XURTG and GSTA4_RAT the same as the ones you saw in problem
3? Does PSI-BLAST ever include a sequence that is not a glutathione transferase homolog?
- Repeat the search with composition-based statistics turned off. Check the
E()-values for XURTG and GSTA4_RAT.
- Search with gstt1_drome (gi|121694) against PIR1, setting the E() cutoff to 0.01. Do any non-homologs obtain E()-values better than 0.01?
- Try the same search setting the E() cutoff to 0.2. What is the final E()-value for
SYEP_HUMAN (Bifunctional aminoacyl-tRNA synthetase)?
- Do the same series of searches using OPSD_HUMAN (gi|129207). Set the
E() cutoff to 1e-4 and search the PIR1 database
for 10 iterations. Compare the converged results for a search
with and without composition-based statistics.
- Try searching using the PSSMs you generated in the CHAPS/PSSM section.
Search the SwissProt database, which has been annotated to indicate most GST homologs.
  - Search with two sequences: gstm1_human, gstm2_human (run CHAPS).
  - Search with four sequences: gstm1_human, gstm2_human, gstm3_human,
    gstm1_mouse (run CHAPS).
  - Try the sequences gstm1_human, gstm3_human, gstp1_human, gsta1_human,
    gstt1_human, gsto1_human, ptgd2_human
    (run CHAPS).
In each of the searches, try to determine how broad the initial search was, and watch out for high-scoring unrelated sequences.
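The exercises above are run through the course web page, but the same settings can be written down as a command line if NCBI BLAST+ is installed locally. The sketch below only builds a `psiblast` argument list and does not run anything. The query file name and the database name `pir1` are hypothetical placeholders, and the mapping from the web form's fields to these flags is an assumption rather than a description of how the ECG2006 page works.

```python
import shlex


def psiblast_command(query, db, iterations=10, inclusion_e=1e-4,
                     comp_stats=True):
    """Build an argument list for BLAST+ psiblast (nothing is executed)."""
    return [
        "psiblast",
        "-query", query,                        # FASTA file with the query
        "-db", db,                              # name of a formatted database
        "-num_iterations", str(iterations),     # maximum number of iterations
        "-inclusion_ethresh", str(inclusion_e), # E() cutoff for PSSM inclusion
        # 1 = composition-based statistics, 0 = off
        "-comp_based_stats", "1" if comp_stats else "0",
    ]


# Hypothetical file and database names, mirroring the exercise settings.
with_stats = psiblast_command("gstt1_drome.fasta", "pir1")
without_stats = psiblast_command("gstt1_drome.fasta", "pir1", comp_stats=False)
loose_cutoff = psiblast_command("gstt1_drome.fasta", "pir1", inclusion_e=0.2)

for cmd in (with_stats, without_stats, loose_cutoff):
    print(shlex.join(cmd))
```

Writing the three variants side by side makes it clear that only the inclusion threshold and the composition setting change between searches. Any difference in which sequences enter the PSSM can then be attributed to those two parameters.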
Looking at Profiles/PSSMs -- Statistics
- We have set up a special version of PSI-BLAST - PSI-SEARCH - that
provides three statistical estimates - the normal PSI-BLAST
statistics, statistics from the distribution of unrelated scores
calculated by SSEARCH using the PSI-BLAST PSSM, and
statistics calculated using PRSS. The program first does a
search at the NCBI, and then uses the PSSM from that search to run a
Smith-Waterman (SSEARCH) search and PRSS against the same library.
Try using PSI-SEARCH with the sequence GSTM1_HUMAN (121735) against the
SwissProt database. At the end of each iteration, look at
the E() values calculated in the three different ways for new
sequences about to be included (or not) in the next iteration. Pay
particular attention to the discrepancies between BLAST and
SSEARCH/PRSS after the second iteration. Examine some of the sequences
where the E()-values differ substantially, and consider whether the
"homologies" are genuine.