Biol4230 - Similarity Searching Exercises II

These exercises use programs on the FASTA WWW Search page [pgm] and the BLAST WWW Search page [pgm].

In the links below, [pgm] indicates a link with most of the information already filled in (e.g. the program name, query, and library). [seq] links go to the NCBI for more information about the sequence. In general, you should click [pgm] links, but not [seq] links.

1. Exploring domains with local alignments -- Calmodulin
  1. Use lalign to examine local similarities between calmodulin CALM_HUMAN and itself. (Set Annotate query and Annotate target to Interpro domains.)
  2. Use plalign to plot the same alignment. How many repeats are present in this sequence?
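
To build intuition for what a self-comparison plot shows, here is a minimal Python sketch that counts identical residues at each offset of a sequence against itself. This is not what lalign does (lalign reports local alignments scored with a substitution matrix); it is a much cruder diagonal count, and the sequence `toy` is a hypothetical perfect repeat, not calmodulin. The function names are made up for this illustration.

```python
def offset_identity_counts(seq):
    """For each offset d >= 1, count positions i where seq[i] == seq[i + d].

    Each offset corresponds to one off-diagonal of a self-comparison dot plot.
    """
    n = len(seq)
    return {d: sum(1 for i in range(n - d) if seq[i] == seq[i + d])
            for d in range(1, n)}


def best_offsets(seq, top=3):
    """Return the offsets with the most identities, best first."""
    counts = offset_identity_counts(seq)
    return sorted(counts, key=lambda d: (-counts[d], d))[:top]


# Hypothetical toy sequence: a 10-residue unit repeated four times.
toy = "ACDEFGHIKL" * 4

counts = offset_identity_counts(toy)
print(best_offsets(toy))   # offsets that are multiples of the repeat length
print(counts[10], counts[20], counts[30])
```

In a real protein the repeats have diverged, so the off-diagonal signals are weaker and a scoring matrix is needed to see them; that is why the exercise uses lalign/plalign rather than identity counts.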

2. Exploring domains and alignment over-extension -- cortactin (SRC8_HUMAN)

Use SRC8_HUMAN [pgm] to compare human cortactin SRC8_HUMAN [seq] to the SwissProt protein sequence database.

  1. Looking at the top five alignments, how many cortactin orthologs do you see? (An ortholog is the same protein in a different species.)
  2. In the SRC8 HUMAN:CHICK alignment, both the query and the subject (library) sequences align seven cortactin domains and an SH3 domain. In addition, two regions (one before the cortactin domain cluster and one after) are well conserved, but do not have annotated domains (NODOM). Are these non-domain (NODOM) regions as well conserved as the annotated domains?
  3. Look at the SRC8_HUMAN:HCLS1_MOUSE alignment. How many cortactin domains does HCLS1_MOUSE contain? How much score does the NODOM spanning the region between cortactin domains and the SH3 domain contribute? Why is it included in the alignment? Is it likely to be homologous?
  4. Is the NODOM between the cortactin domains and the SH3 domain likely to be homologous in the SRC8_HUMAN:DBNLB_XENLA alignment?
  5. In the SRC8_HUMAN:LASP1_HUMAN alignment, the alignment extends to include several Nebulin_repeat domains. Do you think there is a Nebulin_repeat domain in SRC8_HUMAN? Why do you think those domains are aligning?
  6. What scoring matrix should be used to reduce over-extension from the SH3 domain? (See the Scoring Matrix Summary.)
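
The questions above ask why a poorly conserved region can end up inside a local alignment. A minimal sketch of the idea, assuming a fixed ungapped alignment path and made-up per-column scores: a local alignment keeps the highest-scoring segment, so a negative-scoring linker is included whenever the domain beyond it adds more score than the linker costs. The column scores below are hypothetical and are not taken from any real scoring matrix or from the SRC8_HUMAN alignments; the function name is also invented for this example.

```python
def best_local_segment(scores):
    """Return (best_score, start, end) of the maximum-scoring contiguous
    run of columns, with end exclusive (Kadane's algorithm)."""
    best, best_start, best_end = 0, 0, 0
    running, start = 0, 0
    for i, s in enumerate(scores):
        if running <= 0:
            # A non-positive prefix never helps, so restart the segment here.
            running, start = s, i
        else:
            running += s
        if running > best:
            best, best_start, best_end = running, start, i + 1
    return best, best_start, best_end


# Hypothetical column scores: 10 domain columns, 8 linker columns, 5 SH3 columns.
# "Deep" scoring: mismatches in the linker are penalized lightly.
deep = [3] * 10 + [-1] * 8 + [3] * 5
# "Shallow" scoring: the same mismatches are penalized more heavily.
shallow = [4] * 10 + [-3] * 8 + [4] * 5

print(best_local_segment(deep))     # segment spans domain + linker + SH3
print(best_local_segment(shallow))  # segment stops at the end of the domain
```

With the lightly penalized linker, the SH3 columns more than repay the linker's cost and the best segment covers all 23 columns. With the heavier penalty, the linker costs more than the SH3 columns return, so the best segment ends with the first domain. A real Smith-Waterman alignment also chooses the path and allows gaps, which this sketch leaves out.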

3. Exploring domains and over-extension with local alignments -- death associated protein kinase (DAPK1_HUMAN)
  1. Look up the domain structure of DAPK1_HUMAN at Pfam [pgm].
    1. What are the major (PfamA) domain regions on the protein?
    2. Which of the domains is repeated?
    3. In a local (LALIGN) alignment, where would you expect to see overlapping domains like those in Calmodulin (CALM_HUMAN) and Cortactin (SRC8_HUMAN)?
  2. Use lalign/plalign [pgm] to examine local similarities between DAPK1_HUMAN and itself. Check the options to "Annotate query" and "Annotate target". Do you see the domains you expected from Pfam? Do they map in the same places?
  3. Repeat the lalign/plalign [pgm] analysis, but select only the subset of the protein where the repeated domains are found (within 50 residues). Looking at the first or second non-identical self-alignment:
    1. What is the overall percent identity of the alignment?
    2. What is the range in identity across the different aligned ankyrin domains?
    3. Do the ends of the first alignment correspond to the domain boundaries?
    4. How long are the ankyrin domains?
  4. Based on the percent identities you saw in part 3, what would the appropriate scoring matrix be to accurately identify the ankyrin domains?
    1. Using a "correct" scoring matrix, are the alignment boundaries more accurate?
    2. What is the percent identity of the alignment? (Did you pick the right matrix?)
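
The percent identity questions above can also be checked by hand from the aligned sequences. Here is a minimal Python sketch, assuming identity is defined as identical columns divided by columns where both sequences have a residue; other conventions (for example, counting gap columns in the denominator) give different numbers. The aligned strings, the region names, and the region coordinates are hypothetical and are not taken from DAPK1_HUMAN.

```python
def identity_by_region(q_aln, s_aln, q_start, regions):
    """Percent identity overall and within each query region.

    q_aln, s_aln: aligned sequences of equal length, '-' marks a gap.
    q_start: 1-based query coordinate of the first aligned query residue.
    regions: list of (name, start, end) in 1-based inclusive query coordinates.
    """
    if len(q_aln) != len(s_aln):
        raise ValueError("aligned sequences must have the same length")
    tallies = {name: [0, 0] for name, _, _ in regions}  # [identities, columns]
    tallies["overall"] = [0, 0]
    q_pos = q_start - 1
    for q, s in zip(q_aln, s_aln):
        if q != "-":
            q_pos += 1          # advance only when the query has a residue
        if q == "-" or s == "-":
            continue            # gap columns are not counted
        names = ["overall"] + [n for n, lo, hi in regions if lo <= q_pos <= hi]
        for name in names:
            tallies[name][1] += 1
            tallies[name][0] += (q == s)
    return {name: (100.0 * ident / cols if cols else None)
            for name, (ident, cols) in tallies.items()}


# Hypothetical toy alignment with two annotated repeats on the query.
q_aln = "ACDEFG-HIKLMNP"
s_aln = "ACDQFGWHI-LVAP"
result = identity_by_region(q_aln, s_aln, 1, [("rep1", 1, 6), ("rep2", 7, 13)])
for name, pct in result.items():
    print(name, round(pct, 1))
```

Comparing the per-region values with the overall value shows how much identity varies from one repeat to the next, which is the range the exercise asks about.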

Contact Bill Pearson: