fasta.bioch.virginia.edu/mol_evol     tinyurl.com/fasta-demo

Programming for Biology -- 2024 - Similarity Searching Exercises


These exercises use programs on the FASTA WWW Search page and the BLAST WWW Search page.

In the links below, [pgm] indicates a link with most of the information filled in; e.g. the program name, query, and library. [seq] links go to the NCBI, for more information about the sequence. In general, you should click [pgm] links, but not [seq] links.


Identifying homologs and non-homologs; effects of scoring matrices and algorithms; using domain annotations

Use the FASTA search page [pgm] to compare Honey bee glutathione transferase D1 NP_001171499/ H9KLY5_APIME [seq] (gi|295842263) to the PIR1 Annotated protein sequence database. Be sure to press , not .

  1. Take a look at the output.

    1. How long is the query sequence?
    2. How many sequences are in the PIR1 database?
    3. What scoring matrix was used?
    4. What were the gap penalties? (what is the penalty for a one-residue gap? two residues? -- this is a "trick" question)
    5. What are each of the numbers after the description of the library sequence? Which one is best for inferring homology?
    6. How similar is the highest scoring sequence? What is the difference between %_id and %_sim? Why is there no 100% identity match?
    7. Looking at an alignment, where are the boundaries of the alignment (the best local region)? How many gaps are in the best alignment? The second best?

  2. Homologs, non-homologs, and the statistical control.

    1. What is the highest (worst) E()-value shown? What should the highest (worst) E()-value calculated in the search be (approximately)?
    2. Which alignment has the worst statistically significant (E()<0.001) score? Do you think this sequence is likely to be homologous?
    3. What is the highest scoring (most significant) non-homolog? (The non-homolog with the highest alignment score, or the lowest E()-value.) Why do you think it is not homologous? Look for positive evidence (e.g. a non-homologous domain) for non-homology.

      You can use the domain diagrams (colors) to identify distant homologs, and, by elimination, the highest scoring non-homolog. You can also use the Sequence Lookup link to Uniprot to look at Domains and Families.

    4. If the statistical estimates are accurate, what should the E()-value for the highest non-homolog (the highest score by chance) be? (This is a control for statistical accuracy.)
    5. What is the E()-value of the most distant homolog shown (based on displayed domain content)? Could there be more distant homologs in the database?
    6. How would you confirm that your candidate non-homolog was truly unrelated? (Hint - compare your candidate non-homolog with SwissProt or QFO78/Uniprot Ref for a more comprehensive test.)

  3. Domains and alignment regions

    1. There are three parts to the domain display, the domain structure of the query (top) sequence (if available), the domain structure of the library (bottom) sequence, and the domain alignment boundaries in the middle (inside the alignment box). The boundaries and color of the alignment domain coloring match the Region: sub-alignment scores.
    2. Note that the alignment of Honey bee GSTD1 and SSPA_ECO57 includes portions of both the N-terminal and C-terminal domains, but neither domain is completely aligned. Why might the alignments not include the complete domains?
    3. Is your explanation for the partial domain alignment consistent the the belief that domains have a characteristic length because they represent compact folding units? How might you test whether a complete domain is present?

  4. Note that this Honey bee glutathione transferase shares significant similarity with both sequences from bacteria (SSPA_ECO57, stringent starvation protein) and mammals. How might you test whether the stringent starvation protein is homologous to glutathione transferases? (Hint - compare your candidate non-homolog with SwissProt for a more comprehensive test.)

  5. Repeat the GSTD1 search [pgm] using the BLASTP62/-11/-1 scoring matrix that BLAST uses.
    Re-examine the honey bee/SSPA_ECO57 alignment. (In the afternoon lecture and workshop, we will discuss how different scoring matrices have different preferred evolutionary distances. BLOSUM50 is more sensitive for alignmets that are about 25% identical, while BLOSUM62 works best for 30% identical alignments. Keep that in mind while answering the following questions.)

    1. Do both Glutathione transferase domains align in the honey bee protein?? Does that imply that one of the domains is not present? Look at the alignments to the homologs above and below SSPA_ECO57. Based on those. aligments, do you think the Glutathione-S-Trfase C-like domain is really missing from the honey bee protein?
    2. Why did the alignment become shorter?
    3. Why might a domain appear to be present in the first (BLOSUM50) search, but not in the second (BLOSUM62)?
  6. Do the same Honey bee GSTD1 search (295842263) using the Course BLAST [pgm] WWW page.
    1. Take a look at the output.
      1. How long is the query sequence?
      2. How many sequences are in the PIR1 database?
      3. What scoring matrix was used?
      4. What were the gap penalties?
      5. What are the numbers after the description of the library sequence? Which one is best for inferring homology?
      6. Looking at an alignment, where are the boundaries of the alignment (the best local region)?

    2. What is the highest scoring non-homolog?

    3. How do the BLASTP E()-values compare with the FASTA (BLASTP62) E()-values for the distantly related mammalian and plant sequences?

  7. One of the slides this afternoon will show an alignment between human and bovin PCBP2 (Poly(rC) binding protein 2). One of the missing regions of the human protein is from residues 267-279 in PCB2_HUMAN (Q15366) when aligning A0A3Q1LX34_BOVIN. Use TFASTX to compare residues 267-289 of PCB2_HUMAN to the DNA sequence from the cow PCBP2 gene by pasting in the cow genomic sequence from Chr5:26543511:26562251 (from the end of exon 10 to the beginning of exon 11):
    >5 dna:primary_assembly primary_assembly:ARS-UCD1.3:5:26543511:26562251:1
    ATAATTAGGTTTTTTAAACAACGTAAGCCGAAGGGGGATGTGGGAAAAGGTCACCAGGAA
    AAAAAAGTCTCCACACTGGTTCTCCTAAGCCCTAAGAGCACTGTACTTTCATTACTCTGG
    CAATATACACTCAGGCTAAGTGGAGACTATCAGAGGCCTTCCACCATTTTACTCAGGGCA
    TCTCTACTCTTATGGAGAGAGAACTGTGATCTTGGGATACAAAAAAGAATAAGGCTTTGG
    ACAATTTAGAACAATCTTAAGTGGATCAATGAGTTAAAGTTGATCTTACAGGTCTTTAAT
    CCTCTAAGTTATGACAGGGTTGGTACAGACAGGTTGGTGGGGATAAAAGGGTAAAGGGTC
    CTGTTTGAGTAATCAGAGTTGATTTAGGATAAACAGGCAGGATCTTGCCAAATTTGAGGC
    AATGTCACTGGTTGGTTGTTTGAGATTTAAGTAATACTAGTTCAGAGGAATGCAGACAGA
    AGCTGGCAAACTTTATTAGACTTTTCCCAACTTGCCTCCCCATTCTGGGTAAAGAAAAAA
    CTTCTTTCCTACACAGCAGTCAACTAAGGTTCTTACTGTGGCAACTTTATTACAACTAGA
    AACACAAATCTGGCCAAGGAGGACAGCTGATAACTTAACCATGAAATAATTAAAAGAAAA
    AAAAAGTTAACTGAGAATTTTTTATTCATTAACACAGAGCCTAACACATTGTTATATTGC
    AAGGCAGAATAAAGCTACAAAAAGTAACAGAATGAGAAAGAAGAGCAAGAACTGTGGCTC
    TAGATAAAGAATTCAACAAGTCAGACTCTGAAACAGGCACTCAGCAGAAATGGCCTAAAC
    CTAAACACAGTTTCCCATAGCCCCTACCCCCACCCTGTCATCCCCCCGAAACAGTAAAAA
    GGAAACGGTCACAAGCTCACAATTGTTTGGTTCTCACCAATATTAACAAAGGGGAAAAAA
    AACCATGAAATACTTGAGGGGGTGCCCCAAGACATATGCCATCAGTAGGGAGAGAATCTC
    TGTTTCTCGCCTTTCAGTTTGTTAGTTACACATGGCACAGCAGAGGTGGTGGTGGTGGTA
    GCTGAAGCCCCGGTACCCTTAGCAGCAGACACAACATTTAGAGCCAGGAGAGGGATGGGT
    TTCATGCCAAGCAGACTGGAGACACAAGGGGCGGTCGGAAGAGGGTTGGGGAGGCTGGAG
    GGTGCGTCGGAGGTCCCCGCTGTGGCGAGCGAGGGGGTGGTGGTGGAGGAGGAAGAAGAA
    GAAGAAGGGGTGGAGGGTTGAGAGAGGTTGATGCTGAGGTGATCGGGGATCGTGACGGAG
    GCTGCCTGGGAGGAGGGTTTAGCGTTTTCTAAACTGTAACAGAGGAAAAAAAGGAACAAA
    AAAAAAAGGGTGGGTGGAAGGGAAGACCACAAAAGGGCAAAGTACAGGACAAAATCCGCT
    TATCAGTAAGTGCCTCTGTTGCCAACTAAGAAAACTTCTATCAGCTTGTGACAAGCTTTT
    ATTCTGTGTCCCACCCTCTAAAAATGCATACTTTGCCAAATTCCATTCCAATGCTATCGA
    GAACTTCTACGTGAAGAAAAGGCTGACGGACTGAGAAAAGACATCACACACTTTCATGTA
    CCTGACATACTGATGCAACCCAAAAGGCAGAAAAAGATTTCAAGCAGAAAATGAAAGAAT
    ACTCATCAATTCCCAAACAAATAAATGTTTCATCCACCAAGCCCTTGTGAAGCTTTGGGA
    CACTAACCCCGAGAATTTTTCAAATTGTTTTTTTTTCCCCTGTAATTAATATCAAAATTT
    GTTTCCATTGCCAGTCAACTAACTAATATATCAGGATAGAAAGGGACAAGAGCCCAAAGT
    AGGTCACTCATTTCAAACTCTGTAGCATTTTCAATGAAAATTGCTGAAAACAGAAGCATA
    GATAAGCCTTATTTCTGGTGATGGGTGAGGAGAAAAGAAGGTTCTTTTGTTTAACTGACC
    ATGCTTCTTACACAGACTTTACAAAACTAGAGAAGAGAAACTTCATGCACTGCTGAAAAA
    TTCAGTTTAATGAACTAAAAACATCACTTCTACTTCTAAAATGAGCTCTAAGTAATAGTG
    ATGACCTGTAGCTCCCAATGCCTCCATGGGCCTATGCCAGAACTGCAGGTATGTGCTGAA
    GTCTGTTGGACATCCTATGGCATAAGACTAACTCTTCCAAGGCAGGGAGGCTCCTCAAGA
    CCCCCATTTTGTCTACCTAAATCCAAGGACAGTTAAAATCCATAAAAGCTGCCACAATCA
    GCAGAAACTCTCAAAGTCTATTACTAAAAACAGAGACATCTTTCTTGCCTAGTATTCATA
    ATAAAATCAGTCTTCTTTCTGTGTTCAAATTGGGCCAATATGAGTTAGCTGCGTGCATGC
    ACAGCCCTTGATTGTTAATAAATCAAGGTTTTTTTTAAAGTTTTGACCCAAATCCACAGC
    TTGGGCAGTTTCATGTAAAAAGGATGGTCCTGAAAAGAGTGACTAAATCAAGAAATGAAG
    GCGGACCCAAGTTTTGCTTCCTTAAGACTTAGCCATCAACCCAGTAGAGGGGTCAGGCAA
    GCAGGTTACATGGAAACACTATCTGACAATATTCAGTGCACATCTACTCAAGCAGACGCG
    TCCAGCTTGCAAGAACGTAATTCTTGCCTCTTAACAATACAGCACCACAGTCAGAAACAG
    GGAAACAAACCCTTGCAAACAGCAAGACGATAGAAAGCAGATCTAGGTATAGGAATCTTT
    TCTACACAAAGACAAGAAAATACCTCAAGGCATAAAGAGATATGATAACAGATGGAACCA
    CTACTGGGAAACTGATAAATTCTTACAGGGGATAAAAGACCATGGAATAAATATATGGTC
    AGCGTAAGCATGGCAAAAAATAGACCCCATACCATACTTATGAATTCCTTTCCCTTGCAC
    TACACATTTAGGAACCCATAAAATCTGTCTATACAACAGAGACCAACTAATTGTGACAGA
    GATAAACAACTTAGGTAAAAGTCTACATAAAAGGGAAAACTAAGTCCCTTACGGAATGGG
    CTGCTTTAAGATGTTAGTAAGCAATTAAGAACATGACTTAAGCACATGTGCTGAGAAAGC
    AAGAAATACAAAATGCCCTGCAAGATGCCCATTAACTGTAACAGTGCTGCTGCAACTGTG
    TGTGTTAACATTAGCTACCTTTGGATAAGAAAAGTAAAACCTCAGCAGGTCAATAAGGGA
    TGAGATGAGATGGTGGGGGGTAAAGAAAATACTAAGCATTTCTAGAAACTTCCAATACCT
    GTACCTCTCCTTTCACACAAGCTCTAAACAATGGAACACAATGGCACAAATGGATGTGTT
    CAGAATAAACAGTAATGACTGCTTCGAAGAAACTATCTTGCTGTCATTATCTTTATTTCA
    ATTAAGGGTGCTTGGGAGCCTAAACTCAGTGAAGTATTAGTCTGTATGACACCTACAATG
    AGACCTACATTTGGTTTTAAGTCAGTATGGGTGCTCCAGCCAGAGCAGTCCTACAAAGTG
    AAGACATGCTCCATACCTTCCATTCCCCAAACCAACAAAAAGTCAAATGCAAACAATCCA
    CTTCCCTTTCCCCAACCCCATGCAAACTCATTACAGTCTGGTTACTTTCCCCCCCTATAC
    TAAAGGATGAGAAGTAGAGCTAGATACTTTTTCTTTTGCAATCTCTACCCTTCCCATTTC
    CATCCCCCAGAGAATCCCATACTTAATATTACAGTGTAACTTTTCCCAACACACACATTA
    CTTGTGTTGGCACCAATAACACAAAAGATAGGGAACCCTTACCTGACATTGATTAGATAT
    TGAGCCAGGCTAATGCTGGCAGCAGATCCAGTGATGGTAACCTGCCTATCAGTAGATCCT
    TCCACTGGGTTCGCAATTTTGATCTGCGCCCCAGACATCTGACGGATCTCATTGATTTTG
    GCGCCTTGACGCCCGATTATGCAGCCAATCAACTTAAGGAAGAAAAACAAGGCATACCAA
    ATTAAGAGTTGCATGGCTAGGTCCCAGCATTCCACCCCACCCCTCAGGCTTCTATACCTC
    AGGCCCACAAACATGGCATTCAACTCTGAACTGAGTGACTCCTAGGACCATTAGCTCATT
    TCTTTCAGAGTACCAATTCCAAAGACAAAGCAGTGCAAAGATGATGATAAACCTATTCTG
    TTTTTGGCAATAAAGATGCCTTAACTCATCCTGCCATGATTATTACTGTACTATAGTGAC
    AAGGATTCTGGCTAATTTCTAACGCTTTTGCCTAGGAAAAAAGTAAACCATTGGTCAACA
    TTTGCACTCTCCAATACCGTAGACACTAATAATGTCATTACTGAACATTGATGTGGCCAG
    TGAGATTGACAAACTTGAGTTTATCATTTTATTATTCTCAATTCAAACAGCTACATGTGC
    TTAGTGACTACCATGTTGGACAGCACAGATAACATCTAATGACCACTCCATTAGGGATTA
    CATATGCAGATTCTACAAGATGCAAAAGCAAATGGGAATTTCTATTTGAGACAAAAAGTT
    TCCTCTGGCTCAAGATAACTCACACAATAATTTCATCAATCTCCATCAGTTTTTATTTGA
    GATTTTACAGTTTTAAGAATTTTAGTGCCTGTAACTCCATTTTCTTCCCCAAGGTGTTAA
    GAGTAGCCCAACCTCTGTTTTTACTTACATCGTTTGGAATGGTGAGTTCATGAGAAGTAG
    TCTGAGCAGATGCATCCAAACCTGCTAAAAGACAGAAGACAGCAAAGAAGTTCAAGGCAC
    AGCATACAACTCCCGCTGGTTTGCACATTAACCTAACAGAAGGAAAATGAACTGGAAACC
    CAGCATTGGCTGTTGGGATTCATCTCTAACTTGGAAGCCTCCAAAGAGGAAAGAGGGGGA
    GAGGGGTGTAGAAATAATGCTGGAAAGGTTCCTCTGTACCAACAAAAGAGGCTGATTTCA
    TGCCATTTCCTGCAAAGCCCAGGCTATTAGGGTCTAAGGGGGTGGAAGCACGGGGCAAGG
    GGATGTGGTATCCTCAATTCTCAGGACCGAAATCTAACCAAGTACTTCTGCACTTCTCCC
    CTTTTCTGGTTGGATTTCCTATTGTTGTATCCCAATTTTATCTCACCTGATGCTATCACC
    AGCACCACTACCATGCCTATTACTGGGTAACATGGACTTTTGGCACACATGCAGAGTTGC
    TTTAAATATACCTATACCAGATTAACCAATCCCAACACCCTTACCCTAACCTACACTTAA
    AGGTCAGTGAGAAAGAAATGGCAGTGAGGTGTGGGAATTTCATTTCTCCAAACTACTCCC
    TGTCCTTTTGCTCAATAATTTGTTCTGGTTTTTGCTTTCCAAAGAAAAAGAAGAAAAAAA
    AAAAAAAAATCCTATCAGCAAACACCCATACCCATCTATATGACACCTGGACTAGACAAG
    GCCAGCTGGCTAACTTCTAAGTCTAGTCCTGCTCTCAGCTGTGGCTGCTGAAACAACATT
    TTTAGAAGGGGGTAGGCCAGATCCATCAAGGGACCCTCAGGGGCACCAGGCCCACCCCCG
    CTTTGCTCTCTCAGCTCCTTTTTGCCATGTGCATGTTCTGCTGGTATGTCATGGGCAGTG
    AAACAAGACTCCCACCAGTTTATTAATATAACAGTAGAATACTACAGATTTTTTTTTTAA
    TCATTACCCCAATAGCCTTTCACCTCTGGAGAGCTGGATTCAATGCCTGTTAAGATACAA
    GTGAAAATGAGTTTGGTGGTCTCATCCTAACCCTATCGGACAAGTTATTCCTTATATAGA
    AAAGCCCAAAAAAACAAAACAAAACAAAAACACACTACACAAATGGATTGTTTCTACCCA
    GATCTATGAATAGCTGGTGGAGAGAAAGGGGGTGCCAATCAGATCAAAGGGGCAGCGATA
    AGGACGGGTTAGAAGAAATATCAGCCTACCCTTTGCCTGACAAGCATCCCCAGATCTCAC
    CCCAAACCTTCCAAGGGCCTTTCTTTTTTCTCTTTTAGAATAACTCACCCCAAATGAAAA
    CCAAGTTAACAAACATATTACCTCCTTCGGTTTAGCCATAGAAAACAGCCAAGTTTAAGC
    CAGCAGGACGCTGGGTAATGAAGTTTGTTTTAAACAGATGTCAAATGGAGACCTTGGGGC
    TGGCCTCTGTCTTGTACTACACCTTACAAAAATCAACACTGAGGTATCCATACCACTGAA
    TCCGGTGTTGCCATGCGTCATGGGAAAATGAGACTGTTGCATTGCCAACTGGTGCAGCTT
    GGTCAACTGCAGGAAGGAAACAGAAGAGGTTGACATTAAAAGAGTAACACAGCCAAAGCT
    CAAGCACATTCAAATTGGAAAATGTAGAGCTCCTGATCCTCAAAGTTGGCTACTACTCAA
    ACTATGACAGAGGCTAAGGAATGGGGCTGGTATCACAATCCTGGGGATGGGCACAAGAAG
    GTGAGGGTTAGAAATAAGAGTATCAGTACAAAGAGAATATATGCGTTTGGGGTGTGTGAA
    ATGCAGCAGCACCGCCTGACAACCACCCCCTTGTGGGCTATTTAATGAAGGTGTTTGCAG
    CCATTGCAACAGTTGCCCCAGGGACAAGCACAGACAGGCTAGGTAAACAAACTGTGCAGT
    GGTGCTAGGGAAAAACACACCATGCAGGGTTCCGTCAGTACTGGGAGGAGGGAAGGAAAT
    GAGGGGACAGGGAGACTGCAAGGAGGAGAGATTAAGGCGAAACCAAACTGGGTTTAGCAA
    CCCTGACCCTATGATTTCTATAGCTATTTTTCTTCATCTATATAAACTGTCCTGGCTGTT
    GTAAAACCATCTCTCTTCAAGTCTACTTCTCCTCCGGGGGCCTACCACTAACCAAAAGCC
    AAACTCCAACTTTTCAAATCTAAAGCTCAGCACCTGAAAAGGTTAGTAAGGGCAGAGAGT
    AAGTGCAGAAGGAAATGCCCTTAAAAGTCTAAATACCAGCACTTTTTAAATTGGAGTTTT
    CAGTGACAAGGAATTTCACTGAGTTGAATCCAAATATTTCACTTCCTAACCCTCCCTGTG
    TAAATAAGTCCCTTTCCCACCCCTCCAGAAGCTTTTTACATTACCACCTCAGACTGAGTA
    CACAAAAGCGAACCCAACCTCCCACTCCCTCACCCTCCCTCCATAGAAAGGACTTCATGC
    AAAGATGCACAGTCCACTGTGAATTTACATTGTGCCCTGGGATAACCAATGACGGTAAAC
    AGTGGAACATTTCTCTTTCAGAGTAATCTGTAACACGGGAGAAAGTAGAATAATCTATTA
    TAATTTCCAGGGATTAAATTTCAAAACAAGGTGCTACTTTTTCCTGATTAAAGATTATTT
    AAGGGCCAAAATACAGACTTCACCTTGTGAGGGAAACAAGACTGCAAATGTTTGAGACCC
    AGCTCACTTCTGCCTACATGGAGAAAGGACTGAGACAATTTTGGAAGGGAGAGGGAGCTT
    TTCAAAACAAGTGAAAACAAAACTTACATCTGGCTGTGGAATGGCATACTGTCCTTGAAT
    GGTATAGGCCTACAAAAGGAGGAAAACAGTTCCTATTAAAAACCAATCACGGACAGCCAC
    CCAGTTGGGGGGTGGGGGGGTAGGGGGAGACATTTCAACTAGCCAGGAAGCCAGAGTTGA
    ACCAGAGAATGGGGAGGCCAGGGGTTGAGCAGTAGACGGTATCTCACAAATACAGTCAGA
    CCCGAGTTGGCCTTCTTTATACCCTAAATATGATGGCTTATTTCAGGATGCCTATAAAGG
    GAGCCATGTTCCCTGCAATCAAAGACAAAATAACACATTTGTTTCTTAACGAAAATGTGG
    GAGATAGGATTAAACAATGCCTTGATTGCCTGAGCCCCCTTCTTTCTCCCTTTGTACAAG
    ATGGCCTAGGAGGAAAATGATTTTTTTTCCTCAACCACCCCAACCCTTCTACCAATCATG
    GTGACATGTTTTATATCTGCACAGACCCACACGGCATGTTAAAACAAAACACTTCCCTTT
    TCCTATCTGACTTTTCTCTACATCTGCCTAATGATCTTAAGTGCACTGGAAGCTAGTTGG
    GTCCCTAGCACTAGGGAGCTGACCATAGGAATACATATAGAAAAAAAAAAAAGAAAAAAT
    CTTATGTTCTAGGTCCTGCTTTCTCCCACCTCCCCTCCATAAAAGGCACTTGCAGCTCCC
    ACTACGCCCTCCTCCCCCCACACCATAACCTCCCAAAAAGGGAGCTGGGTCTTCAGGGCA
    ATGGGGTGGGGGGCAAGGAGAATCAGGAAGGGAAGAAAGGAGCAGCTGGTTAACAGGATG
    TGAAGCTTCTCACATCACAGTGGCCAGTACCCCCAGAAGTGTACCCAGGTGGGGGTGGGA
    AGGTGAAAATGGGGTAGGAGGAATAGGAGACAGGGAAAAGGGGTTGAACAAAAAAAGGAA
    AAAAAAAAGGGAAAAAAAAAAAAAAAGAGTTGGTGTTACCATCTGGTGGGCTACGCAGCC
    TGTGTGAAATGAGGTGGGGGGGGTCAGTCCAGGTCCCCGTGGACAGGTGGTGTCCACAGC
    ACAAAAATCACCAGCGCCACTGGCCCCCCGCGTGTTGGCAGTCTCGGCTGAGGCGGAAGG
    ATTGAGTGAGCCTTGGAGACTGAACATCCCCTCTCACCTCTAGAGGTGGTCCCTCCAGGT
    CAGGGTTGAGGCACATGGACGGGGTGGTGTGGGGAAAGCTCGCACTGTCGCTGCCTGTGC
    TGTACCTGTCCTGTGGATAAATAGAGGATTTCAGTCTCCAGGGCTGTCAATCCGGCACCT
    TGTCACCAGCATGAAATTCACACAAAAAAAGTTGTTTAATTTTTTCTTTTCCTTTAAAAA
    GGAGCTCACAACACAATTTGAAACGAGCTGCAATCAGTGAATTAAAAAAAAAAGTTACCT
    ATAGTGGCAGATTTCACACTGCACAATTGGACTATTTTCAGTAAGAATGAAGGAGTGGGG
    CAACAGAGCTGATGCCAGCTCCTGGTCAATGTGGGATTAAACTGCACCCCTATAGAGCCC
    ACTTTGTCACAAAGGAACAAAATACACAGGTTCACCTGTGGACAATGCATAAACTGGTCC
    ATTTAAAAAGGCCCATGGGAATAATCCAAGACAGGGCCACTCATGGAAGAAGAAGCTGCA
    AGCTTTTAGAGTGGTCAACTGAATAAAGGGGAGGGAGGGAAGGAAAGCAAAGAGGTAAGA
    AGGGGGAAGTGGGAAGGCGAAGGAAGGAAAGGGAAATGAAGGTAGAATTGTAAACAGAGA
    GGACTGCTGCTGTGCGGTATGGACTCTAGATAATCCCAAGTGACCTGGAAGAATCCAGAA
    AAGTAGGAACTTGTAGTTCCAGGGGAGAAGGGAAAACACCTGAAGCTAAACCTGCTACTC
    TAACTCTGGGCCTTTGTTTATCCGCCTAACTACCCAGACTCATGAAGCACACTTTGACCA
    AACCAGTTTGTACAATGCTAGTATTAAAAAGTACAAAACTGTGATATGATTAACTTCCTA
    TGAAAGCTTCCAGAATTACAAGACAAACTAAGTTCCCCAAAACTTGTTCAGTTTTATGAC
    AATTCAATAGAATGGGAATGAGATGGACTCTAAGAGCTCCCCCAGTTTAAATACAAGATG
    GAAATAGGTGAGTTATCTGCGTTTGTTAACAGGTCAGCCCCACGTAGGAAAAGGGACACT
    TGTACTTGTTTAGGCAGCTACCAAAATGAAGGTCACTATTTCAGCCTGAATCCCAATTAT
    TTTAAAACTGACCTATGCCTCAAACGCTAACCAAAAAGCATAAAACTAACATGGTGCTAA
    TCAGTGCCTTTTGTCATAAAACTCTAACTTCAGAATAAACCATGTACCAAGGATCCAGAG
    TTTAGGTTGGTGAAGGAGGAGAGGCCAAGGGGTGCTTTGGTGTTCTTACTCTAACACCAA
    AGGTCACAGCCCAACATGCTCTTCCCGCTCCTAGACACTGAACCCTTACCCCAGCTGATT
    TCAGTTATGCCATGTGGGTTCAAACCAGTGTTTGAGGTTAAGAGCCCAGGGACAACACAC
    TGGAATTTGGGAGTGAAGGGGTAGGAATAGGGGAGGTGAGGTGGGGAAAAGGGGAAGTTT
    GTAGGACACTCAATTTCAGGAGCCCCCAATTTGAGACCACCATCCAAGAATCTATTTCAG
    AGGTCCAACTCCTTATTATGTTTTATTTGGACCTCTAAAGAACACCCCACACATGAAAAA
    ACATGGATGGGGTGGGGGTAGGGAGAGCAGTGACATCAAAGCAATTTTATCTTTTCTTCT
    AACCCAACCCACCTTAAATTACTAAGTAGATTACCTCTTCCGAGTGCCTACAAAACTCCC
    AATGCCTGCTAATAATAAGGTGGCATTCATAGCTTTCTTCAGTACAAAAAATAAAACTGA
    AAGCAGGTCAAACTATGGCCAGAGTACAGATTTCAGCAAGCATGCAGAGTGGCCTGGTGT
    GCTGCTGGCAAAGTTGACATTTCAAATACAATCTTGTTCACCTTAGCCCAGCAATAAAAA
    ACTTCTTACCTGACCACCTGCAAAGATGACCGGAGAACTGGACGGCTTGGGCCGGTACGG
    GATGGTCACGCCCTTCGGGGGGGACTGGGAGAGAGTCAGAGGGGAAAAAAACAAAAAGAG
    ATAAGGTTTCAAAGCTGCTTCAGCTGTGTGGCTTAAGCTCACACAGGTCAGTTAAATTAA
    AAACTTTATTATTTCCACCCCTTCACCTATCAACTTTGTCTCTGGCTAGATCTAATCTCA
    AATCATCAGCAGTTATTTGCCTCTAAAACAGTAAGATGGTCCGTGGACTATACAGTTCAC
    ACTAATTCTCATCAAGGCAGATTCCCCTGTAATAATTTAGGCATTATTTCCATGTGTTCA
    AATTGTGCTAATTAACACAAAAAACCAAAACCTCTAGATGTACCCATGTACCTTTGTAAC
    TACATTTTGACCTGAATAGGAAATAAAATCAAATAGGAAATAAAGCTCCCAAACTTTTTA
    TCTAGGGTAATTCAGGACTTTCCATTAATAGCTCAATGATTCAGTCAATAGTTTCCTAAA
    CTCTACCTAAAAGTTCTACTTTCCTTTCTCACCAATATAGGACCAATTCCCTATGACAAT
    TCTAGAAAGGTTTAATTCTCCTTAATTTTATTATTCTTTTTGTTAACTGAATAGCATCCA
    CCAGGACCTCGTAAGTCATTCAGAAACACTACTATGGATATTTTTTGTTATTGTAGATCA
    TTTTGAATAATCCTGCCTTATAACATAGAAATTAATTTAAACATATCAAAGTTGCATGTA
    GCCAAAATACTGTAACCTGAAAGGCAATTTAACTTAGCTTCTCAAGACCCCAGATACATC
    TTTTGTCCTCAAATTCCAAGTAGCAAATCTCCCCCCTACCAACCCTGCTAAAATTGCTAT
    AAACCAAATGATCACTGCTGCTTAAGTTTCTATTTTTGCTTGGACTATAAGAAAAGTCAA
    AGCTGGGAGAAGGAAAAGATGAACTCAGGGAACCTCTTCCCCACCCCCATGTGAGGCTTC
    TAACAGCCTCACCTAATTCTAGTCTTACTAAATATAAATCAGTCTGTCTCCCTAAATCAC
    CCTCCCCCCCATTCTCAGCATGAGCCTGAGGGCTTACCTCCAACATAACCACGCAGATCT
    GTTTCACACACTCAATGATGGATTGCGGAATGCCAGCAATAGTGATGGCCCGCTCAGTTG
    AGTTGGGGAGCATATCCCCTGCCACTTGGACCTGAGCCCCTGTACTCTTTGGGGTAAAAA
    GAAATTTTAAAAATCAGAGAAACAGATCATTCATACCATTTACATCTCCCTGCAGAAATT
    TTAGAGTCTAAAACTCAGTAAACATTTATCCCATCTCCTATCTAACCCTTTTAATCCTCC
    CTCCCCTCTCTTTCCTTATTCCAGATAAGATTGTTTAATCACAGTTATAAATAAATTAGC
    CTCTGAAAGCTGAGAATAAGCCTGAATAACAGCCTAAATTTAACCTGCTCTTAAGAGAAA
    GGTCCTTCTACACAGTTTCCTCTCATTTAATGATAGCTCTGTGGGTCATCAAACAAAATA
    TTCATAAATATCACCCACCTTTTATTCAGCAAATGTTTATATTAAGCCCATTCTATGTGC
    CAAGCACCATCCCAGGTACTTGGAATATATCAATGAGCAAAATAAAAATCCCTGCTCACA
    TAGTACGTACAATAGGCTCCCAGATAAGTCCAATTCGAATAGGAGCTGTTCAAATTTCAT
    CCCAACATCCAATTCGGTAACTCCCAGATATAAATAAAAGCTCCCATAATTCACCATTAA
    CTTTCATAATTTTGAAGGGAGAATTACAACTATATCCCAATATACCTTCATTTCTTCCAA
    GCTATTTAAGAATTTGGCCAAGAAAAACACTTGATATTATTACATATCAAATTGTAAGTT
    CTTTAGACATAAAAAAGGATCATGTAGCCTCATGGAAAAGTCACCAACAGGTCCCACGCA
    ATAGCTGGAAATTCTGGCTAACTTTTCTGATTTACTGTTGTTGAAATATATGAAATCTAA
    GAAAAATATGCATATCAACCAAATTATAGTCTTCTAGGACTATTACTTTAATAAAAATCT
    TCATCTCAATAAATGCTAACTTTTCCTAGATAATAAAAGCTAGTAGAGGAAGCTCATTTA
    GAATTTACACTTCAAAAAGAGAAAGAGCACAAATTAACCAACCTCTCGTATTTCCTTGAT
    CTTGCAACCACCTTTCCCAATGAGAGAGCCACACTGACTAGCAGGGACCACCAGCCTCAG
    GGTGACTGGGGGTCTACTGGCAGCTGTGCTATTGGTCATAGAGCTGCTGATGTCCTACAA
    AAAGAAATTAGTCAAAACACCACAAGAATATACAAGTTTAAACAGATCCCTGGGATCCCC
    TAACTGGCATAAAACAACCTTATAAGGGACATATTATGTCATGATCTCCAACAACATTTA
    AACATTGCCCCTGCCCCAAACAATAACAACTTATTACTAAAAAAATAAAATGGCCTACAA
    ATATCCTCAGATTTAGCCTTAATATATGTTCCTTCATGCCAGAAGCAGTGTTCTTAGTAG
    AGAGAAAACTTATTTTGTTAAGTAAAATAAGTTTTCTCAGCAAAATAAAACAATACTGGT
    GAAGTCAACATTAAACATGAAGAGAAAAGATGAGACAATGCAACCCAAATTCAAGTGACC
    AATTTTGTCCTCTTAAACTAGGTAGAGATTAAAAATAGTGTACTTTTATTTATCCTTTTT
    TCTTGGTATGAAATTGAAGCTGAGTACTTACCCCAATCACAACTGAAAAGTTGTAGATTT
    TAAACTAAAAGACTGTATTGGCTTCAAAGCCCATAATCTCTCCCATAACAATATGCTGCA
    TCTTGTCTATAAACTAATTAAATTTATCCTTCTGAGGATTTAGCTTAATCTCTATGTATA
    TAGTATGTATTTAGATAGACCTGGGAATTTCTCCGACATAAGCCCCTTAAGTCAAGACCA
    TGTTTCTTATTTGTTAAAGTATGAATAGATTTTCTGAATTTTTTTAAAGCTACAAGCTCT
    GGATTAGCAAGGTCTTACAGGTCAGAAAGATAACATTTCTGATCTTGGCAAGAGTAAGAT
    TTCAGGCACCATTAAGGCATTAGCAACAGCCCAGAAGCCCTCCTCTGCCTGCAGTATCAA
    TTGCATTATTAGCAGCATTACTACAGAAGCTGGGAGGCTGAAAACTGGATGAACCAAGGG
    CAATGATGGTTCAGCACTGTAAGGTTGAGAAAGTTTAATAAACAGAAAAAAAATCCCCAC
    TATTGACATTTTGCTTCTGATAACTATTCCCTCCCCTCCAAGTATAGCTCTGACAGTTAC
    CAGGCTTTTTTTTAAATGTGTAACTGCCTACCTGTATTTATAGAGTATATTAGTATTTAG
    CAGGGCTCCTAACAAATCCATATTAATTTAGTTCACTCAACTAAGAGTCAGGGGACATAA
    TTATGAACACAATGGTGGCATACCTTCAAATTGTTGGTAAAGAAGTAAAAGATGCTGTTT
    TTTCACATTTTTTAGAGTCCCTAATAACAAACTATTTGGGGTTTTTTTGTTCTTGTTTTT
    AAATTCAAGCACAGTCAATGTACACTATGGTTTCAGGTGTACAAGTATTGTTTGACATTT
    AAATTCATTAAAAAAATGATCAAAGTAAGTCTAGTAACCGTTTGTTACCAAACAAAGTTA
    TTACAGTATTACTGACTATATTCCTTACACTGTATGTATAAACCCCCAATGACTCATAAT
    TGGGAAAATATGGAATGCTTCACAAATTTGTGTACTATCCTTGCACAGGGGCCACATCAA
    TCTCTTCTGTATCATTCCAATTGTAGCACATGTGCTGCCAAAGTGAGCACTCTAACTATA
    GCTGGAAGTTAGGGGTGCTGACCCCTCACCCCAGTCAAAAGTCCAAGTATAACTTATCCA
    TAGTTCCACATTCACAGAGTCAACCAATCGCTGACTGTAATTCCACAGTACATATTTTAT
    ACTACTGTAATACATTATTTATTGAAACAATCAAAGTACAAGCAGACCAGCATAGTTCAA
    ACTCATGTTCTTCTATGGCCAATGGTATTTTTCTAACCAACTGAAGTGACTTTTTAACTA
    TAAATGGTCCAATTTTGGGCATGCTGAATGGGAGTAAAGAACCCATTTACAGAGCTTTAT
    GACCTAGTCTGTTCTTCAATTTTAAAAGCTCTTGGAAATCACTAAGCTATCAAGAAGCAA
    TAAAAAATAAAGAATGAGGGGCAGGGAACAACAAACCTCTTCCAGTTTATCAATGATCAT
    AGCAAAGGCTTTGAAGATGGCATTAGTGGGTCCAGCCAAAGTGATAATCCTTTCAGGACA
    ATTCCCTTCTGAGATGTTTATACGTGCACCACTCTGGATAGAAACAGAGCCAAAAGGTTA
    ATAGACAAAAAAACCTAAGCACAGACACACATACAAATATATAAATAATCCCTCTTAAAC
    AAGACGGGCAAGAAGCTGCCTATGTACATGATCCTGATAAAAGGCATCATCAACATCATT
    TAAAATACAGCTCCATATTTATCCTCCAACAAACTACAGAAGCTGCTTTAAGTCTTGAAC
    GCAGACTTCCTAAAACATTTCCCAGCATACTATTTGTCATACAAATAACCCAAGCCAGAA
    AAATACCAGCTTCTCCTATTAAGATCTTCCCTTTTCTGGGACACTTCCCTGGTGGTTCAG
    TGGTTAAGACTCCATGCTTCCACTGCCCAGGTTGGGGAACTAAGATCCCATGTTCCCACA
    GTGCACCCAAAATATTAAAAAAAAAAAAAAAAAAAAACAACTTCTCCCTACCAATCTCTC
    ATTAAACAGAAACTTTGCTTTTGTTTTAGAAGTGTCTAACTTTCCAAGTTAACAGTCCCT
    ATCCCTCCATCCTCAATAACTCACCTCCTCGCGCATCTTCTTAACTGATTCTCCTTTCTG
    AAAGAGGGAAAGTCAAAAATGAGTTCCAATTATTATTTAAACCCAGGCTGACTAAATAGT
    GATTAAGAGTTCAACTAATGCCTTACCTTTCCGATGATACTGCCAACTTCCTACAATGGA
    GAATACTAGTGTCAAATACAAGGAAACTTGAAGCATTGGTTATCACAAAAAGTAATTTTT
    AAAACAATCCAAAGCAATGGAAAAACTAGGATTCACTGGGAAGGGGAATGGAAAGGCTTA
    TTATTTTCTTTGCTCAAAACATAAGGGAAGCTATCAAAGAATTGTCAAGAAATAGTGGAG
    TAATAAAGTGCTGTCTTTAATATAAATCATTTTTAAAAATATCAAATACCAGTTCACCTT
    AAAGTTACCCAAAATTGTTTGGGCAGTGGAAATCTACCTATCACAAGCTATAACAACATC
    CATGAGAAAAATCTTATCATCCAGATAGTTTTGGCTCTAGCCTCTGGTTAAGCCTTTCGT
    GCTCAGTGGAGCTTCCTAGTAAAGTCTGTCCATTTTCCCACTACAGTCTCATCTAGTCCA
    CCAGAGGAGGAGTGCTAAAGCCAGACTTCCCACAGAGCTAATCTGATGAGCTCCAAGATC
    CATGCCCTCTCTTCCTGGCCCCTCCTCTAGCAGTTACTATGACCCAATTTCCCAAGCTGT
    TGCATACCTTTCCATGCATAAGTAGCCGGATGGTGAGAGTGACATTTAATCCACCTTCAA
    TCACTCCGGTGTCCATGTCGAGCAGTGTTCTGAGGAGCTGGACTTTGAGTGGTCAAGTCT
    TTGGTCACTGGTGGGGGTGAAAGCCAAAAACTGAAATGCAAACATGACCAAACTCAGAGT
    AGGAAACAATAGAAATATTTCCCAATGTTACTTTAGTCATTATCAATGTTTTCTAGCTGA
    CTCTCTTCCTTAAATGACAGTGTATCAGTCGGCAGGCTTGTATACATCCAGCACACTGCA
    TTAACAGCTCAGATTCTGTTACAAAGTATATAAAGCAACTTTCCAGTCTACTCTCATTAA
    TGATAGGGTTTCCGCTGTCTCTCAGGATGTGAAAGATTCCTTCAATTGGCAATGGTTCTA
    AGGGGAACTGCTGAACCTGCACCAGTTGTCCTTTAGCAGAATAAGCTTCCAAAGAATACA
    GAAACACCTGGAAATCATGAGGGCAAACCTCAAGCTAACTATACACTGAGGGCTATGACT
    TTGGCTTCCCTATTATGAAGCCATTCAGGTTGAAAGCTACAAAGGACACATGTAAATGTT
    TTTATTGGCAGCAATTTTTAGGTTAAAAAAATTAAGATGTATTTTCACCTTAAAATTCTG
    AATGTGAATACACGTCACCAAATAATTTTTAAAAGATTTAAATCAACAACCTCACTGAAA
    AAGGCATTATCTGAAATGCCTTCTATGCAAGCTGGGTTCCACATTCCCCTTTGGGAAATG
    ACTCAATACATTAAGTGCAGCCTGAGTCGTCTTTTCCCAACCAAACTTGATAGTCACTCA
    AGATCTGAATTTACCTGTTTCCGCATTAACCTGTTACTACTCAAGTCTTTATTCAAATTA
    TTTAATCTTAAATTTTCTTCCAAAGGAAACAATCAAAATGATAGCTGGTATCTCTAAAAA
    TTTATTGTGTGGTTGTTTTAAGCAACAGCAGTAATGAGACTTTCCTACTGAGGCCCTTTC
    CCCACGTAAGCACTAAAGCAATGTTACTAACGTTTCTGCTTTAACGGATAAAGAATGCTT
    TTGAATATACATCAGCCTACAATGCTTCCTTAGGACCTAACAGTGTTAGTTTTGAAAAGG
    ACCTCCCCAAAGTGGTGAAATTTCTCCAGTGACATTTTTAGTAGCCACCTCTTCAAAACA
    ACATCAATGATCTCTCCTACAGCCAGTCTTCCTCCTTCACAAAACCAGCCACCTTTTCAA
    CTTTGCTGTAACACAAAGCTGTATCCTAAACATCCTAAAAGTGTTAAGTGGCTGCCAGAC
    CATCCTGTATATAGCAAGCTGCCCCACGGCTACCTCAAGTATGTGATTACTTAAAAGAAG
    GGCTTTGCATCCCCAAAGAACCTCATTCCCCCATTTTCAGAATATAATTGCTGAAAAAAA
    TCCCAGAATTCAACACTTTTCACTCAAGGAAAAACAACTGTACTGGGGGTAGGGAGAAAA
    TAGTTGGGTCTGTAAGGACGACTTTAGTAAGACATGGAAAAAATTTGGCAGTTTAATCCC
    TTATCCTCACTCACAGAATAAGAAGCACTTTACAAATGCCATGACAAAAGCAAATTCGAA
    AAGGCAAGAAAATTGGAAACCTATTTGAAGCCATTTCTTGAGGTATGCCACAAAGGTGCA
    TTTCCAACACCAAACCACTCAAGAAATCCCTGTCATGTTTCTCTCCTGACTAGGAGAATG
    TAAGTGTACCCAACCCCAGGAATCCAGCTGTTAATCCTGCATCCATTACCCCTTCTCCTA
    TAAATAGCTCGGTCTGGGTCTATAGCGGCAGGCATTTTGTAATTTCAAAATTCGTTACCC
    CGAGAAAACGTCCTGAATAACTCGGTAACTGGGAGCCCGTCTACCCTTTCTTTAACTCCA
    CAGCCCCCCTCCCCTTGGCCATCCAAGCATTTTCCACTCAAAAAATGTACCCCCCCATCC
    CTCCACTAGAAAAAAAGGAAAAAATAAACGGTTCGGTCGGGGGGGGGCGCTGCGATCCAG
    GGGGAGGGGCCAGACTAGTAGCGCCTCCTCGAGTGGCTGCGCCAGCGCCTGACCCCCGCG
    GGGAGCCGCGCTCACCCGCCCGGCCAGCAAGCCGGCCATCGCCTCCCCCACCCCCCTCAA
    CCAACAAATCCACCGCCGCCCCCCCCCACCGCGCGCGCGCCCTCCTTGACTCCTGCGCAG
    CCGCCACCGGGAAAAGGCGATTGGAAGGGGAGGAGGCCTTGCGCGAGGCCTACTAGGCCT
    CATCGGAGCCAGGGCCTCCCGTGAGCGGCCGGCCAGGCAGAGGGAGGGGGCGGGCGCTCA
    CGCGGGACTACGCGAGGCCGGGGCTCGTGACGAGCACCGGAGCCCGCTAAGAAAGATGGC
    AAGAGTAGAAACAGAGCCATAGCCGCGGGCGGAGGGAAAAATTTGAAGCTTACCTGCAGG
    CGCAAGGCGACTGAGGGACAAAGGGGAGCGGGCGGGAAGGGGAAGTGCGGGAGGCCGGGG
    CGGAACAATAGGGGTGGGCGG
    
    Compare the subset range of PCBP2_HUMAN 267-279 to the cow genomic region using "Compare sequences" with TFASTX [pgm] using the "MD20" and "BLASTP62" matrices. How long are the alignments? The percent identity? What the bit scores? What are the E()-values of the two alignments?

Where to get the FASTA package: github.com/wrpearson/fasta36

The "normal" FASTA WWW site:

Contact Bill Pearson: wrp@virginia.edu