FASTA Sequence Comparison at the U. of Virginia

New: Annotation features are available for SwissProt/PIR1 library searches.

UVa FASTA Server

To search large sequence databases (particularly DNA databases), please use the FASTA WWW service at: EMBL-EBI.

If you are interested in using the FASTA WWW service for teaching a class, please email me (wrp@virginia.edu) and I can make arrangements for you to use a Beowulf cluster of FASTA servers.

Program Description
FASTA Compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.
SSEARCH Performs a rigorous Smith-Waterman alignment between a protein sequence and another protein sequence or a protein database.
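As a rough illustration of the Smith-Waterman local alignment that SSEARCH is based on, the sketch below computes only the best local score for two sequences. It is not SSEARCH's implementation: the `match`, `mismatch` and `gap` values are hypothetical, a simple linear gap penalty stands in for a scoring matrix with affine gap penalties, and no alignment is reported.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (illustrative scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)   # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Keeping only two rows of the matrix is enough when just the score is needed; recovering the alignment itself would require storing traceback information.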
GGSEARCH/GLSEARCH Performs a rigorous Global/Global (GGSEARCH) or Global/Local (GLSEARCH) alignment between a protein sequence and another protein sequence or a protein database. GGSEARCH implements a Needleman-Wunsch-like alignment, except that affine gap penalties are used. GLSEARCH is most appropriate for searches with a domain, where the domain is aligned globally (end to end) but matches a local region within each protein.
FASTX/FASTY Compares a DNA sequence to a protein sequence database, translating the DNA sequence in three forward (or reverse) frames and allowing frameshifts.
PRSS/PRFX Evaluates the significance of pairwise similarity scores using a Monte Carlo analysis. The Smith-Waterman local similarity score for the two sequences is calculated, and the statistical parameters Lambda and K are then estimated from the random scores obtained by aligning the first sequence to 200 to 1000 shuffled copies of the second sequence. PRFX does a similar shuffle, but compares a translated DNA sequence to a protein sequence using the FASTX algorithm.

PRSS/PRFX can either shuffle the second sequence in its entirety (uniform shuffling) or shuffle the second sequence in 20-residue segments (window shuffle). This latter strategy preserves local amino acid composition biases, e.g. in transmembrane segments.
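The two shuffling strategies can be sketched as follows. This is a minimal illustration, not the PRSS code: the function names are hypothetical, and the assumption that windows are consecutive, non-overlapping 20-residue blocks starting at the first residue is mine.

```python
import random


def uniform_shuffle(seq, rng):
    """Shuffle all residues of seq together (uniform shuffling)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def window_shuffle(seq, rng, window=20):
    """Shuffle residues only within consecutive blocks of `window` residues."""
    pieces = []
    for start in range(0, len(seq), window):
        segment = list(seq[start:start + window])
        rng.shuffle(segment)          # residues never leave their own block
        pieces.append("".join(segment))
    return "".join(pieces)


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed so runs are repeatable
    seq = "LLLLLLLLLLVVVVVVVVVV" + "DDDDDEEEEEKKKKKRRRRR"
    print(uniform_shuffle(seq, rng))  # hydrophobic and charged residues mix
    print(window_shuffle(seq, rng))   # each 20-residue block keeps its composition
```

Both functions preserve the overall composition of the sequence; only the window shuffle also preserves the composition of each segment, which is why a hydrophobic stretch stays hydrophobic in the shuffled copies.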
TFASTX/TFASTY Compares a protein sequence to a DNA sequence or DNA sequence library. The DNA sequence is translated in three forward and three reverse frames, and the protein query sequence is compared to each of the six derived protein sequences. The DNA sequence is translated from one end to the other; no attempt is made to edit out intervening sequences.
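The six-frame translation described above can be sketched as below. This is an illustration only: it uses the standard genetic code, the function names and the "+1".."-3" frame labels are hypothetical, codons with ambiguous bases are written as X (my assumption), and the frameshift handling that TFASTX/TFASTY perform during alignment is not modelled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string (unknown bases are left unchanged)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base to the end; stops appear as '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")   # 'X' for codons not in the table
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return the three forward and three reverse translations, keyed by frame."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCAAATAGGC").items():
        print(frame, protein)
```

Because translation runs from one end to the other, stop codons and any intervening sequences simply appear in the derived protein sequences.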
FASTS/TFASTS Compares a set of short peptide fragments, as would be obtained from mass spectrometry analysis of a protein, against a protein (fasts) or DNA (tfasts) database. A different format is required to specify the ordered peptide mixture:
>mgstm1
MILG,MLLEYTD,MGDAP
indicates three peptide fragments were found: MILG, MLLEYTD, and MGDAP. The commas (,) are required to indicate the number of fragments in the mixture, but there should be no comma after the last residue.
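A small checker for this format might look like the following sketch. The function name is hypothetical, and the handling of peptides spread over several lines (they are simply joined) is an assumption; only the rules stated above, a ">" header line, comma-separated fragments and no trailing comma, come from the format description.

```python
def parse_peptide_mixture(text):
    """Parse a '>name' header followed by comma-separated peptide fragments."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    name = lines[0][1:].strip()
    body = "".join(lines[1:])          # assumption: continuation lines are joined
    if not body:
        raise ValueError("no peptide fragments found")
    if body.endswith(","):
        raise ValueError("there must be no comma after the last residue")
    peptides = body.split(",")
    if any(not peptide for peptide in peptides):
        raise ValueError("empty peptide fragment between commas")
    return name, peptides


if __name__ == "__main__":
    name, peptides = parse_peptide_mixture(">mgstm1\nMILG,MLLEYTD,MGDAP")
    print(name, len(peptides), peptides)
```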
LALIGN/PLALIGN Compares two protein sequences to identify regions of internal sequence similarity. While FASTA reports a single alignment between two sequences, LALIGN will report several sequence alignments if there are several similar regions. LALIGN reports sequence alignments and similarity scores. PLALIGN plots out a graph of the sequence alignments, which looks much like a "dot-plot".
GREASE Kyte-Doolittle hydropathy plot. A modern transmembrane prediction program is available from tmpred.
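The values behind a Kyte-Doolittle hydropathy plot can be sketched as a sliding-window average of per-residue hydropathy values. The scale below is the published Kyte-Doolittle scale; the function name and the default window of 9 residues are assumptions for illustration and are not taken from GREASE.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each window of `window` residues along seq.

    Returns one value per window position; an empty list if seq is
    shorter than the window. Non-standard residues raise KeyError.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    profile = hydropathy_profile("MKTLLLVAVLLLAGVILSDEEKRRDDK")
    for position, value in enumerate(profile, start=1):
        print(position, round(value, 2))
```

Positive averages mark hydrophobic stretches and negative averages mark hydrophilic ones; plotting the values against window position gives the hydropathy plot.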
GARNIER/CHOFAS Protein secondary structure prediction (provided for teaching purposes only). Much better prediction algorithms are available from psipred and PredictProtein.

Other sequence analysis tools

Lowercase low-complexity regions with Pseg

Generate a random protein sequence at Expasy: randseq