FASTA Sequence Comparison at the U. of Virginia


The FASTA web interface has been simplified, with new WWW pages. The same programs and databases are available. If you find problems with the new arrangement, please send email to wrp@virginia.edu.


This FASTA server has been overstressed recently by users trying to search too many databases at once. As a result, many searches are being dropped.

To search large sequence databases, please use the FASTA WWW service at http://www.ebi.ac.uk/fasta33/ instead.

If you are interested in using the FASTA WWW service for teaching a class, please email me (wrp@virginia.edu) and I can make arrangements for you to use a Beowulf cluster of FASTA servers.


Program Description
FASTA Compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.
SSEARCH Performs a rigorous Smith-Waterman alignment between a protein sequence and another protein sequence or a protein database, or between a DNA sequence and another DNA sequence or a DNA library (very slow).
FASTX/FASTY Compares a DNA sequence to a protein sequence database, translating the DNA sequence in three forward (or reverse) frames and allowing frameshifts.
TFASTX/TFASTY Compares a protein sequence to a DNA sequence or DNA sequence library. The DNA sequence is translated in three forward and three reverse frames, and the protein query sequence is compared to each of the six derived protein sequences. The DNA sequence is translated from one end to the other; no attempt is made to edit out intervening sequences. Termination codons are translated into unknown ('X') amino acids.
FASTS/TFASTS Compares a set of short peptide fragments, as would be obtained from mass spectrometry analysis of a protein, against a protein (fasts) or DNA (tfasts) database. A different format is required to specify the ordered peptide mixture:
>mgstm1
MILG,MLLEYTD,MGDAP
indicates three peptide fragments were found: MILG, MLLEYTD, and MGDAP. The commas (,) are required to indicate the number of fragments in the mixture, but there should be no comma after the last residue.
LALIGN/PLALIGN Compares two protein sequences to identify regions of sequence similarity. While FASTA and TFASTA report a single alignment between two sequences, LALIGN will report several sequence alignments if there are several similar regions. LALIGN can identify similarities due to internal repeats or similar regions that cannot be aligned by FASTA because of gaps. LALIGN reports sequence alignments and similarity scores. PLALIGN plots out a graph of the sequence alignments, which looks much like a "dot-plot".
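
The TFASTX/TFASTY entry above describes translating a DNA sequence in three forward and three reverse frames, end to end, with termination codons turned into 'X'. A minimal Python sketch of that translation step follows. It is not the FASTA package's code: the function names are hypothetical, the standard genetic code is assumed, and mapping ambiguous or unrecognized codons to 'X' is an assumption of this sketch. Frameshift handling and the comparison itself are not shown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (assumed), codons enumerated in TCAG order; '*' is a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from one end to the other; no introns are edited out.

    Termination codons become 'X'. Codons containing anything other
    than A, C, G, T also become 'X' (an assumption of this sketch).
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append("X" if aa == "*" else aa)
    return "".join(protein)


def six_frames(dna):
    """Return the six translations keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

For example, six_frames("ATGGCCTAA")["+1"] is "MAX": the final TAA stop codon is reported as 'X' rather than ending the translation.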

PRSS/PRFX Evaluates the significance of pairwise similarity scores using a Monte Carlo analysis. Similarity scores for the two sequences are calculated, and then the second sequence is shuffled 200 to 1000 times and compared with the first sequence. PRSS can use one of two shuffling strategies. One strategy simply keeps the amino acid composition of the entire shuffled sequence identical to the unshuffled sequence. The second, local shuffle, destroys the order but preserves the composition of small (10-25 residue) segments of the shuffled sequence. PRFX does a similar shuffle, but compares a translated DNA sequence to a protein sequence using the FASTX algorithm.
GREASE Kyte-Doolittle hydropathy plot. A modern transmembrane prediction program is available from tmpred.
GARNIER/CHOFAS Protein secondary structure prediction
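
The two shuffling strategies described for PRSS above can be sketched in Python as follows. This is an illustration under stated assumptions, not PRSS itself: the function names are hypothetical, toy_local_score is a stand-in local alignment score with simple match/mismatch/gap values rather than a real scoring matrix, and the sketch reports only the fraction of shuffles that score at least as well as the unshuffled pair, which may differ from the statistical estimate PRSS actually computes.

```python
import random


def uniform_shuffle(seq, rng):
    """Shuffle the whole sequence; overall composition is preserved."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def local_shuffle(seq, window, rng):
    """Shuffle within consecutive segments of `window` residues, so each
    segment keeps its composition but loses its order."""
    return "".join(
        uniform_shuffle(seq[i:i + window], rng)
        for i in range(0, len(seq), window)
    )


def toy_local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Stand-in local alignment score (hypothetical scoring values)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def shuffle_test(seq1, seq2, shuffles=200, window=None, seed=0):
    """Score seq1 against seq2, then against shuffled copies of seq2.

    Returns (observed score, fraction of shuffles scoring >= observed).
    With window=None the whole sequence is shuffled; otherwise the
    local shuffle is used with segments of `window` residues.
    """
    rng = random.Random(seed)
    observed = toy_local_score(seq1, seq2)
    hits = 0
    for _ in range(shuffles):
        if window:
            shuffled = local_shuffle(seq2, window, rng)
        else:
            shuffled = uniform_shuffle(seq2, rng)
        if toy_local_score(seq1, shuffled) >= observed:
            hits += 1
    return observed, hits / shuffles
```

A call such as shuffle_test(query, subject, shuffles=200, window=10) would use the local shuffle with 10-residue segments; a small fraction suggests the observed score is unlikely to arise from composition alone.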


Download the FASTA package from ftp.virginia.edu


Other sequence analysis tools

Lowercase low-complexity regions with Pseg

Generate a random protein sequence at Expasy: randseq


NCBI BLAST program versions


BLAST at the NCBI