SeqEvolver simulates the evolution of a collection of protein or DNA sequences, with user-defined substitution rates and evolution time (details below).
Enter sequence(s) in FASTA format ONLY:
Parameters
SeqEvolver works by creating a distance-specific mutation probability matrix for each evolutionary distance. It calculates the distance by multiplying the mutation rate for the sequence by the time factor. The numbers given in the rates field are paired with the sequences in the sequence field, in order. The rates and time can be in any units you like, but their product (the evolutionary distance) should be in PAM (Point Accepted Mutation) units (see Dayhoff, 1972). You must enter the same number of rates as sequences (unless you check the "same rates" checkbox); otherwise the program will squawk at you. The same time is applied to all sequences, so enter only one number in the time field.
You can use SeqEvolver's input parameters to your advantage when simulating sequence evolution. If you want to mutate a sequence a specific distance, say 40 PAMs, just enter a rate and time combination that multiplies to 40 (like 1 and 40). If you want to evolve a single sequence to several different distances, say 20, 40, 80 and 120, enter 1 for the time, then the distances for the rates, and then four copies of the sequence. If you want one sequence mutated to one PAM distance many times, just enter the number of copies in the 'Make X homologues for each sequence/rate pair' field.
Note: Residues that SeqEvolver does not recognize are ignored and never mutated. The web interface will silently ignore these residues, while the command-line version will produce a non-fatal error message. Be aware that SeqEvolver's PAM probability matrices currently do not include entries for ambiguous amino acids (e.g. B, Z) or ambiguous nucleotides (e.g. W, R, N).
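The pairing of rates with sequences and the rate-times-time calculation described above can be sketched in a few lines of Python. This is only an illustration, not SeqEvolver's own code: the function name `evolutionary_distances`, the `same_rates` flag, and the assumption that the "same rates" option takes a single rate for all sequences are all hypothetical.

```python
def evolutionary_distances(sequences, rates, time, same_rates=False):
    """Pair each sequence with a rate and return (sequence, distance) tuples.

    The distance is rate * time and is assumed to come out in PAM units.
    """
    if same_rates:
        # Assumption: the "same rates" option means one rate for every sequence.
        if len(rates) != 1:
            raise ValueError("same_rates expects exactly one rate")
        rates = list(rates) * len(sequences)
    elif len(rates) != len(sequences):
        # Mirrors the rule that the number of rates must match the sequences.
        raise ValueError(
            f"got {len(rates)} rates for {len(sequences)} sequences"
        )
    # Rates are matched to sequences in order; one time applies to all.
    return [(seq, rate * time) for seq, rate in zip(sequences, rates)]


# One sequence evolved to four distances: time = 1, the rates are the distances.
seq = "MKVLAAGIVGLLLA"  # made-up sequence for illustration
pairs = evolutionary_distances([seq] * 4, [20, 40, 80, 120], time=1)
distances = [d for _, d in pairs]
```

Here `distances` holds the four PAM distances in the order the rates were given, one per copy of the sequence.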
Protein Mutation Model
Mutation of protein sequences is based on the model of Jones (Jones et al., 1992; see also Dayhoff, 1972). Optionally, SeqEvolver will create insertions and deletions (indels) in protein sequences. SeqEvolver uses the Benner indel model (Benner et al., 1993), in which the occurrence of indels, but not their average length, increases with PAM distance. Indels appear in the mutated sequences as lower-case amino acids for insertions and dashes (-) for deleted amino acids.
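If you post-process the output, the indel notation described above (lower-case letters for insertions, dashes for deletions) is easy to read back. The following Python sketch is a hypothetical helper, not part of SeqEvolver, and the example sequence is made up.

```python
def summarize_indels(mutated):
    """Count insertions and deletions in an indel-marked protein sequence."""
    # Inserted residues are written in lower case.
    insertions = sum(1 for ch in mutated if ch.islower())
    # Each dash marks one deleted residue.
    deletions = mutated.count("-")
    # Dropping the dashes and upper-casing the insertions gives the
    # plain evolved sequence without any indel markup.
    plain = mutated.replace("-", "").upper()
    return insertions, deletions, plain


example = "MKV--AAGivgGLLA"  # made-up sequence for illustration
insertions, deletions, plain = summarize_indels(example)
```

For this example the helper finds three inserted residues and two deleted ones, and `plain` is the sequence with the markup removed.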
DNA Mutation Model
Two models of mutation of DNA sequences are available: a uniform model, in which transitions (A<->G, C<->T) and transversions (A<->C, A<->T, G<->C, G<->T) occur with equal probability, and the more realistic biased model, in which transitions occur about three times as often as transversions (States et al., 1991). Currently, no model of indels is implemented for DNA.
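The difference between the two DNA models can be illustrated with a small Python sketch. It rests on an assumption that the text does not settle: `transition_bias` is taken to be the ratio between the probability of the transition and the probability of each individual transversion, so that 1 gives the uniform model and 3 gives one reading of the biased model. The names here are hypothetical, and SeqEvolver's actual matrices may be parameterized differently.

```python
from fractions import Fraction

BASES = "ACGT"
# Transitions stay within purines (A, G) or within pyrimidines (C, T).
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}


def substitution_probabilities(base, transition_bias=1):
    """Return {new_base: probability}, given that `base` is substituted.

    Assumption: transition_bias is the weight of the single transition
    relative to each of the two transversions.
    """
    weights = {}
    for other in BASES:
        if other == base:
            continue  # a substitution always changes the base
        is_transition = TRANSITION_PARTNER[base] == other
        weights[other] = Fraction(transition_bias) if is_transition else Fraction(1)
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


uniform = substitution_probabilities("A", transition_bias=1)
biased = substitution_probabilities("A", transition_bias=3)
```

Under this reading, the uniform model gives each of the three possible replacements a probability of 1/3, while the biased model gives the transition 3/5 and each transversion 1/5.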
Download/Feedback
If you need to evolve a lot of sequences, you should use the regular Perl command-line version of SeqEvolver. Here, "a lot" means more than about 1 megabyte of total data ($CGI::POST_MAX = 1024*1000 bytes). If you plan on evolving many sequences often, please consider using the command-line version of SeqEvolver on your local machine. You can obtain the command-line version of SeqEvolver here. (You may also find this man page for the command-line version of SeqEvolver useful.) Email me at jtr4v@virginia.edu with comments, praise, complaints, questions, or to obtain a copy of the CGI script to run this web interface on your server.
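To judge in advance whether an input is likely to fit under the web interface's size limit quoted above, a rough check like the following Python sketch may help. The function name is hypothetical, and the check is approximate: it measures only the FASTA text, whereas the submitted form also carries the other fields and some encoding overhead.

```python
POST_MAX = 1024 * 1000  # bytes, the $CGI::POST_MAX value quoted above


def fits_web_interface(fasta_text):
    """Return True if the FASTA text alone is within the POST_MAX limit."""
    # Size is measured in bytes, not characters.
    size = len(fasta_text.encode("utf-8"))
    return size <= POST_MAX


small_input = ">seq1\n" + "ACGT" * 250
large_input = ">seq1\n" + "A" * POST_MAX
small_ok = fits_web_interface(small_input)
large_ok = fits_web_interface(large_input)
```

Inputs that fail this check, or come close to the limit, are better handled with the command-line version.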
Benner, S., Cohen, M. and Gonnet, G. (1993) Empirical and Structural Models for Insertions and Deletions in the Divergent Evolution of Proteins. J. Mol. Biol., 229, 1065-1082.
Dayhoff, M.O., Eck, R.V. and Park, C.M. (1972) A Model of Evolutionary Changes in Proteins. Atlas of Protein Sequence and Structure, 5, 89-99.
Jones, D.T., Taylor, W.R. and Thornton, J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Bioinformatics, 8(3), 275-282.
Reese, J.T.*, Wood, T.C.* and Pearson, W.R. (2004) SeqEvolver - Simulating The Evolution of Biological Sequences. In preparation.
States, D., Gish, W. and Altschul, S. (1991) Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods: A Companion to Methods in Enzymology, 3(1), 66-70.
Wood, T.C. and Pearson, W.R. (1998) Estimating the Extent of Gene Transfer by Simulating Proteome Evolution. Microbial & Comparative Genomics, 3, C-37 (abstract).