Multiple Sequence Alignment -- Online Resources
Randall F Smith, GlaxoSmithKline Pharmaceuticals
CSHL Computational Genomics Course
¨ PBIL's Tools for Multiple Alignments: http://pbil.univ-lyon1.fr/alignment.html
An extensive list of multi-alignment resources, including lists of multiple alignment servers,
software, alignment editors, etc. Appears to be updated more regularly than the VSNS page below.
¨ Multiple Alignment Resource WWW Page (VSNS BioComputing Division):
http://www.techfak.uni-bielefeld.de/bcd/Curric/MulAli/welcome.html
Another extensive list of multiple alignment resources, including on-line tutorials.
¨ UBiC Bioinformatics Links Directory: Multiple Sequence Alignments:
http://bioinformatics.ca/links_directory/?subcategory_id=120
A general list of online resources for bioinformatics, with a section covering multiple alignments.
MULTI-PROGRAM SERVERS
¨ MPI Bioinformatics Toolkit: http://toolkit.tuebingen.mpg.de/sections/alignment
Provides web-based access to a number of different multiple alignment programs,
including ClustalW, Kalign, MAFFT (v. FFT-NS-2), Muscle, ProbCons, and T-Coffee.
¨ MIGenAs (Max-Planck Integrated Gene Analysis System): http://www.migenas.org/home/index.jsp
Web-based access to Muscle, TCoffee, Dialign2, POA, and PCMA. Note: to use, click on the
“START TOOLKIT” link and log in to the system as “Guest” (no password needed); use the
“Update” button to display results.
¨ BCM Search Launcher: Multiple Sequence Alignments:
http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html
Web-based access to ClustalW 1.8, MAP, PIMA 1.4, MSA 2.1, and BLOCK MAKER.
WEB SER
¨ MSA: Lipman DJ, Altschul SF, & Kececioglu JD (1989) PNAS 86:4412-4415.
¨ PIMA: Smith RF & Smith TF (1992) Protein Engng 5:35-41.
(Available via the BCM Search Launcher, above)
¨ Clustal-W: Thompson JD, Higgins DG, & Gibson TJ (1994) Nucleic Acids Res. 22:4673-4680.
(ClustalW WWW Server at EBI: http://www.ebi.ac.uk/clustalw;
Clustal-W, Clustal-X software packages (most platforms): ftp://ftp-igbmc.u-strasbg.fr/pub)
¨ MAP: Huang X (1994) CABIOS 10:227-235. (Available via the BCM Search Launcher, above)
¨ Block Maker: Henikoff S, Henikoff JG, Alford WA, Pietrokovski S (1995)
Gene-COMBIS, Gene 163, GC 17-26. (http://blocks.fhcrc.org/blocks)
¨ PRRP/PRRN: Gotoh O (1996) Significant improvements in accuracy
of multiple protein sequence alignments by iterative refinements as assessed by
reference to structural alignments. J. Mol. Biol. 264:823-838. (http://prrn.ims.u-tokyo.ac.jp/)
¨ DCA: Stoye J (1998) Multiple sequence alignment with the Divide-and-Conquer method.
Gene 211:GC45-56. (http://bibiserv.techfak.uni-bielefeld.de/dca/submission.html)
¨ ITERALIGN: Brocchieri L & Karlin S (1998) A symmetric-iterated multiple
alignment of protein sequences. J. Mol. Biol. 276:249-264 (Web site no longer
active).
¨ T-COFFEE, M-COFFEE, 3D-COFFEE (EXPRESSO):
Notredame C, Higgins D, Heringa J (2000) T-Coffee: A novel method for multiple sequence
alignments. J. Mol. Biol. 302:205-217.
Wallace IM, O’Sullivan O, Higgins D, Notredame C (2006) M-Coffee: combining multiple
sequence alignment methods with T-Coffee. Nucleic Acids Research 34:1692-1699.
Armougom F, Moretti S, Poirot O, Audic S, Dumas P, Schaeli B, Keduas V, Notredame C (2006)
Expresso: automatic incorporation of structural information in multiple sequence alignments
using 3D-Coffee. Nucleic Acids Research 34:W604. (http://www.tcoffee.org/)
¨ SAM-T99: Karplus K, Hu B (2001) Evaluation of protein
multiple alignments by SAM-T99 using BALIBASE multiple alignment test set.
Bioinformatics 17:713-720. (http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html).
¨ PCMA: Pei J, Sadreyev R, Grishin NV (2003) PCMA: fast and accurate multiple sequence
alignment based on profile consistency. Bioinformatics 19:427-428. (ftp://iole.swmed.edu/pub/PCMA/)
¨ ProAlign: Loytynoja A, Milinkovitch MC (2003) A hidden Markov model for progressive
multiple alignment. Bioinformatics 19:1505-1513. (http://ueg.ulb.ac.be/ProAlign/)
¨ MAVID: Bray N, Pachter L (2004) MAVID: Constrained ancestral alignment of multiple
sequences. Genome Research 14:693-699. (http://baboon.math.berkeley.edu/mavid)
¨ MUSCLE: Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Research 32:1792-97. (Home Page: http://www.drive5.com/muscle/;
MUSCLE Web Server at EBI: http://www.ebi.ac.uk/Tools/muscle/index.html;
use an alignment editor, e.g., Jalview, to view the alignment)
¨ Align-m: Van Walle I, Lasters I, Wyns L (2004) Align-m -- a new algorithm for
multiple alignment of highly divergent sequences. Bioinformatics 20:1428-1435.
(binaries: http://bioinformatics.vub.ac.be/software/software.html)
¨ POA: Grasso C, Lee C (2004) Combining partial order alignment and progressive sequence
alignment increases alignment speed and scalability to very large alignment problems.
Bioinformatics 20:1546-56. (Note: the "POA Online" server appears to be using the older
2002 version of POA: http://www.bioinformatics.ucla.edu/poa/)
¨ DIALIGN:
¨ PRALINE:
¨ MAFFT: Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy
of multiple sequence alignment; Katoh K, Toh H (2008) Recent developments in the MAFFT
multiple sequence alignment program. Brief Bioinform 9:286-98.
(binaries & source: http://align.bmr.kyushu-u.ac.jp/mafft/software/)
¨ ProbCons: Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005) ProbCons: Probabilistic
consistency-based multiple sequence alignment. Genome Res. 15:330-40. (http://probcons.stanford.edu)
¨ SPEM: Zhou H, Zhou Y (2005) SPEM: improving multiple sequence alignments with sequence
profiles and predicted secondary structures. Bioinformatics 21:3615-21.
(http://sparks.informatics.iupui.edu/index.php?pageLoc=Services; Note: alignments can take hours)
¨ Kalign(2): Lassmann T, Sonnhammer ELL (2005) Kalign – an accurate and fast multiple
alignment algorithm. BMC Bioinformatics 6:298. Lassmann T, Frings O, Sonnhammer EL (2009)
Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing
external features. (http://msa.sbc.su.se)
¨ PRANK: Loytynoja A, Goldman N (2005) An algorithm for progressive multiple alignment of
sequences with insertions. PNAS 102:10557-10562 (see also comments in Higgins et al 2005
PNAS 102:10411-10412); Loytynoja A, Goldman N (2008) Phylogeny-aware gap placement prevents
errors in sequence alignment and evolutionary analysis. Science 320:1632-5.
(Linux, Mac, Windows binaries and source code: http://www.ebi.ac.uk/goldman-srv/prank/prank/)
¨ PSAlign: Sze S-H, Lu Y, Yang Q (2006) A polynomial time solvable formulation of multiple
sequence alignment. Journal of Computational Biology 13:309-319.
(Source code: http://faculty.cs.tamu.edu/shsze/psalign/)
¨ Probalign: Roshan U,
¨ ProDA: Phuong TM, Do CB, Edgar RC, Batzoglou S (2006) Multiple alignment of protein
sequences with repeats and rearrangements. Nucleic Acids Research 34:5932.
(C++ source: http://proda.stanford.edu)
¨ COBALT: Papadopoulos JS, Agarwala R (2007) COBALT: constraint-based alignment tool for
multiple protein sequences. Bioinformatics 23:1073-9.
(Linux executables: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/cobalt; incorporates CDD, PROSITE
similarities/constraints)
¨ Opal: Wheeler TJ, Kececioglu JD (2007) Multiple alignment by aligning alignments.
Bioinformatics 23:559-68. (http://opal.cs.arizona.edu)
¨ RE-MuSIC: Chung YS, Lee WH, Tang CY, Lu CL (2007) RE-MuSIC: a tool for multiple sequence
alignment with regular expression constraints. Nucleic Acids Research 35 (Web Server
issue):W639-44. (http://140.113.239.131/RE-MUSIC)
¨ PROMALS(3D): Pei J, Grishin NV (2007) PROMALS: Towards accurate multiple sequence
alignments of distantly related proteins. Bioinformatics 23:802; Pei J, Kim BH, Grishin NV
(2008) PROMALS3D: a tool for multiple sequence and structure alignments.
(PROMALS: http://prodata.swmed.edu/promals;
PROMALS3D: http://prodata.swmed.edu/promals3d/promals3d.php; Warning: program can be slow)
¨ eProbalign: Chikkagoudar S, Roshan U, Livesay D (2007) eProbalign: generation and
manipulation of multiple sequence alignments using partition function posterior
probabilities. Nucleic Acids Res 35 (Web Server issue):W675-7. (http://probalign.njit.edu)
¨ GramAlign: Russell DJ, Otu HH, Sayood K (2008) Grammar-based distance in progressive
multiple sequence alignment. BMC Bioinformatics 9:306. (http://bioinfo.unl.edu/GramAlign.html)
¨ SeqAn: Rausch T, Emde AK, Weese D, Doring A, Notredame C, Reinert K (2008)
Bioinformatics 24:187-92. (Linux, Windows executables: http://www.seqan.de/projects/msa.html)
¨ FSA: Bradley RK, Roberts A, Smoot M, Juvekar S, Do J, Dewey C, Holmes I, Pachter L (2009)
Fast Statistical Alignment. PLoS Computational Biology 5:e1000392.
(http://orangutan.math.berkeley.edu/fsa/)
¨ NRAlign: Lu Y, Sze SH (2009) Improving accuracy of multiple sequence alignment algorithms
based on alignment of neighboring residues. Nucleic Acids Res 37:463-72.
(software: http://faculty.cs.tamu.edu/shsze/nralign)
EDITORS/VIEWERS/PRINTING UTILITIES
¨ Pfaat: Caffrey DR, Dana PH, Mathur V, Ocano M, Hong EJ, Wang YE, Somaroo S, Caffrey BE,
Potluri S, Huang ES (2007) PFAAT version 2.0: a tool for editing, annotating, and analyzing
multiple sequence alignments. BMC Bioinformatics 8:381. (http://pfaat.sourceforge.net)
¨ QAlign: Sammeth M, Rothganger J, Esser W, Albert J, Stoye J, Harmsen D (2003) QAlign:
quality-based multiple alignments with dynamic phylogenetic analysis. Bioinformatics
19:1592-1593. (http://gi.cebitec.uni-bielefeld.de/qalign; Note: This package provides a
graphical user interface for a number of multiple alignment programs, including CLUSTALW,
DCA, DIALIGN, and T-COFFEE)
¨ Jalview: Clamp M, Cuff J, Searle SM, Barton GJ (2004) The Jalview alignment editor.
Bioinformatics 20:426-7. Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ (2009)
[Jalview Version 2] Bioinformatics 25:1189-91. (http://www.jalview.org/index.html)
¨ JAE: Jemboss Alignment Editor: Carver TJ, Mullan LJ (2005) JAE: Jemboss Alignment Editor.
Appl. Bioinformatics 4:151-4. (http://emboss.sourceforge.net/Jemboss/)
¨ Also see the list on PBIL's Tools page, above.