Multiple Sequence Alignment -- Online Resources

Randall F Smith, GlaxoSmithKline Pharmaceuticals

CSHL Computational Genomics Course

 

GENERAL RESOURCES

¨       PBIL's Tools for Multiple Alignments:    http://pbil.univ-lyon1.fr/alignment.html

An extensive list of multi-alignment resources, including lists of multiple alignment servers, software, alignment editors, etc. Appears to be updated more regularly than the VSNS page below.

¨       Multiple Alignment Resource WWW Page (VSNS BioComputing Division):

http://www.techfak.uni-bielefeld.de/bcd/Curric/MulAli/welcome.html

Another extensive list of multiple alignment resources, including on-line tutorials.

¨       UBiC Bioinformatics Links Directory: Multiple Sequence Alignments

http://bioinformatics.ca/links_directory/?subcategory_id=120

A general list of online resources for bioinformatics, with a section covering multiple alignments.

MULTI-PROGRAM SERVERS

¨       MPI Bioinformatics Toolkit:     http://toolkit.tuebingen.mpg.de/sections/alignment
Provides web-based access to a number of different multiple alignment programs, including ClustalW, Kalign, MAFFT (v. FFT-NS-2), Muscle, ProbCons, and T-Coffee.

¨       MIGenAs  (Max-Planck Integrated Gene Analysis System):   http://www.migenas.org/home/index.jsp

Web-based access to Muscle, T-Coffee, Dialign2, POA, and PCMA.  Note: to use, click on the “START TOOLKIT” link and log in to the system as “Guest” (no password needed); use the “Update” button to display results.

¨       BCM Search Launcher: Multiple Sequence Alignments:   http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html

Web-based access to ClustalW 1.8, MAP, PIMA 1.4, MSA 2.1, and BLOCK MAKER.

 

WEB SERVERS AND SOFTWARE PACKAGES FOR INDIVIDUAL PROGRAMS

¨       MSA:                            Lipman DJ, Altschul SF, & Kececioglu JD (1989) PNAS 86:4412-4415. Gupta SK, Kececioglu JD, Schaffer AA (1995) J. Comput. Biol. 2:459-472. (http://www.psc.edu/general/software/packages/msa/manual/manual.html )

¨       PIMA:                          Smith RF & Smith TF (1992) Protein Engng 5:35-41.  (Available via the BCM Search Launcher, above)

¨       Clustal-W:                Thompson JD, Higgins DG, & Gibson TJ (1994) Nucleic Acids Res. 22:4673-4680.  (ClustalW WWW Server at EBI: http://www.ebi.ac.uk/clustalw ;
Clustal-W, Clustal-X software packages (most platforms): ftp://ftp-igbmc.u-strasbg.fr/pub )

¨       MAP:                            Huang X (1994) CABIOS 10:227-235.   (Available via the BCM Search Launcher, above)

¨       Block Maker:          Henikoff S, Henikoff JG, Alford WA, Pietrokovski S (1995) Gene-COMBIS, Gene 163, GC 17-26. (http://blocks.fhcrc.org/blocks)

¨       PRRP/PRRN:          Gotoh O (1996)  Significant improvements in accuracy of multiple protein sequence alignments by iterative refinements as assessed by reference to structural alignments. J. Mol. Biol. 264:823-838.   (http://prrn.ims.u-tokyo.ac.jp/)

¨       DCA:                            Stoye J (1998)  Multiple sequence alignment with the Divide-and-Conquer method. Gene 211:GC45-56 (http://bibiserv.techfak.uni-bielefeld.de/dca/submission.html)

¨       ITERALIGN:           Brocchieri L & Karlin S (1998)  A symmetric-iterated multiple alignment of protein sequences. J. Mol. Biol. 276:249-264 (Web site no longer active).

¨       T-COFFEE, M-COFFEE, 3D-COFFEE (EXPRESSO):

Notredame C, Higgins D, Heringa J (2000)  T-Coffee: A novel method for multiple sequence alignments. J. Mol. Biol. 302:205-217.
Wallace IM, O’Sullivan O, Higgins D, Notredame C (2006).  M-Coffee: combining multiple sequence alignment methods with T-Coffee.  Nucleic Acids Research 34:1692-1699. 
Armougom F, Moretti S, Poirot O, Audic S, Dumas P, Schaeli B, Keduas V, Notredame C (2006) Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee. Nucleic Acids Research  34:W604  (http://www.tcoffee.org/)

¨       SAM-T99:                  Karplus K, Hu B (2001)  Evaluation of protein multiple alignments by SAM-T99 using BALIBASE multiple alignment test set. Bioinformatics 17:713-720. (http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html).

¨       PCMA:                         Pei J, Sadreyev R, Grishin NV (2003)  PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics 19:427-428. (ftp://iole.swmed.edu/pub/PCMA/)

¨       ProAlign:                  Loytynoja A, Milinkovitch MC (2003)  A hidden Markov model for progressive multiple alignment. Bioinformatics 19:1505-1513.  (http://ueg.ulb.ac.be/ProAlign/)

¨       MAVID:                      Bray N, Pachter L (2004)  MAVID: Constrained ancestral alignment of multiple sequences. Genome Research 14:693-699. (http://baboon.math.berkeley.edu/mavid)

¨       MUSCLE:                   Edgar RC (2004)  MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research 32:1792-97. (Home Page: http://www.drive5.com/muscle/;  MUSCLE Web Server at EBI:  http://www.ebi.ac.uk/Tools/muscle/index.html ; Use an alignment editor, e.g., Jalview, to view alignment)

¨       Align-m:                    Van Walle I, Lasters I, Wyns L (2004)  Align-m -- a new algorithm for multiple alignment of highly divergent sequences. Bioinformatics 20:1428-1435. (binaries: http://bioinformatics.vub.ac.be/software/software.html)

¨       ABA:                            Raphael B, Zhi D, Tang H, Pevzner P (2004)  A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res. 14:2336-46. (Linux binary: http://nbcr.sdsc.edu/euler)

¨       POA:                            Grasso C, Lee C (2004)  Combining partial order alignment and progressive sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics 20:1546-56. (Note: the "POA Online" server appears to be using the older 2002 version of POA: http://www.bioinformatics.ucla.edu/poa/)

¨       DIALIGN:                 Subramanian AR, Weyer-Menkhoff J, Kaufmann M, Morgenstern B (2005)  DIALIGN-T: an improved algorithm for segment-based multiple sequence alignments. BMC Bioinformatics 6:66;   Subramanian AR, Kaufmann M, Morgenstern B (2008)  DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. (CHAOS + DIALIGN:  http://dialign.gobics.de/chaos-dialign-submission;  DIALIGN-TX:  http://dialign-tx.gobics.de/submission?type=protein)

¨       PRALINE:                 Simossis VA, Kleinjung J, Heringa J (2005)  Homology-extended sequence alignment.  Nucleic Acids Res. 33:816-24.  Simossis VA, Heringa J (2005) Nucleic Acids Research 33:W289-W294. (http://ibivu.cs.vu.nl/programs/pralinewww/)

¨       MAFFT:                      Katoh K, Kuma K, Toh H, Miyata T (2005)  MAFFT version 5: improvement in accuracy of multiple sequence alignment;   Katoh K, Toh H (2008)  Recent developments in the MAFFT multiple sequence alignment program.  Brief Bioinform 9:286-98.  (binaries & source:  http://align.bmr.kyushu-u.ac.jp/mafft/software/ )

¨       ProbCons:                 Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005)  ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15:330-40. (http://probcons.stanford.edu )

¨       SPEM:                         Zhou H, Zhou Y (2005)  SPEM: improving multiple sequence alignments with sequence profiles and predicted secondary structures. Bioinformatics 21:3615-21. (http://sparks.informatics.iupui.edu/index.php?pageLoc=Services; Note: alignments can take hours).

¨       Kalign(2):                Lassmann T, Sonnhammer ELL (2005)  Kalign – an accurate and fast multiple alignment algorithm.  BMC Bioinformatics 6:298.   Lassmann T, Frings O, Sonnhammer EL (2009)  Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. (http://msa.sbc.su.se)

¨       PRANK:                     Loytynoja A, Goldman N (2005)  An algorithm for progressive multiple alignment of sequences with insertions.  PNAS 102:10557-10562 (See also comments in Higgins et al. 2005, PNAS 102:10411-10412);   Loytynoja A, Goldman N (2008) Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis.   Science  320:1632-5. (Linux, Mac, Windows binaries and source code:   http://www.ebi.ac.uk/goldman-srv/prank/prank/ )

¨       PSAlign:                    Sze S-H, Lu Y, Yang Q (2006) A polynomial time solvable formulation of multiple sequence alignment.  Journal of Computational Biology 13:309-319. (Source code: http://faculty.cs.tamu.edu/shsze/psalign/)

¨       Probalign:                Roshan U, Livesay DR (2006)  Probalign: Multiple sequence alignment using partition function posterior probabilities.  Bioinformatics 5-Sept-2006 Advance Access. (C++/C source: http://www.cs.njit.edu/usman/probalign/)

¨       ProDA:                       Phuong TM, Do CB, Edgar RC, Batzoglou S (2006)  Multiple alignment of protein sequences with repeats and rearrangements. Nucleic Acids Research 34:5932.  (C++ source:  http://proda.stanford.edu)

¨       COBALT                     COBALT: constraint-based alignment tool for multiple protein sequences (2007)   Bioinformatics 23:1073-9.  (Linux executables:  ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/cobalt;  incorporates CDD, PROSITE similarities/constraints)

¨       Opal                             Wheeler TJ, Kececioglu JD  (2007)   Multiple alignment by aligning alignments.   Bioinformatics 23:559-68. (http://opal.cs.arizona.edu )

¨       RE-MuSIC                 Chung YS, Lee WH, Tang CY, Lu CL (2007)   RE-MuSIC: a tool for multiple sequence alignment with regular expression constraints.  Nucleic Acids Research 35 (Web Server issue):W639-44.  (http://140.113.239.131/RE-MUSIC )

¨       PROMALS(3D):     Pei J, Grishin NV (2007)   PROMALS: Towards accurate multiple sequence alignments of distantly related proteins.  Bioinformatics 23:802;   Pei J, Kim BH, Grishin NV (2008) PROMALS3D:  a tool for multiple sequence and structure alignments   (PROMALS:  http://prodata.swmed.edu/promals;
PROMALS3D:  http://prodata.swmed.edu/promals3d/promals3d.php ;  Warning:  program can be slow)

¨       eProbalign               Chikkagoudar S, Roshan U, Livesay D  (2007)  eProbalign: generation and manipulation of multiple sequence alignments using partition function posterior probabilities.   Nucleic Acids Res 35 (Web Server issue):W675-7.  (http://probalign.njit.edu )

¨       GramAlign               Russell DJ, Otu HH, Sayood K  (2008)   Grammar-based distance in progressive multiple sequence alignment.  BMC Bioinformatics 9:306.  (http://bioinfo.unl.edu/GramAlign.html)

¨       SeqAn                          Rausch T, Emde AK, Weese D, Doring A, Notredame C, Reinert K  (2008)   Bioinformatics 24:187-92.  (Linux, Windows executables:  http://www.seqan.de/projects/msa.html )

¨       FSA                               Bradley RK, Roberts A, Smoot M, Juvekar S, Do J, Dewey C, Holmes I, Pachter L (2009) Fast Statistical Alignment. PLoS Computational Biology. 5:e1000392.  (http://orangutan.math.berkeley.edu/fsa/ )

¨       NRAlign                     Lu Y, Sze SH (2009)  Improving accuracy of multiple sequence alignment algorithms based on alignment of neighboring residues.  Nucleic Acids Res 37:463-72.  (software: http://faculty.cs.tamu.edu/shsze/nralign )

 

EDITORS/VIEWERS/PRINTING UTILITIES

¨       Pfaat:                           Caffrey DR, Dana PH, Mathur V, Ocano M, Hong EJ, Wang YE, Somaroo S, Caffrey BE, Potluri S, Huang ES (2007)   PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments.  BMC Bioinformatics 8:381.  (http://pfaat.sourceforge.net)

¨       QAlign:                      Sammeth M, Rothganger J, Esser W, Albert J, Stoye J, Harmsen D (2003)  QAlign: quality-based multiple alignments with dynamic phylogenetic analysis. Bioinformatics 19:1592-1593.   (http://gi.cebitec.uni-bielefeld.de/qalign ; Note: This package provides a graphical user interface for a number of multiple alignment programs, including CLUSTALW, DCA, DIALIGN, and T-COFFEE).

¨       Jalview:                     Clamp M, Cuff J, Searle SM, Barton GJ (2004)  The Jalview alignment editor. Bioinformatics 20:426-7.  Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ (2009) [Jalview Version 2]  Bioinformatics 25:1189-91.  (http://www.jalview.org/index.html).

¨       JAE:                             Jemboss Alignment Editor: Carver TJ, Mullan LJ (2005)  JAE: Jemboss Alignment Editor. Appl. Bioinformatics 4:151-4. (http://emboss.sourceforge.net/Jemboss/).

¨       Also see the list on PBIL's Tools for Multiple Alignments page, above.