New: Annotation features available for Uniprot/SwissProt/PIR1 library searches.

Sequence Library Downloads

Downloading Sequence Libraries

Protein and DNA sequence library files can be downloaded from many different sources, including the NCBI and EMBL-EBI.

Library formats

The FASTA programs read many different library formats directly, so you do not need to run file conversion or formatting programs before searching sequence libraries with FASTA. By default, however, the FASTA programs assume that a library is in FASTA format; to search a library in another format, the format number must be given together with the file name, e.g.

fasta36 -q mgstm1.aa "/slib/ncbi/refseq_protein 12"

would search the NCBI refseq_protein library in NCBI/BLAST formatdb format.
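
When a search is launched from a script rather than typed at a shell, the library path and format number still have to reach the program as one argument. The following is a minimal sketch of that idea; the helper name `fasta_command` is hypothetical, and only the `-q` option and the "path format" pairing are taken from the example above.

```python
from typing import List


def fasta_command(query: str, library: str, fmt: int = 0,
                  program: str = "fasta36") -> List[str]:
    """Build an argument list for a FASTA search (hypothetical helper).

    A nonzero format number is appended to the library path, separated by
    a space, so that path and format travel together as a single argument,
    just as the quotes do on the command line.
    """
    if fmt < 0:
        raise ValueError("format number must be non-negative")
    # Format 0 is the default (FASTA format), so no suffix is needed.
    lib_arg = library if fmt == 0 else f"{library} {fmt}"
    return [program, "-q", query, lib_arg]


cmd = fasta_command("mgstm1.aa", "/slib/ncbi/refseq_protein", fmt=12)
print(cmd)
```

Passing a list like this to `subprocess.run` avoids shell quoting altogether, because each list element is delivered to the program as one argument.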

Commonly used supported library formats include:

Format       Description
0 (default)  FASTA format
1            GenBank flatfile
3            EMBL-EBI/SwissProt flatfile
5            GCG/PIR flatfile
6            GCG compressed binary
12           NCBI BLAST formatdb version 2 (current version)
16           MySQL SQL query
17           PostgreSQL SQL query
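
For scripts that keep library specifications as strings such as "/slib/ncbi/refseq_protein 12", it can help to split the path from the format number and look the number up. The sketch below is illustrative only: the function name `describe_library` is hypothetical, and the dictionary holds just the formats listed above, so a number missing from it is reported as unlisted rather than treated as invalid.

```python
from typing import Tuple

# Format numbers copied from the table above (not an exhaustive list).
LIBRARY_FORMATS = {
    0: "FASTA format",
    1: "GenBank flatfile",
    3: "EMBL-EBI/SwissProt flatfile",
    5: "GCG/PIR flatfile",
    6: "GCG compressed binary",
    12: "NCBI BLAST formatdb version 2",
    16: "MySQL SQL query",
    17: "PostgreSQL SQL query",
}


def describe_library(spec: str) -> Tuple[str, int, str]:
    """Split 'path [format]' into (path, format number, description)."""
    spec = spec.strip()
    path, sep, last = spec.rpartition(" ")
    if sep and last.isdigit():
        # A trailing integer is read as the format number.
        path, code = path.rstrip(), int(last)
    else:
        # No trailing number: the default FASTA format is assumed.
        path, code = spec, 0
    description = LIBRARY_FORMATS.get(code, "format not listed in the table")
    return path, code, description


print(describe_library("/slib/ncbi/refseq_protein 12"))
print(describe_library("swissprot.fa"))
```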

Protein and DNA sequence databases

Today, there is little reason to choose one sequence database provider over another, particularly for DNA sequence libraries, which are synchronized nightly between NCBI, EMBL-EBI, and the DDBJ. For protein sequence libraries, both NCBI and EMBL-EBI offer very comprehensive but very redundant collections of protein sequences (e.g. NCBI NR and EMBL-EBI/PIR UniProt), and both groups also offer much higher-quality curated databases (e.g. NCBI refseq_protein and EMBL-EBI SwissProt). Because sequence similarity searches are more sensitive when smaller databases are used, it makes the most sense to search a smaller, higher-quality database first, and then to search more comprehensive databases only if the initial search finds no significant similarities.
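
The "smaller database first" strategy can be sketched as a loop that stops at the first library yielding a significant hit. This is only an outline under stated assumptions: `tiered_search` and the `search` callable are hypothetical (the caller supplies a function that runs a search and returns `(subject id, E-value)` pairs), and the `max_evalue` cutoff of 0.001 is an illustrative choice, not a recommendation from this page.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Hit = Tuple[str, float]  # (subject id, E-value)


def tiered_search(
    query: str,
    libraries: Iterable[str],
    search: Callable[[str, str], List[Hit]],
    max_evalue: float = 0.001,  # assumed significance cutoff
) -> Tuple[Optional[str], List[Hit]]:
    """Search libraries in the order given; stop at the first with a hit.

    List the smaller, curated library first and the comprehensive
    library last, so the larger search runs only when needed.
    """
    for library in libraries:
        significant = [hit for hit in search(query, library)
                       if hit[1] <= max_evalue]
        if significant:
            # Best (lowest E-value) hits first.
            return library, sorted(significant, key=lambda hit: hit[1])
    return None, []
```

A call might pass `["swissprot", "uniprot"]` as `libraries`, with `search` wrapping whatever program and output parser are in use locally.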