Clean FASTA file (for MEME)

This page cleans up sequence downloads from the UCSC Genome Browser so that they can be used as input to the MEME suite of programs. MEME requires each uploaded sequence to have a distinct sequence identifier, and the format used by UCSC does not always guarantee that the text between the ">" and the first space is unique.

For example, a UCSC list of sequences can contain a header line like:

>mm9_ct_UserTrack_3545_0 range=chr2:67108861-67108870 5'pad=0 3'pad=0 strand=+ repeatMasking=lower

By default, the clean_fasta script (this web page) replaces the spaces in such lines with underscores, so that the whole line becomes the identifier:

>mm9_ct_UserTrack_3545_0_range=chr2:67108861-67108870_5'pad=0_3'pad=0_strand=+_repeatMasking=lower
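The default conversion can be sketched in a few lines of Python. This is only an illustration of the behavior shown in the example above, not the code of clean_fasta itself; the function names are hypothetical, and treating runs of whitespace as a single separator is an assumption.

```python
def clean_header(line: str) -> str:
    """Join the whitespace-separated fields of a FASTA header with underscores.

    Lines that do not start with ">" (sequence lines) are returned unchanged.
    Assumption: consecutive whitespace characters count as one separator.
    """
    if not line.startswith(">"):
        return line
    return ">" + "_".join(line[1:].split())


def clean_fasta(text: str) -> str:
    """Apply clean_header to every line of a FASTA-formatted string."""
    return "\n".join(clean_header(line) for line in text.splitlines())


example = (
    ">mm9_ct_UserTrack_3545_0 range=chr2:67108861-67108870 "
    "5'pad=0 3'pad=0 strand=+ repeatMasking=lower"
)
print(clean_header(example))
```

Because the full description becomes part of the identifier, two records are told apart as long as any of their fields differ, for instance the range.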

If the "Extract CHR:coordinates from UCSC" option is checked, the output header lines instead look like:

>chr2:67108861-67108870
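A minimal Python sketch of the coordinate extraction follows, assuming the coordinates are taken from the range= field of the UCSC header. The function name and the choice to leave headers without a range= field unchanged are assumptions made for illustration, not the documented behavior of clean_fasta.

```python
import re

# Matches the value of the "range=" field in a UCSC header line.
RANGE_RE = re.compile(r"range=(\S+)")


def extract_coordinates(line: str) -> str:
    """Reduce a UCSC FASTA header to ">chrom:start-end".

    Sequence lines, and headers without a range= field, are returned
    unchanged (an assumption for this sketch).
    """
    if not line.startswith(">"):
        return line
    match = RANGE_RE.search(line)
    if match is None:
        return line
    return ">" + match.group(1)


example = (
    ">mm9_ct_UserTrack_3545_0 range=chr2:67108861-67108870 "
    "5'pad=0 3'pad=0 strand=+ repeatMasking=lower"
)
print(extract_coordinates(example))
```

With this approach, two records that cover exactly the same range would end up with the same identifier, so the input is assumed to contain each range only once.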

This page will not work with very large data sets. A Perl script that can handle large data sets is available here: clean_fasta.pl

A. Paste in FASTA sequence

Or upload query from file:

Extract CHR:coordinates from UCSC