Using cisfinder

The basics of using cisfinder (Sharov and Ko, 2009): http://lgsun.grc.nia.nih.gov/CisFinder/

Note: cisfinder requires that you register before you can use the tool.


  1. Begin with a set of FASTA sequences, for example ZNF286A_fdr0_summits_seq.fa. You can either paste in the sequences or upload them as a text file at the top of the page. Once you upload a sequence file, it will be stored for you in the cisfinder database for later use.

    You can also enter a second set of sequences as a control. This is a powerful tool for motif finding, because the control can match features of your input file that are not related to the motif itself. For example, you can enter control sequences that correspond to the "lowest scoring" peaks (below your cutoff) from a ChIP experiment, or input DNA sequences with similar GC content, etc.
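When choosing a control set with similar GC content (step 1), it can help to check the GC content of both files before uploading them. The following is a minimal sketch, not part of cisfinder itself: the helper names `read_fasta`, `gc_fraction` and `mean_gc` are hypothetical, and the two example records are made up for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def mean_gc(records):
    """Unweighted mean of the per-sequence GC fractions."""
    fractions = [gc_fraction(seq) for _, seq in records]
    return sum(fractions) / len(fractions) if fractions else 0.0


# Made-up example data; in practice, open your own .fa files instead.
example = StringIO(">peak1\nGGCC\nAATT\n>peak2\nGCGCGCAT\n")
records = list(read_fasta(example))
print(mean_gc(records))
```

Running the same function on the test file and on a candidate control file gives two numbers to compare; how close they need to be is a judgment call.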

  2. Below the upload window you will find a series of tabs with pulldown menus. The top tab allows you to select the sequence file from which you would like to identify over-represented motifs. The tab below lets you select a control file; other tabs can be used at later stages of the analysis. Once you have selected your sequence file (and a control, if desired), scroll down and select "identify motifs". A new page should appear.

  3. Here you can:
    1. Include or mask repetitive sequences
    2. Specify whether you expect multiple motifs or one per sequence
    3. Choose cutoffs such as FDR, motif similarity score, etc.
    These choices will give you different results; in particular, the decision to permit repeat sequences or not will influence the final output.

  4. Once the parameters are set, select "Show elementary motifs". A new page will appear with a range of predicted motifs, with the most likely appearing at the top. The frequency of the motif is an important factor; other ranking numbers include the information content ("info"), score and FDR (see the paper to find out exactly how these numbers are calculated). In general, a good ChIP experiment will identify a motif in a large fraction of the submitted sequences. However, the "information content" of the motif will depend upon the TF: some TFs do not bind very specific sequences, and for those your "info" score may be low.
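To make the idea of "information content" in step 4 concrete, here is a minimal sketch of the textbook definition (bits per position against a uniform A/C/G/T background). This is an assumption for illustration only: cisfinder's own "info" value may be calculated differently (see the paper), and the function names and the small count matrix below are hypothetical.

```python
import math


def column_information(counts):
    """Information content in bits of one motif position.

    counts holds the A, C, G, T counts at that position. A uniform
    background is assumed, so the value lies between 0 and 2 bits.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    info = 2.0
    for count in counts:
        if count > 0:
            p = count / total
            info += p * math.log2(p)
    return info


def motif_information(pfm):
    """Sum of the per-position information content over the whole motif."""
    return sum(column_information(column) for column in pfm)


# Hypothetical 3-position count matrix, one row per position (A, C, G, T).
pfm = [
    [10, 0, 0, 0],     # always A: fully specific
    [5, 5, 0, 0],      # A or C with equal frequency
    [25, 25, 25, 25],  # any base: no specificity
]
print([column_information(column) for column in pfm])
print(motif_information(pfm))
```

A TF that tolerates several bases at most positions yields many columns like the second and third rows, which is why a real motif can still have a low total.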

  5. Now you can go back to the first results page and select "Show clusters of motifs". This feature is a strong plus of the cisfinder program, and allows you to identify frequent pairings of different TFBS that might occur in your data. It can also find longer versions of the elementary motifs.

  6. In the test data set, one very strong, longer motif (26 bp) appears in 85 of the 104 input sequences with a very good score. This motif looks familiar; how can I find out if it resembles a known motif?

    Click on the sequence of the motif, and a new page will open, allowing you to save this motif into your portfolio. Under "add pfm to file" select "New file", then press "save motif". Now reload the first (sequence entry) page.

  7. The third pulldown menu is labeled "Motif file"; from this menu you can select your saved motif. Now scroll down and select "compare motifs".

  8. On the new page, you can select the motif collection to search against and the similarity score; the default database is one compiled by cisfinder from TRANSFAC, JASPAR and other data, and is a good start. Select "compare motifs".

  9. My results show that the favored motif in my ChIP-seq set matches a known motif with a similarity score of 0.91! Which is it?
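For intuition about what a motif similarity score such as the one in step 9 can mean, the sketch below compares two position frequency matrices of equal length using Pearson correlation, checking both strands. This is an illustrative stand-in and an assumption, not cisfinder's documented method; the function names and the toy matrices are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def to_frequencies(pfm):
    """Convert each position's A, C, G, T counts to frequencies."""
    return [[c / sum(row) for c in row] for row in pfm]


def reverse_complement(pfm):
    """Reverse the positions and swap A<->T, C<->G (rows are in A, C, G, T order)."""
    return [row[::-1] for row in reversed(pfm)]


def motif_similarity(pfm_a, pfm_b):
    """Best Pearson correlation of two same-length motifs over both strands."""
    if len(pfm_a) != len(pfm_b):
        raise ValueError("this sketch only compares motifs of equal length")

    def flatten(matrix):
        return [value for row in matrix for value in row]

    a = flatten(to_frequencies(pfm_a))
    forward = pearson(a, flatten(to_frequencies(pfm_b)))
    reverse = pearson(a, flatten(to_frequencies(reverse_complement(pfm_b))))
    return max(forward, reverse)


# Toy motifs: "ACT" and its reverse complement "AGT".
act = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 0, 10]]
agt = [[10, 0, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
print(motif_similarity(act, agt))
```

Real comparison tools also slide the two motifs against each other to handle different lengths and offsets, which this sketch deliberately omits.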
