Identifying conserved DNA sequences and transcription factor binding sites in two or more related segments of DNA


zPicture and Mulan: "do it yourself" alignments for sequences up to 1 Mb each:
zpicture.dcode.org
mulan.dcode.org/

zPicture and Mulan produce alignments between chromosome regions (up to 1 Mb each) that can be used to identify exons and other conserved functional regions.

The ECR browser has precomputed alignments for many vertebrate species, with access to rVISTA (see below).


Conserved regions from Drosophila - eyegone (eyg)
  1. At the MULAN site, enter 3 for the number of sequences to align.

  2. MULAN can align two types of data: FINISHED sequences, where each sequence is contiguous, and DRAFT, where the input is multiple sequences that are not necessarily ordered or aligned.

    Click the vertical SELECT link on the left center for FINISHED sequences.

  3. On the sequence entry page, MULAN can download genome sequences from UCSC. Use the upload link under SEQUENCE 1 (and later SEQUENCE 2 and SEQUENCE 3) to enter the following coordinates:
    Organism                               Coordinates
    1. D. melanogaster (dm3, April 2006)   chr3L:12,450,477-12,474,838
    2. D. simulans (droSim1)               chr3L:11832451-11856670
    3. D. virilis (droVir2)                scaffold_13049:9,296,946-9,329,999
    You must press Submit TWICE for each coordinate entry.

    After you upload the third sequence, the program will begin its alignment.
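
Before uploading, it can help to confirm that each region is well under the 1 Mb limit. The sketch below parses the three positions from step 3 and reports their lengths. The function names (parse_region, region_length) are made up for this example, and it assumes both ends of a browser position are inclusive.

```python
import re

MAX_LEN = 1_000_000  # 1 Mb per-sequence limit mentioned in this handout


def parse_region(text):
    """Parse 'chrom:start-end' (commas allowed) into (chrom, start, end)."""
    match = re.fullmatch(r"\s*([\w.]+):([\d,]+)-([\d,]+)\s*", text)
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return chrom, start, end


def region_length(text):
    """Length in bases, assuming both ends of the position are inclusive."""
    _, start, end = parse_region(text)
    return end - start + 1


REGIONS = {
    "D. melanogaster (dm3)": "chr3L:12,450,477-12,474,838",
    "D. simulans (droSim1)": "chr3L:11832451-11856670",
    "D. virilis (droVir2)": "scaffold_13049:9,296,946-9,329,999",
}

for name, position in REGIONS.items():
    length = region_length(position)
    print(f"{name}: {length} bp, within limit: {length <= MAX_LEN}")
```

The regular expression accepts positions with or without thousands separators, since the table above mixes both styles.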

  4. Before showing the multiple alignment output, MULAN asks you to check the tree. The default tree is appropriate (for three taxa, there is only one unrooted tree), but you can see a more sensible looking tree by using (seq3, (seq1, seq2)). Approve the tree and continue.

  5. After the analysis is done, you have the option to view the 3-way alignment (Dynamic visualization, top panel), or each of the two 2-way alignments. 2-way alignments can be visualized using either a PIP-plot (Pairwise dynamic plots) or a Dot-plot.

    Look at the Dot-plot first, and compare D. melanogaster (seq1) to D. virilis (seq3). Identify regions in D. melanogaster that are missing in D. virilis. You can also compare D. melanogaster and D. simulans, but they are very similar.
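
If the Dot-plot is unfamiliar, the idea can be sketched in a few lines: mark every pair of positions where a short word from one sequence also occurs in the other. Conserved stretches appear as diagonals, and an insertion or deletion shifts the diagonal. This is only a toy illustration with made-up sequences and a made-up word size, not the method MULAN uses to draw its plots.

```python
from collections import defaultdict


def dot_plot_hits(seq_a, seq_b, word=4):
    """Return (i, j) pairs where seq_a[i:i+word] equals seq_b[j:j+word]."""
    index = defaultdict(list)
    for j in range(len(seq_b) - word + 1):
        index[seq_b[j:j + word]].append(j)
    hits = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            hits.append((i, j))
    return hits


def render(seq_a, seq_b, hits):
    """Draw a text grid: columns follow seq_a, rows follow seq_b."""
    grid = [["." for _ in seq_a] for _ in seq_b]
    for i, j in hits:
        grid[j][i] = "*"  # marks the start of each matching word
    return "\n".join("".join(row) for row in grid)


# Toy data: seq_b is seq_a with the six bases "GGCTTA" deleted.
seq_a = "GATTACAGGCTTAACCGTGA"
seq_b = "GATTACAACCGTGA"

hits = dot_plot_hits(seq_a, seq_b)
print(render(seq_a, seq_b, hits))
```

In the printed grid, the diagonal breaks at the deleted bases and resumes further to the right, which is the pattern to look for in the real plot.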

  6. Now go back and examine the Dynamic visualization (click on the PIP plot). Identify each of the lines in the graphic: where are the ECRs plotted? Where are the genes?

  7. In addition to highlighting conservation between two sequences, MULAN (and zPicture) can search for transcription factor binding motifs (but only for sequences that are less than 1 Mb). Go back to the main results page and click MultiTF. Select the insects transcription factor set, select all, and submit.

    Look at how transcription factor binding sites line up with conserved regions.
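
To make "sites lining up with conserved regions" concrete, here is a rough sketch that scans a sequence for a consensus motif written in IUPAC codes and keeps only the matches that fall inside given conserved intervals. It is a plain consensus match, which is an assumption made for illustration and not how MultiTF scores sites. The motif, sequence, and intervals are invented.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_sites(sequence, consensus):
    """Return 0-based start positions where the consensus matches."""
    seq = sequence.upper()
    width = len(consensus)
    sites = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if all(base in IUPAC[code] for base, code in zip(window, consensus)):
            sites.append(i)
    return sites


def sites_in_regions(sites, width, regions):
    """Keep sites that lie entirely inside a (start, end) half-open interval."""
    return [
        s for s in sites
        if any(start <= s and s + width <= end for start, end in regions)
    ]


sequence = "AAGATAAGCTTATCA"   # invented example sequence
motif = "WGATAR"               # invented consensus, forward strand only
conserved = [(0, 8)]           # invented conserved interval

sites = find_sites(sequence, motif)
print(sites, sites_in_regions(sites, len(motif), conserved))
```

A real search would also scan the reverse strand and use a scoring matrix for each factor.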


Conserved regions from vertebrates - Tbx18
  1. At the zPicture site, click on SEQUENCE 1 / Upload, select the human genome, hg18 (March 2006) version, and enter the Position:
    chr6:85500876-85530618
    and Submit and then Submit again.

    For SEQUENCE 2, select Upload and select mouse genome mm9 (July 2007), Position:
    chr9:87,599,034-87,626,095

    If you do not know the coordinates of your region of interest, you can identify syntenic homologous regions by using BLAT at UCSC to align your own DNA sequences to a UCSC genome (human, mouse, rat, etc.). BLAT is limited to 25 kb, so a longer region must be truncated before you submit it.
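
One way to truncate a region for BLAT is to keep a window of at most 25,000 bases centered on the original interval. The sketch below does that for the human position from step 1. The function name clip_region and the choice to center the window are assumptions for this example; you may prefer to keep the start or the end of the region instead.

```python
BLAT_LIMIT = 25_000  # bases, the limit stated in this handout


def clip_region(chrom, start, end, limit=BLAT_LIMIT):
    """Return a sub-region of at most `limit` bases centered on the input.

    Coordinates are treated as inclusive on both ends.
    """
    length = end - start + 1
    if length <= limit:
        return chrom, start, end
    middle = (start + end) // 2
    new_start = middle - limit // 2
    new_end = new_start + limit - 1
    return chrom, new_start, new_end


chrom, start, end = clip_region("chr6", 85500876, 85530618)
print(f"{chrom}:{start}-{end} ({end - start + 1} bp)")
```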

  2. Again, use the Dot-plot option to examine the overall synteny between the two regions, focussing on insertions and deletions. Then use the Dynamic display to examine the PIP plot. Try to identify the insertion/deletion regions in the PIP plot.

    Note the correspondence/non-correspondence between ECRs (evolutionary conserved regions) and exons. You will not see conservation in repeat regions, because they have been masked out.

    Invert (Base-top switch) the reference genome and note the changes in the plot.

  3. You can also use rVISTA to look for regulatory sites in your alignment. rVISTA is like MultiTF in MULAN. For the human/mouse comparison (or human/chicken, below), use the vertebrate set of transcription factors.

  4. For a more distant comparison, examine chicken. For SEQUENCE 2, select Upload and select chicken, Position:
    chr3:79,654,026-80,251,165
    and Submit and then Submit again.

    1. How well are exons preserved between human and chicken?
    2. Are there well-conserved regions that do not map to human exons? Select one of the longer conserved regions, click on it, and view the FASTA file of the aligned region. Try searching with that file (blastx or fastx) against the swissprot, human, or refseqprotein database to see if this conserved region might code for a protein.
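
Before running a database search, a quick local check is to translate the conserved region in all six reading frames and see whether any frame has a long stretch without a stop codon. The sketch below assumes the standard genetic code and is not a substitute for blastx or fastx; a long open frame is only a hint, and the function names are made up for this example.

```python
# Standard genetic code, built in the usual TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq):
    """Reverse-complement a DNA string (non-ACGT letters are left as-is)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from position 0; unknown codons become 'X', stops '*'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations keyed '+1'..'+3' and '-1'..'-3'."""
    frames = {}
    for strand, dna in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(dna[offset:])
    return frames


def longest_open_stretch(protein):
    """Length of the longest run of residues between stop codons."""
    return max(len(part) for part in protein.split("*"))


for frame, protein in six_frames("ATGGCCAAGTTGTAAGGC").items():
    print(frame, protein, longest_open_stretch(protein))
```

Paste the sequence from the FASTA file in place of the short made-up string in the last loop.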


The zPicture program can also be used to align/order/orient reads against a reference genome. Pick your favorite gene region of interest, and compare it to an annotated region from a related genome. For bacterial sequences, you should be able to download gene sequences from the ENSEMBL site, and upload them to zPicture or MULAN.

