Visualizing histone modification data
The purpose of this exercise is to become familiar with how histone modification data can be displayed in the UCSC Genome Browser and to understand some of the simple properties of histone modifications that can be observed directly in the browser. We will use histone modification data produced by ENCODE. We are assuming that the initial analysis tasks of mapping reads and identifying "peaks" or broader enriched domains have already been done. We will look at three different histone modifications: H3K4me3, H3K36me3 and H3K27me3. K4 and K36 methylation are usually associated with transcriptional activation (I will call them activating marks), and K27 methylation is most commonly associated with repressed transcription (a repressive mark). The underlying biology is far from clear, and so names like "activating" might ultimately not be completely accurate.
We begin by looking at these marks over an entire chromosome, and we will use the data produced by the Broad Institute for the ENCODE project in normal human skeletal muscle myoblast (HSMM) cells. The steps to set up the browser tracks are as follows:
- Go to http://genome.ucsc.edu
- Reset the browser with the "reset" button
- Select hg18 and click "submit"
- Select "hide all tracks"
- Select chr2 and zoom all the way out (hit the 10X zoom out button a few times)
- Scroll down to the "regulation" section and click the "Broad Histone" link
- We will focus on HSMM cells (tough to pick a favorite ENCODE cell line -- most are boring!)
- Check only the boxes for H3K4me3, H3K27me3 and H3K36me3 in HSMM
- Set the "peaks" display mode to "dense"
- Looking at the 3 "profile" tracks, right-click each in turn and select "configure HSMM H3..."
- Configure the appearance of the tracks. Set vertical viewing range max=4, smoothing window=8 pixels
- Turn on the "UCSC Gene" track by setting visibility to "squish" and at the same time uncheck the "show splice variants" box to remove clutter.
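Part of the setup above can also be expressed as a browser URL rather than a series of clicks. The following is a minimal sketch, not the link used in this exercise. It assumes the hgTracks URL parameters `db`, `position`, `hideTracks` and per-track visibility settings, and it assumes that the Broad Histone composite track on hg18 is named `wgEncodeBroadChipSeq` (check the track's description page for the real name). It does not select individual HSMM subtracks or set the viewing range; those still need to be configured in the browser.

```python
from urllib.parse import urlencode

HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db, position, tracks):
    """Build an hgTracks URL that hides all tracks, then shows the given ones."""
    params = {"db": db, "position": position, "hideTracks": "1"}
    params.update(tracks)  # track name -> visibility (hide, dense, squish, pack, full)
    # Keep ":" unescaped so the position stays readable in the URL.
    return HGTRACKS + "?" + urlencode(params, safe=":")


url = browser_url(
    "hg18",
    "chr2:20000000-30000000",
    {
        "knownGene": "squish",            # UCSC Genes track
        "wgEncodeBroadChipSeq": "dense",  # ASSUMED name of the Broad Histone track
    },
)
print(url)
```

Pasting the printed URL into a web browser should open the Genome Browser at the chosen region with only the listed tracks visible, provided the assumed track name is right.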
At this point you should be able to see profiles of scores along the genome (called "wiggle" tracks) for each of the 3 histone modifications. If you don't think you were successful in following the configuration steps, you can turn on the above setup by simply following this link. The three profiles should not look too different from each other at this level of resolution. Notice also that the profile scores frequently hit the ceiling we have imposed (i.e. a max score of 4, after smoothing). These scores are proportional to the amount of data, so in many cases it will not be a good idea to put different tracks on the same scale. At this point we can also see that these three histone modifications tend to be enriched in gene-dense regions.
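To see why a low viewing-range maximum makes the profiles look alike, it can help to mimic the two display settings on a toy signal. The sketch below is only an illustration: it uses a simple centered moving average as a stand-in for the browser's smoothing (the browser's own method is not described here and works in pixels, not array positions), and the scores are made up.

```python
def smooth(values, window):
    """Centered moving average; the window shrinks at the two ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def clip(values, vmax):
    """Cap every value at vmax, like a fixed vertical viewing range."""
    return [min(v, vmax) for v in values]


raw = [0, 1, 2, 30, 28, 3, 1, 0]   # made-up scores with one strong peak
shown = clip(smooth(raw, 3), 4)    # smoothing window = 3, viewing range max = 4
print(shown)
```

With the ceiling at 4, the strong peak and its shoulders all display at the same height, so a sharp peak and a broad domain become hard to tell apart until the range is raised.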
Now we will look a bit more closely so we can actually see some differences between the different tracks. We will reconfigure the browser as follows:
- Zoom in around a 10Mb region either using the cursor or entering the coordinates in the text box. I will use the region chr2:20,000,000-30,000,000, but you should see similar features anywhere that has a good density of genes.
- Change the display mode for the UCSC Genes to "pack".
- Notice that the "peaks" for H3K4me3 don't seem to match the tops of the profiles very well. Change the H3K4me3 "S" track vertical display range max to 20, and the smoothing window to 2 pixels.
Again, if you had some problems configuring the browser, this link should set you up. There are several things to notice here:
- Gene Deserts: If you used the same interval as me, notice the gene desert to the left: not much action in that region for the histone modifications we have selected -- at least in HSMM cells. If you selected some different interval, check to see if there is a large region without genes, and whether that region is depleted for our histone modifications.
- Complementarity of K36 and K27: Notice that the H3K36me3 and H3K27me3 are roughly complementary. These also tend to correspond roughly to genes.
- Peaks vs. Domains: Notice that the shapes of the H3K36me3 and H3K27me3 profiles are very different from the H3K4me3. The latter appear as "peaks" while the former are much wider.
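The complementarity seen by eye can be given a rough number by measuring how much two sets of enriched intervals overlap. Here is a minimal sketch using the Jaccard index (bases covered by both, divided by bases covered by either). The interval coordinates are invented for illustration and are not taken from the ENCODE tracks; intervals are half-open `(start, end)` pairs on one chromosome.

```python
def covered_bp(intervals):
    """Total bases covered by half-open (start, end) intervals, counting overlaps once."""
    total, cur_end = 0, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            total += end - start        # a new, separate block
            cur_end = end
        elif end > cur_end:
            total += end - cur_end      # extends the current block
            cur_end = end
    return total


def jaccard(a, b):
    """Bases in both sets divided by bases in either set."""
    union = covered_bp(a + b)
    inter = covered_bp(a) + covered_bp(b) - union
    return inter / union if union else 0.0


# Hypothetical enriched domains, for illustration only.
k36_domains = [(100, 200), (400, 500)]
k27_domains = [(190, 300), (520, 600)]
print(jaccard(k36_domains, k27_domains))
```

A value near 0 means the two marks rarely cover the same bases, which is what "roughly complementary" would look like; a value near 1 means they coincide.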
Now we will look even more closely at the relationships between the three marks, and how each relates to genes. The only reconfiguration step we will take is to:
- Zoom in around a 1Mb region. I will select chr2:26,000,000-27,000,000
And if needed use this link. There are several things to observe:
- Complementarity of K36 and K27: These two marks still look roughly complementary, but we can see some places (for example under ABHD1 and PREB in my interval) where we might suspect the algorithm for identifying enrichment could be more accurate. Algorithms for identifying regions of enrichment generally have much more information than we are able to process visually, but current methods are far from perfect. Places where a region of enrichment seems to pass through an interval with lower values in the profile could actually correspond to intervals of low mappability. These are usually corrected by the algorithm. The example of ABHD1, however, seems easily mappable based on the H3K36me3 track.
- K4 at promoters: Most of the promoters visible in my interval have a sharp peak of H3K4me3. Not all do, however. In the interval I selected, the CIB4 gene has no K4 peak at its promoter. You might also notice that in general the peaks of K4 appear stronger at promoters of genes in regions of H3K36me3 enrichment.
We have looked at differences between histone modifications, but what about differences between cell types? The ENCODE project also produced histone modification data for H1 ESCs. This link will turn on those tracks, and take us to the HOXD cluster. There are several interesting observations we can make about the histone modifications around these genes:
- The HOXD genes are in a large domain of H3K27me3 enrichment in both HSMM cells and H1 ESCs, with the domain being much longer in the HSMM.
- There are lots of H3K4me3 peaks inside of this K27 domain: these have been called "bivalent" promoters, and are believed to be "poised" for expression. The K4 mark at promoters is typically associated with active states, and the K27 with repression. Put them together and you have something that could become active in response to a very precise signal. At least that's the theory.
- Notice to the left and right of the HOXD cluster are the KIAA1715 and MTX2 genes. Both are longer than 50 kbp, and both are separated from the HOXD cluster by more than 50 kbp.
- The intergenic space flanking the HOXD cluster is enriched for K27 in the somatic cell (HSMM) but not the stem cell, where the K27 seems to end at the boundaries of the HOXD cluster. Similarly, looking past the flanking genes the K27 appears again in HSMM, but not in the H1 ESCs.
- The flanking genes (KIAA1715 and MTX2) are covered in clear domains of H3K36me3 in the HSMM, which extend almost exactly from their TSS to TTS. Although there is some enrichment in the stem cell for K36 along the bodies of these genes, it is very weak.
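The idea of a "bivalent" promoter suggests a simple interval test: an H3K4me3 peak that falls inside an H3K27me3 domain. The sketch below shows that test under simplifying assumptions. The coordinates are invented, any overlap counts as "inside", and a real analysis would also check that the peak sits at a promoter.

```python
def overlaps(a, b):
    """True if two half-open (start, end) intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def bivalent_candidates(k4_peaks, k27_domains):
    """Return the K4 peaks that overlap any K27 domain."""
    return [p for p in k4_peaks if any(overlaps(p, d) for d in k27_domains)]


# Hypothetical coordinates on one chromosome, for illustration only.
k4_peaks = [(1000, 1500), (5200, 5600), (9000, 9400)]
k27_domains = [(5000, 8000)]
print(bivalent_candidates(k4_peaks, k27_domains))
```

Running the same test on peaks and domains from HSMM cells and from H1 ESCs would give a list of candidates for each cell type that could then be compared.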