Genome Browsers -- UCSC and IGV

The goal of this exercise is to gain some experience with the UCSC Genome Browser (genome.ucsc.edu) and the Integrative Genomics Viewer (IGV).

  1. Go to the UCSC Genome Browser and find the human GSTM1 gene.
    1. How many different versions of the human genome are available?
    2. Which one are you using?
    3. Pick one of the human genes (e.g. RefSeq NM_000561) and click on it to see the genome browser.

  2. Zooming in and out:
    1. How much human genomic DNA are you seeing? How many GSTM genes?
    2. Use the zoom out option to see the next closest GSTM gene. Now how much DNA are you seeing?
    3. Use the << to move to the left (towards the GSTM2 gene). How much DNA are you seeing?
    4. All the GSTM1-5 genes have 8 exons, with the termination codon in the 8th exon. GSTM2 is also annotated to have a 9th exon. Click on that exon to see the evidence.
    5. Expand out the view of the GSTM1 cluster until you can see all five GSTM1-5 genes. How long is the DNA sequence range being displayed?
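
The "how much DNA" questions above can be answered from the position string shown in the browser. The following is a minimal sketch, assuming the position is displayed in the form chrom:start-end with 1-based, inclusive coordinates; the function names and the example coordinates are made up for illustration and are not the real location of any GSTM gene.

```python
import re
from typing import Tuple


def parse_position(position: str) -> Tuple[str, int, int]:
    """Split a position string such as 'chr1:1,000-2,000' into its parts."""
    match = re.fullmatch(r"\s*([^:\s]+):([\d,]+)-([\d,]+)\s*", position)
    if match is None:
        raise ValueError(f"not a position string: {position!r}")
    chrom, start, end = match.groups()
    # Browsers often print thousands separators; strip them before converting.
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def range_length(position: str) -> int:
    """Number of bases displayed, counting both end coordinates."""
    _, start, end = parse_position(position)
    if end < start:
        raise ValueError("end coordinate is smaller than start coordinate")
    return end - start + 1


# Placeholder coordinates for illustration only.
example = "chr1:1,000,001-1,050,000"
print(range_length(example))  # 50000
```

Paste the position you see after each zoom or move step in place of the placeholder to compare the amount of DNA displayed.
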
  3. Use the options below the genome display to turn some of the lanes on and off.
    1. In the section titled Genes and Gene Predictions, turn on Augustus, CCDS, Geneid Genes, and Genscan Genes. Do you see any additional genes? Do some of the genes have a different structure? What has changed? CCDS models are probably the most reliable gene predictions. Which gene models differ from CCDS?
    2. Take a look at the GSTM1/GSTM2 predictions using the NCBI genome browser. Which GSTM2 transcripts does NCBI support? Which GSTM1 transcripts?
    3. In the Regulation section, turn on ENCODE Regulation. What new lanes appear?
  4. Download some sets of data from the UCSC browser.
    1. Select the Tools menu option from the top of the page, and select Table Browser.
    2. Use the group: drop-down menu to select Regulation.
    3. Without specifying an output file, use get output to retrieve the coordinates of the CpG islands. How many are there? Do they agree with the map you were looking at? Then download the CpG islands to a file using GTF format (be certain that the file name ends in ".gtf").
    4. Also look at the Layered H3K4Me1 track. This data is in a different format (wiggle), which is used for displaying continuous curves. Download it to a wiggle (".wig") file.
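
To check a downloaded GTF file against what the browser showed, it can help to count the features it contains. The following is a minimal sketch, assuming the standard nine tab-separated GTF columns with 1-based, inclusive start and end in columns 4 and 5; the function names and the sample records are invented for illustration and are not real CpG island coordinates.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_gtf_intervals(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (chromosome, start, end) for each feature line of a GTF file."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        yield fields[0], int(fields[3]), int(fields[4])


def summarize(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Count features per chromosome and total the bases they cover."""
    counts: Counter = Counter()
    total_bases = 0
    for chrom, start, end in read_gtf_intervals(lines):
        counts[chrom] += 1
        total_bases += end - start + 1  # both ends are included
    return counts, total_bases


# Made-up records standing in for a downloaded file.
SAMPLE = (
    "# example header\n"
    'chr1\texample\texon\t1001\t1500\t0.0\t.\t.\tgene_id "a"; transcript_id "a";\n'
    'chr1\texample\texon\t3001\t3200\t0.0\t.\t.\tgene_id "b"; transcript_id "b";\n'
    'chr2\texample\texon\t501\t600\t0.0\t.\t.\tgene_id "c"; transcript_id "c";\n'
)

counts, total_bases = summarize(io.StringIO(SAMPLE))
print(dict(counts), total_bases)  # {'chr1': 2, 'chr2': 1} 800
```

To run this on your own download, open the file you saved and pass the open file handle to summarize in place of the sample.
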
  5. Download the Integrative Genomics Viewer from IGV Downloads.
    1. Which version of the Human Genome assembly are you using? (Different versions have different associated annotation information. hg18 and hg19 seem to have much more associated data than hg38.)
    2. Again, look up the GSTM1 gene. How many tracks do you see (where is the GSTM1 gene)?
    3. Mouse over some of the exon boxes to see the additional information available.
    4. Either click on the red box on the chromosome, or on the zoom scale on the upper right, to zoom in on the gene. What is the greatest amount of detail you can see?
    5. Zoom out from the gene using the zoom scale until you can see a non-GSTM gene in the gene view.
    6. Add some additional tracks using the File menu and Load from Server. If you are using hg38, click on the Annotation box, then add some Annotation tracks (e.g. Common SNPs and PhastCons from Annotations). If you are using hg18 or hg19, you will see Annotations/Phenotype and Disease Associations. Make sure to add one or two lanes at a time, not sets of lanes.

      Zoom into the GSTM1 gene, and mouse-over some of the SNPs. What information is available?

    7. Load some data from File/Ga4gh/Google/1000Genomes/HG01440.

      Look for the GSTM1 gene again, then zoom in to see the boxes in the HG01440 panel, and mouse over the boxes. What are they showing?

      Move left and right to the adjacent GSTM genes. How many boxes do you see? Why do you think there are so many fewer boxes in the GSTM1 region? If you are correct, why are there any boxes?

    8. Add some of the data files you downloaded from UCSC by using the File menu and Load from File.
