Genome Browsers -- UCSC

The goal of this exercise is to gain some experience with the UCSC Genome Browser (genome.ucsc.edu) and with examining features in the human genome.

First, we will examine the human GSTA gene cluster. For Monday, you will examine a gene picked at random.

  1. Go to the UCSC Genome Browser and find the human GSTA1 gene.
    1. Select "human" as the organism and enter GSTA1 in the Position/Search Term box on the right side, and click on GO.
    2. Note that several sources for genes are listed: Gencode Genes, NCBI RefSeq Genes curated, etc. Look at the genomic coordinates of several of those genes. Are they all on the same chromosome? Do some of them share the same genomic coordinates?

      (Pick the NCBI RefSeq gene linked from NM_145740 and click on it to see the genome browser.)

  2. Use the genome browser view to explore GSTA1 / NM_145740. (Hint: some of these questions can be answered more easily by clicking on the GSTA1 / NM_145740 label, selecting "View details of parts of alignment within browser window", and then going to the bottom (together) of that page.)
    1. What chromosome is GSTA1 on?
    2. What is the size of the browser window (in nucleotides)? How long is the gene?
    3. What is the beginning coordinate of the gene? The end coordinate?
    4. Is GSTA1 transcribed from left-to-right or right-to-left?
    5. Looking at the GSTA1/NM_145740 line, mouse over the different blocks and lines (with tiny arrows). What do the blocks correspond to? Note that the increasing/decreasing numbers for the exons should agree with your answer on whether transcription runs left-to-right or right-to-left.
    6. Look up the NM_145740 mRNA at the NCBI. How long is the mRNA?
    7. What fraction of the GSTA1 gene is found in the NM_145740 mRNA? (Hint: How long is the gene? How long is the mRNA?)
    8. What is the difference between the GSTA1/NM_145740 transcript and the GSTA1/NM_001319059 transcript? (Mouse over the different regions to identify it.)
    9. How many exons does the NM_145740 transcript have?
    10. How many protein coding exons?
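
Questions 2.2 and 2.7 come down to coordinate arithmetic. The following is a minimal Python sketch, assuming the positions shown in the browser are 1-based and inclusive. The function names are illustrative, and the coordinates and mRNA length are made-up placeholders (not the GSTA1 values), so substitute the numbers you read from the browser and from the NCBI record.

```python
def span_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive interval."""
    if end < start:
        raise ValueError("end must be >= start")
    return end - start + 1


def mrna_fraction(gene_start: int, gene_end: int, mrna_length: int) -> float:
    """Ratio of the mRNA length to the length of the genomic span."""
    return mrna_length / span_length(gene_start, gene_end)


# Placeholder values for illustration only (NOT the real GSTA1 numbers).
example_start, example_end = 1_000_001, 1_010_000  # a 10,000 nt gene
example_mrna_length = 1_200                         # a 1,200 nt mRNA

print(span_length(example_start, example_end))
print(f"{mrna_fraction(example_start, example_end, example_mrna_length):.1%}")
```

With the placeholder numbers, the gene spans 10,000 nt and the mRNA corresponds to 12.0% of it.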

  3. Zooming in to see the structure of the gene
    1. Using the zoom-in and '>>' (move right) buttons, zoom in so that you can see the beginning of the gene. Alternatively, enter the coordinates chr6:52803741-52803840 into the "enter position, gene symbol, ..." text box and press the "go" button.
    2. Do you see the start of the RNA transcript? What is the first nucleotide of the RNA transcript? The second?
    3. Do these nucleotides match the sequence of the NM_145740 mRNA? If not, why not?
    4. Zoom out. Does the first exon of the GSTA1 gene code for any part of the protein?
    5. If not, which exon contains the initiation codon? At what nucleotide is the "ATG"?
    6. Approximately how long is the first intron?
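
The position string used in step 3.1 can also be handled programmatically. The sketch below is a hedged illustration, not part of the browser: `parse_position`, `window_size`, and `reverse_complement` are hypothetical helper names. It assumes 1-based, inclusive coordinates, and it includes a reverse-complement helper because, in general, a transcript on the reverse strand reads as the reverse complement of the strand the browser displays.

```python
import re

# chrom:start-end, with optional thousands separators in the numbers
_POSITION = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_position(position: str) -> tuple[str, int, int]:
    """Split 'chrom:start-end' (commas allowed) into chromosome, start, end."""
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def window_size(position: str) -> int:
    """Number of nucleotides in a 1-based, inclusive window."""
    _, start, end = parse_position(position)
    return end - start + 1


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(parse_position("chr6:52803741-52803840"))
print(window_size("chr6:52,803,741-52,803,840"))  # 100
print(reverse_complement("ATGGCA"))               # TGCCAT
```

The window from step 3.1 is 100 nt wide. The sequence passed to `reverse_complement` is an arbitrary example, not taken from the gene.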
  4. Zooming out to see the structure of the genome region
    1. Use the zoom out option to see the next closest GSTA gene. Now how much DNA are you seeing?
    2. Use the '<<' button to move to the left (towards the GSTA2 gene). How much DNA are you seeing?
    3. Expand out the view of the GSTA1 cluster until you can see all five GSTA1-5 genes, and the neighboring non-GSTA genes. How long is the DNA sequence range being displayed?
    4. Approximately how long is the spacing between the GSTA genes? Between the GSTA cluster and the non-GSTA genes?
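
The spacing asked about in question 4.4 can be estimated by sorting the genes by start coordinate and subtracting neighboring ends and starts. This is a minimal sketch under stated assumptions: coordinates are 1-based and inclusive, `gaps_between` is a hypothetical helper, and the gene names and coordinates are invented placeholders rather than values for the GSTA cluster.

```python
def gaps_between(genes: list[tuple[str, int, int]]) -> list[tuple[str, str, int]]:
    """Return (left_gene, right_gene, gap_in_nt) for neighbors ordered by start.

    Each gene is (name, start, end) with 1-based, inclusive coordinates.
    Overlapping neighbors are reported with a gap of 0.
    """
    ordered = sorted(genes, key=lambda gene: gene[1])
    gaps = []
    for (left, _, left_end), (right, right_start, _) in zip(ordered, ordered[1:]):
        gaps.append((left, right, max(0, right_start - left_end - 1)))
    return gaps


# Invented placeholder genes (NOT real GSTA coordinates), in arbitrary order.
hypothetical_genes = [
    ("geneB", 30_001, 42_000),
    ("geneA", 1_001, 12_000),
    ("geneC", 70_001, 85_000),
]

for left, right, gap in gaps_between(hypothetical_genes):
    print(f"{left} -> {right}: {gap} nt")
```

With these placeholders, the gaps are 18,000 nt and 28,000 nt. Replace the list with the coordinates you read from the browser.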
  5. Functional (ENCODE/Conservation) information.

    Go back and search for the GSTA1 gene, so that the gene fills the entire window. On the default display, there are three sections below the gene intron/exon map that provide "functional" information about this region of the genome: (1) Gene Expression in 54 tissues; (2) ENCODE Candidate Cis-Regulatory Elements (cCREs) and H3K27Ac Mark; and (3) conservation information: 100 vertebrate Basewise Conservation, Cons 100 Verts, and Multiz alignments.

    1. Click on the Gene Expression in 54 tissues panel to follow the link. What is plotted on the x-axis? The y-axis? Where is GSTA1 mRNA expressed?
    2. Click on the yellow box in the ENCODE panel. What does that yellow box indicate?
    3. Looking at the Cons 100 Verts, where do the peaks on that plot align with respect to the overall gene structure? Do all the peaks line up?
    4. (optional) Why do you think the mouse row is missing from the MultiZ alignments?
  6. For your homework for Monday, pick a random gene from the Random Gene Set Generator and, individually, answer the questions in the Genome Browser assignment.
