From Gene Lists to Function


GOrilla
GOrilla is a Gene Ontology (GO) enrichment tool for ranked gene lists: http://cbl-gorilla.cs.technion.ac.il

Analysis of a ranked list:

  1. Choose Homo sapiens as Organism
  2. Choose Single Ranked List
  3. Copy/Paste the gene list from file TF1_RankedList.txt into the input field.

    The genes in this list were closest to the ChIP-Seq peaks of TF1 and are sorted by q-value.

  4. Choose All for the Ontology
  5. Click Search Enriched GO terms

The result page will show the significantly enriched GO groups; note the different colors that indicate different enrichment p-values.
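If you ever need to build such a ranked input yourself, the ordering can be sketched in a few lines of Python. This is only an illustration: the `(gene, q_value)` input pairs and the gene names are hypothetical, and it assumes that a smaller q-value is more significant and should come first (reverse the sort if your scores work the other way).

```python
def ranked_gene_list(peak_genes):
    """Return gene names ordered by ascending q-value, keeping each gene once.

    peak_genes: iterable of (gene_name, q_value) pairs (hypothetical input).
    """
    best = {}
    for gene, q in peak_genes:
        # A gene near several peaks keeps its smallest q-value.
        if gene not in best or q < best[gene]:
            best[gene] = q
    # Sort by q-value; ties are broken alphabetically so the output is stable.
    return sorted(best, key=lambda g: (best[g], g))


# Made-up example values, not taken from TF1_RankedList.txt.
example = [("GENE_A", 0.03), ("GENE_B", 0.001), ("GENE_A", 0.0005), ("GENE_C", 0.2)]
ranked = ranked_gene_list(example)
print("\n".join(ranked))  # one gene per line, ready to paste
```

The output has one gene name per line, which is the shape you paste into the input field.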


Analysis of a list of genes of interest vs. background:

  1. Choose Homo sapiens as Organism
  2. Choose Two unranked lists of genes
  3. Copy/Paste the gene list TF1_unranked_interest into the first input field.

    Copy/Paste the gene list TF1_unranked_back into the second input field.

    The genes in the interest list were closest to significant ChIP-Seq peaks of TF1, while the genes in the background list are all other genes annotated in GO.

  4. Choose All for the Ontology
  5. Click Search Enriched GO terms

The result page will show the significantly enriched GO groups; note the different colors that indicate different enrichment p-values.

This time, each of the three categories, Biological Process, Molecular Function, and Cellular Component, shows enrichment. Click on the category name to see the enriched GO groups of each category.
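To get a feeling for what an "interest vs. background" enrichment p-value means, here is a minimal sketch of a standard hypergeometric tail test for one GO group. It is not GOrilla's own code, and the parameter names and example counts are made up for illustration; it assumes the population is the interest list plus the background list.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement.

    population: total number of genes (interest + background)
    annotated:  genes in the population that belong to the GO group
    sample:     number of genes in the interest list
    hits:       interest genes that belong to the GO group
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probabilities of seeing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 genes, 200 in the GO group,
# 500 genes of interest, 15 of them in the group.
p_value = hypergeom_enrichment_p(20000, 200, 500, 15)
print(f"enrichment p-value: {p_value:.3g}")
```

With these counts about 5 of the 500 genes would fall in the group by chance, so observing 15 yields a small p-value. When many GO groups are tested, the p-values need a multiple-testing correction, which this sketch leaves out.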


FUNC
The FUNC program tests for enrichment of Gene Ontology groups among a list of genes of interest: http://func.eva.mpg.de/

To run the FUNC program, you need a file that contains a list of gene names, their associated GO terms (GO accession numbers), and a 0/1 for presence/absence. Galaxy has an option to produce this list of Genes:GO:0/1.
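If you want to build such a file outside Galaxy, the idea can be sketched as follows. The exact layout FUNC expects is not spelled out here, so the tab-separated "gene, GO accession, 0/1" line format is an assumption; check it against the sample files before uploading. The gene names and their annotations are hypothetical.

```python
def func_input_lines(gene_to_go, genes_of_interest):
    """Build one 'gene<TAB>GO accession<TAB>flag' line per gene/GO pair.

    gene_to_go:        dict mapping gene name -> set of GO accessions
    genes_of_interest: set of gene names that get flag 1 (all others get 0)
    The tab-separated layout is an assumption, not a documented FUNC format.
    """
    lines = []
    for gene in sorted(gene_to_go):
        flag = 1 if gene in genes_of_interest else 0
        for go_id in sorted(gene_to_go[gene]):
            lines.append(f"{gene}\t{go_id}\t{flag}")
    return lines


# Hypothetical annotations for illustration only.
annotations = {
    "GENE_A": {"GO:0006355", "GO:0005634"},
    "GENE_B": {"GO:0005634"},
}
lines = func_input_lines(annotations, {"GENE_A"})
print("\n".join(lines))
```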

We have prepared some sample files, which you will need to download to your desktop so you can upload the files to FUNC.

Hypergeometric Test      Wilcoxon Test
TF1_ForFUNC_Hyper.txt    TF1_ForFUNC_Wilcoxon.txt
TF2_ForFUNC_Hyper.txt    TF2_ForFUNC_Wilcoxon.txt


Hypergeometric test

  1. Choose the link Submit a new job
  2. Give your project a name, e.g. TF1_Hyper

    You need to enter an email address, but it will not be used for any advertising, newsletters, etc.

  3. Choose TF1_ForFUNC_Hyper.txt (or TF2_ForFUNC_Hyper.txt) as your input file

    The genes in this file are associated with their GO groups; a 1 indicates that they are located nearest to a ChIP-Seq peak of TF1, a 0 marks all other genes.

  4. Choose hypergeometric as your test.
  5. Pick the GO ontology version that you used to annotate your genes (I used September 2009 when I made the file for you)
  6. Enter 5 in the field for Cutoff for number of genes/group (this means that a GO group needs to have at least 5 gene members to be analyzed)
  7. Then click Process file

    You should see a message that your file is being processed.

  8. Make a note of your ticket number, as this will allow you to find your job in case your browser closes before you get the results
  9. Click on the ticket number to see if your results are ready (it will take a few minutes).

    Download the general statistics file (and, if you want, also the groupwise statistics).

    Study the statistics file to find out what the best Significance Level (SL) for the refinement is. All categories are significant for overrepresentation; SL 0.001 gives the best ratio of observed/expected number of GO groups for all categories.
  10. Run refinement 0.001 → 0.05

    This will report GO groups that were significant with p<0.001 before refinement and are p<0.05 after refinement.

    This will again take a few minutes. You can then download 3 (zipped) files, one for each category.

  11. Sort each file by the p-value after refinement for overrepresentation (last column) to see which GO groups are significantly enriched.
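The cutoff from step 6 (a GO group needs at least 5 gene members to be analyzed) can be pictured with a small sketch. This is not FUNC's implementation: it counts only the gene/GO pairs listed in the input, whereas a real GO analysis may also count genes annotated to more specific child terms. The gene names are hypothetical.

```python
from collections import defaultdict


def groups_passing_cutoff(pairs, cutoff=5):
    """Keep GO groups that have at least `cutoff` distinct member genes.

    pairs: iterable of (gene_name, go_accession) tuples.
    Returns a dict mapping GO accession -> set of member genes.
    """
    members = defaultdict(set)
    for gene, go_id in pairs:
        members[go_id].add(gene)
    return {go: genes for go, genes in members.items() if len(genes) >= cutoff}


# Six hypothetical genes in one group, a single gene in another.
pairs = [(f"GENE_{i}", "GO:0005634") for i in range(6)] + [("GENE_0", "GO:0006355")]
kept = groups_passing_cutoff(pairs, cutoff=5)
print(sorted(kept))
```

Very small groups are dropped because a test on a handful of genes has little power and adds to the number of tests performed.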

Wilcoxon test

  1. Perform the same steps as above, but give the project a different name and choose TF1_ForFUNC_Wilcoxon.txt as input

    The list contains the names of genes closest to a peak in a ChIP-Seq experiment with TF1; genes are associated with their GO group and the q-value of the peak.

  2. Choose the Wilcoxon test
  3. In the statistics file, you are interested in enrichment in the high-ranking genes (i.e. high q-value)
  4. Are any categories significantly enriched for GO groups?

    Which SL would you pick for the refinement?

  5. Run the refinement
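For readers who want to see what a rank-based test does with the per-gene values, here is a minimal sketch of the textbook Wilcoxon rank-sum statistic for one GO group against all other genes. It is not FUNC's code: it uses the normal approximation without a tie correction, which is only reasonable for larger groups, and the function names are made up for this example.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(group_scores, other_scores):
    """Return (z, p_high): p_high is the one-sided p-value that the
    group's scores tend to be higher than the other scores."""
    n1, n2 = len(group_scores), len(other_scores)
    ranks = average_ranks(list(group_scores) + list(other_scores))
    rank_sum = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    spread = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum - expected) / spread
    p_high = 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal
    return z, p_high


# Made-up scores: genes in a GO group vs. all remaining genes.
z, p_high = rank_sum_test([5, 6, 7], [1, 2, 3, 4])
print(f"z = {z:.2f}, one-sided p = {p_high:.3f}")
```

A positive z means the group's genes sit toward the high end of the ranking; swapping the two arguments tests the opposite direction.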


GSEA

Use the MSigDB (Molecular Signatures Database) part of GSEA

Note: GSEA requires free registration before use; you will get an email instantly.

  1. From the menu on the left, choose Annotate Gene Sets.
  2. In the left field that says Gene Identifiers, paste the list of genes of interest from TF1_unrankedLists.txt.

    The genes in the interest file were closest to significant ChIP-Seq peaks of TF1.

    (You can also examine a "control" data set with the background file.)

  3. In Compute overlaps, check C5 (the GO gene sets) and change the output to top 50.
    Then click Compute overlap.

    Note that only the GO groups marked in green are statistically significantly enriched in your gene set (the p-value is calculated using the hypergeometric test).

  4. Now go back and compute the overlap for C1-C4 as well. You will discover enrichment for other types of gene sets, such as genes involved in a disease or genes having certain TF binding sites in their promoter.

    If you go back and use the Compendia expression option on the right, you can explore in which tissues or cell types your genes are expressed.

Clicking on Excel will let you export your data.
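The overlap computation in step 3 boils down to intersecting your gene list with each gene set in a collection. A minimal sketch, assuming the collection is available as a plain dictionary of hypothetical set names and gene names; it ranks by overlap size only, whereas the web tool also reports a hypergeometric p-value for each set.

```python
def top_overlaps(query_genes, gene_sets, top_n=50):
    """Return up to `top_n` gene sets sharing genes with the query list.

    gene_sets: dict mapping gene set name -> iterable of gene names.
    Each result row is (set_name, overlap_size, sorted shared genes).
    """
    query = set(query_genes)
    rows = []
    for name, members in gene_sets.items():
        shared = query & set(members)
        if shared:
            rows.append((name, len(shared), sorted(shared)))
    # Largest overlap first; ties are ordered by set name.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:top_n]


# Hypothetical gene sets for illustration only.
gene_sets = {
    "SET_X": ["GENE_A", "GENE_B", "GENE_C"],
    "SET_Y": ["GENE_B"],
    "SET_Z": ["GENE_Q"],
}
result = top_overlaps(["GENE_A", "GENE_B"], gene_sets, top_n=2)
for name, size, shared in result:
    print(name, size, ",".join(shared))
```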

