From Gene Lists to Function

GOrilla

GOrilla is a Gene Ontology (GO) enrichment tool for ranked gene lists: http://cbl-gorilla.cs.technion.ac.il

Analysis of a ranked list:

Choose Homo sapiens as Organism
Choose Single Ranked List
Copy/Paste the gene list from file TF1_RankedList.txt into the input field.
The genes in this list were closest to the ChIP-Seq peaks of TF1 and are sorted by the q-value
Choose All for the Ontology
Click Search Enriched GO terms

The result page will show the significantly enriched GO groups in the context of the GO hierarchy; note the different colors that indicate different enrichment p-values.
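A ranked input list like the one pasted above could be prepared with a few lines of Python. This is a minimal sketch: the function name `ranked_gene_list`, the gene names, and the q-values are made up for illustration, and it assumes that "sorted by the q-value" means most significant (smallest q-value) first.

```python
def ranked_gene_list(peaks):
    """Return gene names ordered by ascending q-value, keeping each gene once.

    `peaks` is an iterable of (gene, q_value) pairs. A gene that appears more
    than once keeps its smallest (most significant) q-value.
    """
    best = {}
    for gene, q in peaks:
        if gene not in best or q < best[gene]:
            best[gene] = q
    # Sort by q-value; ties are broken alphabetically so the order is stable
    ordered = sorted(best.items(), key=lambda item: (item[1], item[0]))
    return [gene for gene, _ in ordered]


# Made-up example values, NOT taken from TF1_RankedList.txt
example_peaks = [
    ("GENE_B", 0.02),
    ("GENE_A", 0.001),
    ("GENE_C", 0.3),
    ("GENE_B", 0.01),
]

# One gene per line, ready to paste into the input field
print("\n".join(ranked_gene_list(example_peaks)))
```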

Analysis of a list of genes of interest vs. background

Choose Homo sapiens as Organism
Choose Two unranked lists of genes
Copy/Paste the gene list TF1_unranked_interest into the first input field.
Copy/Paste the gene list TF1_unranked_back into the second input field.
The genes in the interest list were closest to significant ChIP-Seq peaks of TF1, while the genes in the background list are all other genes annotated in GO
Choose All for the Ontology
Click Search Enriched GO terms

The result page will show the significantly enriched GO groups in the context of the GO hierarchy; note the different colors that indicate different enrichment p-values.

This time, each of the three categories (Biological Process, Molecular Function, and Cellular Component) shows enrichment. Click on a category name to see the enriched GO groups of that category.
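To get a feeling for what an enrichment p-value for "interest vs. background" means, the sketch below computes a one-sided hypergeometric p-value for a single GO group. It is only an illustration and not necessarily the exact statistic the web tool reports; the function names and the toy gene names are assumptions. The universe is taken to be the interest list plus the background list, as in the setup above.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k): chance of seeing at least k annotated genes among n genes
    drawn without replacement from N genes, of which K carry the annotation."""
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def term_enrichment(interest, background, term_genes):
    """Enrichment p-value of one GO group in the interest list."""
    target = set(interest)
    universe = target | set(background)
    annotated = set(term_genes) & universe
    k = len(annotated & target)
    return hypergeom_enrichment_p(len(universe), len(annotated), len(target), k)


# Toy data: 3 genes of interest, 7 background genes, 4 genes in the GO group
interest = ["g1", "g2", "g3"]
background = ["g4", "g5", "g6", "g7", "g8", "g9", "g10"]
go_group = ["g1", "g2", "g4", "g5"]

p_value = term_enrichment(interest, background, go_group)
print(f"p = {p_value:.4f}")
```

In this toy case, 2 of the 3 genes of interest fall into a group that covers 4 of the 10 genes overall, which gives p = 1/3, so no evidence of enrichment.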

FUNC

The FUNC program tests for enrichment of Gene Ontology groups among a list of genes of interest: http://func.eva.mpg.de/

To run the FUNC program, you need a file that contains a list of gene names, their associated GO terms (GO accession numbers), and a 0/1 for presence/absence. Galaxy has an option to produce this list of Genes:GO:0/1.
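The Genes:GO:0/1 layout described above can also be sketched in Python. The tab-separated column order (gene, GO accession, flag), the function name `func_hyper_rows`, and the example genes and annotations are assumptions for illustration only; compare with the prepared sample files for the layout FUNC actually expects.

```python
def func_hyper_rows(gene_to_go, genes_of_interest):
    """Build one 'gene<TAB>GO accession<TAB>0/1' row per gene/GO pair.

    gene_to_go: dict mapping a gene name to an iterable of GO accessions.
    genes_of_interest: collection of genes that get the flag 1.
    NOTE: the column order is an assumption, not taken from the FUNC docs.
    """
    interest = set(genes_of_interest)
    rows = []
    for gene in sorted(gene_to_go):
        flag = 1 if gene in interest else 0
        for go_id in sorted(gene_to_go[gene]):
            rows.append(f"{gene}\t{go_id}\t{flag}")
    return rows


# Illustrative annotations (hypothetical genes)
annotations = {
    "GENE_A": ["GO:0006355", "GO:0005634"],
    "GENE_B": ["GO:0005634"],
}

for row in func_hyper_rows(annotations, genes_of_interest=["GENE_A"]):
    print(row)
```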

We have prepared some sample files, which you will need to download to your desktop so you can upload the files to FUNC.

Hypergeometric test	Wilcoxon test
TF1_ForFUNC_Hyper.txt	TF1_ForFUNC_Wilcoxon.txt
TF2_ForFUNC_Hyper.txt	TF2_ForFUNC_Wilcoxon.txt

Hypergeometric test

Choose the link Submit a new job
Give your project a name, e.g. TF1_Hyper
You need to enter an email address, but it will not be used for advertising, newsletters, etc.
Choose TF1_ForFUNC_Hyper.txt (or TF2_ForFUNC_Hyper.txt) as your input file
The genes in this file are associated with their GO groups; a 1 indicates that a gene is located nearest to a ChIP-Seq peak of TF1, and a 0 marks all other genes.
Choose hypergeometric as your test.
Pick the GO ontology version that you used to annotate your genes (I used September 2009 when I made the file for you)
Enter 5 in the field for Cutoff for number of genes/group (this means that a GO group needs to have at least 5 gene members to be analyzed)
Then click Process file
You should see a message that your file is being processed
Make a note of your ticket number, as this will allow you to find your job in case your browser closes before you get the results
Click on the ticket number to see if your results are ready (it will take a few minutes)
Download the general statistics file (and, if you want, also the groupwise statistics)
Study the statistics file to find out what the best Significance Level (SL) for the refinement is
All categories are significant for overrepresentation; SL 0.001 gives the best ratio of observed/expected number of GO groups for all categories
Run refinement 0.001 → 0.05
This will report GO groups that were significant with p<0.001 before refinement and still have p<0.05 after refinement
This will again take a few minutes
You can then download 3 (zipped) files, one for each category
Sort the results in each file by the p-value after refinement for overrepresentation (last column) to see which GO groups are significantly enriched.
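Two of the settings used in the steps above, the cutoff of 5 genes per group and the refinement 0.001 → 0.05, amount to simple filters. The sketch below illustrates that logic on made-up data; the function names and the (GO group, p before, p after) tuple layout are hypothetical and do not reflect the actual column layout of the FUNC output files.

```python
def groups_passing_cutoff(go_to_genes, min_genes=5):
    """Keep only GO groups with at least `min_genes` gene members."""
    return {go: genes for go, genes in go_to_genes.items()
            if len(genes) >= min_genes}


def significant_after_refinement(groups, before=0.001, after=0.05):
    """Keep groups with p < `before` before and p < `after` after refinement.

    groups: iterable of (go_id, p_before, p_after) tuples (hypothetical layout).
    The result is sorted by the p-value after refinement, smallest first.
    """
    kept = [g for g in groups if g[1] < before and g[2] < after]
    return sorted(kept, key=lambda g: g[2])


# Made-up group sizes: only GROUP_1 has at least 5 members
sizes = {
    "GROUP_1": ["a", "b", "c", "d", "e"],
    "GROUP_2": ["a", "b"],
}
print(sorted(groups_passing_cutoff(sizes)))

# Made-up p-values before and after refinement
results = [
    ("GROUP_1", 0.0002, 0.03),   # passes both thresholds
    ("GROUP_3", 0.0005, 0.20),   # significant before, lost after refinement
    ("GROUP_4", 0.0100, 0.01),   # not significant enough before refinement
    ("GROUP_5", 0.0001, 0.004),  # passes both thresholds
]
for go_id, p_before, p_after in significant_after_refinement(results):
    print(go_id, p_before, p_after)
```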

Wilcoxon test

Perform the same steps as above, but give the project a different name and choose TF1_ForFUNC_Wilcoxon.txt as input
The list contains the names of genes closest to a peak in a ChIP-Seq experiment with TF1; genes are associated with their GO group and q-value of the peak
Choose the Wilcoxon test
In the statistics file, you are interested in enrichment in the high-ranking genes (i.e. high q-value)
Are any categories significantly enriched for GO groups?
Which SL would you pick for the refinement?
Run the refinement
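The idea behind a rank-based test can be illustrated with a small Wilcoxon rank-sum sketch: the values of the genes in one GO group are ranked together with the values of all other genes, and the rank sum of the group is compared with what is expected by chance. This is a simplified illustration using a normal approximation without tie correction (only reasonable for larger groups); it is not FUNC's implementation, and all names and numbers are made up.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_z(group_values, other_values):
    """z-score of the group's rank sum (normal approximation, no tie correction)."""
    n1, n2 = len(group_values), len(other_values)
    ranks = average_ranks(list(group_values) + list(other_values))
    w = sum(ranks[:n1])                       # rank sum of the GO group
    mean = n1 * (n1 + n2 + 1) / 2             # expected rank sum under H0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # its standard deviation
    return (w - mean) / sd


def upper_tail_p(z):
    """One-sided p-value for a shift of the group towards high values."""
    return 0.5 * erfc(z / sqrt(2))


# Made-up values: genes in the GO group vs. all other genes
in_group = [4.0, 5.0, 6.0]
others = [1.0, 2.0, 3.0]
z = rank_sum_z(in_group, others)
print(f"z = {z:.3f}, one-sided p = {upper_tail_p(z):.4f}")
```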

GSEA

Use the MSigDB (Molecular Signatures Database) part of GSEA

Note: GSEA requires free registration before use; you will receive a confirmation email immediately.

From the menu to the left, choose Annotate Gene Sets.
In the left field that says Gene Identifiers, paste the list of genes of interest from the file TF1_unrankedLists.txt.
The genes in the interest file were closest to significant ChIP-Seq peaks of TF1.
(You can also examine a "control" data set with the background file.)
In Compute overlaps check C5 (the GO gene sets) and change the output to top 50.
Then click Compute overlap.
Note that only the GO groups marked in green are statistically significantly enriched in your gene set (the p-value is calculated using the hypergeometric test).
Now go back and compute the overlap also for C1-C4.
You will discover enrichment for other types of gene sets, such as genes involved in a disease or genes having certain TF binding sites in their promoter.
If you go back and use the Compendia expression option on the right, you can explore in which tissues or cell types your genes are expressed.
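Conceptually, computing overlaps means intersecting your gene list with each gene set in a collection and reporting the best matches. The sketch below shows that idea with made-up gene sets; the function name `top_overlaps` and the set names are hypothetical, and ranking by overlap size is a simplification (the web tool also reports p-values).

```python
def top_overlaps(query_genes, gene_sets, top_n=50):
    """Return up to `top_n` gene sets sharing genes with the query list.

    Each hit is (set name, number of shared genes, sorted shared genes).
    Hits are ordered by overlap size (largest first), then by set name.
    """
    query = set(query_genes)
    hits = []
    for name, members in gene_sets.items():
        shared = query & set(members)
        if shared:
            hits.append((name, len(shared), sorted(shared)))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:top_n]


# Made-up gene sets, NOT real MSigDB collections
toy_sets = {
    "SET_TRANSCRIPTION": ["g1", "g2", "g3", "g9"],
    "SET_SIGNALING": ["g2", "g7"],
    "SET_UNRELATED": ["g8"],
}
my_genes = ["g1", "g2", "g3", "g4"]

for name, size, shared in top_overlaps(my_genes, toy_sets, top_n=2):
    print(name, size, ",".join(shared))
```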

Clicking on Excel will let you export your data
