For this workshop, you will use the NCBI GEO2R tool to run a simple differential gene expression analysis on a dataset of your own choice from GEO. Use the GEO "Advanced Search Builder" query function to select:
Now, paste the GEO Dataset accession number into GEO2R:
https://www.ncbi.nlm.nih.gov/geo/geo2r/
Use the GEO2R application to set up two experimental groups relevant to the study you chose, and assign each sample to the appropriate group. Click the "Top 250" button to see the table of the most differentially expressed genes (ranked by statistical significance). Compare the raw P values with the adjusted P values and consider whether you found any genes of "interest".
Download the table of all results by clicking "save all results".
Import this tab-separated file into Galaxy, where you will use the Galaxy text transformation tools to cut out columns of interest, remove unwanted header rows, remove unwanted leading/trailing quotes, etc.
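If you want to check what the Galaxy cleanup steps should produce, the same column cutting and quote stripping can be sketched in a few lines of Python. This is a minimal sketch, not part of the Galaxy workflow: it assumes the file has a single header row and quoted, tab-separated fields, and the column names used in the example ("ID", "adj.P.Val", "P.Value", "logFC", "Gene.symbol") are assumptions about the GEO2R export that you should check against your own file. The toy rows are made up for illustration.

```python
import csv
import io


def extract_columns(tsv_text, wanted):
    """Keep only the `wanted` columns from a GEO2R-style tab-separated table.

    csv.DictReader treats the first row as the header and removes the
    double quotes around each field, so no separate quote-stripping step
    is needed.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [[row[name].strip() for name in wanted] for row in reader]


# Toy input with hypothetical probe IDs, genes and values (not real results).
sample = (
    '"ID"\t"adj.P.Val"\t"P.Value"\t"logFC"\t"Gene.symbol"\n'
    '"probe_1"\t"0.0012"\t"1.2e-06"\t"2.1"\t"GENEA"\n'
    '"probe_2"\t"0.45"\t"0.03"\t"-0.4"\t"GENEB"\n'
)
cleaned = extract_columns(sample, ["Gene.symbol", "P.Value", "adj.P.Val"])
```

Here `cleaned` holds one list per probe with the quotes removed, in the order of the requested columns.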
What is the relationship between the FDR and the P values in your results? Do you have any significant DEGs? How does fold change relate to significance for true DEGs? For non-DEGs?
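To reason about the first question, it helps to see how an FDR column is usually derived from raw P values. As far as we know, GEO2R's default adjustment is Benjamini & Hochberg (check the options panel for your run). The sketch below implements that procedure (the same rule as R's `p.adjust(method = "BH")`): each P value is multiplied by n / rank, and a running minimum from the largest P value down keeps the adjusted values monotone and capped at 1.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    n = len(pvalues)
    # Visit indices from the largest P value to the smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    for position, i in enumerate(order):
        rank = n - position  # ascending rank of this P value (1 = smallest)
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In this toy input, `adjusted` is approximately `[0.02, 0.04, 0.04, 0.02]`. Each adjusted value is at least as large as its raw P value, and the ranking by significance is preserved, which is worth comparing against the "P.Value" and "adj.P.Val" columns in your own results.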
Next, run a GO enrichment analysis of your gene list in GOrilla:
http://cbl-gorilla.cs.technion.ac.il/
Run the analysis by choosing "All" ontologies in Step 4, consider lowering the P value threshold to 1e-4 or below, and check the "show in REViGO" checkbox. This "single list" analysis uses all of the genes, ranked by statistical significance, to look for over-representation regardless of DEG status or FDR. Once you have completed it and answered the questions below, go back and repeat the GOrilla analysis with two separate gene lists: one of DEGs passing an FDR threshold of interest (say 10%), and one of all other genes (the "background" list).
Here is an example set of differentially expressed genes: ORA_results.tabular
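The two GOrilla inputs described above can be derived from the cleaned GEO2R table. The sketch below is one way to do it, under stated assumptions: each row is a (gene symbol, P value) pair for one probe, a gene measured by several probes keeps its best-ranked probe, probes without a symbol are dropped, and a gene counts as a DEG if any of its probes passes the FDR cutoff. Multi-gene symbols (such as "A///B") are not split here. Following the wording of this handout, the background is all remaining genes. The function names and toy rows are hypothetical.

```python
def ranked_gene_list(rows):
    """Gene symbols from most to least significant, one entry per gene.

    rows: (gene_symbol, p_value) pairs, one per probe.
    """
    seen = set()
    ranked = []
    for symbol, p in sorted(rows, key=lambda r: r[1]):
        # Keep only the first (smallest-P) probe of each named gene.
        if symbol and symbol not in seen:
            seen.add(symbol)
            ranked.append(symbol)
    return ranked


def split_target_background(rows, fdr_cutoff=0.10):
    """Split genes into a DEG target list and a background of all other genes.

    rows: (gene_symbol, adjusted_p) pairs, one per probe.
    """
    named = [r for r in rows if r[0]]
    target = {s for s, q in named if q < fdr_cutoff}
    background = {s for s, _ in named} - target
    return sorted(target), sorted(background)


# Toy rows with made-up genes and values.
rows = [("GENEA", 0.001), ("GENEB", 0.2), ("GENEA", 0.05), ("", 0.0001), ("GENEC", 0.01)]
ranked = ranked_gene_list(rows)
target, background = split_target_background(rows, fdr_cutoff=0.10)
```

Here `ranked` is `["GENEA", "GENEC", "GENEB"]`, suitable for pasting as a single ranked list, while `target` and `background` are the two unranked lists for the second analysis.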
If you have problems getting the gene lists via Galaxy, you can use ORA_target.tabular as the target gene set and ORA_background.tabular as your background. Inspect the GO enrichment plots: do you see "crosstalk" between closely related or nested terms? For the most significantly enriched terms, what is the extent (magnitude) of the enrichment? Do the GO terms and their associated gene lists suggest candidate hypotheses to you?
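To interpret the magnitude of enrichment, recall how a two-list over-representation test is usually scored. If we recall GOrilla's results page correctly, it reports enrichment as (b/n) / (B/N), where N is the total number of genes, B the number associated with the GO term, n the number in the target list, and b the number in both. The sketch below computes that ratio and the standard hypergeometric tail probability P(X >= b). It is an illustration, not GOrilla's code: GOrilla's ranked single-list mode uses a minimum-hypergeometric (mHG) statistic instead, so its P values will not match this calculation.

```python
from math import comb


def enrichment(N, B, n, b):
    """Fold enrichment: fraction of target genes with the term over the background fraction."""
    return (b / n) / (B / N)


def hypergeometric_tail(N, B, n, b):
    """P(X >= b) when n genes are drawn without replacement from N, B of which carry the term."""
    total = comb(N, n)
    hits = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, min(n, B) + 1))
    return hits / total


# Toy numbers: 100 genes, 10 annotated to a term, 20 targets, 6 of them annotated.
fold = enrichment(100, 10, 20, 6)
p_value = hypergeometric_tail(100, 10, 20, 6)
```

With these toy numbers the term is about 3-fold enriched. Comparing `fold` and `p_value` across terms shows why a small term can have a large enrichment but a modest P value, and vice versa.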
Follow the "Visualize output in REViGO" link to see a different representation of the GOrilla enrichment results; are the enriched terms very different, semantically?
Does Reactome generate the same biological hypotheses as GOrilla? Using your powers of biological insight, can you rationalize the differences?