Using DAVID for GO and pathway enrichment analysis

For advanced help, please see the tutorial on the DAVID website (david.abcc.ncifcrf.gov/) and Huang da W, Sherman BT, Lempicki RA, Nat Protoc. 2009;4:44-57.


  1. Upload or paste a gene list

    To start DAVID, first click on "Functional Annotation" under "Shortcut to David tools" at the left of the home page. This will take you directly to the "Upload" Tab of the functional annotation page.

    To submit a gene list, you can either paste the gene identifiers into the window or upload them as a txt file. Several identifier types are accepted:

    Official gene names; Affymetrix and other probeset IDs; Ensembl gene names; etc. Open the pulldown menu under "step 2: Select Identifier" to see the full list.

    *Note that whichever type you use, the entries must be official identifiers of that type. If a name is misspelled or otherwise altered, DAVID will not recognize it.

    You can also click on the link "Upload Help" to get further instruction on composing the gene list.
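    Before pasting or uploading, it can help to tidy the list so that it has one identifier per line with no blanks or duplicates. The following is a minimal Python sketch of that clean-up, not part of DAVID itself; the function name `clean_gene_list` and the example input are hypothetical. It cannot fix misspelled names, which DAVID will still reject.

```python
def clean_gene_list(raw_text):
    """Return unique, whitespace-trimmed identifiers in their original order."""
    seen = set()
    cleaned = []
    # Accept identifiers separated by newlines, commas, or tabs.
    normalized = raw_text.replace(",", "\n").replace("\t", "\n")
    for token in normalized.splitlines():
        identifier = token.strip()
        if not identifier or identifier in seen:
            continue  # skip blank entries and exact duplicates
        seen.add(identifier)
        cleaned.append(identifier)
    return cleaned


# Hypothetical input with a duplicate, stray spaces, and a blank line.
raw = "TP53\n BRCA1 \n\nTP53\nEGFR,MYC\n"
genes = clean_gene_list(raw)
print("\n".join(genes))  # one identifier per line, ready to paste
```

    The printed list can be pasted into the upload window or saved as a txt file.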

    1. For this exercise, we will use a human gene list with official gene symbols as the identifiers (DAVID-genelist.html). You can either upload the txt file by browsing to it or just paste the gene names into the window.

    2. Now, since these are gene names, we need to select "official_gene_symbol" from the pulldown menu under "step 2: select identifier".

    3. Under "Step 3", select "Gene list".

    4. Then, click "Submit" to submit the list (Step 4).

    5. A new window will open saying:

      "Please note that multiple species have been detected in your gene list. You may select a specific specie(s) with the List Manager on the left side of the page by highlighting the specific specie(s) and pressing the "Select" button."

      (This happens whenever you submit identifiers, like gene names, that could apply to many species).

      Click OK to proceed.

    6. The top window at the left side of the screen shows a list of species, each with a number in parentheses (e.g. 81). That number is the count of your identifiers that correspond to a unique gene in that species; the species with the largest counts always appear first.

      Since these are human genes, select "Homo Sapiens" and then the button "Select species". Note that you can select another species instead, and the results might be different. We will try that below.

    7. Next, open the "Background" tab to select the background against which this gene set will be analyzed. Note that if you used, e.g., Affy U133A arrays, your "background" will be different than if you surveyed the whole genome (RNAseq) or used another type of array. That is because an array contains only a fixed set of genes, so you can measure only the genes that are represented on it. This affects your statistics, so it is important to select the appropriate background.

      For this dataset, select the Human Genome U133A 2 array from the "Affymetrix 3IVT Background" list.

      Note that you can also enter your own "background" e.g. in case you have a specialized array, or a species with only some orthologs mapped to established model organisms.
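    To see why the background matters, consider a plain hypergeometric over-representation test. The sketch below is only an illustration: it is the textbook test, not necessarily the exact statistic DAVID reports, and every count in it (background sizes, category sizes, list hits) is made up. The function name `hypergeom_upper_tail` is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(pop_total, pop_hits, list_total, list_hits):
    """P(X >= list_hits) when list_total genes are drawn from a background
    of pop_total genes, pop_hits of which belong to the category."""
    denominator = comb(pop_total, list_total)
    upper = min(pop_hits, list_total)
    numerator = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, upper + 1)
    )
    return numerator / denominator


# Hypothetical numbers: 81 genes in the list, 10 of them in one category.
LIST_TOTAL, LIST_HITS = 81, 10

# Whole-genome background (assumed): 20000 genes, 300 in the category.
p_genome = hypergeom_upper_tail(20000, 300, LIST_TOTAL, LIST_HITS)

# Array background (assumed): 13000 genes on the array, 280 in the category.
p_array = hypergeom_upper_tail(13000, 280, LIST_TOTAL, LIST_HITS)

print(f"whole-genome background: p = {p_genome:.3g}")
print(f"array background:        p = {p_array:.3g}")
```

    With these made-up counts, the same 10 hits look less surprising against the smaller array background, because the category makes up a larger share of the genes that could have been measured.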


  2. Analyze the dataset

    Now that you have the gene list uploaded and the background selected, you can begin the functional analysis. At the top center of the page, you will see:

    1. A series of links with a + at the left-hand side. If you click on the +, it will open that section, show you what the default selections are, and offer you more choices. For example, click on "Pathways"; there are several pathway databases that will be incorporated into the functional search. For this exercise, select "Reactome_Pathway" in addition to the defaults.

    2. A series of buttons under "Combined view for Selected annotation". For this exercise, select the top button, "Functional annotation clustering".

      A new page will appear with a hot-linked table. In this view, DAVID clusters the GO and other terms together based on functional relatedness, giving you an overall enrichment for each functional group rather than for the individual terms. This clustering algorithm is a major benefit of using DAVID.

      Note that you can also obtain a table of individual GO category enrichments by clicking on "Functional annotation chart" instead. These two formats can lead you to different conclusions, so it can be interesting to view both.

    3. At the top right hand side, you will see a link "Download File". This will allow you to download a version of the chart with the clusters, the component categories, the enrichment factor and p-value for each, and a list of the genes that fall into each category. The enrichment factor score is important (high is better) but any category with a small number of genes is always a bit suspect.
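    Once you have the downloaded table, you may want to summarize each cluster and flag categories with few genes. The sketch below assumes that a cluster's enrichment score is the geometric mean of its member p-values on a -log10 scale; check the DAVID documentation to confirm this for your version. The term names, counts, p-values, and the `MIN_GENES` cutoff are all hypothetical.

```python
from math import log10

# Hypothetical terms from one annotation cluster: (term, gene count, p-value).
cluster = [
    ("term A", 12, 1e-6),
    ("term B", 9, 1e-4),
    ("term C", 2, 1e-2),
]

MIN_GENES = 3  # illustrative cutoff, not a DAVID default


def enrichment_score(p_values):
    """Mean of -log10(p), i.e. -log10 of the geometric mean p-value."""
    return sum(-log10(p) for p in p_values) / len(p_values)


score = enrichment_score([p for _, _, p in cluster])

# Terms supported by very few genes deserve extra caution.
small_terms = [term for term, count, _ in cluster if count < MIN_GENES]

print(f"cluster enrichment score: {score:.2f}")
print("terms with few genes:", small_terms)
```

    A higher score means smaller p-values across the cluster as a whole, but a term that rests on only two or three genes should be treated with caution whatever its p-value.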

  3. Analyze using data from another species

    1. Now, go back to your "List" main page and select "Mus Musculus" instead of human as the species. You can see that 70 of your 81 gene names are mapped to this species.

      You also need to change the background in the top "Select a Background" list in the Background tab. Highlight Mus Musculus in the list, then click the "Use" button below that top window.

    2. Next, select "Functional Annotation Clustering" in the center of the page ("Combined view for Selected Annotation").

      What are the similarities and differences in the annotation?

      Please note that different species may be better annotated than others for certain functions, and it is often useful to examine other species to get a more detailed view. For example, mouse is much better annotated for developmental functions than human. Every time you switch species, you will lose some genes, so this only works well for large gene lists.
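    One simple way to compare the two runs is to treat the enriched terms from each species as sets. This is a minimal sketch; the term names below are hypothetical placeholders for your own results, and only the 70-of-81 mapping count comes from this exercise.

```python
# Hypothetical enriched terms from the two runs; replace with your own results.
human_terms = {"cell cycle", "DNA repair", "apoptosis"}
mouse_terms = {"cell cycle", "apoptosis", "embryonic development"}

shared = sorted(human_terms & mouse_terms)      # enriched in both species
human_only = sorted(human_terms - mouse_terms)  # enriched only in human
mouse_only = sorted(mouse_terms - human_terms)  # enriched only in mouse

# Fraction of the submitted gene names that mapped to mouse (70 of 81 here).
mapped_fraction = 70 / 81

print("shared:", shared)
print("human only:", human_only)
print("mouse only:", mouse_only)
print(f"gene names mapped to mouse: {mapped_fraction:.0%}")
```

    Terms that appear in only one species may reflect differences in annotation depth or the genes lost on switching species, rather than biology.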


  4. Inferring functional change

    From these functional categories, what would be your guess as to the overall function of the genes in this dataset?

