From Coordinates to Genes
Creating a gene list starting from peak locations
James Taylor has created a Galaxy workflow that you can import and share:
http://main.g2.bx.psu.edu/u/james/w/workflow-from-ucsc-genes-and-symbols
This workflow takes a set of genome coordinates in BED format and returns a list of gene names for all nearby and flanking genes.
What you need to upload to use this workflow:
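To make the idea concrete, here is a minimal Python sketch of what "nearby and flanking genes" could mean for a single peak. It is not the workflow's actual implementation: the function names `flanking_genes` and `to_symbols`, the tuple layout, and the rule "overlapping genes plus the closest gene on each side" are assumptions made for illustration, and the gene IDs and symbols are made-up placeholders.

```python
from typing import Dict, List, Tuple

# (chrom, start, end, ucsc_id): same order as the first four BED columns
Gene = Tuple[str, int, int, str]


def flanking_genes(peak: Tuple[str, int, int], genes: List[Gene]) -> List[str]:
    """Return IDs of genes overlapping the peak, plus the closest gene on
    each side of it on the same chromosome (a hypothetical rule)."""
    chrom, p_start, p_end = peak
    same_chrom = [g for g in genes if g[0] == chrom]

    # BED intervals are half-open, so two intervals overlap when
    # each one starts before the other ends.
    overlapping = [g for g in same_chrom if g[1] < p_end and g[2] > p_start]
    left = [g for g in same_chrom if g[2] <= p_start]
    right = [g for g in same_chrom if g[1] >= p_end]

    hits = list(overlapping)
    if left:
        hits.append(max(left, key=lambda g: g[2]))   # gene ending closest to the peak
    if right:
        hits.append(min(right, key=lambda g: g[1]))  # gene starting closest to the peak
    return [g[3] for g in sorted(hits, key=lambda g: g[1])]


def to_symbols(ids: List[str], xref: Dict[str, str]) -> List[str]:
    """Translate UCSC IDs to gene symbols, keeping order and dropping duplicates."""
    seen: List[str] = []
    for gene_id in ids:
        symbol = xref.get(gene_id, gene_id)  # fall back to the ID if no symbol is known
        if symbol not in seen:
            seen.append(symbol)
    return seen


# Made-up example data, not real annotations.
genes = [
    ("chr1", 100, 500, "uc_A.1"),
    ("chr1", 900, 1500, "uc_B.1"),
    ("chr1", 3000, 4000, "uc_C.1"),
    ("chr2", 100, 500, "uc_D.1"),
]
xref = {"uc_A.1": "GENE_A", "uc_B.1": "GENE_B", "uc_C.1": "GENE_C", "uc_D.1": "GENE_D"}

ids = flanking_genes(("chr1", 1000, 1100), genes)
print(to_symbols(ids, xref))
```

In the workflow itself, the gene intervals come from the UCSC genes table and the ID-to-symbol mapping from the translation table; both uploads are described next.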
-
A set of background UCSC genes: upload these directly into Galaxy using the Get Data link to UCSC Main, which takes you to the UCSC Table Browser.
-
In the Table Browser, select your species and the genome build that matches the BED coordinates of your peaks.
-
Select group: "Genes and Gene Prediction Tracks"; track: UCSC Genes; table: knownGene.
-
Make sure genome is checked under region (it should be by default).
-
Set the output format to "BED - browser extensible data" and check "Send output to Galaxy" (this is checked by default if you reached the Table Browser from Galaxy).
-
Then select get output to send the table to Galaxy.
-
A file that translates UCSC gene IDs to standard gene symbols (symbols are the input required by most functional analysis programs).
To create it:
-
In Galaxy, use Get Data to go to UCSC Main again.
-
In the Table Browser, select the same species, genome build, group and track as before, but for the table select kgXref near the bottom of the drop-down menu.
-
In output format, select "selected fields from primary and related tables" and click "get output".
-
This time a new page will open; scroll down to the hg19.kgXref fields and select both (1) kgID and (2) geneSymbol, then scroll to the bottom of the page and click Allow selection from ...
-
Then scroll back up to the top section and select "done with selections".
-
A new page will come up; select "Send query to Galaxy".
-
In Galaxy, import James's workflow and select run to start it.
-
In the first field, select the gene symbol translation table (the second upload described above).
-
In the second field, select the UCSC genes table (the first upload described above).
-
In the third field, select your BED file of peaks.
-
You need to upload this to Galaxy first.
-
Remember that a BED file is a tab-separated list of chromosome, start and stop positions for your peaks, saved as plain text.
-
Click run workflow; if all goes well you should end up with a simple list of gene names.
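As a quick sanity check before uploading, the BED layout described above (tab-separated chromosome, start and stop, saved as plain text) can be verified with a few lines of Python. This is only a sketch: the helper name `check_bed_lines` and the example coordinates are made up, and it checks just the first three columns.

```python
from typing import Iterable, List, Tuple


def check_bed_lines(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Parse minimal BED lines and raise ValueError on malformed ones."""
    peaks = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and optional header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {number}: expected at least 3 tab-separated columns")
        # int() itself raises ValueError if a coordinate is not a whole number.
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError(f"line {number}: start must be >= 0 and smaller than end")
        peaks.append((chrom, start, end))
    return peaks


# Made-up coordinates for illustration.
example = [
    "chr1\t1000\t1100\n",
    "chr2\t5000\t5200\n",
]
print(check_bed_lines(example))
```

An open file handle can be passed in place of the `example` list, since iterating over a text file yields its lines. A common failure this catches is a file saved with spaces instead of tabs.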
Exercise:
You can try this using the human hg19 genome build and the TF1-top50.bed file provided for the MEME exercise.