This exercise is taken from "Galaxy 101" on the public Galaxy server, a very useful basic tutorial for Galaxy: https://usegalaxy.org/u/aun1/p/galaxy101.
We have modified the tutorial instructions to reduce redundancy and to be compatible with the command-line exercise later, so please follow these instructions, not the Galaxy tutorial instructions.
For this exercise we will use a sample ChIP-seq dataset from the murine G1E cell line, generated using an antibody to the transcription factor CTCF. Reads have been reduced to those mapping to chr19 for demonstration use.
To get started, get the data files, G1E_CTCF.fastqsanger and G1E_input.fastqsanger from the shared folder. Copy them into a folder in your home directory named chip-data.
On a Mac, simply open the Terminal under Go >> Utilities >> Terminal.
At the prompt, type in:
# Change to your home directory
$ cd
# Make a new directory in your home directory
$ mkdir chip-data
# Copy the original chip data into your home directory
$ cp /ecg/data/2014/chip/*.fastqsanger ~/chip-data/
Download the files, then upload them from your computer into the ecg2014 Galaxy instance.
When the file is finished uploading, click on the eye icon on the right panel to check the file contents. You should see files in the "fastq" format.
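To get a feel for what the "fastq" format looks like, here is a small sketch you can try at the prompt. The two reads below are made up for illustration (they are not from the G1E data) and are written to a temporary file; the same head and wc commands should also work on ~/chip-data/G1E_CTCF.fastqsanger.

```shell
# Build a tiny, made-up FASTQ file (two records) so the commands can be tried anywhere
demo_fastq=$(mktemp)
cat > "$demo_fastq" <<'EOF'
@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
TTGACCATGA
+
IIIIIHHHHH
EOF

# Each record is 4 lines: @name, sequence, "+", quality string
head -n 4 "$demo_fastq"

# Number of reads = number of lines / 4
n_lines=$(wc -l < "$demo_fastq")
n_reads=$(( n_lines / 4 ))
echo "reads: $n_reads"   # prints "reads: 2" for the demo file
```

Counting reads this way is a quick sanity check that an upload or copy was not truncated: the line count should be a multiple of 4.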
Step 1: Map these reads to a reference genome.
Use the "NGS: Mapping" > "bowtie" tool. You will need to change the reference genome build you are mapping against to "Mus musculus (mm9, (UCSC, full))" and be sure the original input file appears in the fastq file toggle. Otherwise, for this first try, you can leave the default mapping options.
However, you should take a look at the potential parameter settings you can use. Toggle "full parameter list" to have a look. Scroll down below the window for running Bowtie to find a description of these parameters and the output.
Also, click on the "Bowtie on data 1 aligned reads" label in the right side panel to open up a window with descriptive information.
In the case of the Bowtie output, you cannot see the output data by clicking the "eye" icon. Doing so will prompt you to download the file instead, because the output is in the BAM format.
It is not really necessary (or even possible) to read the BAM format directly: it is a binary encoding designed to be fed into other programs, such as peak finders.
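BAM files are stored in a gzip-compatible compressed container (BGZF), which is why they look like gibberish in a text viewer. Below is a minimal sketch of how to check for the gzip signature bytes (1f 8b) at the command line. It uses made-up stand-in files rather than real Bowtie output, and the function name is_gzip_like is our own.

```shell
# Return success if the file starts with the gzip signature bytes 1f 8b
is_gzip_like() {
  magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Stand-in files (hypothetical): one gzip-compressed, one plain text
demo_gz=$(mktemp)
demo_txt=$(mktemp)
printf 'chr19\t100\t200\n' | gzip -c > "$demo_gz"
printf 'chr19\t100\t200\n' > "$demo_txt"

if is_gzip_like "$demo_gz"; then gz_result="binary"; else gz_result="text"; fi
if is_gzip_like "$demo_txt"; then txt_result="binary"; else txt_result="text"; fi
echo "compressed file: $gz_result"
echo "plain file: $txt_result"
```

If samtools is installed on your machine, running samtools view -h on a downloaded BAM file prints its contents as readable SAM text.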
Now repeat the Bowtie mapping process with the input chromatin control for this sample, G1E_input.fastqsanger. You will need this for the peak finding exercise.
Follow the same steps as Step 1 on this sample.
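As a preview of the command-line version, the sketch below builds the same Bowtie command for both samples so that the ChIP and control reads are mapped with identical settings. It only prints the commands (a dry run) because the location of the mm9 index on your system is unknown. The index name mm9, the output file names, and the helper map_cmd are assumptions for illustration; -q (FASTQ input) and -S (SAM output) are standard Bowtie options, and these are not necessarily the settings Galaxy uses.

```shell
# Print the Bowtie command for one sample without running it
map_cmd() {
  # $1 = Bowtie index basename, $2 = FASTQ file, $3 = output SAM file
  printf 'bowtie -q -S %s %s %s\n' "$1" "$2" "$3"
}

# Same index and same options for the ChIP sample and the input control
cmd_chip=$(map_cmd mm9 "$HOME/chip-data/G1E_CTCF.fastqsanger" G1E_CTCF.sam)
cmd_input=$(map_cmd mm9 "$HOME/chip-data/G1E_input.fastqsanger" G1E_input.sam)

echo "$cmd_chip"
echo "$cmd_input"
```

Keeping the command in one helper makes it harder to map the two samples with different parameters by accident, which would make the later peak comparison unfair.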
** Note: the mapping program BWA is also available on Galaxy and is also very easy to run. This program is offered as an alternative to the original Bowtie because it is a little less sensitive to mismatches, but it is not much different from Bowtie. BWA output can also be fed directly into MACS, exactly as Bowtie output can.
We will repeat this exercise on the command line later and save the wiggle file for upload into UCSC. However, it takes time to run MACS with the wiggle file option, so we won't do it now.
Otherwise, the default values should be reasonable. We will discuss some of the MACS parameters in a future class.
Download the peaks.xls output from this file and compare it to the one you got with the genomic input control. What is different?