Summary - Run 'R', do some basic data manipulations, import a dataset and plot some results.
To run these exercises, you will need to have the 'R' statistics package installed on your laptop computer.
Create a vector num using the sequence generating operator :. Look at the vector.
Create logical vectors (lg1...lg?) by specifying conditions using comparison operators, e.g. lg1 <- >15, lg2 <- >=16, lg3 <- <12, etc. Look at the logical (lg1...lg?) vectors. Use the logical vectors to extract entries from the numeric vector.
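The sequence and logical-indexing steps above can be sketched as follows. This is a minimal illustration, not the expected answer: the exercise does not state the range of num, so 1:20 is an assumption.

```r
# Assumption: the range 1:20 is illustrative; the exercise does not fix it.
num <- 1:20

# A comparison operator returns one TRUE/FALSE value per element.
lg1 <- num > 15
lg2 <- num >= 16
lg3 <- num < 12

# A logical vector inside [ ] keeps only the positions that are TRUE.
num[lg1]
num[lg3]

# For whole numbers, "> 15" and ">= 16" select the same entries.
identical(num[lg1], num[lg2])
```

Printing lg1 next to num is a quick way to see which positions each condition selects.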
Create a character vector char using the concatenate function c(...) with the strings 'R', 'Python', 'Bioconductor', and 'RNA-seq'. Use this vector to create 2 logical vectors (the second named lg8) using comparison operators such as !='R'. Again, look at the vectors and use them to extract the contents of the character vector.
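As a rough sketch of comparing and subsetting a character vector: the names is_r and not_r below are hypothetical stand-ins for the logical vector names used in the exercise, and == is shown as one plausible counterpart to !=.

```r
char <- c("R", "Python", "Bioconductor", "RNA-seq")

# Hypothetical names; the exercise uses its own lg? names.
is_r  <- char == "R"   # TRUE only where the string is exactly "R"
not_r <- char != "R"   # the complement

is_r
not_r

# Logical indexing works on character vectors just as on numeric ones.
char[is_r]
char[not_r]
```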
Create a vector mix1 that contains values with a decimal point and integers using the c(...) function. View the types of all the vectors you have produced using the mode() function.
Create a vector mix2 that contains values with a decimal point, integers and characters with the c(...) function. What type of vector is produced? Again, check by typing mix2 and hitting "enter" on your keyboard, as well as by checking its type as before.
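The type questions above come down to how c(...) coerces mixed inputs to a single type. A small sketch, with illustrative values that are not taken from the exercise:

```r
# Illustrative values (assumptions): decimals and integers together.
mix1 <- c(1.5, 2L, 3.25, 4L)

# Adding a string forces every element to become a character string.
mix2 <- c(1.5, 2L, "three")

mode(mix1)    # numeric
mode(mix2)    # character

# Printing shows the coercion directly: the numbers now appear in quotes.
mix2
```

The class() and typeof() functions report related information and can be compared with mode() on the same vectors.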
Extract elements of mix2 using negative indexes together with the : operator.
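Negative indexes drop elements rather than select them. The sketch below uses an illustrative five-element mix2 (an assumption) and shows both the : operator and, as one possible companion, c(...).

```r
# Illustrative contents (assumption); any character vector would do.
mix2 <- c("1.5", "2", "three", "4", "five")

mix2[-1]        # everything except the first element
mix2[-(2:3)]    # drop elements 2 through 3
mix2[-c(1, 5)]  # drop the first and last elements

# The parentheses matter: -2:3 means the sequence -2, -1, 0, 1, 2, 3,
# and mixing negative and positive indexes is an error.
```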
Try some arithmetic on num, e.g. num + num, num - num. Try doing some arithmetic on the other vectors you have created.
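Vector arithmetic in 'R' is element-wise. The following sketch assumes num is 1:20 and, as one guess at what "other vectors" might reveal, shows that arithmetic on a character string fails.

```r
num <- 1:20   # assumed range, as the exercise does not fix it

num + num     # each element added to itself
num - num     # all zeros

# Arithmetic on a character value is an error; tryCatch() captures
# the message instead of stopping the script.
res <- tryCatch("R" + 1, error = function(e) conditionMessage(e))
res
```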
Create a matrix mat from num using the matrix() function and filling in the values by row first.
mat <- matrix(num, ncol=5, byrow=TRUE)
Look at the matrix.
Extract the full first row.
Extract the full fourth column of the matrix.
Extract all rows and the 4th and 5th columns of mat using the : operator.
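Create a logical vector lgm by checking to see which elements in the first row of the matrix satisfy a condition.
Use lgm to extract the columns of the matrix.
Try some arithmetic on the matrix, e.g. mat + mat, mat - mat.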
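The matrix steps above can be sketched together. Two things here are assumptions: num is taken to be 1:20, and the condition used to build lgm (> 3) is only an example, since the exercise text does not preserve the original one.

```r
num <- 1:20                                  # assumed range
mat <- matrix(num, ncol = 5, byrow = TRUE)   # fill row by row: 4 rows, 5 columns

dim(mat)        # number of rows and columns
mat[1, ]        # full first row
mat[, 4]        # full fourth column
mat[, 4:5]      # all rows, columns 4 and 5, via the : operator
mat[, c(4, 5)]  # the same selection via c(...)

# Example condition (assumption): which entries of row 1 exceed 3?
lgm <- mat[1, ] > 3
mat[, lgm]      # a logical vector in the column slot selects columns

mat + mat       # element-wise, same shape as mat
mat - mat       # all zeros
```

Leaving the row or column slot empty, as in mat[1, ], means "take everything" along that dimension.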
Create a list ExpList with three components: ExpLevel (3 numeric elements), Exp (3 logical elements with at least one TRUE) and GeneName (3 character elements).
ExpList <- list(ExpLevel=c(1,2,3), Exp=c(FALSE,TRUE,TRUE), GeneName=c("p53","cMyc","GSTM1"))
View the components of ExpList using the $ operator and single brackets. Do you notice any differences in the outputs?
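To make the difference between the access operators concrete, here is a short sketch using the ExpList definition from the exercise. The double-bracket form is included for comparison and is not necessarily part of the original question.

```r
ExpList <- list(ExpLevel = c(1, 2, 3),
                Exp      = c(FALSE, TRUE, TRUE),
                GeneName = c("p53", "cMyc", "GSTM1"))

ExpList$ExpLevel        # the component itself: a numeric vector
ExpList["ExpLevel"]     # single brackets: a list of length 1 holding that component
ExpList[["ExpLevel"]]   # double brackets: the component itself, like $

class(ExpList$ExpLevel)
class(ExpList["ExpLevel"])
```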
Create a character vector ids, e.g. "Gene1", "Gene2", "Gene3".
Type help(as.data.frame). Read the help page.
Use as.data.frame on the list ExpList to generate a data frame named ExpData with row names.
Extract entries from ExpData using indexes. Use the $ operator to extract the Exp column.
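A possible sketch of the data frame steps, assuming the ids vector supplies the row names through the row.names argument of as.data.frame:

```r
ExpList <- list(ExpLevel = c(1, 2, 3),
                Exp      = c(FALSE, TRUE, TRUE),
                GeneName = c("p53", "cMyc", "GSTM1"))
ids <- c("Gene1", "Gene2", "Gene3")

# Each list component becomes a column; ids become the row names.
ExpData <- as.data.frame(ExpList, row.names = ids)
ExpData

ExpData[1, ]                   # first row by position
ExpData["Gene2", "ExpLevel"]   # one cell by row name and column name
ExpData[, 2]                   # second column by position
ExpData$Exp                    # the Exp column by name
```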
x <- seq(0, 10, by=0.1)
y <- x + rnorm(length(x), mean=0, sd=0.1)
plot(x,y,xlim=c(0,10), ylim=c(-2,12),pch=18, col='red')
lines(x,x,col='blue')
z <- x + rnorm(length(x), mean=0, sd=0.5)
points(x,z,pch=18, col='green')
rn <- rnorm(1000, mean=2.0, sd=0.5)
hist(rn)
Try a boxplot of the same data. What are the values of each of the horizontal lines in the boxplot (e.g. the center, top, and bottom of the box, and the "whiskers" between the dashed vertical lines and the circles)?
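One way to check the values behind the boxplot lines is boxplot.stats(), which returns the numbers a default boxplot is drawn from. The set.seed() call is an addition for reproducibility and is not part of the exercise.

```r
set.seed(1)   # assumption: fixed seed so the numbers can be reproduced
rn <- rnorm(1000, mean = 2.0, sd = 0.5)

bs <- boxplot.stats(rn)

# Five values: lower whisker end, lower hinge (bottom of box),
# median (center line), upper hinge (top of box), upper whisker end.
bs$stats

# The hinges and median match Tukey's five-number summary.
fivenum(rn)

# Points beyond the whiskers, drawn as circles in the plot.
bs$out
```

With the default settings, each whisker ends at the most extreme data point that lies within 1.5 times the box height of the box.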
Copy interactive.hpc.virginia.edu:/nv/md_rdlib2/biol4230/data/rna-seq/GSE_FPKM.tab to your laptop.
GSE_FPKM <- read.table(file="GSE_FPKM.tab", sep="\t", header=TRUE, row.names=NULL)
(row.names=NULL is required to fix the "duplicate row names" problem)
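The sketch below shows one way a "duplicate row names" error can arise and how row.names=NULL avoids it. It uses a small temporary file with made-up gene names and values, not the real GSE_FPKM.tab, whose exact layout is not shown here.

```r
# Toy tab-delimited file (made-up data) with a repeated gene name.
tmp <- tempfile(fileext = ".tab")
writeLines(c("gene\tMCF7_1\tMCF7_2",
             "TP53\t12.1\t11.8",
             "TP53\t0.4\t0.6",
             "MYC\t30.2\t29.5"), tmp)

# Using the gene column as row names fails because names must be unique.
dup_err <- tryCatch(
  read.table(tmp, sep = "\t", header = TRUE, row.names = 1),
  error = function(e) conditionMessage(e))
dup_err

# row.names=NULL numbers the rows and keeps gene as an ordinary column.
toy <- read.table(tmp, sep = "\t", header = TRUE, row.names = NULL)
toy
```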
MCF7.ave <- apply(GSE_FPKM[MCF7.gt10,2:4],1,mean)
Here, the 1 tells apply to apply the mean() function across each row. You must do the same thing for var().
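To illustrate the row-wise use of apply() on something small, here is a sketch with a made-up data frame. The names toy, gt10, toy.ave and toy.var are hypothetical, and the filter used for gt10 (row mean above 10) is only an assumption about what a selector like MCF7.gt10 might do.

```r
# Made-up data: column 1 holds gene names, columns 2:4 hold three replicates.
toy <- data.frame(gene = c("g1", "g2", "g3"),
                  rep1 = c(12, 3, 40),
                  rep2 = c(14, 2, 44),
                  rep3 = c(13, 4, 42))

# Hypothetical row filter: keep genes whose mean across replicates exceeds 10.
gt10 <- apply(toy[, 2:4], 1, mean) > 10

# MARGIN = 1 applies the function to each row; MARGIN = 2 would use columns.
toy.ave <- apply(toy[gt10, 2:4], 1, mean)
toy.var <- apply(toy[gt10, 2:4], 1, var)

toy.ave
toy.var
```

For means specifically, rowMeans(toy[gt10, 2:4]) gives the same values; there is no base 'R' equivalent for var(), so apply() is the direct route.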
Put the answers and 'R' scripts in a new directory: ~/biol4230/hwk7/.
Identify the 5 most highly expressed genes (on average) for each of the three experimental replicates in GSE_FPKM (MCF.7, GM12892, and H1.hESC).
Due Tuesday, April 4 at 12 noon.