Biology 4230/7230 -- Bioinformatics and Functional Genomics

Time: Tues/Thurs 9:30 - 10:50 -- Ruffner Hall Rm 174
Fri: 9:00 - 10:00 Chemistry Rm 411

Professor: William Pearson
Office: Jordan Hall 6-057

Course Description:

The Bioinformatics and Functional Genomics course will introduce the computational, statistical, evolutionary, and genetic concepts at the foundation of modern genome analysis, and address research problems in gene and genome structure using popular computer programs and biological and genome databases. The goal of the course is to introduce students to the computer algorithms, statistical approaches, and biology that interact to allow biologists to make inferences from large genomics datasets. Students should come away with a clear understanding of the foundations of computational approaches, and will then build on that understanding by gaining practical experience addressing biological questions using genome datasets.

Students will become familiar with the Linux command-line environment and learn simple programming/scripting skills, which will allow them to perform medium-scale analysis of biological sequence data. The first part of the course will focus on similarity searching, homology, and phylogenetic reconstructions, combining programming and algorithms with evolutionary biology and protein structure. This material is covered by Part I of the Pevsner textbook.

The second part of the course will focus on functional/expression analysis at the genomic level. Strategies for quantifying differential gene expression will be explored, leading to microarray and RNA-seq expression analysis. Coordinately expressed gene sets will then be used to explore methods for biological pathway analysis and identification of regulatory signals.

The course will focus on fundamental concepts in biological sequence alignment, statistics, evolution and phylogenetics, motif finding, gene structure and expression, and biological pathway analysis, with emphasis on understanding the strengths and weaknesses of different analysis strategies.

Course prerequisites:

This course will present a brief introduction to programming (Python) to facilitate programmed, reproducible, medium-scale analysis of protein families and genome features. It is open to students with computing, statistical, chemistry, and life science interests. It does not assume knowledge of the Linux command line or programming -- those topics will be taught -- but Linux/Unix command-line experience will be helpful. It does assume some knowledge of basic molecular biology, the central dogma, the building blocks of DNA and proteins, and the structure of genes in prokaryotes and eukaryotes. Students applying for permission to enroll will be asked to fill out a brief form outlining their programming experience (if any) and their biology (possibly high school) course work. Mathematical and statistical concepts will not require calculus, but will require comfort with advanced algebra.

Learning Objectives:

After taking this class students will be able to:

  1. effectively use similarity searching tools to identify homologous protein and DNA sequences, understanding the use of statistical significance and expectation values
  2. write short Python programs to automatically characterize protein sets using web resources
  3. critically evaluate inferences of homology, evolutionary tree topology, and functional changes
  4. map RNA datasets onto sequenced genomes to examine expression and alternative splicing
  5. use the "R" statistics program and the Bioconductor package to quantify changes in gene expression on biological pathways

Course Format:

The course will be a hybrid lecture/lab course, with two 1.5 hr lectures on programming and a 1 hr "lab/discussion" each week.

Required Text:


Recommended Texts:

Bioinformatics and Functional Genomics, 2nd Ed. (2009) J. Pevsner, Wiley ISBN: 0470085851
Practical Computing for Biologists (2010) S. Haddock and C. Dunn, Sinauer ISBN: 0878933913

Activities and assignments:

Progress through the material will be evaluated with exams (1/3), problem sets (1/3), and final projects (1/3). There will be two exams, approximately half-way through each half semester, that will cover the foundational biological and bioinformatic concepts covered early in the course. There will be two major projects, presented before mid-term and at the end of the term. The first project will start with similarity searching and end with construction and evaluation of evolutionary trees; the second will address a functional genomics problem (RNA expression/motif finding). Programming problem sets will be assigned every week, with the Friday lab/discussion section devoted to addressing the programming and conceptual challenges posed by the problem set.

Grade composition:

First/third quarter exams: 33.3%
Weekly problem sets: 33.3%
Final projects (2): 33.3%


You are expected to attend all of the lectures for the course and to participate in the lab/discussion sections. If you have to miss a class, please notify me before the planned absence.