COMP-364: Tools for the Life Sciences


MTR 10:35am-11:25am

ENGTR – Trottier Building 2120
Jan 7th 2015 – April 15th 2015
Prof. Mike Hallett
Location: McIntyre Building 903
Office Hours: By appointment

Bioinformatics is the use of computation, statistics and mathematics to investigate problems and test hypotheses in biological systems and disease. This course aims to provide students from the life sciences and clinical studies (e.g. biology, cell biology, biochemistry, immunology, physiology) with instruction in the basic techniques of bioinformatics. The course makes extensive use of bioinformatic applications related to breast cancer, since this disease has been investigated extensively using modern genomics and a rich toolkit of bioinformatic methods exists for it.

The course assumes no previous experience in computer science, statistics or genomics, although some familiarity with these areas would certainly help. Regardless, students will leave this course able to program in R, a computer language designed specifically for statistics with a long history of application in bioinformatics. Students will learn specific techniques for analyzing DNA information (single nucleotide polymorphisms, copy number variations, chromosomal aberrations, association studies), RNA expression (class discovery, class distinction, class prediction), pathway analysis, survival analysis, and the integration of different levels of gene and post-gene regulation. Within these applications, students will be introduced to genomic technologies such as Next Generation Sequencing (NGS), with emphasis on DNA-, exome- and RNA-seq, as well as microarrays and protein expression arrays.

All of these bioinformatics concepts are developed using tools from computation and statistics, including programming, optimization, hypothesis testing, probabilistic models, association tests and simple statistical tests. Additional concepts from computer science include the basics of programming, version control systems (e.g. Git), cloud computing, recursion, and introductory aspects of algorithm design.

Teaching Assistant
Daniel Del Balso
Room 903 McIntyre Building
Office Hours: T,H 30 min. before the lecture, in the classroom. See emails for tutorials/special office hours.

Information regarding computer infrastructure for the course:

Course Notes

Links to Software and Tutorials

Textbooks, Manuals and On-line Courses (electronic)

Video Series and On-line Courses:


Related and Alternative Software

Course Evaluation:

  • Assignment 0 (due Friday, January 23rd, 2015) 10% of overall grade
  • Assignment 1 (due Friday, February 13th, 2015) 10% of overall grade
  • Assignment 2 (due Thursday, March 12th, 2015) 10% of overall grade
  • Midterm (March 17th, 2015) 20% of overall grade
  • Assignment 3 (due March 31st, 2015) 10% of overall grade
  • Assignment 4 (due April 10th, 2015) 10% of overall grade
  • Final Exam (April 21st, 2015) 30% of overall grade

Module 1 – The Basics and Programming

Lecture 1: What is bioinformatics? And some basic resources.

Links to related material:

Lecture 2: Breast cancer informatics: the example for the course.

Links to related material:

Lecture 3: R, RStudio, and a Unix Primer

Lecture 4: R basics, data types, operators, c(), sets, vectors, matrices, arrays, lists
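A minimal sketch of the base-R constructs this lecture lists. The gene names and values are invented for illustration; top-level expressions print their value when run in the console or with Rscript.

```r
# Toy gene names and expression values, invented for illustration
genes <- c("TP53", "ESR1", "ERBB2")   # c() combines values into a character vector
expr <- c(5.2, 8.1, 3.4)              # numeric vector
names(expr) <- genes                  # name the elements so they can be indexed by gene

high <- expr > 4                      # comparison operators return a logical vector
expr[high]                            # logical indexing keeps TP53 and ESR1
expr * 2                              # arithmetic operators work element-wise

# Sets: vectors treated as collections of unique values
panel_a <- c("TP53", "ESR1", "PGR")
intersect(genes, panel_a)             # "TP53" "ESR1"
union(genes, panel_a)                 # "TP53" "ESR1" "ERBB2" "PGR"
setdiff(genes, panel_a)               # "ERBB2"
"PGR" %in% genes                      # FALSE

# Matrix: genes in rows, samples in columns (filled column by column)
m <- matrix(1:6, nrow = 3, dimnames = list(genes, c("s1", "s2")))
dim(m)                                # 3 2
m["ESR1", "s2"]                       # 5

# Array: like a matrix, but with a third dimension (e.g. replicates)
a <- array(1:12, dim = c(3, 2, 2))

# List: a container whose elements can have different types
patient <- list(id = "P001", er_positive = TRUE, expr = expr)
patient$expr["ERBB2"]
```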

Lecture 5: R factors, data.frames, conditional execution, looping
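The sketch below puts factors, a data.frame, conditional execution and a for loop together on a tiny clinical table. The patients, ER status and ages are invented, and the 50-year age cut-off is an arbitrary choice for illustration.

```r
# Toy clinical table, invented for illustration
clin <- data.frame(
  patient = c("P1", "P2", "P3", "P4"),
  er = factor(c("pos", "neg", "pos", "pos"), levels = c("neg", "pos")),
  age = c(45, 62, 38, 71),
  stringsAsFactors = FALSE
)

levels(clin$er)     # "neg" "pos": the categories of the factor
table(clin$er)      # counts per level: neg 1, pos 3

# Conditional execution inside a for loop
age_group <- character(nrow(clin))
for (i in seq_len(nrow(clin))) {
  if (clin$age[i] >= 50) {
    age_group[i] <- "50+"
  } else {
    age_group[i] <- "<50"
  }
}

# The vectorised alternative with ifelse() gives the same result without a loop
age_group2 <- ifelse(clin$age >= 50, "50+", "<50")

clin$age_group <- factor(age_group)   # add the result as a new data.frame column
clin
```

In R the vectorised form is usually preferred; the explicit loop is shown because it makes the control flow visible.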

Lecture 6: Gene expression in R
You will need hucMini.R, available through Git, for M1.L6.

Links to related material:

Lecture 7: Exploring the prognostic value of TP53 expression in breast cancer

  1. differential expression
  2. patient clinical outcome
  3. descriptive statistics
  4. hypothesis testing: t-test, Wilcoxon and Kolmogorov-Smirnov tests in R
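A sketch of how these three tests are called in base R. It runs on simulated values, not on real TP53 data; the group names ("relapse", "no_relapse"), means and sample sizes are assumptions chosen only so the example runs.

```r
set.seed(1)
# Simulated (not real) expression values for two hypothetical outcome groups
relapse <- rnorm(30, mean = 7.5, sd = 1)
no_relapse <- rnorm(30, mean = 8.0, sd = 1)

# Descriptive statistics
summary(relapse)
summary(no_relapse)
c(sd_relapse = sd(relapse), sd_no_relapse = sd(no_relapse))

# Two-sample t-test (R's default is Welch's test, var.equal = FALSE)
tt <- t.test(relapse, no_relapse)

# Wilcoxon rank-sum test: compares the groups using ranks, without assuming normality
wt <- wilcox.test(relapse, no_relapse)

# Two-sample Kolmogorov-Smirnov test: compares the entire empirical distributions
ks <- ks.test(relapse, no_relapse)

c(t = tt$p.value, wilcoxon = wt$p.value, ks = ks$p.value)
```

The three tests ask different questions (difference in means, shift in ranks, any difference in distribution shape), so their p-values need not agree.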

Lecture 8: R probability distributions

  1. Examples using the normal distribution: dnorm, pnorm, qnorm, rnorm.
  2. Simple plotting: hist, plot, lines
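R names its distribution functions with a prefix: d for the density, p for the cumulative probability, q for the quantile (inverse of p) and r for random draws. A minimal sketch for the standard normal; the seed, sample size and plot settings are arbitrary choices.

```r
dnorm(0)        # density of N(0, 1) at 0, i.e. 1 / sqrt(2 * pi) = 0.3989423
pnorm(1.96)     # cumulative probability P(X <= 1.96), about 0.975
qnorm(0.975)    # quantile function, the inverse of pnorm: about 1.96

set.seed(42)
x <- rnorm(1000, mean = 0, sd = 1)   # 1000 random draws

# Histogram on the density scale, with the theoretical curve overlaid
hist(x, breaks = 30, freq = FALSE, main = "1000 draws from N(0, 1)")
grid <- seq(-4, 4, length.out = 200)
lines(grid, dnorm(grid), lwd = 2)

# Cumulative distribution function as a line plot
plot(grid, pnorm(grid), type = "l", xlab = "x", ylab = "P(X <= x)")
```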

Lecture 9: R functions, scoping and algorithmics

  1. Writing functions
  2. Variable scoping
  3. Top-down recursive approaches (examples)
  4. Bottom-up dynamic programming approaches (examples)
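The lecture's own examples are not reproduced in this syllabus; Fibonacci numbers are used below as an assumed stand-in because they show the contrast clearly. The top-down version recomputes the same subproblems many times, while the bottom-up version fills a table once. The counter at the end illustrates lexical scoping. All function names are hypothetical.

```r
# Top-down: direct recursion; recomputes the same subproblems many times
fib_rec <- function(n) {
  if (n <= 2) return(1)
  fib_rec(n - 1) + fib_rec(n - 2)
}

# Bottom-up dynamic programming: fill a table from the smallest subproblems upwards
fib_dp <- function(n) {
  if (n <= 2) return(1)
  f <- numeric(n)
  f[1] <- 1
  f[2] <- 1
  for (i in 3:n) {
    f[i] <- f[i - 1] + f[i - 2]
  }
  f[n]
}

fib_rec(20)   # 6765
fib_dp(20)    # 6765

# Scoping: a function looks up free variables in the environment where it was defined
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # <<- assigns to 'count' in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()   # 1
counter()   # 2
```

The recursive version makes a number of calls that grows exponentially with n, whereas the table-based version does n - 2 additions.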

Lecture 10: Input/output; packages; Bioconductor

  1. Reading and writing to and from R
  2. R libraries and packages
  3. The Bioconductor Project
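A minimal round trip through a CSV file using base R, written to a temporary file so nothing is left behind. The table contents are invented. The commented package lines show the current Bioconductor installation route through the CRAN package BiocManager (an assumption about your setup; courses from this period may have used an older installer), with limma as an example package.

```r
# Toy expression table, invented for illustration
expr <- data.frame(gene = c("TP53", "ESR1"), s1 = c(5.2, 8.1), s2 = c(4.9, 7.7),
                   stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")          # temporary file path
write.csv(expr, path, row.names = FALSE)    # write as comma-separated text
back <- read.csv(path, stringsAsFactors = FALSE)
str(back)                                   # same columns and types as 'expr'

# Packages: install once, then load with library() in each session.
# install.packages("BiocManager")
# BiocManager::install("limma")
# library(limma)
```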

Module 2 – RNA Level Analysis of Breast Carcinoma

Lecture 1: Class Discovery – discovering subtypes in breast cancer mRNA expression data.

  1. Gene clusters
  2. Patient clusters: M1.L2 Breast Cancer Subtypes
  3. Distance measures (Euclidean and Pearson correlation distance)
  4. k-means algorithm (why multivariate, multi-gene analysis has clear advantages over single-gene analysis)
  5. Hierarchical clustering.
  6. Visualization through heatmaps.
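A sketch of the class-discovery steps on a simulated matrix, using only the stats package. The course's heatmap.simple() from hucMini.R is not shown here; stats::heatmap is used instead. The matrix size, the simulated subtype (patients p7 to p12 shifted upwards in genes g1 to g20) and the average-linkage choice are all assumptions for illustration.

```r
set.seed(7)
# Simulated expression matrix: 40 genes (rows) x 12 patients (columns).
# Patients p7-p12 form a hypothetical second subtype with higher values in genes g1-g20.
x <- matrix(rnorm(40 * 12), nrow = 40,
            dimnames = list(paste0("g", 1:40), paste0("p", 1:12)))
x[1:20, 7:12] <- x[1:20, 7:12] + 4

# Distances between patients (columns of x)
d_euc <- dist(t(x), method = "euclidean")   # Euclidean distance
d_cor <- as.dist(1 - cor(x))                # Pearson correlation distance, 1 - r

# k-means on patients, asking for 2 clusters with 10 random starts
km <- kmeans(t(x), centers = 2, nstart = 10)
km$cluster

# Hierarchical clustering of patients with average linkage, cut into 2 groups
hc <- hclust(d_euc, method = "average")
cutree(hc, k = 2)

# Heatmap: rows and columns reordered by hierarchical clustering on correlation distance
heatmap(x, distfun = function(m) as.dist(1 - cor(t(m))), scale = "none")
```

Either distance object (d_euc or d_cor) can be passed to hclust(); Euclidean distance responds to absolute expression levels, while correlation distance responds to the shape of the profile.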

For this lecture you will need hucMini.R (for the heatmap.simple() code), available from Git.

Lecture 2: Class Prediction – Classifying patient clinical outcome.

You will need naiveBayes.R for this lecture (from Git).

  • Centroid-based methods
  • Naive Bayes classifiers
  • Cross-validation
  • Confounding in predictions
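The course's naiveBayes.R is not reproduced here. As a stand-in, this sketch shows the simplest centroid-based method together with leave-one-out cross-validation on simulated data; the function name nearest_centroid, the outcome labels and the effect size are hypothetical. A naive Bayes classifier would replace the distance with per-class, per-gene likelihoods, but the cross-validation loop would be the same.

```r
set.seed(3)
# Simulated data: 20 patients x 5 genes with a hypothetical good/poor outcome label
n <- 20
y <- factor(rep(c("good", "poor"), each = n / 2))
x <- matrix(rnorm(n * 5), nrow = n)
x[y == "poor", 1:2] <- x[y == "poor", 1:2] + 2   # genes 1-2 differ between classes

# Nearest-centroid classifier: average each class, assign a sample to the closest centroid
nearest_centroid <- function(train_x, train_y, test_x) {
  classes <- levels(train_y)
  centroids <- t(sapply(classes, function(k) colMeans(train_x[train_y == k, , drop = FALSE])))
  # Euclidean distance from every test sample (rows) to every centroid (columns)
  d <- apply(centroids, 1, function(cen) sqrt(rowSums(sweep(test_x, 2, cen)^2)))
  d <- matrix(d, nrow = nrow(test_x))   # keep a matrix even for a single test sample
  factor(classes[apply(d, 1, which.min)], levels = classes)
}

# Leave-one-out cross-validation: each patient is predicted by a model that never saw it
pred <- factor(character(n), levels = levels(y))
for (i in seq_len(n)) {
  pred[i] <- nearest_centroid(x[-i, ], y[-i], x[i, , drop = FALSE])
}
accuracy <- mean(pred == y)
table(predicted = pred, actual = y)
accuracy
```

Holding each patient out before fitting is what keeps the accuracy estimate honest; computing centroids on all patients and then scoring the same patients would overstate performance.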

Lecture 3: Measuring performance and a brief introduction to survival analysis

  • True/false positives and negatives
  • Accuracy; Product of Accuracy
  • Kaplan-Meier
  • log-rank test
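A sketch of the counting behind these measures, on invented data. The first part tallies true/false positives and negatives for a hypothetical relapse prediction; the second computes the Kaplan-Meier product-limit estimate by hand. The function name km_estimate is hypothetical; in practice the survival package's survfit(Surv(time, status) ~ 1) gives the Kaplan-Meier curve, and survdiff(Surv(time, status) ~ group) runs the log-rank test.

```r
# Invented predicted vs actual outcomes (TRUE = relapse)
actual    <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
predicted <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

tp <- sum(predicted & actual)     # true positives: 3
tn <- sum(!predicted & !actual)   # true negatives: 3
fp <- sum(predicted & !actual)    # false positives: 1
fn <- sum(!predicted & actual)    # false negatives: 1
accuracy <- (tp + tn) / length(actual)   # 0.75
sensitivity <- tp / (tp + fn)            # 0.75
specificity <- tn / (tn + fp)            # 0.75

# Kaplan-Meier: at each event time, multiply by (1 - events / patients still at risk)
follow_up <- c(5, 8, 8, 12, 15, 20, 22, 30)   # invented follow-up times (months)
status    <- c(1, 1, 0, 1, 0, 1, 0, 1)        # 1 = event observed, 0 = censored

km_estimate <- function(follow_up, status) {
  event_times <- sort(unique(follow_up[status == 1]))
  at_risk <- sapply(event_times, function(s) sum(follow_up >= s))
  events <- sapply(event_times, function(s) sum(follow_up == s & status == 1))
  data.frame(time = event_times, n_risk = at_risk, n_event = events,
             surv = cumprod(1 - events / at_risk))
}
km_estimate(follow_up, status)   # surv: 0.875 0.75 0.6 0.4 0
```

Censored patients never contribute an event, but they do count in the at-risk totals up to their last follow-up, which is what distinguishes this estimate from a simple proportion.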

Lecture 4: Pathway Analysis

  • Hypergeometric test
  • Fisher’s Exact Test
  • Kolmogorov-Smirnov Test
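A sketch of the over-representation question behind the first two tests: given a list of differentially expressed genes, does a pathway contain more of them than expected by chance? The counts are invented. The hypergeometric upper tail and a one-sided Fisher's exact test on the corresponding 2 x 2 table give the same p-value. The Kolmogorov-Smirnov variant, which compares the ranks of pathway genes against all other genes, is not shown.

```r
# Invented counts, for illustration only
N <- 20000   # genes measured
K <- 100     # genes annotated to the pathway
n <- 500     # genes in the differentially expressed list
k <- 12      # differentially expressed genes that fall in the pathway

n * K / N    # overlap expected by chance: 2.5

# Hypergeometric test: P(overlap >= k) when n genes are drawn at random
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same question as a one-sided Fisher's exact test on a 2 x 2 table
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2,
              dimnames = list(c("in list", "not in list"),
                              c("in pathway", "not in pathway")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

c(hypergeometric = p_hyper, fisher = p_fisher)   # the two p-values agree
```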

Module 3 – DNA Level Investigations of Breast Carcinoma

Lecture 1: Cancer Genomes and Next Generation Sequencing (NGS)

Links to bioinformatics tools:

Links to relevant genomics:

Links to relevant biology and medicine:

  • Course from C Kim, K Haigis (MIT): Cancer.
  • Moncunill V et al. (2014) Comprehensive characterization of complex structural variations by directly comparing genome sequence reads. Nature Biotechnology 32, 1106-1112. PMID:
  • Helleday T, Eshtad S, Nik-Zainal S (2014) Mechanisms underlying mutational signatures in human cancers. Nature Reviews Genetics 15, 585-598.

Lecture 2: Germline and Somatic Variations

  • MuTect
  • VarScan
  • SNVMix, JointSNVMix

Lecture 3: RNA-seq.

Lecture 4: Tumoral Heterogeneity, Clonal Complexity and Evolution.

Links to related material:

  • molecular evolution
  • tumoral evolution
  • tumoral heterogeneity
  • tumoral phylogenies