COMP-364: Tools for the Life Sciences

TRF 10:35am-11:25am

ENGTR – Trottier Building 2110
Jan 5th 2015 – April 14th 2015
Prof. Mike Hallett
Location: Bellini Building
Office Hours: HERE

Bioinformatics is the use of computation, statistics and mathematics to investigate problems and test hypotheses in biological systems and disease. This course aims to provide students from the life sciences and clinical studies (e.g. biology, cell biology, biochemistry, immunology, physiology) with instruction in the basic techniques of bioinformatics. The course makes extended use of bioinformatic applications related to breast cancer, since this disease has been extensively investigated using modern genomics and a rich toolkit of bioinformatic methods exists for it.

The course assumes no previous experience in computer science, statistics or genomics, although some familiarity with these areas would certainly help. Regardless, students will leave this course with the ability to program in R, a computer language specifically designed for statistics with a long history of application in bioinformatics. Students will learn specific techniques in the analysis of DNA information (single nucleotide polymorphisms, copy number variations, chromosomal aberrations, association studies), RNA expression (class discovery, class distinction, class prediction), pathway analysis, survival analysis, and integration of different levels of gene and post-gene regulation. Within these applications, students will be introduced to some genomic technologies such as Next Generation Sequencing (NGS), with emphasis on DNA-, exome- and RNA-seq, microarrays, and protein expression arrays.

All of these concepts from bioinformatics are developed using tools from computation and statistics, including programming, optimization, hypothesis testing, probabilistic models, association tests, and some basic statistical tests. Additional concepts from computer science include the basics of programming, software versioning systems (e.g. Git), cloud computing, recursion, and introductory aspects of algorithm design.

Teaching Assistant
Daniel Del Balso
Room 432 Bellini Building
Office Hours: T,H 30 min. before the lecture, in the classroom. See emails for tutorials/special office hours.

Information regarding computer infrastructure for the course:

Course Notes

Links to Software and Tutorials

Textbooks, Manuals and On-line Courses (electronic)

Video Series and On-line Courses:


Related and Alternative Software

Course Evaluation:

  • Assignment 0 (due Friday, January 23rd, 2015) 10% of overall grade
  • Assignment 1 (due Friday, February 13th, 2015) 10% of overall grade
  • Assignment 2 (due Tuesday, February 17th, 2015) 10% of overall grade
  • Midterm (February 27th, 2015) 20% of overall grade
  • Assignment 3 (due March 10th, 2015) 10% of overall grade
  • Assignment 4 (due March 24th, 2015) 10% of overall grade
  • Final Exam (TBA) 30% of overall grade

Module 1 – The Basics and Programming.

Lecture 1 What is bioinformatics? And some basic resources.

Links to related material:

Lecture 2 Breast cancer informatics: the example for the course.

Links to related material:

Lecture 3 R basics: basic manipulation of data, vectors, strings.

Links to related material:


Lecture 4 R basics: data flow and data-structures.

Links to related material:

Lecture 5 R basics: functions, parameters, scoping, libraries and packages.

Concepts from Statistics used in Bioinformatics:

Lecture 6 Bioconductor: examples of BioC packages.

Links to related material:

Module 2 – RNA Level Analysis (of Breast Carcinoma)

Lecture 7 - The Bioinformatics of Next Generation Sequencing: RNA-seq

Links to Relevant Genomics:

Links to Bioinformatic Concepts:

Lecture 8 – Class Discovery

Links to Relevant Genomics: RNA-seq, microarrays

Links to Relevant Biology: Breast cancer subtypes

Links to Bioinformatic Concepts: distance measures (Pearson correlation distance, Euclidean distance), clustering algorithms (Ward's, X, k-means), measures of cluster quality
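The two distance measures named above can behave very differently on the same data. The following is a minimal sketch in Python (the course itself uses R), written with the standard library only. The function names and the toy expression profiles are hypothetical and chosen for illustration; they are not course material.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation_distance(x, y):
    """1 minus the Pearson correlation of the two profiles.

    The value is 0 for perfectly correlated profiles and 2 for perfectly
    anti-correlated ones. Assumes neither profile is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)

# Hypothetical profiles: sample_b follows the same pattern as sample_a
# across four genes, but every value is shifted up by 10.
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [11.0, 12.0, 13.0, 14.0]

print("Euclidean:", euclidean_distance(sample_a, sample_b))
print("Pearson correlation distance:", pearson_correlation_distance(sample_a, sample_b))
```

In this toy case the Euclidean distance is large because of the constant shift, while the correlation distance is zero because the two profiles rise and fall together. Which behaviour is preferable depends on whether absolute expression level or expression pattern matters for the clustering question at hand.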

Lecture 9 – Class Distinction

Links to Bioinformatic Concepts:

  • linear models (LIMMA)
  • multiple testing revisited: Family-wise error rate (FWER) and false discovery rate (FDR)
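As a rough illustration of the two error rates listed above, the sketch below adjusts a small set of p-values with the Bonferroni correction (which controls the FWER) and the Benjamini-Hochberg procedure (which controls the FDR). It is written in Python with the standard library only, whereas the course uses R; the function names and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, e.g. one per gene tested for differential expression.
raw = [0.01, 0.04, 0.03, 0.005]

print("Bonferroni:", bonferroni(raw))
print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

The Benjamini-Hochberg values are never larger than the Bonferroni values for the same input, which reflects that controlling the FDR is a less stringent requirement than controlling the FWER.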

Lecture 10 – Class Prediction

Links to Relevant Biology:

Links to Bioinformatic Concepts:

  • Centroid-based methods
  • Naive Bayes classifiers
  • Cross-validation
  • Confounding in predictions
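To make two of the concepts listed above concrete, here is a minimal sketch of a nearest-centroid classifier evaluated with leave-one-out cross-validation. It is written in Python (3.8 or later, for `math.dist`) using only the standard library, while the course itself works in R. The function names, the two-gene toy data and the class labels are all hypothetical and are not taken from the course.

```python
import math

def centroid(samples):
    """Per-feature mean of a list of equal-length samples."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def train_centroids(samples, labels):
    """Compute one centroid per class label."""
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    return {label: centroid(rows) for label, rows in by_class.items()}

def predict(centroids, sample):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))

def leave_one_out_accuracy(samples, labels):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train_centroids(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Hypothetical data: expression of two genes in six tumour samples,
# with made-up class labels used only for illustration.
samples = [[1.0, 5.0], [1.2, 4.8], [0.8, 5.2],
           [5.0, 1.0], [4.8, 1.2], [5.2, 0.8]]
labels = ["luminal", "luminal", "luminal", "basal", "basal", "basal"]

print("Leave-one-out accuracy:", leave_one_out_accuracy(samples, labels))
```

The key point of the cross-validation loop is that the held-out sample never contributes to the centroids used to classify it, so the reported accuracy is an estimate of performance on unseen samples rather than on the training data.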

Lecture 11 - Survival Analysis and Related Techniques

Links to Relevant Biology:

  • Oncotype DX: predicting benefit from chemotherapy

Links to Bioinformatic Concepts:

  • log-rank test
  • Cox proportional hazards model


Lecture 12 - Pathway Analysis Slides

Links to Relevant Biology:

  • Breast cancer subtypes

Links to Bioinformatic Concepts:

Lecture 13 - R/Bioconductor: visualizations (plotting distributions, ggplot, heatmaps).

Links to related material:

with Sushi R package

Module 3 – DNA level analysis (of breast carcinoma)

Lecture 14 – Investigating Genomic Information.

Links to bioinformatics tools and concepts:

Links to relevant genomics:

Links to relevant biology and medicine:

Lecture 15 – The bioinformatics of next generation sequencing (NGS): DNA-seq

Links to Relevant Bioinformatics:

Links to Relevant Genomics:

Lecture 16 – Analysis of germline mutations: risk factors, association tests.

Links to relevant biology:

Links to Relevant Genomics:

  • Genome Wide Association Study (GWAS)

Links to bioinformatic and statistical concepts:  

Lecture 17 - Somatic mutations: probabilistic models.

Links to Relevant Biology:

  • Somatic mutations
  • chromosomal instability

Links to Bioinformatic Concepts:

  • MuTect
  • VarScan
  • Probabilistic models


Lecture 18 - Tumoral Heterogeneity & Evolution.

Links to related material:

  • molecular evolution
  • tumoral evolution
  • tumoral heterogeneity
  • tumoral phylogenies

Lecture 8 Computing in the clouds and beyond

Links to related material:

Module 4 – Other types of data & integration

Lecture 19 - Integration: DNA + RNA = ?

Links to Relevant Biology: chromosomal instability and gene expression, IntClusters

Links to Relevant Genomics: Curtis et al.

Links to Bioinformatic Concepts: ?


Lecture 20 - Epigenomics Slides

Links to Relevant Biology: epigenetic modifications

Links to Relevant Genomics: bisulphite sequencing

Links to Bioinformatic Concepts: Batman?


Lecture 21 - Protein Microarrays Slides

Links to Relevant Biology: Phosphorylation

Links to Relevant Genomics: protein microarrays

Links to Bioinformatic Concepts: ?


Lecture 22 - Chemical Genomics Slides

Links to Relevant Biology: IOC, dose-response, checkerboards etc.

Links to Relevant Genomics: Connectivity Map

Links to Bioinformatic Concepts: LINCS


Lecture 23 – Network Biology Slides

Links to Relevant Biology:  

Links to Relevant Genomics:

Links to Bioinformatic Concepts: