COMP-364: Tools for the Life Sciences

 Under Construction. To be offered in Winter Semester 2015.
TRF 10:35am-11:25am
ENGTR – Trottier Building 2110
Jan 5th 2015 – April 14th 2015 
Prof. Mike Hallett
Location: Bellini Building
Office Hours: HERE


Bioinformatics is the use of computation, statistics and mathematics to investigate problems and test hypotheses in biological systems and disease. This course aims to provide students from the life sciences and clinical studies (e.g. biology, cell biology, biochemistry, immunology, physiology) with instruction in the basic techniques of bioinformatics. The course makes extended use of bioinformatic applications related to breast cancer, since this disease has been extensively investigated using modern genomics and a rich toolkit of bioinformatic methods exists for it.

The course assumes no previous experience in computer science, statistics or genomics, although some familiarity with these areas would certainly help. Regardless, students will leave this course with the ability to program in R, a programming language designed specifically for statistics, with a long history of application in bioinformatics. Students will learn specific techniques in the analysis of DNA information (single nucleotide polymorphisms, copy number variations, chromosomal aberrations, association studies), RNA expression (class discovery, class distinction, class prediction), pathway analysis, survival analysis, and integration of different levels of gene and post-gene regulation. Within these applications, students will be introduced to some genomic technologies such as Next Generation Sequencing (NGS), with emphasis on DNA-, exome- and RNA-seq, microarrays, and protein expression arrays.

All of these concepts from bioinformatics are developed using tools from computation and statistics, including programming, optimization, hypothesis testing, probabilistic models, association tests, and some basic statistical tests often used in bioinformatics.


TA Information here.


 

Links to Software and Tutorials

Alternatives and additional helpful items….

Textbooks, Manuals and On-line Courses (electronic)

Alternatives and additional helpful readings….

Video Series and On-line Courses:

Data

Related and Alternative Software


Course Evaluation:

Four assignments (10% each)

        • Assignment 1 (due date ?)
        • Assignment 2 (due date ?)
        • Assignment 3 (due date ?)
        • Assignment 4 (due date ?)

Midterm (20%) (date here)

Final Exam (40%)


Syllabus

*** Each lecture listed below represents approximately 1.5 hours of teaching.


Module 1 – The Basics and Programming.

Lecture 1 Overview of course & basic bioinformatic resources.

Class slides

Links to related material:


Lecture 2 Breast cancer informatics: the example for the course.

Class slides

Links to related material:


Lecture 3 R Basics: basic manipulation of data, vectors, strings.

Class slides

Links to related material:


Lecture 4 R basics: data flow and data-structures.

Class slides

Links to related material: 


Lecture 5 R basics: functions, parameters, scoping, libraries and packages.

Class slides

Concepts from Statistics used in Bioinformatics:


Lecture 6 Bioconductor: examples of BioC packages.

Class slides

Links to related material: 


Lecture 7 R/Bioconductor: visualizations (plotting distributions, ggplot, heatmaps).

Class slides

Links to related material:


Lecture 8 Computing in the clouds and beyond.

Class slides

Links to related material:


Module 2 – DNA Level Analysis (of Breast Carcinoma)

Lecture 9 Investigating Genomic Information.

Class slides

Links to bioinformatics tools and concepts:

Links to relevant genomics:

Links to relevant biology and medicine: 


Lecture 10 The bioinformatics of next generation sequencing (NGS): DNA-seq

Class slides

Links to Relevant Bioinformatics: 

Links to Relevant Genomics: 


Lecture 11 Analysis of germline mutations: risk factors, association tests.

Class slides

 Links to relevant biology:  

Links to Relevant Genomics: 

  • Genome Wide Association Study (GWAS)

Links to bioinformatic and statistical concepts:  


Lecture 12 Somatic mutations: probabilistic models.

Slides (link to slides here)

Links to Relevant Biology:

    • Somatic mutations
    • chromosomal instability 

Links to Bioinformatic Concepts:

    • MuTect
    • VarScan
    • Probabilistic models.

 

Lecture 13 Tumoral Heterogeneity & Evolution.

Slides (link to slides here)

Links to related material:

    • molecular evolution
    • tumoral evolution
    • tumoral heterogeneity
    • tumoral phylogenies

Module 3 – RNA Level Analysis (of Breast Carcinoma)

Lecture 14 Title: The Bioinformatics of Next Generation Sequencing: RNA-seq

Class slides

Links to Relevant Genomics: 

Links to Bioinformatic Concepts:


Lectures 15, 16 Title: Class Discovery

Slides (link to slides here)

Links to Relevant Genomics: RNA-seq, microarrays

Links to Relevant Biology: Breast cancer subtypes

Links to Bioinformatic Concepts: distance measures (Pearson correlation distance, Euclidean distance), clustering algorithms (Ward's, X, k-means), measures of cluster quality


Lecture 17 Class Distinction

Slides (link to slides here)

Links to Bioinformatic Concepts:

    • linear models (LIMMA)
    • multiple testing revisited: Family-wise error rate (FWER) and false discovery rate (FDR)

 


Lectures 18, 19 Class Prediction

Class slides

Links to Relevant Biology:

Links to Bioinformatic Concepts:

    • Centroid-based methods
    • Naive Bayes classifiers
    • Cross-validation
    • Confounding in predictions

 


Lecture 20 Title: Survival Analysis and Related Techniques

Slides (link to slides here)

Links to Relevant Biology:

    • Oncotype Dx: predicting benefit to chemotherapy

Links to Bioinformatic Concepts:

    • log-rank test
    • Cox proportional hazards model

 

Lectures 21, 22 Title: Pathway Analysis

Slides (link to slides here)

Links to Relevant Biology: Breast cancer subtypes

 

Links to Bioinformatic Concepts:


Module 4 – Other types of data & integration 

Lecture 22 Title: Integration: DNA + RNA = ?

Slides (link to slides here)

Links to Relevant Biology: chromosomal instability and gene expression, IntClusters

Links to Relevant Genomics: Curtis et al.

Links to Bioinformatic Concepts: ?

 


Lecture 23 Title: Epigenomics

Slides (link to slides here)

Links to Relevant Biology: epigenetic modifications

Links to Relevant Genomics: bisulfite sequencing

Links to Bioinformatic Concepts: Batman?

 


Lecture 24 Title: Protein Microarrays

Slides (link to slides here)

Links to Relevant Biology: Phosphorylation

Links to Relevant Genomics: protein microarrays

Links to Bioinformatic Concepts: ?

 


Lecture 25 Title: Chemical Genomics

Slides (link to slides here)

Links to Relevant Biology: IOC, dose-response, checkerboards etc.

Links to Relevant Genomics: Connectivity Map

Links to Bioinformatic Concepts: LINCS


 

Lecture 26 Title: Network Biology

Slides (link to slides here)

Links to Relevant Biology:

Links to Relevant Genomics:

Links to Bioinformatic Concepts: