
The researchers within our group have been primarily interested in advancing our knowledge of breast cancer through the use of bioinformatics, systems epidemiology, and systems biology. Below we describe the different components, questions and end-points of our research.

Fundamental Questions of Breast Cancer Informatics

Prognosis and Response to Therapy.

There has been a sustained effort to identify markers of prognosis in women diagnosed with invasive breast cancer (IBC), a highly prevalent disease that accounts for 14% of all cancer deaths in women. The estimation of prognosis at time of diagnosis relies primarily upon clinicopathological parameters such as tumor size, histological grade, stage, and lymph node (LN) infiltrate, and upon molecular properties including expression of the Estrogen Receptor (ER), Progesterone Receptor (PR), and the Human Epidermal growth factor Receptor 2 (HER2) (Reis-Filho and Pusztai, 2011). Prognostic insight can in turn provide predictive insight with respect to patient benefit from chemo-, endocrine and targeted therapies.

Many (>100) gene signatures have been reported to have prognostic capacity in IBC. The signatures were typically derived either by manual curation of specific molecular processes (e.g. the Antigen Presentation and Processing pathway, APP), from experimental perturbations in cell lines and transgenic mouse models of the disease (e.g. the Her2- and Met-related transgenic mice of our collaborators W. Muller and M. Park at McGill), or directly from gene expression profiles of IBC samples by contrasting patients with observed good and poor outcomes (e.g. our work with gene expression profiles of the tumor microenvironment of IDC).

A classification of good prognosis by a signature suggests that an individual will respond positively to standard-of-care for their tumor type, whereas a prediction of poor prognosis suggests the need for an alternative regimen.

The BCI has long been interested in the identification of prognostic gene signatures. In many cases, our gene signatures are first derived in model systems (e.g. transgenic murine models) and then cross-checked in human clinical samples of IDC.

The marginal overlap and the differences in the underlying biological processes polled by these signatures have generated criticisms regarding experimental design and lack of standardized bioinformatics techniques (Ioannidis et al., 2009), leading to speculation that almost all genes and processes are prognostic in IBC (Venet et al., 2011). However, the root cause of the myriad of dissimilar signatures may be primarily due to deep, structural interdependencies between a patient’s clinicopathological profile, tumor subtype and prognosis (Iwamoto and Pusztai, 2010).

We have been actively publishing work that attempts to clarify the nature and ubiquity of these structural interdependencies between patient attributes, tumor subtype, and prognosis. 

Figure 3

A comparison of over 100 prognostic gene signatures across thousands of invasive ductal carcinomas. Blue represents good outcome while red represents bad outcome. Dark shading means that the signature predicted that patient correctly (True Positives and True Negatives) while light shading indicates an incorrect prediction (False Positives and False Negatives). Clinical variables, treatment and other information are provided below. The gray shaded region delimits signatures that did not outperform random gene sets. Notice how some patients are almost always predicted correctly whereas others are almost always predicted incorrectly.

For example, in Suderman, Tofigh et al. 2014, the predictions made by published prognostic gene signatures for IDC are compared en masse patient by patient, in order to identify if and where additional progress is possible. Using essentially all available data, this investigation establishes that there is large confounding in existing signatures between clinicopathological variables, subtype and clinical outcome. We establish that approximately 20% of IDC patients have a prognosis which appears inherently difficult to predict at time of diagnosis.
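The patient-by-patient comparison can be illustrated with a minimal Python sketch. It is not the published analysis: the signature names, the toy predictions and the 0.5 threshold are hypothetical, and outcomes are coded as 1 for poor and 0 for good purely for illustration.

```python
from typing import Dict, List


def per_patient_accuracy(predictions: Dict[str, List[int]],
                         outcomes: List[int]) -> List[float]:
    """For each patient, the fraction of signatures whose call matches the observed outcome."""
    n_signatures = len(predictions)
    fractions = []
    for i, observed in enumerate(outcomes):
        correct = sum(1 for calls in predictions.values() if calls[i] == observed)
        fractions.append(correct / n_signatures)
    return fractions


def hard_to_predict(fractions: List[float], threshold: float = 0.5) -> List[int]:
    """Indices of patients that fewer than `threshold` of the signatures predict correctly."""
    return [i for i, f in enumerate(fractions) if f < threshold]


# Hypothetical toy data: three signatures, four patients (1 = poor outcome, 0 = good).
toy_predictions = {
    "sig_A": [0, 1, 1, 0],
    "sig_B": [0, 1, 1, 1],
    "sig_C": [0, 0, 1, 0],
}
toy_outcomes = [0, 1, 0, 0]

fractions = per_patient_accuracy(toy_predictions, toy_outcomes)
difficult = hard_to_predict(fractions)
print(fractions, difficult)
```

In this toy example one patient is misclassified by every signature, which is the kind of consistently mispredicted patient that the figure highlights.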

Predicting patient benefit and response to a specific therapy.

One way that prognostic signatures can be “converted” into predictors of response to therapy is via the so-called “Paik trick”:

  1. Use patient outcome (e.g. 5 year survival without distal recurrence) as the measure.
  2. Identify a gene signature and build a classifier (e.g. MammaPrint, Oncotype DX).
  3. Place all patients that are predicted as poor outcome into one cohort (High Risk).
  4. Place all patients that are predicted as good outcome into a second cohort (Low Risk).
  5. Within each cohort separately, perform survival analysis (e.g. Kaplan-Meier curves) examining the differences between treatment and no treatment arms.
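The steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the `Patient` fields, the risk-score cutoff and the toy cohort are hypothetical, and the Kaplan-Meier estimate is computed by hand rather than with a survival-analysis library. A real analysis would also test the difference between arms formally (for example with a log-rank test) rather than compare point estimates.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Patient:
    risk_score: float  # output of a prognostic classifier (hypothetical scale)
    treated: bool      # True if the patient received the additional therapy
    time: float        # follow-up time in years
    event: bool        # True if the outcome event was observed, False if censored


def km_survival(patients: List[Patient], horizon: float) -> float:
    """Kaplan-Meier estimate of the survival probability at `horizon`."""
    survival = 1.0
    event_times = sorted({p.time for p in patients if p.event and p.time <= horizon})
    for t in event_times:
        at_risk = sum(1 for p in patients if p.time >= t)
        events = sum(1 for p in patients if p.event and p.time == t)
        survival *= 1.0 - events / at_risk
    return survival


def paik_trick(patients: List[Patient], cutoff: float,
               horizon: float = 5.0) -> Dict[str, Dict[str, float]]:
    """Split patients into risk cohorts, then compare treatment arms within each cohort."""
    cohorts = {
        "high_risk": [p for p in patients if p.risk_score >= cutoff],
        "low_risk": [p for p in patients if p.risk_score < cutoff],
    }
    return {
        label: {
            "treated": km_survival([p for p in cohort if p.treated], horizon),
            "untreated": km_survival([p for p in cohort if not p.treated], horizon),
        }
        for label, cohort in cohorts.items()
    }


# Hypothetical toy cohort: (risk_score, treated, time, event).
toy_patients = [
    Patient(0.9, True, 6.0, False), Patient(0.8, True, 2.0, True),
    Patient(0.7, True, 7.0, False), Patient(0.6, True, 8.0, False),
    Patient(0.9, False, 1.0, True), Patient(0.8, False, 3.0, True),
    Patient(0.7, False, 4.0, True), Patient(0.6, False, 9.0, False),
    Patient(0.1, True, 6.0, False), Patient(0.2, True, 3.0, True),
    Patient(0.3, True, 7.0, False), Patient(0.2, True, 8.0, False),
    Patient(0.1, False, 6.0, False), Patient(0.2, False, 4.0, True),
    Patient(0.3, False, 7.0, False), Patient(0.4, False, 9.0, False),
]

result = paik_trick(toy_patients, cutoff=0.5)
print(result)
```

In this toy cohort the treated and untreated arms have the same estimated 5-year survival in the low-risk group but differ in the high-risk group, which is the pattern the procedure is designed to expose.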

[van de Vijver, M. J., He, Y. D., van 't Veer, L. J., Dai, H., Hart, A. A. M., Voskuil, D. W., et al. (2002). A Gene-Expression Signature as a Predictor of Survival in Breast Cancer. New England Journal of Medicine, 347(25), 1999–2009.]

For both the Paik et al. (aka Oncotype DX) and van ’t Veer et al. (aka MammaPrint) gene signatures, the “Paik trick” works. In the case of Oncotype DX, the survival curves in the Low Risk cohort are indistinguishable between tamoxifen plus chemotherapy (TAM+CHEMO) and tamoxifen alone (TAM). Therefore, there is reason not to give chemotherapy to Low Risk patients. Symmetric logic applied to the High Risk cohort finds a survival difference between TAM+CHEMO and TAM, with the former showing a better survival rate. Therefore, the additional treatment is warranted.

Clinical utility for a few prognostic signatures has been established and translated to the clinic (Hornberger et al., 2012) including, for example, Oncotype DX, which assists in determining which ER+/LN- patients may benefit from additional adjuvant chemotherapy (Paik et al., 2004).

We have a long-standing interest in the development of signatures that are predictive of response or benefit from a specific therapy.

This includes collaborations with Dr. S. Mader at the University of Montreal that investigate signatures for response to Tamoxifen or to alternatives such as aromatase inhibitors. On our resource (Bresect), for each signature that has been shown to be prognostic in IDC and every treatment for which we have data, we provide comparisons of survival characteristics between different therapy regimens within the low and high risk groups defined by that signature.

Characteristics of the tumor such as lymph node status, stage, and grade have long been examined for their prognostic capacity either in isolation or in combination with molecular markers such as ER, PR, Her2 or Ki67. Similarly, characteristics of patients such as age and BMI have also been incorporated into prognostic and predictive models. Especially now with the advent of next-generation sequencing (NGS) of tumors, germline and sporadic polymorphisms and copy number variations are being heavily investigated for their prognostic and predictive potential. However, there are many additional types of information, including patient exposure and lifestyle, that have largely gone unobserved in the vast majority of breast cancer genomic studies to date. Epidemiology has long observed that breast tumorigenesis is likely due in the majority of cases to specific aspects of patient lifestyle and exposure. This oversight in studies to date may be due to the difficulty of designing studies that collect sufficient population-based data and to the cost of such large studies.

The BCI is interested in the design, execution and analysis of studies that incorporate patient lifestyle and exposure information towards prognostic and predictive clinical end-points. 

Subtyping and Patient Stratification.

Arguably, the primary contribution of breast cancer genomics to date has been a deeper appreciation of IBC heterogeneity (see for example Weigelt et al., 2010). Although the four clinical subtypes defined by ER and HER2 status have long been recognized as distinct forms of the disease, early genomic studies underscored their vast differences at the molecular level (e.g. Gruvberger et al., 2001; Perou et al., 2000) and stimulated work to identify other markers that capture proliferation, progenitor cell properties, androgen-receptor related signaling and many other biologies (Desmedt et al., 2008; Guedj et al., 2012; Rakha et al., 2010).

Unbiased bioinformatic analyses of profiles generated the so-called intrinsic subtyping scheme, consisting of two subtypes enriched for ER+ tumors (luminal A and luminal B), a HER2+ enriched subtype (HER2-enriched), an ER-/HER2- enriched subtype (basal-like), and a so-called normal-like subtype (Perou et al., 2000; Sorlie et al., 2001). Since the original publications, the intrinsic subtyping scheme has been refined several times (e.g. Parker et al., 2009) and extended to include the claudin-low (CL) class of tumors. CL tumors display a high frequency of metaplastic and medullary differentiation (Prat et al., 2010).

Figure 1

Moreover, other genomics-based subtyping schemes have been proposed including the CIT scheme (Guedj et al., 2012), a scheme specific to “triple negative” (ER-/Her2- and Progesterone Receptor PR-) tumors (Lehmann et al., 2011), a scheme based on joint DNA copy number and RNA expression (Curtis et al., 2012), and others (Haibe-Kains et al., 2012; Jonsson et al., 2010; Wirapati et al., 2008).

The BCI has been very interested in the delineation and exploration of subtyping schemes, and comparisons between the schemes.  

For example, we produced the first stromal subtypes (Finak et al. unpublished but available from our resource page), derived from microarray-based gene expression profiles from laser capture microdissected non-epithelial components of tumors. The subtypes have some relationships with the intrinsic subtypes (which were derived in bulk expression profiles) but some components of the scheme appear to be determined uniquely by the stromal compartments of the tumor. More recently, we presented the so-called hybrid subtypes that combine the clinical (ER, Her2 based) and intrinsic subtypes (Suderman, Tofigh 2014 in revision). 

Recently, we showed that the subtype assigned to a patient via tools such as PAM50 is greatly influenced by the other patients within the dataset. For example, a specific patient’s subtype might change from luminal A to basal-like if the overall percentage of ER+ patients in the dataset decreases. In order to ablate such “relativistic” effects, we developed a bioinformatics strategy entitled AIMS (Absolute assignment of breast cancer Intrinsic Molecular Subtype) that assigns a patient’s subtype without the use of a large panel of additional patient profiles (as is typically required). In particular, we show that our approach, which looks only at a single patient, is absolute (it does not suffer from the relativistic problems that PAM50 and other subtyping tools do) and accurate (i.e. it assigns patient subtype correctly).
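The contrast between a relative and an absolute assignment can be illustrated with a toy Python sketch. It is not the AIMS or PAM50 implementation. It assumes, for illustration only, that the relative call compares a gene with the cohort median and that the absolute call compares two genes within one sample. The gene pair (ESR1, KRT5), the expression values and the two labels are hypothetical.

```python
from statistics import median
from typing import Dict, List

Sample = Dict[str, float]


def cohort_centered_call(sample: Sample, cohort: List[Sample], gene: str = "ESR1") -> str:
    """Relative call: compare the sample's expression with the median of its cohort."""
    center = median(s[gene] for s in cohort)
    return "luminal-like" if sample[gene] > center else "basal-like"


def within_sample_call(sample: Sample, gene_a: str = "ESR1", gene_b: str = "KRT5") -> str:
    """Absolute call: compare two genes measured in the same sample; no cohort is needed."""
    return "luminal-like" if sample[gene_a] > sample[gene_b] else "basal-like"


# Hypothetical expression values (arbitrary log-scale units).
patient = {"ESR1": 8.0, "KRT5": 5.0}

# The same patient placed in two cohorts with different compositions.
cohort_mostly_er_low = [
    {"ESR1": 4.0, "KRT5": 9.0}, {"ESR1": 5.0, "KRT5": 8.0},
    {"ESR1": 6.0, "KRT5": 9.0}, patient,
]
cohort_mostly_er_high = [
    {"ESR1": 9.0, "KRT5": 4.0}, {"ESR1": 10.0, "KRT5": 5.0},
    {"ESR1": 11.0, "KRT5": 4.0}, patient,
]

call_in_low = cohort_centered_call(patient, cohort_mostly_er_low)
call_in_high = cohort_centered_call(patient, cohort_mostly_er_high)
absolute_call = within_sample_call(patient)
print(call_in_low, call_in_high, absolute_call)
```

The cohort-centered call for the same patient flips between the two cohorts, while the within-sample call depends only on the patient's own profile. That is the sense in which a single-sample approach avoids the relativistic effect.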

Clonal complexity and evolution.

The seminal manuscripts from the Sanger Institute by Nik-Zainal et al. (The life history of 21 breast cancers. Cell. 2012 May 25;149(5):994-1007) and now several additional, independent manuscripts have used DNA- and exome-seq to examine intratumoral heterogeneity. The basic idea is to sequence to great depth and identify low penetrant somatic mutations (germline mutations are removed by sequencing matched blood or adjacent normal tissue as in our study).

These low penetrant somatic mutations (with allele frequencies well below 50% in many cases) are believed to exist in only a small fraction of the cells in the sample sequenced. They are then analyzed by bioinformatics and by hand in terms of frequency, chromosomal position, types of genes involved in CNV events, etc. to estimate a phylogenetic past for the tumors.
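How an observed mutation frequency relates to the fraction of tumor cells carrying the mutation can be sketched as follows. This is a simplified illustration, not the method used in the cited studies. It assumes a known tumor purity, a known local copy number in tumor cells, two copies in normal cells and a known number of mutated copies per cell. The read counts and the function names are hypothetical.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of sequencing reads at a locus that carry the mutant allele."""
    return alt_reads / total_reads


def cancer_cell_fraction(vaf: float, purity: float,
                         tumor_copy_number: int = 2, multiplicity: int = 1) -> float:
    """Estimated fraction of tumor cells carrying the mutation (capped at 1.0).

    Simplifying assumptions: normal cells carry two copies of the locus, every
    tumor cell has `tumor_copy_number` copies, and each mutated cell carries
    `multiplicity` mutant copies.
    """
    average_copies = purity * tumor_copy_number + 2.0 * (1.0 - purity)
    return min(vaf * average_copies / (multiplicity * purity), 1.0)


# Hypothetical read counts at two loci in a sample with 80% tumor purity.
subclonal_vaf = variant_allele_frequency(alt_reads=30, total_reads=300)
clonal_vaf = variant_allele_frequency(alt_reads=120, total_reads=300)

subclonal_ccf = cancer_cell_fraction(subclonal_vaf, purity=0.8)
clonal_ccf = cancer_cell_fraction(clonal_vaf, purity=0.8)
print(subclonal_ccf, clonal_ccf)
```

Under these assumptions, a heterozygous mutation present in every tumor cell of a diploid region is expected at an allele frequency of about half the purity. Frequencies well below that suggest the mutation is confined to a subclone.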

The BCI is interested in the tumoral evolution and clonal complexity of breast lesions. We are particularly focused on understanding the evolution of precursors of breast cancer towards invasive states.


Previous efforts related to cancer diagnosis have searched for biomarkers in surrogate tissues such as blood, by measuring the presence/absence of circulating tumor cells or specific macromolecules secreted from tumor cells. To date, such approaches lack sensitivity, and the high rate of false negatives is likely due to the technical challenges associated with detecting an extremely rare species of cell or molecule within the complex heterogeneous tissue of blood. These challenges are likely to remain formidable for both proteomic- and metabolomics-based studies in the foreseeable future. Transcriptional profiling of blood was proposed as an alternative approach, since blood pervades the entire body and is in a constant state of renewal. It is the vehicle by which immune cells circulate between central and peripheral lymphoid organs, and migrate to and from tumor sites. From a technical perspective, the chemical uniformity afforded by RNAs offers clear advantages over protein- or metabolite-based biomarker studies. Several groups including ours (Dumeaux et al. 2010) have defined intra- and inter-individual variability of blood gene expression in healthy individuals and established standardized procedures for blood sample collection and gene expression profiling (Dumeaux et al. 2008).


(c) Copyright Dennis Kunkel Microscopy. Human red blood cells, platelets (green), and T lymphocytes (orange)

Recently, we showed that we can harness the global molecular response in the patient’s blood cells as a screening strategy and presented a method for identifying early breast cancer based on the expression of a small number of genes (n=50) in blood from women (Dumeaux et al. 2014). Importantly, the genes used to detect the presence of a tumor provided mechanistic insight into why gene expression of blood cells is perturbed in individuals harboring a tumor, and suggested which molecular mechanisms are involved in this process, with emerging connections to immune escape in IBC.

Some questions that remain unanswered are how molecular processes are repressed in circulating blood cells by the presence of a specific breast cancer and how early in tumor development gene expression changes in blood cells can be detected. Further analyses of gene expression profiles in the matched tumor tissue, as well as in blood samples collected within 5 years prior to diagnosis, have started and should help clarify these questions.

Types of Breast Lesions and Their End-points

Early precursors of breast cancer.

IBC represents the endpoint of a developmental process that is believed in some cases to progress through stages of increasing proliferation, atypical hyperplasia and carcinoma in situ. Detection of these non-invasive lesions (atypical or in situ) has increased approximately three-fold due to the use of widespread screening programs, which has similarly increased the number of diagnostic biopsies. Although both types of lesions are associated with a higher risk of developing IBC, a large fraction of these abnormalities are not direct precursors of invasive disease and will never become life threatening. Predictors of progression would be useful for both women and their clinicians when deciding about their IBC risk reduction interventions – ranging from active surveillance to surgery with radiotherapy or mastectomy.

The BCI aims at gaining molecular insights into the origins and behavior of non-invasive breast lesions in order to distinguish the higher-risk minority of patients, who are destined to develop IBC and thus need aggressive treatment and follow-up, from the lower-risk majority of patients, who could be directed to more limited interventions.

Atypical ductal hyperplasia and In situ lesions: Progression.

Little is known regarding molecular heterogeneity outside of IBC, especially with respect to atypical lesions. Comparative studies including ours (Muggerud et al. 2010) have shown that in situ carcinoma (DCIS) has a range of molecular subtypes similar to those identified through gene expression profiling of IBC that might be associated with different invasive potentials. For example, DCIS with basal marker expression may give rise to IBC at a higher frequency than does DCIS exhibiting Her2 overexpression. Thus, the apparent imbalance between DCIS and IDC in molecular subtype prevalences suggests that there is no simple linear relationship of progression from DCIS to IBC.

Although valuable, RNA profiling studies to define molecular subtypes typically fail to address intralesional heterogeneity, or the possibility of more than one subtype within a single lesion. A more thorough analysis of intralesional heterogeneity by exome sequencing might identify key driver mutations useful in stratifying risk for future IBC formation. These mutations should be relatively rare and, thus, theoretically easier to target therapeutically than the huge number of mutations found in IBC.

Furthermore, the primary phenotypic consequence of genetic aberrations in non-invasive breast lesions is the evolution from a polyclonal to a monoclonal population of abnormal epithelial cells, which detach from the basement membrane and grow on top of each other, distending ducts or lobules. Accurate models of progression may therefore depend on consideration of the presence and evolution of clonal diversity, which can be estimated by deep genomic sequencing.

The BCI is working with a substantial group of additional scientists and clinicians from McGill (Hallett, Basik, Boileau) and the University of Montreal (Trop, Gaboury, Mader, Robidoux, Mesurolles) to build ways to differentiate between breast lesions that are indolent versus aggressive, and to understand why certain non-invasive forms of the disease do or do not respond to preventive therapies (that prevent the development of an invasive breast cancer).

Invasive tumors.

Integrating Information from Different Cancer-associated Tissues.

The primary breast tumor, its microenvironment, and the patient’s macro-response.


Tumor progression can only be explained by a detailed understanding of both paracrine and systemic signalling cascades that can be captured by blood-based gene signatures.

BC research has largely focused on the intrinsic properties of the tumor proper (TP) in order to develop therapies (e.g. Tamoxifen) targeted against key molecular components (e.g. the estrogen receptor, ER) that drive progression within the neoplastic epithelial cells. However, cancers are not autonomous masses of epithelial cells: they are multicellular systems capable of bidirectional interactions with neighboring non-malignant cells and extracellular components, i.e. the microenvironment (ME). Such TP and ME (TP:ME) interactions impact tumor progression and drug sensitivity through classic paracrine signaling pathways (e.g. TGF-beta) and oncogenic pathways (e.g. RAS).

Similarly, most BC genomics studies have profiled bulk tumor, capturing the TP together with some ME (TME). However, even the best prognostic and predictive TME-based or ME-based classifiers appear incapable of achieving an accuracy above 75%. Our most recent effort identified a subset of ~20% of BC patients (of roughly 5,000 total) whose prognosis is impossible to estimate at time of diagnosis given data from only the patient’s TME (Suderman, Tofigh et al. 2014). These and other indicators support the need to move beyond the TME: ER+ BC tends to recur as long as 10-15 years after surgical removal of the TME, and the success rate in translating drug discoveries from model systems that do not recapitulate the patient’s context is low (10%).

The impact of the systemic response (SR) in cancer is now increasingly acknowledged (Egeblad et al. 2010; McAllister & Weinberg 2010; DeFilippis & Tlsty 2012): an “instigating” BC establishes a systemic environment that activates otherwise-indolent disseminated tumor cells, a model that is consistent with clinical observations. Importantly, the SR likely encodes patient attributes that have largely been ignored to date, e.g. exposures, lifestyle, and genotypic factors.

We have contributed to the understanding that many aspects of tumor progression and ultimately patient prognosis require an understanding of both paracrine and systemic signalling cascades, and that the tumor proper and the TME contain only a fraction of this information.

Data on TME:SR interactions generated by our lab will provide a means to unravel the molecular basis of these interactions and their involvement in determining patient prognosis.


Screenshot of software for investigating genes and pathways that appear to interact between the patient’s blood cells (top) and the patient’s tumor (bottom).