Integrating the Molecular Subtypes of Breast Cancer into a Novel Prognostic Model Haibe-Kains B1,2,*, Desmedt C1,*, Rothé F1, Piccart MJ1, Bontempi G2,#, and Sotiriou C1,# 1 Functional Genomics Unit, Institut Jules Bordet, Brussels, Belgium 2 Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium * Co-first authors, #Co-last authors. Subtype Prognostic Gene Signatures 1. Background early gene expression studies in breast cancer have provided a molecular classification of these tumors into at least three clinically relevant subtypes (ER-/HER2-, HER2+ or ER+/HER2). Our group recently introduced a robust method for subtype identification, exhibiting numerous advantages compared to the hierarchical clustering used in these initial publications [7, 1]. From a prognostic point of view, several signatures have been identified in the global population of patients. However, the majority of these only yielded good prognostic performance in the ER+/HER2subtype [7, 1]. Additionally, we showed that clinical and genomic prognostic factors dramatically depend on the molecular subtype of breast tumors. T HE In this work, we propose a novel prognostic model which takes into the account the molecular heterogeneity of breast cancer. Since we showed in [7, 1] that most of current gene signatures are only prognostic in the ER+/HER2subtype and that proliferation-related genes are their driving force, we did not generate a new prognostic signature for this subtype but considered instead the proliferation module (AURKA) introduced in [1], as subtype signature. In contrast, very few prognostic signatures have been reported thus far in the ER-/HER2- and HER2+ subtypes. Therefore, we identified two stable signatures composed of 63 and 22 genes for the ER-/HER2- and HER2+ subtypes respectively. Using these subtype prognostic gene signatures, we computed the GENIUS risk predictions for a set of 745 untreated node-negative patients retrieved from 5 public datasets. Performance Assessment and Comparison Subtype Clustering Subtypes identification Subtype clustering ER!/HER2!: GENIUS AOL NPI ER+/HER2- HER2+: subtype signature subtype signature subtype signature Model building Model building subtype risk score 0.2 0.4 0.5 0.6 0.7 0.2 0.8 subtype risk score k,l∈Ω wkl Combination risk predictions Figure 1: Design of GENIUS prognostic model. The final risk predictions were computed by combining the three subtype risk scores (Figure 1) in a local Model Network [4]: X r= ρj rj 0.4 0.5 0.6 0.7 0.8 concordance index Figure 5: Performance of risk score predictions for GENIUS and current prognostic gene signatures wrt the molecular subtype. The performance of the risk group predictions computed by GENIUS is illustrated in Figure 6. We observed a significant difference between the survival curves of low- and high-risk groups for both the global population (HR: 3.7; 95%CI [2.7,5]; p=1E-16) and all the subtypes: HRs of 3.7 (95%CI [2.5,5.5]; p=1E-10), 2.7 (95%CI [1.3,5.6]; p=7E-3) and 3.9 (95%CI [1.8,8.8]; p=8E-4) in the ER+/HER2-, ER-/HER2- and HER2+ subtypes, respectively. The probability of distant metastasis or relapse free survival of the low-risk group at 5 years was estimated at 91% in the global population, and 92%, 83% and 89% in the ER+/HER2-, ER-/HER2- and HER2+ subtypes, respectively. 461 237 448 206 3 435 178 4 5 6 Time (years) 416 159 397 143 352 135 7 314 120 8 273 107 9 244 100 10 213 85 1.0 HER2+ 0.8 Probability of survival 0.8 No. At Risk Low 472 High 252 2 0.6 0.2 HR=3.6, 95%CI [2.7,4.9], p!value=2.9E!16 1 Low High 0.0 0.0 Low High 0 0.4 Probability of survival 0.8 0.6 0.4 0.2 Probability of survival ER-/HER2- 1.0 ER+/HER2- 1.0 ALL The subtype prognostic gene signatures were therefore used to build the subtype risk prediction models such that, for a subtype j, the subtype risk score, denoted by rj , was defined as the weighted combination of all the gene expressions in the corresponding signature: P i∈Q wixi rj = nQ where Q is the set of genes in the prognostic signature specific to subtype j, nQ the number of genes in Q and wi is either +1 or −1 depending on the concordance index of gene i. 0.3 concordance index Figure 4: Performance of risk score predictions for GENIUS and prognostic clinical models wrt the molecular subtype. P where xi are the expressions of gene i, y are the survival data and wkl = ρj (xk )ρj (xl ) is the weight for the pair of comparable patients {k, l} with ρj (x) being the probability for a patient’s tumor x to belong to the subtype j. 0.3 0.082 0.077 Subtype prognostic gene signatures Model building subtype risk score GENIUS AOL NPI 1.0 AURKA 0.037 0.03 0.25 0.039 0.15 0.10 0.62 0.39 HR=3.9, 95%CI [2.6,5.8], p!value=6.6E!11 0 No. At Risk Low 374 High 129 1 370 125 2 363 111 3 354 97 4 5 6 Time (years) 338 88 323 79 282 75 7 247 66 0.8 Identification of prognostic genes GENIUS AURKA GGI STAT1 PLAU IRMODULE SDPP 8 213 64 9 191 62 10 Low High 166 51 HR=2.6, 95%CI [1.3,5.5], p!value=9.2E!03 0 No. At Risk Low 48 High 68 1 46 63 2 42 50 3 40 44 4 5 6 Time (years) 39 40 37 35 37 32 7 36 29 0.6 Identification of prognostic genes k,l∈Ω wkl 1{xki > xli} HER2+: ρ2 ρ1 0.043 0.19 0.0051 0.016 0.1 0.0037 0.44 0.028 0.4 HER2+ ρ3 ER!/HER2!: GENIUS AURKA GGI STAT1 PLAU IRMODULE SDPP 0.6 ER-/HER2- P Cwted(xi, y, ρj ) = ρ2 ER+/HER2!: GENIUS AOL NPI 0.25 0.27 2E-6 5E-9 0.065 0.094 0.4 ρ1 0.0013 0.02 ER+/HER2!: GENIUS AURKA GGI STAT1 PLAU IRMODULE SDPP 0.2 Stability-based feature ranking method [2] using a weighted version of the concordance index [3] as scoring function: GENIUS AOL NPI 0.012 0.018 5E-12 9E-14 1E-4 0.015 0.0 Subtype Prognostic Gene Signatures ALL: {ρ1, ρ2, ρ3} Model-based clustering (mixture of Gaussians) in a two-dimensional space defined by the ESR1 and ERBB2 module scores representing the ER and HER2 phenotypes respectively [1]. Once fitted to the training set, this clustering model returns a set of probabilities of a tumor belonging to each subtype, denoted by ρj where j ∈ {1=ER-/HER2-, 2=HER2+, 3=ER+/HER2-} are the subtypes. Test for GENIUS superiority Training set GENIUS AURKA GGI STAT1 PLAU IRMODULE SDPP 0.2 W ALL: Low High 0.0 developed a two-step prognostic model called GENIUS (Gene Expression progNostic Index Using Subtypes) as depicted in Figure 1: 1. Accurate assessment of the probabilities for a tumor to belong to each of the three breast cancer molecular subtypes (subtype clustering). 2. Combination of these probabilities with subtype-specific prognostic signatures (subtype prognostic gene signatures), which results in the final GENIUS risk predictions. E As sketched in Figure 5, GENIUS risk score predictions significntly outperformed current gene signatures in the global population of patients (Cindex of 0.71, superiority test p-values < 0.05) and was very competitive in each subtype (C-indices of 0.7, 0.66 and 0.66 in the ER+/HER2-, ER-/HER2and HER2+ subtypes, respectively). GENIUS also outperformed clinical guidelines (superiority test pvalues < 0.05) in most cases (Figure 4). Probability of survival 2. Methods Test for GENIUS superiority 8 31 23 9 27 20 10 21 16 HR=4.1, 95%CI [1.8,9.2], p!value=7.7E!04 0 No. At Risk Low 50 High 55 1 47 51 2 45 47 3 43 39 4 5 6 Time (years) 41 33 39 31 35 30 7 32 27 8 31 22 Figure 6: Survival curves of GENIUS risk group predictions wrt the molecular subtype. j 3. Results 4. Conclusions We focussed our analysis on untreated node-negative patients in order to build a prognostic model for early stage breast cancer and to avoid any confounding factors due to the treatment effects on survival (untreated). We used VDX [6, 5] as training set since this population contained the largest set of untreated patients with early breast cancer. novel prognostic model, which considers the molecular heterogeneity of breast cancer, outperforms the current clinical models and gene signatures. GENIUS was the only signature to be highly prognostic in all the molecular subtypes. Additionally, the modular architecture of the model allows plugging in any other gene expression signatures, potentially enlarging its usability. T HIS Subtype Clustering References [1] C. Desmedt, B. Haibe-Kains, P. Wirapati, et al. Biological Processes Associated with Breast Cancer Clinical Outcome Depend on the Molecular Subtypes. Clin Cancer Res, 14(16):5158–5165, 2008. doi:10.1158/1078-0432.CCR-074756. 2 Once subtype clustering was fitted on the training set (see Figures 2, we were able to robustly identify three subtypes: ER-/HER2- (99), HER2+ (54) and ER+/HER2- (191) (Figure 3). We validated the \end{minipage} robustness of this clustering model in 21 breast cancer microarray datasets (data not shown). 1 1.0 ER−/HER2− HER2+ ER+/HER2− [2] B. Haibe-Kains. Identification and Assessment of Dene Signatures in Human Breast Cancer. Ph.D. thesis, Université Libre de Bruxelles, 2009. y Densit [3] F. J. Harrell, K. Lee, and D. Mark. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med, 15(4):361–387, 1996. doi:10.1002/(SICI)10970258(19960229)15:4¡361::AID-SIM168¿3.0.CO;2-4. 1 ERB 0 B2 2 1 0 1 2 2 ESR [5] A. J. Minn, G. P. Gupta, D. Padua, et al. Lung metastasis genes couple breast tumor size and metastatic spread. Proceedings of the National Academy of Sciences, 104(16):6740–6745, 2007. doi:10.1073/pnas.0701138104. Patients 1 [6] Y. Wang, J. G. Klijn, Y. Zhang, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet, 365:671–679, 2005. −2 1 [4] T. A. Johansen and B. A. Foss. Constructing NARMAX models using ARMAX models. International Journal of Control, 58:1125–1153, 1993. 0 0.0 2 −1 ERBB2 0.5 −2 −1 0 1 2 ESR1 Figure 2: The subtype clustering model is com- Figure 3: Identification of breast cancer molecular subtypes in the training set (VDX). posed of a mixture of three Gaussians. IMPAKT (ESMO) Breast 2009 [7] P. Wirapati, C. Sotiriou, S. Kunkel, et al. Meta-analysis of gene expression profiles in breast cancer: toward a unified understanding of breast cancer subtyping and prognosis signatures. Breast Cancer Research, 10(4):R65, 2008. ISSN 1465-5411. doi:10.1186/bcr2124. 9 28 20 10 26 18