Integrating the Molecular Subtypes of Breast Cancer into a Novel Prognostic Model
Haibe-Kains B1,2,*, Desmedt C1,*, Rothé F1, Piccart MJ1, Bontempi G2,#, and Sotiriou C1,#
1 Functional Genomics Unit, Institut Jules Bordet, Brussels, Belgium
2 Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
* Co-first authors, # Co-last authors.
1. Background
The early gene expression studies in breast cancer have provided a molecular classification of these tumors into at least three clinically relevant subtypes (ER-/HER2-, HER2+ or ER+/HER2-). Our group recently introduced a robust method for subtype identification, exhibiting numerous advantages compared to the hierarchical clustering used in these initial publications [7, 1].
From a prognostic point of view, several signatures have been identified in the global population of patients. However, the majority of these only yielded good prognostic performance in the ER+/HER2- subtype [7, 1]. Additionally, we showed that clinical and genomic prognostic factors depend dramatically on the molecular subtype of breast tumors.
In this work, we propose a novel prognostic model which takes into account the molecular heterogeneity of breast cancer.

Subtype Prognostic Gene Signatures
Since we showed in [7, 1] that most current gene signatures are only prognostic in the ER+/HER2- subtype and that proliferation-related genes are their driving force, we did not generate a new prognostic signature for this subtype but instead used the proliferation module (AURKA) introduced in [1] as the subtype signature.
In contrast, very few prognostic signatures have been reported thus far in the ER-/HER2- and HER2+ subtypes. Therefore, we identified two stable signatures composed of 63 and 22 genes for the ER-/HER2- and HER2+ subtypes, respectively.
Using these subtype prognostic gene signatures, we computed the GENIUS risk predictions for a set of 745 untreated node-negative patients retrieved from 5 public datasets.
Performance Assessment and Comparison
Subtype Clustering
Figure 1: Design of GENIUS prognostic model. [Diagram: subtype clustering on the training set yields the subtype probabilities {ρ1, ρ2, ρ3}; for each subtype (ER-/HER2-, HER2+, ER+/HER2-), prognostic genes are identified, a subtype signature is defined and a model is built to give a subtype risk score; the three subtype risk scores are combined into the final risk predictions.]
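The final risk predictions were computed by combining the three subtype risk scores (Figure 1) in a local model network [4]:

$$r = \sum_{j=1}^{3} \rho_j r_j$$

where $\rho_j$ is the probability that the tumor belongs to subtype $j$ and $r_j$ is the corresponding subtype risk score.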
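As a minimal sketch of this combination, assuming the subtype probabilities and the subtype risk scores of one patient are already available, the final score is simply a probability-weighted sum. The function name `combine_risk_scores` and all numeric values below are illustrative assumptions, not taken from the poster.

```python
def combine_risk_scores(rho, r):
    """Return r = sum_j rho_j * r_j for one patient.

    rho: subtype membership probabilities (expected to sum to 1)
    r:   subtype risk scores, in the same subtype order as rho
    """
    if len(rho) != len(r):
        raise ValueError("rho and r must have the same length")
    return sum(p * s for p, s in zip(rho, r))


# Hypothetical patient whose tumor is most likely ER+/HER2-.
rho = [0.1, 0.2, 0.7]      # assumed order: ER-/HER2-, HER2+, ER+/HER2-
scores = [1.0, 2.0, 3.0]   # assumed subtype risk scores
final_risk = combine_risk_scores(rho, scores)
```

With this weighting, a tumor that clearly belongs to one subtype is scored almost entirely by that subtype's model, while an ambiguous tumor receives a blend of the three scores.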
Figure 5: Performance of risk score predictions (concordance index) for GENIUS and current prognostic gene signatures with respect to the molecular subtype.
The performance of the risk group predictions computed by GENIUS is illustrated in Figure 6. We observed a significant difference between the survival curves of the low- and high-risk groups for both the global population (HR: 3.7; 95%CI [2.7,5]; p=1E-16) and all the subtypes: HRs of 3.7 (95%CI [2.5,5.5]; p=1E-10), 2.7 (95%CI [1.3,5.6]; p=7E-3) and 3.9 (95%CI [1.8,8.8]; p=8E-4) in the ER+/HER2-, ER-/HER2- and HER2+ subtypes, respectively. The probability of distant metastasis-free or relapse-free survival of the low-risk group at 5 years was estimated at 91% in the global population, and at 92%, 83% and 89% in the ER+/HER2-, ER-/HER2- and HER2+ subtypes, respectively.
[Figure 6, panel ALL: survival curves of the low- and high-risk groups; x-axis: time (years), y-axis: probability of survival; HR=3.6, 95%CI [2.7,4.9], p-value=2.9E-16; no. at risk at time 0: Low 472, High 252. The other panels are ER+/HER2-, ER-/HER2- and HER2+.]
The subtype prognostic gene signatures were therefore used to build the subtype risk prediction models such that, for a subtype $j$, the subtype risk score, denoted by $r_j$, was defined as the weighted combination of all the gene expressions in the corresponding signature:

$$r_j = \frac{\sum_{i \in Q} w_i x_i}{n_Q}$$

where $Q$ is the set of genes in the prognostic signature specific to subtype $j$, $n_Q$ is the number of genes in $Q$, $x_i$ is the expression of gene $i$, and $w_i$ is either +1 or −1 depending on the concordance index of gene $i$.
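The signed average above can be sketched as follows. The poster only states that the sign depends on the concordance index of the gene; the rule used here (+1 when the index exceeds 0.5, −1 otherwise) is an assumption, and the gene names, values and function names are hypothetical.

```python
def gene_sign(cindex):
    """Assumed rule: +1 if high expression goes with higher risk (C > 0.5), else -1."""
    return 1 if cindex > 0.5 else -1


def subtype_risk_score(expression, cindex):
    """Signed average r_j = sum_{i in Q} w_i * x_i / n_Q for one tumor.

    expression: dict gene -> expression value x_i
    cindex:     dict gene -> concordance index of that gene in the training set;
                its keys define the signature Q
    """
    genes = list(cindex)
    if not genes:
        raise ValueError("empty signature")
    total = sum(gene_sign(cindex[g]) * expression[g] for g in genes)
    return total / len(genes)


# Hypothetical three-gene signature (names and values are made up).
cindex = {"geneA": 0.65, "geneB": 0.40, "geneC": 0.58}
tumor = {"geneA": 1.2, "geneB": 0.6, "geneC": -0.3}
score = subtype_risk_score(tumor, cindex)
```

Dividing by the number of genes keeps scores from signatures of different sizes (for example 63 and 22 genes) on a comparable scale.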
Figure 4: Performance of risk score predictions (concordance index) for GENIUS and prognostic clinical models with respect to the molecular subtype.
[Figure 6, panel ER+/HER2-: HR=3.9, 95%CI [2.6,5.8], p-value=6.6E-11; no. at risk at time 0: Low 374, High 129.]
[Figure 6, panel ER-/HER2-: HR=2.6, 95%CI [1.3,5.5], p-value=9.2E-03; no. at risk at time 0: Low 48, High 68.]
[Figures 4 and 5, plot residue: concordance index by subtype (ALL, ER+/HER2-, ER-/HER2-, HER2+) for GENIUS versus AOL and NPI (Figure 4) and versus AURKA, GGI, STAT1, PLAU, IRMODULE and SDPP (Figure 5), annotated with the p-values of the test for GENIUS superiority.]

Subtype Prognostic Gene Signatures
Prognostic genes were identified for each subtype with a stability-based feature ranking method [2], using a weighted version of the concordance index [3] as scoring function:

$$C_{wted}(x_i, y, \rho_j) = \frac{\sum_{k,l \in \Omega} w_{kl} \, \mathbf{1}\{x_{ki} > x_{li}\}}{\sum_{k,l \in \Omega} w_{kl}}$$

where $x_i$ are the expressions of gene $i$ (with $x_{ki}$ its value in patient $k$), $y$ are the survival data, $\Omega$ is the set of comparable pairs of patients, and $w_{kl} = \rho_j(x_k)\rho_j(x_l)$ is the weight for the pair of comparable patients $\{k, l\}$, with $\rho_j(x)$ being the probability for a patient's tumor $x$ to belong to the subtype $j$.
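The weighted concordance index used as scoring function for gene ranking can be sketched as below. The poster does not spell out which pairs are comparable; this sketch assumes the usual convention for the concordance index, namely that a pair (k, l) is comparable when patient k had an observed event before the follow-up time of patient l. The function name and the toy data are hypothetical.

```python
def weighted_cindex(x, time, event, rho):
    """Weighted concordance index of one gene for one subtype.

    x:     expression of the gene in each patient
    time:  follow-up time of each patient
    event: 1 if the event was observed, 0 if the patient was censored
    rho:   probability that each patient's tumor belongs to the subtype
    """
    num = 0.0
    den = 0.0
    n = len(x)
    for k in range(n):
        if not event[k]:
            continue  # assumed: the earlier patient of a pair must have an event
        for l in range(n):
            if time[k] < time[l]:
                w = rho[k] * rho[l]   # pair weight w_kl
                den += w
                if x[k] > x[l]:       # higher expression in the earlier failure
                    num += w
    if den == 0.0:
        raise ValueError("no comparable pairs with non-zero weight")
    return num / den


# Toy data: three patients, all with observed events.
c = weighted_cindex(
    x=[3.0, 1.0, 2.0],
    time=[1.0, 2.0, 3.0],
    event=[1, 1, 1],
    rho=[1.0, 0.5, 0.5],
)
```

Because each pair is weighted by the product of the two membership probabilities, patients who are unlikely to belong to the subtype contribute little to the score of a gene for that subtype.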
Subtypes were identified by model-based clustering (a mixture of Gaussians) in a two-dimensional space defined by the ESR1 and ERBB2 module scores, which represent the ER and HER2 phenotypes, respectively [1]. Once fitted to the training set, this clustering model returns, for each tumor, the probabilities of belonging to each subtype, denoted by $\rho_j$, where $j \in \{1, 2, 3\}$ indexes the subtypes (1 = ER-/HER2-, 2 = HER2+, 3 = ER+/HER2-).
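To illustrate how a fitted mixture returns these probabilities, the sketch below applies Bayes' rule to three two-dimensional Gaussian components. All parameter values are hypothetical, and the diagonal covariance is a simplifying assumption; the poster does not state the covariance structure or the fitted values.

```python
import math

SUBTYPES = ("ER-/HER2-", "HER2+", "ER+/HER2-")

# Hypothetical fitted parameters:
# (mixing weight, mean (ESR1, ERBB2), standard deviation (ESR1, ERBB2)).
COMPONENTS = (
    (0.3, (-1.0, -0.3), (0.5, 0.4)),
    (0.2, (-0.2, 1.5), (0.7, 0.5)),
    (0.5, (0.8, -0.2), (0.4, 0.4)),
)


def gaussian_density(point, mean, std):
    """Density of a 2-D Gaussian with diagonal covariance (assumed for brevity)."""
    density = 1.0
    for value, m, s in zip(point, mean, std):
        z = (value - m) / s
        density *= math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return density


def subtype_probabilities(esr1, erbb2):
    """Posterior probabilities (rho_1, rho_2, rho_3) of the three subtypes."""
    joint = [w * gaussian_density((esr1, erbb2), mean, std)
             for w, mean, std in COMPONENTS]
    total = sum(joint)
    return [value / total for value in joint]


# A tumor with a high ESR1 module score and a low ERBB2 module score.
rho = subtype_probabilities(esr1=0.9, erbb2=-0.1)
predicted = SUBTYPES[rho.index(max(rho))]
```

Keeping the full probability vector, rather than only the most likely subtype, is what allows the probabilities to be reused as weights in later steps of the model.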
2. Methods
We developed a two-step prognostic model called GENIUS (Gene Expression progNostic Index Using Subtypes), as depicted in Figure 1:
1. Accurate assessment of the probabilities for a tumor to belong to each of the three breast cancer molecular subtypes (subtype clustering).
2. Combination of these probabilities with subtype-specific prognostic signatures (subtype prognostic gene signatures), which results in the final GENIUS risk predictions.

3. Results
We focused our analysis on untreated node-negative patients in order to build a prognostic model for early-stage breast cancer and to avoid any confounding due to treatment effects on survival. We used VDX [6, 5] as the training set since this population contained the largest set of untreated patients with early breast cancer.

Subtype Clustering
Once the subtype clustering was fitted on the training set (see Figure 2), we were able to robustly identify three subtypes: ER-/HER2- (n=99), HER2+ (n=54) and ER+/HER2- (n=191) (Figure 3). We validated the robustness of this clustering model in 21 breast cancer microarray datasets (data not shown).

Figure 2: The subtype clustering model is composed of a mixture of three Gaussians. [Axes: ESR1, ERBB2, density.]
Figure 3: Identification of breast cancer molecular subtypes in the training set (VDX). [Axes: ESR1, ERBB2; legend: ER-/HER2-, HER2+, ER+/HER2-.]

As sketched in Figure 5, GENIUS risk score predictions significantly outperformed current gene signatures in the global population of patients (C-index of 0.71, superiority test p-values < 0.05) and were very competitive in each subtype (C-indices of 0.7, 0.66 and 0.66 in the ER+/HER2-, ER-/HER2- and HER2+ subtypes, respectively). GENIUS also outperformed clinical guidelines (superiority test p-values < 0.05) in most cases (Figure 4).

[Figure 6, panel HER2+: HR=4.1, 95%CI [1.8,9.2], p-value=7.7E-04; no. at risk at time 0: Low 50, High 55.]
Figure 6: Survival curves of GENIUS risk group predictions with respect to the molecular subtype.

4. Conclusions
This novel prognostic model, which considers the molecular heterogeneity of breast cancer, outperforms the current clinical models and gene signatures. GENIUS was the only signature to be highly prognostic in all the molecular subtypes. Additionally, the modular architecture of the model allows any other gene expression signature to be plugged in, potentially broadening its usability.

References
[1] C. Desmedt, B. Haibe-Kains, P. Wirapati, et al. Biological Processes Associated with Breast Cancer Clinical Outcome Depend on the Molecular Subtypes. Clin Cancer Res, 14(16):5158–5165, 2008. doi:10.1158/1078-0432.CCR-07-4756.
[2] B. Haibe-Kains. Identification and Assessment of Gene Signatures in Human Breast Cancer. Ph.D. thesis, Université Libre de Bruxelles, 2009.
[3] F. E. Harrell Jr., K. Lee, and D. Mark. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med, 15(4):361–387, 1996. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.
[4] T. A. Johansen and B. A. Foss. Constructing NARMAX models using ARMAX models. International Journal of Control, 58:1125–1153, 1993.
[5] A. J. Minn, G. P. Gupta, D. Padua, et al. Lung metastasis genes couple breast tumor size and metastatic spread. Proceedings of the National Academy of Sciences, 104(16):6740–6745, 2007. doi:10.1073/pnas.0701138104.
[6] Y. Wang, J. G. Klijn, Y. Zhang, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet, 365:671–679, 2005.
[7] P. Wirapati, C. Sotiriou, S. Kunkel, et al. Meta-analysis of gene expression profiles in breast cancer: toward a unified understanding of breast cancer subtyping and prognosis signatures. Breast Cancer Research, 10(4):R65, 2008. ISSN 1465-5411. doi:10.1186/bcr2124.

IMPAKT (ESMO) Breast 2009