Molecular subtypes identification to refine breast cancer prognosis Benjamin Haibe-Kains

publicité
Molecular subtypes identification to refine breast cancer
prognosis
Benjamin Haibe-Kains1,2
1 Functional
2
Genomics Unit, Institut Jules Bordet
Machine Learning Group, Université Libre de Bruxelles
October 16, 2008
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
1 / 76
Research Groups
Machine Learning Group (Gianluca Bontempi)
10 researchers (2 Profs, 1 postDoc, 7 PhD students), 2 graduate
students).
Research topics : Bioinformatics, Classification, Regression, Time
series prediction, Sensor networks.
Website : http://www.ulb.ac.be/di/mlg.
Scientific collaborations in ULB : IRIDIA (Sciences Appliquées),
Physiologie Molculaire de la Cellule (IBMM), Conformation des
Macromolcules Biologiques et Bioinformatique (IBMM), CENOLI
(Sciences), Functional Genomics Unit (Institut Jules Bordet), Service
d’Anesthesie (Erasme).
Scientific collaborations outside ULB : UCL Machine Learning Group
(B), Politecnico di Milano (I), Universitá del Sannio (I), George
Mason University (US).
The MLG is part to the ”Groupe de Contact FNRS” on Machine
Learning and to CINBIOS: http://babylone.ulb.ac.be/Joomla/.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
2 / 76
Research Groups
Functional Genomics Unit (Christos Sotiriou)
9 researchers (1 Prof, 5 postDocs, 3 PhD students), 5 technicians.
Research topics : Genomic analyses, clinical studies and translational
research.
Website :
http://www.bordet.be/en/services/medical/array/practical.htm.
National scientific collaborations : ULB, Erasme, ULg, Gembloux,
IDDI.
International scientific collaborations : Genome Institute of Singapore,
John Radcliffe Hospital, Karolinska Institute and Hospital, MD
Anderson Cancer Center, Netherlands Cancer Institute, Swiss Institute
for Experimental Cancer Research, NCI/NIH, Gustave-Roussy
Institute.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
3 / 76
Summary
Introduction
I
I
I
Breast Cancer
Prognosis
Gene Expression Profiling
Breast Cancer Molecular Subtypes
Prognostic Gene Signatures
Subtypes and Prognosis
I
GENIUS
Conclusion
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
4 / 76
Part I
Introduction
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
5 / 76
Breast Cancer
Breast cancer is a global public health issue.
It is the most frequently diagnosed malignancy in women in the
western world and the commonest cause of cancer death for European
and American women.
In Europe, one out of eight to ten women, depending on the country,
will develop breast cancer during their lifetime.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
6 / 76
ntensity for
each gene
probe.
A gene-expression
vector is then collected
by summarizing
the expression levels of
uorescent
for
each
gene
probe.
A gene-expression
is then
by summarizing
slidesintensity
(CodeLink). The
Affymetrix
microarray
system uses
a single-colour detection vector
scheme, whereby
onecollected
sample is hybridized
per chip. Agilent the expression l
cilitateTo
the
comparison
between
the
different
compensate
for differences
acilitate
the
comparison
between
the
different
in labelling,
hybridizh
technology
uses a two-colour
scheme, whereby
the
same array
isexperiments
hybridized with
twoand
different
samples.
ample.
facilitate
the
comparison
between
the
different
experiments
and
compensate
for
differences
in labelling,
RNA is extracted
from
frozen breast
tumour samples collected either at surgery (a), or before treatment (b), labelled with a detectable marker
ormalization
step
isis
usually
performed.
normalization
step
usually
performed.
methods,
a normalization
stepto
istheusually
performed.
(fluorescent
dye), and hybridized
array containing
individual gene-specific probes. Gene-expression levels are estimated by measuring the
fluorescent
intensity
for each geneare
probe.
A gene-expression
vector
is then collectedgene-expression
by summarizing the expression
levels of each
gene in the from tumour su
ession
prognostic
classifiers
usually
built
by
correlating
patterns,
ession
prognostic
classifiers
are
usually
built
generated
Gene-expression
prognostic
classifiers
are
usually
built
by correlating
gene-expression
patterns,
generated from tum
sample. To facilitate the comparison between the different experiments and compensate for differences in labelling, hybridizations and detection
ome
of
metastases
during
(a). Gene-expression
ome(development
(development
ofdistant
distant
metastases
during follow-up)
predictive
classifiers
of response
methods, a normalization
step is usually
performed. metastases
linical
outcome
(development
of distant
during follow-up)
(a). Gene-expression
predictive
classifiers
of re
Gene-expression
prognostic classifiers
are usually
built byfrom
correlating
gene-expression
patterns,
generated
from tumour surgical
specimens,therapy,
with
yyenerated
correlating
gene-expression
data,
derived
biopsies
taken
before
pre-operative
systemic
with clini
clin
correlating
gene-expression
data,
derived
by
correlating
gene-expression
data,
derived
from
biopsies
taken
before
pre-operative
systemic
therapy,
w
clinical outcome (development of distant metastases during follow-up) (a). Gene-expression predictive classifiers of response to treatment are
he given
(treatment
the
given
treatment
(bb).).gene-expression
generated
correlating
esponse
totreatment
thebygiven
(b).data, derived from biopsies taken before pre-operative systemic therapy, with clinical and/or pathological
a
b
Breast Cancer Prognosis
response to the given treatment (b).
a
Breast surgery
+ radiotherapy
Follow-up 5–10 years
Breast surgery
Follow-up 5-10 years
Development of
gene expression
prognostic classifier
Follow-up Follow-up
5–10 years5–10 years
Breast surgery
Breast
Breast surgery
surgery
Recurrence
Remission
DNA microarrays
b
Diagnosis
Frozen surgical
tumour specimen
Recurrence
Remission
Remission
Recurrence
Recurrence
Remission
?
Pre-operative chemotherapy
DNA
microarrays
DNA microarrays
Prognosis
Frozen surgical
No response
Response
Frozen surgical
surgical
Frozen
Tumour biopsy
tumour specimen
before therapy
tumour specimen
specimen
tumour
Development of
gene expression
predictive classifier
DNA microarrays
5–15% for the various microarray platforms,
Benjamin
(ULB)
whereas
it wasHaibe-Kains
10–20% for between-labora-
gene expression, were detecting
similar
the prediction of good outcome versus
Pre-operative
chemotherapy
Visit to
IGC results
October
16, 2008
changes in gene abundance.
Similar
poor outcome) but generated
different
7 / 76
Current Clinical Tools for Prognosis
Clinical variables
Histological
grade
Nodal
status
Age
Tumor
size
Guidelines
NIH/St Gallen
ER
IHC
NPI
AOL
Prognosis
Need to improve current clinical tools to detect patients who need
adjuvant systemic therapy.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
8 / 76
Potential of Genomic Technologies for Prognosis
In the nineties, new biotechnologies emerged:
I
I
Human genome sequencing.
Gene expression profiling (low to high-throughput).
Genomic data could be used to better understand cancer biology
. . . and to build efficient prognostic models.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
9 / 76
Biology Paradigm
Enhancers
Promoter
DNA
Transcription
Introns
pre-mRNA
Exon
Exon
Exon
Exon
Exon
Splicing
mRNA
5' UTR
Open reading
frame
3' UTR
Translation
Gene expression
profiling
protein
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
10 / 76
Gene Expression Profiling
Gene expression profiling using microarray chip:
Microarray chip
Hybridization
Detection
A
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
11 / 76
Microarray Data
Few samples (dozens to hundreds).
I
I
Microarray technology is expensive.
Frozen tumor samples are rare (biobank).
On the other hand, numerous gene expressions are measured.
I
The new microarray chips cover the whole genome (≈ 50,000 probes
representing 30,000 ”known genes”).
ß High feature-to-sample ratio (curse of dimensionality).
Microarray is a complex technology.
ß High level of noise in the data.
Biology is complex.
ß Variables are highly correlated (gene co-expressions due to biological
pathways).
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
12 / 76
Microarray Data
Warning
You can easily find spurious patterns in the data, biologically
”meaningful”.
Personal experience:
I
I
I
I
At the beginning of my thesis, I had accidentally mixed the patients
labels, so the relation between input (gene expressions) and output (a
mutation) was completely random.
I gave a list of genes differentially expressed between wild type and
mutated patients, to the biologists in charge of the project and they
found it very interesting (known genes, meaningful biological story).
When I saw my mistake, I corrected the bug and sent a new gene list
. . . and the results were even better!
In conclusion, the complexity of microarray data and the biology
behind should make you very critic and cautious with your results.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
13 / 76
Part II
Breast Cancer Molecular Subtypes
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
14 / 76
Breast Cancer Subtypes
Early microarray studies showed that BC is a molecularly
heterogeneous disease [Perou et al., 2000; Sorlie et al., 2001, 2003; Sotiriou
et al., 2003].
Hierarchical clustering on microarray data [Sorlie et al., 2001]:
MEDICAL SCIENCES
I
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
15 / 76
Breast Cancer Subtypes
Clinical Outcome
ig. 3.
The molecular subtypes exhibited different clinical outcomes,
suggesting that the biological processes involved in patients’ survival
might be different.
Overall and relapse-free survival analysis
of the 49 breast cancer patients,
uniformly16treate
Visit to IGC
October 16, 2008
/ 76
Benjamin Haibe-Kains (ULB)
Breast Cancer Subtypes
Early Results
These early studies showed similar results, i.e. ER and HER2
pathways are the main discriminators in breast cancer (confirmed by
[Kapp et al., 2006]).
However, this classification has strong limitations [Pusztai et al., 2006]:
I
I
I
Instability: the results are hardly reproducible due to the instability of
the hierarchical clustering method in combination with microarray data
(high feature-to-sample ratio).
Crispness: hierarchical clustering produces crisp partition of the dataset
(hard partitioning ) without estimation of the classification uncertainty.
Validation: the hierarchical clustering is hardly applicable to new data.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
17 / 76
Breast Cancer Subtypes
New Clustering Model
Because of these limitations we sought to develop a simple method to
identify the breast cancer subtypes.
ß We introduced a model-based clustering (mixture of Gaussians) in a
two-dimensional space defined by the ESR1 and ERBB2 module
scores [Wirapati et al., 2008; Desmedt et al., 2008].
I
I
We used the Bayesian information criterion (BIC) to select the most
likely number of subtypes [Fraley and Raftery, 2002].
We validated our model (fitted on Wang et al. series) on 14
independent datasets in terms of number of clusters and prediction
strength [Tibshirani and Walther, 2005].
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
18 / 76
New Clustering Model
Training
VDX
BIC
2
1
ER /HER2
HER2+
ER+/HER2
0.6
BIC
0
ERBB2
1
0.8
1
0.4
2
0.2
0
2
1
0
1
2
2
ESR1
Benjamin Haibe-Kains (ULB)
4
6
8
10
number of clusters
Visit to IGC
October 16, 2008
19 / 76
New Clustering Model
Validation
2
ER−/HER2−
HER2+
ER+/HER2−
0
●
●
●
●
● ● ● ●● ●
●
●
●
●
●
●
●
● ● ● ●●●● ● ●
● ●
● ●
●
● ●●
●●
●
● ●
● ● ●
●
● ●● ●● ●●
●●
●●● ●
● ●●●●●
● ●●
●
●
●
●
●● ●
●
● ●● ●●●
●●●● ●
●
●●
●
● ●●
●
●●●● ●
●●
●
●
●●
●
●●
●● ●
●●●
●●●
●
●● ●●●●●
●
●●
●
●
●●●●
●
●
●●
● ●●
● ●
● ● ●●
●●
●●
●
●
●
●
●●● ●
● ●
●
●
●● ●
● ● ●
● ●
●
●
●
●
● ● ● ● ●●
●
●
●
●●
−1
−1
0
ERBB2
0
●
●
●● ●
●
●●
●
●
● ●
●
●
● ●
●●● ●
● ●●●●● ●●
●● ● ●
●●●● ●
●
● ●●
●●
● ● ● ●●
●
● ●
●●
● ●●●
●
●
●
●
●
●
●●●
●● ● ● ●
●●
●
●●
●●●
●
●
● ●
●
●
●●
●
●●
●●
● ●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
ERBB2
●
●
●
● ● ●●●
●
●●
●
● ●
●
●
●●
●● ●
●
●
●●● ●●
● ●
● ●● ● ●●
●
●●● ●
●●
●●
●●●
●
●
●●
●
●
●● ●
●
●
●
●●
●
●●
●●●
●
●
●
●
●
●● ● ●
●●
●
● ●● ●●
●●
●●
●●
●
●
●●●●
●
●●
●●● ●
●
●●
●●● ●
●●●
●
●
●
●●
●
● ●
●●● ●●
●●
●●●●
● ●
●●
●
●
●
●
●●●
●
● ●● ● ●
●
●
●
●
●● ●●
●
●
●
●
●●●
●●
●●
●
● ●
● ●●
●
●●
●
●
●●
●●
●
●
●
●●●
●●●
●●
●●
●
●
●
●
●
●
●
● ●
●
● ●
●
● ●
●
●●
●●
● ●●
● ● ●
●
●
●
●
−1
ER−/HER2−
HER2+
ER+/HER2−
1
●
1
●
1
●
ERBB2
UPP
2
TBG
2
NKI
ER−/HER2−
HER2+
ER+/HER2−
−2
−1
0
1
2
−2
−2
−2
●
−2
−1
ESR1
0
1
2
−2
1
2
2
UNC2
2
2
ER−/HER2−
HER2+
ER+/HER2−
ER−/HER2−
HER2+
ER+/HER2−
ER−/HER2−
HER2+
ER+/HER2−
1
●
1
●
1
0
ESR1
MAINZ
UNT
●
−1
ESR1
−1
0
ESR1
Benjamin Haibe-Kains (ULB)
1
2
0
●
−2
●
−2
−2
●
●
●
● ● ●●● ●●
●
●
●
●
●
●● ● ●
● ●●
● ●
●
●
●
●
●● ●● ●●
●
● ●
●
●
●
●●
● ●●●
● ●●
●●
●● ●● ● ●
●
●
●● ●
●● ● ● ● ●●
●
● ●
● ● ●●
● ● ●●
●
●●
●
● ●
● ● ●●●
●
● ●●
●
● ● ● ●
●
●
●
−1
●
ERBB2
●
●
●
●●
●
●
●●
●●
●●
●●
●●●●●
●●
●
●
●
●
● ●● ● ●●
●
●
● ●●● ● ●
●
● ●●
●● ●
●
● ●●
● ●
●
●●
●
●
●
●
●
●
●
●
● ●●
●
● ●
●
● ●●
● ●●●
● ●
●
●●
●● ●
●●
●● ●
●●
● ●●● ●● ●●●
● ●●●●
●
●●
● ●
●
●
●●●● ● ● ●
●
●
● ● ●● ●●
●● ● ●●● ●
● ●
●●●
●
●
●
0
●
●
●●●
●
●
●
●
● ● ●● ●
● ●●
●
●
●● ●
●●
●
● ● ●●
●● ●
● ● ● ●●
● ●●
●
●● ● ●
●●
● ●●
●
● ●● ● ● ●
●●●
●
●
● ● ●●
●
●
● ● ●●
● ● ●
●
●
●
●●
●
−2
−1
●
●
ERBB2
●
●
−1
0
ERBB2
●
●
●
−2
−1
0
ESR1
Visit to IGC
1
2
−2
−1
0
1
2
ESR1
October 16, 2008
20 / 76
New Clustering Model
Validation: Prediction Strength
Dataset
NKI
TBG
UPP
UNT
MAINZ
STNO2
NCI
MSK
STK
DUKE
UNC2
CAL
DUKE2
NCH
Benjamin Haibe-Kains (ULB)
ER-/HER21.00
1.00
1.00
1.00
1.00
1.00
0.85
1.00
1.00
1.00
1.00
1.00
1.00
1.00
HER2+
1.00
1.00
0.93
0.89
1.00
0.69
0.83
1.00
0.91
0.82
0.87
1.00
0.64
0.82
Visit to IGC
ER+/HER20.99
0.83
0.87
0.92
0.90
0.97
0.93
0.96
0.87
0.92
0.96
0.95
0.95
0.98
October 16, 2008
21 / 76
New Clustering Model
Validation: Number of Clusters
1
0.8
BIC average
0.6
0.4
0.2
0
−0.2
2
4
6
8
10
number of clusters
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
22 / 76
Breast Cancer Subtypes
Clinical Outcome
ER+/HER2-: 60-70%
0.6
ER!/HER2!
HER2+
ER+/HER2!
0.0
of the global population of
BC patients.
0.4
HER2+: 15-20%
0.2
ER-/HER2-: 20-25%
Probability of survival
0.8
1.0
Node-negative untreated patients
NKI/TBG/UPP/UNT/MAINZ
0
No. At Risk
ER!/HER2! 119
HER2+ 106
ER+/HER2! 516
Benjamin Haibe-Kains (ULB)
Visit to IGC
1
111
98
507
2
91
91
487
3
83
81
462
4
5
6
Time (years)
78
73
435
71
69
410
68
64
363
7
64
58
319
8
53
52
282
9
46
47
257
October 16, 2008
10
37
44
223
23 / 76
Breast Cancer Subtypes
New Clustering Model (dis)Advantages
Advantages:
I
Simple model-based clustering:
F
F
I
Easily applicable to new data.
Returning for each patient the probability to belong to each subtype
(soft partitioning ).
Low dimensional space:
F
F
Low computational cost to fit the model.
Simple visualization of the results.
Disadvantages:
I
Low dimensional space: which dimension could we add in order to find
another robust subtype?
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
24 / 76
Part III
Prognostic Gene Signatures
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
25 / 76
Prognostic Gene Signatures
Use of microarray technology to improve current prognostic models
(NIH/St Gallen guidelines, NPI, AOL).
A typical microarray analysis dealing with breast cancer
prognostication involves 5 key steps:
1
2
3
4
5
Data preprocessing: quality controls and normalization.
Filtering: discard the genes exhibiting low expressions and/or low
variance.
Identification of a list of prognostic genes (called a gene signature).
Building of a prognostic model, i.e. combination of the expression of
the genes from the signature in order to predict the clinical outcome of
the patients.
Validation of the model performance and comparison with current
prognostic models.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
26 / 76
Prognostic Gene Signatures
Fishing Expedition
Prognostic models derived from gene expression data by looking for
genes associated with clinical outcome without any a priori biological
assumption [van’t Veer et al., 2002; Wang et al., 2005].
GENE70 signature
GENE76 signature
1.0
Overall Survival
0.8
0.6
0.4
Poor signature
0.2
0.0
2
4
60
91
57
72
54
55
0.4
0.2
P<0.001
0
0.6
6
8
10
12
31
26
22
17
12
9
0.0
Dist ant -met ast asis-f ree survival in validat ion set
1·0
1·0
Good (n=59)
0·8
0·6
Poor signature
Poor (n=112)
0·4
P<0.001
0·2
0
Years
Hazard ratio=5·67 (95% CI 2·59–12·4)
0
2 0 4
61 8 210 123
4
Years
Years
0·8
0·6
0·4
0·2
Log-rank p0·0001
5
6
7
55
55
53
48
66
60
55
52
Patients at risk
N O. AT R ISK
Good signature
Poor signature
Good signature
Probability of overall survival
Good signature
0.8
Probability of metastasis-free survival
Probability of Remaining
Metastasis-free
1.0
N O. AT R ISK
45
41
Good signature
Good signature
60 59 59 58
Poor signature
Poor signature
91 8611266
van't Veer et al.
van de Vijver
58
48
35 5624
103
50
33 9021
1255
1075
Wang et al.
Promising results but a lot criticisms from a statistical point of view.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
27 / 76
Prognostic Gene Signatures
Hypothesis-driven
Prognostic models were also derived from gene expression data based
on a biological assumption.
I
I
Example: GGI [Sotiriou et al., 2006] was designed to discriminate
patients with low and high histological grade (proliferation).
GGI was able to discriminate patients with intermediate histological
grade (HG2).
Histological grade
Benjamin Haibe-Kains (ULB)
GGI (HG2)
Visit to IGC
GGI
October 16, 2008
28 / 76
Prognostic Gene Signatures
Independent Validation
These preliminary resulting were promising but validation was
required.
A first validation was published by the authors of the GENE70 and
GENE76 signatures in [van de Vijver et al., 2002] and [Foekens et al., 2006]
respectively.
Our group was involved in a second validation:
I
I
I
Complete independence: the authors of the signatures were not aware
of the clinical data of the patients in the dataset.
The statistical analyses were performed by an independent group.
Aim: validate definitively the prognostic power of these two models in
order to start a large clinical trial called MINDACT (Microarray In
Node negative Disease may Avoid ChemoTherapy).
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
29 / 76
Prognostic Gene Signatures
Independent Validation (cont.)
Although the performance in this validation series was less impressive
than in the original publications, GENE70 and GENE76 sufficiently
improved the current clinical models to go ahead with MINDACT.
Independent Validation of the 76-Gene Signature
GENE76
0.8
0.6
0.4
52
59
28
163
0.2
Probability of event
Patients Events Risk group
7
11
6
52
Gene signature low risk, clinical low risk
Gene signature low risk, clinical high risk
Gene signature high risk, clinical low risk
Gene signature high risk, clinical high risk
0.0
,K( "#*( H.*+"#*( &$=.M
=,0&#2,( *+,( <,K.&%(
2(*+&%(+&')(*+&*(")(*+,(
$23($,2;,0*.G,'NI3(H,(
."2(&KQ#2*,K()"$(0'.%M
*H&$,(H.*+(&$=.*$&$N(
<,(;".%*25(A2(2+"H%(
,(J$,&*,2*(;$"J%"2*.0(
"I( &*( .K,%*.)N.%J( ;&M
(C+,2,(&%&'N2,2(2#JM
<&N(,W;'&.%(2"<,(")(
=,*H,,%( *+,( "$.J.%&'(
A
0
2
4
52
59
28
163
50
58
26
143
47
55
25
126
6
8
10
12
14
41
36
28
17
Year
44
Fig.
curves
The
511. Kaplan-Meier
51
44by genomic
36risk group.26
HRs
stratified by 10
center. A, TDM.
21and log-rank
16 tests are 12
3
B,115
overall survival.
101
91
74
46
Number at risk
B
1.0
"))(0+"2,%3()"$(="*+(
G.G&'5( L"H,G,$3( )"$(
2(&0+.,G,K("%'N(="$M
1.0
GENE70
bability of event
0.8
ß Validation of GENE70 [Buyse et al., 2006] and GENE76 [Desmedt et al.,
2007].
0.4
0.6
"H,K(*+&*(*+,(;$,K.0M
&2*(&2(J""K(&2(*+&*(")(
5,53(&('"H(;$"=&=.'.*N(
C+,(2,%2.*.G.*.,2()"$(
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
30 / 76
Prognostic Gene Signatures
Independent Validation (cont.)
We sought to compare the GGI to the GENE70 and GENE76
signatures in this validation series
. . . and showed that GGI has very similar
performance
[Haibe-Kains
!"#$%&'()*+,
!"##$%!!
&' ( )
*++,&--.../012345647+89:/623-;)<;=";>)-(-'()
et al., 2008b].
!"#$%&'()*+,!"##$%!!&'()
GENE70
GENE76
5
0
7
9
15
25
103
32
0
5
GGI
25
15
9
7
and the GGI were higher than the HR of the 76-gene signature, the differences were not statistically significant
(see Table 2). Figure 3 illustrates the Kaplan-Meier estimates of DMFS for the four groups of patients (two groups
with concordant results in risk assessment and two with
discordant results) for the different signatures two by two.
GENE70analyses (Table 3), we can conclude
From the multivariate
GENE76
that the three
signatures added significant information to
the traditional
GGIparameters and were the strongest predictive variables of DMFS, as reflected by their lowest p-valAOL
ues compared to the other variables. The additional
information of these signatures over the clinical risk was
also confirmed by the fact that the univariate HRs for the
0.5similar
0.6when0.7
0.8for the
0.9
three signatures remained
adjusted
clinical risk, with a HR of 7.25 (95% CI: 2.4–21.5; p = 3.5
concordance
index
-4
– 10 ), 2.8 (95% CI: 1.2–6.8; p = 0.018) and 6.25 (95%
CI: 2.3–17; p = 3.3 – 10-4) for the 70-gene signature, 76gene signature and the GGI respectively.
Figure
ple
Venn
according
diagram
1
to
illustrating
the prognostic
the classification
signatures of the tumor samVenn diagram illustrating the classification of the
tumor sample according to the prognostic signawe combined the three gene signatures
in order
tures. Dark
red = high-risk
patients and blue = low-risk
Benjamin
Haibe-Kains
(ULB)
VisitLastly,
to IGC
October
16, to
2008
1
Forest
the
anceAdjuvant!
Figure
indices,
plots
2 (and
and
Online
B/the
95%classification
CI)
log2
forhazard
the
31 / 76
Part IV
Subtypes and Prognosis
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
32 / 76
Prognosis in Specific Subtypes
The first publications attempted to build a prognostic model from the
global population of BC patients.
In 2005, Wang et al. were the first to divide the global population
based on ER status:
I
I
I
I
As BC biology is very different according to the ER status, prognostic
models might be different too.
They built a prognostic model for each subgroup of patients (ER+ and
ER-).
To make a prediction, they used one of the two models depending on
the ER-status of the tumor.
Unfortunately the group of ER- tumors was too small and their
corresponding model was not generalizable.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
33 / 76
Prognosis in Specific Subtypes
(cont.)
Recently, Teschendorff et al. built a new prognostic model for ERtumors [Teschendorff et al., 2007] and validated it [Teschendorff and Caldas,
2008] using large datasets.
I
The signature is composed of 7 immune-related genes.
We showed in two meta-analyses [Wirapati et al., 2008; Desmedt et al.,
2008] that:
I
Proliferation (AURKA) was the most prognostic factor in ER+/HER2tumors and the common driving force of the early gene signatures.
F
I
I
Actually, these early signatures (e.g. GENE70, GENE76, GGI) are
prognostic in ER+/HER2- tumors only.
Immune response (STAT1) is prognostic in ER-/HER2- and HER2+
tumors.
Tumor invasion (PLAU or uPA) is prognostic in HER2+ tumors.
Finak et al. introduced a stroma-derived prognostic predictor (SDPP)
particularly efficient in HER2+ tumors [Finak et al., 2008].
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
34 / 76
New Prognostic Model
Since current prognostic models/gene signatures are limited to some
subtypes, we sought to develop a new prognostic model integrating
the breast cancer subtypes identification in order to:
I
I
Build a prognostic gene signatures specifically targeting each subtype.
Build a global prognostic model able to predict the risk of the patients
whatever the tumor subtype (ER-/HER2-, HER2+ or ER+/HER2-).
We assessed the performance and compared it to current prognostic
models using the thorough statistical framework developed in
[Haibe-Kains et al., 2008a].
This new prognostic model is called GENIUS, standing for
Gene Expression progNostic Index Using Subtypes
Benjamin Haibe-Kains (ULB)
Visit to IGC
,
October 16, 2008
35 / 76
GENIUS
A
B
Training set
Validation set
Subtypes
identification
Pr1
Pr2
HER2+
Pr3
(Pr1, Pr2, Pr3)
Pr1
ER-/HER2-
ER+/HER2-
Pr2
Identification
of prognostic
genes
Identification
of prognostic
genes
GGI
subtype
signature
subtype
signature
subtype
signature
Model
building
Model
building
Model
building
subtype
risk score
subtype
risk score
subtype
risk score
Combination
GENIUS
GENIUS
risk score
cutoff
cutoff
risk group
risk group
risk score
Performance
assessment and
comparison
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
36 / 76
GENIUS
Performance Assessment and Comparison
We trained GENIUS on VDX:
I
286 node-negative untreated BC patients.
We assessed the performance in an independent dataset composed of
I
I
765 node-negative untreated patients
coming from 5 different datasets (NKI, TBG, UPP, UNT and MAINZ).
Risk score prediction: continuous value.
Risk group prediction: binary value (application of a cutoff on the risk
score).
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
37 / 76
GENIUS
Risk Score Prediction
State-of-the-art
gene signatures
A
Test for GENIUS
superiority
ALL:
clinical prognostic
indices
B
GENIUS
GGI
STAT1
PLAU
uPA
IRMODULE
SDPP
ALL:
0.025
4E-12
2E-15
1E-11
1E-4
0.027
Test forTest
GENIUS
for GENIUS
superiority
superiority
A
GENIUS
GGI GENIUS
AOL
STAT1
PLAUNPI
uPA
ER+/HER2!:
GENIUS
IRMODULE
SDPPAOL
ALL:
NPI
ER+/HER2!: GENIUS
GGI
STAT1
PLAU
uPA
IRMODULE
SDPP
ER!/HER2!: GENIUS
GGI
STAT1
PLAU
uPA
IRMODULE
SDPP
HER2+:
0.3
0.012
0.047
0.0021
0.01
0.44
0.014
ER!/HER2!: GENIUS
GGI
STAT1
0.2 PLAU
0.3
0.4
uPA
IRMODULE
SDPP
AOL
NPI
HER2+:
GENIUS
GGI
STAT1
PLAU
uPA
IRMODULE
SDPP
0.2
0.73
1E-5
1E-9
5E-8
0.058
0.2
ER+/HER2!: GENIUS
GGI GENIUS
ER!/HER2!:
STAT1
AOL
PLAUNPI
uPA
IRMODULE
HER2+:
GENIUS
SDPP
0.04
0.068
0.082
0.042
0.51
0.36
0.4
0.5
0.6
0.7
0.0015
2E-10 0.022
5E-17 0.017
3E-10
2E-4
0.0086 0.021
0.19
ALL
0.5
2E-5 0.31
1E-6 0.019
0.0012
0.13
0.12 0.27
ER
ER
HE
0.069
0.0072
0.13
0.7 9E-40.8
0.0076
concordance index
0.58
0.028
0.5
0.6
GENIUS
GGI
STAT1
PLAU
uPA
IRMODULE
SDPP
0.0052
0.074
0.015
0.0084
0.11
0.063
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0.8
concordance index
concordance index
Benjamin Haibe-Kains (ULB)
B
Visit to IGC
October 16, 2008
38 / 76
GENIUS
Risk Group Prediction
GGI
GENIUS
Global population
No. At Risk
Low 488
High 253
474
241
2
459
208
3
445
180
4
5
6
Time (years)
427
158
409
140
Benjamin Haibe-Kains (ULB)
364
131
7
325
116
0.2
Low
High
0.0
HR=3.6, 95%CI [2.6,4.8], p−value=9.6E−17
1
0.6
0.8
0.0
Low
High
0
0.4
Probability of survival
0.6
0.4
0.2
Probability of survival
0.8
1.0
1.0
Global population
8
286
100
9
257
92
10
223
81
HR=2.5, 95%CI [1.8,3.3], p−value=1.7E−09
0
No. At Risk
Low 484
High 257
Visit to IGC
1
473
242
2
458
209
3
439
186
4
5
6
Time (years)
414
171
394
155
348
146
7
306
135
8
270
116
9
245
104
October 16, 2008
10
215
89
39 / 76
GENIUS
Risk Group Prediction (cont.)
GENIUS
GGI
ER+/HER2-
0.6
0.2
0.4
Probability of survival
0.6
0.4
0.2
Probability of survival
0.8
0.8
1.0
1.0
ER+/HER2-
Low
High
0.0
0.0
Low
High
HR=3.4, 95%CI [2.3,5], p−value=1.8E−09
0
No. At Risk
Low 391
High 125
1
384
124
2
374
113
3
364
99
4
5
6
Time (years)
349
87
335
76
Benjamin Haibe-Kains (ULB)
294
71
7
258
62
8
226
57
9
204
54
10
176
47
HR=3.4, 95%CI [2.3,5], p−value=1.3E−09
0
No. At Risk
Low 391
High 125
Visit to IGC
1
384
124
2
373
114
3
364
99
4
5
6
Time (years)
349
87
335
76
294
71
7
258
62
8
226
57
9
204
54
10
175
48
October 16, 2008
40 / 76
GENIUS
Risk Group Prediction (cont.)
GGI
GENIUS
ER-/HER2-
No. At Risk
Low 47
High 72
45
67
2
42
50
3
40
44
4
5
6
Time (years)
39
40
37
35
Benjamin Haibe-Kains (ULB)
37
32
7
36
29
0.6
0.2
HR=3.1, 95%CI [1.5,6.6], p−value=2.8E−03
1
Low
High
0.0
0.0
Low
High
0
0.4
Probability of survival
0.6
0.4
0.2
Probability of survival
0.8
0.8
1.0
1.0
ER-/HER2-
8
31
23
9
27
20
10
21
16
HR=0.8, 95%CI [0.4,1.6], p−value=5.4E−01
0
No. At Risk
Low 43
High 76
Visit to IGC
1
40
72
2
32
60
3
31
53
4
5
6
Time (years)
28
51
25
47
25
44
7
24
41
8
19
35
9
18
29
October 16, 2008
10
13
24
41 / 76
GENIUS
Risk Group Prediction (cont.)
GGI
GENIUS
HER2+
0.8
1
47
52
2
45
47
3
43
39
4
5
6
Time (years)
41
33
39
31
Benjamin Haibe-Kains (ULB)
35
30
7
32
27
0.6
0.2
HR=4.2, 95%CI [1.9,9.4], p−value=5.7E−04
0
Low
High
0.0
0.0
Low
High
No. At Risk
Low 50
High 56
0.4
Probability of survival
0.6
0.4
0.2
Probability of survival
0.8
1.0
1.0
HER2+
8
31
22
9
28
20
10
26
18
HR=1.4, 95%CI [0.71,2.7], p−value=3.5E−01
0
No. At Risk
Low 49
High 57
Visit to IGC
1
47
52
2
45
47
3
39
43
4
5
6
Time (years)
37
37
35
35
32
33
7
29
30
8
26
27
9
24
24
October 16, 2008
10
22
22
42 / 76
Part V
Conclusion
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
43 / 76
Conclusion
Numerous studies confirmed the great potential of gene expression
profiling using microarrays to better understand cancer biology and to
improve current prediction models.
This technology becomes more and more mature (MAQC [shi, 2006])
and is now ready for clinical applications.
The promising results of early publications were validated in different
independent studies.
Recent meta-analyses successfully recapitulated the main discoveries
made these late decades and refined our knowledge on breast cancer
biology.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
44 / 76
Conclusion (cont.)
We benefit from this strong basis to go a step further to improve
breast cancer prognosis using microarrays.
I
Prognostic models/gene signatures in specific subtypes [Teschendorff
et al., 2007; Desmedt et al., 2008; Finak et al., 2008].
I
Development of GENIUS, a prognostic model integrating BC molecular
subtypes identification [manuscript in preparation].
A major issue remains: ”How to combine these microarray prognostic
models with clinical variables?”
I
I
Several studies showed the additional information of tumor size, nodal
status, . . .
However, we currently lack of data to fit robust prognostic models
combining microarray and clinical variables.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
45 / 76
Thank you for your attention.
This presentation is available from http://www.ulb.ac.be/di/map/
bhaibeka/papers/haibekains2008molecular.pdf.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
46 / 76
Part VI
Bibliography
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
47 / 76
Bibliography I
The microarray quality control (maqc) project shows inter- and intraplatform
reproducibility of gene expression measurements. Nat Biotech, 24(9):1151–1161,
2006. URL http://dx.doi.org/10.1038/nbt1239.
Marc Buyse, Sherene Loi, Laura van’t Veer, Giuseppe Viale, Mauro Delorenzi,
Annuska M. Glas, Mahasti Saghatchian d’Assignies, Jonas Bergh, Rosette Lidereau,
Paul Ellis, Adrian Harris, Jan Bogaerts, Patrick Therasse, Arno Floore, Mohamed
Amakrane, Fanny Piette, Emiel Rutgers, Christos Sotiriou, Fatima Cardoso, and
Martine J. Piccart. Validation and Clinical Utility of a 70-Gene Prognostic Signature
for Women With Node-Negative Breast Cancer. J. Natl. Cancer Inst., 98(17):
1183–1192, 2006. doi: 10.1093/jnci/djj329. URL
http://jnci.oxfordjournals.org/cgi/content/abstract/jnci;98/17/1183.
C. A. Davis, F. Gerick, V. Hintermair, C. C. Friedel, K. Fundel, R. Kuffner, and
R. Zimmer. Reliable gene signatures for microarray classification: assessment of
stability and performance. Bioinformatics, 22(19):2356–2363, 2006.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
48 / 76
Bibliography II
Christine Desmedt, Fanny Piette, Sherene Loi, Yixin Wang, Francoise Lallemand,
Benjamin Haibe-Kains, Giuseppe Viale, Mauro Delorenzi, Yi Zhang,
Mahasti Saghatchian d’Assignies, Jonas Bergh, Rosette Lidereau, Paul Ellis, Adrian L.
Harris, Jan G.M. Klijn, John A. Foekens, Fatima Cardoso, Martine J. Piccart, Marc
Buyse, Christos Sotiriou, and on behalf of the TRANSBIG Consortium. Strong Time
Dependence of the 76-Gene Prognostic Signature for Node-Negative Breast Cancer
Patients in the TRANSBIG Multicenter Independent Validation Series. Clin Cancer
Res, 13(11):3207–3214, 2007. doi: 10.1158/1078-0432.CCR-06-2765. URL
http://clincancerres.aacrjournals.org/cgi/content/abstract/13/11/3207.
Christine Desmedt, Benjamin Haibe-Kains, Pratyaksha Wirapati, Marc Buyse, Denis
Larsimont, Gianluca Bontempi, Mauro Delorenzi, Martine Piccart, and Christos
Sotiriou. Biological Processes Associated with Breast Cancer Clinical Outcome
Depend on the Molecular Subtypes. Clin Cancer Res, 14(16):5158–5165, 2008. doi:
10.1158/1078-0432.CCR-07-4756. URL
http://clincancerres.aacrjournals.org/cgi/content/abstract/14/16/5158.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
49 / 76
Bibliography III
Greg Finak, Nicholas Bertos, Francois Pepin, Svetlana Sadekova, Margarita
Souleimanova, Hong Zhao, Haiying Chen, Gulbeyaz Omeroglu, Sarkis Meterissian,
Atilla Omeroglu, Michael Hallett, and Morag Park. Stromal gene expression predicts
clinical outcome in breast cancer. Nat Med, 14(5):518–527, 2008. URL
http://dx.doi.org/10.1038/nm1764.
J. A. Foekens, D. Atkins, Y. Zhang, F. C. Sweep, N. Harbeck, A. Paradiso, T. Cufer,
A. M. Sieuwerts, D. Talantov, P. N. Span, V. C. Tjan-Heijnen, A. F. Zito, K. Specht,
H. Hioefler, R. Golouh, F. Schittulli, M. Schmitt, L. V. Beex, J. G. Klijn, and
Y. Wang. Multicenter validation of a gene expression–based prognostic signature in
lymph node–negative primary breast cancer. Journal of Clinical Oncology, 24(11),
2006.
C. Fraley and A. E. Raftery. Model-based clustering, discriminant analysis, and density
estimation. Journal of American Statistical Asscoiation, 97(458):611–631, June 2002.
B. Haibe-Kains, C. Desmedt, C. Sotiriou, and G. Bontempi. A comparative study of
survival models for breast cancer prognostication based on microarray data: does a
single gene beat them all? Bioinformatics, 24(19):2200–2208, 2008a. doi:
10.1093/bioinformatics/btn374. URL http:
//bioinformatics.oxfordjournals.org/cgi/content/abstract/24/19/2200.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
50 / 76
Bibliography IV
Benjamin Haibe-Kains, Christine Desmedt, Fanny Piette, Marc Buyse, Fatima Cardoso,
Laura van’t Veer, Martine Piccart, Gianluca Bontempi, and Christos Sotiriou.
Comparison of prognostic gene expression signatures for breast cancer. BMC
Genomics, 9(1):394, 2008b. ISSN 1471-2164. doi: 10.1186/1471-2164-9-394. URL
http://www.biomedcentral.com/1471-2164/9/394.
Amy Kapp, Stefanie Jeffrey, Anita Langerod, Anne-Lise Borresen-Dale, Wonshik Han,
Dong-Young Noh, Ida Bukholm, Monica Nicolau, Patrick Brown, and Robert
Tibshirani. Discovery and validation of breast cancer subtypes. BMC Genomics, 7(1):
231, 2006. ISSN 1471-2164. doi: 10.1186/1471-2164-7-231. URL
http://www.biomedcentral.com/1471-2164/7/231.
MJ Pencina and RB D’Agostino. Overall c as a measure of discrimination in survival
analysis: model specific population value and confidence interval estimation. Stat
Med, 23(13):2109–2123, 2004. doi: 10.1002/sim.1802.
Charles M. Perou, Therese Sorlie, Michael B. Eisen, Matt van de Rijn, Stefanie S.
Jeffrey, Christian A. Rees, Jonathan R. Pollack, Douglas T. Ross, Hilde Johnsen,
Lars A. Akslen, Oystein Fluge, Alexander Pergamenschikov, Cheryl Williams,
Shirley X. Zhu, Per E. Lonning, Anne-Lise Borresen-Dale, Patrick O. Brown, and
David Botstein. Molecular portraits of human breast tumours. Nature, 406(6797):
747–752, 2000. URL http://dx.doi.org/10.1038/35021093.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
51 / 76
Bibliography V
Lajos Pusztai, Chafika Mazouni, Keith Anderson, Yun Wu, and W. Fraser Symmans.
Molecular Classification of Breast Cancer: Limitations and Potential. Oncologist, 11
(8):868–877, 2006. doi: 10.1634/theoncologist.11-8-868. URL
http://theoncologist.alphamedpress.org/cgi/content/abstract/11/8/868.
T. Sorlie, C. M. Perou, R. Tibshirani, T. Aas, S. Geisher, H. Johnsen, T. Hastie, M. B.
Eisen, M. van de Rijn, S. S. Jeffrey, T. Thorsen, H. Quist, J. C. Matese., P. O.
Brown, D. Botstein, P. L. Eystein, and A. L. Borresen-Dale. Gene expression patterns
breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Matl.
Acad. Sci. USA, 98(19):10869–10874, 2001.
T. Sorlie, R. Tibshirani, J. Parker, T. Hastie, J. S. Marron, A. Nobel, S. Deng,
H. Johnsen, R. Pesich, S. Geister, J. Demeter, C. Perou, P. E. Lonning, P. O. Brown,
A. L. Borresen-Dale, and D. Botstein. Repeated observation of breast tumor subtypes
in indepedent gene expression data sets. Proc Natl Acad Sci USA, 1(14):8418–8423,
2003.
C. Sotiriou, S. Y. Neo, L. M. McShane, E. L. Korn, P. M. Long, A. Jazaeri, P. Martiat,
S.B. Fox, A. L. Harris, and E. T. Liu. Breast cancer classification and prognosis based
on gene expression profiles from a population-based study. Proc. Natl. Acad. Sci.,
100(18):10393–10398, 2003.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
52 / 76
Bibliography VI
Christos Sotiriou, Pratyaksha Wirapati, Sherene Loi, Adrian Harris, Steve Fox, Johanna
Smeds, Hans Nordgren, Pierre Farmer, Viviane Praz, Benjamin Haibe-Kains,
Christine Desmedt, Denis Larsimont, Fatima Cardoso, Hans Peterse, Dimitry Nuyten,
Marc Buyse, Marc J. Van de Vijver, Jonas Bergh, Martine Piccart, and Mauro
Delorenzi. Gene Expression Profiling in Breast Cancer: Understanding the Molecular
Basis of Histologic Grade To Improve Prognosis. J. Natl. Cancer Inst., 98(4):
262–272, 2006. doi: 10.1093/jnci/djj052. URL
http://jnci.oxfordjournals.org/cgi/content/abstract/jnci;98/4/262.
Andrew Teschendorff and Carlos Caldas. A robust classifier of high predictive value to
identify good prognosis patients in er-negative breast cancer. Breast Cancer Research,
10(4):R73, 2008. ISSN 1465-5411. doi: 10.1186/bcr2138. URL
http://breast-cancer-research.com/content/10/4/R73.
Andrew Teschendorff, Ahmad Miremadi, Sarah Pinder, Ian Ellis, and Carlos Caldas. An
immune response gene expression module identifies a good prognosis subtype in
estrogen receptor negative breast cancer. Genome Biology, 8(8):R157, 2007. ISSN
1465-6906. doi: 10.1186/gb-2007-8-8-r157. URL
http://genomebiology.com/2007/8/8/R157.
R. Tibshirani and G. Walther. Cluster validation by prediction strength. Journal of
Computational and Graphical Statistics, 14(3):511–528, 2005.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
53 / 76
Bibliography VII
M. J. van de Vijver, Y. D. He, L. van’t Veer, H. Dai, A. M. Hart, D. W. Voskuil, G. J.
Schreiber, J. L. Peterse, C. Roberts, M. J. Marton, M. Parrish, D. Atsma,
A. Witteveen, A. Glas, L. Delahaye, T. van der Velde, H. Bartelink, S. Rodenhuis,
E. T. Rutgers, S. H. Friend, and R. Bernards. A gene expression signature as a
predictor of survival in breast cancer. New England Journal of Medicine, 347(25):
1999–2009, 2002.
L. J. van’t Veer, H. Dai, M. J. van de Vijver, Y. D. He, A. A. Hart, M. Mao, H. L.
Peterse, K. van der Kooy, M. J. Marton, A. T. Witteveen, G. J. Schreiber, R. M.
Kerkhiven, C. Roberts, P. S. Linsley, R. Bernards, and S. H. Friend. Gene expression
profiling predicts clinical outcome of breast cancer. Nature, 415:530–536, 2002.
Y. Wang, J. G. Klijn, Y. Zhang, A. M. Sieuwerts, M. P. Look, F. Yang, D. Talantov,
M. Timmermans, M. E. Meijer van Gelder, J. Yu, T. Jatkoe, E. M. Berns, D. Atkins,
and J. A. Forekens. Gene-expression profiles to predict distant metastasis of
lymph-node-negative primary breast cancer. Lancet, 365:671–679, 2005.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
54 / 76
Bibliography VIII
Pratyaksha Wirapati, Christos Sotiriou, Susanne Kunkel, Pierre Farmer, Sylvain
Pradervand, Benjamin Haibe-Kains, Christine Desmedt, Michail Ignatiadis, Thierry
Sengstag, Frederic Schutz, Darlene Goldstein, Martine Piccart, and Mauro Delorenzi.
Meta-analysis of gene expression profiles in breast cancer: toward a unified
understanding of breast cancer subtyping and prognosis signatures. Breast Cancer
Research, 10(4):R65, 2008. ISSN 1465-5411. doi: 10.1186/bcr2124. URL
http://breast-cancer-research.com/content/10/4/R65.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
55 / 76
Part VII
Appendix
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
56 / 76
Gene Expression Profiling Technologies
There exist several technologies to measure the expression of genes.
Low throughput technologies such as RT-PCR, allow for measuring
the expression of a few genes.
High throughput technologies, such as microarrays, allows for
measuring simultaneously the expression of thousands of genes (whole
genome).
Microarray principles will be illustrated through the Affymetrix
technology.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
57 / 76
Microarray
A microarray is composed of
I
I
I
I
DNA fragments (probes) fixed on a solid support.
Ordered position of probes.
Principle of hybridization to a specific probe of complementary
sequence.
Molecular labeling.
ß Simultaneous detection of thousands of sequences in parallel.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
58 / 76
Affymetrix GeneChip
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
59 / 76
Affymetrix GeneChip
Probes
A
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
60 / 76
Hybridization
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
61 / 76
Detection
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
62 / 76
Affymetrix Design
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
63 / 76
Affymetrix Equipment
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
64 / 76
Prognostic Gene Signatures
A Single Gene?
From the validation studies, we learned that GGI yields similar
(sometimes better) performance than other gene signatures
[Haibe-Kains et al., 2008b].
Since GGI is a very simple model from a statistical and a biological
(proliferation genes) points of view, we challenged the use of complex
statistical methods for BC prognostication.
We compared simple to complex statistical methods to a single
proliferation gene (AURKA) [Haibe-Kains et al., 2008a].
ß Due to the complexity of microarray data, it is very hard to build
prognostic models statistically better than AURKA.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
65 / 76
Prognostic Gene Signatures
A Single Gene? (cont.)
A single gene for breast cancer prognostication?
2
Forestplot of the concordance index for each method in the training
set and the three validation sets:
CONCORDANCE INDEX FOR RISK SCORE PREDICTION
AURKA
BD.COMBUNIV.WILCOXON.HG
AURKA
AURKA
AURKA
AURKA
AURKA
BD.COMBUNIV.COX.SURV
BD.COMBUNIV.WILCOXON.HG BD.COMBUNIV.WILCOXON.HG
BD.COMBUNIV.WILCOXON.HG
BD.COMBUNIV.WILCOXON.HG BD.COMBUNIV.WILCOXON.HG
BD.MULTIV.LM.TOE
BD.COMBUNIV.COX.SURV
BD.COMBUNIV.COX.SURV
BD.COMBUNIV.COX.SURV
BD.COMBUNIV.COX.SURV
BD.COMBUNIV.COX.SURV
BD.MULTIV.COX.SURV
BD.MULTIV.LM.TOE
BD.MULTIV.LM.TOE
BD.MULTIV.LM.TOE
BD.MULTIV.LM.TOE
BD.MULTIV.LM.TOE
GW.RANK.COMBUNIV.WILCOXON.HG
BD.MULTIV.COX.SURV
BD.MULTIV.COX.SURV
BD.MULTIV.COX.SURV
BD.MULTIV.COX.SURV
BD.MULTIV.COX.SURV
GW.RANKCV.COMBUNIV.WILCOXON.HG
GW.RANK.COMBUNIV.WILCOXON.HG
GW.RANK.COMBUNIV.WILCOXON.HG
GW.RANK.COMBUNIV.WILCOXON.HG
GW.RANK.COMBUNIV.WILCOXON.HG
GW.RANK.COMBUNIV.WILCOXON.HG
GW.RANK.COMBUNIV.COX.SURV
GW.RANKCV.COMBUNIV.WILCOXON.HG
GW.RANKCV.COMBUNIV.WILCOXON.HG
GW.RANKCV.COMBUNIV.WILCOXON.HG
GW.RANKCV.COMBUNIV.WILCOXON.HG
GW.RANKCV.COMBUNIV.WILCOXON.HG
GW.RANKCV.COMBUNIV.COX.SURV
GW.RANK.COMBUNIV.COX.SURVGW.RANK.COMBUNIV.COX.SURV GW.RANK.COMBUNIV.COX.SURV
GW.RANK.COMBUNIV.COX.SURV GW.RANK.COMBUNIV.COX.SURV
GW.RANK.MULTIV.RCOX.SURV
GW.RANKCV.COMBUNIV.COX.SURV
GW.RANKCV.COMBUNIV.COX.SURV
GW.RANKCV.COMBUNIV.COX.SURV
GW.RANKCV.COMBUNIV.COX.SURV
GW.RANKCV.COMBUNIV.COX.SURV
GW.RANKCV.MULTIV.RCOX.SURV
GW.RANK.MULTIV.RCOX.SURV GW.RANK.MULTIV.RCOX.SURV GW.RANK.MULTIV.RCOX.SURV
GW.RANK.MULTIV.RCOX.SURV GW.RANK.MULTIV.RCOX.SURV
GW.PCA.COMBUNIV.WILCOXON.HG
GW.RANKCV.MULTIV.RCOX.SURV
GW.RANKCV.MULTIV.RCOX.SURV GW.RANKCV.MULTIV.RCOX.SURV
GW.RANKCV.MULTIV.RCOX.SURVGW.RANKCV.MULTIV.RCOX.SURV
GW.PCACV.COMBUNIV.WILCOXON.HG
GW.PCA.COMBUNIV.WILCOXON.HG
GW.PCA.COMBUNIV.WILCOXON.HG
GW.PCA.COMBUNIV.WILCOXON.HG
GW.PCA.COMBUNIV.WILCOXON.HG
GW.PCA.COMBUNIV.WILCOXON.HG
GW.PCA.COMBUNIV.COX.SURV
GW.PCACV.COMBUNIV.WILCOXON.HG
GW.PCACV.COMBUNIV.WILCOXON.HG
GW.PCACV.COMBUNIV.WILCOXON.HG
GW.PCACV.COMBUNIV.WILCOXON.HG
GW.PCACV.COMBUNIV.WILCOXON.HG
GW.PCACV.COMBUNIV.COX.SURV
GW.PCA.COMBUNIV.COX.SURV GW.PCA.COMBUNIV.COX.SURV GW.PCA.COMBUNIV.COX.SURV
GW.PCA.COMBUNIV.COX.SURV GW.PCA.COMBUNIV.COX.SURV
GW.PCA.MULTIV.RCOX.SURV
GW.PCACV.COMBUNIV.COX.SURV
GW.PCACV.COMBUNIV.COX.SURVGW.PCACV.COMBUNIV.COX.SURV
GW.PCACV.COMBUNIV.COX.SURV
GW.PCACV.COMBUNIV.COX.SURV
GW.PCACV.MULTIV.RCOX.SURV
GW.PCA.MULTIV.RCOX.SURV
GW.PCA.MULTIV.RCOX.SURV
GW.PCA.MULTIV.RCOX.SURV
GW.PCA.MULTIV.RCOX.SURV
GW.PCA.MULTIV.RCOX.SURV
GENE76
GW.PCACV.MULTIV.RCOX.SURV GW.PCACV.MULTIV.RCOX.SURV GW.PCACV.MULTIV.RCOX.SURV
GW.PCACV.MULTIV.RCOX.SURV GW.PCACV.MULTIV.RCOX.SURV
GGI
GENE76
GENE76
GENE76
GENE76
GENE76
AOL
GGI
GGI
GGI
GGI
GGI
NPI
AOL
AOL
NPI
NPI
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.5 0.55 0.6 0.65 0.7 0.75 0.8
0.50.550.60.650.70.750.80.850.90.95
concordance index
0.5 0.55 0.6 0.65 0.7 0.75 0.8
0.50.550.60.650.70.750.80.850.90.95
Training
VDX set
Validation
TBVDXset 1
concordance index
Validation
TAM set 2
0.5 0.55 0.6 0.65 0.7 0.75 0.8
concordance index
Validation
set 3
UPP
Supplementary Figure 1: Forest plot of the concordance indices for the risk scores predicted by all the methods in the training set (VDX) and
in the three validation sets (TBG, TAM, and UPP). AURKA and GGI models were not fitted on VDX which can be considered as a validation
set for these models.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
66 / 76
GENIUS
Identification of Subtypes
The first step of GENIUS method is the identification of subtypes in
the dataset.
In BC, we applied the clustering model developed previously (training
set: VDX).
The model returns the probabilities Pr(s) for a patient to belong to
each subtype s ∈ S.
I
S is composed of the ER-/HER2-, HER2+ and ER+/HER2- subtypes.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
67 / 76
GENIUS
Identification of Prognostic Genes
We used a ranking-based gene selection method.
The score (relevance) given to each gene is based on the significance
of the concordance index.
We introduced a weighted version of the concordance index in order
to select genes relevant for a specific subtype;
The weights were defined as the probability for a patient to belong to
the subtype of interest.
ß This feature selection allowed for using all the patients in the dataset.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
68 / 76
GENIUS
Weighted Concordance Index
Survival data for the ith patient:
I
I
ti stands for the event time
ci for the censoring time
C -index computes the probability that, for a pair of randomly chosen
comparable patients, the patient with the higher risk prediction will
experience an event before the lower risk patient.
P
i,j∈Ω 1{ri > rj }
C -index =
|Ω|
I
I
where ri and rj are the risk predictions of the patient i and j
Ω is the set of all the pairs of patients {i, j} such that:
F
F
ri 6= rj (no ties in r )
meet one of the following conditions: (i) both patients i and j
experienced an event and time ti < tj or (ii) only patient i experienced
an event and ti < cj .
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
69 / 76
GENIUS
Weighted Concordance Index (cont.)
We introduced a weighted version of the concordance
P
i,j∈Ω wij 1{ri > rj }
P
C -indexwted =
i,j∈Ω wij
I
where wij = wi wj is the weight for the pair of patients {i, j} ∈ Ω.
Significance of the C -index was computed by assuming asymptotic
normality [Pencina and D’Agostino, 2004].
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
70 / 76
GENIUS
Signature Stability
Once the genes were ranked, the only hyperparameter to tune was the
signature size k (number of selected genes in the signature).
We assessed the stability with respect to the signature size by
resampling the training set.
The stability criterion was inspired from [Davis et al., 2006]:
I
I
I
Let X be the set of features and freq(xj ) be the number of sampling
steps in which a feature xj ∈ X has been selected out of m sampling
steps.
The set X is sorted by frequency into the set x(1) , x(2) , . . . , x(n) where
freq(x(i) ) ≥ freq(x(j) ) if i < j where i, j ∈ {1, 2, . . . , n}.
A first measure of stability for a given signature size k is returned by
Pk
Stab(k) =
Benjamin Haibe-Kains (ULB)
Visit to IGC
i=1
freq(x(i) )
km
October 16, 2008
71 / 76
GENIUS
Signature Stability (cont.)
Since the Stab statistic can be made artificially high by simply
increasing k, we formulated an adjusted statistic
k
Stabadj (k) = max 0, Stab(k) − α
n
I
where α is a penalty factor depending on the number of selected
features (usually α = 1).
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
72 / 76
GENIUS
Signature Stability (cont.)
Signature stability
A
Signature stability
HER2+
HER2+
B
1.0
0.8
0.6
Stability
0.4
0.2
0.0
0.0
0.2
0.4
Stability
0.6
0.8
1.0
ER!/HER2!
ER-/HER2-
0
200
400
600
800
1000
0
signature size
200
400
600
800
1000
signature size
In the training set (VDX), the most stable signatures were composed
of 63 and 22 genes for the ER-/HER2- and HER2+ subtypes.
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
73 / 76
GENIUS
Risk Score Prediction
The risk score predictions for the subtype s is defined as
P
wi xi
R(s) = i∈Q
nQ
I
I
I
I
where Q is the set of of genes in the signature for subtype s
xi is the expression of gene i
wi ∈ {−1, +1} depending on the concordance index (> 0.5 or ≤ 0.5)
nQ is the signature size.
The global risk score is defined as
X
R=
Pr(s)R(s)
s∈S
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
74 / 76
Tools
Bioinformatics softwares
I
I
I
I
R is a widely used open source language and environment for statistical
computing and graphics
Bioconductor is an open source and open development software
project for the analysis and comprehension of genomic data
Java Treeview is an open source software for clustering visualization
BRB Array Tools is a software suite for microarray analysis working as
an Excel macro
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
75 / 76
Links
Personal webpage: http://www.ulb.ac.be/di/map/bhaibeka/
Machine Learning Group: http://www.ulb.ac.be/di/mlg
Functional Genomics Unit:
http://www.bordet.be/en/services/medical/array/practical.htm
Master in Bioinformatics at ULB and other belgian universities:
http://www.bioinfomaster.ulb.ac.be/
Benjamin Haibe-Kains (ULB)
Visit to IGC
October 16, 2008
76 / 76
Téléchargement