Gene Modules, Breast Cancer Subtypes and Prognosis Benjamin Haibe-Kains

Gene Modules, Breast Cancer Subtypes and Prognosis
Benjamin Haibe-Kains (1,2)
(1) Functional Genomics Unit, Institut Jules Bordet
(2) Machine Learning Group, Université Libre de Bruxelles
November 19, 2008
Benjamin Haibe-Kains (ULB)
Visit to NKI
November 19, 2008
1 / 80
Research Groups
Functional Genomics Unit (Christos Sotiriou)
9 researchers (1 Prof, 5 postDocs, 3 PhD students), 5 technicians.
Research topics : Genomic analyses, clinical studies and translational
research.
Website :
http://www.bordet.be/en/services/medical/array/practical.htm.
National scientific collaborations : ULB, Erasme, ULg, Gembloux,
IDDI.
International scientific collaborations : Genome Institute of Singapore,
John Radcliffe Hospital, Karolinska Institute and Hospital, MD
Anderson Cancer Center, Netherlands Cancer Institute, Swiss Institute
for Experimental Cancer Research, NCI/NIH, Gustave-Roussy
Institute.
Research Groups
Machine Learning Group (Gianluca Bontempi)
10 researchers (2 Profs, 1 postDoc, 7 PhD students), 2 graduate students.
Research topics : Bioinformatics, Classification, Regression, Time
series prediction, Sensor networks.
Website : http://www.ulb.ac.be/di/mlg.
Scientific collaborations in ULB : IRIDIA, Physiologie Moléculaire de la
Cellule (IBMM), Conformation des Macromolécules Biologiques et
Bioinformatique (IBMM), CENOLI (Sciences), Functional Genomics
Unit (Institut Jules Bordet), Service d'Anesthésie (Erasme).
Scientific collaborations outside ULB : UCL Machine Learning Group
(B), Politecnico di Milano (I), Università del Sannio (I), George
Mason University (US).
The MLG is part of the "Groupe de Contact FNRS" on Machine
Learning and of CINBIOS: http://babylone.ulb.ac.be/Joomla/.
Summary
Introduction
Breast Cancer Subtypes
- New Clustering Model
  - Gene Modules
  - Model-Based Clustering
Prognostic Gene Signatures
Subtypes and Prognosis
Conclusions
Part I
Introduction
Breast Cancer
Breast cancer is a global public health issue.
It is the most frequently diagnosed malignancy in women in the
western world and the commonest cause of cancer death for European
and American women.
In Europe, one out of eight to ten women, depending on the country,
will develop breast cancer during their lifetime.
Breast Cancer Prognosis
[Figure: patient timeline. Diagnosis, then breast surgery + radiotherapy, then 5-10 years of follow-up ending in recurrence or remission; prognosis is the question of which outcome to expect.]
Current Clinical Tools for Prognosis
[Diagram: clinical variables (histological grade, nodal status, age, tumor size, ER by IHC) are combined through guidelines and tools (NIH/St Gallen, NPI, AOL) to give a prognosis.]
Need to improve current clinical tools to detect patients who really
need adjuvant systemic therapy.
Potential of Genomic Technologies for Prognosis
In the nineties, new biotechnologies emerged:
- Human genome sequencing.
- Gene expression profiling (low to high-throughput).
Genomic data could be used to better understand cancer biology
. . . and to build efficient prognostic models.
Biology Paradigm
[Diagram: DNA (enhancers, promoter, exons, introns) is transcribed into pre-mRNA, which is spliced into mRNA (5' UTR, open reading frame, 3' UTR) and translated into protein. Gene expression profiling is shown alongside this chain.]
Gene Expression Profiling
Gene expression profiling using a microarray chip:
[Figure: microarray chip, hybridization, detection.]
Microarray Data
Few samples (dozens to hundreds).
- Microarray technology is expensive.
- Frozen tumor samples are rare (biobank).
On the other hand, the expression of numerous genes is measured.
- The recent microarray chips cover the whole genome (≈ 50,000 probes representing 30,000 "known genes").
⇒ High feature-to-sample ratio (curse of dimensionality).
Microarray is a complex technology.
⇒ High level of noise in the measurements.
Biology is complex.
⇒ Variables are highly correlated (gene co-expressions due to biological pathways).
Part II
Breast Cancer Molecular Subtypes
Molecular Heterogeneity
Early microarray studies showed that breast cancer is a
molecularly heterogeneous disease [Perou et al., 2000,
Sorlie et al., 2001, Sorlie et al., 2003, Sotiriou et al., 2003].
- Example: hierarchical clustering on microarray data [Sorlie et al., 2001].
⇒ Identification of sets of co-expressed genes.
⇒ Identification of groups of similar tumors.
[Figure: Fig. 1 of Sorlie et al., 2001. Gene expression patterns analyzed by hierarchical clustering using the 476 cDNA intrinsic clone set.]
Gene Clusters
Several gene clusters were identified to be the main discriminators of
breast cancer molecular subtypes.
- Example from [Sorlie et al., 2001]:
Gene clusters labeled in the figure: novel (unknown), ERBB2 amplicon, basal epithelial cell-enriched, normal breast-like, luminal epithelial gene (ESR1).
Fig. 1. Gene expression patterns of 85 experimental samples representing 78 carcinomas, three benign tumors, and four normal tissues, analyzed by hierarchical clustering using the 476 cDNA intrinsic clone set. (A) The tumor specimens were divided into five (or six) subtypes based on differences in gene expression. The cluster dendrogram showing the five (six) subtypes of tumors are colored as: luminal subtype A, dark blue; luminal subtype B, yellow; luminal subtype C, light blue; normal breast-like, green; basal-like, red; and ERBB2+, pink. (B) The full cluster diagram scaled down (the complete 456-clone cluster diagram is available [...]
Tumor Clusters
Perou et al. and Sotiriou et al. identified at least three breast cancer
molecular subtypes:
- Basal-like, mainly ER- and HER2- tumors.
- ERBB2+ or HER2+ tumors.
- Luminal-like, which could be further separated into low- and high-proliferative tumors [Loi et al., 2007].
[Figures: hierarchical clustering dendrograms from Perou et al. and Sotiriou et al. Caption fragment (Sotiriou et al.): "[...] dendrogram of 99 breast cancer specimens analyzed by hierarchical clustering analysis using 706 probe elements selected for the high variability across [...] (Materials and Methods). The tumors were separated into two main groups mainly associated with ER status as determined by the ligand-binding [...] confirmed by immunohistochemistry (IHC). The dendrogram further branched into smaller subgroups within the ER+ and ER- classes based on [...]"]
Clinical Outcome
The molecular subtypes exhibited different clinical outcomes,
suggesting that the biological processes involved in patients' survival
might be different.
- Example from [Sorlie et al., 2001]:
[Figure: Fig. 3 of Sorlie et al., 2001. Overall and relapse-free survival analysis of the 49 breast cancer patients, uniformly treated [...]]
Breast Cancer Subtypes
Early Results
These early studies showed similar results, i.e. ER and HER2
phenotypes are the main discriminators in breast cancer (confirmed by
[Kapp et al., 2006]).
However, this classification has strong limitations [Pusztai et al., 2006]:
- Instability: the results are hard to reproduce because hierarchical clustering is unstable when applied to microarray data (high feature-to-sample ratio).
- Crispness: hierarchical clustering produces a crisp partition of the dataset (hard partitioning) without estimating the classification uncertainty.
- Validation: hierarchical clustering is difficult to apply to new data.
New Clustering Model
Because of these limitations, we sought to develop a robust method to
identify the breast cancer subtypes.
This method consists of:
1. A prototype-based clustering method to identify sets of co-expressed genes (gene modules).
2. A model-based clustering in a low-dimensional space to identify groups of similar tumors (subtypes).
Gene Modules
Aim: identification of co-expressed genes related to a biological
process of interest.
Method:
1. Choice of the biological processes of interest.
2. Selection of a prototype for each biological process.
   - A prototype is a gene known to be related to the biological process of interest (e.g. ESR1 for ER phenotype or AURKA for proliferation).
3. Identification of the genes specifically co-expressed with each prototype to populate gene modules.
   - A gene j is specifically co-expressed with a prototype q if the co-expression of gene j with prototype q is statistically higher than with the other prototypes.
⇒ Computation of gene module scores by averaging the expressions of the genes in the modules.
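The module score computation can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each module gene carries a sign of +1 or -1 for the direction of its co-expression with the prototype (a later slide describes the score as a "signed average"), and the function name, gene names and data layout are hypothetical.

```python
def module_score(expression, module):
    """Signed average of the expressions of the genes in one module.

    expression: dict mapping gene name -> expression value for one tumor.
    module: dict mapping gene name -> +1 or -1 (assumed sign convention).
    Module genes that are not measured on the platform are skipped.
    """
    present = [(gene, sign) for gene, sign in module.items() if gene in expression]
    if not present:
        raise ValueError("no module gene is measured on this platform")
    total = sum(sign * expression[gene] for gene, sign in present)
    return total / len(present)


# Hypothetical toy module and tumor profile (names are placeholders).
toy_module = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1}
toy_tumor = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -0.6, "GENE_D": 5.0}
score = module_score(toy_tumor, toy_module)
```

Skipping unmeasured genes is one way to keep the score computable across platforms; how missing genes were actually handled is not stated in the slides.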
Gene Modules
Example
Choice of three prototypes: P1, P2 and P3.
Gene j assigned to the gene module 1 (prototype P1):
[Figure: gene j and the prototypes P1, P2 and P3 drawn in the space whose axes are patient 1, patient 2 and patient 3 (two panels).]
Gene Modules
Example (cont.)
Gene j not assigned to any gene module:
[Figure: gene j and the prototypes P1, P2 and P3 drawn in the space whose axes are patient 1, patient 2 and patient 3 (two panels).]
Gene Modules
Method
We tackled the problem from a prediction point of view.
- Basic idea: If a gene j is statistically better predicted by prototype q than by all the other prototypes, then gene j is specifically co-expressed with prototype q and is assigned to gene module q.
For each gene j, we fit a set of linear models:
- The univariate models using each prototype as explanatory variable and the gene j as response variable.
- The "best" multivariate model.
We compute the leave-one-out cross-validation (LOOCV) errors of
these models.
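To make the LOOCV step concrete, here is a minimal sketch for one univariate model (gene j regressed on a single prototype). It assumes squared prediction error as the error measure, which the slides do not specify; the function name and the toy data are hypothetical.

```python
def loocv_errors(x, y):
    """Squared leave-one-out prediction errors of the model y = a + b*x.

    Each sample is left out in turn, the line is refitted by ordinary least
    squares on the remaining samples, and the held-out sample is predicted.
    x and y are lists of equal length (prototype and gene expressions).
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    errors = []
    for i in range(n):
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        mx = sum(xs) / (n - 1)
        my = sum(ys) / (n - 1)
        sxx = sum((v - mx) ** 2 for v in xs)
        sxy = sum((v - mx) * (w - my) for v, w in zip(xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0  # constant predictor: mean only
        intercept = my - slope * mx
        errors.append((y[i] - (intercept + slope * x[i])) ** 2)
    return errors


# Toy data: gene j is exactly 1 + 2 * P1, so every held-out error is ~0.
prototype_p1 = [0.0, 1.0, 2.0, 3.0, 4.0]
gene_j = [1.0, 3.0, 5.0, 7.0, 9.0]
errors_p1 = loocv_errors(prototype_p1, gene_j)
```

Running the same function once per prototype gives one vector of errors per univariate model, which is the input the model selection step compares.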
Gene Modules
Method: LOOCV Errors
[Figure: LOOCV errors of the univariate models (P1, P2, P3) and of the multivariate model (MULTIV).]
Gene Modules
Method: Statistical Model Selection
We statistically compare the models on the basis of their LOOCV
errors (Friedman test).
⇒ Identification of the set of best models.
- The models present in the set are statistically better than the absent models.
- The models present in the set have similar LOOCV errors.
If there is only one univariate model (using prototype q) in the set of
best models, gene j is assigned to the gene module q.
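As an illustration of the comparison step, the sketch below computes the Friedman chi-square statistic from per-sample LOOCV errors, treating samples as blocks and models as treatments. It is a textbook version without tie correction; the pairwise comparisons that define the "set of best models" are not detailed in the slides and are not reproduced here. Function names and toy errors are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def friedman_statistic(errors_by_model):
    """Friedman chi-square statistic (no tie correction) and mean ranks.

    errors_by_model: dict model name -> list of LOOCV errors, one per sample,
    with samples in the same order for every model.
    """
    names = list(errors_by_model)
    k = len(names)
    n = len(errors_by_model[names[0]])
    rank_sums = [0.0] * k
    for i in range(n):
        row = [errors_by_model[name][i] for name in names]
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    mean_ranks = {name: rank_sums[j] / n for j, name in enumerate(names)}
    stat = 12 * n / (k * (k + 1)) * sum(
        (m - (k + 1) / 2) ** 2 for m in mean_ranks.values())
    return stat, mean_ranks


# Hypothetical errors: P1 predicts gene j best on every sample.
toy_errors = {
    "P1": [0.1, 0.2, 0.1, 0.3],
    "P2": [0.5, 0.6, 0.7, 0.9],
    "P3": [0.9, 1.0, 0.8, 1.2],
}
stat, mean_ranks = friedman_statistic(toy_errors)
```

Under the null hypothesis the statistic is approximately chi-square distributed with k - 1 degrees of freedom (k being the number of models); a low mean rank marks a model with consistently small errors.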
Gene Modules
Method: Statistical Model Selection (cont.)
[Figure: the Friedman test is applied to the LOOCV errors of the set of models (P1, P2, P3, MULTIV) and yields the set of best models.]
Gene Modules
Method: Meta-Analysis
If the sample size is small, many genes will not be assigned to any
gene module (not enough evidence for the statistical model selection
step).
We can increase the size of gene modules by integrating several
datasets (larger sample size).
However, merging datasets from different laboratories, cohorts of
patients and microarray platforms is a difficult task.
⇒ Meta-analysis framework (each dataset is analyzed separately and
results are combined).
Gene Modules
Method: Meta-Analysis (cont.)
Compute the LOOCV errors for the univariate and multivariate
models in each dataset separately.
Check the homogeneity of the standardized coefficients of the
univariate models over datasets (heterogeneity test).
- The relation between gene j and the prototypes should be similar in each dataset.
- If the coefficients are heterogeneous, discard gene j from the analysis (conservative way).
Perform a "meta" Friedman test:
- Combine the p-values returned by the pairwise tests applied in each dataset separately.
- Consider these meta p-values in the traditional version of Friedman test.
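The slides do not say which rule was used to combine the per-dataset p-values. As one common possibility, the sketch below uses Fisher's method, which assumes the datasets are independent; it is a stand-in for illustration, not necessarily the rule behind the "meta" Friedman test, and the function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(log p) follows a chi-square distribution with
    2 * len(pvalues) degrees of freedom under the null hypothesis. For an
    even number of degrees of freedom the tail probability has a closed form,
    so no statistics library is needed.
    """
    if not pvalues or any(p <= 0 or p > 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half = -sum(math.log(p) for p in pvalues)  # half of the Fisher statistic
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical p-values of the same pairwise test in two datasets.
meta_p = fisher_combined_pvalue([0.05, 0.05])
```

Two datasets that each give p = 0.05 combine to a meta p-value of about 0.017, which shows how pooling evidence lets more genes pass the selection step than any single dataset would.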
Gene Modules
Method (dis)Advantages
Advantages:
- Robust to overfitting (linear models + LOOCV).
- Control for other biological processes, i.e. prevent the use of highly correlated prototypes (gene modules are then small).
- Integration of several datasets using different microarray platforms.
  - Insensitive to "batch" effect (meta-analysis framework).
  - Check for heterogeneity between datasets.
- The gene module scores (signed average) are easily computable whatever the microarray technology (except for very small platforms).
- Very conservative (control of false positive rate).
Disadvantages:
- Limited to linear relation between gene and prototypes.
- Computationally intensive (statistical model selection step).
- Very conservative (many false negatives).
Gene Modules
Results
In [Desmedt et al., 2008], we selected 7 prototypes to be representative of key biological
processes involved in breast cancer:
- ESR1 gene for ER phenotype.
- ERBB2 gene for HER2 phenotype.
- AURKA gene for proliferation.
- STAT1 gene for immune response.
- VEGF gene for angiogenesis.
- PLAU gene for tumor invasion.
- CASP3 gene for apoptosis.
We used 2 large breast cancer microarray datasets:
- Wang et al. series: 286 node-negative patients on Affymetrix platform (22283 probes).
- van de Vijver et al. series: 295 patients on Agilent microarray platform (24496 probes).
⇒ ≈ 10,000 probes in common (mapping through EntrezGene IDs).
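Mapping two platforms through shared gene identifiers amounts to intersecting their annotations. The sketch below shows the idea under simple assumptions: each platform annotation is a dictionary from probe ID to gene ID, and unannotated probes are dropped. The probe names are hypothetical, and the slides do not say how several probes for the same gene were resolved.

```python
def common_genes(annotation_a, annotation_b):
    """Gene IDs measured on both platforms.

    annotation_a, annotation_b: dict probe ID -> gene ID (None when the
    probe has no annotation).
    Returns dict gene ID -> (probes on platform A, probes on platform B).
    """
    def by_gene(annotation):
        genes = {}
        for probe, gene in annotation.items():
            if gene is not None:
                genes.setdefault(gene, []).append(probe)
        return genes

    genes_a, genes_b = by_gene(annotation_a), by_gene(annotation_b)
    shared = sorted(genes_a.keys() & genes_b.keys())
    return {g: (sorted(genes_a[g]), sorted(genes_b[g])) for g in shared}


# Hypothetical probe names; 2099 and 2064 are the gene IDs of ESR1 and ERBB2.
platform_a = {"a_probe_1": 2099, "a_probe_2": 2099, "a_probe_3": 2064, "a_probe_4": None}
platform_b = {"b_probe_1": 2099, "b_probe_2": 6790}
mapping = common_genes(platform_a, platform_b)
```

Here only gene 2099 is kept, with two probes on the first platform and one on the second.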
Gene Modules
Results (cont.)
We found gene modules of various sizes:
- ESR1 is the largest gene module, as expected.
- AURKA is the second largest, highlighting the importance of proliferation.

Gene module   Size
ESR1          468
AURKA         228
STAT1          94
PLAU           67
ERBB2          27
VEGF           13
CASP3           8

Gene ontology analysis confirmed the coherence of the gene modules
with respect to the prototypes or biological processes of interest.
Model-Based Clustering
We know from early microarray studies that breast cancer is a
molecularly heterogeneous disease.
ER and HER2 phenotypes seem to be the main (only?) discriminators.
However, the first classification models, based on hierarchical
clustering, are hard to reproduce and to apply to new data.
⇒ We introduced a simple model-based clustering (mixture of
Gaussians) in a two-dimensional space defined by the ESR1 and
ERBB2 module scores.
- We used the Bayesian Information Criterion (BIC) to select the most likely number of subtypes.
- We validated our model (fitted on Wang's series, VDX) on 14 independent datasets by estimating the prediction strength [Tibshirani and Walther, 2005].
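For reference, the two ingredients named on this slide can be written out in their textbook form. The notation is introduced here and is not from the slides: x_i is the two-dimensional vector of ESR1 and ERBB2 module scores of tumor i, K is the number of subtypes, n the number of tumors, and d_K the number of free parameters.

```latex
% Mixture of K bivariate Gaussians
p(x_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0

% Bayesian Information Criterion for the model with K components
\mathrm{BIC}(K) = -2 \log \hat{L}_K + d_K \log n,
\qquad d_K = (K - 1) + 2K + 3K = 6K - 1
```

The count d_K assumes each component has its own unrestricted 2 x 2 covariance matrix, which is an assumption made for this illustration. With the sign convention above, lower BIC is better; some software reports the negated quantity, so that higher is better, and the slides do not state which convention their plots follow.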
Model-Based Clustering
Training
[Figure: training on VDX. One panel shows the tumors in the plane of ESR1 and ERBB2 module scores, labeled ER-/HER2-, HER2+ and ER+/HER2-; the other shows BIC against the number of clusters.]
Model-Based Clustering
Validation
[Figure: scatter plots of the ESR1 module score against the ERBB2 module score in validation datasets (panels NKI, TBG, UPP, UNT, MAINZ and UNC2), with tumors labeled ER-/HER2-, HER2+ or ER+/HER2-.]
Model-Based Clustering
Validation: Prediction Strength
Reference                      Dataset  ER-/HER2-  HER2+  ER+/HER2-
[van de Vijver et al., 2002]   NKI      1.00       1.00   0.99
[Desmedt et al., 2007]         TBG      1.00       1.00   0.83
[Miller et al., 2005]          UPP      1.00       0.93   0.87
[Sotiriou et al., 2006]        UNT      1.00       0.89   0.92
[Schmidt et al., 2008]         MAINZ    1.00       1.00   0.90
[Sorlie et al., 2003]          STNO2    1.00       0.69   0.97
[Sotiriou et al., 2003]        NCI      0.85       0.83   0.93
[Minn et al., 2005]            MSK      1.00       1.00   0.96
[Pawitan et al., 2005]         STK      1.00       0.91   0.87
[Bild et al., 2006]            DUKE     1.00       0.82   0.92
[Hoadley et al., 2007]         UNC2     1.00       0.87   0.96
[Chin et al., 2006]            CAL      1.00       1.00   0.95
[Bonnefoi et al., 2007]        DUKE2    1.00       0.64   0.95
[Naderi et al., 2007]          NCH      1.00       0.82   0.98
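A per-cluster prediction strength of the kind tabulated here can be sketched as below. The sketch follows the general idea of [Tibshirani and Walther, 2005]: cluster the validation set on its own, classify the same samples with the model fitted on the training set, and measure how often pairs of samples that share a validation cluster are also kept together by the training model. The function name and toy labels are hypothetical, and the exact computation used for the table is not given in the slides.

```python
from itertools import combinations


def prediction_strength(test_labels, predicted_labels):
    """Per-cluster co-membership proportions between two labelings.

    test_labels: cluster of each sample when the validation set is clustered
    on its own.
    predicted_labels: cluster assigned to the same samples by the model
    fitted on the training set.
    For each cluster of test_labels, returns the fraction of its sample pairs
    that predicted_labels also places in a common cluster.
    """
    members = {}
    for idx, label in enumerate(test_labels):
        members.setdefault(label, []).append(idx)
    strength = {}
    for label, idxs in members.items():
        pairs = list(combinations(idxs, 2))
        if not pairs:
            continue  # a single-sample cluster has no pairs to compare
        together = sum(predicted_labels[a] == predicted_labels[b] for a, b in pairs)
        strength[label] = together / len(pairs)
    return strength


# Hypothetical labelings of seven tumors.
own_clusters = ["A", "A", "A", "B", "B", "B", "B"]
from_training = ["A", "A", "A", "B", "B", "B", "A"]
strength = prediction_strength(own_clusters, from_training)
```

A value of 1.00 means the training model never splits that cluster; the overall prediction strength of a clustering is usually taken as the minimum over clusters.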
Model-Based Clustering
Validation: Number of Clusters
[Figure: BIC average against the number of clusters.]
Breast Cancer Subtypes
Clinical Outcome
Share of the global population of breast cancer patients:
- ER+/HER2-: 60-70%
- HER2+: 15-20%
- ER-/HER2-: 20-25%
[Figure: probability of survival against time (years) for the three subtypes. Node-negative untreated patients, NKI/TBG/UPP/UNT/MAINZ.]

No. at risk
Time (years)   0    1    2    3    4    5    6    7    8    9    10
ER-/HER2-      119  111  91   83   78   71   68   64   53   46   37
HER2+          106  98   91   81   73   69   64   58   52   47   44
ER+/HER2-      516  507  487  462  435  410  363  319  282  257  223
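Survival curves with a number-at-risk table are usually Kaplan-Meier estimates; assuming that is the case here, the estimator can be sketched in a few lines. The function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times: follow-up time of each patient.
    events: True if the event was observed at that time, False if the
    patient was censored.
    Returns a list of (time, survival probability), one entry per time at
    which at least one event occurred.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Hypothetical follow-up of five patients (years, event observed?).
toy_times = [1, 2, 2, 3, 4]
toy_events = [True, True, False, True, False]
curve = kaplan_meier(toy_times, toy_events)
```

The "No. at risk" rows correspond to the `at_risk` counter: the number of patients still under observation at each time point.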
Breast Cancer Subtypes
New Clustering Model (dis)Advantages
Advantages:
- Simple model-based clustering:
  - Easily applicable to new data.
  - Returns for each patient the probability of belonging to each subtype (soft partitioning).
- Low dimensional space:
  - Stability/robustness of the clustering model.
  - Low computational cost to fit the model.
  - Simple visualization of the results.
Disadvantage:
- Low dimensional space: which dimension could we add in order to find another robust subtype?
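Soft partitioning with a fitted mixture of Gaussians comes down to Bayes' rule. The sketch below computes the subtype probabilities of one tumor from its two module scores; the mixture parameters are invented for illustration and are not the fitted values from the presentation, and the function names are hypothetical.

```python
import math


def gaussian_density_2d(x, mean, cov):
    """Density of a bivariate normal; cov is ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean) using the 2x2 inverse.
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


def subtype_probabilities(x, components):
    """Posterior probability of each subtype for one tumor.

    x: (ESR1 module score, ERBB2 module score).
    components: dict subtype -> (weight, mean, cov) of a fitted mixture.
    """
    joint = {name: w * gaussian_density_2d(x, mean, cov)
             for name, (w, mean, cov) in components.items()}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


# Invented mixture parameters, for illustration only.
toy_model = {
    "ER-/HER2-": (0.2, (-1.0, -0.5), ((0.3, 0.0), (0.0, 0.3))),
    "HER2+": (0.2, (-0.3, 1.0), ((0.4, 0.0), (0.0, 0.3))),
    "ER+/HER2-": (0.6, (0.6, -0.3), ((0.3, 0.0), (0.0, 0.2))),
}
probs = subtype_probabilities((0.7, -0.2), toy_model)
```

Because only two scores and a handful of parameters are involved, applying the model to a new dataset needs nothing more than computing the module scores and calling this function per tumor.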
Part III
Prognostic Gene Signatures
Prognostic Gene Signatures
Use of microarray technology to improve current prognostic models
(NIH/St Gallen guidelines, NPI, AOL).
A typical microarray analysis dealing with breast cancer
prognostication involves 5 key steps:
1. Data preprocessing: quality controls and normalization.
2. Filtering: discard the genes exhibiting low expressions and/or low variance.
3. Identification of a list of prognostic genes (called a gene signature).
4. Building of a prognostic model, i.e. combination of the genes from the signature in order to predict the clinical outcome of the patients.
5. Validation of the model performance and comparison with current prognostic models.
Prognostic Gene Signatures
Fishing Expedition
Prognostic models were derived from gene expression data by looking for
genes associated with clinical outcome without any a priori biological
assumption [van't Veer et al., 2002, Wang et al., 2005].
[Figures: survival curves for the good and poor signature groups. GENE70 signature (van't Veer et al., van de Vijver et al.): probability of remaining metastasis-free and probability of overall survival against years, P<0.001. GENE76 signature (Wang et al.): distant-metastasis-free survival in the validation set, good (n=59) versus poor (n=112), hazard ratio 5.67 (95% CI 2.59-12.4).]
Promising results but some criticisms from a statistical point of view.
Prognostic Gene Signatures
Hypothesis-driven
Prognostic models were also derived from gene expression data based
on a biological assumption.
- Example: GGI [Sotiriou et al., 2006] was designed to discriminate patients with low and high histological grade (proliferation).
- GGI was able to discriminate patients with intermediate histological grade (HG2).
[Figure panels: histological grade, GGI, GGI (HG2).]
Prognostic Gene Signatures
Independent Validation
These preliminary results were promising but validation was
required.
A first validation was published by the authors of the GENE70 and
GENE76 signatures in [van de Vijver et al., 2002] and [Foekens et al., 2006]
respectively.
Our group was involved in a second validation:
- Complete independence: the authors of the signatures were not aware of the clinical data of the patients in the dataset.
- The statistical analyses were performed by an independent group.
- Aim: validate definitively the prognostic power of these two models in order to start a large clinical trial called MINDACT (Microarray In Node negative Disease may Avoid ChemoTherapy).
Prognostic Gene Signatures
Independent Validation (cont.)
Although the performance in this validation series was less impressive
than in the original publications, GENE70 and GENE76 sufficiently
improved the current clinical models to go ahead with MINDACT.
⇒ Validation of GENE70 [Buyse et al., 2006] and
GENE76 [Desmedt et al., 2007].
[Figures: probability of event against time (years) for GENE70 and GENE76; one panel is titled "Independent Validation of the 76-Gene Signature". Risk groups in the legend (patients / events): gene signature low risk, clinical low risk (52 / 7); gene signature low risk, clinical high risk (59 / 11); gene signature high risk, clinical low risk (28 / 6); gene signature high risk, clinical high risk (163 / 52). Caption fragment: "Kaplan-Meier curves by genomic risk group. HRs and log-rank tests are stratified by center. A, TDM. B, overall survival."]
Prognostic Gene Signatures
Independent Validation (cont.)
We sought to compare the GGI to the GENE70 and GENE76
signatures in this validation series.
⇒ GGI yields very similar performance [Haibe-Kains et al., 2008a].
[Figure 1 of Haibe-Kains et al., 2008a: Venn diagram illustrating the classification of the tumor samples according to the prognostic signatures. Dark red = high-risk patients and blue = low-risk patients. GENE70 = 70-gene signature, GENE76 = 76-gene signature.]
[Figure 2 of Haibe-Kains et al., 2008a: forest plots (and 95% CI), including the concordance index, for GENE70, GENE76, GGI and AOL (Adjuvant! Online).]
Excerpt of the paper visible next to the figures: "[...] and the GGI were higher than the HR of the 76-gene signature, the differences were not statistically significant (see Table 2). Figure 3 illustrates the Kaplan-Meier estimates of DMFS for the four groups of patients (two groups with concordant results in risk assessment and two with discordant results) for the different signatures two by two. From the multivariate analyses (Table 3), we can conclude that the three signatures added significant information to the traditional parameters and were the strongest predictive variables of DMFS, as reflected by their lowest p-values compared to the other variables. The additional information of these signatures over the clinical risk was also confirmed by the fact that the univariate HRs for the three signatures remained similar when adjusted for the clinical risk, with a HR of 7.25 (95% CI: 2.4-21.5; p = 3.5 × 10^-4), 2.8 (95% CI: 1.2-6.8; p = 0.018) and 6.25 (95% CI: 2.3-17; p = 3.3 × 10^-4) for the 70-gene signature, 76-gene signature and the GGI respectively. Lastly, we combined the three gene signatures in order to assess the potential improvement in BC prognostication."
Prognostic Gene Signatures
A Single Gene?
From the validation studies, we learned that GGI yields similar
(sometimes better) performance than other gene signatures
[Haibe-Kains et al., 2008a].
Since GGI is a very simple model from both a statistical and a biological
(proliferation-related genes) point of view, we challenged the use of
complex statistical methods for breast cancer prognostication.
We compared statistical methods, from simple to complex, to a single
proliferation gene (AURKA) [Haibe-Kains et al., 2008b].
⇒ Due to the complexity of microarray data, it is very hard to build
prognostic models statistically better than AURKA.
Prognostic Gene Signatures
A Single Gene? (cont.)
Forest plot of the concordance index for each method in the training
set and the three validation sets:
[Figure: concordance index for risk score prediction, one panel per dataset: training set (VDX), validation set 1 (TBG), validation set 2 (TAM), validation set 3 (UPP). Methods compared: AURKA, BD.COMBUNIV.WILCOXON.HG, BD.COMBUNIV.COX.SURV, BD.MULTIV.LM.TOE, BD.MULTIV.COX.SURV, GW.RANK.COMBUNIV.WILCOXON.HG, GW.RANKCV.COMBUNIV.WILCOXON.HG, GW.RANK.COMBUNIV.COX.SURV, GW.RANKCV.COMBUNIV.COX.SURV, GW.RANK.MULTIV.RCOX.SURV, GW.RANKCV.MULTIV.RCOX.SURV, GW.PCA.COMBUNIV.WILCOXON.HG, GW.PCACV.COMBUNIV.WILCOXON.HG, GW.PCA.COMBUNIV.COX.SURV, GW.PCACV.COMBUNIV.COX.SURV, GW.PCA.MULTIV.RCOX.SURV, GW.PCACV.MULTIV.RCOX.SURV, GENE76, GGI, AOL, NPI.]
Supplementary Figure 1: Forest plot of the concordance indices for the risk scores predicted by all the methods in the training set (VDX) and
in the three validation sets (TBG, TAM, and UPP). AURKA and GGI models were not fitted on VDX which can be considered as a validation
set for these models.
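The concordance index used in these forest plots measures how well a risk score orders patients by time to event. The sketch below is a basic Harrell-type version for right-censored data; the exact estimator and confidence intervals used in [Haibe-Kains et al., 2008b] are not described in the slides, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Concordance index of risk scores for right-censored survival data.

    A pair of patients is usable when the one with the shorter follow-up had
    the event. It is concordant when that patient also has the higher risk
    score; tied risk scores count for one half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pair of patients")
    return concordant / usable


# Hypothetical data: higher risk scores go with earlier events.
toy_times = [2, 4, 6, 8]
toy_events = [True, True, False, True]
toy_risks = [0.9, 0.7, 0.4, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to a risk score that orders patients no better than chance and 1.0 to a perfect ordering, which is why the forest plot axes start at 0.5.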
Part IV
Subtypes and Prognosis
Prognosis in Specific Subtypes
The first publications attempted to build a prognostic model from the
global population of breast cancer patients.
In 2005, Wang et al. were the first to divide the global population
based on ER status:
- As breast cancer biology is very different according to the ER status (IHC), prognostic models might be different too.
- They built a prognostic model for each subgroup of patients (ER+ and ER-).
- To make a prediction, they used one of the two models depending on the ER status of the tumor.
- Unfortunately the group of ER- tumors was too small and their corresponding model was not generalizable.
Prognosis in Specific Subtypes (cont.)
Recently, Teschendorff et al. built a new prognostic model for ER-
tumors [Teschendorff et al., 2007] and validated it using large datasets
[Teschendorff and Caldas, 2008].
  - The signature is composed of 7 immune-related genes.
We showed in two meta-analyses
[Wirapati et al., 2008, Desmedt et al., 2008] that:
  - Proliferation (AURKA) was the most prognostic factor in ER+/HER2-
    tumors and the common driving force of the early gene signatures.
      - Actually, these signatures (e.g. GENE70, GENE76, GGI) are prognostic
        in ER+/HER2- tumors only.
  - Immune response (STAT1) is prognostic in ER-/HER2- and HER2+
    tumors.
  - Tumor invasion (PLAU or uPA) is prognostic in HER2+ tumors.
Finak et al. introduced a stroma-derived prognostic predictor (SDPP)
particularly efficient in HER2+ tumors [Finak et al., 2008].
New Prognostic Model
Since current prognostic models/gene signatures are limited to some
subtypes, we plan to develop a new prognostic model that integrates
breast cancer subtype identification in order to:
  - Build prognostic gene signatures specifically targeting each subtype.
  - Build a global prognostic model able to predict the risk of patients
    whatever the tumor subtype (ER-/HER2-, HER2+ or ER+/HER2-).
We plan to assess the performance of this new model and to compare it
with current prognostic models using the thorough statistical framework
developed in [Haibe-Kains et al., 2008b].
This new prognostic model is called GENIUS, standing for
Gene Expression progNostic Index Using Subtypes.
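One possible way to build a global model from subtype-specific ones is sketched below in Python. It assumes that each patient has a probability of belonging to each subtype and one risk score per subtype-specific signature, and that the global score is their probability-weighted average. This weighting scheme, the function name `global_risk`, and the example values are illustrative assumptions; the slide does not describe how GENIUS actually combines the subtypes.

```python
# The three subtypes named on the slide.
SUBTYPES = ("ER-/HER2-", "HER2+", "ER+/HER2-")


def global_risk(subtype_probs, subtype_risks):
    """Probability-weighted average of subtype-specific risk scores.

    subtype_probs : dict mapping each subtype to the patient's membership
                    probability (assumed to come from a subtype classifier)
    subtype_risks : dict mapping each subtype to the risk score given by the
                    signature built for that subtype (hypothetical inputs)
    """
    total = sum(subtype_probs[s] for s in SUBTYPES)
    if total <= 0:
        raise ValueError("subtype probabilities must sum to a positive value")
    weighted = sum(subtype_probs[s] * subtype_risks[s] for s in SUBTYPES)
    # Dividing by the total keeps the score well defined even if the
    # probabilities do not sum exactly to one.
    return weighted / total


if __name__ == "__main__":
    # Hypothetical patient: most likely ER+/HER2-, with some HER2+ probability.
    probs = {"ER-/HER2-": 0.05, "HER2+": 0.15, "ER+/HER2-": 0.80}
    risks = {"ER-/HER2-": -0.2, "HER2+": 0.6, "ER+/HER2-": 1.1}
    print(global_risk(probs, risks))
```

With this design, a tumor that clearly belongs to one subtype is scored almost entirely by that subtype's signature, while an ambiguous tumor receives a blend of the scores.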
Part V
Conclusion
Conclusion
Numerous studies confirmed the great potential of gene expression
profiling using microarrays to better understand cancer biology and to
improve current prediction models.
This technology is becoming increasingly mature (MAQC
[MAQC Consortium, 2006]) and is now ready for clinical applications.
The promising results of early publications were validated in different
independent studies.
Recent meta-analyses successfully recapitulated the main discoveries
made in recent decades and refined our knowledge of breast cancer
biology.
Conclusion (cont.)
We benefit from this strong basis to go a step further to improve
breast cancer prognosis using microarrays.
  - Prognostic models/gene signatures in specific subtypes
    [Teschendorff et al., 2007, Desmedt et al., 2008, Finak et al., 2008].
  - Development of GENIUS, a prognostic model integrating breast cancer
    molecular subtype identification [manuscript in preparation].
A major issue remains: "How to combine these microarray prognostic
models with clinical variables?"
  - Several studies showed the additional information provided by tumor
    size, nodal status, ...
  - However, we currently lack the data to fit robust prognostic models
    combining microarray and clinical variables.
Thank you for your attention.
This presentation is available from
http://www.ulb.ac.be/di/map/bhaibeka/papers/haibekains2008gene.pdf.
Part VI
Bibliography
Bibliography I
Bild, A. H., Yao, G., Chang, J. T., Wang, Q., Potti, A., Chasse, D., Joshi, M.-B.,
Harpole, D., Lancaster, J. M., Berchuck, A., Olson Jr, J. A., Marks, J. R.,
Dressman, H. K., West, M., and Nevins, J. R. (2006).
Oncogenic pathway signatures in human cancers as a guide to targeted therapies.
Nature, 439:353–356.
Bonnefoi, H., Potti, A., Delorenzi, M., Mauriac, L., Campone, M., Tubiana-Hulin,
M., Petit, T., Rouanet, P., Jassem, J., Blot, E., Becette, V., Farmer, P., André, S.,
Acharya, C. R., Mukherjee, S., Cameron, D., Bergh, J., Nevins, J. R., and Iggo, R.
(2007).
Validation of gene signatures that predict the response of breast cancer to
neoadjuvant chemotherapy: a substudy of the EORTC 10994/BIG 00-01 clinical trial.
Lancet Oncology, 8(12):1071–1078.
Bibliography II
Buyse, M., Loi, S., van’t Veer, L., Viale, G., Delorenzi, M., Glas, A. M.,
Saghatchian d’Assignies, M., Bergh, J., Lidereau, R., Ellis, P., Harris, A., Bogaerts,
J., Therasse, P., Floore, A., Amakrane, M., Piette, F., Rutgers, E., Sotiriou, C.,
Cardoso, F., and Piccart, M. J. (2006).
Validation and Clinical Utility of a 70-Gene Prognostic Signature for Women With
Node-Negative Breast Cancer.
J. Natl. Cancer Inst., 98(17):1183–1192.
Chin, K., Devries, S., Fridlyand, J., Spellman, P., Roydasgupta, R., Kuo, W. L.,
Lapuk, A., Neve, R., Qian, Z., Ryder, T., Chen, F., Feiler, H., Tokuyasu, T.,
Kingsley, C., Dairkee, S., Meng, Z., Chew, K., Pinkel, D., Jain, A., Ljung, B.,
Esserman, L., Albertson, D., Waldman, F., and Gray, J. (2006).
Genomic and transcriptional aberrations linked to breast cancer pathophysiologies.
Cancer Cell, 10:529–541.
Bibliography III
Desmedt, C., Haibe-Kains, B., Wirapati, P., Buyse, M., Larsimont, D., Bontempi,
G., Delorenzi, M., Piccart, M., and Sotiriou, C. (2008).
Biological Processes Associated with Breast Cancer Clinical Outcome Depend on
the Molecular Subtypes.
Clin Cancer Res, 14(16):5158–5165.
Desmedt, C., Piette, F., Loi, S., Wang, Y., Lallemand, F., Haibe-Kains, B., Viale,
G., Delorenzi, M., Zhang, Y., d’Assignies, M. S., Bergh, J., Lidereau, R., Ellis, P.,
Harris, A. L., Klijn, J. G., Foekens, J. A., Cardoso, F., Piccart, M. J., Buyse, M.,
and Sotiriou, C. (2007).
Strong Time Dependence of the 76-Gene Prognostic Signature for Node-Negative
Breast Cancer Patients in the TRANSBIG Multicenter Independent Validation
Series.
Clin Cancer Res, 13(11):3207–3214.
Finak, G., Bertos, N., Pepin, F., Sadekova, S., Souleimanova, M., Zhao, H., Chen,
H., Omeroglu, G., Meterissian, S., Omeroglu, A., Hallett, M., and Park, M. (2008).
Stromal gene expression predicts clinical outcome in breast cancer.
Nat Med, 14(5):518–527.
Bibliography IV
Foekens, J. A., Atkins, D., Zhang, Y., Sweep, F. C., Harbeck, N., Paradiso, A.,
Cufer, T., Sieuwerts, A. M., Talantov, D., Span, P. N., Tjan-Heijnen, V. C., Zito,
A. F., Specht, K., Hoefler, H., Golouh, R., Schittulli, F., Schmitt, M., Beex, L. V.,
Klijn, J. G., and Wang, Y. (2006).
Multicenter validation of a gene expression–based prognostic signature in lymph
node–negative primary breast cancer.
Journal of Clinical Oncology, 24(11).
Haibe-Kains, B., Desmedt, C., Piette, F., Buyse, M., Cardoso, F., van’t Veer, L.,
Piccart, M., Bontempi, G., and Sotiriou, C. (2008a).
Comparison of prognostic gene expression signatures for breast cancer.
BMC Genomics, 9(1):394.
Haibe-Kains, B., Desmedt, C., Sotiriou, C., and Bontempi, G. (2008b).
A comparative study of survival models for breast cancer prognostication based on
microarray data: does a single gene beat them all?
Bioinformatics, 24(19):2200–2208.
Bibliography V
Hoadley, K., Weigman, V., Fan, C., Sawyer, L., He, X., Troester, M., Sartor, C.,
Rieger-House, T., Bernard, P., Carey, L., and Perou, C. (2007).
EGFR associated expression profiles vary with breast tumor subtype.
BMC Genomics, 8(1):258.
Kapp, A., Jeffrey, S., Langerod, A., Borresen-Dale, A.-L., Han, W., Noh, D.-Y.,
Bukholm, I., Nicolau, M., Brown, P., and Tibshirani, R. (2006).
Discovery and validation of breast cancer subtypes.
BMC Genomics, 7(1):231.
Loi, S., Haibe-Kains, B., Desmedt, C., Lallemand, F., Tutt, A. M., Gillet, C., Ellis,
P., Harris, A., Bergh, J., Foekens, J. A., Klijn, J. G., Larsimont, D., Buyse, M.,
Bontempi, G., Delorenzi, M., Piccart, M. J., and Sotiriou, C. (2007).
Definition of Clinically Distinct Molecular Subtypes in Estrogen Receptor-Positive
Breast Carcinomas Through Genomic Grade.
J Clin Oncol, 25(10):1239–1246.
Bibliography VI
MAQC Consortium (2006).
The MicroArray Quality Control (MAQC) project shows inter- and intraplatform
reproducibility of gene expression measurements.
Nat Biotech, 24(9):1151–1161.
Miller, L. D., Smeds, J., George, J., Vega, V. B., Vergara, L., Ploner, A., Pawitan,
Y., Hall, P., Klaar, S., Liu, E. T., and Bergh, J. (2005).
An expression signature for p53 status in human breast cancer predicts mutation
status, transcriptional effects, and patient survival.
PNAS, 102(38):13550–13555.
Minn, A. J., Gupta, G. P., Siegel, P. M., Bos, P. D., Shu, W., Giri, D. D., Viale, A.,
Olshen, A. B., Gerald, W. L., and Massague, J. (2005).
Genes that mediate breast cancer metastasis to lung.
Nature, 436(7050):518–524.
Bibliography VII
Naderi, A., Teschendorff, A. E., Barbosa-Morais, N. L., Pinder, S. E., Green, A. R.,
Powe, D. G., Robertson, J. F., Aparicio, S., Ellis, I. O., Brenton, J. D., and Caldas,
C. (2007).
A gene-expression signature to predict survival in breast cancer across independent
data sets.
Oncogene, 26:1507–1516.
Pawitan, Y., Bjohle, J., Amler, L., Borg, A., Egyhazi, S., Hall, P., Han, X.,
Holmberg, L., Huang, F., Klaar, S., Liu, E. T., Miller, L., Nordgren, H., Ploner, A.,
Sandelin, K., Shaw, P. M., Smeds, J., Skoog, L., Wedren, S., and Bergh, J. (2005).
Gene expression profiling spares early breast cancer patients from adjuvant therapy:
derived and validated in two population-based cohorts.
Breast Cancer Research, 7(6):953–964.
Bibliography VIII
Perou, C. M., Sorlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A.,
Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., Fluge, O.,
Pergamenschikov, A., Williams, C., Zhu, S. X., Lonning, P. E., Borresen-Dale,
A.-L., Brown, P. O., and Botstein, D. (2000).
Molecular portraits of human breast tumours.
Nature, 406(6797):747–752.
Pusztai, L., Mazouni, C., Anderson, K., Wu, Y., and Symmans, W. F. (2006).
Molecular Classification of Breast Cancer: Limitations and Potential.
Oncologist, 11(8):868–877.
Schmidt, M., Bohm, D., von Torne, C., Steiner, E., Puhl, A., Pilch, H., Lehr,
H.-A., Hengstler, J. G., Kolbl, H., and Gehrmann, M. (2008).
The Humoral Immune System Has a Key Prognostic Impact in Node-Negative
Breast Cancer.
Cancer Res, 68(13):5405–5413.
Bibliography IX
Sorlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie,
T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Thorsen, T., Quist, H., Matese,
J. C., Brown, P. O., Botstein, D., Lonning, P. E., and Borresen-Dale, A. L. (2001).
Gene expression patterns of breast carcinomas distinguish tumor subclasses with
clinical implications.
Proc. Natl. Acad. Sci. USA, 98(19):10869–10874.
Sorlie, T., Tibshirani, R., Parker, J., Hastie, T., Marron, J. S., Nobel, A., Deng, S.,
Johnsen, H., Pesich, R., Geisler, S., Demeter, J., Perou, C., Lonning, P. E., Brown,
P. O., Borresen-Dale, A. L., and Botstein, D. (2003).
Repeated observation of breast tumor subtypes in independent gene expression data
sets.
Proc. Natl. Acad. Sci. USA, 100(14):8418–8423.
Sotiriou, C., Neo, S. Y., McShane, L. M., Korn, E. L., Long, P. M., Jazaeri, A.,
Martiat, P., Fox, S., Harris, A. L., and Liu, E. T. (2003).
Breast cancer classification and prognosis based on gene expression profiles from a
population-based study.
Proc. Natl. Acad. Sci., 100(18):10393–10398.
Bibliography X
Sotiriou, C., Wirapati, P., Loi, S., Harris, A., Fox, S., Smeds, J., Nordgren, H.,
Farmer, P., Praz, V., Haibe-Kains, B., Desmedt, C., Larsimont, D., Cardoso, F.,
Peterse, H., Nuyten, D., Buyse, M., Van de Vijver, M. J., Bergh, J., Piccart, M.,
and Delorenzi, M. (2006).
Gene Expression Profiling in Breast Cancer: Understanding the Molecular Basis of
Histologic Grade To Improve Prognosis.
J. Natl. Cancer Inst., 98(4):262–272.
Teschendorff, A. and Caldas, C. (2008).
A robust classifier of high predictive value to identify good prognosis patients in
ER-negative breast cancer.
Breast Cancer Research, 10(4):R73.
Teschendorff, A., Miremadi, A., Pinder, S., Ellis, I., and Caldas, C. (2007).
An immune response gene expression module identifies a good prognosis subtype in
estrogen receptor negative breast cancer.
Genome Biology, 8(8):R157.
Bibliography XI
Tibshirani, R. and Walther, G. (2005).
Cluster validation by prediction strength.
Journal of Computational and Graphical Statistics, 14(3):511–528.
van de Vijver, M. J., He, Y. D., van’t Veer, L., Dai, H., Hart, A. M., Voskuil,
D. W., Schreiber, G. J., Peterse, J. L., Roberts, C., Marton, M. J., Parrish, M.,
Atsma, D., Witteveen, A., Glas, A., Delahaye, L., van der Velde, T., Bartelink, H.,
Rodenhuis, S., Rutgers, E. T., Friend, S. H., and Bernards, R. (2002).
A gene expression signature as a predictor of survival in breast cancer.
New England Journal of Medicine, 347(25):1999–2009.
van’t Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A., Mao, M.,
Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J.,
Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. H.
(2002).
Gene expression profiling predicts clinical outcome of breast cancer.
Nature, 415:530–536.
Bibliography XII
Wang, Y., Klijn, J. G., Zhang, Y., Sieuwerts, A. M., Look, M. P., Yang, F.,
Talantov, D., Timmermans, M., van Gelder, M. E. M., Yu, J., Jatkoe, T., Berns,
E. M., Atkins, D., and Foekens, J. A. (2005).
Gene-expression profiles to predict distant metastasis of lymph-node-negative
primary breast cancer.
Lancet, 365:671–679.
Wirapati, P., Sotiriou, C., Kunkel, S., Farmer, P., Pradervand, S., Haibe-Kains, B.,
Desmedt, C., Ignatiadis, M., Sengstag, T., Schutz, F., Goldstein, D., Piccart, M.,
and Delorenzi, M. (2008).
Meta-analysis of gene expression profiles in breast cancer: toward a unified
understanding of breast cancer subtyping and prognosis signatures.
Breast Cancer Research, 10(4):R65.
Part VII
Appendix
Gene Expression Profiling Technologies
Several technologies exist to measure the expression of genes.
Low-throughput technologies, such as RT-PCR, measure
the expression of a few genes.
High-throughput technologies, such as microarrays, simultaneously
measure the expression of thousands of genes (whole
genome).
Microarray principles will be illustrated through the Affymetrix
technology.
Microarray
A microarray is composed of
  - DNA fragments (probes) fixed on a solid support.
  - Ordered position of probes.
  - Principle of hybridization to a specific probe of complementary
    sequence.
  - Molecular labeling.
=> Simultaneous detection of thousands of sequences in parallel.
Affymetrix GeneChip
Affymetrix GeneChip
Probes
Hybridization
Detection
Affymetrix Design
Affymetrix Equipment
Mixture of Gaussians?
[Figure: density of the module scores for the ESR1, ERBB2 and AURKA modules in the NKI and VDX datasets; x-axis: score (-2 to 2), y-axis: density.]
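To illustrate what fitting a mixture of Gaussians to a module score involves, the following Python sketch runs the EM algorithm for a two-component, one-dimensional mixture. It is a minimal example under stated assumptions: the number of components, the median-split initialization, the fixed number of iterations, and the names `fit_two_gaussians` and `normal_pdf` are illustrative choices, not the procedure used to produce the figure.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def _mean_sd(values, min_sd):
    """Mean and population standard deviation, with the latter floored at min_sd."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, max(math.sqrt(var), min_sd)


def fit_two_gaussians(scores, n_iter=100, min_sd=1e-3):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative sketch).

    Returns (weights, means, sds); the component initialized on the lower
    half of the scores is listed first.
    """
    xs = sorted(scores)
    if len(xs) < 4:
        raise ValueError("need at least 4 scores")
    half = len(xs) // 2
    # Assumed initialization: split the sorted scores at the median.
    (m0, s0), (m1, s1) = _mean_sd(xs[:half], min_sd), _mean_sd(xs[half:], min_sd)
    weights, means, sds = [0.5, 0.5], [m0, m1], [s0, s1]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each score.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append([0.5, 0.5] if total == 0.0 else [p[0] / total, p[1] / total])
        # M-step: update weight, mean and standard deviation of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has collapsed; keep its previous parameters
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), min_sd)
    return weights, means, sds


if __name__ == "__main__":
    # Hypothetical module scores forming a "low" and a "high" group.
    toy_scores = [-1.2, -1.1, -1.0, -0.9, -0.8, 0.8, 0.9, 1.0, 1.1, 1.2]
    print(fit_two_gaussians(toy_scores))
```

A clearly bimodal score, as one might expect for a module that separates two groups of tumors, yields two well-separated means; a unimodal score yields components that largely overlap.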
Tools
Bioinformatics software
  - R is a widely used open-source language and environment for statistical
    computing and graphics.
  - Bioconductor is an open-source, open-development software
    project for the analysis and comprehension of genomic data.
  - Java TreeView is open-source software for visualizing clustering results.
  - BRB-ArrayTools is a software suite for microarray analysis that runs as
    an Excel macro.
Links
Personal webpage: http://www.ulb.ac.be/di/map/bhaibeka/
Machine Learning Group: http://www.ulb.ac.be/di/mlg
Functional Genomics Unit:
http://www.bordet.be/en/services/medical/array/practical.htm
Master in Bioinformatics at ULB and other Belgian universities:
http://www.bioinfomaster.ulb.ac.be/