Gene Modules, Breast Cancer Subtypes and Prognosis

Benjamin Haibe-Kains (1,2)
(1) Functional Genomics Unit, Institut Jules Bordet
(2) Machine Learning Group, Université Libre de Bruxelles
November 19, 2008

Benjamin Haibe-Kains (ULB), Visit to NKI, November 19, 2008

Research Groups
Functional Genomics Unit (Christos Sotiriou)
- 9 researchers (1 Prof, 5 postdocs, 3 PhD students), 5 technicians.
- Research topics: genomic analyses, clinical studies and translational research.
- Website: http://www.bordet.be/en/services/medical/array/practical.htm.
- National scientific collaborations: ULB, Erasme, ULg, Gembloux, IDDI.
- International scientific collaborations: Genome Institute of Singapore, John Radcliffe Hospital, Karolinska Institute and Hospital, MD Anderson Cancer Center, Netherlands Cancer Institute, Swiss Institute for Experimental Cancer Research, NCI/NIH, Gustave-Roussy Institute.

Research Groups
Machine Learning Group (Gianluca Bontempi)
- 10 researchers (2 Profs, 1 postdoc, 7 PhD students), 2 graduate students.
- Research topics: bioinformatics, classification, regression, time series prediction, sensor networks.
- Website: http://www.ulb.ac.be/di/mlg.
- Scientific collaborations in ULB: IRIDIA, Physiologie Moléculaire de la Cellule (IBMM), Conformation des Macromolécules Biologiques et Bioinformatique (IBMM), CENOLI (Sciences), Functional Genomics Unit (Institut Jules Bordet), Service d'Anesthésie (Erasme).
- Scientific collaborations outside ULB: UCL Machine Learning Group (B), Politecnico di Milano (I), Università del Sannio (I), George Mason University (US).
- The MLG is part of the "Groupe de Contact FNRS" on Machine Learning and of CINBIOS: http://babylone.ulb.ac.be/Joomla/.

Summary
- Introduction
- Breast Cancer Subtypes
  - New Clustering Model
    - Gene Modules
    - Model-Based Clustering
- Prognostic Gene Signatures
- Subtypes and Prognosis
- Conclusions

Part I
Introduction

Breast Cancer
- Breast cancer is a global public health issue.
- It is the most frequently diagnosed malignancy in women in the western world and the commonest cause of cancer death for European and American women.
- In Europe, one out of eight to ten women, depending on the country, will develop breast cancer during their lifetime.

Breast Cancer Prognosis
[Diagram: diagnosis, then breast surgery + radiotherapy, then a follow-up of 5-10 years; outcome: recurrence or remission? (prognosis)]

Current Clinical Tools for Prognosis
- Clinical variables: histological grade, nodal status, age, tumor size, ER (IHC).
- Guidelines: NIH/St Gallen, NPI, AOL.
⇒ Prognosis.
Need to improve current clinical tools to detect patients who really need adjuvant systemic therapy.

Potential of Genomic Technologies for Prognosis
- In the nineties, new biotechnologies emerged:
  - Human genome sequencing.
  - Gene expression profiling (low to high-throughput).
- Genomic data could be used to better understand cancer biology ... and to build efficient prognostic models.

Biology Paradigm
[Diagram: DNA (enhancers, promoter) → transcription → pre-mRNA (exons, introns) → splicing → mRNA (5' UTR, open reading frame, 3' UTR) → translation → protein; gene expression profiling is annotated on the diagram.]

Gene Expression Profiling
Gene expression profiling using microarray chip:
[Figure: microarray chip, hybridization, detection.]

Microarray Data
- Few samples (dozens to hundreds).
  - Microarray technology is expensive.
  - Frozen tumor samples are rare (biobank).
- On the other hand, numerous gene expressions are measured.
  - The recent microarray chips cover the whole genome (≈ 50,000 probes representing 30,000 "known genes").
  ⇒ High feature-to-sample ratio (curse of dimensionality).
- Microarray is a complex technology.
  ⇒ High level of noise in the measurements.
- Biology is complex.
  ⇒ Variables are highly correlated (gene co-expressions due to biological pathways).

Part II
Breast Cancer Molecular Subtypes

Molecular Heterogeneity
Early microarray studies showed that breast cancer is a molecularly heterogeneous disease [Perou et al., 2000, Sorlie et al., 2001, Sorlie et al., 2003, Sotiriou et al., 2003].
- Example: hierarchical clustering on microarray data [Sorlie et al., 2001].
  ⇒ Identification of sets of co-expressed genes.
  ⇒ Identification of groups of similar tumors.
[Figure: Fig. 1 of [Sorlie et al., 2001]: gene expression patterns, hierarchical clustering using the 476 cDNA intrinsic clone set.]

Gene Clusters
Several gene clusters were identified to be the main discriminators of breast cancer molecular subtypes.
- Example from [Sorlie et al., 2001], gene clusters:
  - Novel (unknown)
  - ERBB2 amplicon
  - Basal epithelial cell-enriched
  - Normal breast-like
  - Luminal epithelial gene (ESR1)
[Figure caption (truncated on the slide): Fig. 1. Gene expression patterns of 85 experimental samples representing 78 carcinomas, three benign tumors, and four normal tissues, analyzed by hierarchical clustering using the 476 cDNA intrinsic clone set. (A) The tumor specimens were divided into five (or six) subtypes based on differences in gene expression. The dendrogram showing the five (six) subtypes of tumors are colored as: luminal subtype A, dark blue; luminal subtype B, yellow; luminal subtype C, light blue; normal breast-like, green; basal-like, red; and ERBB2+, pink. (B) The full cluster diagram scaled down (the complete 456-clone cluster diagram is available ...]

Tumor Clusters
Perou et al. and Sotiriou et al. identified at least three breast cancer molecular subtypes:
- Basal-like, mainly ER- and HER2- tumors.
- ERBB2+ or HER2+ tumors.
- Luminal-like, which could be further separated into low and high proliferative tumors [Loi et al., 2007].
[Figures: clustering dendrograms from Perou et al. and Sotiriou et al. Partial caption (truncated on the slide): Dendrogram of 99 breast cancer specimens analyzed by hierarchical clustering analysis using 706 probe elements selected for the high variability across ... (Materials and Methods). The tumors were separated into two main groups mainly associated with ER status as determined by the ligand-binding ... confirmed by immunohistochemistry (IHC). The dendrogram further branched into smaller subgroups within the ER+ and ER- classes based on ...]

Clinical Outcome
The molecular subtypes exhibited different clinical outcomes, suggesting that the biological processes involved in patients' survival might be different.
- Example from [Sorlie et al., 2001]:
[Figure caption (truncated on the slide): Overall and relapse-free survival analysis of the 49 breast cancer patients, uniformly treated ...]

Breast Cancer Subtypes Early Results
These early studies showed similar results, i.e. ER and HER2 phenotypes are the main discriminators in breast cancer (confirmed by [Kapp et al., 2006]).
However, this classification has strong limitations [Pusztai et al., 2006]:
- Instability: the results are hard to reproduce because hierarchical clustering is unstable when applied to microarray data (high feature-to-sample ratio).
- Crispness: hierarchical clustering produces a crisp partition of the dataset (hard partitioning) without any estimate of the classification uncertainty.
- Validation: hierarchical clustering is difficult to apply to new data.

New Clustering Model
Because of these limitations we sought to develop a robust method to identify the breast cancer subtypes. This method consists of:
1. A prototype-based clustering method to identify sets of co-expressed genes (gene modules).
2. A model-based clustering in a low dimensional space to identify groups of similar tumors (subtypes).

Gene Modules
Aim: identification of co-expressed genes related to a biological process of interest.
Method:
1. Choice of the biological processes of interest.
2. Selection of a prototype for each biological process.
   - A prototype is a gene known to be related to the biological process of interest (e.g. ESR1 for ER phenotype or AURKA for proliferation).
3. Identification of the genes specifically co-expressed with each prototype to populate gene modules.
   - A gene j is specifically co-expressed with a prototype q if the co-expression of gene j with prototype q is statistically higher than with the other prototypes.
⇒ Computation of gene module scores by averaging the expressions of the genes in the modules.

Gene Modules Example
Choice of three prototypes: P1, P2 and P3.
Gene j assigned to the gene module 1 (prototype P1):
[Figure: gene j and the prototypes P1, P2 and P3 plotted in the space of patients 1, 2 and 3.]

Gene Modules Example (cont.)
Gene j not assigned to any gene module:
[Figure: gene j and the prototypes P1, P2 and P3 plotted in the space of patients 1, 2 and 3.]

Gene Modules Method
We tackled the problem from a prediction point of view.
- Basic idea: if a gene j is statistically better predicted by prototype q than by all the other prototypes, then gene j is specifically co-expressed with prototype q and is assigned to gene module q.
For each gene j, we fit a set of linear models:
- The univariate models using each prototype as explanatory variable and the gene j as response variable.
- The "best" multivariate model.
We compute the leave-one-out cross-validation (LOOCV) errors of these models.

Gene Modules Method: LOOCV Errors
[Figure: LOOCV errors of the univariate models (P1, P2, P3) and of the multivariate model (MULTIV).]

Gene Modules Method: Statistical Model Selection
We statistically compare the models on the basis of their LOOCV errors (Friedman test).
⇒ Identification of the set of best models.
- The models present in the set are statistically better than the absent models.
- The models present in the set have similar LOOCV errors.
If there is only one univariate model (using prototype q) in the set of best models, gene j is assigned to the gene module q.

Gene Modules Method: Statistical Model Selection (cont.)
[Figure: the Friedman test is applied to the LOOCV errors of the set of models (P1, P2, P3, MULTIV) and yields the set of best models.]

Gene Modules Method: Meta-Analysis
- If the sample size is small, many genes will not be assigned to any gene module (not enough evidence for the statistical model selection step).
- We can increase the size of gene modules by integrating several datasets (larger sample size).
- However, merging datasets from different laboratories, cohorts of patients and microarray platforms is a difficult task.
⇒ Meta-analysis framework (each dataset is analyzed separately and results are combined).

Gene Modules Method: Meta-Analysis (cont.)
- Compute the LOOCV errors for the univariate and multivariate models in each dataset separately.
- Check the homogeneity of the standardized coefficients of the univariate models over datasets (heterogeneity test).
  - The relation between gene j and the prototypes should be similar in each dataset.
  - If the coefficients are heterogeneous, discard gene j from the analysis (conservative choice).
- Perform a "meta" Friedman test:
  - Combine the p-values returned by the pairwise tests applied in each dataset separately.
  - Use these meta p-values in the traditional version of the Friedman test.

Gene Modules Method (dis)Advantages
Advantages:
- Robust to overfitting (linear models + LOOCV).
- Control for other biological processes, which precludes the use of highly correlated prototypes (gene modules are then small).
- Integration of several datasets using different microarray platforms.
  - Insensitive to "batch" effect (meta-analysis framework).
  - Check for heterogeneity between datasets.
- The gene module scores (signed average) are easily computable whatever the microarray technology (except for very small platforms).
- Very conservative (control of false positive rate).
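The prediction-based assignment described in the method slides can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the procedure of [Desmedt et al., 2008]: only the univariate models are used (the "best" multivariate model and the meta-analysis step are omitted), the Friedman test is applied to squared LOOCV errors with patients as blocks, and a one-sided sign test stands in for the pairwise comparisons, whose exact form the slides do not give. The function names, the significance level and the toy data are hypothetical.

```python
import math


def loocv_squared_errors(x, y):
    """Squared leave-one-out errors of the univariate linear model y ~ a + b * x."""
    n = len(x)
    errors = []
    for i in range(n):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        mean_x, mean_y = sum(xs) / (n - 1), sum(ys) / (n - 1)
        sxx = sum((v - mean_x) ** 2 for v in xs)
        sxy = sum((v - mean_x) * (w - mean_y) for v, w in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        errors.append((y[i] - (intercept + slope * x[i])) ** 2)
    return errors


def friedman_statistic(columns):
    """Friedman statistic and mean ranks; columns are models, rows are patients."""
    k, n = len(columns), len(columns[0])
    rank_sums = [0.0] * k
    for i in range(n):
        row = [col[i] for col in columns]
        for j in range(k):
            # Rank of model j for patient i (1 = smallest error), ties averaged.
            smaller = sum(1 for v in row if v < row[j])
            equal = sum(1 for v in row if v == row[j])
            rank_sums[j] += smaller + (equal + 1) / 2.0
    mean_ranks = [s / n for s in rank_sums]
    stat = 12.0 * n / (k * (k + 1)) * sum((r - (k + 1) / 2.0) ** 2 for r in mean_ranks)
    return stat, mean_ranks


def sign_test_p(err_a, err_b):
    """One-sided sign test p-value that model a has smaller errors than model b."""
    wins = sum(1 for a, b in zip(err_a, err_b) if a < b)
    n = sum(1 for a, b in zip(err_a, err_b) if a != b)
    return sum(math.comb(n, i) for i in range(wins, n + 1)) / 2.0 ** n


def assign_gene(gene, prototypes, alpha=0.05):
    """Return (module name or None, Friedman p-value) for one gene.

    Assumption: exactly three prototypes, so the chi-square reference
    distribution has 2 degrees of freedom and its upper tail is exp(-x / 2).
    """
    names = sorted(prototypes)
    if len(names) != 3:
        raise ValueError("this sketch handles exactly three prototypes")
    errors = [loocv_squared_errors(prototypes[name], gene) for name in names]
    stat, mean_ranks = friedman_statistic(errors)
    p_friedman = math.exp(-stat / 2.0)
    best = min(range(3), key=lambda j: mean_ranks[j])
    # The best model must stand alone: better than each of the other models.
    alone = all(sign_test_p(errors[best], errors[j]) < alpha
                for j in range(3) if j != best)
    module = names[best] if p_friedman < alpha and alone else None
    return module, p_friedman


# Toy data (hypothetical): 12 patients, gene j driven by prototype P1.
prototypes = {
    "P1": [0.1, 1.2, -0.7, 0.5, 2.0, -1.5, 0.9, -0.3, 1.7, -1.1, 0.4, -0.8],
    "P2": [1.0, -0.4, 0.3, -1.2, 0.6, 0.2, -0.9, 1.5, -0.1, 0.8, -1.6, 0.4],
    "P3": [-0.5, 0.7, 1.1, -0.2, -1.3, 0.9, 0.0, -0.8, 1.4, 0.3, -0.6, -1.0],
}
noise = [0.01, -0.01] * 6
gene_j = [2.0 * v + 1.0 + e for v, e in zip(prototypes["P1"], noise)]
module, p_value = assign_gene(gene_j, prototypes)
print(module, p_value)
```

A gene that is predicted equally well by two prototypes would fail the "alone" condition and stay unassigned, which mirrors the conservative behaviour listed among the properties of the method.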
Disadvantages:
- Limited to linear relations between gene and prototypes.
- Computationally intensive (statistical model selection step).
- Very conservative (many false negatives).

Gene Modules Results
In [Desmedt et al., 2008], we selected 7 prototypes to be representative of key biological processes involved in breast cancer:
- ESR1 gene for ER phenotype.
- ERBB2 gene for HER2 phenotype.
- AURKA gene for proliferation.
- STAT1 gene for immune response.
- VEGF gene for angiogenesis.
- PLAU gene for tumor invasion.
- CASP3 gene for apoptosis.
We used 2 large breast cancer microarray datasets:
- Wang et al. series: 286 node-negative patients on Affymetrix platform (22283 probes).
- van de Vijver et al. series: 295 patients on Agilent microarray platform (24496 probes).
⇒ ≈ 10,000 probes in common (mapping through EntrezGene IDs).

Gene Modules Results (cont.)
We found gene modules of various sizes:
- ESR1 is the largest gene module, as expected.
- AURKA is the second one, highlighting the importance of proliferation.

Gene module   Size
ESR1          468
AURKA         228
STAT1         94
PLAU          67
ERBB2         27
VEGF          13
CASP3         8

Gene ontology analysis confirmed the coherence of the gene modules with respect to the prototypes or biological processes of interest.

Model-Based Clustering
- We know from early microarray studies that breast cancer is a molecularly heterogeneous disease.
- ER and HER2 phenotypes seem to be the main (only?) discriminators.
- However, the first classification models, based on hierarchical clustering, are hard to reproduce and to apply to new data.
⇒ We introduced a simple model-based clustering (mixture of Gaussians) in a two-dimensional space defined by the ESR1 and ERBB2 module scores.
  - We used the Bayesian information criterion (BIC) to select the most likely number of subtypes.
  - We validated our model (fitted on Wang's series, VDX) on 14 independent datasets by estimating the prediction strength [Tibshirani and Walther, 2005].

Model-Based Clustering Training
[Figure: VDX dataset. Left panel: tumors in the space of the ESR1 (x-axis) and ERBB2 (y-axis) module scores, clustered into ER-/HER2-, HER2+ and ER+/HER2-. Right panel: BIC against the number of clusters.]

Model-Based Clustering Validation
[Figure: six panels (NKI, TBG, UPP, UNT, MAINZ, UNC2) showing tumors in the space of the ESR1 (x-axis) and ERBB2 (y-axis) module scores, labelled ER-/HER2-, HER2+ and ER+/HER2-.]
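How a fitted mixture of Gaussians classifies a new tumor, and how BIC penalizes additional clusters, can be sketched in a few lines of Python. The component weights, means and covariances below are invented for illustration and are not the values fitted on VDX; the BIC sign convention (larger is better, 2 log L minus the penalty) and the use of full covariance matrices are assumptions, since the slides do not state them.

```python
import math


def gaussian_pdf_2d(point, mean, cov):
    """Density of a bivariate Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu) via the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def subtype_posteriors(point, components):
    """Posterior probability of each subtype for one tumor (soft partitioning)."""
    weighted = {name: comp["weight"] * gaussian_pdf_2d(point, comp["mean"], comp["cov"])
                for name, comp in components.items()}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


def log_likelihood(points, components):
    """Log-likelihood of a set of tumors under the mixture."""
    return sum(
        math.log(sum(comp["weight"] * gaussian_pdf_2d(p, comp["mean"], comp["cov"])
                     for comp in components.values()))
        for p in points)


def bic(log_lik, n_components, n_samples):
    """BIC with the 'larger is better' convention (an assumption of this sketch)."""
    # Free parameters of a 2-D mixture with full covariances:
    # (K - 1) weights + 2K means + 3K covariance entries.
    n_params = 6 * n_components - 1
    return 2.0 * log_lik - n_params * math.log(n_samples)


# Hypothetical mixture in the (ESR1, ERBB2) module score space.
COMPONENTS = {
    "ER-/HER2-": {"weight": 0.20, "mean": (-1.0, -0.3), "cov": ((0.25, 0.0), (0.0, 0.25))},
    "HER2+": {"weight": 0.15, "mean": (-0.2, 1.2), "cov": ((0.25, 0.0), (0.0, 0.25))},
    "ER+/HER2-": {"weight": 0.65, "mean": (0.7, -0.2), "cov": ((0.25, 0.0), (0.0, 0.25))},
}

tumor = (0.8, -0.1)  # (ESR1 module score, ERBB2 module score) of a new tumor
posteriors = subtype_posteriors(tumor, COMPONENTS)
best = max(posteriors, key=posteriors.get)
print(best, posteriors)
```

Because the model is a fixed set of parameters in two dimensions, applying it to a new dataset only requires the two module scores, which is what makes validation on independent series straightforward.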

Model-Based Clustering Validation: Prediction Strength

Reference                      Dataset   ER-/HER2-   HER2+   ER+/HER2-
[van de Vijver et al., 2002]   NKI       1.00        1.00    0.99
[Desmedt et al., 2007]         TBG       1.00        1.00    0.83
[Miller et al., 2005]          UPP       1.00        0.93    0.87
[Sotiriou et al., 2006]        UNT       1.00        0.89    0.92
[Schmidt et al., 2008]         MAINZ     1.00        1.00    0.90
[Sorlie et al., 2003]          STNO2     1.00        0.69    0.97
[Sotiriou et al., 2003]        NCI       0.85        0.83    0.93
[Minn et al., 2005]            MSK       1.00        1.00    0.96
[Pawitan et al., 2005]         STK       1.00        0.91    0.87
[Bild et al., 2006]            DUKE      1.00        0.82    0.92
[Hoadley et al., 2007]         UNC2      1.00        0.87    0.96
[Chin et al., 2006]            CAL       1.00        1.00    0.95
[Bonnefoi et al., 2007]        DUKE2     1.00        0.64    0.95
[Naderi et al., 2007]          NCH       1.00        0.82    0.98

Model-Based Clustering Validation: Number of Clusters
[Figure: average BIC against the number of clusters (2 to 10).]

Breast Cancer Subtypes Clinical Outcome
- ER-/HER2-: 20-25%
- HER2+: 15-20%
- ER+/HER2-: 60-70%
of the global population of breast cancer patients.
[Figure: probability of survival against time (years) for the three subtypes; node-negative untreated patients, NKI/TBG/UPP/UNT/MAINZ.]

No. at risk
Time (years)   0     1     2     3     4     5     6     7     8     9     10
ER-/HER2-      119   111   91    83    78    71    68    64    53    46    37
HER2+          106   98    91    81    73    69    64    58    52    47    44
ER+/HER2-      516   507   487   462   435   410   363   319   282   257   223

Breast Cancer Subtypes New Clustering Model (dis)Advantages
Advantages:
- Simple model-based clustering:
  - Easily applicable to new data.
  - Returning for each patient the probability to belong to each subtype (soft partitioning).
- Low dimensional space:
  - Stability/robustness of the clustering model.
  - Low computational cost to fit the model.
  - Simple visualization of the results.
Disadvantage:
- Low dimensional space: which dimension could we add in order to find another robust subtype?

Part III
Prognostic Gene Signatures

Prognostic Gene Signatures
Use of microarray technology to improve current prognostic models (NIH/St Gallen guidelines, NPI, AOL).
A typical microarray analysis dealing with breast cancer prognostication involves 5 key steps:
1. Data preprocessing: quality controls and normalization.
2. Filtering: discard the genes exhibiting low expressions and/or low variance.
3. Identification of a list of prognostic genes (called a gene signature).
4. Building of a prognostic model, i.e. combination of the genes from the signature in order to predict the clinical outcome of the patients.
5. Validation of the model performance and comparison with current prognostic models.

Prognostic Gene Signatures Fishing Expedition
Prognostic models derived from gene expression data by looking for genes associated with clinical outcome without any a priori biological assumption [van't Veer et al., 2002, Wang et al., 2005].
[Figure: Kaplan-Meier curves for patients with a good versus a poor signature. GENE70 signature: probability of remaining metastasis-free and probability of overall survival against years, P<0.001 (van't Veer et al., van de Vijver et al.). GENE76 signature: distant-metastasis-free survival in the validation set, good (n=59) versus poor (n=112), hazard ratio = 5.67 (95% CI 2.59-12.4), log-rank p<0.0001 (Wang et al.).]
Promising results but some criticisms from a statistical point of view.

Prognostic Gene Signatures Hypothesis-driven
Prognostic models were also derived from gene expression data based on a biological assumption.
- Example: GGI [Sotiriou et al., 2006] was designed to discriminate patients with low and high histological grade (proliferation).
- GGI was able to discriminate patients with intermediate histological grade (HG2).
[Figure: three panels: histological grade, GGI (HG2), GGI.]

Prognostic Gene Signatures Independent Validation
These preliminary results were promising but validation was required.
A first validation was published by the authors of the GENE70 and GENE76 signatures in [van de Vijver et al., 2002] and [Foekens et al., 2006] respectively.
Our group was involved in a second validation:
- Complete independence: the authors of the signatures were not aware of the clinical data of the patients in the dataset.
- The statistical analyses were performed by an independent group.
- Aim: validate definitively the prognostic power of these two models in order to start a large clinical trial called MINDACT (Microarray In Node negative Disease may Avoid ChemoTherapy).

Prognostic Gene Signatures Independent Validation (cont.)
Although the performance in this validation series was less impressive than in the original publications, GENE70 and GENE76 sufficiently improved the current clinical models to go ahead with MINDACT.
[Figure: Kaplan-Meier curves (probability of event against year) for GENE70 and GENE76 ("Independent Validation of the 76-Gene Signature"). Legend, patients/events by risk group: 52/7 gene signature low risk, clinical low risk; 59/11 gene signature low risk, clinical high risk; 28/6 gene signature high risk, clinical low risk; 163/52 gene signature high risk, clinical high risk. Partial caption: Kaplan-Meier curves by genomic risk group. HRs and log-rank tests are stratified by center. A, TDM. B, overall survival.]
⇒ Validation of GENE70 [Buyse et al., 2006] and GENE76 [Desmedt et al., 2007].

Prognostic Gene Signatures Independent Validation (cont.)
We sought to compare the GGI to the GENE70 and GENE76 signatures in this validation series.
⇒ GGI yields very similar performance [Haibe-Kains et al., 2008a].
[Figure: Venn diagram of the classification of the tumor samples by GENE70, GENE76 and GGI, and forest plot of the concordance index (axis from 0.5 to 0.9) for GENE70, GENE76, GGI and AOL. Partial captions: "Venn diagram illustrating the classification of the tumor samples according to the prognostic signatures. Dark red = high-risk patients and blue = low-risk patients. GENE70 = 70-gene signature, GENE76 = 76-gene ..."; "Forest plots (and 95% CI) for the concordance indices ...".]
Excerpt from the article page shown on the slide: "... and the GGI were higher than the HR of the 76-gene signature, the differences were not statistically significant (see Table 2). Figure 3 illustrates the Kaplan-Meier estimates of DMFS for the four groups of patients (two groups with concordant results in risk assessment and two with discordant results) for the different signatures two by two. From the multivariate analyses (Table 3), we can conclude that the three signatures added significant information to the traditional parameters and were the strongest predictive variables of DMFS, as reflected by their lowest p-values compared to the other variables. The additional information of these signatures over the clinical risk was also confirmed by the fact that the univariate HRs for the three signatures remained similar when adjusted for the clinical risk, with a HR of 7.25 (95% CI: 2.4-21.5; p = 3.5 × 10^-4), 2.8 (95% CI: 1.2-6.8; p = 0.018) and 6.25 (95% CI: 2.3-17; p = 3.3 × 10^-4) for the 70-gene signature, 76-gene signature and the GGI respectively. Lastly, we combined the three gene signatures in order to assess the potential improvement in BC prognostication."

Prognostic Gene Signatures A Single Gene?
- From the validation studies, we learned that GGI yields performance similar to (sometimes better than) that of other gene signatures [Haibe-Kains et al., 2008a].
- Since GGI is a very simple model from both a statistical and a biological (proliferation-related genes) point of view, we challenged the use of complex statistical methods for breast cancer prognostication.
- We compared statistical methods, from simple to complex, with a single proliferation gene (AURKA) [Haibe-Kains et al., 2008b].
⇒ Due to the complexity of microarray data, it is very hard to build prognostic models statistically better than AURKA.

Prognostic Gene Signatures A Single Gene? (cont.)
Forest plot of the concordance index for each method in the training set and the three validation sets:
[Figure: "Concordance index for risk score prediction", four panels: training set (VDX), validation set 1 (TBG), validation set 2 (TAM), validation set 3 (UPP); x-axis: concordance index. Methods compared: AURKA, BD.COMBUNIV.WILCOXON.HG, BD.COMBUNIV.COX.SURV, BD.MULTIV.LM.TOE, BD.MULTIV.COX.SURV, GW.RANK.COMBUNIV.WILCOXON.HG, GW.RANKCV.COMBUNIV.WILCOXON.HG, GW.RANK.COMBUNIV.COX.SURV, GW.RANKCV.COMBUNIV.COX.SURV, GW.RANK.MULTIV.RCOX.SURV, GW.RANKCV.MULTIV.RCOX.SURV, GW.PCA.COMBUNIV.WILCOXON.HG, GW.PCACV.COMBUNIV.WILCOXON.HG, GW.PCA.COMBUNIV.COX.SURV, GW.PCACV.COMBUNIV.COX.SURV, GW.PCA.MULTIV.RCOX.SURV, GW.PCACV.MULTIV.RCOX.SURV, GENE76, GGI, AOL, NPI.]
Supplementary Figure 1: Forest plot of the concordance indices for the risk scores predicted by all the methods in the training set (VDX) and in the three validation sets (TBG, TAM, and UPP).
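The concordance index used to compare the risk scores in these forest plots can be illustrated with a minimal Python sketch. This is a basic Harrell-type estimator for right-censored data, written under the assumption that a higher risk score should go with an earlier event; the slides do not specify which estimator or confidence interval was used, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Fraction of usable patient pairs ordered correctly by the risk score.

    A pair (i, j) is usable when patient i had an observed event strictly
    before the follow-up time of patient j. The pair is concordant when
    patient i also has the higher risk score; tied scores count for 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pair of patients")
    return concordant / usable


# Toy data (hypothetical): follow-up in years, 1 = event observed, 0 = censored.
times = [2.0, 4.0, 6.0, 8.0, 10.0]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.5, 0.6, 0.1]

c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to a risk score that orders patients no better than chance and 1.0 to a perfect ordering, which is why the forest plots start their axis at 0.5.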
Note: the AURKA and GGI models were not fitted on VDX, which can therefore be considered a validation set for these models.

Part IV
Subtypes and Prognosis

Prognosis in Specific Subtypes
- The first publications attempted to build a prognostic model from the global population of breast cancer patients.
- In 2005, Wang et al. were the first to divide the global population based on ER status:
  - As breast cancer biology is very different according to the ER status (IHC), prognostic models might be different too.
  - They built a prognostic model for each subgroup of patients (ER+ and ER-).
  - To make a prediction, they used one of the two models depending on the ER status of the tumor.
  - Unfortunately the group of ER- tumors was too small and the corresponding model was not generalizable.

Prognosis in Specific Subtypes (cont.)
- Recently, Teschendorff et al. built a new prognostic model for ER- tumors [Teschendorff et al., 2007] and validated it using large datasets [Teschendorff and Caldas, 2008].
  - The signature is composed of 7 immune-related genes.
- We showed in two meta-analyses [Wirapati et al., 2008, Desmedt et al., 2008] that:
  - Proliferation (AURKA) was the strongest prognostic factor in ER+/HER2- tumors and the common driving force of the early gene signatures.
    - Actually, these signatures (e.g. GENE70, GENE76, GGI) are prognostic in ER+/HER2- tumors only.
  - Immune response (STAT1) is prognostic in ER-/HER2- and HER2+ tumors.
  - Tumor invasion (PLAU or uPA) is prognostic in HER2+ tumors.
- Finak et al. introduced a stroma-derived prognostic predictor (SDPP) particularly efficient in HER2+ tumors [Finak et al., 2008].
New Prognostic Model

Since current prognostic models/gene signatures are limited to some subtypes, we plan to develop a new prognostic model that integrates breast cancer subtype identification in order to:
- Build prognostic gene signatures specifically targeting each subtype.
- Build a global prognostic model able to predict the risk of patients whatever the tumor subtype (ER-/HER2-, HER2+ or ER+/HER2-).
We plan to assess the performance of this new model and compare it with current prognostic models using the thorough statistical framework developed in [Haibe-Kains et al., 2008b].
This new prognostic model is called GENIUS, standing for Gene Expression progNostic Index Using Subtypes.

Part V
Conclusion

Conclusion

Numerous studies confirmed the great potential of gene expression profiling using microarrays to better understand cancer biology and to improve current prediction models.
This technology is becoming increasingly mature (MAQC [MAQC Consortium, 2006]) and is now ready for clinical applications.
The promising results of early publications were validated in different independent studies.
Recent meta-analyses successfully recapitulated the main discoveries made over the last decades and refined our knowledge of breast cancer biology.

Conclusion (cont.)

We build on this strong basis to go a step further and improve breast cancer prognosis using microarrays:
- Prognostic models/gene signatures in specific subtypes [Teschendorff et al., 2007, Desmedt et al., 2008, Finak et al., 2008].
- Development of GENIUS, a prognostic model integrating breast cancer molecular subtypes identification [manuscript in preparation].
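The performance comparisons referred to in this talk are reported as concordance indices. As a reminder of what that number measures, here is a minimal sketch of a pairwise concordance index for right-censored survival data. It assumes Harrell's estimator with ties in risk counted as one half; the slides do not specify the exact estimator used in the cited framework, and the function name and toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(time: Sequence[float],
                      event: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Fraction of usable patient pairs ordered correctly by the risk score.

    A pair (i, j) is usable when patient i had an event strictly before
    time[j]; it is concordant when patient i also has the higher risk.
    Ties in risk count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: cannot compute concordance")
    return concordant / usable


# Toy data (illustrative only): follow-up in years, 1 = recurrence observed.
follow_up = [2, 4, 6, 8]
recurrence = [1, 1, 0, 1]
print(concordance_index(follow_up, recurrence, [0.9, 0.2, 0.7, 0.4]))  # 0.6
```

A value of 0.5 corresponds to a risk score that orders patients no better than chance, and 1.0 to a perfect ordering.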
A major issue remains: "How to combine these microarray prognostic models with clinical variables?"
- Several studies showed that tumor size, nodal status, ... carry additional prognostic information.
- However, we currently lack the data to fit robust prognostic models combining microarray and clinical variables.

Thank you for your attention.
This presentation is available from http://www.ulb.ac.be/di/map/bhaibeka/papers/haibekains2008gene.pdf.

Part VI
Bibliography

Bild, A. H., Yao, G., Chang, J. T., Wang, Q., Potti, A., Chasse, D., Joshi, M.-B., Harpole, D., Lancaster, J. M., Berchuck, A., Olson Jr, J. A., Marks, J. R., Dressman, H. K., West, M., and Nevins, J. R. (2006).
Oncogenic pathway signatures in human cancers as a guide to targeted therapies.
Nature, 439:353–356.

Bonnefoi, H., Potti, A., Delorenzi, M., Mauriac, L., Campone, M., Tubiana-Hulin, M., Petit, T., Rouanet, P., Jassem, J., Blot, E., Becette, V., Farmer, P., André, S., Acharya, C. R., Mukherjee, S., Cameron, D., Bergh, J., Nevins, J. R., and Iggo, R. (2007).
Validation of gene signatures that predict the response of breast cancer to neoadjuvant chemotherapy: a substudy of the EORTC 10994/BIG 00-01 clinical trial.
Lancet Oncology, 8(12):1071–1078.

Buyse, M., Loi, S., van't Veer, L., Viale, G., Delorenzi, M., Glas, A. M., Saghatchian d'Assignies, M., Bergh, J., Lidereau, R., Ellis, P., Harris, A., Bogaerts, J., Therasse, P., Floore, A., Amakrane, M., Piette, F., Rutgers, E., Sotiriou, C., Cardoso, F., and Piccart, M. J. (2006).
Validation and Clinical Utility of a 70-Gene Prognostic Signature for Women With Node-Negative Breast Cancer.
J. Natl. Cancer Inst., 98(17):1183–1192.
Chin, K., Devries, S., Fridlyand, J., Spellman, P., Roydasgupta, R., Kuo, W. L., Lapuk, A., Neve, R., Qian, Z., Ryder, T., Chen, F., Feiler, H., Tokuyasu, T., Kingsley, C., Dairkee, S., Meng, Z., Chew, K., Pinkel, D., Jain, A., Ljung, B., Esserman, L., Albertson, D., Waldman, F., and Gray, J. (2006).
Genomic and transcriptional aberrations linked to breast cancer pathophysiologies.
Cancer Cell, 10:529–541.

Desmedt, C., Haibe-Kains, B., Wirapati, P., Buyse, M., Larsimont, D., Bontempi, G., Delorenzi, M., Piccart, M., and Sotiriou, C. (2008).
Biological Processes Associated with Breast Cancer Clinical Outcome Depend on the Molecular Subtypes.
Clin Cancer Res, 14(16):5158–5165.

Desmedt, C., Piette, F., Loi, S., Wang, Y., Lallemand, F., Haibe-Kains, B., Viale, G., Delorenzi, M., Zhang, Y., d'Assignies, M. S., Bergh, J., Lidereau, R., Ellis, P., Harris, A. L., Klijn, J. G., Foekens, J. A., Cardoso, F., Piccart, M. J., Buyse, M., and Sotiriou, C. (2007).
Strong Time Dependence of the 76-Gene Prognostic Signature for Node-Negative Breast Cancer Patients in the TRANSBIG Multicenter Independent Validation Series.
Clin Cancer Res, 13(11):3207–3214.

Finak, G., Bertos, N., Pepin, F., Sadekova, S., Souleimanova, M., Zhao, H., Chen, H., Omeroglu, G., Meterissian, S., Omeroglu, A., Hallett, M., and Park, M. (2008).
Stromal gene expression predicts clinical outcome in breast cancer.
Nat Med, 14(5):518–527.

Foekens, J. A., Atkins, D., Zhang, Y., Sweep, F. C., Harbeck, N., Paradiso, A., Cufer, T., Sieuwerts, A. M., Talantov, D., Span, P. N., Tjan-Heijnen, V. C., Zito, A. F., Specht, K., Hoefler, H., Golouh, R., Schittulli, F., Schmitt, M., Beex, L. V., Klijn, J. G., and Wang, Y. (2006).
Multicenter validation of a gene expression-based prognostic signature in lymph node-negative primary breast cancer.
Journal of Clinical Oncology, 24(11).

Haibe-Kains, B., Desmedt, C., Piette, F., Buyse, M., Cardoso, F., van't Veer, L., Piccart, M., Bontempi, G., and Sotiriou, C. (2008a).
Comparison of prognostic gene expression signatures for breast cancer.
BMC Genomics, 9(1):394.

Haibe-Kains, B., Desmedt, C., Sotiriou, C., and Bontempi, G. (2008b).
A comparative study of survival models for breast cancer prognostication based on microarray data: does a single gene beat them all?
Bioinformatics, 24(19):2200–2208.

Hoadley, K., Weigman, V., Fan, C., Sawyer, L., He, X., Troester, M., Sartor, C., Rieger-House, T., Bernard, P., Carey, L., and Perou, C. (2007).
EGFR associated expression profiles vary with breast tumor subtype.
BMC Genomics, 8(1):258.

Kapp, A., Jeffrey, S., Langerod, A., Borresen-Dale, A.-L., Han, W., Noh, D.-Y., Bukholm, I., Nicolau, M., Brown, P., and Tibshirani, R. (2006).
Discovery and validation of breast cancer subtypes.
BMC Genomics, 7(1):231.

Loi, S., Haibe-Kains, B., Desmedt, C., Lallemand, F., Tutt, A. M., Gillet, C., Ellis, P., Harris, A., Bergh, J., Foekens, J. A., Klijn, J. G., Larsimont, D., Buyse, M., Bontempi, G., Delorenzi, M., Piccart, M. J., and Sotiriou, C. (2007).
Definition of Clinically Distinct Molecular Subtypes in Estrogen Receptor-Positive Breast Carcinomas Through Genomic Grade.
J Clin Oncol, 25(10):1239–1246.

MAQC Consortium (2006).
The microarray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements.
Nat Biotech, 24(9):1151–1161.

Miller, L. D., Smeds, J., George, J., Vega, V. B., Vergara, L., Ploner, A., Pawitan, Y., Hall, P., Klaar, S., Liu, E. T., and Bergh, J. (2005).
An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival.
Proc. Natl. Acad. Sci. USA, 102(38):13550–13555.

Minn, A. J., Gupta, G. P., Siegel, P. M., Bos, P. D., Shu, W., Giri, D. D., Viale, A., Olshen, A. B., Gerald, W. L., and Massague, J. (2005).
Genes that mediate breast cancer metastasis to lung.
Nature, 436(7050):518–524.

Naderi, A., Teschendorff, A. E., Barbosa-Morais, N. L., Pinder, S. E., Green, A. R., Powe, D. G., Robertson, J. F., Aparicio, S., Ellis, I. O., Brenton, J. D., and Caldas, C. (2007).
A gene-expression signature to predict survival in breast cancer across independent data sets.
Oncogene, 26:1507–1516.

Pawitan, Y., Bjohle, J., Amler, L., Borg, A., Egyhazi, S., Hall, P., Han, X., Holmberg, L., Huang, F., Klaar, S., Liu, E. T., Miller, L., Nordgren, H., Ploner, A., Sandelin, K., Shaw, P. M., Smeds, J., Skoog, L., Wedren, S., and Bergh, J. (2005).
Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts.
Breast Cancer Research, 7(6):953–964.

Perou, C. M., Sorlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., Fluge, O., Pergamenschikov, A., Williams, C., Zhu, S. X., Lonning, P. E., Borresen-Dale, A.-L., Brown, P. O., and Botstein, D. (2000).
Molecular portraits of human breast tumours.
Nature, 406(6797):747–752.

Pusztai, L., Mazouni, C., Anderson, K., Wu, Y., and Symmans, W. F. (2006).
Molecular Classification of Breast Cancer: Limitations and Potential.
Oncologist, 11(8):868–877.

Schmidt, M., Bohm, D., von Torne, C., Steiner, E., Puhl, A., Pilch, H., Lehr, H.-A., Hengstler, J. G., Kolbl, H., and Gehrmann, M. (2008).
The Humoral Immune System Has a Key Prognostic Impact in Node-Negative Breast Cancer.
Cancer Res, 68(13):5405–5413.
Sorlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Thorsen, T., Quist, H., Matese, J. C., Brown, P. O., Botstein, D., Lonning, P. E., and Borresen-Dale, A. L. (2001).
Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications.
Proc. Natl. Acad. Sci. USA, 98(19):10869–10874.

Sorlie, T., Tibshirani, R., Parker, J., Hastie, T., Marron, J. S., Nobel, A., Deng, S., Johnsen, H., Pesich, R., Geisler, S., Demeter, J., Perou, C., Lonning, P. E., Brown, P. O., Borresen-Dale, A. L., and Botstein, D. (2003).
Repeated observation of breast tumor subtypes in independent gene expression data sets.
Proc. Natl. Acad. Sci. USA, 100(14):8418–8423.

Sotiriou, C., Neo, S. Y., McShane, L. M., Korn, E. L., Long, P. M., Jazaeri, A., Martiat, P., Fox, S., Harris, A. L., and Liu, E. T. (2003).
Breast cancer classification and prognosis based on gene expression profiles from a population-based study.
Proc. Natl. Acad. Sci. USA, 100(18):10393–10398.

Sotiriou, C., Wirapati, P., Loi, S., Harris, A., Fox, S., Smeds, J., Nordgren, H., Farmer, P., Praz, V., Haibe-Kains, B., Desmedt, C., Larsimont, D., Cardoso, F., Peterse, H., Nuyten, D., Buyse, M., Van de Vijver, M. J., Bergh, J., Piccart, M., and Delorenzi, M. (2006).
Gene Expression Profiling in Breast Cancer: Understanding the Molecular Basis of Histologic Grade To Improve Prognosis.
J. Natl. Cancer Inst., 98(4):262–272.

Teschendorff, A. and Caldas, C. (2008).
A robust classifier of high predictive value to identify good prognosis patients in ER-negative breast cancer.
Breast Cancer Research, 10(4):R73.

Teschendorff, A., Miremadi, A., Pinder, S., Ellis, I., and Caldas, C. (2007).
An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer.
Genome Biology, 8(8):R157.

Tibshirani, R. and Walther, G. (2005).
Cluster validation by prediction strength.
Journal of Computational and Graphical Statistics, 14(3):511–528.

van de Vijver, M. J., He, Y. D., van't Veer, L., Dai, H., Hart, A. M., Voskuil, D. W., Schreiber, G. J., Peterse, J. L., Roberts, C., Marton, M. J., Parrish, M., Atsma, D., Witteveen, A., Glas, A., Delahaye, L., van der Velde, T., Bartelink, H., Rodenhuis, S., Rutgers, E. T., Friend, S. H., and Bernards, R. (2002).
A gene expression signature as a predictor of survival in breast cancer.
New England Journal of Medicine, 347(25):1999–2009.

van't Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. H. (2002).
Gene expression profiling predicts clinical outcome of breast cancer.
Nature, 415:530–536.

Wang, Y., Klijn, J. G., Zhang, Y., Sieuwerts, A. M., Look, M. P., Yang, F., Talantov, D., Timmermans, M., van Gelder, M. E. M., Yu, J., Jatkoe, T., Berns, E. M., Atkins, D., and Foekens, J. A. (2005).
Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer.
Lancet, 365:671–679.

Wirapati, P., Sotiriou, C., Kunkel, S., Farmer, P., Pradervand, S., Haibe-Kains, B., Desmedt, C., Ignatiadis, M., Sengstag, T., Schutz, F., Goldstein, D., Piccart, M., and Delorenzi, M. (2008).
Meta-analysis of gene expression profiles in breast cancer: toward a unified understanding of breast cancer subtyping and prognosis signatures.
Breast Cancer Research, 10(4):R65.
Part VII
Appendix

Gene Expression Profiling Technologies

Several technologies exist to measure the expression of genes.
Low-throughput technologies, such as RT-PCR, allow the expression of a few genes to be measured.
High-throughput technologies, such as microarrays, allow the expression of thousands of genes (whole genome) to be measured simultaneously.
Microarray principles will be illustrated through the Affymetrix technology.

Microarray

A microarray is composed of:
- DNA fragments (probes) fixed on a solid support.
- Ordered position of probes.
- Principle of hybridization to a specific probe of complementary sequence.
- Molecular labeling.
=> Simultaneous detection of thousands of sequences in parallel.

[Image-only slides: Affymetrix GeneChip; Affymetrix GeneChip Probes; Hybridization; Detection; Affymetrix Design; Affymetrix Equipment]

Mixture of Gaussians?
[Figure: density plots of the module scores (x-axis: score; y-axis: density) for the ESR1, ERBB2 and AURKA modules in the NKI and VDX datasets.]

Tools

Bioinformatics software:
- R is a widely used open source language and environment for statistical computing and graphics.
- Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.
- Java Treeview is open source software for clustering visualization.
- BRB Array Tools is a software suite for microarray analysis working as an Excel macro.

Links

Personal webpage: http://www.ulb.ac.be/di/map/bhaibeka/
Machine Learning Group: http://www.ulb.ac.be/di/mlg
Functional Genomics Unit: http://www.bordet.be/en/services/medical/array/practical.htm
Master in Bioinformatics at ULB and other Belgian universities: http://www.bioinfomaster.ulb.ac.be/
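As a companion to the "Mixture of Gaussians?" appendix slide, the sketch below fits a two-component one-dimensional Gaussian mixture to a vector of module scores with the EM algorithm. It is a minimal illustration under stated assumptions: the scores are synthetic (drawn around -1 and +1), the function names are hypothetical, and the initialization at the minimum and maximum score is a convenience choice, not the procedure used for the talk.

```python
import math
import random
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(x: Sequence[float], n_iter: int = 100
                      ) -> Tuple[List[float], List[float], List[float]]:
    """EM for a 1-D mixture of two Gaussians; returns (weights, means, sds)."""
    n = len(x)
    mean_all = sum(x) / n
    sd_all = math.sqrt(sum((v - mean_all) ** 2 for v in x) / n)
    mu = [min(x), max(x)]              # assumed initialization
    sigma = [max(sd_all, 1e-3)] * 2    # floor avoids a zero standard deviation
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each score.
        resp = []
        for v in x:
            p = [weight[k] * normal_pdf(v, mu[k], sigma[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)
    return weight, mu, sigma


# Synthetic bimodal "module scores" (illustrative only).
random.seed(0)
scores = ([random.gauss(-1.0, 0.3) for _ in range(200)]
          + [random.gauss(1.0, 0.3) for _ in range(200)])
weight, mu, sigma = fit_two_gaussians(scores)
print([round(m, 2) for m in mu], [round(w, 2) for w in weight])
```

Two well-separated estimated means with comparable weights would support a bimodal reading of a module score; a single dominant component would not.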