Outline (Day 1 - Nov 15)
• Introduction to Cancer Genomics Data
• Cancer Genomics Data portals
• The cBioPortal for cancer genomics
• Access/Explore/Analyse data through the portal
• Access/Explore/Analyse data using the CGDS-R library

Outline (Day 2 - Nov 16)
• Journal Club
• Cancer molecular heterogeneity
• Disease-model matching: does the genetics matter?
• Cancer Genomics Data portals for tumor models
• The CCLP data portal
• Cell line molecular profiles and drug sensitivity data
• Download and explore cell line and human tumor data for model selection and treatment design

Outline (Day 3 - Nov 17)
• Basic understanding of RNA-Seq data processing
• Dimensionality reduction
• Differential expression
• Introduction to single-cell RNA-Seq

Outline (Day 4 - Nov 18)
• Immune signatures in gene expression data
• Prediction of immune cell fractions
• Prediction of peptide-HLA interactions applied to neo-antigens

Hands-on cancer genomics data
Giovanni Ciriello
15/11/16

“Intro to (Computational) Cancer Genomics”

Focus on Data
• What it is
• How to read/interpret
• How to explore
• How to manipulate

Focus on Data Portals
• To download
• To explore
• To analyze
• To integrate

Day 1
• Cancer genomics data (and data generators)
• Data portals for cancer genomics data
• Enabling curated and systematic access

Day 2
• Exploring cancer genomic heterogeneity
• The importance of model selection
• Data portals for cancer models and model feature interrogation

Why?
1. It will save you a lot of time
2. The Human Relevance
3. Disease-model matching
Cancer Genomics

Genetics
• DNA mutations
• Copy Number Alterations
• Translocation
• Gene Fusion

Epigenetics
• DNA Methylation
• Histone methylation
• Chromatin structural changes

Transcriptomics
• miRNA
• lncRNA
• mRNA-seq: gene expression, isoform quantification, splicing junctions
• Single-cell seq

Proteomics
• RPPA
• Mass-spec
• Single-cell phosphoprotein (CyTOF)

Cancer Genomics: Genetics (DNA mutations, Copy Number Alterations)
• From Sanger to Next Generation Sequencing (Illumina technology)
• Targeted sequencing
• Whole-exome sequencing (WES)
• Whole-genome sequencing (WGS)

Figure: coverage versus number of mutations for Targeted, WES, and WGS.

Targeted sequencing: Clinical Setting
• Selected gene/codon panels
• No need for germline
• High resolution / high depth (500/1000x)

WES: Research Setting
• All coding genes
• International consortia
• Discovery of driver mutations
• Currently the most widespread type of DNA-seq data

WGS: Research Setting
• Growing availability (ICGC)
• Non-coding mutations
• Structural variants
• Clonality inference
• Challenging interpretability
Cancer Genomics: Genetics (Copy Number Alterations)
• Array Comparative Genomic Hybridization (aCGH)
• Affymetrix SNP 6.0 (~2M probes)
• Segments of uniform copy number status
• GISTIC: recurrent copy number alterations

Cancer Genomics: Epigenetics (DNA methylation)
• Addition of a methyl group to the 5-carbon of cytosine
• Illumina Infinium array: 27K → 450K → 800K
• Probing DNA methylation preferentially at CpG promoters, but now also gene body / up-downstream gene regions
• β = Methylated molecules / All probed molecules
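The β definition above can be made concrete with a minimal sketch. Several things here are assumptions rather than content from the slides: the function name `beta_value`, the use of methylated and unmethylated probe intensities as a stand-in for molecule counts, and the optional `offset` term (array pipelines often add a small constant to stabilise low-intensity probes; it defaults to 0 so the result matches the ratio on the slide). The probe values are made up for illustration.

```python
def beta_value(methylated, unmethylated, offset=0.0):
    """Return beta = methylated / (methylated + unmethylated + offset).

    `offset` is an assumed, optional stabilising constant; with the
    default of 0 this is exactly "methylated / all probed" from the slide.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("signal intensities must be non-negative")
    total = methylated + unmethylated + offset
    if total == 0:
        # No signal at all: beta is undefined for this probe.
        return float("nan")
    return methylated / total


# Hypothetical probe intensities (illustrative values, not real data).
probes = {"probe_A": (300.0, 100.0), "probe_B": (50.0, 950.0)}
betas = {name: beta_value(m, u) for name, (m, u) in probes.items()}
```

A β close to 1 indicates a mostly methylated site and a β close to 0 a mostly unmethylated one; in this toy input `probe_A` falls on the methylated side and `probe_B` on the unmethylated side.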
Cancer Genomics: Transcriptomics (RNA-seq)
• RNA-seq has taken over from microarrays
• Statistical analyses often have not
• RNA-seq counts: Negative Binomial Distribution
• Log-transformation / qq-transformation to mimic a normal distribution and use statistics that assume a normal distribution

Figure: RNA-seq expression of CDK4 and PTEN.

Cancer Genomics: Proteomics
RPPA (Reverse Phase Protein Array)
• Select antibody panel
• ~120 protein antibodies
• ~60 phospho-protein antibodies
• Readout of signaling/pathway activity

Figure: AKT pS473 and PTEN (deletion/mutation).

The genomics revolution (2001)

The cancer genomics revolution
• ICGC (December 2015)
• The Cancer Genome Atlas

Cancer Genomics Data Portals

Genomic Data Commons (GDC)
• Good feature selection for case dataset building
• Good feature selection for data download
• One file per sample, no data matrix download

Figure: example of MAF file.

GDC Data Portal
• https://gdc-portal.nci.nih.gov/
• NO data matrix per dataset
• NO analysis capabilities
• NO browsing by gene
• YES controlled access to RAW data
• YES filtering criteria on patients
• YES cross cohorts file search

ICGC Data Portal (International Cancer Genome Consortium)
• Browse by project
• Multiple selection mechanisms
• Analytical tools
• Similar case study selection as for GDC
• Case study selection combined with easy data bulk download!
• Specific mutation type and mutation occurrences
• No simple graphical overview
• Not all info is in the same place

ICGC Data Portal (official ICGC portal)
• https://dcc.icgc.org/
• Best combination of case study selection and bulk download
• Provides analytical tools
• Limited gene search (one at a time)
• Data/Information not always in the same place (convoluted)

FireBrowse (output of the FireHose data analysis pipeline from the Broad Institute)
• Gene Expression Box
• Study Summary Box
• Select cohort for data download
• Index for flat data files
• Index for data analysis files

FireBrowse (TCGA)
• http://firebrowse.org/
• Basic data repository for TCGA data, plenty of flat files to download
• Plus download of analyses from the AWG
• All analyses are run from pipeline, hence no post-processing

The cBioPortal for cancer genomics
1. Select one or more cancer studies at once
2. Select the data type (this may vary between studies)
3. Select the patient set
4. Query your gene(s) of interest
Handling complexity
• Event call abstraction: an event either occurs or not
• Complete information is organized and easily accessible
• Integrate multiple data types
• Data analysis

Cross-studies query/comparisons
• Select multiple cancer studies to query genetic alterations in a specific gene
• Aggregating data to reveal mutational hotspots
• Explore mutation at the structural level
• Gene-specific expression landscape across cancers

Figure: EGFR mutations across human cancers.
Figure: EGFR mRNA expression across human cancers.
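The event-call abstraction and the cross-studies query can be illustrated together with a small sketch. This is not the portal's implementation. It assumes GISTIC-style discrete copy-number calls (-2 deep deletion, -1 shallow loss, 0 diploid, 1 gain, 2 amplification) and assumes that only the high-level calls (-2 and 2) count as events. The helper names `is_altered` and `alteration_frequency`, the study names, and all sample values are hypothetical.

```python
# Assumed discrete copy-number coding (GISTIC-style thresholded calls):
# -2 deep deletion, -1 shallow loss, 0 diploid, 1 gain, 2 amplification.
# Assumption: only high-level calls are treated as alteration events.
ALTERED_CNA = {-2, 2}


def is_altered(mutated, cna_call):
    """Binary event call: True if the gene is mutated or has a high-level CNA."""
    return bool(mutated) or cna_call in ALTERED_CNA


def alteration_frequency(samples):
    """Fraction of samples in one study with at least one event in the gene."""
    if not samples:
        return 0.0
    altered = sum(is_altered(s["mutated"], s["cna"]) for s in samples)
    return altered / len(samples)


# Hypothetical per-sample calls for a single gene in two made-up studies.
studies = {
    "study_A": [
        {"mutated": True, "cna": 0},
        {"mutated": False, "cna": 2},
        {"mutated": False, "cna": 1},
        {"mutated": False, "cna": 0},
    ],
    "study_B": [
        {"mutated": False, "cna": -1},
        {"mutated": False, "cna": 0},
    ],
}

# Cross-study comparison: one alteration frequency per study, highest first.
frequencies = {name: alteration_frequency(s) for name, s in studies.items()}
for name, freq in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {freq:.0%} altered")
```

Reducing each data type to an occurred/did-not-occur call is what lets mutations and copy-number changes be combined in one per-sample flag and compared across studies with a single number per cohort.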