A Predictive Based Regression Algorithm for Gene Network Selection Stéphane Guerrier

A Predictive Based Regression Algorithm
for Gene Network Selection
Stéphane Guerrier¹, Nabil Mili² & Samuel Orso²
¹Department of Statistics, University of Illinois at Urbana-Champaign, USA
²Research Center for Statistics, University of Geneva, Switzerland
joint work with
Marco Avella Medina (U. Geneva), Yanyuan Ma (USC),
Roberto Molinari (U. Geneva)
June 6, 2016
S. Guerrier, N. Mili & S. Orso Panning Algorithm for Gene Selection June 6, 2016 1 / 32
Introduction: Motivation
Gene Selection Problems:
Selection of relevant genes is a common task in most gene expression
studies. Researchers try to identify the smallest possible set of genes
that can still achieve good predictive performance (Díaz-Uriarte &
Alvarez de Andrés, 2006).
How statisticians (typically) understand this definition:
We are looking for a single model.
For a given candidate model, picking the most likely parameters
given the data is optimal.
Predictive performance can be measured by the likelihood function
(typically out-of-sample).
The order in which the variables enter the model is unimportant
(implying: Model AB is equivalent to Model BA).
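To make the last two points concrete, here is a minimal sketch in Python/NumPy. It assumes a binary outcome modelled by logistic regression (the slides do not commit to a specific model), simulates two hypothetical genes A and B, fits each candidate model by maximum likelihood (Newton-Raphson) on a training split, and scores it by its held-out log-likelihood. The helpers `fit_logistic` and `log_likelihood` are illustrative names, not the authors' software. Swapping the column order leaves the out-of-sample likelihood unchanged, so Model AB and Model BA cannot be told apart under this criterion.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    X is assumed to already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood: sum of y*eta - log(1 + exp(eta))."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

# Simulated data for two hypothetical genes A and B (not real expression data).
rng = np.random.default_rng(0)
n = 400
A = rng.normal(size=n)
B = rng.normal(size=n)
eta_true = -0.5 + 1.0 * A - 0.8 * B
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

train = np.arange(n) < 300   # first 300 cases for fitting
test = ~train                # last 100 cases for out-of-sample evaluation
ones = np.ones(n)
X_ab = np.column_stack([ones, A, B])  # "Model AB"
X_ba = np.column_stack([ones, B, A])  # "Model BA": same genes, other order

ll_ab = log_likelihood(X_ab[test], y[test], fit_logistic(X_ab[train], y[train]))
ll_ba = log_likelihood(X_ba[test], y[test], fit_logistic(X_ba[train], y[train]))
print(ll_ab, ll_ba)  # equal up to floating-point rounding
```

Because the likelihood depends only on the set of variables, any information about the order in which genes enter the model is discarded by construction.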
Equivalence of outcomes according to the likelihood function
Introduction: Potential drawbacks
Is this a good idea?
According to our understanding of the problem (i.e., a single model based
on likelihood methods): YES! However:
Focusing on a single model suggests a level of confidence in our final
result that is not justified by the data, since other models with a
similarly good fit generally exist (Whittingham et al., 2006).
Maximizing the likelihood function does not guarantee finding the
best model(s) (and parameters) according to a given out-of-sample
(medically chosen) objective function (e.g., classification error,
quality of life, mortality).
Ignoring the order in which variables enter the model can cause
interpretation issues.
These methods are prone to overfitting (due to the asymmetric
effects of underfitting vs. overfitting).
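A toy illustration of the point about out-of-sample objectives, using hand-picked predicted probabilities rather than fitted models or real data: the model with the higher log-likelihood is also the one with the higher classification error, so the two criteria rank the same pair of models in opposite orders. The function names and the 0.5 classification threshold are assumptions made for this sketch.

```python
import numpy as np

def log_likelihood(y, p):
    """Bernoulli log-likelihood of labels y under predicted probabilities p."""
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def classification_error(y, p, threshold=0.5):
    """Share of misclassified cases when predicting 1 whenever p >= threshold."""
    return float(np.mean((p >= threshold).astype(float) != y))

# Hypothetical labels and predicted probabilities for two candidate models.
y = np.array([1.0, 1.0, 1.0, 0.0])
p_model_1 = np.array([0.45, 0.45, 0.45, 0.10])  # cautious, but on the wrong side of 0.5
p_model_2 = np.array([0.99, 0.99, 0.99, 0.99])  # always predicts class 1

for name, p in [("model 1", p_model_1), ("model 2", p_model_2)]:
    print(name, round(log_likelihood(y, p), 3), classification_error(y, p))
# model 1 -2.501 0.75
# model 2 -4.635 0.25
```

Model 1 is preferred by the likelihood, while Model 2 misclassifies only one case out of four; which model is "best" depends on the objective chosen.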
Introduction: Random Medical News
This can lead to...