Statistical Science
2008, Vol. 23, No. 1, 1–22
DOI: 10.1214/07-STS236
© Institute of Mathematical Statistics, 2008

Microarrays, Empirical Bayes and the Two-Groups Model

Bradley Efron

Abstract. The classic frequentist theory of hypothesis testing developed by Neyman, Pearson and Fisher has a claim to being the twentieth century's most influential piece of applied mathematics. Something new is happening in the twenty-first century: high-throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focused on Benjamini and Hochberg's False Discovery Rate method. Topics include the choice and meaning of the null hypothesis in large-scale testing situations, power considerations, the limitations of permutation methods, significance testing for groups of cases (such as pathways in microarray studies), correlation effects, multiple confidence intervals and Bayesian competitors to the two-groups model.

Key words and phrases: Simultaneous tests, empirical null, false discovery rates.

Bradley Efron is Professor, Department of Statistics, Stanford University, Stanford, California 94305, USA (e-mail: [email protected]). Discussed in 10.1214/07-STS236B, 10.1214/07-STS236C, 10.1214/07-STS236D and 10.1214/07-STS236A; rejoinder at 10.1214/08-STS236REJ.

1. INTRODUCTION

Simultaneous hypothesis testing was a lively research topic during my student days, exemplified by Rupert Miller's classic text "Simultaneous Statistical Inference" (1966, 1981). Attention focused on testing N null hypotheses at the same time, where N was typically less than half a dozen, though the requisite tables might go up to N = 20. Modern scientific technology, led by the microarray, has upped the ante in dramatic fashion: my examples here will have N's ranging from 200 to 10,000, while N = 500,000, from SNP analyses, is waiting in the wings. [The astrostatistical applications in Liang et al. (2004) envision N = 10^10 and more!]

Miller's text is relentlessly frequentist, reflecting a classic Neyman–Pearson testing framework, with the main goal being preservation of "α," overall test size, in the face of multiple inference. Most of the current microarray statistics literature shares this goal, and also its frequentist viewpoint, as described in the nice review article by Dudoit and Boldrick (2003). Something changes, though, when N gets big: with thousands of parallel inference problems to consider simultaneously, Bayesian considerations begin to force themselves even upon dedicated frequentists.

The "two-groups model" of the title is a particularly simple Bayesian framework for large-scale testing situations. This article explores the interplay of frequentist and Bayesian ideas in the two-groups setting, with particular attention paid to False Discovery Rates (Benjamini and Hochberg, 1995).

Figure 1 concerns four examples of large-scale simultaneous hypothesis testing. Each example consists of N individual cases, with each case represented by its own z-value "zi," for i = 1, 2, . . . , N. The zi's are based on familiar constructions that, theoretically, should yield standard N(0, 1) normal distributions under a classical null hypothesis,

(1.1)  theoretical null: zi ~ N(0, 1).

FIG. 1. Four examples of large-scale simultaneous inference, each panel indicating N z-values as explained in the text. Panel A, prostate cancer microarray study, N = 6033 genes; panel B, comparison of advantaged versus disadvantaged students passing mathematics competency tests, N = 3748 high schools; panel C, proteomics study, N = 230 ordered peaks in time-of-flight spectroscopy experiment; panel D, imaging study comparing dyslexic versus normal children, showing horizontal slice of 655 voxels out of N = 15,455, coded "−" for zi < 0, "+" for zi ≥ 0 and solid circle for zi > 2.

Here is a brief description of the four examples, with further information following as needed in the sequel.

EXAMPLE A [Prostate data, Singh et al. (2002)]. N = 6033 genes on 102 microarrays, n1 = 50 healthy males compared with n2 = 52 prostate cancer patients; zi's based on two-sample t statistics comparing the two categories.

EXAMPLE B [Education data, Rogosa (2003)]. N = 3748 California high schools; zi's based on binomial test of proportion advantaged versus proportion disadvantaged students passing mathematics competency tests.

EXAMPLE C [Proteomics data, Turnbull (2006)]. N = 230 ordered peaks in time-of-flight spectroscopy study of 551 heart disease patients. Each peak's z-value was obtained from a Cox regression of the patients' survival times, with the predictor variable being the 551 observed intensities at that peak.

EXAMPLE D [Imaging data, Schwartzman et al. (2005)]. N = 15,445 voxels in a diffusion tensor imaging (DTI) study comparing six dyslexic with six normal children; zi's based on two-sample t statistics comparing the two groups. The figure shows only a single horizontal brain section having 655 voxels, with "−" indicating zi < 0, "+" for zi ≥ 0, and solid circles for zi > 2.
Our four examples are enough alike to be usefully analyzed by the two-groups model of Section 2, but there are some striking differences, too: the theoretical N(0, 1) null (1.1) is obviously inappropriate for the education data of panel B; there is a hint of correlation of z-value with peak number in panel C, especially near the right limit; and there is substantial spatial correlation appearing in the imaging data of panel D.

My plan here is to discuss a range of inference problems raised by large-scale hypothesis testing, many of which, it seems to me, have been more or less underemphasized in a literature focused on controlling Type I errors: the choice of a null hypothesis, limitations of permutation methods, the meaning of "null" and "nonnull" in large-scale settings, questions of power, tests of significance for groups of cases (e.g., pathways in microarray studies), the effects of correlation, multiple confidence statements and Bayesian competitors to the two-groups model. The presentation is intended to be as nontechnical as possible, many of the topics being discussed more carefully in Efron (2004, 2005, 2006). References will be provided as we go along, but this is not intended as a comprehensive review. Microarrays have stimulated a burst of creativity from the statistics community, and I apologize in advance for this article's concentration on my own point of view, which aims at minimizing the amount of statistical modeling required of the statistician. More model-intensive techniques, including fully Bayesian approaches, as in Parmigiani et al. (2002) or Lewin et al. (2006), have their own virtues, which I hope will emerge in the Discussion.

Section 2 discusses the two-groups model and false discovery rates in an idealized Bayesian setting. Empirical Bayes methods are needed to carry out these ideas in practice, as discussed in Section 3. This discussion assumes a "good" situation, like that of Example A, where the theoretical null (1.1) fits the data.
When it does not, as in Example B, the empirical null methods of Section 4 come into play. These raise interpretive questions of their own, as mentioned above, discussed in the later sections.

We are living through a scientific revolution powered by the new generation of high-throughput observational devices. This is a wonderful opportunity for statisticians, to redemonstrate our value to the scientific world, but also to rethink basic topics in statistical theory. Hypothesis testing is the topic here, a subject that needs a fresh look in contexts like those of Figure 1.

2. THE TWO-GROUPS MODEL AND FALSE DISCOVERY RATES

The two-groups model is too simple to have a single identifiable author, but it plays an important role in the Bayesian microarray literature, as in Lee et al. (2000), Newton et al. (2001) and Efron et al. (2001). We suppose that the N cases ("genes" as they will be called now in deference to microarray studies, though they are not genes in the last three examples of Figure 1) are each either null or nonnull with prior probability p0 or p1 = 1 − p0, and with z-values having density either f0(z) or f1(z),

(2.1)  p0 = Pr{null},     f0(z) = density if null;
       p1 = Pr{nonnull},  f1(z) = density if nonnull.

The usual purpose of large-scale simultaneous testing is to reduce a vast set of possibilities to a much smaller set of scientifically interesting prospects. In Example A, for instance, the investigators were probably searching for a few genes, or a few hundred at most, worthy of intensive study for prostate cancer etiology. I will assume

(2.2)  p0 ≥ 0.90

in what follows, limiting the nonnull genes to no more than 10%.

False discovery rate (Fdr) methods have developed in a strict frequentist framework, beginning with Benjamini and Hochberg's seminal 1995 paper, but they also have a convincing Bayesian rationale in terms of the two-groups model.
Let F0(z) and F1(z) denote the cumulative distribution functions (cdf) of f0(z) and f1(z) in (2.1), and define the mixture cdf F(z) = p0F0(z) + p1F1(z). Then Bayes' rule yields the a posteriori probability of a gene being in the null group of (2.1) given that its z-value Z is less than some threshold z, say "Fdr(z)," as

(2.3)  Fdr(z) ≡ Pr{null | Z ≤ z} = p0F0(z)/F(z).

[Here it is notationally convenient to consider the negative end of the z scale, values like z = −3. Definition (2.3) could just as well be changed to Z > z or Z > |z|.]

Benjamini and Hochberg's (1995) false discovery rate control rule begins by estimating F(z) with the empirical cdf

(2.4)  F̄(z) = #{zi ≤ z}/N,

yielding Fdr̄(z) = p0F0(z)/F̄(z). The rule selects a control level "q," say q = 0.1, and then declares as nonnull those genes having z-values zi satisfying zi ≤ z0, where z0 is the maximum value of z satisfying

(2.5)  Fdr̄(z0) ≤ q

[usually taking p0 = 1 in (2.3), and F0 the theoretical null, the standard normal cdf Φ(z) of (1.1)].

The striking theorem proved in the 1995 paper was that the expected proportion of null genes reported by a statistician following rule (2.5) will be no greater than q. This assumes independence among the zi's, extended later to various dependence models in Benjamini and Yekutieli (2001). The theorem is a purely frequentist result, but as pointed out in Storey (2002) and Efron and Tibshirani (2002), it has a simple Bayesian interpretation via (2.3): rule (2.5) is essentially equivalent to declaring nonnull those genes whose estimated tail-area posterior probability of being null is no greater than q. It is usually a good sign when Bayesian and frequentist ideas converge on a single methodology, as they do here.

Densities are more natural than tail areas for Bayesian fdr interpretation.
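Before turning to densities, rule (2.4)–(2.5) can be put into code in a few lines. The following pure-Python sketch (not the locfdr implementation) takes p0 = 1 and the theoretical null Φ for F0, as in the rule's usual application, and reports the declared nonnull cases on the negative tail:

```python
import math

def Phi(z):
    """Standard normal cdf, the theoretical null F0 of (1.1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bh_negative_tail(z_values, q, p0=1.0):
    """Rule (2.5) on the negative tail: declare nonnull all z_i <= z0,
    where z0 is the largest z with p0 * F0(z) / Fbar(z) <= q and
    Fbar is the empirical cdf (2.4)."""
    N = len(z_values)
    z0 = None
    for k, z in enumerate(sorted(z_values), start=1):
        if p0 * Phi(z) / (k / N) <= q:  # Fbar = k/N at the kth smallest z
            z0 = z  # ascending scan retains the largest qualifying z
    return [] if z0 is None else [z for z in z_values if z <= z0]

nonnull = bh_negative_tail([-5.0, -4.0, 0.0, 1.0, 2.0], q=0.1)
```

With q = 0.1 the two extreme z-values are declared nonnull; equivalently, these are the cases whose estimated tail-area posterior null probability (2.3) is at most 0.1.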
Defining the mixture density from (2.1),

(2.6)  f(z) = p0f0(z) + p1f1(z),

Bayes' rule gives

(2.7)  fdr(z) ≡ Pr{null | Z = z} = p0f0(z)/f(z)

for the probability of a gene being in the null group given z-score z. Here fdr(z) is the local false discovery rate (Efron et al., 2001; Efron, 2005).

There is a simple relationship between Fdr(z) and fdr(z),

(2.8)  Fdr(z) = Ef{fdr(Z) | Z ≤ z},

"Ef" indicating expectation with respect to the mixture density f(z). That is, Fdr(z) is the mixture average of fdr(Z) for Z ≤ z. In the usual situation where fdr(z) decreases as |z| gets large, Fdr(z) will be smaller than fdr(z). Intuitively, if we decide to label all genes with zi less than some negative value z0 as nonnull, then fdr(z0), the false discovery rate at the boundary point z0, will be greater than Fdr(z0), the average false discovery rate beyond the boundary. Figure 2 illustrates the geometrical relationship between Fdr(z) and fdr(z); the Benjamini–Hochberg Fdr control rule amounts to an upper bound on the secant slope.

FIG. 2. Relationship of Fdr(z) to fdr(z). Heavy curve plots numerator of Fdr, p0F0(z), versus denominator F(z); fdr(z) is slope of tangent, Fdr slope of secant.

For Lehmann alternatives

(2.9)  F1(z) = F0(z)^γ   [γ < 1],

it turns out that

(2.10)  log[fdr(z)/(1 − fdr(z))] = log[Fdr(z)/(1 − Fdr(z))] + log(1/γ),

so

(2.11)  fdr(z) ≈ Fdr(z)/γ

for small values of Fdr. The prostate data of Figure 1 has γ about 1/2 in each tail, making fdr(z) ~ 2·Fdr(z) near the extremes.

The statistics literature has not reached consensus on the choice of q for the Benjamini–Hochberg control rule (2.5)—what would be the equivalent of 0.05 for classical testing—but Bayes factor calculations offer some insight. Efron (2005, 2006) uses the cutoff point

(2.12)  fdr(z) ≤ 0.20
for reporting nonnull genes, on the admittedly subjective grounds that fdr values much greater than 0.20 are dangerously prone to wasting investigators' resources. Then (2.6), (2.7) yield posterior odds ratio

(2.13)  Pr{nonnull | z}/Pr{null | z} = (1 − fdr(z))/fdr(z)
        = p1f1(z)/p0f0(z) ≥ 0.8/0.2 = 4.

Since (2.2) implies p1/p0 ≤ 1/9, (2.13) corresponds to requiring Bayes factor

(2.14)  f1(z)/f0(z) ≥ 36

in favor of nonnull in order to declare significance. Factor (2.14) requires much stronger evidence against the null hypothesis than in standard one-at-a-time testing, where the critical threshold lies somewhere near 3 (Efron and Gous, 2001). The fdr 0.20 threshold corresponds to q-values in (2.5) between 0.05 and 0.15 for moderate choices of γ; such q-value thresholds can be interpreted as providing conservative Bayes factors for Fdr testing.

Model (2.1) ignores the fact that investigators usually begin with hot prospects in mind, genes that have high prior probability of being interesting. Suppose p0(i) is the prior probability that gene i is null, and define p0 as the average of p0(i) over all N genes. Then Bayes' theorem yields this expression for fdri(z) = Pr{gene i null | zi = z}:

(2.15)  fdri(z) = fdr(z)·ri / [1 − (1 − ri)·fdr(z)],
        ri = [p0(i)/(1 − p0(i))] / [p0/(1 − p0)],

where fdr(z) = p0f0(z)/f(z) as before. So for a hot prospect having p0(i) = 0.50 rather than p0 = 0.90, (2.15) changes an uninteresting result like fdr(zi) = 0.40 into fdri(zi) = 0.069.

Wonderfully neat and exact results like the Benjamini–Hochberg Fdr control rule exert a powerful influence on statistical theory, sometimes more than is good for applied work. Much of the microarray statistics literature seems to me to be overly concerned with exact properties borrowed from classical test theory, at the expense of ignoring the complications of large-scale testing.
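Two of these formulas are easy to check numerically. The pure-Python sketch below (the Lehmann parameters are illustrative choices; the hot-prospect numbers are the ones from the text) verifies that identity (2.10) holds exactly under a Lehmann alternative, and reproduces the fdri calculation of (2.15):

```python
import math

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# --- Identity (2.10) under the Lehmann alternative (2.9) ---
p0, p1, gamma, z = 0.9, 0.1, 0.5, -3.0
F0 = Phi(z)
# F1 = F0**gamma implies the density ratio f1/f0 = gamma * F0**(gamma - 1)
fdr = p0 / (p0 + p1 * gamma * F0 ** (gamma - 1))  # local fdr (2.7)
Fdr = p0 * F0 / (p0 * F0 + p1 * F0 ** gamma)      # tail-area Fdr (2.3)
lhs = math.log(fdr / (1 - fdr))
rhs = math.log(Fdr / (1 - Fdr)) + math.log(1 / gamma)
# lhs equals rhs to machine precision; when both rates are small,
# exponentiating gives fdr =~ Fdr / gamma, which is (2.11)

# --- Hot-prospect adjustment (2.15) ---
def fdr_i(fdr_z, p0_i, p0_bar):
    """Gene-specific fdr: r is gene i's prior null odds relative to
    the average gene's prior null odds."""
    r = (p0_i / (1 - p0_i)) / (p0_bar / (1 - p0_bar))
    return fdr_z * r / (1 - (1 - r) * fdr_z)

hot = fdr_i(0.40, 0.50, 0.90)  # the text's example: 0.40 becomes 0.069
```

A sanity check on (2.15): setting p0(i) = p0 gives ri = 1 and returns fdr(z) unchanged, as it should.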
Neatness and exactness are mostly missing in what follows as I examine an empirical Bayes approach to the application of two-groups/Fdr ideas to situations like those in Figure 1.

3. EMPIRICAL BAYES METHODS

In practice, the difference between Bayesian and frequentist statisticians is their self-confidence in assigning prior distributions to complicated probability models. Large-scale testing problems certainly look complicated enough, but this is deceptive; their massively parallel structure, with thousands of similar situations each providing information, allows an appropriate prior distribution to be estimated from the data without upsetting even timid frequentists like myself. This is the empirical Bayes approach of Robbins and Stein, 50 years old but coming into its own in the microarray era; see Efron (2003).

Consider estimating the local false discovery rate fdr(z) = p0f0(z)/f(z), (2.7). I will begin with a "good" case, like the prostate data of Example A in Section 1, where it is easy to believe in the theoretical null distribution (1.1),

(3.1)  f0(z) = φ(z) ≡ (2π)^(−1/2) exp{−(1/2)z²}.

The z-values in Example A were obtained by transforming the usual two-sample t statistic "ti" comparing cancer and normal patients' expression levels for gene i, to a standard normal scale via

(3.2)  zi = Φ^(−1)(F100(ti));

here Φ and F100 are the cdfs of the standard normal and t100 distributions. If we had only gene i's data to test, classic theory would tell us to compare zi with f0(z) = φ(z) as in (3.1).

For the moment I will take p0, the prior probability of a gene being null, as known. Section 4 discusses p0's estimation, but in fact its exact value does not make much difference to Fdr(z) or fdr(z), (2.3) or (2.7), if p0 is near 1 as in (2.2). Benjamini and Hochberg (1995) take p0 = 1, providing an upper bound for Fdr(z). This leaves us with only the denominator f(z) to estimate in (2.7).
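Transformation (3.2) is an instance of the probability integral transform: if F is the exact null cdf of the test statistic, then Φ^(−1)(F(t)) is exactly N(0, 1) under the null. The Python standard library has no t100 cdf, so the sketch below illustrates the principle with an exponential statistic as a stand-in (an assumption for demonstration only; the prostate data used ti and F100):

```python
import math
import random
from statistics import NormalDist, fmean, stdev

random.seed(0)
ndist = NormalDist()  # standard normal: supplies cdf and inverse cdf

# Stand-in null statistic t ~ Exponential(1), exact null cdf F(t) = 1 - exp(-t)
z = [ndist.inv_cdf(1.0 - math.exp(-random.expovariate(1.0)))
     for _ in range(50_000)]

# Under the null, z-values built this way are exactly N(0, 1)
m, s = fmean(z), stdev(z)
tail = sum(abs(v) > 1.96 for v in z) / len(z)  # should be near 0.05
```

The empirical mean, standard deviation and two-sided 5% tail fraction all come out close to their N(0, 1) values, which is exactly what (3.2) arranges for the ti's.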
By definition (2.6), f(z) is the marginal density of all N zi's, so we can use all the data to estimate f(z). The algorithm locfdr, an R function available from the CRAN library, does this by means of standard Poisson GLM software (Efron, 2005). Suppose the z-values have been binned, giving bin counts

(3.3)  yk = #{zi in bin k},   k = 1, 2, . . . , K.

The prostate data histogram in panel A of Figure 1 has K = 49 bins of width Δ = 0.2. We take the yk to be independent Poisson counts,

(3.4)  yk ~ind Poi(νk),   k = 1, 2, . . . , K,

with the unknown νk proportional to the density f(z) at the midpoint "xk" of the kth bin, approximately

(3.5)  νk = NΔf(xk).

Modeling log(νk) as a pth-degree polynomial function of xk makes (3.4)–(3.5) a standard Poisson generalized linear model (GLM). The choice p = 7 used in Figure 3 amounts to estimating f(z) by maximum likelihood within the seven-parameter exponential family

(3.6)  f(z) = exp{Σj=0..7 βj z^j}.

Notice that p = 2 would make f(z) normal; the extra parameters in (3.6) allow flexibility in fitting the tails of f(z). Here we are employing Lindsey's method; see Efron and Tibshirani (1996). Despite its unorthodox look, it is no more than a convenient way to obtain maximum likelihood estimates in multiparameter families like (3.6).

The heavy curve in Figure 3 is an estimate of the local false discovery rate for the prostate data,

(3.7)  fdr̂(z) = p0f0(z)/f̂(z),

with f̂(z) constructed as above, f0(z) = φ(z) as in (3.1), and p0 = 0.93, as estimated in Section 4; fdr̂(z) is near 1 for |z| ≤ 2, decreasing to interesting levels for |z| > 3. Fifty-one of the 6033 genes have fdr̂(zi) ≤ 0.2, 26 on the right and 25 on the left, and these could be reported back to the investigators as likely nonnull candidates. [The standard Benjamini–Hochberg procedure, (2.5) with q = 0.1, reports 60 nonnull genes, 28 on the right and 32 on the left.]
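Lindsey's method needs nothing beyond Poisson regression. The pure-Python sketch below (a simplified stand-in for locfdr, using degree 2 so the fitted density is normal, with illustrative bin settings) fits the GLM (3.4)–(3.5) by Newton's method, started from least squares on log counts:

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def lindsey_fit(counts, midpoints, degree=2, n_iter=20):
    """Lindsey's method (3.3)-(3.6): model log(nu_k) as a polynomial in
    x_k and fit the Poisson GLM by Newton's method. Returns beta."""
    p = degree + 1
    X = [[x ** j for j in range(p)] for x in midpoints]
    # least-squares start on log(counts), floored away from zero
    ylog = [math.log(max(y, 1e-3)) for y in counts]
    XtX = [[sum(Xi[j] * Xi[l] for Xi in X) for l in range(p)] for j in range(p)]
    Xty = [sum(Xi[j] * yl for Xi, yl in zip(X, ylog)) for j in range(p)]
    beta = solve(XtX, Xty)
    for _ in range(n_iter):  # Newton steps on the Poisson log-likelihood
        mu = [math.exp(sum(b * xj for b, xj in zip(beta, Xi))) for Xi in X]
        score = [sum((y - m) * Xi[j] for y, m, Xi in zip(counts, mu, X))
                 for j in range(p)]
        info = [[sum(m * Xi[j] * Xi[l] for m, Xi in zip(mu, X))
                 for l in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

# Demo on exact N(0,1) bin counts, N = 6000 cases, bin width 0.2 (illustrative)
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
mids = [-3.9 + 0.2 * k for k in range(40)]
beta = lindsey_fit([6000 * 0.2 * phi(x) for x in mids], mids)
```

Fed exact N(0, 1) counts, the fit recovers the normal log-density coefficients (β1 = 0, β2 = −1/2); with real data one would take degree 7 as in (3.6) to let the tails bend away from normality.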
At this point the reader might notice an anomaly: if p0 = 0.93 of the N genes are null, then about (1 − p0) · 6033 = 422 should be nonnull, but only 51 are reported. The trouble is that most of the nonnull genes are located in regions of the z axis where fdr̂(zi) exceeds 0.5, and these cannot be reported without also reporting a bevy of null cases. In other words, the prostate study is underpowered.

The vertical bars in Figure 3 are estimates of the nonnull counts, the histogram we would see if only the nonnull genes provided z-values. In terms of (3.3), (3.7), the nonnull counts "yk^(1)" are

(3.8)  yk^(1) = [1 − fdr̂k]·yk,   fdr̂k = fdr̂(xk),

the estimated fdr value at the center of bin k. Since 1 − fdr̂k approximates the nonnull probability for a gene in bin k, formula (3.8) is an obvious estimate for the expected number of nonnulls.

FIG. 3. Heavy curve is estimated local false discovery rate fdr̂(z) for prostate data. Fifty-one genes, 26 on the right and 25 on the left, have fdr̂(zi) < 0.20. Vertical bars estimate histogram of the nonnull counts (plotted negatively, divided by 50). Most of the nonnull genes will not be reported.

Power diagnostics are obtained from comparisons of fdr̂(z) with the nonnull histogram. High power would be indicated if fdr̂k was small where yk^(1) was large. That obviously is not the case in Figure 3. A simple power diagnostic is

(3.9)  Êfdr^(1) = [Σk=1..K yk^(1)·fdr̂k] / [Σk=1..K yk^(1)],

the expected nonnull fdr. We want Êfdr^(1) to be small, perhaps near 0.2, so that a typical nonnull gene will show up on a list of likely prospects. The prostate data has Êfdr^(1) = 0.68, indicating low power. If the whole study were rerun, we could expect a different list of 50 likely nonnull genes, barely overlapping with the first list. Section 3 of Efron (2006) discusses power calculations for microarray studies, presenting more elaborate power diagnostics.
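The nonnull counts (3.8) and the diagnostic (3.9) take a couple of lines apiece; a sketch with toy numbers (not the prostate fit):

```python
def power_diagnostic(y, fdr_hat):
    """Nonnull counts (3.8) and expected nonnull fdr (3.9), from bin
    counts y_k and estimated fdr values at the bin centers."""
    y1 = [(1.0 - f) * yk for yk, f in zip(y, fdr_hat)]         # (3.8)
    efdr1 = sum(f * c for f, c in zip(fdr_hat, y1)) / sum(y1)  # (3.9)
    return y1, efdr1

# Toy example: one mostly-null bin (fdr 0.9), one mostly-nonnull bin (fdr 0.1)
y1, efdr1 = power_diagnostic([10, 10], [0.9, 0.1])
```

Here y1 works out to [1.0, 9.0] and the diagnostic to 0.18 (up to rounding), a high-power configuration; the prostate data's value of 0.68 signals low power.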
Stripped of technicalities, the idea underlying false discovery rates is appealingly simple, and in fact does not depend on the literal validity of the two-groups model (2.1). Consider the bin zi ∈ [3.1, 3.3] in the prostate data histogram; 17 of the 6033 genes fall into this bin, compared to the expected number 2.68 = p0NΔφ(3.2) of null genes (Δ = 0.2 the bin width), giving

(3.10)  fdr̂ = 2.68/17 = 0.16

as an estimated false discovery rate. (The smoothed estimate in Figure 3 is fdr̂(3.2) = 0.24.) The implication is that only about one-sixth of the 17 are null genes. This conclusion can be sharpened, as in Lehmann and Romano (2005), but (3.10) catches the main idea. Notice that we do not need all the null genes to have the same density f0(z); it is enough to assume that the average null density is f0(z), φ(z) in this case, in order to calculate the numerator 2.68. (This is an advantage of false discovery rate methods, which only control rates, not individual probabilities.) The nonnull density f1(z) in (2.1) plays no role at all since the denominator 17 is an observed quantity. Exchangeability is the key assumption in interpreting (3.10): we expect about 1/6 of the 17 genes to be null, and assign posterior null probability 1/6 to all 17. Nonexchangeability, in the form of differing prior information among the 17, can be incorporated as in (2.15).

Density estimation has a reputation for difficulty, well-deserved in general situations. However, there are good theoretical reasons, presented in Section 6 of Efron (2005), for believing that mixtures of z-values are quite smooth, and that (3.7) will efficiently estimate fdr(z). Independence of the zi's is not required, only that f̂(z) is a reasonably close estimate of f(z).

TABLE 1. Boldface: standard errors of log fdr̂(z) (local fdr) and of log Fdr̂(z) (tail-area), from 250 replications of model (3.11), N = 1500. Parentheses: average from formula (5.9), Efron (2006); fdr is the true value (2.7).
Empirical null results are explained in Section 4.

              Theoretical null            Empirical null
  z    fdr    local  (formula)  tail     local  (formula)  tail
 1.5  0.88    0.05   (0.05)     0.05     0.04   (0.04)     0.10
 2.0  0.69    0.08   (0.09)     0.05     0.09   (0.10)     0.15
 2.5  0.38    0.09   (0.10)     0.05     0.16   (0.16)     0.23
 3.0  0.12    0.08   (0.10)     0.06     0.25   (0.25)     0.32
 3.5  0.03    0.10   (0.13)     0.07     0.38   (0.38)     0.42
 4.0  0.005   0.11   (0.15)     0.10     0.50   (0.51)     0.52

A variety of local fdr estimation methods have been suggested, using parametric, semiparametric, nonparametric and Bayes methods: Pan et al. (2003), Pounds and Morris (2003), Allison et al. (2002), Heller and Qing (2003), Broberg (2005), Aubert et al. (2004), Liao et al. (2004) and Do et al. (2005), all performing reasonably well. The Poisson GLM methodology of locfdr has the advantage of easy implementation with familiar software, and a closed-form error analysis. Estimation efficiency becomes a more serious problem on the "Empirical Null" side of Table 1, where we can no longer trust the theoretical null f0(z) ~ N(0, 1). This is the subject of Section 4.

4. THE EMPIRICAL NULL DISTRIBUTION

Table 1 reports on a small simulation study in which

(3.11)  zi ~ind N(μi, 1),   with μi = 0 with probability 0.9,
                            μi ~ N(3, 1) with probability 0.1,

for i = 1, 2, . . . , N = 1500. The table shows standard deviations for log(fdr̂(z)), (3.7), from 250 simulations of (3.11), and also using a delta-method formula derived in Section 5 of Efron (2006), incorporated in the locfdr algorithm. Rather than (3.6), f(z) was modeled by a seven-parameter natural spline basis, locfdr's default, though this gave nearly the same results as (3.6). Also shown are standard deviations for the corresponding tail-area quantity log(Fdr̂(z)), obtained by substituting F̂(z) = ∫_{−∞}^{z} f̂(z′) dz′ in (2.3). [This is a little less variable than using F̄(z), (2.4).]

The "Theoretical Null" side of the table shows that fdr̂(z) is more variable than Fdr̂(z), but both are more than accurate enough for practical use.
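The "fdr" column of Table 1 follows directly from (2.7): under (3.11) the nonnull z-values are marginally N(3, 2), since μi ~ N(3, 1) receives an additional independent N(0, 1) error. A quick sketch:

```python
import math

def npdf(z, mu=0.0, sd=1.0):
    """Normal density."""
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def true_fdr(z):
    """fdr(z), (2.7), for model (3.11): p0 = 0.9 null N(0,1) against a
    p1 = 0.1 nonnull component with marginal density N(3, 2)."""
    num = 0.9 * npdf(z)
    return num / (num + 0.1 * npdf(z, 3.0, math.sqrt(2.0)))
```

true_fdr reproduces the table's fdr column up to rounding, for example 0.88 at z = 1.5, 0.69 at z = 2.0, 0.12 at z = 3.0 and 0.005 at z = 4.0.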
At z = 3, for example, fdr̂(z) only errs by about 8%, yielding fdr̂(z) ≈ 0.12 ± 0.01. Standard errors are roughly proportional to N^(−1/2), so even reducing N to 250 gives fdr̂(3) ≈ 0.12 ± 0.025, and similarly for other values of z, accurate enough to make pictures like Figure 3 believable.

Empirical Bayes is a bipolar methodology, with alternating episodes of frequentist and Bayesian activity. Frequentists may prefer Fdr̂ [or Fdr̄, (2.5)] to fdr̂ because of connections with classical tail-area hypothesis testing, or because cdfs are more straightforward to estimate than densities, while Bayesians prefer fdr̂ for its more apt a posteriori interpretation. Both, though, combine the Bayesian two-groups model with frequentist estimation methods, and deliver the same basic information.

We have been assuming that f0(z), the null density in (2.1), is known on theoretical grounds, as in (3.1). This leads to false discovery estimates such as fdr̂(z) = p0f0(z)/f̂(z) and Fdr̂(z) = p0F0(z)/F̂(z), where only the denominators need be estimated. Most applications of Benjamini and Hochberg's control algorithm (2.5) make the same assumption (sometimes augmented with permutation calculations, which usually produce only minor corrections to the theoretical null, as discussed in Section 5).

Use of the theoretical null is mandatory in classic one-at-a-time testing, where theory provides the only information available for null behavior. But things change in large-scale simultaneous testing situations: serious defects in the theoretical null may become obvious, while empirical Bayes methods can provide more realistic null distributions.

Figure 4 shows z-value histograms for two additional microarray studies, described more fully in Efron (2006).
These are of the same form as the prostate data: n subjects in two disease categories provide expression levels for N genes; two-sample t-statistics ti comparing the categories are computed for each gene, and then transformed to z-values zi = Φ^(−1)(Fn−2(ti)), as in (3.2). Unlike panel A of Figure 1, however, neither histogram obeys the theoretical N(0, 1) null near z = 0. The BRCA data has a much wider central peak, while the HIV peak is too narrow.

FIG. 4. z-values from two microarray studies. BRCA data (Hedenfalk et al., 2001), comparing seven breast cancer patients having BRCA1 mutation to eight with BRCA2 mutation, N = 3226 genes. HIV data (van't Wout et al., 2003), comparing four HIV+ males with four HIV− males, N = 7680 genes. Theoretical N(0, 1) null (heavy curve) is too narrow for BRCA data, too wide for HIV data. Light curves are empirical nulls: normal densities fit to the central histogram counts.

The lighter curves in Figure 4 are empirical null estimates (Efron, 2004), normal curves fit to the central peak of the z-value histograms. The idea here is simple enough: we make the "zero assumption,"

ZERO ASSUMPTION (4.1). Most of the z-values near 0 come from null genes

(discussed further below), generalize the N(0, 1) theoretical null to N(δ0, σ0²), and estimate (δ0, σ0²) from the histogram counts near z = 0. Locfdr uses two different estimation methods, analytic and geometric, described next.

The two-groups model and the zero assumption suggest that if f0 is normal, f(z) should be well approximated near z = 0 by p0·φδ0,σ0(z), with

(4.2)  φδ0,σ0(z) ≡ (2πσ0²)^(−1/2) exp{−(1/2)[(z − δ0)/σ0]²},

making log f(z) approximately quadratic,

(4.3)  log f(z) ≈ [log p0 − (1/2)(δ0²/σ0² + log(2πσ0²))] + (δ0/σ0²)z − (1/(2σ0²))z².

Figure 5 shows the geometric method in action on the HIV data. The heavy solid curve is log f̂(z), fit from (3.6) using Lindsey's method, as described in Efron and Tibshirani (1996). The beaded curve shows the best quadratic approximation to log f̂(z) near 0. Matching its coefficients (β̂0, β̂1, β̂2) to (4.3) yields estimates (δ̂0, σ̂0, p̂0), for instance σ̂0 = (−2β̂2)^(−1/2), giving

(4.4)  δ̂0 = −0.107,  σ̂0 = 0.753,  p̂0 = 0.931

for the HIV data. Trying the same method with the theoretical null, that is, taking (δ0, σ0) = (0, 1) in (4.3), gives a very poor fit, and p̂0 equals the impossible value 1.20.

FIG. 5. Geometric estimate of null proportion p0 and empirical null mean and standard deviation (δ0, σ0) for the HIV data. Heavy curve is log f̂(z), estimated as in (3.3)–(3.6); beaded curve is best quadratic approximation to log f̂(z) near z = 0.

The analytic method makes more explicit use of the zero assumption, stipulating that the nonnull density f1(z) in the two-groups model (2.1) is supported outside some given interval [a, b] containing zero (actually chosen by preliminary calculations). Let N0 be the number of zi in [a, b], and define

(4.5)  P0(δ0, σ0) = Φ((b − δ0)/σ0) − Φ((a − δ0)/σ0)   and   θ = p0P0.

Then the likelihood function for z0, the vector of N0 z-values in [a, b], is

(4.6)  fδ0,σ0,p0(z0) = [θ^N0 (1 − θ)^(N−N0)] · Π_{zi∈z0} [φδ0,σ0(zi)/P0(δ0, σ0)].

This is the product of two exponential family likelihoods, which is numerically easy to solve for the maximum likelihood estimates (δ̂0, σ̂0, p̂0), equaling (−0.120, 0.787, 0.956) for the HIV data.

Both methods are implemented in locfdr. The analytic method is somewhat more stable but can be more biased than geometric fitting. Efron (2004) shows that geometric fitting gives nearly unbiased estimates of δ0 and σ0 for p0 ≥ 0.90. Table 2 shows how the two methods fared in the simulation study of Table 1.

TABLE 2. Comparison of estimates (δ̂0, σ̂0, p̂0), simulation study of Table 1. "Formula" is average from delta-method standard deviation formulas, Section 5 in Efron (2006), as implemented in locfdr.

             Geometric                     Analytic
        mean  stdev  (formula)        mean  stdev  (formula)
 δ̂0:    0.02  0.056  (0.062)          0.04  0.031  (0.032)
 σ̂0:    1.02  0.029  (0.033)          1.04  0.031  (0.031)
 p̂0:    0.92  0.013  (0.015)          0.93  0.009  (0.011)

A healthy literature has sprung up on the estimation of p0, as in Pawitan et al. (2005) and Langaas et al.
(2005), all of which assume the validity of the theoretical null. The zero assumption plays a central role in this literature [which mostly works with two-sided p-values rather than z-values, e.g., pi = 2(1 − F100(|ti|)) in (3.2), making the "zero region" occur near p = 1]. The two-groups model is unidentifiable if f0 is unspecified in (2.1), since we can redefine f0 as (f0 + cf1)/(1 + c), p0 as (1 + c)p0 and p1 as p1 − cp0, for any c ≤ p1/p0. With p1 small, (2.2), and f1 supposed to yield zi's far from 0 for the most part, the zero assumption is a reasonable way to impose identifiability on the two-groups model. Section 6 considers the meaning of the null density more carefully, among other things explaining the upward bias of p̂0 seen in Table 2.

The empirical null is an expensive luxury from the point of view of estimation efficiency. Comparing the right-hand side of Table 1 with the left reveals factors of 2 or 3 increase in standard error relative to the theoretical null, near the crucial point where fdr(z) = 0.2. Section 4 of Efron (2005) pins the increased variability entirely on the estimation of (δ0, σ0); even knowing the true values of p0 and f(z) would reduce the standard error of log fdr̂(z) by less than 1%. (Using tail-area Fdr's rather than local fdr's does not help—here the local version is less variable.)

TABLE 3. Number of genes identified as true discoveries by two-sided Benjamini–Hochberg procedure, 0.10 control level. Empirical null densities as in Figure 4.

               Theoretical null    Empirical null
 BRCA data:         107                   0
 HIV data:           22                 180
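The geometric method reduces to quadratic regression. The sketch below (a simplified stand-in for locfdr's version, fitting log counts on 1, z, z² over a symmetric central interval and inverting (4.3)) recovers (δ0, σ0, p0) exactly when run on synthetic counts generated from a known normal null; the generating values (−0.107, 0.753, 0.931) are borrowed from the HIV estimates (4.4) purely as a check:

```python
import math

def central_matching(counts, midpoints, n_total, width):
    """Estimate (delta0, sigma0, p0) by a least-squares quadratic fit of
    log(counts) versus z, inverted through (4.3). Assumes midpoints
    symmetric about 0, all counts positive, and counts on the scale
    nu_k = n_total * width * f(x_k)."""
    y = [math.log(c) for c in counts]
    n, S2 = len(midpoints), sum(x * x for x in midpoints)
    S4 = sum(x ** 4 for x in midpoints)
    # symmetric design: odd and even polynomial coefficients decouple
    b1 = sum(x * yi for x, yi in zip(midpoints, y)) / S2
    det = n * S4 - S2 * S2
    Sy, Sx2y = sum(y), sum(x * x * yi for x, yi in zip(midpoints, y))
    b0 = (S4 * Sy - S2 * Sx2y) / det
    b2 = (n * Sx2y - S2 * Sy) / det
    sigma0 = 1.0 / math.sqrt(-2.0 * b2)   # (4.3): b2 = -1/(2 sigma0^2)
    delta0 = b1 * sigma0 ** 2             # (4.3): b1 = delta0/sigma0^2
    p0 = (math.exp(b0 + delta0 ** 2 / (2 * sigma0 ** 2))
          * sigma0 * math.sqrt(2 * math.pi) / (n_total * width))
    return delta0, sigma0, p0

# Synthetic check: exact counts from a pure N(-0.107, 0.753^2) null, p0 = 0.931
phi = lambda z, d, s: (math.exp(-0.5 * ((z - d) / s) ** 2)
                       / (s * math.sqrt(2 * math.pi)))
mids = [-1.0 + 0.1 * k for k in range(21)]
counts = [0.931 * 7680 * 0.1 * phi(x, -0.107, 0.753) for x in mids]
d0, s0, p0_hat = central_matching(counts, mids, n_total=7680, width=0.1)
```

On real data the central counts are noisy and contaminated by nonnull genes, which is why locfdr restricts the fit to a narrow zero region and why p̂0 picks up the upward bias noted above.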
The reason for considering empirical nulls is that the theoretical N(0, 1) null does not seem to fit the data in situations like Figure 4. For the BRCA data we can see that the histogram is overdispersed compared to N(0, 1) around z = 0; the implication is that there will be more null counts far from zero than the theoretical null predicts, making N(0, 1) false discovery rate calculations like (3.10) too optimistic. The opposite happens with the HIV data.

There is a lot at stake here for both Bayesians and frequentists. Table 3 shows the number of gene discoveries identified by the standard Benjamini–Hochberg two-sided Fdr procedure, q = 0.10 in (2.5). The HIV results are much more dramatic using the empirical null f0(z) ∼ N(−0.11, 0.75²), and in fact we will see in the next section that σ0 = 0.75 is quite believable in this case. The BRCA data has been used in the microarray literature to compare analysis techniques, under the presumption that better techniques will produce more discoveries; recently, for instance, in Storey et al. (2005) and Pawitan et al. (2005). Table 3 suggests caution in this interpretation, where using the empirical null negates any discoveries at all.

The z-values in panel C of Figure 1, the proteomics data, were calculated from standard Cox likelihood tests that should yield N(0, 1) null results asymptotically. An N(−0.02, 1.29²) empirical null was obtained from the analytic method, resulting in only one peak with fdr̂ < 0.2; using the theoretical null gave six such peaks.

In panel B of Figure 1, the z-values were obtained from familiar binomial calculations, each zi being calculated as

(4.7)  z = (p̂_ad − p̂_dis − Δ) · [p̂_ad(1 − p̂_ad)/n_ad + p̂_dis(1 − p̂_dis)/n_dis]^{−1/2},

where n_ad was the number of advantaged students in the high school, p̂_ad the proportion passing the test, and likewise n_dis and p̂_dis for the disadvantaged students; Δ = median(p̂_ad) − median(p̂_dis) = 0.192 was the overall difference.
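For concreteness, (4.7) amounts to the following computation per school. Only Δ = 0.192 comes from the text; the school counts in the usage line are hypothetical.

```python
import numpy as np

def education_z(pass_ad, n_ad, pass_dis, n_dis, delta=0.192):
    """z-value for one school, following (4.7).

    pass_ad/n_ad: passing count and total for advantaged students;
    pass_dis/n_dis: the same for disadvantaged students; delta is the
    overall difference median(p_ad) - median(p_dis) across all schools.
    """
    p_ad = pass_ad / n_ad
    p_dis = pass_dis / n_dis
    se = np.sqrt(p_ad * (1 - p_ad) / n_ad + p_dis * (1 - p_dis) / n_dis)
    return (p_ad - p_dis - delta) / se

# hypothetical school: 120 of 200 advantaged and 30 of 100 disadvantaged pass
z = education_z(120, 200, 30, 100)
```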
Here the empirical null standard deviation σ̂0 equals 1.52, half again bigger than the theoretical standard deviation we would use if we had only one school's data. An empirical null fdr analysis yielded 75 schools with fdr̂ < 0.20, 30 on the left and 45 on the right. Example B is discussed a bit further in the next two sections, where its use in the two-groups model is questioned.

My point here is not that the empirical null is always the correct choice. The opposite advice, always use the theoretical null, has been inculcated by a century of classic one-case-at-a-time testing to the point where it is almost subliminal, but it exposes the statistician to obvious criticism in situations like the BRCA and HIV data. Large-scale simultaneous testing produces mass information of a Bayesian nature that impinges on individual decisions. The two-groups model helps bring this information to bear, after one decides on the proper choice of f0 in (2.1). Section 5 discusses this choice, in the form of a list of reasons why the theoretical null, and its close friend the permutation null, might go astray.

5. THEORETICAL, PERMUTATION AND EMPIRICAL NULL DISTRIBUTIONS

Like most statisticians, I have spent my professional life happily testing hypotheses against theoretical null distributions. It came as somewhat of a shock, then, when pictures like Figure 4 suggested that the theoretical null might be more theoretical than I had supposed. Once suspicious, it becomes easy to think of reasons why f0(z), the crucial element in the two-groups model (2.1), might not obey classical guidelines. This section presents four reasons why the theoretical null might fail, and also gives me a chance to say something about the strengths and weaknesses of permutation null distributions.

REASON 1 (Failed mathematical assumptions). The usual derivation of the null hypothesis distribution for a two-sample t-statistic assumes independent and identically distributed (i.i.d.) normal components.
For the BRCA data of Figure 4, direct inspection of the 3226 × 15 matrix "X" of expression values reveals markedly nonnormal components, skewed to the right (even after the columns of X have been standardized to mean 0 and standard deviation 1, as in all my examples here). Is this causing the failure of the N(0, 1) theoretical null?

Permutation techniques offer quick relief from such concerns. The columns of X are randomly permuted, giving a matrix X* with corresponding t-values ti* and z-values zi* = Φ^{−1}(F_{n−2}(ti*)). This is done some large number of times, perhaps 100, and the empirical distribution of the 100 · N zi*'s is used as a permutation null. The well-known SAM algorithm (Tusher, Tibshirani and Chu, 2001) effectively employs the permutation null cdf in the numerator of the Fdr formula (2.3). Applied to the BRCA matrix, the permutation null came out nearly N(0, 1) (as did simply simulating the entries of X* by independent draws from all 3226 · 15 entries of X), so nonnormal distributions were not the cause of BRCA's overwide histogram. In practice the permutation null usually approximates the theoretical null closely, as a long history of research on the permutation t-test demonstrates; see Section 5.9 of Lehmann and Romano (2005).

REASON 2 (Unobserved covariates). The BRCA study is observational rather than experimental—the 15 women were observed to be BRCA1 or BRCA2, not assigned—and likewise with the HIV and prostate studies. There are likely to be covariates—age, race, general health—that affect the microarray expression levels differently for different genes. If these were known to us, they could be factored out using a separate linear model on each gene's data, providing a new and improved zi obtained from the "Treatment" coefficient in the model. This would reduce the spread of the z-value histogram, perhaps even restoring the N(0, 1) theoretical null for the BRCA data.
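The column-permutation recipe described under Reason 1 can be sketched as follows. This is only an illustration, not the SAM algorithm; the function name and its defaults are mine.

```python
import numpy as np
from scipy import stats

def permutation_null(X, n1, n_perm=100, seed=0):
    """Permutation null z-values for a two-sample microarray comparison.

    X: N-genes x n-arrays expression matrix, with the first n1 columns in
    group 1. The columns are permuted n_perm times; each permuted matrix
    gives one t-value per gene, mapped to a z-value via
    z* = Phi^{-1}(F_{n-2}(t*)). Returns the pooled n_perm * N z*'s.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    z_star = []
    for _ in range(n_perm):
        Xp = X[:, rng.permutation(n)]
        t = stats.ttest_ind(Xp[:, :n1], Xp[:, n1:], axis=1).statistic
        z_star.append(stats.norm.ppf(stats.t.cdf(t, df=n - 2)))
    return np.concatenate(z_star)
```

Run on a matrix of standardized nonnormal (e.g., exponential) entries, the pooled z*'s come out close to N(0, 1), mirroring what happened with the BRCA matrix.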
Unobserved covariates act to broaden the null distribution f0(z). They also broaden the nonnull distribution f1(z) in (2.1), and the mixture density f(z), but this does not correct fdr estimates like (3.10), whose numerator still depends on the assumed null f0. Section 4 of Efron (2004) provides an analysis of a simplified model with unobserved covariates. Permutation techniques cannot recognize unobserved covariates, as the model demonstrates.

REASON 3 (Correlation across arrays). False discovery rate methodology does not require independence among the test statistics zi. However, the theoretical null distribution does require independence of the expression values used to calculate each zi; in terms of the elements xij of the expression matrix X, for gene i we need independence among xi1, xi2, . . . , xin in order to validate (1.1). Experimental difficulties can undercut across-microarray independence, while remaining undetectable in a permutation analysis. This happened in both studies of Figure 4 (Efron, 2004, 2006). The BRCA data showed strong positive correlations among the first four BRCA2 arrays, and also among the last four. This reduces the effective degrees of freedom for each t-statistic below the nominal 13, making ti and zi = Φ^{−1}(F13(ti)) overdispersed.

REASON 4 (Correlation across genes). Benjamini and Hochberg's 1995 paper verified Fdr control for rule (2.5) under the assumption of independence among the N z-values (relaxed a little in Benjamini and Yekutieli, 2001). This seems fatal for microarray applications, since we expect genes to be correlated in their actions.
A great virtue of the empirical Bayes/two-groups approach is that independence is not necessary; with Fdr̂(z) = p0F0(z)/F̂(z), for instance, Fdr̂(z) can provide a reasonable estimate of Pr{null|Z ≤ z} as long as F̂(z) is roughly unbiased for F(z)—in formal terms requiring consistency but not independence—and likewise for the local version fdr̂(z) = p0f0(z)/f̂(z), (3.7). There is, however, a black cloud inside the silver lining: the assumption that the null density f0(z) is known to the statistician. The empirical null estimation methods of Section 4 do not require z-value independence, and so disperse the black cloud, at the expense of increased variability in fdr̂ estimates.

Do we really need to use an empirical null? Efron (2007) discusses the following somewhat disconcerting result: even if the theoretical null distribution zi ∼ N(0, 1) holds exactly true for all null genes, with Reasons 1–3 above not causing trouble, correlation among the zi's can make the overall null distribution effectively much wider or much narrower than N(0, 1).

Microarray data sets tend to have substantial z-value correlations. Consider the BRCA data: there are more than five million correlations ρij between pairs of gene z-values zi and zj; by examining the row-wise correlations in the X matrix we can estimate that the distribution of the ρij's has approximately mean 0 and variance α² = 0.153²,

(5.1)  ρ ∼ (0, α²).

(The zero mean is a consequence of standardizing the columns of X.) This is a lot of correlation—as much as if the BRCA genes occurred in 10 independent groups, but with common intraclass correlation 0.50 for all genes within a group. Section 3 of Efron (2006) shows that under assumptions (1.1)–(5.1), the ensemble of null-gene z-values will behave roughly as

(5.2)  zi ∼ N(0, σ0²), approximately,

with

(5.3)  σ0² = 1 + √2·A,  A ∼ (0, α²).
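The dispersion effect in (5.2)–(5.3) is easy to demonstrate by simulation: correlated null z-values, run through a standard Benjamini–Hochberg analysis, show the realized false discovery proportion tracking the half-width of the central z-value peak. The sketch below is my own design, not Efron's exact simulation: it mimics α ≈ 0.153 with ten equicorrelated blocks of nulls (intraclass correlation 0.5, echoing the analogy above) plus a hypothetical signal component.

```python
import numpy as np
from scipy import stats

def one_sim(rng, n_null=2700, n_sig=300, q=0.10):
    """One simulation: correlated N(0,1) nulls plus N(2.5, 1.5) signals.

    Nulls sit in 10 equicorrelated blocks (intraclass correlation 0.5),
    a rough stand-in for BRCA-level correlation alpha ~= 0.153. Returns
    (sigma0_hat, Fdp) for a one-sided Benjamini-Hochberg analysis at
    level q using the theoretical N(0,1) null.
    """
    blocks = rng.standard_normal(10).repeat(n_null // 10)
    nulls = np.sqrt(0.5) * blocks + np.sqrt(0.5) * rng.standard_normal(n_null)
    signals = 2.5 + np.sqrt(1.5) * rng.standard_normal(n_sig)
    z = np.concatenate([nulls, signals])
    is_null = np.arange(z.size) < n_null

    # half-width estimate of the central peak: half the distance
    # between the 16th and 84th percentiles
    lo, hi = np.percentile(z, [16, 84])
    sigma0_hat = (hi - lo) / 2

    # Benjamini-Hochberg on one-sided p-values from the theoretical null
    p = stats.norm.sf(z)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, z.size + 1) / z.size
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = order[:k]
    fdp = is_null[rejected].mean() if k > 0 else 0.0
    return sigma0_hat, fdp
```

Across repeated simulations the Fdp averages near q, but it rises and falls with sigma0_hat, the pattern discussed around Figure 6 below.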
If the variable A equaled α = 0.153, for instance, giving σ0 = 1.10, then the expected number of null counts below z = −3 would be about p0·N·Φ(−3/1.10) rather than p0·N·Φ(−3), more than twice as many. There is even more correlation in the HIV data, α ≈ 0.42, enough so that a moderately negative value of A could cause σ0 = 0.75, as in Figure 4. The random variable A acts like an observable ancillary in the two-groups situation—observable because we can estimate σ0 from the central counts of the z-value histogram, as in Section 4; σ̂0 is essentially the half-width of the central peak.

Figure 6 is a cautionary story on the dangers of ignoring σ̂0. A simulation model with

(5.4)  zi ∼ N(0, 1) for i = 1, 2, . . . , 2700  and  zi ∼ N(2.5, 1.5) for i = 2701, . . . , 3000,

was run, in which the null zi's, the first 2700, were correlated to the same degree as in the BRCA data, α = 0.153. For each of 1000 simulations of (5.4), a standard Benjamini–Hochberg Fdr analysis (2.5) (i.e., using the theoretical null for F0) was run at control level q = 0.10 and used to identify a set of nonnull genes. Each of the thousand points in Figure 6 is (σ̂0, Fdp), where σ̂0 is half the distance between the 16th and 84th percentiles of the 3000 zi's, and Fdp is the "false discovery proportion," the proportion of identified genes that were actually null. Fdp averaged 0.091, close to the target value q = 0.10, but with a strong dependence on σ̂0: the lowest 5% of σ̂0's corresponded to Fdp's averaging only 0.03, while the upper 5% average was 0.29, a factor of 9 difference.

The point here is not that the claimed q-value 0.10 is wrong, but that in any one simulation we may be able to see, from σ̂0, that it is probably misleading. Using the empirical null counteracts this fallacy which, again, is not apparent from the permutation null. (Section 4 of Efron, 2007, discusses more elaborate permutation methods that do bear on Figure 6.
See Qiu et al., 2005, for a gloomier assessment of correlation effects in microarray analyses.)

FIG. 6. Benjamini–Hochberg Fdr control procedure (2.5), q = 0.1, run for 1000 simulations of the correlated model (5.4); true false discovery proportion Fdp plotted versus half-width estimate σ̂0. Overall Fdp averaged 0.091, close to q, but with a strong dependence on σ̂0.

What is causing the overdispersion in the Education data of panel B, (4.7)? Correlation across schools, Reason 4, seems ruled out by the nature of the sampling, leaving Reasons 2 and 3 as likely candidates; unobserved covariates are an obvious threat here, while within-school sampling dependences (Reason 3) are certainly possible. Fdr analysis yields eight times as many "significant" schools based on the theoretical null rather than f0 ∼ N(−0.35, 1.51²), but the theoretical null looks completely untrustworthy to me.

Sometimes the theoretical null distribution is fine, of course. The prostate data had (δ̂0, σ̂0) = (0.00, 1.06) according to the analytic method of (4.6), close enough to (0, 1) to make theoretical null calculations believable. However, there are lots of things that can go wrong with the theoretical null, and lots of data to check it with in large-scale testing situations, making it a matter of due diligence for the statistician to do such checking, even if only by visual inspection of the z-value histogram. All simultaneous testing procedures, not just false discovery rates, go wrong if the null distribution is misrepresented.

6. A ONE-GROUP MODEL

Classical one-at-a-time hypothesis testing depends on having a unique null density f0(z), such as Student's t distribution for the normal two-sample situation. The assumption of a unique f0 has been carried over into most of the microarray testing literature, including our definition (2.1) of the two-groups model.
Realistic examples of large-scale inference are apt to be less clear-cut, with true effect sizes ranging continuously from zero or near zero to very large. Here we consider a "one-group" structural model that allows for a range of effects. We can still usefully apply fdr methods to data from one-group models; doing so helps clarify the choice between theoretical and empirical null hypotheses, and explicates the biases inherent in model (2.1).

The discussion in this section, as in Section 2, will be mostly theoretical, involving probability models rather than collections of observed z-values. Model (2.1) does not require knowing how the z-values were generated, a substantial practical advantage of the two-groups formulation. In contrast, one-group analysis begins with a specific Bayesian structural model. We assume that the ith case has an unobserved true value μi distributed according to some density g(μ), and that the observed zi is normally distributed around μi,

(6.1)  μ ∼ g(·)  and  z|μ ∼ N(μ, 1).

The density g(μ) is allowed to have discrete atoms. It might have an atom at zero, but this is not required, and in any case there is no a priori partition of g(μ) into null and nonnull components.

As an example, suppose g(μ) is a mixture of 90% N(0, 0.5²) and 10% N(2.5, 0.5²),

(6.2)  g(μ) = 0.9 · φ_{0,0.5}(μ) + 0.1 · φ_{2.5,0.5}(μ)

in notation (4.2). The histogram in Figure 7 shows N = 3000 draws of μi from (6.2). I am thinking of this as a situation having a large proportion of uninteresting cases centered near, but not exactly at, zero, and a small proportion of interesting cases centered far to the right. We still want to use the observed zi's from (6.2) to flag cases that are likely to be interesting. The density of z in model (6.1) is

(6.3)  f(z) = ∫_{−∞}^{∞} φ(z − μ) g(μ) dμ,  φ(x) = exp(−x²/2)/√(2π),

shown as the smooth curve in the left-hand panel of Figure 7,

(6.4)  f(z) = 0.9 · φ_{0,1.12}(z) + 0.1 · φ_{2.5,1.12}(z).
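The convolution step from (6.2) to (6.4) is easy to verify numerically: blurring each N(m, 0.5²) prior component with N(0, 1) noise gives an N(m, 1.25) component, sd √1.25 ≈ 1.12. A quick sketch (the integration grid is an arbitrary choice):

```python
import numpy as np

def phi(x, m=0.0, s=1.0):
    """Normal density phi_{m,s}(x), as in notation (4.2)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def g(mu):
    """Prior (6.2): 90% N(0, 0.5^2) plus 10% N(2.5, 0.5^2)."""
    return 0.9 * phi(mu, 0.0, 0.5) + 0.1 * phi(mu, 2.5, 0.5)

def f_numeric(z, grid=np.linspace(-8, 10, 4001)):
    """Marginal density (6.3) by numerical convolution of g with N(0,1)."""
    dz = grid[1] - grid[0]
    return np.array([np.sum(phi(zi - grid) * g(grid)) * dz
                     for zi in np.atleast_1d(z)])

def f_closed(z):
    """Closed form (6.4): each prior component blurred to sd sqrt(1.25)."""
    s = np.sqrt(1.25)   # = 1.118..., the 1.12 of (6.4)
    return 0.9 * phi(z, 0.0, s) + 0.1 * phi(z, 2.5, s)
```

The two versions agree to high accuracy over the relevant z range.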
The effect of noise in going from μi to zi ∼ N(μi, 1) has blurred the strongly bimodal μ-histogram into a smoothly unimodal f(z). We can still employ the tactic of Figure 5, fitting a quadratic curve to log f(z) around z = 0 to estimate p0 and the empirical null density f0(z). Using the formulas described later in this section gives

(6.5)  p0 = 0.93  and  f0(z) ∼ N(0.02, 1.14²),

with corresponding fdr curve p0 f0(z)/f(z), labeled "Emp null" in the right-hand panel of Figure 7. Looking at the histogram, it is reasonable to consider as "interesting" those cases with μi ≥ 1.5, and as "uninteresting" those with μi < 1.5. The curve labeled "Bayes" in Figure 7 is the posterior probability Pr{uninteresting|z} based on full knowledge of (6.1), (6.2). The empirical null fdr curve provides an excellent estimate of the full Bayes result, without the prior knowledge. [An fdr based on the theoretical N(0, 1) null is seen to be far off.]

FIG. 7. Left panel: histogram shows N = 3000 draws of μi from model (6.2); the smooth curve is the corresponding density f(z), (6.3). Right panel: "Emp null" is fdr(z) based on the empirical null; it closely matches the full Bayes posterior probability "Bayes" = Pr{μi < 1.5|z} from (6.1)–(6.2); "Theo null" is fdr(z) based on the theoretical null, a poor match to Bayes.

Unobserved covariates, Reason 2 in Section 5, can easily produce blurry null hypotheses like that in (6.2). My point here is that the two-groups model will handle blurry situations if the null hypothesis is empirically estimated. Or, to put things negatively, theoretical or permutation null methods are prone to error in such situations, no matter what kind of analysis technique is used.

Comparing (6.5) with (6.4) shows that f0(z) is just about right, but p0 is substantially larger than the value 0.90 we might expect. The φ_{2.5,0.5} component of g(μ) puts some of its z-values near zero, weakening the zero assumption (4.1) and biasing p0 upward.
The same thing happened in Table 2, even though model (3.11) is "unblurred," g(μ) having a point mass at μ = 0. Fortunately, p0 is the least important part of the two-groups model for estimating fdr(z), under assumption (2.2). "Bias" can be a misleading term in model (6.1), since it presupposes that each μi is clearly defined as null or nonnull. This seems clear enough in (3.11). The null/nonnull distinction is less clear in (6.2), though it still makes sense to search for cases that have μi unusually far from 0.

The results in (6.5) come from a theoretical analysis of model (6.1). The idea in what follows is to generalize the construction in Figure 5 by approximating ℓ(z) = log f(z) with Taylor series other than quadratic. The Jth Taylor approximation to ℓ(z) is

(6.6)  ℓJ(z) = Σ_{j=0}^{J} ℓ^{(j)}(0) z^j / j!,

where ℓ^{(0)}(0) = log f(0) and, for j ≥ 1,

(6.7)  ℓ^{(j)}(0) = d^j log f(z)/dz^j evaluated at z = 0.

Let f0+(z) indicate the subdensity p0 f0(z), the numerator of fdr(z) in (2.7). The choice

(6.8)  f0+(z) = e^{ℓJ(z)}

matches f(z) at z = 0 (a convenient form of the zero assumption) and leads to the fdr expression

(6.9)  fdr(z) = e^{ℓJ(z)}/f(z).

Larger choices of J match f0+(z) more accurately to f(z), increasing ratio (6.9); the interesting z-values, those with small fdr's, are pushed farther away from zero as we allow more of the data structure to be explained by the null density.

Bayesian model (6.1) provides a helpful interpretation of the derivatives ℓ^{(j)}(0):

LEMMA. The derivative ℓ^{(j)}(0), (6.7), is the jth cumulant of the posterior distribution of μ given z = 0, except that ℓ^{(2)}(0) is the second cumulant minus 1. Thus

(6.10)  ℓ^{(1)}(0) = E0  and  −ℓ^{(2)}(0) = 1 − V0 ≡ V̄0,

where E0 and V0 are the posterior mean and variance of μ given z = 0.
TABLE 4
Expressions for p0, f0 and fdr for the first three choices of J in (6.8), (6.9); V̄0 = 1 − V0; J = 0 gives the theoretical null, J = 2 the empirical null; f(z) from (6.3)

J    p0                                  f0(z)                  fdr(z)
0    √(2π) f(0)                          N(0, 1)                f(0) e^{−z²/2}/f(z)
1    √(2π) f(0) e^{E0²/2}                N(E0, 1)               f(0) e^{E0 z − z²/2}/f(z)
2    √(2π/V̄0) f(0) e^{E0²/(2V̄0)}       N(E0/V̄0, 1/V̄0)       f(0) e^{E0 z − V̄0 z²/2}/f(z)

Proof of the lemma appears in Section 7 of Efron (2005). For J = 0, 1, 2, formulas (6.8), (6.9) yield simple expressions for p0 and f0(z) in terms of f(0), E0 and V̄0. These are summarized in Table 4, with p0 obtained from

(6.11)  p0 = ∫_{−∞}^{∞} f0+(z) dz.

Formulas are also available for Fdr(z), (2.8). The choices J = 0, 1, 2 in Table 4 result in a normal null density f0(z), the only difference being the means and variances. Going to J = 3 allows for an asymmetric choice of f0(z),

(6.12)  fdr(z) = (f(0)/f(z)) e^{E0 z − V̄0 z²/2 + S0 z³/6},

where S0 is the posterior third central moment of μ given z = 0 in model (6.1). The program locfdr uses a variant, the "split normal," to model asymmetric null densities, with the exponent of (6.12) replaced by a quadratic spline in z.

The lemma bears on the difference between empirical and theoretical nulls. Suppose that the probability mass of g(μ) occurring within a few units of the origin is concentrated in an atom at μ = 0. Then the posterior mean and variance (E0, V0) of μ given z = 0 will be near 0, making (E0, V̄0) ≈ (0, 1). In this case the empirical null (J = 2) will approximate the theoretical null (J = 0). Otherwise the two nulls differ; in particular, any mass of g(μ) near zero increases V0, swelling the standard deviation (1 − V0)^{−1/2} of the empirical null.

The two-groups model (2.1), (2.2) puts one in a hypothesis-testing frame of mind: a large group of uninteresting cases is to be statistically separated from a small interesting group. Even blurry situations like (6.2) exhibit a clear grouping, as in Figure 7.
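For the blurred prior (6.2), the Table 4 quantities can be computed in closed form by normal conjugacy, and the J = 2 row reproduces (6.5). A sketch (the function name and argument layout are mine):

```python
import numpy as np

SQRT2PI = np.sqrt(2 * np.pi)

def norm_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * SQRT2PI)

def empirical_null_params(weights, means, sd=0.5):
    """E0, V0 and the J = 2 row of Table 4 for a normal-mixture prior.

    Prior as in (6.2): g(mu) = sum_k weights[k] * phi_{means[k], sd}(mu),
    with z|mu ~ N(mu, 1). Conjugacy makes the posterior of mu given z = 0
    a normal mixture, from which E0 and V0 follow directly.
    """
    s2 = sd ** 2
    post_v = s2 / (1 + s2)                 # posterior variance, each component
    post_m = np.array(means) / (1 + s2)    # posterior means at z = 0
    # reweight components by their marginal density at z = 0 (variance 1+s2)
    w = np.array(weights) * norm_pdf(0.0, np.array(means), np.sqrt(1 + s2))
    f_at_0 = w.sum()                       # the mixture density f(0)
    w = w / f_at_0
    E0 = np.sum(w * post_m)
    V0 = post_v + np.sum(w * post_m ** 2) - E0 ** 2
    Vbar0 = 1 - V0
    # Table 4, J = 2: f0 = N(E0/Vbar0, 1/Vbar0),
    # p0 = sqrt(2*pi/Vbar0) * f(0) * exp(E0^2 / (2*Vbar0))
    null_mean = E0 / Vbar0
    null_sd = np.sqrt(1 / Vbar0)
    p0 = np.sqrt(2 * np.pi / Vbar0) * f_at_0 * np.exp(E0 ** 2 / (2 * Vbar0))
    return E0, V0, null_mean, null_sd, p0
```

With weights (0.9, 0.1) and means (0, 2.5), this returns an empirical null close to N(0.02, 1.14²) and p0 close to 0.93, the values quoted in (6.5).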
None of this is necessary for the one-group model (6.1). We might, for example, suppose that g(μ) is normal,

(6.13)  μ ∼ N(A, B²),

and proceed in an empirical Bayes way to estimate A and B, then apply Bayes estimation to the individual cases. This line of thought leads directly to James–Stein estimation (Efron and Morris, 1975). Estimation, as opposed to testing, is the key word here—with possible effect sizes μi varying continuously rather than having a large clump of values near zero. The Education data of panel B, Figure 1, could reasonably be analyzed this way, instead of through simultaneous testing. Scientific context, which says that there is likely to be a large group of (nearly) unaffected genes, as in (2.2), is what makes the two-groups model a reasonable Bayes prior for microarray studies.

7. BAYESIAN AND FREQUENTIST CONFIDENCE STATEMENTS

False discovery rate methods provide a happy marriage between Bayesian and frequentist approaches to multiple testing, as shown in Section 2. Empirical Bayes techniques based on the two-groups model seem to give us the best of both statistical philosophies. Things do not always work out so peaceably; in these next two sections I want to discuss contentious situations where the divorce court looms as a possibility.

An insightful and ingenious paper by Benjamini and Yekutieli (2005) discusses the following problem in simultaneous significance testing: having applied false discovery rate methods to select a set of nonnull cases, how can confidence intervals be assigned to the true effect size for each selected case? (The paper and the ensuing discussion are much more general, but this is all I need for the illustration here.)
Figure 8 concerns Benjamini and Yekutieli's solution applied to the following simulated data set: N = 10,000 (μi, zi) pairs were generated as in (6.1), with 90% of the μi zero, the null cases, and 10% distributed N(−3, 1),

(7.1)  g(μ) = 0.90 · δ0(μ) + 0.10 · φ_{−3,1}(μ),

δ0(μ) a delta function at μ = 0. The Fdr procedure (2.5) was applied with q = 0.05, yielding 566 nonnull "discoveries," those having zi ≤ −2.77. The Benjamini–Yekutieli "false coverage rate" (FCR) control procedure provides upper and lower bounds for the true effect size μi corresponding to each zi less than −2.77; these are indicated by heavy diagonal lines in Figure 8, constructed as described in BY's Definition 1. This construction guarantees that the expected proportion of the 566 intervals not containing the true μi, the false coverage rate, is bounded by q = 0.05.

FIG. 8. Benjamini–Yekutieli FCR controlling intervals applied to a simulated sample of 10,000 cases from (6.1), (7.1); 566 cases have zi ≤ z0 = −2.77, the Fdr (0.05) threshold. Plotted points are (zi, μi) for the 1000 nonnull cases; 14 null cases with zi ≤ z0 are indicated by "+." Heavy diagonal lines indicate the FCR 95% interval limits; light lines are Bayes 95% posterior intervals given μi ≠ 0. Beaded curve at top is fdr(zi), the posterior probability that μi = 0.

In a real application only the zi's and their BY confidence intervals could be seen, but in a simulation we can plot the actual (zi, μi) pairs and compare them to the intervals. Figure 8 plots (zi, μi) for the 1000 nonnull cases, those from μi ∼ N(−3, 1) in (7.1). Of these, 552, plotted as heavy points, lie to the left of z0 = −2.77, the Fdr threshold, with the other 448 plotted as light points; 14 null cases, μi = 0, plotted as "+," also had zi < z0. The first thing to notice is that the FCR property is satisfied: only 17 of the 566 intervals have failed to contain μi (14 of these the +'s), giving 3% noncoverage.
The second thing, though, is that the intervals are frighteningly wide—zi ± 2.77, about √2 longer than the usual individual 95% intervals zi ± 1.96—and poorly centered, particularly at the left, where all the μi's fall in their intervals' upper halves. An interesting comparison is with Bayes' rule applied to (6.1), (7.1), which yields

(7.2)  Pr{μ = 0|zi} = fdr(zi),

where

(7.3)  fdr(z) = 0.9 · φ_{0,1}(z) · [0.9 · φ_{0,1}(z) + 0.1 · φ_{−3,√2}(z)]^{−1}

as in (2.7), and

(7.4)  g(μi | μi ≠ 0, zi) ∼ N((zi − 3)/2, 1/2).

That is, μi is null with probability fdr(zi), and N((zi − 3)/2, 1/2) with probability 1 − fdr(zi). The dashed lines indicate the posterior 95% intervals given that μi is nonnull, (zi − 3)/2 ± 1.96/√2, now √2 shorter than the usual individual intervals; at the top of Figure 8 the beaded curve shows fdr(zi).

The frequentist FCR intervals and the Bayes intervals are pursuing the same goal, to include the nonnull scores μi with 95% probability. At zi = −2.77 the FCR assessment is Pr{μ ∈ [−5.54, 0]} = 0.95; Bayes' rule states that μi = 0 with probability fdr(−2.77) = 0.25, and if μi ≠ 0, then μi ∈ [−4.27, −1.49] with probability 0.95. This kind of disconnected description is natural to the two-groups model. A principal cause of FCR's oversized intervals (the paper shows that no FCR-controlling intervals can be much narrower) comes from using a single connected set to describe a disconnected situation.

FIG. 9. Computing a p-value for z̄S = 0.842, the average of the 15 z-values in the CTL pathway, p53 data. Solid histogram: 500 row randomizations give p-value 0.002. Line histogram: 500 column permutations give p-value 0.048.

Of course Bayes' rule will not be easily available to us in most practical problems. Is there an empirical Bayes solution? Part of the solution certainly is there: estimating fdr(z), as in Section 3. Estimating g(μi | μi ≠ 0, zi), (7.4), is more challenging.
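The Bayes quantities (7.2)–(7.4) are simple enough to compute directly. A sketch (`posterior` is an illustrative name of mine):

```python
import numpy as np

SQRT2PI = np.sqrt(2 * np.pi)

def phi(x, m=0.0, s=1.0):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * SQRT2PI)

def posterior(z, p0=0.9, a=-3.0):
    """Bayes posterior under (7.1): mu = 0 w.p. p0, else mu ~ N(a, 1).

    Returns (fdr(z), lo, hi): the posterior probability that mu = 0, as
    in (7.3), and the central 95% interval for mu given mu != 0, per
    (7.4): mu | mu != 0, z ~ N((z + a)/2, 1/2).
    """
    f0 = p0 * phi(z)                          # null subdensity: z ~ N(0, 1)
    f1 = (1 - p0) * phi(z, a, np.sqrt(2))     # nonnull marginal: z ~ N(a, 2)
    fdr = f0 / (f0 + f1)
    center, sd = (z + a) / 2, np.sqrt(0.5)
    zq = 1.959964                             # Phi^{-1}(0.975)
    return fdr, center - zq * sd, center + zq * sd
```

At z = −2.77 the nonnull interval comes out near the [−4.27, −1.49] quoted above, and is √2 narrower than the usual z ± 1.96.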
A straightforward approach uses the nonnull counts (3.8) to estimate the nonnull density f1(z) in (2.1), deconvolves f1(z) to estimate the nonnull component "g1(μ)" in (7.1), and applies Bayes' rule directly to g1. This works reasonably well in Figure 8's example, but deconvolution calculations are notoriously tricky, and I have not been able to produce a stable general algorithm.

Good frequentist methods like the FCR procedure enjoy the considerable charm of an exact error bound, without requiring a priori specifications, and of course there is no law that they have to agree with any particular Bayesian analysis. In large-scale situations, however, empirical Bayes information can overwhelm both frequentist and Bayesian predilections, hopefully leading to a more satisfactory compromise between the two sets of intervals appearing in Figure 8.

8. IS A SET OF GENES ENRICHED?

Microarray experiments, through a combination of insufficient data per gene and massively multiple simultaneous inference, often yield disappointing results. In search of greater detection power, enrichment analysis considers the combined outcomes of biologically defined sets of genes, such as pathways. As a hypothetical example, if the 20 z-values in a certain pathway all were positive, we might infer significance to the pathway's effect, whether or not any of the individual zi's were deemed nonnull.

Our example here will involve the p53 data, from Subramanian et al. (2005), N = 10,100 genes on n = 50 microarrays, zi's as in (3.2), whose z-value histogram looks like a slightly short-tailed normal distribution having mean 0.04 and standard deviation 1.06. Fdr analysis (2.5), q = 0.1, yielded just one nonnull gene, while enrichment analysis indicated seven or eight significant gene-sets, as discussed at length in Efron and Tibshirani (2006).
Figure 9 concerns the CTL pathway, a set of 15 genes relating to the development of so-called killer T cells, #95 in a catalogue of 522 gene-sets provided by Subramanian et al. (2005). For a given gene-set "S" with m members, let z̄S denote the mean of the m z-values within S; z̄S is the enrichment statistic suggested in the Bioconductor R package limma (Smyth, 2004),

(8.1)  z̄S = 0.842 for the CTL pathway.

How significant is this result? I will consider assigning an individual p-value to (8.1), not taking into account multiple inference for a catalogue of possible gene-sets (which we could correct for later using Fdr methods, for instance, to combine the individual p-values).

Limma computes p-values by "row randomization," that is, by randomizing the order of the rows of the N × n expression matrix X and recomputing the statistic of interest. For a simple average like (8.1) this amounts to choosing random subsets of size m = 15 from the N = 10,100 zi's and comparing z̄S to the distribution of the randomized values z̄S*. Five hundred row randomizations produced only one z̄S* > z̄S, giving p-value 1/500 = 0.002.

Subramanian et al. calculate p-values by permuting the columns of X rather than the rows. The permutations yield a much wider distribution than the row randomizations in Figure 9, with corresponding p-value 0.048. The reason is simple: the genes in the CTL pathway have highly correlated expression levels, which increases the variance of z̄S*; column-wise permutations of X preserve the correlations across genes, while row randomizations destroy them.

At this point it looks like column permutations should always give the right answer. Wrong! For the BRCA data in Figure 4, the ensemble of z-values has (mean, standard deviation) about (0, 1.50), compared to (0, 1) for zi*'s from column permutations.
This shrinks the permutation variability of z̄S*, compared to what one would get from a random selection of genes for S, and can easily reverse the relationship in Figure 9. The trouble here is that there are two obvious, but different, null hypotheses for testing enrichment:

Randomization null hypothesis. S has been chosen by random selection of m genes from the full set of N genes.

Permutation null hypothesis. The order of the n microarrays has been chosen at random with respect to the patient characteristics (e.g., with the patient being in the normal or cancer category in Example A of the Introduction).

Efron and Tibshirani (2006) suggest a compromise method, restandardization, that to some degree accommodates both null hypotheses. Instead of permuting z̄S in (8.1), restandardization permutes (z̄S − μz)/σz, where (μz, σz) are the mean and standard deviation of all N zi's. Subramanian et al. do something similar using a Kolmogorov–Smirnov enrichment statistic.

All of these methods are purely frequentist. Theoretically we might consider applying the two-groups/empirical Bayes approach to sets of z-values "zS," just as we did for individual zi's in Sections 2 and 3. For at least three reasons that turns out to be extremely difficult:

• My technique for estimating the mixture density f, as in (3.6), becomes exponentially more difficult in higher dimensions.

• There is not likely to be a satisfactory theoretical null f0 for the correlated components of zS, while estimating an empirical null faces the same "curse of dimensionality" as for f.

• As discussed following (3.10), false discovery rate interpretation depends on exchangeability, essentially an equal a priori interest in all N genes. There may be just one gene-set S of interest to an investigator, or a catalogue of several hundred S's as in Subramanian et al., but we certainly are not interested in all possible gene-sets.
It would be a daunting exercise in subjective, as opposed to empirical, Bayesianism to assign prior probabilities to any particular gene-set S.

Having said this, it turns out there is one "gene-set" situation where the two-groups/empirical Bayes approach is practical (though it does not involve genes). Looking at panel D of Figure 1, the Imaging data, the obvious spatial correlation among z-values suggests local averaging to reduce the effects of noise. This has been carried out in Figure 10: at each voxel i of the N = 15,445 voxels, the average of z-values for those voxels within city-block distance 2 has been computed, say "z̄i." The results for the same horizontal slice as in panel D are shown using a similar symbol code. Now that we have a single number z̄i for each voxel, we can compute empirical null fdr estimates as in Section 4. The voxels labeled "enriched" in Figure 10 are those having fdr(z̄i) ≤ 0.2.

FIG. 10. Enrichment analysis of the Imaging data, panel D of Figure 1; z-values for the original 15,445 voxels have been averaged over "gene-sets" of neighboring voxels within city-block distance ≤ 2. Coded "−" for z̄i < 0, "+" for z̄i ≥ 0; solid rectangles, labeled "Enriched," show voxels with fdr(z̄i) ≤ 0.2, using the empirical null.

Enrichment analysis looks much more familiar in this example, being no more than local spatial smoothing. The convenient geometry of three-dimensional space has come to our rescue, which it emphatically fails to do in the microarray context.

9. CONCLUSION

Three forces influence the state of statistical science at any one time: mathematics, computation and applications, by which I mean the type of problems subject-area scientists bring to us for solution. The Fisher–Neyman–Pearson theory of hypothesis testing was fashioned for a scientific world where experimentation was slow and difficult, producing small data sets focused on answering single questions.
It was wonderfully successful within this milieu, combining elegant mathematics and limited computational equipment to produce dependable answers in a wide variety of application areas. The three forces have changed relative intensities recently. Computation has become literally millions of times faster and more powerful, while scientific applications now spout data in fire-hose quantities. (Mathematics, of course, is still mathematics.) Statistics is changing in response, as it moves to accommodate massive data sets that aim to answer thousands of questions simultaneously. Hypothesis testing is just one part of the story, but statistical history suggests that it could play a central role: its development in the first third of the twentieth century led directly to confidence intervals, decision theory and the flowering of mathematical statistics. I believe, or maybe just hope, that our new scientific environment will also inspire a new look at old philosophical questions. Neither Bayesians nor frequentists are immune to the pressures of scientific necessity. Lurking behind the specific methodology of this paper is the broader, still mainly unanswered, question of how one should combine evidence from thousands of parallel but not identical hypothesis testing situations. What I called “empirical Bayes information” accumulates in a way that is not well understood yet, but still has to be acknowledged: in the situations of Figure 4, the frequentist is not free to stick with classical null hypotheses, while the Bayesian cannot use prior (6.13), at least not without the risk of substantial inferential confusion. Classical statistics developed in a data-poor environment, as Fisher’s favorite description, “small-sample theory,” suggests. By contrast, modern-day disciplines such as machine learning seem to struggle with the difficulties of too much data. Both problems, too little and too much data, can afflict microarray studies. 
Massive data sets like those in Figure 1 are misleadingly comforting in their suggestion of great statistical accuracy. As I have tried to show here, the power to detect interesting specific cases, genes, may still be quite low. New methods are needed, perhaps along the lines of "enrichment," as well as a theory of experimental design explicitly fashioned for large-scale testing situations.

One floor up from the philosophical basement lives the untidy family of statistical models. In this paper I have tried to minimize modeling decisions by working directly with z-values. The combination of the two-groups model and false discovery rates applied to the z-value histogram is notably light on assumptions, more so when using an empirical null, which does not even require independence across the columns of X (i.e., across microarrays, a dangerous assumption as shown in Section 6 of Efron, 2004). There will certainly be situations when modeling inside the X matrix, as in Newton et al. (2004) or Kerr, Martin and Churchill (2000), yields more information than z-value procedures, but I will leave that for others to discuss.

ACKNOWLEDGMENTS

This research was supported in part by National Science Foundation Grant DMS-00-72360 and by National Institutes of Health Grant 8R01 EB002784.

REFERENCES

Allison, D., Gadbury, G., Heo, M., Fernandez, J., Lee, C. K., Prolla, T. and Weindruch, R. (2002). A mixture model approach for the analysis of microarray gene expression data. Comput. Statist. Data Anal. 39 1–20. MR1895555

Aubert, J., Bar-Hen, A., Daudin, J. and Robin, S. (2004). Determination of the differentially expressed genes in microarray experiments using local FDR. BMC Bioinformatics 5 125.

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300. MR1325392

Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188. MR1869245

Benjamini, Y. and Yekutieli, D. (2005). False discovery rate-adjusted multiple confidence intervals for selected parameters. J. Amer. Statist. Assoc. 100 71–93. MR2156820

Broberg, P. (2005). A comparative review of estimates of the proportion unchanged genes and the false discovery rate. BMC Bioinformatics 6 199.

Do, K.-A., Mueller, P. and Tang, F. (2005). A Bayesian mixture model for differential gene expression. J. Roy. Statist. Soc. Ser. C 54 627–644. MR2137258

Dudoit, S., Shaffer, J. and Boldrick, J. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci. 18 71–103. MR1997066

Efron, B. (2003). Robbins, empirical Bayes, and microarrays. Ann. Statist. 31 366–378. MR1983533

Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96–104. MR2054289

Efron, B. (2005). Local false discovery rates. Available at http://www-stat.stanford.edu/~brad/papers/False.pdf.

Efron, B. (2006). Size, power, and false discovery rates. Ann. Appl. Statist. To appear. Available at http://www-stat.stanford.edu/~brad/papers/Size.pdf.

Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc. 102 93–103. MR2293302

Efron, B. and Gous, A. (2001). Scales of evidence for model selection: Fisher versus Jeffreys. Model Selection. IMS Monograph 38 208–256. MR2000754

Efron, B. and Morris, C. (1975). Data analysis using Stein's estimator and its generalizations. J. Amer. Statist. Assoc. 70 311–319. MR0391403

Efron, B. and Tibshirani, R. (1996). Using specially designed exponential families for density estimation. Ann. Statist. 24 2431–2461. MR1425960

Efron, B. and Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23 70–86.

Efron, B. and Tibshirani, R. (2006). On testing the significance of sets of genes. Ann. Appl. Statist. To appear. Available at http://www-stat.stanford.edu/~brad/papers/genesetpaper.pdf.

Efron, B., Tibshirani, R., Storey, J. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160. MR1946571

Hedenfalk, I., Duggan, D., Chen, Y. et al. (2001). Gene expression profiles in hereditary breast cancer. New Engl. J. Medicine 344 539–548.

Heller, G. and Qing, J. (2003). A mixture model approach for finding informative genes in microarray studies. Unpublished manuscript.

Kerr, M., Martin, M. and Churchill, G. (2000). Analysis of variance in microarray data. J. Comp. Biology 7 819–837.

Langaas, M., Lindqvist, B. and Ferkingstad, E. (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data. J. Roy. Statist. Soc. Ser. B 67 555–572. MR2168204

Lee, M. L. T., Kuo, F., Whitmore, G. and Sklar, J. (2000). Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. USA 97 9834–9838.

Lehmann, E. and Romano, J. (2005). Generalizations of the familywise error rate. Ann. Statist. 33 1138–1154. MR2195631

Lehmann, E. and Romano, J. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York. MR2135927

Lewin, A., Richardson, S., Marshall, C., Glaser, A. and Aitman, T. (2006). Bayesian modeling of differential gene expression. Biometrics 62 1–9. MR2226550

Liang, C., Rice, J., de Pater, I., Alcock, C., Axelrod, T., Wang, A. and Marshall, S. (2004). Statistical methods for detecting stellar occultations by Kuiper belt objects: The Taiwanese-American occultation survey. Statist. Sci. 19 265–274. MR2146947

Liao, J., Lin, Y., Selvanayagam, Z. and Weichung, J. (2004). A mixture model for estimating the local false discovery rate in DNA microarray analysis. Bioinformatics 20 2694–2701.

Newton, M., Kendziorski, C., Richmond, C., Blattner, F. and Tsui, K. (2001). On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. J. Comp. Biology 8 37–52.

Newton, M., Noueiry, A., Sarkar, D. and Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture model. Biostatistics 5 155–176.

Pan, W., Lin, J. and Le, C. (2003). A mixture model approach to detecting differentially expressed genes with microarray data. Functional and Integrative Genomics 3 117–124.

Parmigiani, G., Garrett, E., Ambazhagan, R. and Gabrielson, E. (2002). A statistical framework for expression-based molecular classification in cancer. J. Roy. Statist. Soc. Ser. B 64 717–736. MR1979385

Pawitan, Y., Murthy, K., Michiels, J. and Ploner, A. (2005). Bias in the estimation of false discovery rate in microarray studies. Bioinformatics 21 3865–3872.

Pounds, S. and Morris, S. (2003). Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of the p-values. Bioinformatics 19 1236–1242.

Qiu, X., Brooks, A., Klebanov, L. and Yakovlev, A. (2005). The effects of normalization on the correlation structure of microarray data. BMC Bioinformatics 6 120.

Rogosa, D. (2003). Accuracy of API index and school base report elements: 2003 Academic Performance Index, California Department of Education. Available at http://www.cde.ca.gov/ta/ac/ap/researchreports.asp.

Schwartzman, A., Dougherty, R. F. and Taylor, J. E. (2005). Cross-subject comparison of principal diffusion direction maps. Magnetic Resonance in Medicine 53 1423–1431.

Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., Ladd, C., Tamayo, P., Renshaw, A., D'Amico, A., Richie, J., Lander, E., Loda, M., Kantoff, P., Golub, T. and Sellers, R. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1 302–309.

Smyth, G. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3 1–29. MR2101454

Storey, J. (2002). A direct approach to false discovery rates. J. Roy. Statist. Soc. Ser. B 64 479–498. MR1924302

Storey, J., Taylor, J. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. Roy. Statist. Soc. Ser. B 66 187–205.

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. and Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102 15545–15550.

Turnbull, B. (2006). BEST proteomics data. Available at www.stanford.edu/people/brit.turnbull/BESTproteomics.pdf.

Tusher, V., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to transcriptional responses to ionizing radiation. Proc. Natl. Acad. Sci. USA 98 5116–5121.

van't Wout, A., Lehrman, G., Mikheeva, S., O'Keeffe, G., Katze, M., Bumgarner, R., Geiss, G. and Mullins, J. (2003). Cellular gene expression upon human immunodeficiency virus type 1 infection of CD4+ T-cell lines. J. Virology 77 1392–1402.