Krishna B. Athreya
Soumendra N. Lahiri
Measure Theory and Probability Theory
Preface
This book arose out of two graduate courses that the authors have taught during the past several years: the first on measure theory, followed by the second on advanced probability theory.
The traditional approach to a first course in measure theory, such as in Royden (1988), is to teach the Lebesgue measure on the real line, then the differentiation theorems of Lebesgue and Lp-spaces on R, and to treat general measure theory at the end of the course, with one main application to the construction of product measures. This approach does have the pedagogic advantage of seeing one concrete case first before going to the general one. But it also has the disadvantage of making many students’ perspective on measure theory somewhat narrow. It leads them to think only in terms of the
Lebesgue measure on the real line and to believe that measure theory is
intimately tied to the topology of the real line. As students of statistics,
probability, physics, engineering, economics, and biology know very well,
there are mass distributions that are typically nonuniform, and hence it is
useful to gain a general perspective.
This book attempts to provide that general perspective right from the
beginning. The opening chapter gives an informal introduction to measure
and integration theory. It shows that the notions of σ-algebra of sets and
countable additivity of a set function are dictated by certain very natural approximation procedures from practical applications and that they
are not just some abstract ideas. Next, the general extension theorem of
Caratheodory is presented in Chapter 1. As immediate examples, the construction of the large class of Lebesgue-Stieltjes measures on the real line
and Euclidean spaces is discussed, as are measures on finite and countable
spaces. Concrete examples such as the classical Lebesgue measure and various probability distributions on the real line are provided. This is further
developed in Chapter 6 leading to the construction of measures on sequence
spaces (i.e., sequences of random variables) via Kolmogorov’s consistency
theorem.
After providing a fairly comprehensive treatment of measure and integration theory in the first part (Introduction and Chapters 1–5), the focus
moves on to probability theory in the second part (Chapters 6–13). The feature that distinguishes probability theory from measure theory, namely, the notion of independence and dependence of random variables (i.e., measurable functions), is carefully developed first. Then the laws of large numbers
are taken up. This is followed by convergence in distribution and the central
limit theorems. Next the notion of conditional expectation and probability
is developed, followed by discrete parameter martingales. Although the development of these topics is based on a rigorous measure theoretic foundation, the heuristic and intuitive backgrounds of the results are emphasized
throughout. Along the way, some applications of the results from probability theory to proving classical results in analysis are given. These include,
for example, the density of normal numbers on (0,1) and the Weierstrass
approximation theorem. These are intended to emphasize the benefits of
studying both areas in a rigorous and combined fashion. The approach
to conditional expectation is via the mean square approximation of the
“unknown” given the “known” and then a careful approximation for the
L1 -case. This is a natural and intuitive approach and is preferred over the
“black box” approach based on the Radon-Nikodym theorem.
The final part of the book provides a basic outline of a number of special
topics. These include Markov chains and Markov chain Monte Carlo
(MCMC), Poisson processes, Brownian motion, bootstrap theory, mixing
processes, and branching processes. The first two parts can be used for a
two-semester sequence, and the last part could serve as a starting point for
a seminar course on special topics.
This book presents the basic material on measure and integration theory
and probability theory in a self-contained and step-by-step manner. It is
hoped that students will find it accessible, informative, and useful and also
that they will be motivated to master the details by carefully working out
the text material as well as the large number of exercises. The authors hope
that the presentation here is found to be clear and comprehensive without
being intimidating.
Here is a quick summary of the various chapters of the book. After giving
an informal introduction to the ideas of measure and integration theory,
the construction of measures starting with set functions on a small class of
sets is taken up in Chapter 1 where the Caratheodory extension theorem is
proved and then applied to construct Lebesgue-Stieltjes measures. Integration theory is taken up in Chapter 2 where all the basic convergence theorems including the MCT, Fatou, DCT, BCT, Egorov’s, and Scheffe’s are
proved. Included here are also the notion of uniform integrability and the
classical approximation theorem of Lusin and its use in Lp -approximation
by smooth functions. The third chapter presents basic inequalities for Lp spaces, the Riesz-Fischer theorem, and elementary theory of Banach and
Hilbert spaces. Chapter 4 deals with Radon-Nikodym theory via the Riesz
representation on L2 -spaces and its application to differentiation theorems
on the real line as well as to signed measures. Chapter 5 deals with product measures and the Fubini-Tonelli theorems. Two constructions of the
product measure are presented: one using the extension theorem and another via iterated integrals. This is followed by a discussion on convolutions,
Laplace transforms, Fourier series, and Fourier transforms. Kolmogorov’s
consistency theorem for the construction of stochastic processes is taken up
in Chapter 6 followed by the notion of independence in Chapter 7. The laws
of large numbers are presented in a unified manner in Chapter 8 where the
classical Kolmogorov’s strong law as well as Etemadi’s strong law are presented followed by Marcinkiewicz-Zygmund laws. There are also sections
on renewal theory and ergodic theorems. The notion of weak convergence of
probability measures on R is taken up in Chapter 9, and Chapter 10 introduces characteristic functions (Fourier transform of probability measures),
the inversion formula, and the Levy-Cramer continuity theorem. Chapter
11 is devoted to the central limit theorem and its extensions to stable and
infinitely divisible laws. Chapter 12 discusses conditional expectation and
probability where an L2 -approach followed by an approximation to L1 is
presented. Discrete time martingales are introduced in Chapter 13 where
the basic inequalities as well as convergence results are developed. Some
applications to random walks are indicated as well. Chapter 14 discusses
discrete time Markov chains with a discrete state space first. This is followed by discrete time Markov chains with general state spaces where the
regeneration approach for Harris chains is carefully explained and is used
to derive the basic limit theorems via the iid cycles approach. There are
also discussions of Feller Markov chains on Polish spaces and Markov chain
Monte Carlo methods. An elementary treatment of Brownian motion is
presented in Chapter 15 along with a treatment of continuous time jump
Markov chains. Chapters 16–18 provide brief outlines respectively of the
bootstrap theory, mixing processes, and branching processes. There is an
Appendix that reviews basic material on elementary set theory, real and
complex numbers, and metric spaces.
Here are some suggestions on how to use the book.
1. For a one-semester course on real analysis (i.e., measure and integration theory), material up to Chapter 5 and the Appendix should
provide adequate coverage with Chapter 6 being optional.
2. A one-semester course on advanced probability theory for those with
the necessary measure theory background could be based on Chapters
6–13 with a selection of topics from Chapters 14–18.
3. A one-semester course on combined treatment of measure theory and
probability theory could be built around Chapters 1, 2, Sections 3.1–
3.2 of Chapter 3, all of Chapter 4 (Section 4.2 optional), Sections
5.1 and 5.2 of Chapter 5, Chapters 6, 7, and Sections 8.1, 8.2, 8.3
(Sections 8.5 and 8.6 optional) of Chapter 8. Such a course could
be followed by another that includes some coverage of Chapters 9–
12 before moving on to other areas such as mathematical statistics
or martingales and financial mathematics. This will be particularly
useful for graduate programs in statistics.
4. A one-semester course on an introduction to stochastic processes or
a seminar on special topics could be based on Chapters 14–18.
A word on the numbering system used in the book. Statements of results
(i.e., Theorems, Corollaries, Lemmas, and Propositions) are numbered consecutively within each section, in the format a.b.c, where a is the chapter
number, b is the section number, and c is the counter. Definitions, Examples, and Remarks are numbered individually within each section, also of
the form a.b.c, as above. Sections are referred to as a.b where a is the chapter number and b is the section number. Equation numbers appear on the
right, in the form (b.c), where b is the section number and c is the equation
number. Equations in a given chapter a are referred to as (b.c) within the
chapter but as (a.b.c) outside chapter a. Problems are listed at the end of
each chapter in the form a.c, where a is the chapter number and c is the
problem number.
In the writing of this book, material from existing books such as Apostol
(1974), Billingsley (1995), Chow and Teicher (2001), Chung (1974), Durrett (2004), Royden (1988), and Rudin (1976, 1987) has been freely used.
The authors owe a great debt to these books. The authors have used this
material for courses taught over several years and have benefited greatly
from suggestions for improvement from students and colleagues at Iowa
State University, Cornell University, the Indian Institute of Science, and
the Indian Statistical Institute. We are grateful to them.
Our special thanks go to Dean Issacson, Ken Koehler, and Justin Peters at Iowa State University for their administrative support of this long
project. Krishna Athreya would also like to thank Cornell University for
its support.
We are most indebted to Sharon Shepard, who typed and retyped this book several times, patiently putting up with our never-ending “final” versions.
Without her patient and generous help, this book could not have been
written. We are also grateful to Denise Riker who typed portions of an
earlier version of this book.
John Kimmel of Springer got the book reviewed at various stages. The
referee reports were very helpful and encouraging. Our grateful thanks to
both John Kimmel and the referees.
We have tried hard to make this book free of mathematical and typographical errors and misleading or ambiguous statements, but we are
aware that there will still be many such remaining that we have not
caught. We will be most grateful to receive such corrections and suggestions for improvement. They can be e-mailed to us at [email protected] or
[email protected].
On a personal note, we would like to thank our families for their patience
and support. Krishna Athreya would like to record his profound gratitude
to his maternal granduncle, the late Shri K. Venkatarama Iyer, who opened
the door to mathematical learning for him at a crucial stage in high school,
to the late Professor D. Basu of the Indian Statistical Institute who taught
him to think probabilistically, and to Professor Samuel Karlin of Stanford
University for initiating him into research in mathematics.
K. B. Athreya
S. N. Lahiri
May 12, 2006
Contents

Preface

Measures and Integration: An Informal Introduction

1 Measures
  1.1 Classes of sets
  1.2 Measures
  1.3 The extension theorems and Lebesgue-Stieltjes measures
    1.3.1 Caratheodory extension of measures
    1.3.2 Lebesgue-Stieltjes measures on R
    1.3.3 Lebesgue-Stieltjes measures on R2
    1.3.4 More on extension of measures
  1.4 Completeness of measures
  1.5 Problems

2 Integration
  2.1 Measurable transformations
  2.2 Induced measures, distribution functions
    2.2.1 Generalizations to higher dimensions
  2.3 Integration
  2.4 Riemann and Lebesgue integrals
  2.5 More on convergence
  2.6 Problems

3 Lp-Spaces
  3.1 Inequalities
  3.2 Lp-Spaces
    3.2.1 Basic properties
    3.2.2 Dual spaces
  3.3 Banach and Hilbert spaces
    3.3.1 Banach spaces
    3.3.2 Linear transformations
    3.3.3 Dual spaces
    3.3.4 Hilbert space
  3.4 Problems

4 Differentiation
  4.1 The Lebesgue-Radon-Nikodym theorem
  4.2 Signed measures
  4.3 Functions of bounded variation
  4.4 Absolutely continuous functions on R
  4.5 Singular distributions
    4.5.1 Decomposition of a cdf
    4.5.2 Cantor ternary set
    4.5.3 Cantor ternary function
  4.6 Problems

5 Product Measures, Convolutions, and Transforms
  5.1 Product spaces and product measures
  5.2 Fubini-Tonelli theorems
  5.3 Extensions to products of higher orders
  5.4 Convolutions
    5.4.1 Convolution of measures on (R, B(R))
    5.4.2 Convolution of sequences
    5.4.3 Convolution of functions in L1(R)
    5.4.4 Convolution of functions and measures
  5.5 Generating functions and Laplace transforms
  5.6 Fourier series
  5.7 Fourier transforms on R
  5.8 Plancherel transform
  5.9 Problems

6 Probability Spaces
  6.1 Kolmogorov's probability model
  6.2 Random variables and random vectors
  6.3 Kolmogorov's consistency theorem
  6.4 Problems

7 Independence
  7.1 Independent events and random variables
  7.2 Borel-Cantelli lemmas, tail σ-algebras, and Kolmogorov's zero-one law
  7.3 Problems

8 Laws of Large Numbers
  8.1 Weak laws of large numbers
  8.2 Strong laws of large numbers
  8.3 Series of independent random variables
  8.4 Kolmogorov and Marcinkiewicz-Zygmund SLLNs
  8.5 Renewal theory
    8.5.1 Definitions and basic properties
    8.5.2 Wald's equation
    8.5.3 The renewal theorems
    8.5.4 Renewal equations
    8.5.5 Applications
  8.6 Ergodic theorems
    8.6.1 Basic definitions and examples
    8.6.2 Birkhoff's ergodic theorem
  8.7 Law of the iterated logarithm
  8.8 Problems

9 Convergence in Distribution
  9.1 Definitions and basic properties
  9.2 Vague convergence, Helly-Bray theorems, and tightness
  9.3 Weak convergence on metric spaces
  9.4 Skorohod's theorem and the continuous mapping theorem
  9.5 The method of moments and the moment problem
    9.5.1 Convergence of moments
    9.5.2 The method of moments
    9.5.3 The moment problem
  9.6 Problems

10 Characteristic Functions
  10.1 Definition and examples
  10.2 Inversion formulas
  10.3 Levy-Cramer continuity theorem
  10.4 Extension to Rk
  10.5 Problems

11 Central Limit Theorems
  11.1 Lindeberg-Feller theorems
  11.2 Stable distributions
  11.3 Infinitely divisible distributions
  11.4 Refinements and extensions of the CLT
    11.4.1 The Berry-Esseen theorem
    11.4.2 Edgeworth expansions
    11.4.3 Large deviations
    11.4.4 The functional central limit theorem
    11.4.5 Empirical process and Brownian bridge
  11.5 Problems

12 Conditional Expectation and Conditional Probability
  12.1 Conditional expectation: Definitions and examples
  12.2 Convergence theorems
  12.3 Conditional probability
  12.4 Problems

13 Discrete Parameter Martingales
  13.1 Definitions and examples
  13.2 Stopping times and optional stopping theorems
  13.3 Martingale convergence theorems
  13.4 Applications of martingale methods
    13.4.1 Supercritical branching processes
    13.4.2 Investment sequences
    13.4.3 A conditional Borel-Cantelli lemma
    13.4.4 Decomposition of probability measures
    13.4.5 Kakutani's theorem
    13.4.6 de Finetti's theorem
  13.5 Problems

14 Markov Chains and MCMC
  14.1 Markov chains: Countable state space
    14.1.1 Definition
    14.1.2 Examples
    14.1.3 Existence of a Markov chain
    14.1.4 Limit theory
  14.2 Markov chains on a general state space
    14.2.1 Basic definitions
    14.2.2 Examples
    14.2.3 Chapman-Kolmogorov equations
    14.2.4 Harris irreducibility, recurrence, and minorization
    14.2.5 The minorization theorem
    14.2.6 The fundamental regeneration theorem
    14.2.7 Limit theory for regenerative sequences
    14.2.8 Limit theory of Harris recurrent Markov chains
    14.2.9 Markov chains on metric spaces
  14.3 Markov chain Monte Carlo (MCMC)
    14.3.1 Introduction
    14.3.2 Metropolis-Hastings algorithm
    14.3.3 The Gibbs sampler
  14.4 Problems

15 Stochastic Processes
  15.1 Continuous time Markov chains
    15.1.1 Definition
    15.1.2 Kolmogorov's differential equations
    15.1.3 Examples
    15.1.4 Limit theorems
  15.2 Brownian motion
    15.2.1 Construction of SBM
    15.2.2 Basic properties of SBM
    15.2.3 Some related processes
    15.2.4 Some limit theorems
    15.2.5 Some sample path properties of SBM
    15.2.6 Brownian motion and martingales
    15.2.7 Some applications
    15.2.8 The Black-Scholes formula for stock price option
  15.3 Problems

16 Limit Theorems for Dependent Processes
  16.1 A central limit theorem for martingales
  16.2 Mixing sequences
    16.2.1 Mixing coefficients
    16.2.2 Coupling and covariance inequalities
  16.3 Central limit theorems for mixing sequences
  16.4 Problems

17 The Bootstrap
  17.1 The bootstrap method for independent variables
    17.1.1 A description of the bootstrap method
    17.1.2 Validity of the bootstrap: Sample mean
    17.1.3 Second order correctness of the bootstrap
    17.1.4 Bootstrap for lattice distributions
    17.1.5 Bootstrap for heavy tailed random variables
  17.2 Inadequacy of resampling single values under dependence
  17.3 Block bootstrap
  17.4 Properties of the MBB
    17.4.1 Consistency of MBB variance estimators
    17.4.2 Consistency of MBB cdf estimators
    17.4.3 Second order properties of the MBB
  17.5 Problems

18 Branching Processes
  18.1 Bienayme-Galton-Watson branching process
  18.2 BGW process: Multitype case
  18.3 Continuous time branching processes
  18.4 Embedding of Urn schemes in continuous time branching processes
  18.5 Problems

A Advanced Calculus: A Review
  A.1 Elementary set theory
    A.1.1 Set operations
    A.1.2 The principle of induction
    A.1.3 Equivalence relations
  A.2 Real numbers, continuity, differentiability, and integration
    A.2.1 Real numbers
    A.2.2 Sequences, series, limits, limsup, liminf
    A.2.3 Continuity and differentiability
    A.2.4 Riemann integration
  A.3 Complex numbers, exponential and trigonometric functions
  A.4 Metric spaces
    A.4.1 Basic definitions
    A.4.2 Continuous functions
    A.4.3 Compactness
    A.4.4 Sequences of functions and uniform convergence
  A.5 Problems

B List of Abbreviations and Symbols
  B.1 Abbreviations
  B.2 Symbols

References

Author Index

Subject Index
Measures and Integration: An Informal Introduction
For many students who are learning measure and integration theory for
the first time, the notions of a σ-algebra of subsets of a set Ω, countable
additivity of a set function λ, measurability of a function, the definition
of an integral, and the interchange of limits and integration are not easy
to understand and often seem unintuitive. The goals of this informal introduction to the subject are (1) to show that the notions of σ-algebra and countable additivity are logical consequences of certain natural approximation procedures, and (2) to show that assuming these two properties pays great dividends: it leads to a natural theory that is also very powerful for handling limits. Of course, as the saying goes, the devil is in the details. After this informal introduction, the necessary details are given in the next few sections. It is hoped that this heuristic explanation of the subject will motivate students to master the details.
What is Measure Theory?
A simple answer is that it is a theory about the distribution of mass over
a set S. If the mass is uniformly distributed and S is a Euclidean space
Rk , it is the theory of Lebesgue measure on Rk (i.e., length in R, area in
R2 , volume in R3 , etc.). Probability theory is concerned with the case when
S is the sample space of a random experiment and the total mass is one.
Consider the following example.
Imagine an open field S and a snowy night. At daybreak one goes to
the field to measure the amount of snow in as many of the subsets of S as
possible. Suppose now that one has the tools to measure the snow exactly
on a class of subsets, such as triangles, rectangles, circular shapes, elliptic
shapes, etc., no matter how small. It is natural to try to approximate oddly-shaped regions by combinations of these “standard shapes,” and then use a limiting process to assign a measure to the oddly-shaped regions. Let B denote the class of subsets of S whose
measure is obtained this way and let λ(B) denote the amount of snow in
each B ∈ B. Call B the class of all (snow) measurable sets and λ(B) the
measure (of snow) on B for each B ∈ B. It is reasonable to expect that the
following properties of B and λ(·) hold:
Properties of B
(i) A ∈ B ⇒ Ac ∈ B (i.e., if one can measure the amount of snow on
A and knows the total amount on S, then one knows the amount of
snow on Ac ).
(ii) A1 , A2 ∈ B ⇒ A1 ∪ A2 ∈ B (i.e., if one can measure the amount of
snow on A1 and A2 , then one can do the same for A1 ∪ A2 ).
(iii) If {An : n ≥ 1} ⊂ B and An ⊂ An+1 for all n ≥ 1, then lim_{n→∞} An ≡ ⋃_{n=1}^∞ An ∈ B (i.e., if one can measure the amount of snow on An for each n ≥ 1 in an increasing sequence of sets, then one can do so on the limit of the An).
(iv) C ⊂ B where C is the class of nice sets such as triangles, squares, etc.,
that one started with.
Properties of λ(·)
(i) λ(A) ≥ 0 for A ∈ B (i.e., the amount of snow on any set is nonnegative!)
(ii) If A1, A2 ∈ B and A1 ∩ A2 = ∅, then λ(A1 ∪ A2) = λ(A1) + λ(A2) (i.e., the
amounts of snow on two disjoint sets simply add up! This property
of λ is referred to as finite additivity).
(iii) If {An : n ≥ 1} ⊂ B are such that An ⊂ An+1 for all n, then λ(lim_{n→∞} An) = λ(⋃_{n=1}^∞ An) = lim_{n→∞} λ(An) (i.e., if we can approximate a set A by an increasing sequence of sets {An}n≥1 from B, then λ(A) = lim_{n→∞} λ(An). This property of λ is referred to as monotone continuity from below, or m.c.f.b. in short).
This last assumption (iii) is what guarantees that different approximations lead to consistent limits. Thus, if there are two increasing sequences {A′n}n≥1 and {A′′n}n≥1 having the same limit A but {λ(A′n)}n≥1 and {λ(A′′n)}n≥1 have different limits, then the approximating procedures are not consistent.
It turns out that the above set of reasonable and natural assumptions
leads to a very rich and powerful theory that is widely applicable.
A triplet (S, B, λ) that satisfies the above two sets of assumptions is
called a measure space. The assumptions on B and λ are equivalent to the
following:
On B
B(i)′: ∅, the empty set, lies in B
B(ii)′: A ∈ B ⇒ Ac ∈ B (same as (i) before)
B(iii)′: A1, A2, . . . ∈ B ⇒ ⋃i Ai ∈ B (combines (ii) and (iii) above) (Closure under countable unions).

On λ
λ(i)′: λ(·) ≥ 0 (same as (i) before) and λ(∅) = 0.
λ(ii)′: λ(⋃_{n≥1} An) = ∑_{n=1}^∞ λ(An) if {An}n≥1 ⊂ B are pairwise disjoint, i.e., Ai ∩ Aj = ∅ for i ≠ j (Countable additivity).
Any collection B of subsets of S that satisfies B(i)′, B(ii)′, B(iii)′ above is called a σ-algebra. Any set function λ on a σ-algebra B that satisfies λ(i)′ and λ(ii)′ above is called a measure. Thus, a measure space is a triplet
(S, B, λ) where S is a nonempty set, B is a σ-algebra of subsets of S and λ is
a measure on B. Notice that the σ-algebra structure on B and the countable
additivity of λ are necessary consequences of the very natural assumptions
(i), (ii), and (iii) on B and λ defined at the beginning.
It is not often the case that one is given B and λ explicitly. Typically,
one starts with a small collection C of subsets of S that have properties
resembling intervals or rectangles and a set function λ on C. Then, B is the
smallest σ-algebra containing C obtained from C by various operations such
as countable unions, intersections, and their limits. The key properties on
C that one needs are:
(i) A, B ∈ C ⇒ A ∩ B ∈ C (e.g., intersection of intervals is an interval).
(ii) A ∈ C ⇒ Ac is a finite union of sets from C (e.g., the complement of
an interval is the union of two intervals or an interval itself).
A collection C satisfying (i) and (ii) is called a semialgebra. The function
λ on B is an extension of λ on C. For this extension to be a measure on B,
the conditions needed are
(i) λ(A) ≥ 0 for all A ∈ C.
(ii) If A1, A2, . . . ∈ C are pairwise disjoint and A = ⋃_{n≥1} An ∈ C, then λ(A) = ∑_{n=1}^∞ λ(An).
There is a result, known as the extension theorem, that says that given
such a pair (C, λ), it is possible to extend λ to B, the smallest σ-algebra
containing C, such that (S, B, λ) is a measure space. Actually, it does more.
It constructs a σ-algebra B ∗ larger than B and a measure λ∗ on B ∗ such
that (S, B ∗ , λ∗ ) is a larger measure space, λ∗ coincides with λ on C and it
provides nice approximation theorems. For example, the following approximation result is available:
If B ∈ B∗ with λ∗(B) < ∞, then for every ε > 0, B can be approximated by a finite union of sets from C, i.e., there exist sets A1, . . . , Ak ∈ C with k < ∞ such that λ∗(A∆B) < ε, where A ≡ ⋃_{i=1}^k Ai and A∆B = (A ∩ Bc) ∪ (Ac ∩ B), the symmetric difference between A and B.
That is, in principle, every (measurable) set B of finite measure (i.e., B belonging to B∗ with λ∗ (B) < ∞) is nearly a finite union of (elementary) sets
that belong to C. For example if S = R and C is the class of intervals, then
every measurable set of finite measure is nearly a finite union of disjoint
bounded open intervals.
The following are some concrete examples of the above extension procedure.
Theorem: (Lebesgue-Stieltjes measures on R). Let F : R → R satisfy
(i) x1 < x2 ⇒ F (x1 ) ≤ F (x2 ) (nondecreasing);
(ii) F (x) = F (x+) ≡ limy↓x F (y) for all x ∈ R (i.e., F (·) is right continuous).
Let C be the class of sets of the form (a, b], or (b, ∞), −∞ ≤ a < b < ∞.
Then, there exists a measure µF defined on B ≡ B(R), the smallest σ-algebra generated by C, such that
µF((a, b]) = F(b) − F(a) for all −∞ < a < b < ∞.
The σ-algebra B ≡ B(R) is called the Borel σ-algebra of R.
Corollary: There exists a measure m on B(R) such that m(I) = the length
of I, for any interval I.
Proof: Take F (x) ≡ x in the above theorem.
This measure is called the Lebesgue measure on R.
Corollary: There exists a measure λ on B(R) such that
λ((a, b]) = (1/√(2π)) ∫_a^b e^{−x²/2} dx.
Proof: Take F(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du, x ∈ R. □
This measure is called the standard normal probability measure on R.
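To make this concrete, here is a small computational sketch (ours, not from the text), assuming only the Python standard library; the function names are ours. It evaluates µF((a, b]) = F(b) − F(a) when F is the standard normal distribution function, written via the error function.

# Sketch (not from the text): the Lebesgue-Stieltjes measure of an interval
# (a, b] equals F(b) - F(a); here F is the standard normal cdf.
import math

def std_normal_cdf(x: float) -> float:
    # F(x) = (1/sqrt(2*pi)) * integral of exp(-u^2/2) over (-infinity, x]
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mu_F(a: float, b: float) -> float:
    # measure of the interval (a, b] induced by the cdf F
    return std_normal_cdf(b) - std_normal_cdf(a)

print(mu_F(-1.0, 1.0))                # about 0.6827
print(mu_F(-math.inf, math.inf))      # 1.0, so it is a probability measure

Finite additivity is then easy to check numerically: mu_F(-1, 0) + mu_F(0, 1) equals mu_F(-1, 1).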
Theorem: (Lebesgue-Stieltjes measures on R2). Let F : R2 → R be a function satisfying the following:
(i) (Monotonicity) For x = (x1, x2)′, y = (y1, y2)′ with xi ≤ yi for i = 1, 2, ∆F(x, y) ≡ [F(y1, y2) − F(x1, y2) − F(y1, x2) + F(x1, x2)] ≥ 0.
(ii) (Continuity from above) F(x) = lim_{yi↓xi, i=1,2} F(y) for all x ∈ R2.
Let C be the class of all rectangles of the form (a, b] ≡ (a1, b1] × (a2, b2] with a = (a1, a2)′, b = (b1, b2)′ ∈ R2. Then there exists a measure µF, defined on the σ-algebra B ≡ B(R2) generated by C, such that
µF((a, b]) = ∆F(a, b).
The above theorems have a converse that says that every measure on
(Rk , B(Rk )) that is finite on bounded sets arises from some function F
(called a distribution function) and is, therefore, a Lebesgue-Stieltjes measure.
Here is another simple example of a measure space (with discrete S).
Example: Let S = {s1 , s2 , . . . , sk }, k ≤ ∞, and let B = P(S), the power
set of S, i.e., the collection of all possible subsets of S. Let p1 , p2 , . . . be
nonnegative numbers. Let
λ(A) ≡ ∑_{1≤i≤k} pi IA(si),
where IA is the indicator function of the set A, defined by IA (s) = 1 if
s ∈ A and 0 otherwise. It is easy to verify that (S, B, λ) is a measure space
and also that every measure λ on B arises this way.
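As a quick illustration (ours, not from the text), the following Python sketch realizes this example for a finite S: the measure of a set A is simply the sum of the weights pi of the points si that fall in A. The names weights and discrete_measure are ours.

# Sketch (not from the text): a discrete measure on a finite set S,
# lambda(A) = sum of p_i over those points s_i that lie in A.
def discrete_measure(weights, A):
    # weights: dict mapping the points of S to nonnegative numbers p_i
    return sum(p for s, p in weights.items() if s in A)

weights = {"s1": 0.2, "s2": 0.5, "s3": 1.3}
print(discrete_measure(weights, {"s1", "s3"}))   # 1.5
# additivity over disjoint sets:
print(discrete_measure(weights, {"s1"}) + discrete_measure(weights, {"s3"}))   # also 1.5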
What is Integration Theory?
In short, it is a theory about weighted sums of functions on a set S when
the weights are specified by a mass distribution λ. Here is a more detailed
answer.
Let (S, B, λ) be a measure space. Suppose f : S → R is a simple function, i.e., f is such that f(S) is a finite set {a1, a2, . . . , ak}. It is reasonable to define the weighted sum of f with respect to λ as ∑_{i=1}^k ai λ(Ai), where Ai = f^{−1}{ai}. Of course, for this to be well defined, one needs Ai to be in B and λ(Ai) < ∞ for all i such that ai ≠ 0. Notice that the quantity ∑_{i=1}^k ai λ(Ai) remains the same whether the ai's are distinct or not. Call this the integral of f with respect to λ and denote
this by ∫ f dλ. If f and g are simple, then for α, β ∈ R, ∫ (αf + βg) dλ = α ∫ f dλ + β ∫ g dλ. Now how should one define ∫ f dλ (the integral of f with respect to λ) for a nonsimple f? The answer, of course, is to “approximate” by simple functions. Let f be a nonnegative function. To define the integral of f, one would like to approximate f by simple functions. It turns out that a necessary and sufficient condition for this is that for any a ∈ R, the set {s : f(s) ≤ a} is in B. Such a function f is called measurable with respect to B, or B-measurable, or simply measurable (if B is kept fixed throughout). Let f be a nonnegative B-measurable function. Then there exists a sequence {fn}n≥1 of simple nonnegative functions such that for each s ∈ S, {fn(s)}n≥1 is a nondecreasing sequence converging to f(s). It is now natural to define the weighted sum of f with respect to λ, i.e., the integral of f with respect to λ, denoted by ∫ f dλ, as
∫ f dλ = lim_{n→∞} ∫ fn dλ.
An immediate question is: Is the right side the same for all such approximating sequences {fn}n≥1? The answer is yes; it is guaranteed by the very natural assumption imposed on λ that it is finitely additive and monotone continuous from below, i.e., λ(ii) and λ(iii) (or equivalently, that λ is countably additive, i.e., λ(ii)′).
One can strengthen this to a result known as the monotone
convergence theorem, a key result that in turn leads to two other major
convergence results.
The monotone convergence theorem (MCT): Let (S, B, λ) be a measure space and let fn : S → R+, n ≥ 1, be a sequence of nonnegative B-measurable functions (not necessarily simple) such that for all s ∈ S,
(i) fn(s) ≤ fn+1(s) for all n ≥ 1, and
(ii) lim_{n→∞} fn(s) = f(s).
Then f is B-measurable and ∫ f dλ = lim_{n→∞} ∫ fn dλ.
This says that the integral and the limit can be interchanged for nondecreasing sequences of nonnegative B-measurable functions. Note that if fn = IAn, the indicator function of a set An, and if An ⊂ An+1 for each n, then the MCT is the same as m.c.f.b. (cf. property λ(iii)). Thus, the very natural assumption of m.c.f.b. yields a basic convergence result that makes the integration theory so elegant and powerful.
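The following small sketch (ours, not from the text) illustrates this numerically on a finite space, where the integral reduces to a weighted sum: the dyadic simple functions fn = min(⌊2^n f⌋/2^n, n) increase pointwise to f, and their integrals increase to ∫ f dλ.

# Sketch (not from the text): integrals of the dyadic simple approximations
# f_n = min(floor(2^n f)/2^n, n) increase to the integral of f, as the MCT asserts.
weights = {"s1": 0.2, "s2": 0.5, "s3": 1.3}    # a discrete measure lambda
f = {"s1": 0.7, "s2": 2.25, "s3": 3.1}         # a nonnegative function on S

def integral(g, weights):
    # integral of g with respect to the discrete measure given by `weights`
    return sum(g[s] * p for s, p in weights.items())

def dyadic_approx(g, n):
    # simple function below g: round down to multiples of 2^(-n) and cap at n
    return {s: min(int(v * 2 ** n) / 2 ** n, n) for s, v in g.items()}

for n in (1, 2, 4, 8, 16):
    print(n, integral(dyadic_approx(f, n), weights))   # nondecreasing in n
print("limit:", integral(f, weights))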
To extend the definition of ∫ f dλ to a real valued, B-measurable function f : S → R, one uses the simple idea that f can be decomposed as f = f+ − f−, where f+(s) = max{f(s), 0} and f−(s) = max{−f(s), 0}, s ∈ S. Since both f+ and f− are nonnegative and B-measurable, ∫ f+ dλ and ∫ f− dλ are both well defined. Now set
∫ f dλ = ∫ f+ dλ − ∫ f− dλ,
provided at least one of the two terms on the right is finite. The function f is said to be integrable with respect to (w.r.t.) λ if both ∫ f+ dλ and ∫ f− dλ are finite or, equivalently, if ∫ |f| dλ < ∞. The following is a consequence of the MCT.
Fatou’s lemma: Let {fn }n≥1 be a sequence of nonnegative B-measurable
functions on a measure space (S, B, λ). Then
∫ (lim inf_{n→∞} fn) dλ ≤ lim inf_{n→∞} ∫ fn dλ.
This in turn leads to
(Lebesgue’s) dominated convergence theorem (DCT): Let {fn }n≥1
be a sequence of B-measurable functions from a measure space (S, B, λ) to
R and let g be a B-measurable nonnegative integrable function on (S, B, λ).
Suppose that for each s in S,
(i) |fn(s)| ≤ g(s) for all n ≥ 1, and
(ii) lim_{n→∞} fn(s) = f(s).
Then, f is integrable and
lim_{n→∞} ∫ fn dλ = ∫ f dλ = ∫ (lim_{n→∞} fn) dλ.
Thus some very natural assumptions on B and λ lead to an interesting
measure and integration theory that is quite general and that allows the
interchange of limits and integrals under fairly general conditions. A systematic treatment of the measure and integration theory is given in the
next five chapters.
1 Measures
Section 1.1 deals with algebraic operations on subsets of a given nonempty
set Ω. Section 1.2 treats nonnegative set functions on classes of sets and defines the notion of a measure on an algebra. Section 1.3 treats the extension
theorem, and Section 1.4 deals with completeness of measures.
1.1 Classes of sets
Let Ω be a nonempty set and P(Ω) ≡ {A : A ⊂ Ω} be the power set of Ω,
i.e., the class of all subsets of Ω.
Definition 1.1.1: A collection of sets F ⊂ P(Ω) is called an algebra if
(a) Ω ∈ F, (b) A ∈ F implies Ac ∈ F, and (c) A, B ∈ F implies A ∪ B ∈ F
(i.e., closure under pairwise unions).
Thus, an algebra is a class of sets containing Ω that is closed under
complementation and pairwise (and hence finite) unions. It is easy to see
that one can equivalently define an algebra by requiring that properties
(a), (b) hold and that the property
(c)′ A, B ∈ F ⇒ A ∩ B ∈ F
holds (i.e., closure under finite intersections).
Definition 1.1.2: A class F ⊂ P(Ω) is called a σ-algebra if it is an algebra
and if it satisfies
(d) An ∈ F for n ≥ 1 ⇒ ⋃_{n≥1} An ∈ F.
Thus, a σ-algebra is a class of subsets of Ω that contains Ω and is closed
under complementation and countable unions. As pointed out in the introductory chapter, a σ-algebra can be alternatively defined as an algebra
that is closed under monotone unions as the following shows.
Proposition 1.1.1: Let F ⊂ P(Ω). Then F is a σ-algebra if and only if
F is an algebra and satisfies
An ∈ F, An ⊂ An+1 for all n ⇒ ⋃_{n≥1} An ∈ F.
Proof: The ‘only if’ part is obvious. For the ‘if’ part, let {Bn}_{n=1}^∞ ⊂ F. Then, since F is an algebra, An ≡ ⋃_{j=1}^n Bj ∈ F for all n. Further, An ⊂ An+1 for all n and ⋃_{n≥1} Bn = ⋃_{n≥1} An. Since by hypothesis ⋃_n An ∈ F, ⋃_n Bn ∈ F. □
Here are some examples of algebras and σ-algebras.
Example 1.1.1: Let Ω = {a, b, c, d}. Consider the classes
F1 = {Ω, ∅, {a}} and F2 = {Ω, ∅, {a}, {b, c, d}}.
Then, F2 is an algebra (and also a σ-algebra), but F1 is not an algebra, since {a}c ∉ F1.
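For a finite Ω, the defining conditions of an algebra can be checked mechanically. The following sketch (ours, not from the text) verifies the closure properties of Definition 1.1.1 for F1 and F2 above; subsets are represented as frozensets and the helper name is ours.

# Sketch (not from the text): check whether a collection of subsets of a
# finite set Omega is an algebra, directly from Definition 1.1.1.
def is_algebra(omega, collection):
    F = {frozenset(A) for A in collection}
    if frozenset(omega) not in F:
        return False
    for A in F:
        if frozenset(omega) - A not in F:     # closure under complementation
            return False
        for B in F:
            if A | B not in F:                # closure under pairwise unions
                return False
    return True

omega = {"a", "b", "c", "d"}
F1 = [omega, set(), {"a"}]
F2 = [omega, set(), {"a"}, {"b", "c", "d"}]
print(is_algebra(omega, F1))   # False: the complement of {a} is missing
print(is_algebra(omega, F2))   # True; since Omega is finite, F2 is also a sigma-algebra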
Example 1.1.2: Let Ω be any nonempty set and let
F3 = P(Ω) ≡ {A : A ⊂ Ω}, the power set of Ω, and F4 = {Ω, ∅}.
Then, it is easy to check that both F3 and F4 are σ-algebras. The latter
σ-algebra is often called the trivial σ-algebra on Ω (Problem 1.1).
From the definition it is clear that any σ-algebra is also an algebra and
thus F2 , F3 , F4 are examples of algebras, too. The following is an example
of an algebra that is not a σ-algebra.
Example 1.1.3: Let Ω be a nonempty set, and let |A| denote the number
of elements of a set A ⊂ Ω. Define
F5 = {A ⊂ Ω : either |A| is finite or |Ac | is finite}.
Then, note that (i) Ω ∈ F5 (since |Ωc| = |∅| = 0), (ii) A ∈ F5 implies
Ac ∈ F5 (if |A| < ∞, then |(Ac )c | = |A| < ∞ and if |Ac | < ∞, then
Ac ∈ F5 trivially). Next, suppose that A, B ∈ F5 . If either |A| < ∞ or
|B| < ∞, then
|A ∩ B| ≤ min{|A|, |B|} < ∞,
so that A ∩ B ∈ F5 . On the other hand, if both |Ac | < ∞ and |B c | < ∞,
then
|(A ∩ B)c | = |Ac ∪ B c | ≤ |Ac | + |B c | < ∞,
implying that A ∩ B ∈ F5. Thus, property (c)′ holds, and F5 is an algebra. However, if |Ω| = ∞, then F5 is not a σ-algebra. To see this, suppose that |Ω| = ∞ and {ω1, ω2, . . .} ⊂ Ω. Then, by definition, Ai = {ωi} ∈ F5 for all i ≥ 1, but A ≡ ⋃_{i=1}^∞ A2i−1 = {ω1, ω3, . . .} ∉ F5, since |A| = |Ac| = ∞.
Example 1.1.4: Let Ω be a nonempty set and let
F6 = {A ⊂ Ω : A is countable or Ac is countable}.
Then, it is easy to show that F6 is a σ-algebra (Problem 1.3).
Suppose {Fθ : θ ∈ Θ} is a family of σ-algebras on Ω. From the definition, it follows that the intersection ⋂_{θ∈Θ} Fθ is a σ-algebra, no matter how large
the index set Θ is (Problem 1.4). However, the union of two σ-algebras may
not even be an algebra (Problem 1.5). For the development of measure
theory and probability theory, the concept of a σ-algebra plays a crucial
role. In many instances, given an arbitrary collection of subsets of Ω, one
would like to extend it to a possibly larger class that is a σ-algebra. This
leads to the following definition.
Definition 1.1.3: If A is a class of subsets of Ω, then the σ-algebra generated by A, denoted by σ⟨A⟩, is defined as
σ⟨A⟩ = ⋂_{F∈I(A)} F,
where I(A) ≡ {F : A ⊂ F and F is a σ-algebra on Ω} is the collection of
all σ-algebras containing the class A.
Note that since the power set P(Ω) contains A and is itself a σ-algebra,
the collection I(A) is not empty and hence, the intersection in the above
definition is well defined.
Example 1.1.5: In the setup of Example 1.1.1, σ⟨F1⟩ = F2 (why?).
A particularly useful class of σ-algebras are those generated by open
sets of a topological space. These are called Borel σ-algebras. A topological
space is a pair (S, T ) where S is a nonempty set and T is a collection of
subsets of S such that (i) S ∈ T, (ii) O1, O2 ∈ T ⇒ O1 ∩ O2 ∈ T, and (iii) {Oα : α ∈ I} ⊂ T ⇒ ⋃_{α∈I} Oα ∈ T. Elements of T are called open sets.
A metric space is a pair (S, d) where S is a nonempty set and d is a
function from S × S to R+ satisfying (i) d(x, y) = d(y, x) for all x, y in S,
(ii) d(x, y) = 0 iff x = y, and (iii) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z in
S. Property (iii) is called the triangle inequality. The function d is called a
metric on S (cf. Section A.4).
Any Euclidean space Rn (1 ≤ n < ∞) is a metric space under any one of
the following metrics:
(a) For 1 ≤ p < ∞, dp(x, y) = (∑_{i=1}^n |xi − yi|^p)^{1/p}.
(b) d∞(x, y) = max_{1≤i≤n} |xi − yi|.
(c) For 0 < p < 1, dp(x, y) = ∑_{i=1}^n |xi − yi|^p.
A metric space (S, d) is a topological space where a set O is open if for all x ∈ O, there is an ε > 0 such that B(x, ε) ≡ {y : d(y, x) < ε} ⊂ O.
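As a small numerical aside (ours, not from the text), here are the metrics of (a) and (b) evaluated at two points of R3; the helper names are ours.

# Sketch (not from the text): the metrics d_p (1 <= p < infinity) and d_infinity on R^n.
def d_p(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def d_inf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1.0, 2.0, 3.0), (4.0, 0.0, 3.5)
print(d_p(x, y, 1), d_p(x, y, 2), d_inf(x, y))   # 5.5, about 3.64, 3.0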
Definition 1.1.4: The Borel σ-algebra on a topological space S (in particular, on a metric space or a Euclidean space) is defined as the σ-algebra
generated by the collection of open sets in S.
Example 1.1.6: Let B(Rk) denote the Borel σ-algebra on Rk, 1 ≤ k < ∞. Then,
B(Rk) ≡ σ⟨{A : A is an open subset of Rk}⟩
is also generated by each of the following classes of sets
O1 = {(a1, b1) × · · · × (ak, bk) : −∞ ≤ ai < bi ≤ ∞, 1 ≤ i ≤ k};
O2 = {(−∞, x1) × · · · × (−∞, xk) : x1, . . . , xk ∈ R};
O3 = {(a1, b1) × · · · × (ak, bk) : ai, bi ∈ Q, ai < bi, 1 ≤ i ≤ k};
O4 = {(−∞, x1) × · · · × (−∞, xk) : x1, . . . , xk ∈ Q},
where Q denotes the set of all rational numbers.
To show this, note that σ⟨Oi⟩ ⊂ B(Rk) for i = 1, 2, 3, 4, and hence, it is enough to show that σ⟨Oi⟩ ⊃ B(Rk). Let G be a σ-algebra containing O3. Observe that given any open set A ⊂ Rk, there exists a sequence of sets {Bn}n≥1 in O3 such that A = ⋃_{n≥1} Bn (Problem 1.9). Since G is a σ-algebra and Bn ∈ G for all n ≥ 1, A ∈ G. Thus, G is a σ-algebra containing all open subsets of Rk, and hence G ⊃ B(Rk). Hence, it follows that
B(Rk) ⊃ σ⟨O1⟩ ⊃ σ⟨O3⟩ = ⋂_{G:G⊃O3} G ⊃ B(Rk).
Next note that any interval (a, b) ⊂ R can be expressed in terms of half
spaces of the form (−∞, x), x ∈ R as
(a, b) = ⋃_{n=1}^∞ [(−∞, b) \ (−∞, a + n^{−1})],
where for any two sets A and B, A \ B = {x : x ∈ A, x ∉ B}. It is not difficult to show that this implies that σ⟨Oi⟩ = B(Rk) for i = 2, 4 (Problem 1.10).
Example 1.1.7: Let Ω be a nonempty set with |Ω| = ∞ and F5 and F6
be as in Examples 1.1.3 and 1.1.4. Then F6 = σ⟨F5⟩. To see this, note that F6 is a σ-algebra containing F5, so that σ⟨F5⟩ ⊂ F6. To prove the reverse inclusion, let G be a σ-algebra containing F5. It is enough to show that F6 ⊂ G. Let A ∈ F6. If A is countable, say A = {ω1, ω2, . . .}, then Ai ≡ {ωi} ∈ F5 ⊂ G for all i ≥ 1 and hence A = ⋃_{i=1}^∞ Ai ∈ G. On the other hand, if Ac is countable, then by the above argument, Ac ∈ G ⇒ A ∈ G.
Definition 1.1.5: A class C of subsets of Ω is a π-system or a π-class if
A, B ∈ C ⇒ A ∩ B ∈ C.
Example 1.1.8: The class C of intervals in R is a π-system whereas the
class of all open discs in R2 is not.
Definition 1.1.6: A class L of subsets of Ω is a λ-system or a λ-class if
(i) Ω ∈ L, (ii) A, B ∈ L, A ⊂ B ⇒ B \ A ∈ L, and (iii) An ∈ L, An ⊂ An+1 for all n ≥ 1 ⇒ ⋃_{n≥1} An ∈ L.
Example 1.1.9: Every σ-algebra F is a λ-system. But an algebra need
not be a λ-system.
It is easily checked that if L1 and L2 are λ-systems, then L1 ∩ L2 is also a
λ-system. Recall that σ⟨B⟩, the σ-algebra generated by B, is the intersection of all σ-algebras containing B and is also the smallest σ-algebra containing B. Similarly, for any B ⊂ P(Ω), the λ-system generated by B, denoted by λ⟨B⟩, is defined as the intersection of all λ-systems containing B. It is the
smallest λ-system containing B.
Theorem 1.1.2: (The π-λ theorem). If C is a π-system, then λ⟨C⟩ = σ⟨C⟩.
Proof: For any C, σ⟨C⟩ is a λ-system and σ⟨C⟩ contains C. Thus, λ⟨C⟩ ⊂ σ⟨C⟩ for any C. Hence, it suffices to show that if C is a π-system, then λ⟨C⟩ is a σ-algebra. Since λ⟨C⟩ is a λ-system, it is closed under complementation and monotone increasing unions. By Proposition 1.1.1, it is enough to show that it is closed under intersection. Let λ1(C) ≡ {A : A ∈ λ⟨C⟩, A ∩ B ∈ λ⟨C⟩ for all B ∈ C}. Then, λ1(C) is a λ-system and, C being a π-system, λ1(C) ⊃ C. Therefore, λ1(C) ⊃ λ⟨C⟩. But λ1(C) ⊂ λ⟨C⟩. So λ1(C) = λ⟨C⟩. Next, let λ2(C) ≡ {A : A ∈ λ⟨C⟩, A ∩ B ∈ λ⟨C⟩ for all B ∈ λ⟨C⟩}. Then λ2(C) is a λ-system and by the previous step C ⊂ λ2(C) ⊂ λ⟨C⟩. Hence, it follows that λ2(C) = λ⟨C⟩, i.e., λ⟨C⟩ is closed under intersection. This completes the proof of the theorem. □
Corollary 1.1.3: If C is a π-system and L is a λ-system containing C,
then L ⊃ σ⟨C⟩.
Remark 1.1.1: There are several equivalent definitions of λ-systems; see,
for example, Billingsley (1995). A closely related concept is that of a monotone class; see, for example, Chung (1974).
1.2 Measures
A set function is an extended real valued function defined on a class of
subsets of a set Ω. Measures are nonnegative set functions that, intuitively
speaking, measure the content of a subset of Ω. As explained in Section
2 of the introductory chapter, a measure has to satisfy certain natural
requirements, such as the measure of the union of a countable collection
of disjoint sets is the sum of the measures of the individual sets. Formally,
one has the following definition.
Definition 1.2.1: Let Ω be a nonempty set and F be an algebra on Ω.
Then, a set function µ on F is called a measure if
(a) µ(A) ∈ [0, ∞] for all A ∈ F;
(b) µ(∅) = 0;
(c) for any disjoint collection of sets A1, A2, . . . ∈ F with ⋃_{n≥1} An ∈ F,
µ(⋃_{n≥1} An) = ∑_{n=1}^∞ µ(An).
As discussed in Section 2 of the introductory chapter, these conditions on
µ are equivalent to finite additivity and monotone continuity from below.
Proposition 1.2.1: Let Ω be a nonempty set and F be an algebra of
subsets of Ω and µ be a set function on F with values in [0, ∞] and with
µ(∅) = 0. Then, µ is a measure iff µ satisfies
(iii)′a: (finite additivity) for all A1, A2 ∈ F with A1 ∩ A2 = ∅, µ(A1 ∪ A2) = µ(A1) + µ(A2), and
(iii)′b: (monotone continuity from below or, m.c.f.b., in short) for any collection {An}n≥1 of sets in F such that An ⊂ An+1 for all n ≥ 1 and ⋃_{n≥1} An ∈ F,
µ(⋃_{n≥1} An) = lim_{n→∞} µ(An).
Proof: Let µ be a measure on F. Since µ satisfies (iii), taking A3, A4, . . . to be ∅ yields (iii)′a. This implies that for A and B in F, A ⊂ B ⇒ µ(B) = µ(A) + µ(B \ A) ≥ µ(A), i.e., µ is monotone. To establish (iii)′b, note that if µ(An) = ∞ for some n = n0, then µ(An) = ∞ for all n ≥ n0 and µ(⋃_{n≥1} An) = ∞, and (iii)′b holds in this case. Hence, suppose that µ(An) < ∞ for all n ≥ 1. Setting Bn = An \ An−1 for n ≥ 1 (with A0 = ∅), by (iii)′a, µ(Bn) = µ(An) − µ(An−1). Since {Bn}n≥1 is a disjoint collection of sets in F with ⋃_{n≥1} Bn = ⋃_{n≥1} An, by (iii),
µ(⋃_{n≥1} An) = µ(⋃_{n≥1} Bn) = ∑_{n=1}^∞ µ(Bn) = lim_{N→∞} ∑_{n=1}^N [µ(An) − µ(An−1)] = lim_{N→∞} µ(AN),
and so (iii)′b holds also in this case.
Conversely, let µ satisfy µ(∅) = 0 and (iii)′a and (iii)′b. Let {An}n≥1 be a disjoint collection of sets in F with ⋃_{i≥1} Ai ∈ F. Let Cn = ⋃_{j=1}^n Aj for n ≥ 1. Since F is an algebra, Cn ∈ F for all n ≥ 1. Also, Cn ⊂ Cn+1 for all n ≥ 1. Hence, ⋃_{n≥1} Cn = ⋃_{j≥1} Aj. By (iii)′b,
µ(⋃_{j≥1} Aj) = µ(⋃_{n≥1} Cn) = lim_{n→∞} µ(Cn) = lim_{n→∞} ∑_{j=1}^n µ(Aj)   (by (iii)′a)
= ∑_{j=1}^∞ µ(Aj).
Thus, (iii) holds. □
Remark 1.2.1: The definition of a measure given in Definition 1.2.1 is
valid when F is a σ-algebra. However, very often one may start with a
measure on an algebra A and then extend it to a measure on the σ-algebra
σ⟨A⟩. This is why the definition of a measure on an algebra is given here.
In the same vein, one may begin with a definition of a measure on a class of
subsets of Ω that form only a semialgebra (cf. Definition 1.3.1). As described
in the introductory chapter, such preliminary collections of sets are “nice”
sets for which the measure may be defined easily, and the extension to a σ-algebra containing these sets may be necessary if one is interested in more
general sets. This topic is discussed in greater detail in the next section.
Definition 1.2.2: A measure µ is called finite or infinite according as
µ(Ω) < ∞ or µ(Ω) = ∞. A finite measure with µ(Ω) = 1 is called a
probability measure. A measure µ on a σ-algebra F is called σ-finite if
there exists a countable collection of sets A1, A2, . . . ∈ F, not necessarily disjoint, such that
(a) ⋃_{n≥1} An = Ω and (b) µ(An) < ∞ for all n ≥ 1.
Here are some examples of measures.
Example 1.2.1: (The counting measure). Let Ω be a nonempty set and
F3 = P(Ω) be the set of all subsets of Ω (cf. Example 1.1.2). Define
µ(A) = |A|,
A ∈ F3 ,
where |A| denotes the number of elements in A. It is easy to check that µ
satisfies the requirements (a)–(c) of a measure. This measure µ is called the
counting measure on Ω. Note that µ is finite iff Ω is finite and it is σ-finite
if Ω is countably infinite.
Example 1.2.2: (Discrete probability measures). Let ω1, ω2, . . . ∈ Ω and p1, p2, . . . ∈ [0, 1] be such that ∑_{i=1}^∞ pi = 1. Define for any A ⊂ Ω
P(A) = ∑_{i=1}^∞ pi IA(ωi),
where IA(·) denotes the indicator function of a set A, defined by IA(ω) = 0 or 1 according as ω ∉ A or ω ∈ A. For any disjoint collection of sets A1, A2, . . . ∈ P(Ω),
P(⋃_{i=1}^∞ Ai) = ∑_{j=1}^∞ pj I_{⋃_{i=1}^∞ Ai}(ωj)
= ∑_{j=1}^∞ pj (∑_{i=1}^∞ IAi(ωj))
= ∑_{i=1}^∞ (∑_{j=1}^∞ pj IAi(ωj))
= ∑_{i=1}^∞ P(Ai),
where interchanging the order of summation is permissible since the summands are nonnegative. This shows that P is a probability measure on P(Ω).
Example 1.2.3: (Lebesgue-Stieltjes measures on R). As mentioned in the
previous chapter (cf. Section 2), a large class of measures on the Borel
σ-algebra B(R) of subsets of R, known as the Lebesgue-Stieltjes measures,
arise from nondecreasing right continuous functions F : R → R. For each
such F , the corresponding measure µF satisfies µF ((a, b]) = F (b) − F (a)
for all −∞ < a < b < ∞. The construction of these µF ’s via the extension
theorem will be discussed in the next section. Also, note that if An = (−n, n), n = 1, 2, . . ., then R = ⋃_{n≥1} An and µF(An) < ∞ for each n ≥ 1
(such measures are called Radon measures) and thus, µF is necessarily
σ-finite.
Proposition 1.2.2: Let µ be a measure on an algebra F, and let
A, B, A1 , . . . , Ak ∈ F, 1 ≤ k < ∞. Then,
(i) (Monotonicity) µ(A) ≤ µ(B) if A ⊂ B;
(ii) (Finite subadditivity) µ(A1 ∪ . . . ∪ Ak ) ≤ µ(A1 ) + . . . + µ(Ak );
(iii) (Inclusion-exclusion formula) If µ(Ai ) < ∞ for all i = 1, . . . , k, then
µ(A1 ∪ . . . ∪ Ak) = ∑_{i=1}^k µ(Ai) − ∑_{1≤i<j≤k} µ(Ai ∩ Aj) + · · · + (−1)^{k−1} µ(A1 ∩ . . . ∩ Ak).
Proof: µ(B) = µ(A ∪ (Ac ∩ B)) = µ(A) + µ(B \ A) ≥ µ(A), by (a) and
(c) of Definition 1.2.1. This proves (i).
To prove (ii), note that if either µ(A) or µ(B) is finite, then µ(A∩B) < ∞,
by (i). Hence, using the countable additivity property (c), we have
µ(A ∪ B) = µ(A) + µ(B \ A)
= µ(A) + [µ(B \ A) + µ(A ∩ B)] − µ(A ∩ B)
= µ(A) + µ(B) − µ(A ∩ B).   (2.1)
Hence, (ii) follows from (2.1) and by induction.
To prove (iii), note that the case k = 2 follows from (2.1). Next, suppose that (iii) holds for all sets A1, . . . , Ak ∈ F with µ(Ai) < ∞ for all i = 1, . . . , k for some k = n, n ∈ N. To show that it holds for k = n + 1, note that by (2.1),
µ(⋃_{i=1}^{n+1} Ai) = µ(⋃_{i=1}^n Ai) + µ(An+1) − µ(⋃_{i=1}^n (Ai ∩ An+1))
= [∑_{i=1}^n µ(Ai) − ∑_{1≤i<j≤n} µ(Ai ∩ Aj) + · · · + (−1)^{n−1} µ(A1 ∩ . . . ∩ An)] + µ(An+1)
− [∑_{i=1}^n µ(Ai ∩ An+1) − ∑_{1≤i<j≤n} µ(Ai ∩ Aj ∩ An+1) + · · · + (−1)^{n−1} µ(A1 ∩ . . . ∩ An+1)]
= ∑_{i=1}^{n+1} µ(Ai) − ∑_{1≤i<j≤n+1} µ(Ai ∩ Aj) + · · · + (−1)^n µ(⋂_{j=1}^{n+1} Aj).
By induction, this completes the proof of Proposition 1.2.2. □
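As a quick sanity check (ours, not from the text), the inclusion-exclusion formula can be verified numerically for the counting measure µ(A) = |A| on a few finite sets; the helper name is ours.

# Sketch (not from the text): inclusion-exclusion for the counting measure mu(A) = |A|.
from itertools import combinations

def inclusion_exclusion(sets):
    total = 0
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            inter = set.intersection(*combo)
            total += (-1) ** (r - 1) * len(inter)
    return total

A1, A2, A3 = {1, 2, 3, 4}, {3, 4, 5}, {1, 4, 6}
print(len(A1 | A2 | A3))                   # 6
print(inclusion_exclusion([A1, A2, A3]))   # 6, agreeing with mu(A1 U A2 U A3)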
In Proposition 1.2.1, it was shown that a set function µ on an algebra
F is a measure iff it is finitely additive and monotone continuous from
below. A natural question is: if µ is a measure
on F and {An}n≥1 is a collection of decreasing sets in F with A ≡ ⋂_{n≥1} An also in F, does the
relation µ(A) = limn→∞ µ(An ) hold, i.e., does monotone continuity from
above hold? The answer is positive under the assumption µ(An0 ) < ∞ for
some n0 ∈ N. It turns out that, in general, this assumption cannot be
dropped (Problem 1.18).
Proposition 1.2.3: Let µ be a measure on an algebra F.

(i) (Monotone continuity from above) Let {An}n≥1 be a sequence of sets in F such that An+1 ⊂ An for all n ≥ 1 and $A \equiv \bigcap_{n \ge 1} A_n \in F$. Also, let µ(An0) < ∞ for some n0 ∈ N. Then,
$$\lim_{n \to \infty} \mu(A_n) = \mu(A).$$

(ii) (Countable subadditivity) If {An}n≥1 is a sequence of sets in F such that $\bigcup_{n \ge 1} A_n \in F$, then
$$\mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) \le \sum_{n=1}^{\infty} \mu(A_n).$$

Proof: To prove part (i), without loss of generality (w.l.o.g.), assume that n0 = 1, i.e., µ(A1) < ∞. Let Cn = A1 \ An for n ≥ 1, and C∞ = A1 \ A. Then Cn and C∞ belong to F and Cn ↑ C∞. By Proposition 1.2.1 (iii)b′ (i.e., by the m.c.f.b. property), µ(Cn) ↑ µ(C∞), and by (iii)a′ (i.e., finite additivity), µ(Cn) = µ(A1) − µ(An) for all 1 ≤ n < ∞, due to the fact µ(A1) < ∞. Hence, µ(An) = µ(A1) − µ(Cn) → µ(A1) − µ(C∞) = µ(A), which proves (i).
To prove part (ii), let $D_n = \bigcup_{i=1}^{n} A_i$, n ≥ 1. Then $D_n \uparrow D \equiv \bigcup_{i \ge 1} A_i$. Hence, by m.c.f.b. and finite subadditivity,
$$\mu(D) = \lim_{n \to \infty} \mu(D_n) \le \lim_{n \to \infty} \sum_{i=1}^{n} \mu(A_i) = \sum_{n=1}^{\infty} \mu(A_n).$$
Theorem 1.2.4: (Uniqueness of measures). Let µ1 and µ2 be two finite measures on a measurable space (Ω, F). Let C ⊂ F be a π-system such that F = σ⟨C⟩. If µ1(C) = µ2(C) for all C ∈ C and µ1(Ω) = µ2(Ω), then µ1(A) = µ2(A) for all A ∈ F.

Proof: Let L ≡ {A : A ∈ F, µ1(A) = µ2(A)}. It is easy to verify that L is a λ-system. Since C ⊂ L, by Theorem 1.1.2, L = σ⟨C⟩ = F.
1.3 The extension theorems and Lebesgue-Stieltjes measures
As discussed earlier, in many situations, one starts with a given set function µ defined on a small class C of subsets of a set Ω and then wants to
extend µ to a larger class M by some approximation procedure. In this
section, a general result in this direction, known as the extension theorem,
is established. This is then applied to the construction of Lebesgue-Stieltjes
measures on Euclidean spaces. For another application, see Chapter 6.
1.3.1 Caratheodory extension of measures
Definition 1.3.1: Let Ω be a nonempty set and let P(Ω) be the power set of Ω. A class C ⊂ P(Ω) is called a semialgebra if (i) A, B ∈ C ⇒ A ∩ B ∈ C and (ii) for any A ∈ C, there exist sets B1, B2, . . . , Bk ∈ C, for some 1 ≤ k < ∞, such that Bi ∩ Bj = ∅ for i ≠ j, and $A^c = \bigcup_{i=1}^{k} B_i$.
Example 1.3.1: Ω = R, C ≡ {(a, b], (b, ∞) : −∞ ≤ a, b < ∞}.
Example 1.3.2: Ω = R, C ≡ {I : I is an interval}. An interval I in R is
a set in R such that a, b ∈ I, a < b ⇒ (a, b) ⊂ I.
Example 1.3.3: Ω = Rk, C ≡ {I1 × I2 × · · · × Ik : Ij is an interval in R for 1 ≤ j ≤ k}.
Recall that a collection A ⊂ P(Ω) is an algebra if it is closed under finite union and complementation. It is easily verified (Problem 1.19) that the smallest algebra containing a semialgebra C is $A(C) \equiv \{A : A = \bigcup_{i=1}^{k} B_i,\ B_i \in C \text{ for } i = 1, \ldots, k,\ k < \infty\}$, i.e., the class of finite unions of sets from C.
Definition 1.3.2: A set function µ on a semialgebra C, taking values in R̄+ ≡ [0, ∞], is called a measure if (i) µ(∅) = 0 and (ii) for any sequence of sets {An}n≥1 ⊂ C with $\bigcup_{n \ge 1} A_n \in C$ and Ai ∩ Aj = ∅ for i ≠ j,
$$\mu\Big(\bigcup_{n \ge 1} A_n\Big) = \sum_{n=1}^{\infty} \mu(A_n).$$
Proposition 1.3.1: Let µ be a measure on a semialgebra C. Let A ≡ A(C) be the smallest algebra generated by C. For each A ∈ A, set
$$\bar{\mu}(A) = \sum_{i=1}^{k} \mu(B_i),$$
if the set A has the representation $A = \bigcup_{i=1}^{k} B_i$ for some B1, . . . , Bk ∈ C, k < ∞, with Bi ∩ Bj = ∅ for i ≠ j. Then,

(i) µ̄ is independent of the representation of A as $A = \bigcup_{i=1}^{k} B_i$;

(ii) µ̄ is finitely additive on A, i.e., A, B ∈ A, A ∩ B = ∅ ⇒ µ̄(A ∪ B) = µ̄(A) + µ̄(B); and

(iii) µ̄ is countably additive on A, i.e., if An ∈ A for all n ≥ 1, Ai ∩ Aj = ∅ for all i ≠ j, and $\bigcup_{n \ge 1} A_n \in A$, then
$$\bar{\mu}\Big(\bigcup_{n \ge 1} A_n\Big) = \sum_{n=1}^{\infty} \bar{\mu}(A_n).$$
Proof: Parts (i) and (ii) are easy to verify. Turning to part (iii), let, for each n ≥ 1, $A_n = \bigcup_{j=1}^{k_n} B_{nj}$, with $B_{nj} \in C$ and $\{B_{nj}\}_{j=1}^{k_n}$ disjoint. Since $\bigcup_{n \ge 1} A_n \in A$, there exist disjoint sets $\{B_i\}_{i=1}^{k} \subset C$ such that $\bigcup_{n \ge 1} A_n = \bigcup_{i=1}^{k} B_i$. Now
$$B_i = B_i \cap \Big(\bigcup_{n \ge 1} A_n\Big) = \bigcup_{n \ge 1} (B_i \cap A_n) = \bigcup_{n \ge 1} \bigcup_{j=1}^{k_n} (B_i \cap B_{nj}).$$
Since for all i, Bi ∈ C, Bi ∩ Bnj ∈ C for all j, n, and µ is a measure on C,
$$\mu(B_i) = \sum_{n \ge 1} \sum_{j=1}^{k_n} \mu(B_i \cap B_{nj}).$$
Thus,
$$\bar{\mu}\Big(\bigcup_{n \ge 1} A_n\Big) = \sum_{i=1}^{k} \mu(B_i) = \sum_{i=1}^{k} \sum_{n \ge 1} \Big(\sum_{j=1}^{k_n} \mu(B_i \cap B_{nj})\Big) = \sum_{n \ge 1} \Big(\sum_{i=1}^{k} \sum_{j=1}^{k_n} \mu(B_i \cap B_{nj})\Big) = \sum_{n \ge 1} \bar{\mu}(A_n),$$
since
$$A_n = A_n \cap \Big(\bigcup_{i=1}^{k} B_i\Big) = \bigcup_{i=1}^{k} \bigcup_{j=1}^{k_n} (B_i \cap B_{nj}).$$
Thus, the set function µ̄ defined above is a measure on A. To go beyond A = A(C) to σ⟨A⟩, which is also the same as σ⟨C⟩ (see Problem 1.19 (ii)), the first step involves using the given set function µ on C to define a set function µ∗ on the class of all subsets of Ω, i.e., on P(Ω), where for any A ∈ P(Ω), µ∗(A) is an estimate ‘from above’ of the µ-content of A.
Definition 1.3.3: Given a measure µ on a semialgebra C, the outer measure induced by µ is the set function µ∗, defined on P(Ω), as
$$\mu^*(A) \equiv \inf\Big\{ \sum_{n=1}^{\infty} \mu(A_n) : \{A_n\}_{n \ge 1} \subset C,\ A \subset \bigcup_{n \ge 1} A_n \Big\}. \qquad (3.1)$$
Thus, a given set A is covered by countable unions of sets from C and
the sums of the measures on such covers are computed and µ∗ (A) is the
greatest lower bound one can get in this way. It is not difficult to show
that on C and A, this is not an overestimate. That is, µ∗ = µ on C and
on A, µ∗ = µ̄. Now suppose µ(Ω) < ∞. Let A ⊂ Ω be a set such that the
overestimates of the µ-contents of A and Ac , namely, µ∗ (A) and µ∗ (Ac ),
add up to µ(Ω), i.e.,
$$\mu(\Omega) = \mu^*(\Omega) = \mu^*(A) + \mu^*(A^c). \qquad (3.2)$$
Then, there is no room for error and the estimates µ∗ (A) and µ∗ (Ac ) are
in fact not overestimates at all. In this case, the set A may be considered
‘exactly measurable’ by this approximation procedure. This observation
motivates the following definition.
Definition 1.3.4: A set A is said to be µ∗ -measurable if
$$\mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c) \quad \text{for all } E \subset \Omega. \qquad (3.3)$$
In other words, an analog of (3.2) should hold in every portion E of Ω for
a set A to be µ∗ -measurable.
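As an aside (not from the text), the covering definition (3.1) can be made concrete numerically: for Lebesgue measure, covering an enumeration of the rationals in [0, 1] by intervals of rapidly shrinking length shows that the infimum in (3.1), and hence the outer measure of this countable set, is 0. The enumeration used below is an illustrative choice.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_den):
    """A finite enumeration of the rationals p/q in [0, 1] with q <= max_den."""
    seen = set()
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            seen.add(Fraction(p, q))
    return sorted(seen)

eps = 1e-3
points = rationals_in_unit_interval(30)
# Cover the n-th rational by an interval of length eps / 2**(n+1); the total
# length of the cover is at most eps, so mu*(Q ∩ [0,1]) <= eps for every eps > 0.
total_cover_length = sum(eps / 2 ** (n + 1) for n in range(len(points)))
print(len(points), total_cover_length)   # total cover length < eps = 0.001
```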
It can be shown (Problem 1.20) that µ∗ defined in (3.1) satisfies:
$$\mu^*(\emptyset) = 0, \qquad (3.4)$$
$$A \subset B \ \Rightarrow\ \mu^*(A) \le \mu^*(B), \qquad (3.5)$$
and for any {An}n≥1 ⊂ P(Ω),
$$\mu^*\Big(\bigcup_{n \ge 1} A_n\Big) \le \sum_{n=1}^{\infty} \mu^*(A_n). \qquad (3.6)$$
Definition 1.3.5: Any set function µ∗ : P(Ω) → R̄+ ≡ [0, ∞] satisfying
(3.4)–(3.6) is called an outer measure on Ω.
The following result (due to C. Caratheodory) yields a measure space
on Ω starting from a general outer measure µ∗ that need not arise from a
measure µ as in (3.1).
Theorem 1.3.2: Let µ∗ be an outer measure on Ω, i.e., it satisfies (3.4)–
(3.6). Let M ≡ Mµ∗ ≡ {A : A is µ∗ -measurable, i.e., A satisfies (3.3)}.
Then
(i) M is a σ-algebra,
(ii) µ∗ restricted to M is a measure, and
(iii) µ∗ (A) = 0 ⇒ P(A) ⊂ M.
Proof: From (3.3), it follows that ∅ ∈ M and that A ∈ M ⇒ Ac ∈ M. Next, it will be shown that M is closed under finite unions. Let A1, A2 ∈ M. Then, for any E ⊂ Ω,
$$\begin{aligned}
\mu^*(E) &= \mu^*(E \cap A_1) + \mu^*(E \cap A_1^c) \quad \text{(since } A_1 \in M\text{)} \\
&= \mu^*(E \cap A_1 \cap A_2) + \mu^*(E \cap A_1 \cap A_2^c) + \mu^*(E \cap A_1^c \cap A_2) + \mu^*(E \cap A_1^c \cap A_2^c) \quad \text{(since } A_2 \in M\text{)}.
\end{aligned}$$
But (A1 ∩ A2) ∪ (A1 ∩ A2c) ∪ (A1c ∩ A2) = A1 ∪ A2. Since µ∗ is subadditive, it follows that
$$\mu^*(E \cap (A_1 \cup A_2)) \le \mu^*(E \cap A_1 \cap A_2) + \mu^*(E \cap A_1 \cap A_2^c) + \mu^*(E \cap A_1^c \cap A_2).$$
Thus
$$\mu^*(E) \ge \mu^*(E \cap (A_1 \cup A_2)) + \mu^*(E \cap (A_1 \cup A_2)^c).$$
The subadditivity of µ∗ yields the opposite inequality and so A1 ∪ A2 ∈ M; hence M is an algebra. To show that M is a σ-algebra, it suffices to show that M is closed under monotone unions, i.e., An ∈ M, An ⊂ An+1 for all n ≥ 1 ⇒ $A \equiv \bigcup_{n \ge 1} A_n \in M$. Let B1 = A1 and $B_n = A_n \cap A_{n-1}^c$ for all n ≥ 2. Then, for all n ≥ 1, Bn ∈ M (since M is an algebra), $\bigcup_{j=1}^{n} B_j = A_n$, and $\bigcup_{j=1}^{\infty} B_j = A$. Hence, for any E ⊂ Ω,
$$\begin{aligned}
\mu^*(E) &= \mu^*(E \cap A_n) + \mu^*(E \cap A_n^c) \\
&= \mu^*(E \cap A_n \cap B_n) + \mu^*(E \cap A_n \cap B_n^c) + \mu^*(E \cap A_n^c) \quad \text{(since } B_n \in M\text{)} \\
&= \mu^*(E \cap B_n) + \mu^*(E \cap A_{n-1}) + \mu^*(E \cap A_n^c) \\
&= \sum_{j=1}^{n} \mu^*(E \cap B_j) + \mu^*(E \cap A_n^c) \quad \text{(by iteration)} \\
&\ge \sum_{j=1}^{n} \mu^*(E \cap B_j) + \mu^*(E \cap A^c) \quad \text{(by monotonicity)}.
\end{aligned}$$
Now letting n → ∞, and using the subadditivity of µ∗ and the fact that $\bigcup_{j=1}^{\infty} B_j = A$, one gets
$$\mu^*(E) \ge \sum_{j=1}^{\infty} \mu^*(E \cap B_j) + \mu^*(E \cap A^c) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c).$$
This completes the proof of part (i).
To prove part (ii), let {Bn}n≥1 ⊂ M with Bi ∩ Bj = ∅ for i ≠ j. Let $A_j = \bigcup_{i=j}^{\infty} B_i$, j ∈ N. Then, by (i), Aj ∈ M for all j ∈ N and so
$$\begin{aligned}
\mu^*(A_1) &= \mu^*(A_1 \cap B_1) + \mu^*(A_1 \cap B_1^c) \quad \text{(since } B_1 \in M\text{)} \\
&= \mu^*(B_1) + \mu^*(A_2) \\
&= \mu^*(B_1) + \mu^*(B_2) + \mu^*(A_3) \quad \text{(by iteration)} \\
&= \sum_{i=1}^{n} \mu^*(B_i) + \mu^*(A_{n+1}) \quad \text{(by iteration)} \\
&\ge \sum_{i=1}^{n} \mu^*(B_i) \quad \text{for all } n \ge 1.
\end{aligned}$$
Now letting n → ∞, one has $\mu^*(A_1) \ge \sum_{i=1}^{\infty} \mu^*(B_i)$. By subadditivity of µ∗, the opposite inequality holds and so
$$\mu^*(A_1) = \mu^*\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{\infty} \mu^*(B_i),$$
proving (ii).
As for (iii), note that by monotonicity, µ∗(A) = 0, B ⊂ A ⇒ µ∗(B) = 0, and hence, for any E, µ∗(E ∩ B) = 0. Since µ∗(E) ≥ µ∗(E ∩ Bc), this implies µ∗(E) ≥ µ∗(E ∩ Bc) + µ∗(E ∩ B). The opposite inequality holds by the subadditivity of µ∗. So B ∈ M, and (iii) is proved.
Definition 1.3.6: A measure space (Ω, F, ν) is called complete if for any A ∈ F with ν(A) = 0, P(A) ⊂ F.
Thus, by part (iii) of the above theorem, (Ω, Mµ∗ , µ∗ ) is a complete
measure space. Now the above theorem is applied to a µ∗ that is generated
by a given measure µ on a semialgebra C via (3.1).
Theorem 1.3.3: (Caratheodory’s extension theorem). Let µ be a measure
on a semialgebra C and let µ∗ be the set function induced by µ as defined
by (3.1). Then,
(i) µ∗ is an outer measure,
(ii) C ⊂ Mµ∗ , and
(iii) µ∗ = µ on C, where Mµ∗ is as in Theorem 1.3.2.
Proof: The proof of (i) involves verifying (3.4)–(3.6), which is left as an exercise (Problem 1.20). To prove (ii), let A ∈ C. Let E ⊂ Ω and {An}n≥1 ⊂ C be such that $E \subset \bigcup_{n \ge 1} A_n$. Then, for all i ∈ N, Ai = (Ai ∩ A) ∪ (Ai ∩ B1) ∪ . . . ∪ (Ai ∩ Bk), where B1, . . . , Bk are disjoint sets in C such that $\bigcup_{j=1}^{k} B_j = A^c$. Since µ is finitely additive on C,
$$\mu(A_i) = \mu(A_i \cap A) + \sum_{j=1}^{k} \mu(A_i \cap B_j)$$
$$\Rightarrow\quad \sum_{n=1}^{\infty} \mu(A_n) = \sum_{n=1}^{\infty} \mu(A_n \cap A) + \sum_{n=1}^{\infty} \sum_{j=1}^{k} \mu(A_n \cap B_j) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c),$$
since {An ∩ A}n≥1 and {An ∩ Bj : 1 ≤ j ≤ k, n ≥ 1} are both countable subcollections of C whose unions cover E ∩ A and E ∩ Ac, respectively. From the definition of µ∗(E), it now follows that
$$\mu^*(E) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c).$$
Now the subadditivity of µ∗ completes the proof of part (ii).

To prove (iii), let A ∈ C. Then, by definition, µ∗(A) ≤ µ(A). If µ∗(A) = ∞, then µ(A) = ∞ = µ∗(A). If µ∗(A) < ∞, then by the definition of ‘infimum,’ for any ε > 0, there exists {An}n≥1 ⊂ C such that $A \subset \bigcup_{n \ge 1} A_n$ and
$$\mu^*(A) \le \sum_{n=1}^{\infty} \mu(A_n) \le \mu^*(A) + \varepsilon.$$
But $A = A \cap \big(\bigcup_{n \ge 1} A_n\big) = \bigcup_{n \ge 1} (A \cap A_n)$. Note that the set function µ̄ defined in Proposition 1.3.1 is a measure on A(C) and it coincides with µ on C. Since A, A ∩ An ∈ C for all n ≥ 1, by Proposition 1.2.3 (ii) applied to µ̄,
$$\mu(A) = \bar{\mu}(A) \le \sum_{n=1}^{\infty} \bar{\mu}(A \cap A_n) \le \sum_{n=1}^{\infty} \bar{\mu}(A_n) = \sum_{n=1}^{\infty} \mu(A_n) \le \mu^*(A) + \varepsilon.$$
Alternately, w.l.o.g., assume that {An}n≥1 are disjoint. Then $A = A \cap \bigcup_{n \ge 1} A_n = \bigcup_{n \ge 1} (A \cap A_n)$. Since A ∩ An ∈ C for all n, by the countable additivity of µ on C, $\mu(A) = \sum_{n=1}^{\infty} \mu(A \cap A_n) \le \sum_{n=1}^{\infty} \mu(A_n) \le \mu^*(A) + \varepsilon$. Since ε > 0 is arbitrary, this yields µ(A) ≤ µ∗(A).
Thus, given a measure µ on a semialgebra C ⊂ P(Ω), there is a complete measure space (Ω, Mµ∗, µ∗) such that Mµ∗ ⊃ C and µ∗ restricted to C equals µ. For this reason, µ∗ is called an extension of µ. The measure space (Ω, Mµ∗, µ∗) is called the Caratheodory extension of µ. Since Mµ∗ is a σ-algebra and contains C, Mµ∗ must contain σ⟨C⟩, the σ-algebra generated by C, and thus (Ω, σ⟨C⟩, µ∗) is also a measure space. However, the latter may not be complete (see Section 1.4).

Now the above method is applied to the construction of Lebesgue-Stieltjes measures on R and R2.
1.3.2 Lebesgue-Stieltjes measures on R
Let F : R → R be nondecreasing. For x ∈ R, let F(x+) ≡ lim_{y↓x} F(y) and F(x−) ≡ lim_{y↑x} F(y). Set F(∞) = lim_{x↑∞} F(x) and F(−∞) = lim_{x↓−∞} F(x). Let
$$C \equiv \big\{(a, b] : -\infty \le a \le b < \infty\big\} \cup \big\{(a, \infty) : -\infty \le a < \infty\big\}. \qquad (3.7)$$
Define
$$\mu_F((a, b]) = F(b+) - F(a+), \qquad \mu_F((a, \infty)) = F(\infty) - F(a+). \qquad (3.8)$$
Then, it may be verified that
(i) C is a semialgebra;
(ii) µF is a measure on C. (For (ii), one needs to use the Heine-Borel
theorem. See Problems 1.22 and 1.23.)
Let (R, Mµ∗F , µ∗F ) be the Caratheodory extension of µF , i.e., the measure
space constructed as in the above two theorems.
Definition 1.3.7: Let F : R → R be nondecreasing. The (measure) space
(R, Mµ∗F , µ∗F ) is called a Lebesgue-Stieltjes measure space and µ∗F is the
Lebesgue-Stieltjes measure generated by F .
Since σ⟨C⟩ = B(R), the class of all Borel sets of R, every Lebesgue-Stieltjes measure µ∗F is also a measure on (R, B(R)). Note also that µ∗F is finite on bounded intervals.
Conversely, given any Radon measure µ on (R, B(R)), i.e., a measure on (R, B(R)) that is finite on bounded intervals, set
$$F(x) = \begin{cases} \mu((0, x]) & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -\mu((x, 0]) & \text{if } x < 0. \end{cases}$$
Then µF = µ on C. By the uniqueness of the extension (discussed later in this section, see also Theorem 1.2.4), it follows that µ∗F coincides with µ on B(R). Thus, every Radon measure on (R, B(R)) is necessarily a Lebesgue-Stieltjes measure.
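The interval formula µF((a, b]) = F(b+) − F(a+) lends itself to a direct computation. The sketch below (an illustration, not part of the text) evaluates it for two choices of F: the identity, which gives Lebesgue measure (interval length), and a cdf with a jump, which gives a probability measure placing an atom at the jump point.

```python
def mu_F(F, a, b, eps=1e-9):
    """Lebesgue-Stieltjes measure of (a, b] for a nondecreasing F,
    approximating the right limits F(x+) by F(x + eps)."""
    return F(b + eps) - F(a + eps)

identity = lambda x: x                                   # gives Lebesgue measure (length)
# A cdf with a jump of size 0.5 at x = 0 and a uniform part on [0, 1]:
mixed_cdf = lambda x: 0.0 if x < 0 else 0.5 + 0.5 * min(x, 1.0)

print(mu_F(identity, 1.0, 4.0))       # 3.0, the length of (1, 4]
print(mu_F(mixed_cdf, -1.0, 0.0))     # ~0.5: the atom at 0 is captured since 0 ∈ (-1, 0]
print(mu_F(mixed_cdf, 0.0, 1.0))      # ~0.5: the continuous part only
```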
Definition 1.3.8: (Lebesgue Measure on R). When F (x) ≡ x, x ∈ R,
the measure µ∗F is called the Lebesgue measure and the σ-algebra Mµ∗F is
called the class of Lebesgue measurable sets.
The Lebesgue measure will be denoted by m(·) or µL (·). Given below
are some important results on m(·).
(i) It follows from equation (3.1) that µ∗F({x}) = F(x+) − F(x−), and hence µ∗F({x}) = 0 if F is continuous at x. Thus m({x}) ≡ 0 on R.

(ii) By countable additivity of m(·), m(A) = 0 for any countable set A.
(iii) (Cantor set). There exists an uncountable set C such that m(C) = 0. An example is the Cantor set constructed as follows (a numerical sketch is given after this list). Start with I0 = [0, 1]. Delete the open middle third, i.e., $(\frac{1}{3}, \frac{2}{3})$. Next, from the closed intervals $I_{11} = [0, \frac{1}{3}]$ and $I_{12} = [\frac{2}{3}, 1]$ delete the open middle thirds, i.e., $(\frac{1}{9}, \frac{2}{9})$ and $(\frac{7}{9}, \frac{8}{9})$, respectively. Repeat this process of deleting the middle third from each of the remaining closed intervals. Thus at stage n, 2^{n−1} open intervals, each of length 1/3^n, are deleted, leaving 2^n closed intervals, each of length 1/3^n. Let Un denote the union of all the deleted open intervals at the nth stage. Then {Un}n≥1 are disjoint open sets. Let $U \equiv \bigcup_{n \ge 1} U_n$. By countable additivity,
$$m(U) = \sum_{n=1}^{\infty} m(U_n) = \sum_{n=1}^{\infty} \frac{2^{n-1}}{3^n} = 1.$$
Let C ≡ [0, 1] − U. Since U is open and [0, 1] is closed, C is a nonempty closed set. It can be shown that $C = \{x : x = \sum_{i=1}^{\infty} \frac{a_i}{3^i},\ a_i = 0 \text{ or } 2\}$ (Problem 1.33). Thus C can be mapped in a (1–1) manner onto the set of all sequences {δi}i≥1 such that δi = 0 or 2. But this set is uncountable. Since m([0, 1]) = 1, it follows that m(C) = 0. For more properties of the Cantor set, see Rudin (1976) and Chapter 4.
(iv) m(·) is invariant under reflection and translation. That is, for any E
in B(R),
m(−E) = m(E) and m(E + c) = m(E)
for all c in R where −E = {−x : x ∈ E} and E + c = {y : y =
x + c, x ∈ E}. This follows from Theorem 1.2.4 and the fact that the
claim holds for intervals (cf. Problem 2.48).
(v) There exists a subset A ⊂ R such that A )∈ Mm . That is, there exists
a non-Lebesgue measurable set. The proof of this requires the use of
the axiom of choice (cf. A.1). For a proof see Royden (1988).
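To illustrate item (iii) numerically, the sketch below (not part of the text) sums the lengths of the deleted open intervals, 2^{n−1}/3^n, and tracks the total length of the closed intervals remaining after n stages, which is (2/3)^n; the two always add up to m([0, 1]) = 1, consistent with the Cantor set having Lebesgue measure zero.

```python
# Total length removed after N stages and length remaining, for the middle-thirds construction.
N = 40
removed = sum(2 ** (n - 1) / 3 ** n for n in range(1, N + 1))
remaining = (2 / 3) ** N          # 2^N closed intervals, each of length 3^(-N)
print(removed, remaining)          # removed -> 1, remaining -> 0 as N grows
print(abs(removed + remaining - 1.0) < 1e-12)   # they always add up to m([0,1]) = 1
```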
1.3.3 Lebesgue-Stieltjes measures on R2
Let F : R2 → R satisfy
$$F(a_2, b_2) - F(a_2, b_1) - F(a_1, b_2) + F(a_1, b_1) \ge 0 \qquad (3.9)$$
and
$$F(a_2, b_2) - F(a_1, b_1) \ge 0, \qquad (3.10)$$
for all a1 ≤ a2, b1 ≤ b2. Extend F to R̄2 by an appropriate limiting procedure. Let
$$C_2 \equiv \{I_1 \times I_2 : I_1, I_2 \in C\}, \qquad (3.11)$$
where C is as in (3.7). Next, for I1 = (a1, a2], I2 = (b1, b2], −∞ < a1, a2, b1, b2 < ∞, set
$$\mu_F(I_1 \times I_2) \equiv F(a_2+, b_2+) - F(a_2+, b_1+) - F(a_1+, b_2+) + F(a_1+, b_1+), \qquad (3.12)$$
where for any a, b ∈ R, F(a+, b+) is defined as
$$F(a+, b+) \equiv \lim_{a' \downarrow a,\, b' \downarrow b} F(a', b').$$
Note that by (3.10), the limit exists and hence F(a+, b+) is well defined. Further, by (3.9), the right side of (3.12) is nonnegative. Next extend the definition of µF to unbounded sets in C2 by the limiting procedure:
$$\mu_F(I_1 \times I_2) = \lim_{n \to \infty} \mu_F\big((I_1 \times I_2) \cap J_n\big), \qquad (3.13)$$
where Jn = (−n, n] × (−n, n]. Then it may be verified (Problems 1.24, 1.25) that
(i) C2 is a semialgebra;

(ii) µF is a measure on C2.

Let (R2, Mµ∗F, µ∗F) be the measure space constructed as in the above two theorems. The measure µ∗F is called the Lebesgue-Stieltjes measure generated by F and Mµ∗F a Lebesgue-Stieltjes σ-algebra. Again, in this case, Mµ∗F includes the σ-algebra σ⟨C2⟩ = B(R2), and so (R2, B(R2), µ∗F) is also a measure space. If F(a, b) = ab, then µF is called the Lebesgue measure on R2.
A similar construction holds for any Rk , k < ∞.
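For a concrete instance of (3.12) (an illustration, not from the text), take F(a, b) = F1(a)F2(b) for two one-dimensional cdfs F1, F2, which corresponds to a product (independence) structure; the rectangle formula then returns (F1(a2) − F1(a1)) times (F2(b2) − F2(b1)), and is nonnegative as (3.9) requires.

```python
def rect_mass(F, a1, a2, b1, b2):
    """mu_F((a1, a2] x (b1, b2]) via the inclusion-exclusion formula (3.12),
    assuming F is continuous so that F(a+, b+) = F(a, b)."""
    return F(a2, b2) - F(a2, b1) - F(a1, b2) + F(a1, b1)

# Product of two uniform(0, 1) cdfs as an illustrative bivariate F.
U = lambda x: min(max(x, 0.0), 1.0)
F = lambda a, b: U(a) * U(b)

print(rect_mass(F, 0.2, 0.6, 0.1, 0.5))            # 0.4 * 0.4 = 0.16
print((U(0.6) - U(0.2)) * (U(0.5) - U(0.1)))       # same value, by the product form
```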
1.3.4 More on extension of measures
Next, the uniqueness of the extension µ∗ and approximation of the µ∗-measure of a set in Mµ∗ by that of a set from the algebra A ≡ A(C) are considered. As in the case of measures defined on an algebra, a measure µ on a semialgebra C ⊂ P(Ω) is said to be σ-finite if there exists a countable collection {An}n≥1 ⊂ C such that (i) µ(An) < ∞ for each n ≥ 1 and (ii) $\bigcup_{n \ge 1} A_n = \Omega$. The following approximation result holds.
Theorem 1.3.4: Let A ∈ Mµ∗ and µ∗(A) < ∞. Then, for each ε > 0, there exist B1, B2, . . . , Bk ∈ C, k < ∞, with Bi ∩ Bj = ∅ for 1 ≤ i ≠ j ≤ k, such that
$$\mu^*\Big(A \,\Delta\, \bigcup_{j=1}^{k} B_j\Big) < \varepsilon,$$
where for any two sets E1 and E2, E1 ∆ E2 is the symmetric difference of E1 and E2, defined by E1 ∆ E2 ≡ (E1 ∩ E2c) ∪ (E1c ∩ E2).
Proof: By definition of µ∗, µ∗(A) < ∞ implies that for every ε > 0, there exist {Bn}n≥1 ⊂ C such that $A \subset \bigcup_{n \ge 1} B_n$ and
$$\mu^*(A) \le \sum_{n=1}^{\infty} \mu(B_n) \le \mu^*(A) + \varepsilon/2 < \infty.$$
Since Bn ∈ C, Bnc is a finite union of disjoint sets from C. W.l.o.g., it can be assumed that {Bn}n≥1 are disjoint. (Otherwise, one can consider the sequence B1, B2 ∩ B1c, B3 ∩ B2c ∩ B1c, . . . .) Next, $\sum_{n=1}^{\infty} \mu(B_n) < \infty$ implies that for every ε > 0, there exists k ∈ N such that $\sum_{n=k+1}^{\infty} \mu(B_n) < \varepsilon/2$. Since both A and $\bigcup_{j=1}^{k} B_j$ are subsets of $\bigcup_{n \ge 1} B_n$,
$$\mu^*\Big(A \,\Delta\, \Big[\bigcup_{j=1}^{k} B_j\Big]\Big) \le \mu^*\Big(A^c \cap \Big[\bigcup_{j=1}^{\infty} B_j\Big]\Big) + \mu^*\Big(\bigcup_{j=k+1}^{\infty} B_j\Big).$$
But since µ∗ is a measure on (Ω, Mµ∗), $\mu^*\big(\bigcup_{n \ge 1} B_n\big) = \mu^*(A) + \mu^*\big(\big(\bigcup_{n \ge 1} B_n\big) \cap A^c\big)$. Further, since µ∗(A) < ∞,
$$\begin{aligned}
\mu^*\Big(\Big[\bigcup_{n \ge 1} B_n\Big] \cap A^c\Big) &= \mu^*\Big(\bigcup_{n \ge 1} B_n\Big) - \mu^*(A) \\
&\le \sum_{n=1}^{\infty} \mu^*(B_n) - \mu^*(A) \quad \text{(since } \mu^* \text{ is countably subadditive)} \\
&= \sum_{n=1}^{\infty} \mu(B_n) - \mu^*(A) \quad \text{(since } \mu^* = \mu \text{ on } C\text{)} \\
&< \frac{\varepsilon}{2} \quad \text{(by the choice of } \{B_n\}_{n \ge 1}\text{)}.
\end{aligned}$$
Also, by the definition of k,
$$\mu^*\Big(\bigcup_{j=k+1}^{\infty} B_j\Big) \le \sum_{j=k+1}^{\infty} \mu^*(B_j) = \sum_{j=k+1}^{\infty} \mu(B_j) < \frac{\varepsilon}{2}.$$
Thus, $\mu^*\big(A \,\Delta\, \big[\bigcup_{j=1}^{k} B_j\big]\big) < \varepsilon$. This completes the proof of the theorem.
Thus, every µ∗-measurable set of finite measure is nearly a finite union of disjoint elements from the semialgebra C. This was enunciated by J. E. Littlewood as the first of his three principles of approximation (the other two being: every Lebesgue measurable function is nearly continuous (cf. Theorem 2.5.12), and every almost everywhere convergent sequence of functions on a finite measure space is nearly uniformly convergent (Egorov's theorem, cf. Theorem 2.5.11)).
One may strengthen Theorem 1.3.4 to prove the following result on regularity of Radon measures on (Rk, B(Rk)) (cf. Problem 1.32). See also Rudin (1987), Chapter 2.
Corollary 1.3.5: (Regularity of measures). Let µ be a Radon measure on
(Rk , B(Rk )) for some k ∈ N, i.e., µ(A) < ∞ for all bounded sets A ∈ B(Rk ).
Let A ∈ B(Rk ) be such that µ(A) < ∞. Then, for each # > 0, there exist a
compact set K and an open set G such that K ⊂ A ⊂ G and µ(G \ K) < #.
The following uniqueness result can be established by an application of the above approximation theorem (Theorem 1.3.4) or by applying the π-λ theorem (Corollary 1.1.3), as in Theorem 1.2.4 (Problem 1.26).

Theorem 1.3.6: (Uniqueness). Let µ be a σ-finite measure on a semialgebra C. Let ν be a measure on the measurable space (Ω, σ⟨C⟩) such that ν = µ on C. Then ν = µ∗ on σ⟨C⟩.
An application of Theorem 1.3.4 yields a useful approximation result known as Lusin's theorem (see Theorem 2.1.3 in Section 2.1) for approximating Borel measurable functions by continuous functions.
1.4 Completeness of measures
Recall from Definition 1.3.6 that a measure space (Ω, F, µ) is called complete if for any A ∈ F, µ(A) = 0, B ⊂ A ⇒ B ∈ F. That is, for any set A in F whose µ-measure is zero, all subsets of A are also in F. For example, the very construction of the Lebesgue-Stieltjes measure for a nondecreasing F on R, discussed in Section 1.3, yields a complete measure space (R, Mµ∗F, µ∗F). The Borel σ-algebra B(R) is a sub-σ-algebra of Mµ∗F, and (R, B(R), µ∗F) need not be complete. For example, if µF is the Lebesgue measure, then the Cantor set C (Section 1.3.2) has measure 0, and hence M ≡ the Lebesgue σ-algebra contains the power set of C and thus has cardinality larger than that of R. It can be shown that the cardinality of B(R) is the same as that of R (see Hewitt and Stromberg (1965)). For another example, if F is a degenerate distribution at 0, i.e., F(x) = 0 for x < 0 and F(x) = 1 for x ≥ 0, then Mµ∗F = P(R), the power set of R (Problem 1.28), and hence (R, B(R), µ∗F) is not complete. The same is true for any discrete distribution function. However, it is always possible to complete an incomplete measure space (Ω, F, µ) by adding new sets to F. This procedure is discussed below.
Theorem 1.4.1: Let (Ω, F, µ) be a measure space. Let F̃ ≡ {A : B1 ⊂ A ⊂
B2 for some B1 , B2 ∈ F satisfying µ(B2 \ B1 ) = 0}. For any A ∈ F̃, set
µ̃(A) = µ(B1 ) = µ(B2 ) for any pair of sets B1 , B2 ∈ F with B1 ⊂ A ⊂ B2
and µ(B2 \ B1 ) = 0. Then
(i) F̃ is a σ-algebra and F ⊂ F̃,
(ii) µ̃ is well defined,
(iii) (Ω, F̃, µ̃) is a complete measure space and µ̃ = µ on F.
Proof:

(i) Since A ∈ F̃, there exist B1, B2 ∈ F with B1 ⊂ A ⊂ B2 and µ(B2 \ B1) = 0. Clearly, B2c ⊂ Ac ⊂ B1c, with B1c, B2c ∈ F and µ(B1c \ B2c) = µ(B2 \ B1) = 0, and so Ac ∈ F̃. Next, let $\{A_n\}_{n=1}^{\infty} \subset \tilde{F}$ and $A = \bigcup_{n \ge 1} A_n$. Then, for each n there exist B1n and B2n in F such that B1n ⊂ An ⊂ B2n and µ(B2n \ B1n) = 0. Let $B_1 = \bigcup_{n \ge 1} B_{1n}$ and $B_2 = \bigcup_{n \ge 1} B_{2n}$. Then B1 ⊂ A ⊂ B2, B1, B2 ∈ F, and $B_2 \setminus B_1 \subset \bigcup_{n \ge 1} (B_{2n} \setminus B_{1n})$, and hence $\mu(B_2 \setminus B_1) \le \sum_{n=1}^{\infty} \mu(B_{2n} \setminus B_{1n}) = 0$. Thus, A ∈ F̃ and hence F̃ is a σ-algebra. Clearly, F ⊂ F̃ since for every A ∈ F, one may take B1 = B2 = A.
(ii) Let B1 ⊂ A ⊂ B2 and B1′ ⊂ A ⊂ B2′, with B1, B1′, B2, B2′ ∈ F and µ(B2 \ B1) = 0 = µ(B2′ \ B1′). Then B1 ∪ B1′ ⊂ A ⊂ B2 ∩ B2′ and (B2 ∩ B2′) \ (B1 ∪ B1′) = (B2 ∩ B2′) ∩ (B1c ∩ B1′c) ⊂ B2 ∩ B1c. Thus
$$\mu\big([B_2 \cap B_2'] \setminus [B_1 \cup B_1']\big) = 0.$$
Hence, µ(B2) = µ(B1) + µ(B2 \ B1) = µ(B1) ≤ µ(B1 ∪ B1′) = µ(B2 ∩ B2′) ≤ µ(B2′). By symmetry, µ(B2′) ≤ µ(B2) and so µ(B2) = µ(B2′). But µ(B2) = µ(B1) and µ(B2′) = µ(B1′), so all four quantities agree.
(iii) It remains to show that µ̃ is countably additive and complete on F̃. Let $\{A_n\}_{n=1}^{\infty}$ be a disjoint sequence of sets from F̃ and let $A = \bigcup_{n \ge 1} A_n$. Let {B1n}n≥1, {B2n}n≥1, B1, B2 be as in the proof of (i). Then, the fact that $\{A_n\}_{n=1}^{\infty}$ are disjoint implies that $\{B_{1n}\}_{n=1}^{\infty}$ are also disjoint. And since $B_1 = \bigcup_{n \ge 1} B_{1n}$ and µ is a measure on (Ω, F),
$$\mu(B_1) = \sum_{n=1}^{\infty} \mu(B_{1n}).$$
Also, by definition of the B1n's, µ(B1n) = µ̃(An) for all n ≥ 1, and by (i), µ̃(A) = µ(B1). Thus,
$$\tilde{\mu}(A) = \mu(B_1) = \sum_{n=1}^{\infty} \mu(B_{1n}) = \sum_{n=1}^{\infty} \tilde{\mu}(A_n),$$
establishing the countable additivity of µ̃.

Next, suppose that A ∈ F̃ and µ̃(A) = 0. Then there exist B1, B2 ∈ F such that B1 ⊂ A ⊂ B2 and µ(B2 \ B1) = 0. Further, by definition of µ̃, µ(B2) = µ̃(A) = 0. If D ⊂ A, then ∅ ⊂ D ⊂ B2 and µ(B2 \ ∅) = 0. Therefore, D ∈ F̃ and hence (Ω, F̃, µ̃) is complete.

Finally, if A ∈ F, then take B1 = B2 = A and so µ̃(A) = µ(B1) = µ(A); thus µ̃ = µ on F. Hence, the proof of the theorem is complete.
1.5 Problems
1.1 Let Ω be a nonempty set. Show that F ≡ {Ω, ∅} and G = P(Ω) ≡
{A : A ⊂ Ω} are both σ-algebras.
1.2 Let Ω be a finite set, i.e., the number of elements in Ω is finite. Let
F ⊂ P(Ω) be an algebra. Show that F is a σ-algebra.
1.3 Show that F6 in Example 1.1.4 is a σ-algebra.
1.4 Let {Fθ : θ ∈ Θ} be a family of σ-algebras on Ω. Show that $G \equiv \bigcap_{\theta \in \Theta} F_\theta$ is also a σ-algebra.
1.5 Let Ω = {1, 2, 3}, F1 = {{1}, {2, 3}, Ω, ∅}, F2 = {{1, 2}, {3}, Ω, ∅}.
Verify that F1 and F2 are both algebras (in fact, σ-algebras) but
F1 ∪ F2 is not an algebra.
1.6 Let Ω be a nonempty set and let A ≡ {Ai : i ∈ N} be a partition of Ω, i.e., Ai ∩ Aj = ∅ for all i ≠ j and $\bigcup_{n \ge 1} A_n = \Omega$. Let $F = \{\bigcup_{i \in J} A_i : J \subset \mathbb{N}\}$ where, for J = ∅, $\bigcup_{i \in J} A_i \equiv \emptyset$. Show that F is a σ-algebra.
1.7 Let Ω be a nonempty set and let B ≡ {Bi : 1 ≤ i ≤ k < ∞} ⊂ P(Ω), B not necessarily a partition. Find σ⟨B⟩.
(Hint: For each δ = (δ1, δ2, . . . , δk), δi ∈ {0, 1}, let $B_\delta = \bigcap_{i=1}^{k} B_i(\delta_i)$, where Bi(0) = Bic and Bi(1) = Bi, 1 ≤ i ≤ k. Show that $\sigma\langle B\rangle = \{E : E = \bigcup_{\delta \in J} B_\delta,\ J \subset \{0, 1\}^k\}$.)
1.8 Let Ω ≡ {1, 2, . . .} = N and Ai ≡ {j : j ∈ N, j ≥ i}, i ∈ N. Show that σ⟨A⟩ = P(Ω), where A = {Ai : i ∈ N}.
1.9 (a) Show that every open set A ⊂ R is a countable union of open
intervals.
(Hint: Use the fact that the set Q of all rational numbers is
dense in R.)
(b) Extend the above to Rk for any k ∈ N.
(c) Strengthen (a) to assert that the open intervals in (a) can be
chosen to be disjoint.
1.10 Show that in Example 1.1.6, Oj ⊂ σ⟨Oi⟩ for all 1 ≤ i, j ≤ 4.
1.11 For k ∈ N, let O5 ≡ {(a1, b1] × . . . × (ak, bk] : −∞ < ai < bi < ∞, 1 ≤ i ≤ k} and O6 ≡ {[a1, b1] × . . . × [ak, bk] : −∞ < ai < bi < ∞, 1 ≤ i ≤ k}. Show that σ⟨O5⟩ = σ⟨O6⟩ = B(Rk).
1.12 Let OS ≡ {{x} : x ∈ Rk} be the class of all singletons in Rk, k ∈ N. Show that σ⟨OS⟩ is properly contained in B(Rk).
(Hint: Show that σ⟨OS⟩ coincides with F6 of Example 1.1.4.)
1.13 Let R̄ ≡ R ∪ {+∞} ∪ {−∞} be the extended real line. The Borel σ-algebra on R̄, denoted by B(R̄), is defined as the σ-algebra on R̄ generated by the collection B(R) ∪ {∞} ∪ {−∞}.

(a) Show that B(R̄) = {A ∪ B : A ∈ B(R), B ⊂ {−∞, ∞}}.

(b) Show, however, that the σ-algebra on R̄ generated by B(R) is given by σ⟨B(R)⟩ = {A ∪ B : A ∈ B(R), B = {−∞, ∞} or B = ∅}.
1.14 Let A1, A2, and A3 denote, respectively, the class of triangles, discs, and pentagons in R2. Show that σ⟨Ai⟩ = B(R2) for each i. Thus, the σ-algebra B(R2) (and similarly B(Rk)) can be generated by starting with various classes of sets of different shapes and geometry.
1.15 Let Ω be a nonempty set and B be a σ-algebra on Ω. Let A ⊂ Ω
and BA ≡ {B ∩ A : B ∈ B}. Show that BA is a σ-algebra on A. The
σ-algebra BA is called the trace σ-algebra of B on A.
1.16 Let Ω = R and F be the collection of all finite unions of disjoint
intervals of the form (a, b] ∩ R, −∞ ≤ a < b ≤ ∞. Show that F is an
algebra but not a σ-algebra.
1.17 Let Ω be a nonempty set and {Ai}i∈N be a sequence of subsets of Ω such that Ai+1 ⊂ Ai for all i ∈ N. Verify that A = {Ai : i ∈ N} is a π-system and also determine λ⟨A⟩, the λ-system generated by A.
1.18 Let Ω ≡ N, F = P(Ω), and An = {j : j ∈ N, j ≥ n}, n ∈ N. Let µ be the counting measure on (Ω, F). Verify that $\lim_{n \to \infty} \mu(A_n) \ne \mu\big(\bigcap_{n \ge 1} A_n\big)$.
1.19 Let Ω be a nonempty set and let C ⊂ P(Ω) be a semialgebra. Let
$$A(C) \equiv \Big\{A : A = \bigcup_{i=1}^{k} B_i,\ B_i \in C,\ i = 1, 2, \ldots, k,\ k \in \mathbb{N}\Big\}.$$

(a) Show that A(C) is the smallest algebra containing C.

(b) Show also that σ⟨C⟩ = σ⟨A(C)⟩.
1.20 Let µ∗ be as in (3.1) of Section 1.3. Verify (3.4)–(3.6).
(Hint: Fix 0 < ε < ∞. If µ∗(An) < ∞ for all n ∈ N, then find, for each n, a cover {Anj}j≥1 ⊂ C such that $\mu^*(A_n) \le \sum_{j=1}^{\infty} \mu(A_{nj}) + \frac{\varepsilon}{2^n}$.)
1.21 Prove Proposition 1.3.1.
1.22 Let F : R → R be nondecreasing. Let (a, b], (an, bn], n ∈ N, be intervals in R such that $(a, b] = \bigcup_{n \ge 1} (a_n, b_n]$ and {(an, bn] : n ≥ 1} are disjoint. Let µF(·) be as in (3.8). Show that $\mu_F((a, b]) = \sum_{n=1}^{\infty} \mu_F((a_n, b_n])$ by completing the following steps:

(a) Let G(x) ≡ F(x+) for all x ∈ R and let G(±∞) = F(±∞). Verify that G(·) is nondecreasing and right continuous on R and that for any A in C, µF(A) = µG(A).

(b) In view of (a), assume w.l.o.g. that F(·) is right continuous. Show that for any k ∈ N,
$$F(b) - F(a) \ge \sum_{i=1}^{k} (F(b_i) - F(a_i)),$$
and conclude that
$$F(b) - F(a) \ge \sum_{i=1}^{\infty} (F(b_i) - F(a_i)).$$

(c) To prove the reverse inequality, fix η > 0. Choose c > a and dn > bn, n ≥ 1, such that F(c) − F(a) < η, $[c, b] \subset \bigcup_{n \ge 1} (a_n, d_n)$ and $F(d_n) - F(b_n) < \eta/2^n$ for all n ∈ N. Next, apply the Heine-Borel theorem to the interval [c, b] and the open cover {(an, dn)}n≥1 and extract a finite cover $\{(a_i, d_i)\}_{i=1}^{k}$ for [c, b]. W.l.o.g., assume that c ∈ (a1, d1) and b ∈ (ak, dk). Now verify that
$$F(b) - F(a) \le \sum_{j=1}^{k} (F(b_j) - F(a_j)) + 2\eta \le \sum_{j=1}^{\infty} (F(b_j) - F(a_j)) + 2\eta.$$
1.23 Extend the above arguments to the case when (a, b] and (ai , bi ], i ≥ 1
are not necessarily bounded intervals.
1.24 Verify that C2 , defined in (3.11), is a semialgebra.
1.25 (a) Verify that the limit in (3.13) exists.
(b) Extend the arguments in Problems 1.22 and 1.23 to verify that
µF of (3.12) and (3.13) is a measure on C2 .
1.26 Establish Theorem 1.3.6 by completing the following:

(a) Suppose first that ν(Ω) < ∞. Verify that L ≡ {A : A ∈ σ⟨C⟩, µ∗(A) = ν(A)} is a λ-system and use the π-λ theorem.

(b) Extend the above to the σ-finite case.
1.27 Prove Corollary 1.3.5 for Lebesgue measure m(·).
1.28 Let F be a discrete distribution function, i.e., F is of the form
$$F(x) \equiv \sum_{j=1}^{\infty} a_j I(x_j \le x), \quad x \in \mathbb{R},$$
where 0 < aj < ∞, $\sum_{j \ge 1} a_j = 1$, xj ∈ R, j ≥ 1. Show that Mµ∗F = P(R).
(Hint: Show that µ∗F(Ac) = 0, where A ≡ {xj : j ≥ 1}, and use the fact that for any B ⊂ R, B ∩ A ∈ B(R).)
1.29 Let
$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1. \end{cases}$$
Show that Mµ∗F ≡ {A : A ∈ P(R), A ∩ [0, 1] ∈ M}, where M is the σ-algebra of Lebesgue measurable sets as in Definition 1.3.8.
1.30 Let $F(\cdot) = \frac{1}{2}\Phi(\cdot) + \frac{1}{2}F_P(\cdot)$, where Φ(·) is the standard normal cdf, i.e., $\Phi(x) \equiv \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\, du$, and $F_P(x) \equiv \sum_{k=0}^{\infty} e^{-2} \frac{2^k}{k!} I_{(-\infty, x]}(k)$, x ∈ R. Let F1 = Φ, F2 = FP, and F3 = F. Let A1 = (0, 1), A2 = {x : x ∈ R, sin x ∈ (0, ½)}, and A3 = {x : for some integers a0, a1, . . . , ak, k < ∞, not all zero, $\sum_{i=0}^{k} a_i x^i = 0$}, the set of all algebraic numbers. Compute µFi(Aj), 1 ≤ i, j ≤ 3.
1.31 Let µ be a measure on a semialgebra C ⊂ P(Ω), where Ω is a nonempty set. Let µ∗ be the outer measure generated by µ and let Mµ∗ be the σ-algebra of µ∗-measurable sets as defined in Theorem 1.3.3.

(a) Show that for all A ⊂ Ω, there exists a B ∈ σ⟨C⟩ such that A ⊂ B and µ∗(A) = µ∗(B).
(Hint: If µ∗(A) = ∞, take B to be Ω. If µ∗(A) < ∞, use the definition of µ∗ to show that for each n ≥ 1, there exists {Bnj}j≥1 ⊂ C such that $A \subset B_n \equiv \bigcup_{j \ge 1} B_{nj}$ and $\mu^*(A) \le \sum_{j=1}^{\infty} \mu(B_{nj}) < \mu^*(A) + \frac{1}{n}$. Take $B = \bigcap_{n \ge 1} B_n$.)

(b) Show that for all A ∈ Mµ∗ with µ∗(A) < ∞, there exists B ∈ σ⟨C⟩ such that A ⊂ B and µ∗(B \ A) = 0.
(Hint: Use (a) and the relation B = A ∪ (B \ A) with A and B \ A = B ∩ Ac in Mµ∗.)

(c) Show that if µ is σ-finite (i.e., there exist sets Ωn, n ≥ 1, in C with µ(Ωn) < ∞ for all n ≥ 1 and $\bigcup_{n \ge 1} \Omega_n = \Omega$), then in (b), the hypothesis that µ∗(A) < ∞ can be dropped.
(Hint: Assume w.l.o.g. that {Ωn}n≥1 are disjoint. Apply (b) to {An ≡ A ∩ Ωn}n≥1.)

(d) Show that if µ is σ-finite, then A ∈ Mµ∗ iff there exist sets B1, B2 ∈ σ⟨C⟩ such that B1 ⊂ A ⊂ B2 and µ∗(B2 \ B1) = 0.
(Hint: Apply (c) to both A and Ac.)

This shows that Mµ∗ is the completion of σ⟨C⟩ w.r.t. µ∗.
1.32 (An outline of a proof of Corollary 1.3.5). Let (R, Mµ∗F, µ∗F) be a Lebesgue-Stieltjes measure space generated by a right continuous and nondecreasing function F : R → R.
(a) Show that A ∈ Mµ∗F iff there exist Borel sets B1 and B2 ∈ B(R) such that B1 ⊂ A ⊂ B2 and µ∗F(B2 \ B1) = 0.
(Hint: Take C to be the semialgebra C = {(a, b] : −∞ ≤ a ≤ b < ∞} ∪ {(b, ∞) : −∞ ≤ b < ∞} and apply Problem 1.31 (d).)

(b) Let A ∈ Mµ∗F with µ∗F(A) < ∞. Show that for any ε > 0, there exist a finite number of bounded open intervals Ij, j = 1, 2, . . . , k, such that $\mu_F^*\big(A \,\Delta\, \bigcup_{j=1}^{k} I_j\big) < \varepsilon$.
(Outline: Claim: For any B ∈ C with µF(B) < ∞, there exists an open interval I such that µ∗F(I ∆ B) < ε.
To see this, note that if B = (a, b], −∞ ≤ a < b < ∞, then one may choose b′ > b such that F(b′) − F(b) < ε. Now, with I = (a, b′), µ∗F(I ∆ B) = µ∗F((b, b′)) = µF((b, b′)) ≤ F(b′) − F(b) < ε. If B = (b, ∞) and µF(B) < ∞, then there exists b′ > b such that F(∞) − F(b′−) < ε. Hence, with I = (b, b′), µ∗F(I ∆ B) = µ∗F([b′, ∞)) = F(∞) − F(b′−) < ε. This proves the claim.
Next, by Theorem 1.3.4, for all ε > 0, there exist B1, B2, . . . , Bk ∈ C such that $\mu_F^*\big(A \,\Delta\, \bigcup_{j=1}^{k} B_j\big) < \varepsilon/2$. For each Bj, find Ij, a bounded open interval such that $\mu_F^*(B_j \,\Delta\, I_j) < \varepsilon/2^{j+1}$. Since (A1 ∪ A2) ∆ (C1 ∪ C2) ⊂ (A1 ∆ C1) ∪ (A2 ∆ C2) for any A1, A2, C1, C2 ⊂ Ω, it follows that
$$\mu_F^*\Big(\Big[\bigcup_{j=1}^{k} B_j\Big] \,\Delta\, \Big[\bigcup_{j=1}^{k} I_j\Big]\Big) \le \sum_{j=1}^{k} \mu_F^*(B_j \,\Delta\, I_j) < \frac{\varepsilon}{2}.$$
Hence, $\mu_F^*\big(A \,\Delta\, \big[\bigcup_{j=1}^{k} I_j\big]\big) < \varepsilon$.)
(c) Let A ∈ Mµ∗F with µ∗F(A) < ∞. Show that for every ε > 0, there exists an open set O such that A ⊂ O and µ∗F(O \ A) < ε.
(Hint: By definition of µ∗F, for every ε > 0, there exist {Bj}j≥1 ⊂ C such that $A \subset \bigcup_{j \ge 1} B_j$ and $\mu_F^*(A) \le \sum_{j=1}^{\infty} \mu_F(B_j) \le \mu_F^*(A) + \varepsilon$. Now, as in (b), there exist open intervals Ij such that Bj ⊂ Ij and $\mu_F^*(I_j \setminus B_j) < \varepsilon/2^j$ for all j ≥ 1. Then $A \subset \bigcup_{j=1}^{\infty} B_j \subset \bigcup_{j=1}^{\infty} I_j \equiv O$. Also, µ∗F(O) ≤ µ∗F(A) + 2ε ⇒ µ∗F(O \ A) = µ∗F(O) − µ∗F(A) ≤ 2ε (since $\mu_F^*(O) \le \sum_{j=1}^{\infty} \mu_F^*(I_j) \le \sum_{j=1}^{\infty} \mu_F(B_j) + \varepsilon \le \mu_F^*(A) + 2\varepsilon < \infty$).)

(d) Extend (c) to all A ∈ Mµ∗F.
(Hint: Let Ai = A ∩ [i, i + 1], i ∈ Z. Apply (c) to Ai with $\varepsilon_i = \varepsilon/2^{|i|+1}$ and take unions.)
(e) Show that for all A ∈ Mµ∗F and for all ε > 0, there exist a closed set C and an open set O such that C ⊂ A ⊂ O and µ∗F(O \ A) < ε, µ∗F(A \ C) < ε.
(Hint: Apply (d) to A and Ac.)
(f) Show that for all A ∈ Mµ∗F with µ∗F(A) < ∞ and for all ε > 0, there exist a closed and bounded set F ⊂ A such that µ∗F(A \ F) < ε and an open set O with A ⊂ O such that µ∗F(O \ A) < ε.
(Hint: Apply (d) to A ∩ [−M, M], where M is chosen so that µ∗F(A ∩ [−M, M]c) < ε. Why is this possible?)

Remark: Thus for all A ∈ Mµ∗F with µ∗F(A) < ∞ and for all ε > 0, there exist a compact set K ⊂ A and an open set O ⊃ A such that µ∗F(A \ K) < ε and µ∗F(O \ A) < ε. The first property is called inner regularity of µ∗F and the second property is called outer regularity of µ∗F.
(g) Show that for all A ∈ Mµ∗F with µ∗F(A) < ∞ and for all ε > 0, there exists a continuous function gε with compact support (i.e., gε(x) is zero for |x| large) such that
$$\mu_F^*\big(A \,\Delta\, g_\varepsilon^{-1}\{1\}\big) < \varepsilon.$$
(Hint: For any bounded open interval (a, b), let η > 0 be such that µF((a, a + η]) + µF([b − η, b)) < ε. Next define
$$g_\varepsilon(x) = \begin{cases} 1 & \text{if } a + \eta \le x \le b - \eta \\ 0 & \text{if } x \notin (a, b) \\ \text{linear} & \text{over } [a, a + \eta] \cup [b - \eta, b]. \end{cases}$$
Then gε is continuous with compact support. Also, $g_\varepsilon^{-1}\{1\} = [a + \eta, b - \eta]$ and $(a, b) \,\Delta\, g_\varepsilon^{-1}\{1\} = (a, a + \eta) \cup (b - \eta, b)$. So $\mu_F\big((a, b) \,\Delta\, g_\varepsilon^{-1}\{1\}\big) < \varepsilon$, proving the claim for A = (a, b). The general case follows from (b).)
(h) Show that for all A ∈ Mµ∗F and for all ε > 0, there exists a continuous function gε (not necessarily with compact support) such that $\mu_F^*\big(A \,\Delta\, g_\varepsilon^{-1}\{1\}\big) < \varepsilon$ (i.e., drop the condition µ∗F(A) < ∞).
(Hint: Let Ak = A ∩ [k, k + 1], k ∈ Z. Find gk : R → R continuous with support in $\big(k - \frac{1}{8}, k + \frac{9}{8}\big)$ such that $\mu_F^*\big(\{I_{A_k} \ne g_k\}\big) < \varepsilon/2^{|k|+1}$. Let $g = \sum_{k \in \mathbb{Z}} g_k$. Note that for any x ∈ R, at most two gk(x) ≠ 0 and so g is continuous. Also, $\mu_F^*\big(\{I_A \ne g\}\big) \le \sum_{k \in \mathbb{Z}} \mu_F^*\big(\{I_{A_k} \ne g_k\}\big) < \varepsilon$.)
1.33 Let C be the Cantor set in [0, 1] as defined in Section 1.3.2.

(a) Show that
$$C = \Big\{x : x = \sum_{i=1}^{\infty} \frac{a_i}{3^i},\ a_i \in \{0, 2\}\Big\},$$
and hence that C is uncountable.

(b) Show that
$$C + C \equiv \{x + y : x, y \in C\} = [0, 2].$$
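As a numerical companion to Problem 1.33 (b) (an illustration, not a proof), the sketch below forms the stage-n left endpoints of the Cantor construction, namely the sums $\sum_{i=1}^{n} a_i 3^{-i}$ with $a_i \in \{0, 2\}$, and checks that their pairwise sums form an evenly spaced grid in [0, 2] with gaps $2 \cdot 3^{-n}$, consistent with C + C = [0, 2].

```python
from fractions import Fraction
from itertools import product

def cantor_stage_points(n):
    """Left endpoints of the 2^n closed intervals at stage n: sums of a_i/3^i with a_i in {0, 2}."""
    return [sum(Fraction(a, 3 ** (i + 1)) for i, a in enumerate(digits))
            for digits in product((0, 2), repeat=n)]

n = 6
pts = cantor_stage_points(n)
sums = sorted({x + y for x in pts for y in pts})       # exact arithmetic with Fraction
gaps = {b - a for a, b in zip(sums, sums[1:])}
print(len(sums), gaps, max(sums))   # 3^n values, constant gap 2/3^n, max = 2 - 2/3^n
```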
2 Integration
2.1 Measurable transformations
Oftentimes, one is not interested in the full details of a measure space
(Ω, F, µ) but only in certain functions defined on Ω. For example, if Ω represents the outcomes of 10 tosses of a fair coin, one may only be interested
in knowing the number of heads in the 10 tosses. It turns out that to assign
measures (probabilities) to sets (events) involving such functions, one can
allow only certain functions (called measurable functions) that satisfy some
‘natural’ restrictions, specified in the following definitions.
Definition 2.1.1: Let Ω be a nonempty set and let F be a σ-algebra on
Ω. Then the pair (Ω, F) is called a measurable space. If µ is a measure on
(Ω, F), then the triplet (Ω, F, µ) is called a measure space. If in addition,
µ is a probability measure, then (Ω, F, µ) is called a probability space.
Definition 2.1.2:

(a) Let (Ω, F) be a measurable space. Then a function f : Ω → R is called ⟨F, B(R)⟩-measurable (or F-measurable) if for each a in R,
$$f^{-1}\big((-\infty, a]\big) \equiv \{\omega : f(\omega) \le a\} \in F. \qquad (1.1)$$

(b) Let (Ω, F, P) be a probability space. Then a function X : Ω → R is called a random variable if the event
$$X^{-1}\big((-\infty, a]\big) \equiv \{\omega : X(\omega) \le a\} \in F$$
for each a in R, i.e., a random variable is a real valued F-measurable function on a probability space (Ω, F, P).
It will be shown later that condition (1.1) on f is equivalent to the
stronger condition that f −1 (A) ∈ F for all Borel sets A ∈ B(R). Since for
any Borel set A ∈ B(R), f −1 (A) is a member of the underlying σ-algebra
F, one can assign a measure to the set f −1 (A) using a measure µ on (Ω, F).
Note that for an arbitrary function T from Ω → R, T −1 (A) need not be
a member of F and hence such an assignment may not be possible. Thus,
condition (1.1) on real valued mappings is a ‘natural’ requirement while
dealing with measure spaces.
The following definition generalizes (1.1) to maps between two measurable spaces.

Definition 2.1.3: Let (Ωi, Fi), i = 1, 2, be measurable spaces. Then, a mapping T : Ω1 → Ω2 is called measurable with respect to the σ-algebras ⟨F1, F2⟩ (or ⟨F1, F2⟩-measurable) if
$$T^{-1}(A) \in F_1 \quad \text{for all } A \in F_2.$$
Thus, X is a random variable on a probability space (Ω, F, P) iff X is ⟨F, B(R)⟩-measurable. Some examples of measurable transformations are given below.
Example 2.1.1: Let Ω = {a, b, c, d}, F2 = {Ω, ∅, {a}, {b, c, d}}, and let F3 = the set of all subsets of Ω. Define the mappings Ti : Ω → Ω, i = 1, 2, by
$$T_1(\omega) \equiv a \ \text{ for } \omega \in \Omega \qquad \text{and} \qquad T_2(\omega) = \begin{cases} a & \text{if } \omega = a, b \\ c & \text{if } \omega = c, d. \end{cases}$$
Then, T1 is ⟨F2, F3⟩-measurable since for any A ∈ F3, $T_1^{-1}(A)$ = Ω or ∅ according as a ∈ A or a ∉ A. By similar arguments, it follows that T2 is ⟨F3, F2⟩-measurable. However, T2 is not ⟨F2, F3⟩-measurable since $T_2^{-1}(\{a\}) = \{a, b\} \notin F_2$.
As this simple example shows, measurability of a given mapping critically depends on the σ-algebras on its domain and range spaces. In general, if T is ⟨F1, F2⟩-measurable, then T is ⟨F̃1, F2⟩-measurable for any σ-algebra F̃1 ⊃ F1, and T is ⟨F1, F̃2⟩-measurable for any F̃2 ⊂ F2.
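Because Ω is finite here, measurability can be checked by brute force: simply test whether the preimage of every set in the range σ-algebra lies in the domain σ-algebra. The sketch below (an illustration, not from the text) reproduces the conclusions of Example 2.1.1.

```python
from itertools import chain, combinations

Omega = frozenset("abcd")

def powerset(S):
    return {frozenset(c) for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))}

F2 = {frozenset(), Omega, frozenset("a"), frozenset("bcd")}
F3 = powerset(Omega)

T1 = {w: "a" for w in Omega}
T2 = {"a": "a", "b": "a", "c": "c", "d": "c"}

def preimage(T, A):
    return frozenset(w for w in Omega if T[w] in A)

def is_measurable(T, F_domain, F_range):
    # T is <F_domain, F_range>-measurable iff T^{-1}(A) lies in F_domain for all A in F_range.
    return all(preimage(T, A) in F_domain for A in F_range)

print(is_measurable(T1, F2, F3))   # True
print(is_measurable(T2, F3, F2))   # True
print(is_measurable(T2, F2, F3))   # False: T2^{-1}({a}) = {a, b} is not in F2
```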
Example 2.1.2: Let T : R → R be defined as
$$T(x) = \begin{cases} \sin 2x & \text{if } x > 0 \\ 1 + \cos x & \text{if } x \le 0. \end{cases}$$
Is T measurable w.r.t. the Borel σ-algebras, i.e., ⟨B(R), B(R)⟩-measurable? If one is to apply the definition directly, one must check that T−1(A) ∈ B(R) for all A ∈ B(R). However, finding T−1(A) for all Borel sets A is not an easy task. In many instances like this one, verification of the measurability property of a given mapping by directly using the definition can be difficult. In such situations, one may use some easy-to-verify sufficient conditions. Some results of this type are given below.
Proposition 2.1.1: Let (Ωi, Fi), i = 1, 2, 3, be measurable spaces.

(i) Suppose that F2 = σ⟨A⟩ for some class of subsets A of Ω2. If T : Ω1 → Ω2 is such that T−1(A) ∈ F1 for all A ∈ A, then T is ⟨F1, F2⟩-measurable.

(ii) Suppose that T1 : Ω1 → Ω2 is ⟨F1, F2⟩-measurable and T2 : Ω2 → Ω3 is ⟨F2, F3⟩-measurable. Let T = T2 ◦ T1 : Ω1 → Ω3 denote the composition of T1 and T2, defined by T(ω1) = T2(T1(ω1)), ω1 ∈ Ω1. Then, T is ⟨F1, F3⟩-measurable.
Proof:

(i) Define the collection of sets
$$F = \{A \in F_2 : T^{-1}(A) \in F_1\}.$$
Then,

(a) T−1(Ω2) = Ω1 ∈ F1 ⇒ Ω2 ∈ F.

(b) If A ∈ F, then T−1(A) ∈ F1 ⇒ (T−1(A))c ∈ F1 ⇒ T−1(Ac) = (T−1(A))c ∈ F1, implying Ac ∈ F.

(c) If A1, A2, . . . ∈ F, then T−1(Ai) ∈ F1 for all i ≥ 1. Since F1 is a σ-algebra, $T^{-1}\big(\bigcup_{n \ge 1} A_n\big) = \bigcup_{n \ge 1} T^{-1}(A_n) \in F_1$. Thus, $\bigcup_{n \ge 1} A_n \in F$. (See also Problem 2.1 on de Morgan's laws.)

Hence, by (a), (b), (c), F is a σ-algebra and by hypothesis A ⊂ F. Hence, F2 = σ⟨A⟩ ⊂ F ⊂ F2. Thus, F = F2 and T is ⟨F1, F2⟩-measurable.

(ii) Let A ∈ F3. Then, T2−1(A) ∈ F2, since T2 is ⟨F2, F3⟩-measurable. Also, by the ⟨F1, F2⟩-measurability of T1, T−1(A) = T1−1(T2−1(A)) ∈ F1, showing that T is ⟨F1, F3⟩-measurable.
Proposition 2.1.2: For any k, p ∈ N, if f : Rp → Rk is continuous, then f is ⟨B(Rp), B(Rk)⟩-measurable.

Proof: Let A = {A : A is an open set in Rk}. Then, by the continuity of f, f−1(A) is open and hence is in B(Rp) (cf. Section A.4). Thus, f−1(A) ∈ B(Rp) for all A ∈ A. Since B(Rk) = σ⟨A⟩, by Proposition 2.1.1 (i), f is ⟨B(Rp), B(Rk)⟩-measurable.
Although the converse to the above proposition is not true, a result due to Lusin says that except on a set of small measure, f coincides with a continuous function. This is stronger than the statement that except on a set of small measure, f is close to a continuous function. For the statement and proof of Lusin's theorem, see Theorem 2.5.12.
Proposition 2.1.3: Let f1, . . . , fk (k ∈ N) be ⟨F, B(R)⟩-measurable transformations from Ω to R. Then,

(i) f = (f1, . . . , fk) is ⟨F, B(Rk)⟩-measurable.

(ii) g = f1 + . . . + fk is ⟨F, B(R)⟩-measurable.

(iii) $h \equiv \prod_{i=1}^{k} f_i$ is ⟨F, B(R)⟩-measurable.

(iv) Let p ∈ N and let ψ : Rk → Rp be continuous. Then, ξ ≡ ψ ◦ f is ⟨F, B(Rp)⟩-measurable, where f = (f1, . . . , fk).
Proof: To prove (i), note that for any rectangle R = (a1, b1) × . . . × (ak, bk),
$$f^{-1}(R) = \{\omega \in \Omega : a_1 < f_1(\omega) < b_1, \ldots, a_k < f_k(\omega) < b_k\} = \bigcap_{i=1}^{k} \{\omega \in \Omega : a_i < f_i(\omega) < b_i\} = \bigcap_{i=1}^{k} f_i^{-1}\big((a_i, b_i)\big) \in F,$$
since each fi is ⟨F, B(R)⟩-measurable. Hence, by Proposition 2.1.1 (i), f is ⟨F, B(Rk)⟩-measurable. To prove (ii), note that the function g1(x) ≡ x1 + . . . + xk, x = (x1, . . . , xk) ∈ Rk, is continuous on Rk and hence, by Proposition 2.1.2, is ⟨B(Rk), B(R)⟩-measurable. Since g = g1 ◦ f, g is ⟨F, B(R)⟩-measurable by Proposition 2.1.1 (ii). The proofs of (iii) and (iv) are similar to that of (ii) and hence are omitted.
Corollary 2.1.4: The collection of ⟨F, B(R)⟩-measurable functions from Ω to R is closed under pointwise addition and multiplication as well as under scalar multiplication.

The proof of Corollary 2.1.4 is omitted.
In view of the above, writing the function T of Example 2.1.2 as
$$T(x) = (\sin 2x)\, I_{(0, \infty)}(x) + (1 + \cos x)\, I_{(-\infty, 0]}(x), \quad x \in \mathbb{R},$$
the ⟨B(R), B(R)⟩-measurability of T follows. Note that here T is not continuous over R but only piecewise continuous (see also Problem 2.2).
Next, measurability of the limit of a sequence of measurable functions is considered. Let R̄ = R ∪ {+∞, −∞} denote the extended real line and let B(R̄) ≡ σ⟨B(R) ∪ {+∞} ∪ {−∞}⟩ denote the extended Borel σ-algebra on R̄.
Proposition 2.1.5: For each n ∈ N, let fn : Ω → R̄ be a ⟨F, B(R̄)⟩-measurable function.

(i) Then, each of the functions supn∈N fn, infn∈N fn, lim supn→∞ fn, and lim infn→∞ fn is ⟨F, B(R̄)⟩-measurable.

(ii) The set A ≡ {ω : limn→∞ fn(ω) exists and is finite} lies in F and the function h ≡ (limn→∞ fn) · IA is ⟨F, B(R̄)⟩-measurable.
Proof:

(i) Let g = supn≥1 fn. To show that g is ⟨F, B(R̄)⟩-measurable, it is enough to show that {ω : g(ω) ≤ r} ∈ F for all r ∈ R (cf. Problem 2.4). Now, for any r ∈ R,
$$\{\omega : g(\omega) \le r\} = \bigcap_{n=1}^{\infty} \{\omega : f_n(\omega) \le r\} = \bigcap_{n=1}^{\infty} f_n^{-1}\big((-\infty, r]\big) \in F,$$
since fn−1((−∞, r]) ∈ F for all n ≥ 1, by the measurability of fn. Next note that infn≥1 fn = − supn≥1(−fn) and hence infn≥1 fn is ⟨F, B(R̄)⟩-measurable. To prove the measurability of lim supn→∞ fn, define the functions gm = supn≥m fn, m ≥ 1. Then, gm is ⟨F, B(R̄)⟩-measurable for each m ≥ 1, and since gm is nonincreasing in m, infm≥1 gm ≡ lim supn→∞ fn is also ⟨F, B(R̄)⟩-measurable. A similar argument works for lim infn→∞ fn.

(ii) Let h1 = lim supn→∞ fn and h2 = lim infn→∞ fn, and define h̃i = hi IR(hi), i = 1, 2. Note that h̃1 − h̃2 is ⟨F, B(R)⟩-measurable. Hence,
$$\begin{aligned}
\{\omega : \lim_{n \to \infty} f_n(\omega) \text{ exists and is finite}\}
&= \{\omega : -\infty < \limsup_{n \to \infty} f_n(\omega) = \liminf_{n \to \infty} f_n(\omega) < \infty\} \\
&= \{\omega : -\infty < h_2(\omega) = h_1(\omega) < \infty\} \\
&= \{\omega : \tilde{h}_1(\omega) = \tilde{h}_2(\omega)\} \cap \{\omega : h_1(\omega) < \infty,\ h_2(\omega) > -\infty\} \\
&= (\tilde{h}_1 - \tilde{h}_2)^{-1}(\{0\}) \cap \{\omega : h_1(\omega) < \infty,\ h_2(\omega) > -\infty\} \in F.
\end{aligned}$$
Finally, note that h = h1 IA.
Definition 2.1.4: Let {fλ : λ ∈ Λ} be a family of mappings from Ω1 into Ω2 and let F2 be a σ-algebra on Ω2. Then,
$$\sigma\big\langle \{f_\lambda^{-1}(A) : A \in F_2,\ \lambda \in \Lambda\} \big\rangle$$
is called the σ-algebra generated by {fλ : λ ∈ Λ} (w.r.t. F2) and is denoted by σ⟨{fλ : λ ∈ Λ}⟩.

Note that σ⟨{fλ : λ ∈ Λ}⟩ is the smallest σ-algebra on Ω1 that makes all the fλ's measurable w.r.t. F2 on Ω2.
Example 2.1.3: Let f = IA for some set A ⊂ Ω1, and let Ω2 = R and F2 = B(R). Then,
$$\sigma\langle \{f\} \rangle = \sigma\langle \{A\} \rangle = \{\Omega_1, \emptyset, A, A^c\}.$$
Example 2.1.4: Let Ω1 = Rk, Ω2 = R, F2 = B(R), and for 1 ≤ i ≤ k, let fi : Ω1 → Ω2 be defined as
$$f_i(x_1, \ldots, x_k) = x_i, \quad (x_1, \ldots, x_k) \in \Omega_1 = \mathbb{R}^k.$$
Then, σ⟨{fi : 1 ≤ i ≤ k}⟩ = B(Rk).

To show this, note that any measurable rectangle A1 × . . . × Ak can be written as $A_1 \times \ldots \times A_k = \bigcap_{i=1}^{k} f_i^{-1}(A_i)$ and hence A1 × . . . × Ak ∈ σ⟨{fi : 1 ≤ i ≤ k}⟩ for all A1, . . . , Ak ∈ B(R). Since B(Rk) is generated by the collection of all measurable rectangles, B(Rk) ⊂ σ⟨{fi : 1 ≤ i ≤ k}⟩. Conversely, for any A ∈ B(R) and for any 1 ≤ i ≤ k, fi−1(A) = R × . . . × A × . . . × R (with A in the ith position) is in B(Rk). Therefore, σ⟨{fi : 1 ≤ i ≤ k}⟩ = σ⟨{fi−1(A) : A ∈ B(R), 1 ≤ i ≤ k}⟩ ⊂ B(Rk). Hence, σ⟨{fi : 1 ≤ i ≤ k}⟩ = B(Rk).
Proposition 2.1.6: Let {fλ : λ ∈ Λ} be an uncountable collection of maps from Ω1 to Ω2. Then for any B ∈ σ⟨{fλ : λ ∈ Λ}⟩, there exists a countable set ΛB ⊂ Λ such that B ∈ σ⟨{fλ : λ ∈ ΛB}⟩.

Proof: The proof of this result is left as an exercise (Problem 2.5).
2.2 Induced measures, distribution functions
Suppose X is a random variable defined on a probability space (Ω, F, P). Then P governs the probabilities assigned to events like X−1([a, b]), −∞ < a < b < ∞. Since X takes values in the real line, one should be able to express such probabilities only as a function of the set [a, b]. Clearly, since X is ⟨F, B(R)⟩-measurable, X−1(A) ∈ F for all A ∈ B(R), and the function
$$P_X(A) \equiv P\big(X^{-1}(A)\big) \qquad (2.1)$$
is a set function defined on B(R). Is this a (probability) measure on B(R)? The following proposition answers the question more generally.
Proposition 2.2.1: Let (Ωi, Fi), i = 1, 2, be measurable spaces and let T : Ω1 → Ω2 be a ⟨F1, F2⟩-measurable mapping from Ω1 to Ω2. Then, for any measure µ on (Ω1, F1), the set function µT−1, defined by
$$\mu T^{-1}(A) \equiv \mu\big(T^{-1}(A)\big), \quad A \in F_2, \qquad (2.2)$$
is a measure on F2.

Proof: It is easy to check that µT−1 satisfies the three conditions for being a measure. The details are left as an exercise (cf. Problem 2.9).
Definition 2.2.1: The measure µT −1 is called the measure induced by T
(or the induced measure of T ) on F2 .
In particular, if µ(Ω1) = 1, then µT−1(Ω2) = 1. Hence, the set function PX defined in (2.1) is indeed a probability measure on (R, B(R)).
Definition 2.2.2: For a random variable X defined on a probability space
(Ω, F, P ), the probability distribution of X (or the law of X), denoted by
PX (say), is the induced measure of X under P on R, as defined in (2.2).
In introductory courses on probability and statistics, one defines probabilities of events like ‘X ∈ [a, b]’ by using the probability mass function for
discrete random variables and the probability density function for ‘continuous’ random variables. The measure-theoretic definition above allows one
to treat both these cases as well as the case of ‘mixed’ distributions under
a unified framework.
Definition 2.2.3: The cumulative distribution function (or cdf in short)
of a random variable X is defined as
$$F_X(x) \equiv P_X((-\infty, x]), \quad x \in \mathbb{R}. \qquad (2.3)$$
Proposition 2.2.2: Let F be the cdf of a random variable X.
(i) For x1 < x2 , F (x1 ) ≤ F (x2 )
(i.e., F is nondecreasing on R).
(ii) For x in R, F (x) = limy↓x F (y)
(i.e., F is right continuous on R).
(iii) limx→−∞ F (x) = 0 and limx→+∞ F (x) = 1.
Proof: For x1 < x2 , (−∞, x1 ] ⊂ (−∞, x2 ]. Since PX is a measure on B(R),
F (x1 ) = PX ((−∞, x1 ]) ≤ PX ((−∞, x2 ]) = F (x2 ),
proving (i).
To prove (ii), let xn ↓ x. Then, the sets (−∞, xn] ↓ (−∞, x], and PX((−∞, x1]) = P(X ≤ x1) ≤ 1. Hence, using the monotone continuity from above of the measure PX (m.c.f.a.) (cf. Proposition 1.2.3), one gets
$$\lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} P_X((-\infty, x_n]) = P_X((-\infty, x]) = F(x).$$
Next consider part (iii). Note that if xn ↓ −∞ and yn ↑ ∞, then (−∞, xn] ↓ ∅ and (−∞, yn] ↑ (−∞, ∞). Hence, part (iii) follows from the m.c.f.a. and the m.c.f.b. properties of PX (cf. Propositions 1.2.1 and 1.2.3).
Definition 2.2.4: A function F : R → R satisfying (i), (ii), and (iii)
of Proposition 2.2.2 is called a cumulative distribution function (or cdf for
short).
Thus, given a random variable X, its cdf FX satisfies properties (i), (ii),
(iii) of Proposition 2.2.2. Conversely, given a cdf F , one can construct a
probability space (Ω, F, P ) and a random variable X on it such that the
cdf of X is F . Indeed, given a cdf F , note that by Theorem 1.3.3 and Definition 1.3.7, there exists a (Lebesgue-Stieltjes) probability measure µF on
(R, B(R)) such that µF ((−∞, x]) = F (x). Now define X to be the identity
map on R, i.e., let X(x) ≡ x for all x ∈ R. Then, X is a random variable on
the probability space (R, B(R), µF ) with probability distribution PX = µF
and cdf FX = F .
In addition to (i), (ii), and (iii) of Proposition 2.2.2, it is easy to verify that for any x in R,
$$P(X < x) = F_X(x-) \equiv \lim_{y \uparrow x} F_X(y)$$
and hence
$$P(X = x) = F_X(x) - F_X(x-). \qquad (2.4)$$
Thus, the function FX (·) has a jump at x iff P (X = x) > 0. Since a
monotone function from R to R can have only jump discontinuities and only
a countable number of them (cf. Problem 2.11), for any random variable
X, the set {a ∈ R : P (X = a) > 0} is countable. This leads to the following
definitions.
Definition 2.2.5:
(a) A random variable X is called discrete if there exists a countable set
A ⊂ R such that P (X ∈ A) = 1.
(b) A random variable X is called continuous if P (X = x) = 0 for all
x ∈ R.
Note that X is continuous iff FX is continuous on all of R and X is
discrete iff the sum of all the jumps of its cdf FX is one. It may also be
noted that if FX is a step function, then X is discrete but not conversely.
For example, consider the case where the set A in the above definition is
the set of all rational numbers.
It turns out that a given cdf may be written as a weighted sum of a discrete and a continuous cdf. Let F be a cdf. Let A ≡ {x : p(x) ≡ F(x) − F(x−) > 0}. As remarked earlier, A is at most countable. Write $\alpha = \sum_{y \in A} p(y)$ and let $\tilde{F}_d(x) = \sum_{y \in A} p(y)\, I_{(-\infty, x]}(y)$ and $\tilde{F}_c(x) = F(x) - \tilde{F}_d(x)$. It is easy to verify that F̃c(·) is continuous on R. If α = 0, then F(x) = F̃c(x) and F is continuous. If α = 1, then F = F̃d and F is discrete. If 0 < α < 1, F(·) can be written as
$$F(x) = \alpha F_d(x) + (1 - \alpha) F_c(x), \qquad (2.5)$$
where Fd(x) ≡ α−1 F̃d(x) and Fc(x) ≡ (1 − α)−1 F̃c(x) are both cdfs, with Fd being discrete and Fc being continuous. For a further decomposition of Fc(·) into absolutely continuous and singular continuous components, see Chapter 4.
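The decomposition (2.5) is easy to exercise numerically when the jump set A and the jump sizes p(·) are known. The following sketch (an illustrative construction, not from the text) builds a mixed cdf with a uniform continuous part and two atoms, recovers α, Fd, and Fc, and checks that F = αFd + (1 − α)Fc on a grid.

```python
# A mixed cdf: continuous part 0.5 * (uniform[0,1] cdf), atoms of size 0.2 at 0.25 and 0.3 at 0.75.
jumps = {0.25: 0.2, 0.75: 0.3}                       # the set A and the jump sizes p(x), assumed known
U = lambda x: min(max(x, 0.0), 1.0)                  # uniform(0, 1) cdf

def F(x):
    return 0.5 * U(x) + sum(p for y, p in jumps.items() if y <= x)

alpha = sum(jumps.values())                          # total jump mass = 0.5
Fd = lambda x: sum(p for y, p in jumps.items() if y <= x) / alpha
Fc = lambda x: (F(x) - alpha * Fd(x)) / (1 - alpha)  # the continuous component

grid = [i / 100 for i in range(-20, 121)]
max_err = max(abs(F(x) - (alpha * Fd(x) + (1 - alpha) * Fc(x))) for x in grid)
print(alpha, max_err)    # alpha = 0.5, and the decomposition identity (2.5) holds (error ~ 0)
```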
2.2.1 Generalizations to higher dimensions
Induced distributions of random vectors and the associated cdfs are briefly considered in this section. Let X = (X1, X2, . . . , Xk) be a k-dimensional random vector defined on a probability space (Ω, F, P). The probability distribution PX of X is the induced probability measure on (Rk, B(Rk)), defined by (cf. (2.1))
$$P_X(A) \equiv P\big(X^{-1}(A)\big), \quad A \in B(\mathbb{R}^k). \qquad (2.6)$$
The cdf FX of X is now defined by
$$F_X(x) = P(X \le x), \quad x \in \mathbb{R}^k, \qquad (2.7)$$
where for any x = (x1, x2, . . . , xk) and y = (y1, y2, . . . , yk) in Rk, x ≤ y means that xi ≤ yi for all i = 1, . . . , k.
The extension of Proposition 2.2.2 to the k-dimensional case is notationally involved. Here, an analog of Proposition 2.2.2 for the bivariate case,
i.e., for k = 2 is stated.
Proposition 2.2.3: Let F be the cdf of a bivariate random vector X = (X1, X2).

(i) Then, for any x = (x1, x2) ≤ y = (y1, y2),
$$F(y_1, y_2) - F(x_1, y_2) - F(y_1, x_2) + F(x_1, x_2) \ge 0. \qquad (2.8)$$

(ii) For any x = (x1, x2) ∈ R2,
$$\lim_{y_1 \downarrow x_1,\, y_2 \downarrow x_2} F(y_1, y_2) = F(x_1, x_2),$$
i.e., F is right continuous on R2.
(iii) limx1 →−∞ F (x1 , a) = limx2 →−∞ F (a, x2 ) = 0 for all a ∈ R;
limx1 →∞,x2 →∞ F (x1 , x2 ) = 1.
(iv) For any a ∈ R, limy↑∞ F (a, y) = P (X1 ≤ a) and limy↑∞ F (y, a) =
P (X2 ≤ a).
Proof: Clearly,
$$\begin{aligned}
0 &\le P\big(X \in (x_1, y_1] \times (x_2, y_2]\big) \\
&= P(x_1 < X_1 \le y_1,\ x_2 < X_2 \le y_2) \\
&= P(X_1 \le y_1,\ x_2 < X_2 \le y_2) - P(X_1 \le x_1,\ x_2 < X_2 \le y_2) \\
&= P(X_1 \le y_1, X_2 \le y_2) - P(X_1 \le y_1, X_2 \le x_2) - [P(X_1 \le x_1, X_2 \le y_2) - P(X_1 \le x_1, X_2 \le x_2)] \\
&= F(y_1, y_2) - F(y_1, x_2) - F(x_1, y_2) + F(x_1, x_2).
\end{aligned}$$
This proves (i).

To prove (ii), note that for any sequences yin ↓ xi, i = 1, 2, the sets An = (−∞, y1n] × (−∞, y2n] ↓ A ≡ (−∞, x1] × (−∞, x2]. Hence, by the m.c.f.a. property of a probability measure,
$$F(y_{1n}, y_{2n}) = P(X \in A_n) \downarrow P(X \in A) = F(x_1, x_2).$$
For (iii), note that (−∞, x1n] × (−∞, a] ↓ ∅ for any sequence x1n ↓ −∞ and for any a ∈ R. Hence, again by the m.c.f.a. property,
$$F(x_{1n}, a) \downarrow 0 \quad \text{as } n \to \infty.$$
By similar arguments, F(a, x2n) ↓ 0 whenever x2n ↓ −∞. To prove the last relation in (iii), apply the m.c.f.b. property to the sets (−∞, x1n] × (−∞, x2n] ↑ R2 for x1n ↑ ∞, x2n ↑ ∞.

The proof of part (iv) is similar.
Note that any function satisfying properties (i), (ii), (iii) of Proposition 2.2.3 determines a probability measure uniquely. This follows from the discussion in Section 1.3, as (1.3.9) and (1.3.10) follow from (i) and (iii) (Problem 2.12). For a general k ≥ 1, an analog of property (i) above is cumbersome to write down explicitly. Indeed, for any x ≤ y, a sum involving 2^k terms must now be nonnegative. However, properties (ii), (iii), and (iv) can be extended in an obvious way to the k-dimensional case. See Problem 2.13 for a precise statement. Also, for a general k ≥ 1, functions satisfying the properties listed in Problem 2.13 uniquely determine a probability measure on (Rk, B(Rk)).
2.3 Integration
Let (Ω, F, µ) be a measure space and f : Ω → R be a measurable function. The goal of this section is to define the integral of f with respect to the measure µ and establish some basic convergence results. The integral of a nonnegative function taking only finitely many values is defined first, which is then extended to all nonnegative measurable functions by approximation from below. Finally, the integral of an arbitrary measurable function is defined using the decomposition of the function into its positive and negative parts.
Definition 2.3.1: A function f : Ω → R̄ ≡ [−∞, ∞] is called simple if there exist a finite set (of distinct elements) {c1, . . . , ck} ⊂ R̄ and sets A1, . . . , Ak ∈ F, k ∈ N, such that f can be written as
$$f = \sum_{i=1}^{k} c_i I_{A_i}. \qquad (3.1)$$
Definition 2.3.2: (The integral of a simple nonnegative function). Let f : Ω → R̄+ ≡ [0, ∞] be a simple nonnegative function on (Ω, F, µ) with the representation (3.1). The integral of f w.r.t. µ, denoted by $\int f\, d\mu$, is defined as
$$\int f\, d\mu \equiv \sum_{i=1}^{k} c_i \mu(A_i). \qquad (3.2)$$
Here and in the following, the relation
$$0 \cdot \infty = 0$$
is adopted as a convention.
It may be verified that the value of the integral in (3.2) does not depend on the representation of f. That is, if f can be expressed as $f = \sum_{j=1}^{l} d_j I_{B_j}$ for some d1, . . . , dl ∈ R̄+ (not necessarily distinct) and for some sets B1, . . . , Bl ∈ F, then $\sum_{i=1}^{k} c_i \mu(A_i) = \sum_{j=1}^{l} d_j \mu(B_j)$, so that the value of the integral remains unchanged (Problem 2.17). Also note that for the f in Definition 2.3.2,
$$0 \le \int f\, d\mu \le \infty.$$
The following proposition is an easy consequence of the definition and the above remark.
the above remark.
Proposition 2.3.1: Let f and g be two simple nonnegative functions on
(Ω, F, µ). Then
&
&
&
(i) (Linearity) For α ≥ 0, β ≥ 0, (αf + βg)dµ = α f dµ + β gdµ.
(ii) (Monotonicity)
If
&
& f ≥ g a.e. (µ), i.e., µ({ω : ω ∈ Ω, f (ω) < g(ω)}) =
0, then f dµ ≥ gdµ.
50
2. Integration
(iii) &If f = g& a.e. (µ), i.e., if µ({ω : ω ∈ Ω, f (ω) )= g(ω)}) = 0, then
f dµ = gdµ.
Definition 2.3.3: (The integral of a nonnegative measurable function). Let f : Ω → R̄+ be a nonnegative measurable function on (Ω, F, µ). The integral of f with respect to µ, also denoted by ∫ f dµ, is defined as

∫ f dµ = lim_{n→∞} ∫ fn dµ,    (3.3)

where {fn}n≥1 is any sequence of nonnegative simple functions such that fn(ω) ↑ f(ω) for all ω.
Note that by Proposition 2.3.1 (ii), the sequence {∫ fn dµ}n≥1 is nondecreasing, and hence the right side of (3.3) is well defined. That the right side of (3.3) is the same for all such approximating sequences of functions needs to be established and is the content of the following proposition. The proof of this proposition exploits in a crucial way the m.c.f.b. and the finite additivity of the set function µ (or, equivalently, the countable additivity of µ).
Proposition 2.3.2: Let {fn}n≥1 and {gn}n≥1 be two sequences of simple nonnegative measurable functions on (Ω, F, µ) to R̄+ such that as n → ∞, fn(ω) ↑ f(ω) and gn(ω) ↑ f(ω) for all ω ∈ Ω. Then

lim_{n→∞} ∫ fn dµ = lim_{n→∞} ∫ gn dµ.    (3.4)

Proof: Fix N ∈ N and 0 < ρ < 1. It will now be shown that

lim_{n→∞} ∫ fn dµ ≥ ρ ∫ gN dµ.    (3.5)

Suppose that gN has the representation gN ≡ Σ_{i=1}^k di IBi. Let Dn = {ω ∈ Ω : fn(ω) ≥ ρ gN(ω)}, n ≥ 1. Since fn(ω) ↑ f(ω) for all ω, Dn ↑ D ≡ {ω : f(ω) ≥ ρ gN(ω)} (Problem 2.18 (b)). Also, since gN(ω) ≤ f(ω) and 0 < ρ < 1, D = Ω. Now writing fn = fn IDn + fn IDnᶜ, it follows from Proposition 2.3.1 that

∫ fn dµ ≥ ∫ fn IDn dµ ≥ ρ ∫ gN IDn dµ = ρ Σ_{i=1}^k di µ(Bi ∩ Dn).    (3.6)

By the m.c.f.b. property, for each i, µ(Bi ∩ Dn) ↑ µ(Bi ∩ Ω) = µ(Bi) as n → ∞. Since the sequence {∫ fn dµ}n≥1 is nondecreasing, taking limits in (3.6) yields (3.5). Next, letting ρ ↑ 1 yields lim_{n→∞} ∫ fn dµ ≥ ∫ gN dµ for each N ∈ N and hence,

lim_{n→∞} ∫ fn dµ ≥ lim_{n→∞} ∫ gn dµ.

By symmetry, (3.4) follows and hence, the proof is complete. !
Remark 2.3.1: It is easy to verify that Proposition 2.3.2 remains valid if
{fn }n≥1 and {gn }n≥1 increase to f a.e. (µ).
Given a nonnegative measurable function f , one can always construct
a nondecreasing sequence {fn }n≥1 of nonnegative simple functions such
that fn (ω) ↑ f (ω) for all ω ∈ Ω in the following manner. Let {δn }n≥1
be a sequence of positive real numbers and let {Nn }n≥1 be a sequence of
positive integers such that as n → ∞, δn ↓ 0, Nn ↑ ∞ and Nn δn ↑ ∞.
Further, suppose that the sequence Pn ≡ {jδn : j = 0, 1, 2, . . . , Nn } is
nested, i.e., Pn ⊂ Pn+1 for each n ≥ 1. Now set
fn(ω) = jδn if jδn ≤ f(ω) < (j + 1)δn, j = 0, 1, 2, . . . , (Nn − 1), and fn(ω) = Nn δn if f(ω) ≥ Nn δn.    (3.7)

A specific choice of δn and Nn is given by δn = 2^{−n}, Nn = n2^n.

Thus, with the above choice of {fn}n≥1 in the definition of the Lebesgue integral ∫ f dµ in (3.3), the range of f is subdivided into intervals of decreasing lengths. This is in contrast to the definition of the Riemann integral of f over a bounded interval, which is defined via subdividing the domain of f into finer subintervals.
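The construction in (3.7) is easy to carry out numerically. The following is a minimal illustrative sketch in Python (the choice of f, the sample points, and the dyadic choice δn = 2^{−n}, Nn = n2^n are assumptions made only for illustration); it shows the simple approximants increasing to f from below.

```python
# Sketch of the approximation (3.7) with delta_n = 2^(-n), N_n = n * 2^n.
# f, the sample points, and the levels n below are illustrative assumptions.

def f(x):
    return x * x  # any nonnegative function will do

def f_n(x, n):
    """Value of the simple approximant f_n of (3.7) at x."""
    delta = 2.0 ** (-n)
    cap = n * 2 ** n * delta        # = N_n * delta_n = n
    y = f(x)
    if y >= cap:
        return cap
    j = int(y / delta)              # largest j with j*delta <= f(x)
    return j * delta

for n in (1, 3, 6, 9):
    print(n, [round(f_n(x / 10.0, n), 4) for x in range(11)])
    # the printed values increase with n and approach f(x) from below
```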
Remark 2.3.2: In some cases it may be more appropriate to choose the approximating sequence {fn}n≥1 in a manner different from (3.7). For example, let Ω = {ωi : i ≥ 1} be a countable set, F = P(Ω), the power set of Ω, and let µ be a measure on (Ω, F). Then any function f : Ω → R+ ≡ [0, ∞) is measurable and the integral ∫ f dµ coincides with the sum Σ_{i=1}^∞ f(ωi) µ({ωi}), as can be seen by choosing the approximating sequence {fn}n≥1 as fn(ωi) = f(ωi) for i = 1, 2, . . . , n and fn(ωi) = 0 for i > n.
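In the countable setting of Remark 2.3.2 the integral is literally a series, and the truncated approximants fn give its partial sums. A small numerical sketch (the weights µ({ωi}) = 2^{−i} and the values f(ωi) = i below are assumptions chosen only so the limit is easy to recognize):

```python
# Integral w.r.t. a measure on a countable space = weighted sum (Remark 2.3.2).
mu = [2.0 ** (-i) for i in range(1, 40)]        # mu({omega_i}) = 2^(-i), an assumption
f_vals = [float(i) for i in range(1, 40)]       # f(omega_i) = i, an assumption

def integral_f_n(n):
    """Integral of f_n, which equals f on {omega_1, ..., omega_n} and 0 elsewhere."""
    return sum(f_vals[i] * mu[i] for i in range(n))

for n in (1, 5, 10, 30):
    print(n, round(integral_f_n(n), 6))   # increases to sum_i i * 2^(-i) = 2
```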
Remark 2.3.3: The integral of a nonnegative measurable function can be alternatively defined as

∫ f dµ = sup { ∫ g dµ : g nonnegative and simple, g ≤ f } .

The equivalence of this to (3.3) is seen as follows. Clearly the right side above, say M, is greater than or equal to ∫ f dµ as in (3.3). Conversely, there exists a sequence {gn}n≥1 of simple nonnegative functions with gn ≤ f for all n ≥ 1 such that lim_n ∫ gn dµ equals the supremum M defined above. Now set hn = max{gj : 1 ≤ j ≤ n}, n ≥ 1. It can be verified that for each n ≥ 1, hn is nonnegative, simple, and satisfies hn ↑ f, and also that ∫ hn dµ converges to M (Problem 2.19 (b)).
Corollary 2.3.3: Let f and g be two nonnegative measurable functions
on (Ω, F, µ). Then, the conclusions of Proposition 2.3.1 remain valid for
such f and g.
Proof: This follows from Proposition 2.3.1 for nonnegative simple functions and Definition 2.3.3.
!
The definition of the integral ∫ f dµ of a nonnegative measurable function f in (3.3) makes it possible to interchange limits and integration in
a fairly routine manner. In particular, the following key result is a direct
consequence of the definition.
Theorem 2.3.4: (The monotone convergence theorem or MCT). Let {fn}n≥1 and f be nonnegative measurable functions on (Ω, F, µ) such that fn ↑ f a.e. (µ). Then

∫ f dµ = lim_{n→∞} ∫ fn dµ.    (3.8)
Remark 2.3.4: The important difference between (3.4) and (3.8) is that
in (3.8), the fn ’s need not be simple.
Proof: It is similar to the proof of Proposition 2.3.2. Let {gn}n≥1 be a sequence of nonnegative simple functions on (Ω, F, µ) such that gn(ω) ↑ f(ω) for all ω. By hypothesis, there exists a set A ∈ F such that µ(Aᶜ) = 0 and for ω in A, fn(ω) ↑ f(ω). Fix k ∈ N and 0 < ρ < 1. Let Dn = {ω : ω ∈ A, fn(ω) ≥ ρ gk(ω)}, n ≥ 1. Then, Dn ↑ D ≡ {ω : ω ∈ A, f(ω) ≥ ρ gk(ω)}. Since gk(ω) ≤ f(ω) for all ω, it follows that D = A. Now, by Corollary 2.3.3,

∫ fn dµ ≥ ∫ fn IDn dµ ≥ ρ ∫ gk IDn dµ for all n ≥ 1.

By m.c.f.b., ∫ gk IDn dµ ↑ ∫ gk IA dµ = ∫ gk dµ as n → ∞, yielding

lim inf_{n→∞} ∫ fn dµ ≥ ρ ∫ gk dµ

for all 0 < ρ < 1 and all k ∈ N. Letting ρ ↑ 1 first and then k ↑ ∞, from (3.3) one gets

lim inf_{n→∞} ∫ fn dµ ≥ ∫ f dµ.

By monotonicity (Corollary 2.3.3),

∫ fn dµ ≤ ∫ f dµ for all n ≥ 1

and so the proof is complete. !
Corollary 2.3.5: Let {hn}n≥1 be a sequence of nonnegative measurable functions on a measure space (Ω, F, µ). Then

∫ ( Σ_{n=1}^∞ hn ) dµ = Σ_{n=1}^∞ ∫ hn dµ .

Proof: Let fn = Σ_{i=1}^n hi, n ≥ 1, and let f = Σ_{i=1}^∞ hi. Then, 0 ≤ fn ↑ f. By the MCT,

∫ fn dµ ↑ ∫ f dµ.

But by Corollary 2.3.3,

∫ fn dµ = Σ_{i=1}^n ∫ hi dµ.

Hence, the result follows. !
Corollary 2.3.6: Let f be a nonnegative measurable function on a measure space (Ω, F, µ). For A ∈ F, let

ν(A) ≡ ∫ f IA dµ.

Then, ν is a measure on (Ω, F).

Proof: Let {An}n≥1 be a sequence of disjoint sets in F. Let hn = f IAn for n ≥ 1. Then by Corollary 2.3.5,

ν( ∪_{n≥1} An ) = ∫ f I_{∪_{n≥1} An} dµ = ∫ f · ( Σ_{n=1}^∞ IAn ) dµ = Σ_{n=1}^∞ ∫ hn dµ = Σ_{n=1}^∞ ν(An). !
Remark 2.3.5: Notice that µ(A) = 0 ⇒ ν(A) = 0. In this case ν is said to be dominated by or absolutely continuous with respect to µ. The Radon-Nikodym theorem (see Chapter 4) provides a converse to this. That is, if ν and µ are two measures on a measurable space (Ω, F) such that ν is dominated by µ and µ is σ-finite, then there exists a nonnegative measurable function f such that ν(A) = ∫ f IA dµ for all A in F. This f is called a Radon-Nikodym derivative (or a density) of ν with respect to µ and is denoted by dν/dµ.
Theorem 2.3.7: (Fatou's lemma). Let {fn}n≥1 be a sequence of nonnegative measurable functions on (Ω, F, µ). Then

lim inf_{n→∞} ∫ fn dµ ≥ ∫ lim inf_{n→∞} fn dµ.    (3.9)

Proof: Let gn(ω) ≡ inf{fj(ω) : j ≥ n}. Then {gn}n≥1 is a nondecreasing sequence of nonnegative measurable functions on (Ω, F, µ) such that gn(ω) ↑ g(ω) ≡ lim inf_{n→∞} fn(ω). By the MCT,

∫ gn dµ ↑ ∫ g dµ.

But by monotonicity,

∫ fn dµ ≥ ∫ gn dµ for each n ≥ 1,

and hence, (3.9) follows. !
Remark 2.3.6: In (3.9), the inequality can be strict. For example, take
fn = I[n,∞) , n ≥ 1, on the measure space (R, B(R), m) where m is the
Lebesgue measure. For another example, consider fn = nI[0,1/n], n ≥ 1, on the finite measure space ([0, 1], B([0, 1]), m).
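The second counterexample is easy to check numerically. The sketch below approximates ∫ fn dm for fn = nI[0,1/n] on [0, 1] by a Riemann sum (the grid size is an assumption); each integral stays equal to 1 while the pointwise limit is 0 a.e., so the inequality in (3.9) is strict.

```python
# Fatou's lemma can be strict: f_n = n * I_[0,1/n] on ([0,1], B([0,1]), m).
# Each integral is 1, but liminf f_n = 0 a.e., whose integral is 0.

def f_n(x, n):
    return n if 0.0 <= x <= 1.0 / n else 0.0

def integral(n, grid=10_000):
    # crude midpoint-rule approximation of the integral over [0,1]
    h = 1.0 / grid
    return sum(f_n((k + 0.5) * h, n) for k in range(grid)) * h

for n in (1, 10, 100):
    print(n, round(integral(n), 3))   # approximately 1 for every n
```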
Definition 2.3.4: (The integral of a measurable function). Let f be a real valued measurable function on a measure space (Ω, F, µ). Let f⁺ = f I_{f≥0} and f⁻ = −f I_{f<0}. The integral of f with respect to µ, denoted by ∫ f dµ, is defined as

∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ,

provided that at least one of the integrals on the right side is finite.
Remark 2.3.7: Note that both f⁺ and f⁻ are nonnegative measurable functions, and f = f⁺ − f⁻ and |f| = f⁺ + f⁻. Further, the integrals ∫ f⁺ dµ and ∫ f⁻ dµ in Definition 2.3.4 are defined via Definition 2.3.3.
Definition 2.3.5: (Integrable functions). A measurable function f on a measure space (Ω, F, µ) is said to be integrable with respect to µ if ∫ |f| dµ < ∞.

Since |f| = f⁺ + f⁻, it follows that f is integrable iff both f⁺ and f⁻ are integrable, i.e., ∫ f⁺ dµ < ∞ and ∫ f⁻ dµ < ∞. In the following, whenever the integral of f or its integrability is discussed, the measurability of f will be assumed to hold.
Remark on notation: The integral ∫ f dµ is also written as ∫_Ω f(ω) µ(dω) and ∫_Ω f(ω) dµ(ω).
Definition 2.3.6: Let f be a measurable function on a measure space (Ω, F, µ) and A ∈ F. The integral of f over A with respect to µ, denoted by ∫_A f dµ, is defined as

∫_A f dµ ≡ ∫ f IA dµ,    (3.10)

provided the right side is well defined.
Definition 2.3.7: (Lp-spaces). Let (Ω, F, µ) be a measure space and 0 < p ≤ ∞. Then Lp(Ω, F, µ) is defined as

Lp(Ω, F, µ) ≡ {f : |f|^p is integrable with respect to µ} = { f : ∫ |f|^p dµ < ∞ } for 0 < p < ∞,

and

L∞(Ω, F, µ) ≡ { f : µ({|f| > K}) = 0 for some K ∈ (0, ∞) } .
The following is an extension of Proposition 2.3.1 to integrable functions.
Proposition 2.3.8: Let f, g ∈ L1(Ω, F, µ). Then

(i) ∫ (αf + βg) dµ = α ∫ f dµ + β ∫ g dµ for any α, β ∈ R.

(ii) f ≥ g a.e. (µ) ⇒ ∫ f dµ ≥ ∫ g dµ.

(iii) f = g a.e. (µ) ⇒ ∫ f dµ = ∫ g dµ.
Proof: It is easy to verify (Problem 2.32) that if h = h1 − h2 where h1 and h2 are nonnegative functions in L1(Ω, F, µ), then h is also in L1(Ω, F, µ) and

∫ h dµ = ∫ h1 dµ − ∫ h2 dµ.    (3.11)

Note that h ≡ αf + βg can be written as

h = (α⁺ − α⁻)(f⁺ − f⁻) + (β⁺ − β⁻)(g⁺ − g⁻)
  = (α⁺f⁺ + α⁻f⁻ + β⁺g⁺ + β⁻g⁻) − (α⁺f⁻ + α⁻f⁺ + β⁺g⁻ + β⁻g⁺)
  = h1 − h2, say.

Since f, g ∈ L1(Ω, F, µ), it follows that h1 and h2 ∈ L1(Ω, F, µ). Further, they are nonnegative and by (3.11), h ∈ L1(Ω, F, µ) and

∫ h dµ = ∫ h1 dµ − ∫ h2 dµ.

Now apply Proposition 2.3.1 to each of the terms on the right side and regroup the terms to get

∫ h dµ = α ∫ f dµ + β ∫ g dµ.

Proofs of (ii) and (iii) are left as an exercise. !
Remark 2.3.8: By Proposition 2.3.8, if f and g ∈ L1(Ω, F, µ), then so does αf + βg for all α, β ∈ R. Thus, L1(Ω, F, µ) is a vector space over R. Further, if one sets

‖f‖1 ≡ ∫ |f| dµ,

and identifies a function f with its equivalence class under the relation (f ∼ g iff f = g a.e. (µ)), then ‖·‖1 defines a norm on L1(Ω, F, µ) and makes it a normed linear space (cf. Chapter 3). A similar remark also holds for Lp(Ω, F, µ) for 1 < p ≤ ∞.
Next note that by Proposition 2.3.8, if f = 0 a.e. (µ), then ∫ f dµ = 0. However, the converse is not true. But if f is nonnegative a.e. (µ), then the converse is true, as shown below.

Proposition 2.3.9: Let f be a measurable function on (Ω, F, µ) and let f be nonnegative a.e. (µ). Then

∫ f dµ = 0 iff f = 0 a.e. (µ).

Proof: It is enough to prove the “only if” part. Let D = {ω : f(ω) > 0} and Dn = {ω : f(ω) > 1/n}, n ≥ 1. Then D = ∪_{n≥1} Dn. Since f ≥ f IDn a.e. (µ),

0 = ∫ f dµ ≥ ∫ f IDn dµ ≥ (1/n) µ(Dn) ⇒ µ(Dn) = 0 for each n ≥ 1.

Also Dn ↑ D and so by m.c.f.b.,

µ(D) = lim_{n→∞} µ(Dn) = 0 .

Hence, Proposition 2.3.9 follows. !

A dual to the above proposition is the next one.
Proposition 2.3.10: Let f ∈ L1(Ω, F, µ). Then, |f| < ∞ a.e. (µ).

Proof: Let Cn = {ω : |f(ω)| > n}, n ≥ 1, and let C = {ω : |f(ω)| = ∞}. Then Cn ↓ C and

∫ |f| dµ ≥ ∫ |f| ICn dµ ≥ n µ(Cn) ⇒ µ(Cn) ≤ ( ∫ |f| dµ ) / n .

Since ∫ |f| dµ < ∞, lim_{n→∞} µ(Cn) = 0. Hence, by m.c.f.a., µ(C) = lim_{n→∞} µ(Cn) = 0. !
The next result is a useful convergence theorem for integrals.
Theorem 2.3.11: (The extended dominated convergence theorem or EDCT). Let (Ω, F, µ) be a measure space and let fn, gn : Ω → R be ⟨F, B(R)⟩-measurable functions such that |fn| ≤ gn a.e. (µ) for all n ≥ 1. Suppose that

(i) gn → g a.e. (µ) and fn → f a.e. (µ);

(ii) gn, g ∈ L1(Ω, F, µ) and ∫ |gn| dµ → ∫ |g| dµ as n → ∞.

Then, f ∈ L1(Ω, F, µ),

lim_{n→∞} ∫ fn dµ = ∫ f dµ and lim_{n→∞} ∫ |fn − f| dµ = 0.    (3.12)
Two important special cases of Theorem 2.3.11 will be stated next. When
gn = g for all n ≥ 1, one has the standard version of the dominated
convergence theorem.
Corollary 2.3.12: (Lebesgue's dominated convergence theorem, or DCT). If |fn| ≤ g a.e. (µ) for all n ≥ 1, ∫ g dµ < ∞ and fn → f a.e. (µ), then f ∈ L1(Ω, F, µ),

lim_{n→∞} ∫ fn dµ = ∫ f dµ and lim_{n→∞} ∫ |fn − f| dµ = 0.    (3.13)
Corollary 2.3.13: (The bounded convergence theorem, or BCT). Let µ(Ω) < ∞. If there exists a 0 < k < ∞ such that |fn| ≤ k a.e. (µ) and fn → f a.e. (µ), then

lim_{n→∞} ∫ fn dµ = ∫ f dµ and lim_{n→∞} ∫ |fn − f| dµ = 0.    (3.14)

Proof: Take g(ω) ≡ k for all ω ∈ Ω in the previous corollary. !

Proof of Theorem 2.3.11: By Fatou's lemma,

∫ |f| dµ ≤ lim inf_{n→∞} ∫ |fn| dµ ≤ lim inf_{n→∞} ∫ |gn| dµ = ∫ |g| dµ < ∞.
Hence, f is integrable. For proving the second part, let hn = fn + gn and γn = gn − fn, n ≥ 1. Then, {hn}n≥1 and {γn}n≥1 are sequences of nonnegative integrable functions. By Fatou's lemma and (ii),

∫ (f + g) dµ = ∫ lim inf_{n→∞} hn dµ ≤ lim inf_{n→∞} ∫ hn dµ = lim inf_{n→∞} ( ∫ gn dµ + ∫ fn dµ ) = ∫ g dµ + lim inf_{n→∞} ∫ fn dµ.

Similarly,

∫ (g − f) dµ ≤ ∫ g dµ − lim sup_{n→∞} ∫ fn dµ.

By Proposition 2.3.8, ∫ (g ± f) dµ = ∫ g dµ ± ∫ f dµ. Hence,

∫ f dµ ≤ lim inf_{n→∞} ∫ fn dµ and lim sup_{n→∞} ∫ fn dµ ≤ ∫ f dµ,

yielding that lim_{n→∞} ∫ fn dµ = ∫ f dµ. For the last part, apply the above argument to f̃n ≡ |f − fn| and g̃n ≡ gn + |f| in place of fn and gn, respectively. !
Theorem 2.3.14: (An approximation theorem). Let µF be a Lebesgue-Stieltjes measure on (R, B(R)). Let f ∈ Lp(R, B(R), µF), 0 < p < ∞. Then, for any δ > 0, there exist a step function h and a continuous function g with compact support (i.e., g vanishes outside a bounded interval) such that

∫ |f − h|^p dµF < δ,    (3.15)

∫ |f − g|^p dµF < δ,    (3.16)

where a step function h is a function of the form h = Σ_{i=1}^k ci IAi with k < ∞, c1, c2, . . . , ck ∈ R and A1, A2, . . . , Ak being bounded disjoint intervals.
Proof: Let fn(·) = f(·) IBn(·) where Bn = {x : |x| ≤ n, |f(x)| ≤ n}. By the DCT, for every ε > 0, there exists an Nε such that for all n ≥ Nε,

∫ |f − fn|^p dµF < ε.    (3.17)

Since |f_{Nε}(·)| ≤ Nε on [−Nε, Nε], for any η > 0, there exists a simple function f̃ such that

sup{|f_{Nε}(x) − f̃(x)| : x ∈ R} < η (cf. (3.7)).    (3.18)

Next, using Problem 1.32 (b), one can show that for any η > 0, there exists a step function h (depending on η) such that

∫ |f̃ − h|^p dµF < η.    (3.19)

Since f − h = (f − f_{Nε}) + (f_{Nε} − f̃) + (f̃ − h),

|f − h|^p ≤ Cp ( |f − f_{Nε}|^p + |f_{Nε} − f̃|^p + |f̃ − h|^p ),

where Cp is a constant depending only on p. This in turn yields, from (3.17)–(3.19),

∫ |f − h|^p dµF ≤ Cp ( ε + (µF{x : |x| ≤ Nε}) η^p + η ).    (3.20)

Given δ > 0, choose ε > 0 first and then η > 0 such that the right side of (3.20) above is less than δ.

Next, given any step function h and η > 0, there exists a continuous function g with compact support such that µF{x : h(x) ≠ g(x)} < η (cf. Problem 1.32 (g)). Now (3.16) follows from (3.15). !
Remark 2.3.9: The approximation (3.16) remains valid if g is restricted
to the class of all infinitely differentiable functions with compact support.
Further it remains valid for 0 < p < ∞ for Lebesgue-Stieltjes measures on
any Euclidean space.
Remark 2.3.10: The above approximation theorem fails for p = ∞. For
example, consider the function f (x) ≡ 1 in L∞ (m).
2.4 Riemann and Lebesgue integrals
Let f be a real valued bounded function on a bounded interval [a, b]. Recall
the definition of the Riemann integral. Let P = {x0, x1, . . . , xn} be a finite partition of [a, b], i.e., x0 = a < x1 < x2 < · · · < xn−1 < xn = b, and let ∆ = ∆(P) ≡ max{(xi+1 − xi) : 0 ≤ i ≤ n − 1} be the diameter of P. Let
Mi = sup{f (x) : xi ≤ x ≤ xi+1 } and mi = inf{f (x) : xi ≤ x ≤ xi+1 },
i = 0, 1, . . . , n − 1.
Definition 2.4.1: The upper and lower Riemann sums of f w.r.t. the partition P are, respectively, defined as

U(f, P) ≡ Σ_{i=0}^{n−1} Mi · (xi+1 − xi)    (4.1)

and

L(f, P) ≡ Σ_{i=0}^{n−1} mi · (xi+1 − xi).    (4.2)
It is easy to verify that if Q = {y0 , y1 , . . . , yk } is another partition satisfying
P ⊂ Q, then U (f, P ) ≥ U (f, Q) ≥ L(f, Q) ≥ L(f, P ). Let P denote the
collection of all finite partitions of [a, b].
Definition 2.4.2: The upper Riemann integral ∫̄f is defined as

∫̄f = inf_{P∈P} U(f, P)    (4.3)

and the lower Riemann integral ∫̲f by

∫̲f = sup_{P∈P} L(f, P).    (4.4)
It can be shown (cf. Problem 2.23) that if {Pn}n≥1 is any sequence of partitions such that ∆(Pn) → 0 as n → ∞ and Pn ⊂ Pn+1 for each n ≥ 1, then U(Pn, f) ↓ ∫̄f and L(Pn, f) ↑ ∫̲f.
Definition 2.4.3: f is said to be Riemann integrable if

∫̄f = ∫̲f.    (4.5)

The common value is denoted by ∫_{[a,b]} f.
Fix a sequence {Pn}n≥1 of partitions such that Pn ⊂ Pn+1 for all n ≥ 1 and ∆(Pn) → 0 as n → ∞. Let Pn = {xn0 = a < xn1 < xn2 < . . . < xnkn = b}. For i = 0, 1, . . . , kn − 1, let

φn(x) ≡ sup{f(t) : xi ≤ t ≤ xi+1}, x ∈ [xi, xi+1),
ψn(x) ≡ inf{f(t) : xi ≤ t ≤ xi+1}, x ∈ [xi, xi+1),

and let φn(b) = ψn(b) = 0. Then, φn and ψn are step functions on [a, b] and hence are Borel measurable. Further, since f is bounded, so are φn and ψn, and hence they are integrable on [a, b] w.r.t. the Lebesgue measure m. The Lebesgue integrals of φn and ψn are given by ∫_{[a,b]} φn dm = U(Pn, f) and ∫_{[a,b]} ψn dm = L(Pn, f).

It can be shown (Problem 2.24) that for all x ∉ ∪_{n≥1} Pn, as n → ∞,

φn(x) → φ(x) ≡ lim_{δ↓0} sup{f(y) : |y − x| < δ}    (4.6)

and

ψn(x) → ψ(x) ≡ lim_{δ↓0} inf{f(y) : |y − x| < δ}.    (4.7)
Thus, φ and ψ, being limits of Borel measurable functions (except possibly on a countable set), are Borel measurable. By the BCT (Corollary 2.3.13),

∫̄f = lim_{n→∞} ∫ φn dm = ∫ φ dm

and

∫̲f = lim_{n→∞} ∫ ψn dm = ∫ ψ dm.

Thus, f is Riemann integrable on [a, b], iff ∫ φ dm = ∫ ψ dm, iff ∫ (φ − ψ) dm = 0. Since φ(x) ≥ f(x) ≥ ψ(x) for all x, this holds iff φ = f = ψ a.e. (m). It can be shown that f is continuous at x0 iff φ(x0) = f(x0) = ψ(x0) (Problem 2.8). Summarizing the above discussion, one gets the following theorem.
Theorem 2.4.1: Let f be a bounded function on a bounded interval [a, b]. Then f is Riemann integrable on [a, b] iff f is continuous a.e. (m) on [a, b]. In this case, f is Lebesgue integrable on [a, b] and the Lebesgue integral ∫_{[a,b]} f dm equals the Riemann integral of f over [a, b], i.e., the two integrals coincide.

It should be noted that Lebesgue integrability need not imply Riemann integrability. For example, consider f(x) ≡ IQ1(x) where Q1 is the set of rationals in [0, 1] (Problem 2.25).
The functions φ and ψ defined in (4.6) and (4.7) above are called, respectively, the upper and the lower envelopes of the function f . They are
semicontinuous in the sense that for each α ∈ R, the sets {x : φ(x) < α}
and {x : ψ(x) > α} are open (cf. Problem 2.8).
Remark 2.4.1: The key difference in the definitions of Riemann and
Lebesgue integrals is that in the former the domain of f is partitioned
while in the latter the range of f is partitioned.
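The partition-of-the-domain picture is easy to see numerically. The sketch below (the choice f(x) = x² on [0, 1] and the dyadic partitions are assumptions for illustration) computes U(Pn, f) and L(Pn, f) over nested partitions and watches them squeeze the common Riemann–Lebesgue value 1/3.

```python
# Upper and lower Riemann sums over nested dyadic partitions of [0, 1].
def f(x):
    return x * x   # integral over [0,1] is 1/3

def riemann_sums(n):
    """U(P_n, f) and L(P_n, f) for the dyadic partition with 2**n subintervals."""
    k = 2 ** n
    h = 1.0 / k
    upper = lower = 0.0
    for i in range(k):
        a, b = i * h, (i + 1) * h
        # f is increasing on [0,1], so sup/inf over [a,b] are f(b) and f(a)
        upper += f(b) * h
        lower += f(a) * h
    return upper, lower

for n in (1, 4, 8, 12):
    u, l = riemann_sums(n)
    print(n, round(u, 6), round(l, 6))   # both tend to 1/3
```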
2.5 More on convergence
Let {fn }n≥1 and f be measurable functions from a measure space (Ω, F, µ)
to R̄, the set of extended real numbers. There are several notions of convergence of {fn }n≥1 to f . The following two have been discussed earlier.
Definition 2.5.1: {fn}n≥1 converges to f pointwise if

lim_{n→∞} fn(ω) = f(ω) for all ω in Ω.

Definition 2.5.2: {fn}n≥1 converges to f almost everywhere (µ), denoted by fn → f a.e. (µ), if there exists a set B in F such that µ(B) = 0 and

lim_{n→∞} fn(ω) = f(ω) for all ω ∈ Bᶜ.    (5.1)
Now consider some more notions of convergence.
Definition 2.5.3: {fn}n≥1 converges to f in measure (w.r.t. µ), denoted by fn −→m f, if for each ε > 0,

lim_{n→∞} µ({|fn − f| > ε}) = 0.    (5.2)
Definition 2.5.4: Let 0 < p < ∞. Then, {fn}n≥1 converges to f in Lp(µ), denoted by fn −→Lp f, if ∫ |fn|^p dµ < ∞ for all n ≥ 1, ∫ |f|^p dµ < ∞ and

lim_{n→∞} ∫ |fn − f|^p dµ = 0.    (5.3)

Clearly, (5.3) is equivalent to ‖fn − f‖p → 0 as n → ∞, where for any F-measurable function g and any 0 < p < ∞,

‖g‖p = ( ∫ |g|^p dµ )^{min{1/p, 1}} .    (5.4)

For p = 1, this is also called convergence in absolute deviation and for p = 2, convergence in mean square.
Definition 2.5.5: {fn}n≥1 converges to f uniformly (over Ω) if

lim_{n→∞} sup{|fn(ω) − f(ω)| : ω ∈ Ω} = 0.    (5.5)

Definition 2.5.6: {fn}n≥1 converges to f in L∞(µ) if

lim_{n→∞} ‖fn − f‖∞ = 0,    (5.6)

where for any F-measurable function g on (Ω, F, µ),

‖g‖∞ = inf{ K : K ∈ (0, ∞), µ({|g| > K}) = 0 } .    (5.7)

Definition 2.5.7: {fn}n≥1 converges to f nearly uniformly (µ) if for every ε > 0, there exists a set A ∈ F such that µ(A) < ε and on Aᶜ, fn → f uniformly, i.e., sup{|fn(ω) − f(ω)| : ω ∈ Aᶜ} → 0 as n → ∞.

The notion of convergence in Definition 2.5.7 is also called almost uniform convergence in some books (cf. Royden (1988)). The sequence fn ≡ nI[0,1/n] on (Ω = [0, 1], B([0, 1]), m) converges to f ≡ 0 nearly uniformly but not in L∞(m).
When µ is a probability measure, there is another useful notion of convergence, known as convergence in distribution, that is defined in terms
of the induced measures {µfn−1 }n≥1 and µf −1 . This notion of convergence
will be treated in detail in Chapter 9.
Next, the connections between some of these notions of convergence are
explored.
Theorem 2.5.1: Suppose that µ(Ω) < ∞. Then, fn → f a.e. (µ) implies
fn −→m f .
The proof is left as an exercise (Problem 2.26). The hypothesis that
‘µ(Ω) < ∞’ in Theorem 2.5.1 cannot be dispensed with as seen by taking
fn = I[n,∞) on R with Lebesgue measure. Also, fn −→m f does not imply
fn → f a.e. (µ) (Problem 2.46), but the following holds.
Theorem 2.5.2: Let fn −→m f . Then, there exists a subsequence {nk }k≥1
such that fnk → f a.e. (µ).
Proof: Since fn −→m f, for each integer k ≥ 1, there exists an nk such that for all n ≥ nk,

µ({|fn − f| > 2^{−k}}) < 2^{−k}.    (5.8)

W.l.o.g., assume that nk+1 > nk for all k ≥ 1. Let Ak = {|f_{nk} − f| > 2^{−k}}. By Corollary 2.3.5,

∫ ( Σ_{k=1}^∞ IAk ) dµ = Σ_{k=1}^∞ ∫ IAk dµ = Σ_{k=1}^∞ µ(Ak),

which, by (5.8), is finite. Hence, by Proposition 2.3.10, Σ_{k=1}^∞ IAk < ∞ a.e. (µ). Now observe that Σ_{k=1}^∞ IAk(ω) < ∞ ⇒ |f_{nk}(ω) − f(ω)| ≤ 2^{−k} for all k large ⇒ lim_{k→∞} f_{nk}(ω) = f(ω). Thus, f_{nk} → f a.e. (µ). !
Remark 2.5.1: From the above result it follows that the extended dominated convergence theorem (Theorem 2.3.11) remains valid if convergence
a.e. of {fn }n≥1 and of {gn }n≥1 are replaced by convergence in measure for
both (Problem 2.37).
Theorem 2.5.3: Let {fn}n≥1, f be measurable functions on a measure space (Ω, F, µ). Let fn −→Lp f for some 0 < p < ∞. Then fn −→m f.

Proof: For each ε > 0, let An = {|fn − f| ≥ ε}, n ≥ 1. Then

∫ |fn − f|^p dµ ≥ ∫_{An} |fn − f|^p dµ ≥ ε^p µ(An).

Since fn → f in Lp, ∫ |fn − f|^p dµ → 0 and hence, µ(An) → 0. !
It turns out that fn −→m f need not imply fn −→Lp f, even if {fn : n ≥ 1} ∪ {f} is contained in Lp(Ω, F, µ). For example, let fn = nI[0,1/n] and f ≡ 0 on the Lebesgue space ([0, 1], B([0, 1]), m), where m is the Lebesgue measure. Then fn −→m f but ∫ |fn − f| dm ≡ 1 for all n ≥ 1. However, under some additional conditions, convergence in measure does imply convergence in Lp. Here are two results in this direction.
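The example above can also be inspected numerically. In the sketch below (the grid used to approximate Lebesgue integrals is an assumption), m({|fn − f| > ε}) = 1/n → 0, so fn −→m f, while ∫ |fn − f| dm stays equal to 1, so there is no L1 convergence.

```python
# f_n = n * I_[0,1/n] on [0,1]: convergence in measure to 0, but not in L^1.
GRID = 100_000
H = 1.0 / GRID

def f_n(x, n):
    return n if x <= 1.0 / n else 0.0

def measure_exceeds(n, eps):
    return sum(H for k in range(GRID) if f_n((k + 0.5) * H, n) > eps)

def l1_distance(n):
    return sum(f_n((k + 0.5) * H, n) * H for k in range(GRID))

for n in (10, 100, 1000):
    print(n, round(measure_exceeds(n, 0.5), 4), round(l1_distance(n), 4))
    # first column -> 0 (convergence in measure); second stays ~1 (no L^1 convergence)
```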
Theorem 2.5.4: (Scheffe's theorem). Let {fn}n≥1, f be a collection of nonnegative measurable functions on a measure space (Ω, F, µ). Let fn → f a.e. (µ), ∫ fn dµ → ∫ f dµ and ∫ f dµ < ∞. Then

lim_{n→∞} ∫ |fn − f| dµ = 0.

Proof: Let gn = f − fn, n ≥ 1. Since fn → f a.e. (µ), both gn⁺ and gn⁻ go to zero a.e. (µ). Further, 0 ≤ gn⁺ ≤ f and by hypothesis ∫ f dµ < ∞. Thus, by the DCT, it follows that

∫ gn⁺ dµ → 0.

Next, note that by hypothesis, ∫ gn dµ → 0. Thus, ∫ gn⁻ dµ = ∫ gn⁺ dµ − ∫ gn dµ → 0 and hence, ∫ |gn| dµ = ∫ gn⁺ dµ + ∫ gn⁻ dµ → 0. !
Corollary 2.5.5: Let {fn}n≥1, f be probability density functions on a measure space (Ω, F, µ). That is, for all n ≥ 1, ∫ fn dµ = ∫ f dµ = 1 and fn, f ≥ 0 a.e. (µ). If fn → f a.e. (µ), then

lim_{n→∞} ∫ |fn − f| dµ = 0.
Remark 2.5.2: The above theorem and the corollary remain valid if the convergence of fn to f a.e. (µ) is replaced by fn −→m f.
Corollary 2.5.6: Let {pnk}k≥1, n = 1, 2, . . ., and {pk}k≥1 be sequences of nonnegative numbers satisfying Σ_{k=1}^∞ pnk = 1 = Σ_{k=1}^∞ pk. Let pnk → pk as n → ∞ for each k ≥ 1. Then Σ_{k=1}^∞ |pnk − pk| → 0.

Proof: Apply Corollary 2.5.5 with µ = the counting measure on (N, P(N)). !
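Corollary 2.5.6 is easy to try out numerically. The sketch below uses, as an illustrative assumption, the Binomial(n, λ/n) and Poisson(λ) probability mass functions, for which pnk → pk for every k; Scheffe's theorem then forces the total (ℓ¹) distance to vanish, which the computation reflects.

```python
# Corollary 2.5.6 illustrated: Binomial(n, lam/n) masses converge pointwise to
# Poisson(lam) masses, hence also in l^1 distance by Scheffe's theorem.
from math import comb, exp, factorial

lam = 2.0

def binom_pmf(n, k):
    p = lam / n
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k):
    return exp(-lam) * lam ** k / factorial(k)

for n in (5, 20, 100, 500):
    l1 = sum(abs(binom_pmf(n, k) - poisson_pmf(k)) for k in range(n + 1))
    l1 += sum(poisson_pmf(k) for k in range(n + 1, n + 200))  # Poisson tail beyond k = n
    print(n, round(l1, 5))   # decreases towards 0
```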
A more general result in this direction that does not require fn, f to be nonnegative involves the concept of uniform integrability. Let {fλ : λ ∈ Λ} be a collection of functions in L1(Ω, F, µ). Then for each λ ∈ Λ, by the DCT and the integrability of fλ,

aλ(t) ≡ ∫_{{|fλ|>t}} |fλ| dµ → 0 as t → ∞.    (5.9)

The notion of uniform integrability requires that the integrals aλ(t) go to zero uniformly in λ ∈ Λ as t → ∞.
Definition 2.5.8: The collection of functions {fλ : λ ∈ Λ} in L1(Ω, F, µ) is uniformly integrable (or UI, in short) if

sup_{λ∈Λ} aλ(t) → 0 as t → ∞ .    (5.10)
The following proposition summarizes some of the main properties of UI
families of functions.
Proposition 2.5.7: Let {fλ : λ ∈ Λ} be a collection of µ-integrable functions on (Ω, F, µ).

(i) If Λ is finite, then {fλ : λ ∈ Λ} is UI.

(ii) If K ≡ sup{ ∫ |fλ|^{1+ε} dµ : λ ∈ Λ } < ∞ for some ε > 0, then {fλ : λ ∈ Λ} is UI.

(iii) If |fλ| ≤ g a.e. (µ) and ∫ g dµ < ∞, then {fλ : λ ∈ Λ} is UI.

(iv) If {fλ : λ ∈ Λ} and {gγ : γ ∈ Γ} are UI, then so is {fλ + gγ : λ ∈ Λ, γ ∈ Γ}.

(v) If {fλ : λ ∈ Λ} is UI and µ(Ω) < ∞, then

sup_{λ∈Λ} ∫ |fλ| dµ < ∞.    (5.11)
Proof: By hypothesis, aλ(t) ≡ ∫_{{|fλ|>t}} |fλ| dµ → 0 as t → ∞ for each λ. If Λ is finite this implies that sup{aλ(t) : λ ∈ Λ} → 0 as t → ∞. This proves (i).

To prove (ii), note that since 1 < |fλ|/t on the set {|fλ| > t},

sup_{λ∈Λ} aλ(t) ≤ sup_{λ∈Λ} ∫_{{|fλ|>t}} |fλ| (|fλ|/t)^ε dµ ≤ K t^{−ε} → 0 as t → ∞.

For (iii), note that for each t ∈ R, the function ht(x) ≡ x I_{(t,∞)}(x), x ∈ R, is nondecreasing. Hence, by the integrability of g,

sup_{λ∈Λ} aλ(t) = sup_{λ∈Λ} ∫ ht(|fλ|) dµ ≤ ∫ ht(g) dµ = ∫_{{g>t}} g dµ → 0 as t → ∞.

To prove (iv), for t > 0, let a(t) = sup_{λ∈Λ} ∫ ht(|fλ|) dµ and b(t) = sup_{γ∈Γ} ∫ ht(|gγ|) dµ. Then, for any λ ∈ Λ and γ ∈ Γ,

∫_{{|fλ+gγ|>t}} |fλ + gγ| dµ = ∫ ht(|fλ + gγ|) dµ
≤ ∫ ht( 2 max{|fλ|, |gγ|} ) dµ
= ∫ ht(2|fλ|) I(|fλ| ≥ |gγ|) dµ + ∫ ht(2|gγ|) I(|fλ| < |gγ|) dµ
≤ 2 ∫_{{|fλ|>t/2}} |fλ| dµ + 2 ∫_{{|gγ|>t/2}} |gγ| dµ
≤ 2 [ a(t/2) + b(t/2) ].

By hypothesis, both a(t) and b(t) → 0 as t → ∞, thus proving (iv).

Next consider (v). Since {fλ : λ ∈ Λ} is UI, there exists a T > 0 such that

sup_{λ∈Λ} ∫ hT(|fλ|) dµ ≤ 1.

Hence,

sup_{λ∈Λ} ∫ |fλ| dµ = sup_{λ∈Λ} [ ∫_{{|fλ|≤T}} |fλ| dµ + ∫ hT(|fλ|) dµ ] ≤ T µ(Ω) + 1 < ∞.

This completes the proof of the proposition. !
Remark 2.5.3: In the above proposition, part (ii) can be improved as follows: Let φ : R+ → R+ be nondecreasing with φ(x)/x ↑ ∞ as x ↑ ∞. If sup_{λ∈Λ} ∫ φ(|fλ|) dµ < ∞, then {fλ : λ ∈ Λ} is UI (Problem 2.27). A converse to this result is true. That is, if {fλ : λ ∈ Λ} is UI, then there exists such a function φ. Some examples of such φ's are φ(x) = x^k, k > 1, φ(x) = x(log x)^β I(x > 1), β > 0, and φ(x) = exp(βx), β > 0.

In part (v) of Proposition 2.5.7, (5.11) does not imply UI. For example, consider the sequence of functions fn = nI[0,1/n], n = 1, 2, . . ., on [0, 1]. On the other hand, (5.11) with an additional condition becomes necessary and sufficient for UI.
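One way to get a feel for (5.10) is to compute the tail integrals aλ(t) on a grid. In the sketch below (the grid, the cut-off n ≤ 50, and the two families are illustrative assumptions), the family fn = nI[0,1/n] is L1-bounded but its tails do not vanish uniformly, whereas gn = min(n, x^{−1/2}) is dominated by the integrable function x^{−1/2} and so is UI by Proposition 2.5.7 (iii).

```python
# Tail integrals a_n(t) = integral of f_n over {f_n > t} on [0,1], grid approximation.
GRID = 50_000
H = 1.0 / GRID
XS = [(k + 0.5) * H for k in range(GRID)]

def tail(family, n, t):
    return sum(family(x, n) * H for x in XS if family(x, n) > t)

def sup_tail(family, t, n_max=50):
    return max(tail(family, n, t) for n in range(1, n_max + 1))

fam_a = lambda x, n: n if x <= 1.0 / n else 0.0        # NOT uniformly integrable
fam_b = lambda x, n: min(float(n), x ** -0.5)           # dominated, hence UI

for t in (5.0, 10.0, 20.0):
    print(t, round(sup_tail(fam_a, t), 3), round(sup_tail(fam_b, t), 3))
    # family A: stays ~1 for every t; family B: roughly 2/t, shrinking to 0
```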
Proposition 2.5.8: Let f ∈ L1(Ω, F, µ). Then for every ε > 0, there exists a δ > 0 such that µ(A) < δ ⇒ ∫_A |f| dµ < ε.

Proof: Fix ε > 0. By the DCT, there exists a t > 0 such that ∫_{{|f|>t}} |f| dµ < ε/2. Hence, for any A ∈ F with µ(A) ≤ δ ≡ ε/(2t),

∫_A |f| dµ ≤ ∫_{A∩{|f|≤t}} |f| dµ + ∫_{{|f|>t}} |f| dµ ≤ t µ(A) + ∫_{{|f|>t}} |f| dµ ≤ ε,

proving the claim. !
The above proposition shows that for every f ∈ L1(Ω, F, µ), the measure (cf. Corollary 2.3.6)

ν_{|f|}(A) ≡ ∫_A |f| dµ    (5.12)

on (Ω, F) satisfies the condition that ν_{|f|}(A) is small if µ(A) is small, i.e., for every ε > 0, there exists a δ > 0 such that µ(A) < δ ⇒ ν_{|f|}(A) < ε. This property is referred to as the absolute continuity of the measure ν_{|f|} w.r.t. µ.
Definition 2.5.9: Given a family {fλ : λ ∈ Λ} ⊂ L1(Ω, F, µ), the measures {ν_{|fλ|} : λ ∈ Λ} as defined in (5.12) above are uniformly absolutely continuous w.r.t. µ (or u.a.c. (µ), in short) if for every ε > 0, there exists a δ > 0 such that

µ(A) < δ ⇒ sup{ ν_{|fλ|}(A) : λ ∈ Λ } < ε.
Theorem 2.5.9: Let {fλ : λ ∈ Λ} ⊂ L1(Ω, F, µ) and µ(Ω) < ∞. Then, {fλ : λ ∈ Λ} is UI iff sup_{λ∈Λ} ∫ |fλ| dµ < ∞ and {ν_{|fλ|}(·) : λ ∈ Λ} is u.a.c. (µ).

Proof: Let {fλ : λ ∈ Λ} be UI. Then, since µ(Ω) < ∞, L1-boundedness of {fλ : λ ∈ Λ} follows from Proposition 2.5.7 (v). To establish u.a.c. (µ), fix ε > 0. By UI, there exists an N such that

sup_{λ∈Λ} ∫_{{|fλ|>N}} |fλ| dµ < ε/2.

Let δ = ε/(2N) and let A ∈ F be such that µ(A) < δ. Then, as in the proof of Proposition 2.5.8 above,

sup_{λ∈Λ} ∫_A |fλ| dµ ≤ N µ(A) + ε/2 < ε.

Conversely, suppose {fλ : λ ∈ Λ} is L1-bounded and u.a.c. (µ). Then, for every ε > 0, there exists a δε > 0 such that

sup_{λ∈Λ} ∫_A |fλ| dµ < ε if µ(A) < δε.    (5.13)

Also, for any nonnegative f in L1(Ω, F, µ) and t > 0, ∫ f dµ ≥ ∫_{{f≥t}} f dµ ≥ t µ({f ≥ t}), which implies that

µ({f ≥ t}) ≤ ( ∫ f dµ ) / t .

(This is known as Markov's inequality; see Chapter 3.) Hence, it follows that

sup_{λ∈Λ} µ({|fλ| ≥ t}) ≤ ( sup_{λ∈Λ} ∫ |fλ| dµ ) / t .    (5.14)

Now given ε > 0, choose Tε such that ( sup_{λ∈Λ} ∫ |fλ| dµ ) / Tε < δε where δε is as in (5.13). Then, by (5.14), it follows that

sup_{λ∈Λ} ∫_{{|fλ|≥Tε}} |fλ| dµ < ε,

i.e., {fλ : λ ∈ Λ} is UI. !
Theorem 2.5.10: Let (Ω, F, µ) be a measure space with µ(Ω) < ∞, and let {fn : n ≥ 1} ⊂ L1(Ω, F, µ) be such that fn → f a.e. (µ) and f is ⟨F, B(R)⟩-measurable. If {fn : n ≥ 1} is UI, then f is integrable and

lim_{n→∞} ∫ |fn − f| dµ = 0.
Remark 2.5.4: In view of Proposition 2.5.7 (iii), Theorem 2.5.10 yields convergence of ∫ fn dµ to ∫ f dµ under weaker conditions than the DCT, provided µ(Ω) < ∞. However, even under the restriction µ(Ω) < ∞, UI of {fn}n≥1 is a sufficient, but not a necessary, condition for convergence of ∫ fn dµ to ∫ f dµ (Problem 2.28). In the special case where the fn's are nonnegative (and µ(Ω) < ∞), ∫ fn dµ → ∫ f dµ < ∞ if and only if {fn}n≥1 is UI (Problem 2.29). On the other hand, when µ(Ω) = +∞, UI is no longer sufficient to guarantee the convergence of ∫ fn dµ to ∫ f dµ (Problem 2.30). Thus, the notion of UI is useful mainly for finite measures and, in particular, probability measures.
Proof: By Proposition 2.5.7 and Fatou's lemma,

∫ |f| dµ ≤ lim inf_{n→∞} ∫ |fn| dµ ≤ sup_{n≥1} ∫ |fn| dµ < ∞

and hence f is integrable.

Next, for n ∈ N, t ∈ (0, ∞), define the functions gn = |fn − f|, g_{n,t} = gn I(gn ≤ t) and ḡ_{n,t} = gn I(gn > t). Since {fn}n≥1 is UI and f is integrable, by Proposition 2.5.7 (iv), {gn}n≥1 is UI. Hence, for any ε > 0, there exists tε > 0 such that

sup_{n≥1} ∫ ḡ_{n,t} dµ < ε for all t ≥ tε.    (5.15)

Next note that for any t > 0, since fn → f a.e. (µ), g_{n,t} → 0 a.e. (µ), and |g_{n,t}| ≤ t with ∫ t dµ = t µ(Ω) < ∞. Hence, by the DCT,

lim_{n→∞} ∫ g_{n,t} dµ = 0 for all t > 0.    (5.16)

By (5.15) and (5.16) with t = tε, we get

0 ≤ lim sup_{n→∞} ∫ |fn − f| dµ ≤ sup_{n≥1} ∫ ḡ_{n,tε} dµ + lim_{n→∞} ∫ g_{n,tε} dµ ≤ ε

for all ε > 0. Hence, ∫ |fn − f| dµ → 0 as n → ∞. !
The next result concerns connections between the notions of almost everywhere convergence and almost uniform convergence.
Theorem 2.5.11: (Egorov's theorem). Let fn → f a.e. (µ) and µ(Ω) < ∞. Then fn → f nearly uniformly (µ) as in Definition 2.5.7.

Proof: For j, n, r ∈ N, define the sets

Ajr = {ω : |fj(ω) − f(ω)| ≥ r^{−1}}, Bnr = ∪_{j≥n} Ajr, Cr = ∩_{n≥1} Bnr, D = ∪_{r≥1} Cr.

It is easy to verify that D is the set of points where fn does not converge to f. That is,

D = {ω : fn(ω) ↛ f(ω)}.

By hypothesis µ(D) = 0. This implies µ(Cr) = 0 for all r ≥ 1. Since Bnr ↓ Cr as n → ∞ and µ(Ω) < ∞, by m.c.f.a., µ(Cr) = 0 ⇒ lim_{n→∞} µ(Bnr) = 0. So for all r ≥ 1 and ε > 0, there exists kr ∈ N such that µ(B_{kr r}) < ε/2^r. Let A = ∪_{r≥1} B_{kr r}. Then µ(A) < ε and Aᶜ = ∩_{r≥1} B_{kr r}ᶜ. Also, for each r ∈ N, Aᶜ ⊂ B_{kr r}ᶜ. So for any n ≥ 1,

sup{|fn(ω) − f(ω)| : ω ∈ Aᶜ} ≤ sup{|fn(ω) − f(ω)| : ω ∈ B_{kr r}ᶜ} for all r ∈ N, and this is ≤ 1/r whenever n ≥ kr.

That is, sup{|fn(ω) − f(ω)| : ω ∈ Aᶜ} → 0 as n → ∞. !
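Egorov's theorem is easy to visualize on a concrete sequence. For fn(x) = x^n on ([0, 1], B([0, 1]), m), fn → 0 at every x < 1 but not uniformly; removing the small set A = (1 − ε, 1] (the choice of ε is an assumption) makes the convergence uniform, exactly as the theorem asserts. A minimal sketch:

```python
# Egorov illustrated with f_n(x) = x**n on [0, 1].
# sup over [0,1] is always 1, but sup over [0, 1 - eps] is (1 - eps)**n -> 0.

def sup_abs(n, upper):
    # x**n is increasing on [0, 1], so the sup over [0, upper] is upper**n
    return upper ** n

eps = 0.01
for n in (10, 100, 1000):
    print(n, sup_abs(n, 1.0), round(sup_abs(n, 1.0 - eps), 6))
    # second column -> 0 while the first stays at 1
```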
Theorem 2.5.12: (Lusin's theorem). Let F : R → R be a nondecreasing function and let µ = µ*_F be the corresponding Lebesgue-Stieltjes measure on (R, M_{µ*_F}). Let f : R → R̄ be ⟨M_{µ*_F}, B(R)⟩-measurable with µ({x : |f(x)| = ∞}) = 0. Then for every ε > 0, there exists a continuous function g : R → R such that µ({x : f(x) ≠ g(x)}) < ε.

Proof: Fix −∞ < a < b < ∞. Since µ([a, b]) < ∞ and µ({x : |f(x)| = ∞}) = 0, for each ε > 0, there exists K ∈ (0, ∞) such that

µ({x : a ≤ x ≤ b, |f(x)| > K}) < ε/2.

Let fK(x) = f(x) I_{[a,b]}(x) I_{{|f|≤K}}(x). Then fK : R → R is bounded, ⟨M_{µ*_F}, B(R)⟩-measurable, and zero outside [a, b]. Consider now the following claim: For every ε > 0, there exists a continuous function g : [a, b] → R such that

µ({x : fK(x) ≠ g(x), a ≤ x ≤ b}) < ε/2.    (5.17)

Clearly this implies that

µ({x : a ≤ x ≤ b, f(x) ≠ g(x)}) < ε.    (5.18)

Fix δ > 0. Now for each n ∈ Z, take [a, b] = [n, n + 1] and ε = δ/2^{|n|+2}, apply (5.18), and call the resulting continuous function gn. Let g̃ be a continuous function from R to R such that µ({x : n ≤ x ≤ n + 1, g̃(x) ≠ gn(x)}) < δ/2^{|n|+2}. This can be done by setting g̃(x) = gn(x) for n ≤ x ≤ (n + 1) − δn and linear on [(n + 1) − δn, n + 1] for some 0 < δn < δ/2^{|n|+3}. Then

µ({x : f(x) ≠ g̃(x)}) ≤ Σ_{n=−∞}^{∞} µ({x : n ≤ x ≤ n + 1, f(x) ≠ g̃(x)})
≤ Σ_{n=−∞}^{∞} µ({x : n ≤ x ≤ n + 1, f(x) ≠ gn(x)}) + Σ_{n=−∞}^{∞} µ({x : n ≤ x ≤ n + 1, gn(x) ≠ g̃(x)})
< 2 Σ_{n=−∞}^{∞} δ/2^{|n|+2} ≤ 2δ.

So it suffices to establish (5.17). Since fK : R → R is bounded and ⟨M_{µ*_F}, B(R)⟩-measurable, for each ε > 0, there exists a simple function s(x) ≡ Σ_{i=1}^k ci IAi(x), with Ai ⊂ [a, b], Ai ∈ M_{µ*_F}, {Ai : 1 ≤ i ≤ k} disjoint, µ(Ai) < ∞, and ci ∈ R for i = 1, . . . , k, such that |fK(x) − s(x)| < ε/4 for all a ≤ x ≤ b. By Theorem 1.3.4, for each Ai and η > 0, there exist a finite number of disjoint open intervals Iij = (aij, bij), j = 1, . . . , ni, such that

µ( Ai ∆ ∪_{j=1}^{ni} Iij ) < η/(2k).

Now, as in Problem 1.32 (g), there exists a continuous function gij such that

µ( gij^{−1}{1} ∆ Iij ) < η/(k ni), j = 1, 2, . . . , ni, i = 1, 2, . . . , k.

Let

gi ≡ Σ_{j=1}^{ni} gij, 1 ≤ i ≤ k.

Then µ(Ai ∆ gi^{−1}{1}) < η/k. Let g = Σ_{i=1}^k ci gi. Then µ({s ≠ g}) < η. Hence for every ε > 0, η > 0, there is a continuous function g_{ε,η} : [a, b] → R such that

µ({x : a ≤ x ≤ b, |fK(x) − g_{ε,η}(x)| > ε}) < η.

Now for each n ≥ 1, let

hn(·) ≡ g_{2^{−n}, 2^{−n}}(·) and An = {x : a ≤ x ≤ b, |fK(x) − hn(x)| > 2^{−n}}.

Then, µ(An) < 2^{−n} and hence

Σ_{n=1}^∞ µ(An) < ∞.

By the MCT, this implies that ∫_{[a,b]} ( Σ_{n=1}^∞ IAn ) dµ < ∞ and hence

Σ_{n=1}^∞ IAn < ∞ a.e. (µ).

Thus hn → fK a.e. (µ) on [a, b]. By Egorov's theorem, for any ε > 0, there is a set Aε ∈ B([a, b]) such that

µ(Aεᶜ) < ε/2 and hn → fK uniformly on Aε.

By the inner regularity (Corollary 1.3.5) of µ, there is a compact set D ⊂ Aε such that

µ(Aε \ D) < ε/2.

Since hn → fK uniformly on Aε, fK is continuous on Aε and hence on D. It can be shown that there exists a continuous function g : [a, b] → R such that g = fK on D (Problem 2.8 (e)). A more general result extending a continuous function defined on a closed set to the whole space is known as Tietze's extension theorem (see Munkres (1975)). Thus µ({x : a ≤ x ≤ b, fK(x) ≠ g(x)}) < ε. This completes the proof of (5.17) and hence that of the theorem. !
Remark 2.5.5: (Littlewood's principles). As pointed out in Section 1.3, Theorems 1.3.4, 2.5.11, and 2.5.12 constitute J. E. Littlewood's three principles: every M_{µ*_F}-measurable set is nearly a finite union of intervals; every a.e. convergent sequence is nearly uniformly convergent; and every M_{µ*_F}-measurable function is nearly continuous.
2.6 Problems
2.1 Let Ωi, i = 1, 2 be two nonempty sets and T : Ω1 → Ω2 be a map. Then for any collection {Aα : α ∈ I} of subsets of Ω2, show that

T^{−1}( ∪_{α∈I} Aα ) = ∪_{α∈I} T^{−1}(Aα) and T^{−1}( ∩_{α∈I} Aα ) = ∩_{α∈I} T^{−1}(Aα).

Further, (T^{−1}(A))ᶜ = T^{−1}(Aᶜ) for all A ⊂ Ω2. (These are known as de Morgan's laws.)
2.2 Let {Ai }i≥1 be a collection of disjoint sets in a measurable space
(Ω, F).
of 1F, B(R)2-measurable functions
(a) Let {gi }i≥1 be a collection$
∞
from Ω to R. Show that
i=1 IAi gi converges on R and is
1F, B(R)2-measurable.
(b) Let G ≡ σ1{Ai : i ≥ 1}2. Show that h : Ω → R is 1G, B(R)2measurable iff g(·) is constant on each Ai .
2.3 Let f, g : Ω → R be 1F, B(R)2-measurable. Set
h(ω) =
f (ω)
I(g(ω) )= 0),
g(ω)
ω ∈ Ω.
Verify that h is 1F, B(R)2-measurable.
2.4 Let g : Ω → R̄ be such that for every r ∈ R, g −1 ((−∞, r]) ∈ F. Show
that g is 1F, B(R̄)2-measurable.
2.5 Prove Proposition 2.1.6.
(Hint: Show that
σ1{fλ : λ ∈ Λ}2 =
*
L∈C
σ1{fλ : λ ∈ L}2
where C is the collection of all countable subsets of Λ.)
2.6 Let Xi , i = 1, 2, 3 be random variables on a probability space
(Ω, F, P ). Consider the random equation (in t ∈ R):
X1 (ω)t2 + X2 (ω)t + X3 (ω) = 0.
(6.1)
(a) Show that A ≡ {ω ∈ Ω : Equation (6.1) has two distinct real
roots} ∈ F.
(b) Let T1 (ω) and T2 (ω) denote the two roots of (6.1) on A. Let
/
Ti (w) on A
,
fi (w) =
0
on Ac
i = 1, 2. Show that (f1 , f2 ) is 1F, B(R2 )2-measurable.
""
!!
2.7 Let M ≡ Xij , 1 ≤ i, j ≤ k, be a (random) matrix of random
variables Xij defined on a probability space (Ω, F, P ).
(a) Show that Y1 ≡ det(M ) (the determinant of M ) and Y2 ≡
tr(M )(the trace of M ) are both 1F, B(R)2-measurable.
(b) Show also that Y3 ≡ the largest eigenvalue of M $ M is 1F, B(R)2measurable, where M $ is the transpose of M .
!
!
M M x)
.)
(Hint: Use the result that Y3 = sup (x (x
! x)
x.=0
2.8 Let f : R → R. Let f¯(x) = inf δ>0 sup|y−x|<δ f (y) and f (x) =
supδ>0 inf |y−x|<δ f (y), x ∈ R.
(a) Show that for any t ∈ R,
{x : f¯(x) < t}
is open and hence, f¯ is 1B(R), B(R)2-measurable.
(b) Show that for any t > 0,
{x : f¯(x) − f (x) < t} ≡
*
{x : f¯(x) < t + r, f (x) > r}
r∈Q
and hence is open.
(c) Show that f is continuous at some x0 in R iff f¯(x0 ) = f (x0 ).
(d) Show that the set Cf ≡ {x : f (·) is continuous at x} is a Gδ
set, i.e., an intersection of a countable number of open sets, and
hence, Cf is a Borel set.
(e) Let D be a closed set in R. Let f : D → R be continuous on D.
Show that there exists a g : R → R continuous such that g = f
on D.
(Hint: Note that Dc is open in R and hence it can be expressed
as a countable union of disjoint open intervals {Ij = (aj , bj ) :
1 ≤ j ≤ k ≤ ∞}. Note that aj , bj ∈ D for all j except for
possibly the j’s for which aj = −∞ or bj = +∞. Let

f (x)
if x ∈ D



(x−aj )

f
(a
)
+
(f
(b
)
−
f
(a
))
if
x ∈ (aj , bj ),

j
j
j
(bj −aj )




aj , bj ∈ D
g(x) ≡
)
if
a
f
(b
j
j = −∞,



x
∈ (aj , bj )




f
(a
)
if
b
= ∞,
j
j


x ∈ (aj , bj ).
Now verify that g has the required properties.)
2.9 Prove Proposition 2.2.1 using Problem 2.1.
2.10 (a) Show that for any x ∈ R and any random variable X with cdf
FX (·), P (X < x) = FX (x−) ≡ limy↑x FX (y).
(b) Show that Fc (·) in (2.5) is continuous.
2.11 Let F : R → R be nondecreasing.
(a) Show that for x ∈ R, F (x−) ≡ limy↑x F (y) and F (x+) =
limy↓x F (y) exist and satisfy F (x−) ≤ F (x) ≤ F (x+).
(b) Let D ≡ {x : F (x+) − F (x−) > 0}. Show that D is at most
countable.
(Hint: Show that
D=
* *
Dn,r ,
n≥1 r≥1
where Dn,r = {x : |x| ≤ n, F (x+) − F (x−) > 1r } and that each
Dn,r is finite.)
2.12 Suppose that (i) and (iii) of Proposition 2.2.3 hold. Show that for
any a1 in R and −∞ < a2 ≤ b2 < ∞, F (a2 , a1 ) ≤ F (b2 , a1 ) and
F (a1 , a2 ) ≤ F (a1 , b2 ), (i.e., F is monotone coordinatewise).
2.13 Let F : Rk → R be such that:
(a) for x1 = (x11 , x12 , . . . , x1k ) and x2 = (x21 , x22 , . . . , x2k ) with
x1i ≤ x2i for i = 1, 2, . . . k,
)
∆F (x1 , x2 ) ≡
(−1)s(a) F (a) ≥ 0,
a∈A
where A ≡ {a = (a1 , a2 , . . . , ak ) : ai ∈ {x1i , x2i }, i = 1, 2, . . . , k}
and for a in A, s(a) = |{i : ai = x1i , i = 1, 2, . . . , k}| is the number of indices i for which ai = x1i .
(b) For each i = 1, 2, . . . , k,
lim F (xi ) = 0.
xi1 ↓−∞
Let Ck be the semialgebra of sets of the form {A : A = A1 × . . . × Ak ,
Ai ∈ C for all 1 ≤ i ≤ k} where C is the semialgebra in R defined in
(3.7). Set µF (A) ≡ ∆F (x1 , x2 ) if A = (x11 , x21 ] × (x12 , x22 ] × . . . ×
(x1k , x2k ] is bounded and set µF (A) = limn→∞ µ(A ∩ Jn ), where
Jn = (−n, n]k .
(i) Show that F is coordinatewise monotone, i.e., if x =
(x1 , . . . , xk ), y = (y1 , . . . , yk ) and xi ≤ yi for every i = 1, . . . , k,
then
F (y) ≥ F (x).
(ii) Show that C is a semialgebra and µF is a measure on C by using
the Heine-Borel theorem as in Problem 1.22 and 1.23.
2.14 Let m(·) denote the Lebesgue measure on (R, B(R)). Let T : R → R
be the map T (x) = x2 . Evaluate the induced measure mT −1 (A) of
the set A, where
(a) A = [0, t], t > 0.
(b) A = (−∞, 0).
(c) A = {1, 2, 3, . . .}.
#∞
(d) A = i=1 (i2 , (i + i12 )2 ).
#∞
(e) A = i=1 (i2 , (i + 1i )2 ).
!
"
2.15 Consider the probability space (0, 1), B((0, 1)), m , where m(·) is the
Lebesgue measure.
(a) Let Y1 be the random variable Y1 (x) ≡ sin 2πx for x ∈ (0, 1).
Find the cdf of Y1 .
(b) Let Y2 be the random variable Y2 (x) ≡ log x for x ∈ (0, 1). Find
the cdf of Y2 .
(c) Let F : R → R be a cdf . For 0 < x < 1, let
F1−1 (x)
=
F2−1 (x)
=
inf{y : y ∈ R, F (y) ≥ x}
sup{y : y ∈ R, F (y) ≤ x}.
Let Zi be the random variable defined by
Zi = Fi−1 (x) 0 < x < 1,
i = 1, 2.
(i) Find the cdf of Zi , i = 1, 2.
(Hint: Verify using the right continuity of F that for any
0 < x < 1, t ∈ R, F (t) ≥ x ⇔ F1−1 (x) ≤ t.)
(ii) Show also that F1−1 (·) is left continuous and F2−1 (·) is right
continuous.
2.16 (a) Let (Ω, F1 , µ) be a σ-finite measure space. Let T : Ω → R be
1F, B(R)2-measurable. Show, by an example, that the induced
measure µT −1 need not be σ-finite.
(b) Let (Ωi , Fi ) be measurable spaces for i = 1, 2 and let T :
Ω1 → Ω2 be 1F1 , F2 2-measurable. Show that any measure µ
on (Ω1 , F1 ) is σ-finite if µT −1 is σ-finite on (Ω2 , F2 ).
2.17 Let (Ω, F, µ) be a measure space and let f : Ω → [0, ∞] be such that it admits two representations

f = Σ_{i=1}^k ci IAi and f = Σ_{j=1}^l dj IBj,

where ci, dj ∈ [0, ∞], and Ai and Bj ∈ F for all i, j. Show that

Σ_{i=1}^k ci µ(Ai) = Σ_{j=1}^l dj µ(Bj).

(Hint: Express Ai and Bj as finite unions of a common collection of disjoint sets in F.)
2.18 (a) Prove Proposition 2.3.1.
(b) In the proof of Proposition 2.3.2, verify that Dn ↑ D.
(c) Verify Remark 2.3.1.
(Hint: Let
{ω : fn−1 (ω) ≥ fn (ω), gn+1 (ω) ≥ gn (ω)}
',
(,3
ω : lim gn (ω) = g(ω),
A =
An
An
=
n→∞
n≥1
4
lim fn (ω) = f (ω)
n→∞
and f˜n = fn IA , g̃n = gn IA . Verify that µ(Ac ) = 0 and apply
Proposition 2.3.2 to {f˜n }n≥1 and {g̃n }n≥1 .)
2.19 (a) Verify that fn (·) defined in (3.7) satisfies fn (ω) ↑ f (ω) for all ω
in Ω.
(b) Verify that the sequence {hn }n≥1 of
& Remark 2.3.3 satisfies
limn→∞ hn = f a.e. (µ), and limn→∞ hn dµ = M .
2.20 Apply Corollary 2.3.5 to show that for any collection {aij : i, j ∈ N} of nonnegative numbers,

Σ_{i=1}^∞ ( Σ_{j=1}^∞ aij ) = Σ_{j=1}^∞ ( Σ_{i=1}^∞ aij ).

2.21 Let g : R → R.
(a) Recall that limt→∞ g(t) = L for some L in R if for every # > 0,
there exists a T$ < ∞ such that t ≥ T$ ⇒ |g(t) − L| < #. Show
that limt→∞ g(t) = L for some L in R iff limn→∞ g(tn ) = L for
every sequence {tn }n≥1 with limn→∞ tn = ∞.
(b) Formulate and prove a similar result when limt→a g(t) = L for
some a, L ∈ R.
2.22 Let {ft : t ∈ R} ⊂ L1 (Ω, F, µ).
(a) (The continuous version of the MCT). Suppose that ft ↑ f as
t ↑ ∞ a.e. (µ) and for each t, ft ≥ 0 a.e. (µ). Show that
%
%
ft dµ ↑ f dµ.
(b) (The continuous version of the DCT). Suppose there exists a
nonnegative g ∈ L1 (Ω, F, µ) such that for each t, |ft | ≤ g a.e.
1
& (µ). Then
& f ∈ L (Ω, F, µ) and
&(µ) and as t → ∞, ft → f a.e.
|ft − f |dµ → 0 and hence, ft dµ → f dµ, as t → ∞.
2.23 Let f : [a, b] → R be bounded where −∞ < a < b < ∞. Let {Pn }n≥1
be a sequence of partitions such that ∆(Pn ) → 0. Show that as n →
∞,
%
%
U (Pn , f ) →
where
&
f
and L(Pn , f ) →
f,
&
f and f are as defined in (4.3) and (4.4), respectively.
(Hint: Given # > 0, fix a partition P = {x0 = a < x1 < . . . < xk = b}
&
such that f < U (P, f ) < f + #. Let δ = min0≤i≤k−1 (xi+1 − xi ).
Choose n large such that the diameter ∆(Pn ) < δ. Verify that
U (Pn , f ) < U (P, f ) + kB∆(Pn )
where B = sup{|f (x)| : a ≤ x ≤ b} and conclude that limn U (Pn , f ) ≤
&
f + #.)
2.24 Establish (4.6) and (4.7).
(Hint: Show that for every
# x and any # > 0, φn (x) ≤ φ(x) + # for all
n large and that for x )∈ n≥1 Pn , φn (x) ≥ φ(x) for all n.)
2.25 If f (x) = IQ1 (x) where Q1 = Q∩[0, 1], Q being the set of all rationals,
then show that for any partition P , U (P, f ) = 1 and L(P, f ) = 0.
2.26 Establish Theorem 2.5.1.
(Hint: Verify that
D
≡
=
{ω : fj (ω) )→ f (ω)}
* , *
Ajr ,
r≥1 n≥1 j≥n
where Ajr = {|fj −f | > 1r }. Show that since µ(D) = 0#
and µ(Ω) < ∞,
µ(Drn ) → 0 as n → ∞ for each r ∈ N, where Drn = j≥n Ajr .)
2.27 Let φ : R+ → R+ be nondecreasing and φ(x)/x ↑ ∞ as x ↑ ∞. Also, let {fλ : λ ∈ Λ} be a subset of L1(Ω, F, µ). Show that if sup_{λ∈Λ} ∫ φ(|fλ|) dµ < ∞, then {fλ : λ ∈ Λ} is UI.
2.28 Let µ be the Lebesgue measure on ([−1, 1], B([−1, 1])). For n ≥ 1,
define fn (x) = nI(0,n−1 ) (x) − nI(−n−1 ,0)
& (x) and &f (x) ≡ 0 for x ∈
[−1, 1]. Show that fn → f a.e. (µ) and fn dµ → f dµ but {fn }n≥1
is not UI.
2.29 Let {fn : n ≥ 1} ∪ {f } ⊂ L1 (Ω, F, µ).
&
&
(a) &Show that |fn − f |(dµ) → 0 iff fn −→m f and |fn |dµ →
|f |dµ.
(b) Show further that if µ(Ω) < ∞ then the above two are equivalent
to fn −→m f and {fn } UI.
2.30 For n ≥ 1, let fn (x) = n−1/2 I(0,n) (x), x ∈ R, and let f (x) = 0, x ∈ R.
Let m denote the Lebesgue measure
Show that fn → f
& on (R, B(R)).
&
a.e. (m) and {fn }n≥1 is UI, but fn dm )→ f dm.
2.31 (Computing integrals w.r.t. the Lebesgue measure). Let f ∈
L1 (R, Mm , m) where (R, Mm , m) is the real line with Lebesgue σalgebra, and&Lebesgue& measure, &i.e., m = µ∗F where F (x) ≡& x. The
+
−
+
definition
& − of f dm as f dm− f dm involves computing f dm
and f dm which in turn is given in terms of approximating by integrals of simple nonnegative functions. This is not a very practical
procedure. For f that happens to be continuous a.e. and bounded on
finite intervals, one can compute the Riemann integral of f over finite
intervals and pass to the limit. Justify the following steps:
(a) Let f be continuous a.e. and bounded on finite intervals and
f ∈ L1 (R, Mm , m). Show that for −∞ < a < b < ∞, f ∈
L1 ([a, b], Mm , m) and
%
I
f dm =
f (x)dx,
[a,b]
[a,b]
where the right side denotes the Riemann integral of f on [a, b].
(b) If, in addition, f ∈ L1 (R, Mm , m), then
%
%
f dm = a→−∞
lim
f dm.
R
b→+∞
[a,b]
(c) If f is continuous a.e. and ∈ L1 (R, Mm , m), then
%
%
f dm = lim
φc (f )dm
R
a→−∞
b→+∞
c→∞
[a,b]
where φc (f ) = f (x)I(|f (x)| ≤ c)+cI(f (x) > c)−cI(f (x) < −c).
2.6 Problems
(d) Apply the above procedure to compute
(i) f (x) =
1
1+x2 ,
−x2 /2
(ii) f (x) = e
R
f dm for
,
+x −x2 /2
(iii) f (x) = e
&
79
e
.
2.32 (a) Let a ∈ R. Show that if a1 , a2 are nonnegative such that a =
a1 − a2 then a1 ≥ a+ , a2 ≥ a− and a1 − a+ = a2 − a− .
(b) Let f = f1 − f2 where f1 , f2 are nonnegative
and
&
& are in
1
1
(Ω,
F,
µ).
Show
that
f
∈
L
(Ω,
F,
µ)
and
f
dµ
=
f1 dµ −
L
&
f2 dµ.
2.33 Let (Ω, F, µ) be a measure space. Let f : Ω × (a, b) → R be such that
for each a < t < b, f (·, t) ∈ L1 (Ω, F, µ).
(a) Suppose for each a < t < b,
(i) lim f (ω, t + h) = f (ω, t) a.e. (µ).
h→0
(ii) sup |f (ω, t + h)| ≤ g1 (ω, t), where g1 (·, t) ∈ L1 (Ω, F, µ).
|h|≤1
Show that φ(t) ≡
&
Ω
f (ω, t)dµ is continuous on (a, b).
(b) Suppose for each a < t < b.
f (ω, t + h) − f (ω, t)
= g2 (ω, t) exists a.e. (µ),
h
J f (ω, t + h) − f (ω, t) J
J
J
(ii) sup J
J ≤ G(ω, t) a.e. (µ),
h
0≤|h|≤1
(i) lim
h→0
(iii) G(ω, t) ∈ L1 (Ω, F, µ).
Show that
φ(t) ≡
is differentiable on (a, b).
%
Ω
f (ω, t)dµ
(Hint: Use the continuous version of DCT (cf. Problem 2.22).)
"
!
2.34 Let A ≡ (aij ) be an infinite matrix of real numbers. Suppose that
for each j, limi→∞ aij = aj exists in R and supi |aij | ≤ bj , where
$
∞
j=1 bj < ∞.
(a) Show by an application of the DCT that
lim
i→∞
∞
)
j=1
|aij − aj | = 0.
(b) Show the same directly, i.e., without using the DCT.
2.35 Using the above problem or otherwise, show that for any sequence {xn}n≥1 with lim_{n→∞} xn = x in R,

lim_{n→∞} ( 1 + xn/n )^n = Σ_{j=0}^∞ x^j / j! ≡ exp(x).
2.36 (a) Let {fn }n≥1 ⊂ L1 (Ω, F, µ) such that fn → 0 in L1 (µ). Show
that {fn }n≥1 is UI.
(b) Let {fn }n≥1 ⊂ Lp (Ω, F, µ), 0 < p < ∞, such that µ(Ω) < ∞,
{|fn |p }n≥1 is UI and fn −→m f . Show that f ∈ Lp (µ) and
fn → f in Lp (µ).
2.37 (a) Show that for a sequence of real numbers {xn }n≥1 , limn→∞ xn =
x ∈ R̄ holds iff every subsequence {xnj }j≥1 of {xn }n≥1 has a
further subsequence {xnjk }k≥1 such that limk→∞ xnjk = x.
(b) Use (a) and Theorem 2.5.2 to show that the extended DCT
(Theorem 2.3.11) is valid if the a.e. convergence of {fn }n≥1 and
{gn }n≥1 is replaced by convergence in measure.
2.38 Let (R, Mµ∗F , µ∗F ) be a Lebesgue-Stieltjes measure space generated
by a F : R → R nondecreasing. Let f : R → R̄ be Mµ∗F -measurable
such that |f | < ∞ a.e. (µ∗F ). Show that for every k ∈ N and for every
#, η ∈ (0, ∞), there exists a continuous function g : R → R such that
g(x) = 0 for |x| > k and µ∗F ({x : |x| ≤ k, |f (x) − g(x)| > η} < #.
(Hint: Complete the following:
Step I: For all # > 0, there exists Mk,$ ∈ (0, ∞) such that µ∗F ({x :
|x| ≤ k, |f (x)| > Mk,$ }) < #.
Step II: For η > 0, there exists a simple function s(·) such that
µ∗F ({x : |x| ≤ k, |f (x)| ≤ Mk,$ , |f (x) − s(x)| > η}) = 0.
Step III: For δ > 0, there exists a continuous function g(·) such
that g ≡ 0 for |x| > k and µ∗F ({x : |x| ≤ k, s(x) )= g(x)}) < δ.)
1
2.39 Recall from
& Corollary 2.3.6 that for g ∈ L (Ω, F, µ) and nonnegative
function h,
νg (A) ≡ A gdµ is a measure. Next for any &F-measurable
&
show that h ∈ L1 (νg ) iff h · g ∈ L1 (µ) and hdνg = hgdµ.
(Hint: Verify first for h simple and nonnegative, next for h nonnegative, and finally for any h.)
2.40 Prove the BCT, Corollary 2.3.13, using Egorov’s theorem (Theorem
2.5.11).
2.41 Deduce the DCT from the BCT with the notation as in Corollary
2.3.12.
(Hint: Apply the BCT to the measure space (Ω, F, νg ) and functions
hn = fgn I(g > 0), h = fg I(g > 0) where νg is as in Problem 2.39.)
2.42 (Change of variables formula). Let (Ωi , Fi ), i = 1, 2 be two measurable spaces. Let f : Ω1 → Ω2 be 1F1 , F2 2-measurable, h : Ω2 → R
be 1F2 , B(R)2-measurable, and µ1 be a measure on (Ω1 , F1 ). Show
that g ≡ h ◦ f , i.e., g(ω) ≡ h(f (ω)) for ω ∈ Ω!1 is in L1 (µ) iff h(·) ∈"
L1 (Ω2 , F2 , µ2 ) where µ2 = µ1 f −1 iff I(·) ∈ L1 R, B(R), µ3 ≡ µ2 h−1
where I(·) is the identity function in R, i.e., I(x) ≡ x for all x ∈ R
and also that
%
%
%
Ω1
gdµ1 =
Ω2
hdµ2 =
R
xdµ3 .
2
2.43 Let φ(x) ≡ √12π e−x /2 be the standard N (0, 1) pdf on R. Let
{(µn , σn )}n≥1 be a sequence in R × R+
µn → µ
and"
"
! . Suppose
! x−µ
1
n
,
f
(x)
=
φ
σn → σ as n → ∞. Let fn (x) = σ1n φ x−µ
σn
σ
σ
&
&
and νn (A) = A fn dm, ν(A) = A f dm for any Borel set A in R,
where m(·) is the Lebesgue measure on R. Using Scheffe’s theorem,
verify that, as n → ∞, νn (·) → ν(·) uniformly on B(R) and that for
any h : R → R, bounded and Borel measurable,
%
%
hdνn → hdν.
!
"n
2.44 Let fn (x) = cn 1 − nx I[0,n] (x), x ∈ R, n ≥ 1.
(a) Find cn such that
&
fn dm = 1.
(b) Show that limn→∞ fn (x) ≡ f (x) exists for all x in R and that f
is a pdf
(c) For A ∈ B(R), let
νn (A) ≡
%
fn dm
and ν(A) =
A
%
f dm.
A
Show that νn → ν uniformly on B(R).
2.45 Let (Ω, F, µ) be
& a measure space and f : Ω → R be F-measurable.
Suppose that Ω |f |dµ < ∞ and D is a closed set in R such that for
all B ∈ F with µ(B) > 0,
1
µ(B)
Show that f (ω) ∈ D a.e. µ.
%
B
f dµ ∈ D.
(Hint: Show that for x )∈ D, there exists r > 0 such that µ{ω :
|f (ω) − x| < r} = 0.)
2.46 Find a sequence of nonnegative continuous functions {fn}n≥1 on [0,1] such that ∫_{[0,1]} fn dm → 0 but {fn(x)}n≥1 does not converge for any x.

(Hint: Let, for m ≥ 1 and 1 ≤ k ≤ m, g_{m,k} = I_{[(k−1)/m, k/m]}, and let {fn}n≥1 be a reordering of {g_{m,k} : 1 ≤ k ≤ m, m ≥ 1}.)
2.47 Let {fn }n≥1 be a sequence of continuous functions from [0,1] to [0,1]
such that fn (x) → 0 as n → ∞ for all 0 ≤ x ≤ 1. Show that
&1
f (x)dx → 0 as n → ∞ (where the integral is the Riemann in0 n
tegral) by two methods: one by using BCT and one without using
&BCT. Show also that if µ is a finite measure on ([0, 1], B([0, 1])), then
f dµ → 0 as n → ∞.
[0,1] n
2.48 (Invariance of Lebesgue measure under translation and reflection.)
Let m(·) be Lebesgue measure on (R, B(R)).
(a) For any E ∈ B(R) and c ∈ R, define −E ≡ {x : −x ∈ E} and
E + c ≡ {y : y = x + c, x ∈ E}. Show that
m(−E) = m(E) and
m(E + c) = m(E)
for all E ∈ B(R) and c ∈ R.
(b) For any f ∈ L1 (R, B(R), m) and c ∈ R, let f˜(x) ≡ f (−x) and
fc (x) ≡ f (x + c). Show that
%
%
%
˜
f dm = fc dm = f dm.
3
Lp-Spaces
3.1 Inequalities
This section contains a number of useful inequalities.
Theorem 3.1.1: (Markov’s inequality). Let f be a nonnegative measurable
function on a measure space (Ω, F, µ). Then for any 0 < t < ∞,
µ({f ≥ t}) ≤ ( ∫ f dµ ) / t .    (1.1)

Proof: Since f is nonnegative,

∫ f dµ ≥ ∫_{{f≥t}} f dµ ≥ t µ({f ≥ t}). !
Corollary 3.1.2: Let X be a random variable on a probability space
(Ω, F, P ). Then, for r > 0, t > 0,
P(|X| ≥ t) ≤ E|X|^r / t^r .

Proof: Since {|X| ≥ t} = {|X|^r ≥ t^r} for all t > 0, r > 0, this follows from (1.1). !
Corollary 3.1.3: (Chebychev’s inequality). Let X be a random variable
with EX 2 < ∞, E(X) = µ and Var(X) = σ 2 . Then for any 0 < k < ∞,
P(|X − µ| ≥ kσ) ≤ 1/k² .

Proof: Follows from Corollary 3.1.2 with X replaced by X − µ and with r = 2, t = kσ. !
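A small simulation makes Chebychev's bound concrete. The sketch below (the uniform sample, the seed, and the sample size are assumptions chosen only for illustration) compares the empirical frequency of {|X − µ| ≥ kσ} with the bound 1/k².

```python
# Empirical check of Chebychev's inequality for X ~ Uniform(0,1):
# mu = 1/2, sigma^2 = 1/12; compare P(|X - mu| >= k*sigma) with 1/k^2.
import random

random.seed(0)
N = 200_000
mu, sigma = 0.5, (1.0 / 12.0) ** 0.5
xs = [random.random() for _ in range(N)]

for k in (1.0, 1.5, 2.0, 3.0):
    freq = sum(1 for x in xs if abs(x - mu) >= k * sigma) / N
    print(k, round(freq, 4), "<=", round(1.0 / k ** 2, 4))
```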
Corollary 3.1.4: Let φ : R+ → R+ be nondecreasing. Then for any random variable X and 0 < t < ∞,

P(|X| ≥ t) ≤ Eφ(|X|) / φ(t) .

Proof: Use (1.1) and the fact that |X| ≥ t ⇒ φ(|X|) ≥ φ(t). !

Corollary 3.1.5: (Cramer's inequality). For any random variable X and t > 0,

P(X ≥ t) ≤ inf_{θ>0} E(e^{θX}) / e^{θt} .

Proof: For t > 0, θ > 0, P(X ≥ t) = P(e^{θX} ≥ e^{θt}) ≤ E(e^{θX}) / e^{θt}, by (1.1). !
Definition 3.1.1: A function φ : (a, b) → R is called convex if for all
0 ≤ λ ≤ 1, a < x ≤ y < b,
φ(λx + (1 − λ)y) ≤ λφ(x) + (1 − λ)φ(y).
(1.2)
Geometrically, this means that for the graph of y = φ(x) on (a, b), for
each fixed t ∈ (0, ∞), the chord over the interval (x, x + t) turns in the
counterclockwise direction as x increases.
More precisely, the following result holds.
Proposition 3.1.6: Let φ : (a, b) → R. Then φ is convex iff for all a < x1 < x2 < x3 < b,

(φ(x2) − φ(x1)) / (x2 − x1) ≤ (φ(x3) − φ(x2)) / (x3 − x2),    (1.3)

which is equivalent to

(φ(x2) − φ(x1)) / (x2 − x1) ≤ (φ(x3) − φ(x1)) / (x3 − x1) ≤ (φ(x3) − φ(x2)) / (x3 − x2).    (1.4)

Proof: Let φ be convex and a < x1 < x2 < x3 < b. Then one can write x2 = λx1 + (1 − λ)x3 with λ = (x3 − x2)/(x3 − x1). So by (1.2),

φ(x2) = φ(λx1 + (1 − λ)x3) ≤ λφ(x1) + (1 − λ)φ(x3) = ((x3 − x2)/(x3 − x1)) φ(x1) + ((x2 − x1)/(x3 − x1)) φ(x3),

which is equivalent to (1.3). Also, since

(φ(x3) − φ(x1)) / (x3 − x1) = λ (φ(x3) − φ(x2)) / (x3 − x2) + (1 − λ) (φ(x2) − φ(x1)) / (x2 − x1),

(1.4) follows from (1.3).

Conversely, suppose (1.4) holds for all a < x1 < x2 < x3 < b. Then given a < x < y < b and 0 < λ < 1, set x1 = x, x2 = λx + (1 − λ)y, x3 = y and apply (1.4) to verify (1.2). !
The following properties of a convex function are direct consequences of
(1.3). The proof is left as an exercise (Problem 3.1).
Proposition 3.1.7: Let φ : (a, b) → R be convex. Then,

(i) For each x ∈ (a, b),

φ′₊(x) ≡ lim_{y↓x} (φ(y) − φ(x)) / (y − x) and φ′₋(x) ≡ lim_{y↑x} (φ(y) − φ(x)) / (y − x)

exist and are finite.

(ii) Further, φ′₋(·) ≤ φ′₊(·) and both are nondecreasing on (a, b).

(iii) φ′(·) exists except on the countable set of discontinuity points of φ′₊ and φ′₋.

(iv) For any a < c < d < b, φ is Lipschitz on [c, d], i.e., there exists a constant K < ∞ such that

|φ(x) − φ(y)| ≤ K|x − y| for all c ≤ x, y ≤ d.

(v) For any a < c, x < b,

φ(x) − φ(c) ≥ φ′₊(c)(x − c) and φ(x) − φ(c) ≥ φ′₋(c)(x − c).    (1.5)
By the mean value theorem, a sufficient condition for (1.3), and hence for the convexity of φ, is that φ be differentiable on (a, b) and φ′ be nondecreasing. A further sufficient condition for this is that φ be twice differentiable on (a, b) and φ″ be nonnegative. This is stated as

Proposition 3.1.8: Let φ be twice differentiable on (a, b) and φ″ be nonnegative on (a, b). Then φ is convex on (a, b).
Example 3.1.1: The following functions are convex in the given intervals:
(a) φ(x) = |x|^p, p ≥ 1, on (−∞, ∞).
(b) φ(x) = e^x, on (−∞, ∞).
(c) φ(x) = − log x, on (0, ∞).
(d) φ(x) = x log x, on (0, ∞).
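The chord-slope criterion of Proposition 3.1.6 also gives a quick numerical convexity test. The sketch below is illustrative only and not part of the original text; the grid and the test functions are arbitrary choices. It checks that consecutive chord slopes are nondecreasing for two of the convex functions in Example 3.1.1 and that the test fails for a concave function.

    import numpy as np

    def slopes_nondecreasing(phi, xs):
        # Proposition 3.1.6: phi is convex iff the chord slopes
        # (phi(x2)-phi(x1))/(x2-x1) <= (phi(x3)-phi(x2))/(x3-x2) for x1 < x2 < x3.
        y = phi(xs)
        s = np.diff(y) / np.diff(xs)            # consecutive chord slopes
        return bool(np.all(np.diff(s) >= -1e-12))

    xs = np.linspace(0.1, 5.0, 200)
    print(slopes_nondecreasing(lambda x: x * np.log(x), xs))      # True  (Example 3.1.1 (d))
    print(slopes_nondecreasing(lambda x: np.abs(x) ** 1.5, xs))   # True  (Example 3.1.1 (a), p = 1.5)
    print(slopes_nondecreasing(np.sqrt, xs))                      # False (sqrt is concave)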
Remark 3.1.1: By Proposition 3.1.7 (iii), the convexity of φ implies that
φ′ exists except on a countable set. For example, the function φ(x) = |x|
is convex on R; it is differentiable at all x ≠ 0. Similarly, it is easy to
construct a piecewise linear convex function φ with a countable number of
points where φ is not differentiable.
The following is an important inequality for convex functions.
Theorem 3.1.9: (Jensen’s inequality). Let f be a measurable function
on a probability space (Ω, F, P ) with P (f ∈ (a, b)) = 1 for some interval
(a, b), −∞ ≤ a < b ≤ ∞ and let φ : (a, b) → R be convex. Then
φ(∫ f dP) ≤ ∫ φ(f) dP,                                             (1.6)
provided ∫ |f| dP < ∞ and ∫ |φ(f)| dP < ∞.
Remark 3.1.2: In terms of random variables, this says that for any random variable X on a probability space (Ω, F, P ) with P (X ∈ (a, b)) = 1
and for any function φ that is convex on (a, b),
φ(EX) ≤ Eφ(X),                                                     (1.7)
provided E|X| < ∞ and E|φ(X)| < ∞, where for any Borel measurable
function h, Eh(X) ≡ ∫ h(X) dP.
&
Proof of Theorem 3.1.9: Let c = ∫ f dP. Applying (1.5), one gets
Y(ω) ≡ φ(f(ω)) − φ(c) − φ′₊(c)(f(ω) − c) ≥ 0  a.e. (P),            (1.8)
which, when integrated, yields ∫ Y(ω) P(dω) ≥ 0. Since ∫ (f(ω) − c) P(dω) = 0,
(1.6) follows.                                                     □
Remark 3.1.3: Suppose that equality holds in (1.6). Then, it follows that
∫ Y(ω) P(dω) = 0. By (1.8), this implies
φ(f(ω)) − φ(c) = φ′₊(c)(f(ω) − c)   a.e. (P).
Thus, if φ is a strictly convex function (i.e., strict inequality holds in (1.2)
for all x, y ∈ (a, b) and 0 < λ < 1), then equality holds in (1.6) iff f (ω) = c
a.e. (P ).
The following are easy consequences of Jensen’s inequality (Problem 3.3).
Proposition 3.1.10: Let k ≥ 1 be an integer.
(i) Let a1, a2, . . . , ak be real and p1, p2, . . . , pk be positive numbers such
    that Σ_{i=1}^k pi = 1. Then
    Σ_{i=1}^k pi exp(ai) ≥ exp(Σ_{i=1}^k pi ai).                      (1.9)
(ii) Let b1, b2, . . . , bk be nonnegative numbers and p1, p2, . . . , pk be as in
     (i). Then
     Σ_{i=1}^k pi bi ≥ Π_{i=1}^k bi^{pi},                             (1.10)
     and in particular,
     (1/k) Σ_{i=1}^k bi ≥ (Π_{i=1}^k bi)^{1/k},                       (1.11)
     i.e., the arithmetic mean of b1, . . . , bk is greater than or equal to the
     geometric mean of the bi’s. Further, equality holds in (1.10) iff b1 =
     b2 = . . . = bk.
(iii) For any a, b real and 1 ≤ p < ∞,
      |a + b|^p ≤ 2^{p−1}(|a|^p + |b|^p).                             (1.12)
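Inequalities (1.9)–(1.11) are easy to check numerically. The following sketch is not part of the original text; the weights and values below are random choices. It verifies the weighted arithmetic-mean/geometric-mean inequality (1.10) and its exponential form (1.9).

    import numpy as np

    rng = np.random.default_rng(1)
    k = 5
    b = rng.uniform(0.1, 10.0, size=k)         # nonnegative b_1, ..., b_k
    p = rng.dirichlet(np.ones(k))              # positive weights p_i summing to 1

    arith = np.sum(p * b)                      # weighted arithmetic mean  sum p_i b_i
    geom = np.prod(b ** p)                     # weighted geometric mean   prod b_i^{p_i}
    print(arith >= geom)                       # (1.10): True

    # (1.9) with a_i = log b_i is the same statement:
    a = np.log(b)
    print(np.sum(p * np.exp(a)) >= np.exp(np.sum(p * a)))   # True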
Inequality (1.10) is useful in establishing the following:
Theorem 3.1.11: (Hölder’s inequality). Let (Ω, F, µ) be a measure space.
Let 1 < p < ∞, f ∈ Lp(Ω, F, µ) and g ∈ Lq(Ω, F, µ), where q = p/(p−1). Then
∫ |fg| dµ ≤ (∫ |f|^p dµ)^{1/p} (∫ |g|^q dµ)^{1/q},                  (1.13)
i.e., ‖fg‖_1 ≤ ‖f‖_p ‖g‖_q.
If ‖fg‖_1 ≠ 0, then equality holds in (1.13) iff |f|^p = c|g|^q a.e. (µ) for some
constant c ∈ (0, ∞).
Proof: W.l.o.g. assume that ∫ |f|^p dµ > 0 and ∫ |g|^q dµ > 0. Fix ω ∈ Ω. Let
p1 = 1/p, p2 = 1/q, b1 = c1|f(ω)|^p, and b2 = c2|g(ω)|^q, where c1 = (∫ |f|^p dµ)^{−1}
and c2 = (∫ |g|^q dµ)^{−1}. Then applying (1.10) with k = 2 yields
(c1/p)|f(ω)|^p + (c2/q)|g(ω)|^q ≥ c1^{1/p} c2^{1/q} |f(ω)g(ω)|.
Integrating w.r.t. µ yields
1 ≥ c1^{1/p} c2^{1/q} ∫ |f(ω)g(ω)| dµ(ω),                            (1.14)
which is equivalent to (1.13).
Next, equality in (1.13) implies that equality holds a.e. (µ) in the pointwise
inequality preceding (1.14). Since 1 < p < ∞, by the last part of Proposition
3.1.10 (ii), this implies that b1 = b2 a.e. (µ), i.e., |f(ω)|^p = c2 c1^{−1} |g(ω)|^q a.e. (µ).   □
Remark 3.1.4: (Hölder’s inequality for p = 1, q = ∞). Let f ∈
L1(Ω, F, µ) and g ∈ L∞(Ω, F, µ). Then |fg| ≤ |f| ‖g‖_∞ a.e. (µ) and hence,
‖fg‖_1 ≡ ∫ |fg| dµ ≤ ‖f‖_1 ‖g‖_∞.
If equality holds in the above inequality, then |f|(‖g‖_∞ − |g|) = 0 a.e. (µ)
and hence, |g| = ‖g‖_∞ on the set {|f| > 0} a.e. (µ).
The next two corollaries follow directly from Theorem 3.1.11. The proof
is left as an exercise (Problem 3.9).
Corollary 3.1.12: (Cauchy-Schwarz inequality). Let f , g ∈ L2 (Ω, F, µ).
Then
∫ |fg| dµ ≤ (∫ |f|² dµ)^{1/2} (∫ |g|² dµ)^{1/2},                     (1.15)
i.e., ‖fg‖_1 ≤ ‖f‖_2 ‖g‖_2.
Corollary 3.1.13: Let k ∈ N. Let a1 , a2 , . . . , ak , b1 , b2 , . . . , bk be real
numbers and c1 , c2 , . . . , ck be positive real numbers.
(i) Then, for any 1 < p < ∞,
    Σ_{i=1}^k |ai bi| ci ≤ (Σ_{i=1}^k |ai|^p ci)^{1/p} (Σ_{i=1}^k |bi|^q ci)^{1/q},   (1.16)
    where q = p/(p−1).
(ii)
    Σ_{i=1}^k |ai bi| ci ≤ (Σ_{i=1}^k |ai|² ci)^{1/2} (Σ_{i=1}^k |bi|² ci)^{1/2}.     (1.17)
Next, as an application of Hölder’s inequality, one gets
Theorem 3.1.14: (Minkowski’s inequality). Let 1 < p < ∞ and f, g ∈
Lp (Ω, F, µ). Then
(∫ |f + g|^p dµ)^{1/p} ≤ (∫ |f|^p dµ)^{1/p} + (∫ |g|^p dµ)^{1/p},    (1.18)
i.e., ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.
Proof: Let h1 = |f + g|, h2 = |f + g|^{p−1}. Then by (1.12),
|f + g|^p ≤ 2^{p−1}(|f|^p + |g|^p),
implying that h1 ∈ Lp(Ω, F, µ) and h2 ∈ Lq(Ω, F, µ), where q = p/(p−1).
Since |f + g|^p = h1 h2 ≤ |f|h2 + |g|h2, by Hölder’s inequality,
∫ |f + g|^p dµ ≤ (∫ |f|^p dµ)^{1/p} (∫ h2^q dµ)^{1/q} + (∫ |g|^p dµ)^{1/p} (∫ h2^q dµ)^{1/q}.   (1.19)
But ∫ h2^q dµ = ∫ |f + g|^p dµ and so (1.19) yields (1.18).           □
Remark 3.1.5: Inequality (1.18) holds for both p = 1 and p = ∞.
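Both Hölder’s and Minkowski’s inequalities can be illustrated on a finite measure space, where the integrals become finite sums. The sketch below is illustrative only and not from the original text; the weights µ({i}) = w_i and the exponent p = 3 are arbitrary choices. It checks (1.13) and (1.18) for randomly generated f and g.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 50
    f = rng.normal(size=n)
    g = rng.normal(size=n)
    w = rng.uniform(0.1, 2.0, size=n)          # mu({i}) = w_i, a finite measure on {0,...,n-1}

    def norm(h, p):
        # ||h||_p = (integral |h|^p dmu)^{1/p} for the discrete measure mu
        return np.sum(np.abs(h) ** p * w) ** (1.0 / p)

    p = 3.0
    q = p / (p - 1.0)
    holder_lhs = np.sum(np.abs(f * g) * w)     # ||fg||_1
    print(holder_lhs <= norm(f, p) * norm(g, q) + 1e-12)        # Holder (1.13): True
    print(norm(f + g, p) <= norm(f, p) + norm(g, p) + 1e-12)    # Minkowski (1.18): True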
3.2 Lp -Spaces
3.2.1 Basic properties
Let (Ω, F, µ) be a measure space. Recall the definition of Lp(Ω, F, µ), 0 <
p ≤ ∞, from Section 2.5 as the set of all measurable functions f on (Ω, F, µ)
such that ‖f‖_p < ∞, where for 0 < p < ∞,
‖f‖_p = (∫ |f|^p dµ)^{min{1/p, 1}}
and for p = ∞,
‖f‖_∞ ≡ sup{k : µ({|f| > k}) > 0}
(called the essential supremum of f ). In this section and elsewhere, Lp (µ)
denotes Lp (Ω, F, µ). The following proposition shows that Lp (µ) is a vector
space over R.
Proposition 3.2.1: For 0 < p ≤ ∞,
f, g ∈ Lp(µ), a, b ∈ R  ⇒  af + bg ∈ Lp(µ).                        (2.1)
Proof:
Case 1: 0 < p ≤ 1. For any two positive numbers x, y,
(x/(x + y))^p + (y/(x + y))^p ≥ x/(x + y) + y/(x + y) = 1.
Hence, for all x, y ∈ (0, ∞),
(x + y)^p ≤ x^p + y^p.                                             (2.2)
It is easy to check that (2.2) continues to hold if x, y ∈ [0, ∞). This yields
|af + bg|^p ≤ |a|^p |f|^p + |b|^p |g|^p, which, in turn, yields (2.1) by integration.
Case 2: 1 < p < ∞. By (1.12),
|af + bg|^p ≤ 2^{p−1}(|af|^p + |bg|^p).
Integrating both sides of the above inequality yields (2.1).
Case 3: p = ∞. By definition, there exist constants K1 < ∞ and K2 < ∞
such that
µ({|f| > K1}) = 0 = µ({|g| > K2}).
This implies that µ({|af + bg| > K}) = 0 for any K > |a|K1 + |b|K2.
Hence, af + bg ∈ L∞(µ) and
‖af + bg‖_∞ ≤ |a| ‖f‖_∞ + |b| ‖g‖_∞.                               □
Recall that a set S with a function d : S × S → [0, ∞] is called a metric
space if for all x, y, z ∈ S,
(i) d(x, y) = d(y, x)   (symmetry),
(ii) d(x, y) ≤ d(x, z) + d(y, z)   (triangle inequality),
(iii) d(x, y) = 0 iff x = y,
and the function d(·, ·) is called a metric on S. Some examples are
(a) S = R^k with d(x, y) = (Σ_{i=1}^k |xi − yi|²)^{1/2};
(b) S = C[0, 1], the space of all continuous functions on [0, 1] with
d(f, g) = sup{|f (x) − g(x)| : 0 ≤ x ≤ 1};
(c) S = a nonempty set, and d(x, y) = 1 if x ≠ y and 0 if x = y. (This
d(·, ·) is called the discrete metric on S.)
The Lp-norm ‖ · ‖_p can be used to introduce a distance notion in Lp(µ)
for 0 < p ≤ ∞.
Definition 3.2.1: For f, g ∈ Lp(µ), 0 < p ≤ ∞, let
dp(f, g) ≡ ‖f − g‖_p.                                              (2.3)
Note that, for any f, g, h ∈ Lp(µ) and 1 ≤ p ≤ ∞,
(i) dp(f, g) = dp(g, f) ≥ 0   (nonnegativity and symmetry), and
(ii) dp(f, h) ≤ dp(f, g) + dp(g, h)   (triangle inequality),
which follows by Minkowski’s inequality (Theorem 3.1.14) for 1 ≤ p < ∞,
and by Proposition 3.2.1 for p = ∞. However, dp (f, g) = 0 implies only
that f = g a.e. (µ). Thus, dp (·, ·) of (2.3) satisfies conditions (i) and (ii)
of being a metric and it satisfies condition (iii) as well, provided any two
functions f and g that agree a.e. (µ) are regarded as the same element of
Lp (µ). This leads to the following:
Definition 3.2.2: For f , g in Lp (µ), f is called equivalent to g and is
written as f ∼ g, if f = g a.e. (µ).
It is easy to verify that the relation ∼ of Definition 3.2.2 is an equivalence
relation, i.e., it satisfies
(i) f ∼ f for all f in Lp(µ)   (reflexivity),
(ii) f ∼ g ⇒ g ∼ f   (symmetry),
(iii) f ∼ g, g ∼ h ⇒ f ∼ h   (transitivity).
This equivalence relation ∼ divides Lp (µ) into disjoint equivalence classes
such that in each class all elements are equivalent. The notion of distance
between these classes may be defined as follows:
dp ([f ], [g]) ≡ dp (f, g)
where [f ] and [g] denote, respectively, the equivalence classes of functions
containing f and g. It can be verified that this is a metric on the set of
equivalence classes. In what follows, the equivalence class [f ] is identified
with the element f . With this identification, (Lp (µ), dp (·, ·)) becomes a
metric space for 1 ≤ p ≤ ∞.
Remark 3.2.1: For 0 < p < 1, if one defines
dp(f, g) ≡ ∫ |f − g|^p dµ,                                         (2.4)
then (Lp (µ), dp ) becomes a metric space (with the same identification as
above of functions with their equivalence classes). The triangle inequality
follows from (2.2).
Recall that a metric space (S, d) is called complete if every Cauchy sequence
in (S, d) converges to an element in S, i.e., if {xn}_{n≥1} is a sequence in S such
that for every ε > 0, there exists an N_ε such that n, m ≥ N_ε ⇒ d(xn, xm) ≤ ε,
then there exists an element x in (S, d) such that lim_{n→∞} d(xn, x) = 0. The next
step is to establish the completeness of Lp(µ).
Theorem 3.2.2: For 0 < p ≤ ∞, (Lp (µ), dp (·, ·)) is complete, where dp
is as in (2.3).
Proof: Let {fn}_{n≥1} be a Cauchy sequence in Lp(µ) for 0 < p < ∞. The
main steps in the proof are as follows:
(I) there exists a subsequence {nk}_{k≥1} such that {f_{nk}}_{k≥1} converges a.e.
    (µ) to a limit function f;
(II) lim_{k→∞} dp(f_{nk}, f) = 0;
(III) lim_{n→∞} dp(fn, f) = 0.
Step (I): Let {εk}_{k≥1} and {δk}_{k≥1} be sequences of positive numbers
decreasing to zero. Since {fn}_{n≥1} is Cauchy, for each k ≥ 1, there exists
an integer nk such that
∫ |fn − fm|^p dµ ≤ εk  for all n, m ≥ nk.                           (2.5)
W.l.o.g., let n_{k+1} > nk for each k ≥ 1. Then, by Markov’s inequality
(Theorem 3.1.1),
µ({|f_{n_{k+1}} − f_{nk}| ≥ δk}) ≤ δk^{−p} ∫ |f_{n_{k+1}} − f_{nk}|^p dµ ≤ δk^{−p} εk.   (2.6)
Let Ak = {|f_{n_{k+1}} − f_{nk}| ≥ δk}, k = 1, 2, . . . and A = lim sup_{k→∞} Ak ≡
∩_{j=1}^∞ ∪_{k≥j} Ak. If {εk}_{k≥1} and {δk}_{k≥1} satisfy
Σ_{k=1}^∞ δk^{−p} εk < ∞,                                            (2.7)
then by (2.6), Σ_{k=1}^∞ µ(Ak) < ∞ and hence, as in the proof of Theorem
2.5.2, µ(A) = 0.
Note that for ω in A^c, |f_{n_{k+1}}(ω) − f_{nk}(ω)| < δk for all k large. Thus, if
Σ_{k=1}^∞ δk < ∞, then for ω in A^c, {f_{nk}(ω)}_{k≥1} is a Cauchy sequence in R
and hence, it converges to some f(ω) in R. Setting f(ω) = 0 for ω ∈ A,
one gets
lim_{k→∞} f_{nk} = f  a.e. (µ).
A choice of {εk}_{k≥1} and {δk}_{k≥1} such that Σ_{k=1}^∞ δk < ∞ and (2.7) holds
is given by εk = 2^{−(p+1)k} and δk = 2^{−k}. This completes Step (I).
Step (II): By Fatou’s lemma, part (I), and (2.5), for any k ≥ 1 fixed,
εk ≥ lim inf_{j→∞} ∫ |f_{nk} − f_{n_{k+j}}|^p dµ ≥ ∫ |f_{nk} − f|^p dµ.
Since f_{nk} ∈ Lp(µ), this shows that f ∈ Lp(µ). Now, on letting k → ∞, (II)
follows.
Step (III): By the triangle inequality, for any k ≥ 1 fixed,
dp(fn, f) ≤ dp(fn, f_{nk}) + dp(f_{nk}, f).
By (2.5) and (II), for n ≥ nk, the right side above is ≤ 2ε̃k, where ε̃k = εk
if 0 < p < 1 and ε̃k = εk^{1/p} if 1 ≤ p < ∞. Now letting k → ∞, (III) follows.
The proof of Theorem 3.2.2 is complete for 0 < p < ∞. The case p = ∞
is left as an exercise (Problem 3.14).                              □
3.2.2 Dual spaces
Let 1 ≤ p < ∞. Let g ∈ Lq(µ), where q = p/(p−1) if 1 < p < ∞ and q = ∞
if p = 1. Let
Tg(f) = ∫ fg dµ,   f ∈ Lp(µ).                                       (2.8)
By Hölder’s inequality, ∫ |fg| dµ < ∞ and so Tg(·) is well defined. Clearly
Tg is linear, i.e.,
Tg(α1 f1 + α2 f2) = α1 Tg(f1) + α2 Tg(f2)                           (2.9)
for all α1 , α2 ∈ R and f1 , f2 ∈ Lp (µ).
Definition 3.2.3:
(a) A function T : Lp (µ) → R that satisfies (2.9) is called a linear functional.
(b) A linear functional T on Lp(µ) is called bounded if there is a constant
    c ∈ (0, ∞) such that
    |T(f)| ≤ c ‖f‖_p  for all f ∈ Lp(µ).
(c) The norm of a bounded linear functional T on Lp(µ) is defined as
    ‖T‖ = sup{|Tf| : f ∈ Lp(µ), ‖f‖_p = 1}.
By Hölder’s inequality (cf. Theorem 3.1.11 and Remark 3.1.4),
|Tg(f)| ≤ ‖g‖_q ‖f‖_p  for all f ∈ Lp(µ),
and hence, Tg is a bounded linear functional on Lp(µ). This implies that if
dp(fn, f) → 0, then
|Tg(fn) − Tg(f)| ≤ ‖g‖_q dp(fn, f) → 0,
i.e., Tg is continuous on the metric space (Lp (µ), dp ).
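On a finite measure space the functional Tg of (2.8) is a finite weighted sum, and its boundedness via Hölder’s inequality can be checked directly. The following sketch is illustrative only and not from the original text; the discrete space, the weights, and p = 3 are arbitrary choices. (In fact ‖Tg‖ = ‖g‖_q, though the code verifies only the upper bound.)

    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 40, 3.0
    q = p / (p - 1.0)
    w = rng.uniform(0.5, 1.5, size=n)          # mu({i}) = w_i
    g = rng.normal(size=n)                      # a fixed element of L^q(mu)

    def norm(h, r):
        return np.sum(np.abs(h) ** r * w) ** (1.0 / r)

    def T_g(f):
        # T_g(f) = integral of f g dmu over the discrete space
        return np.sum(f * g * w)

    # |T_g(f)| <= ||g||_q ||f||_p for a few random f's (Holder), so T_g is bounded:
    for _ in range(5):
        f = rng.normal(size=n)
        assert abs(T_g(f)) <= norm(g, q) * norm(f, p) + 1e-12
    print("T_g is linear and bounded, with norm at most", norm(g, q))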
Definition 3.2.4: The set of all continuous linear functionals on Lp (µ) is
called the dual space of Lp (µ) and is denoted by (Lp (µ))∗ .
In the next section, it will be shown that continuity of a linear functional
on Lp (µ) implies boundedness. A natural question is whether every continuous linear functional T on Lp (µ) coincides with Tg for some g in Lq (µ).
The answer is “yes” for 1 ≤ p < ∞, as shown by the following result.
Theorem 3.2.3: (Riesz representation theorem). Let 1 ≤ p < ∞. Let
T : Lp (µ) → R be linear and continuous. Then, there exists a g in Lq (µ)
such that T = Tg , i.e.,
T(f) = Tg(f) ≡ ∫ fg dµ  for all f ∈ Lp(µ),                          (2.10)
where q = p/(p−1) for 1 < p < ∞ and q = ∞ if p = 1.
Remark 3.2.2: Such a representation is not valid for p = ∞. That is,
there exist continuous linear functionals T on L∞(µ) for which there is no
g ∈ L1(µ) such that T(f) = ∫ fg dµ for all f ∈ L∞(µ), provided µ is not
concentrated on a finite set {ω1 , ω2 , . . . , ωk } ⊂ Ω.
For a proof of Theorem 3.2.3 and the above remark, see Royden (1988)
or Rudin (1987).
Next consider the mapping φ from (Lp(µ))∗ to Lq(µ) defined by
φ(Tg) = g,
where Tg is as defined in (2.10). Then, φ is linear, i.e.,
φ(α1 T1 + α2 T2) = α1 φ(T1) + α2 φ(T2)
for all α1, α2 ∈ R and T1, T2 ∈ (Lp(µ))∗. Further,
‖φ(T)‖_q = ‖T‖  for all T ∈ (Lp(µ))∗.
Thus, φ preserves the vector space structure of (Lp (µ))∗ and the norm. For
this reason, it is called an isometry between (Lp (µ))∗ and Lq (µ).
3.3 Banach and Hilbert spaces
3.3.1 Banach spaces
If (Ω, F, µ) is a measure space, it was seen in the previous section that the
space Lp(Ω, F, µ) of equivalence classes of functions f with ∫_Ω |f|^p dµ < ∞
is a vector space over R for all 1 ≤ p < ∞ and that for p ≥ 1, ‖ · ‖_p ≡
(∫ |f|^p dµ)^{1/p} satisfies
(i) ‖f + g‖ ≤ ‖f‖ + ‖g‖,
(ii) ‖αf‖ = |α| ‖f‖ for every α ∈ R,
(iii) ‖f‖ = 0 iff f = 0 a.e. (µ).
The Euclidean space R^k for any k ∈ N is also a vector space. Note
that for p ≥ 1, setting ‖x‖_p ≡ (Σ_{i=1}^k |xi|^p)^{1/p} if x = (x1, x2, . . . , xk),
(R^k, ‖ · ‖_p) may be identified with a special case of Lp(Ω, F, µ), where Ω ≡
{1, 2, . . . , k}, F = P(Ω) and µ is the counting measure.
Generalizing the above examples leads to the notion of a normed vector
space (also called normed linear space). Recall that a vector space V over
R is a nonempty set with a binary operation +, a function from V × V to
V (called addition), and scalar multiplication by the real numbers, i.e., a
function from R × V → V, (α, v) → αv, satisfying
(i) v1 , v2 ∈ V ⇒ v1 + v2 = v2 + v1 ∈ V .
(ii) v1 , v2 , v3 ∈ V ⇒ (v1 + v2 ) + v3 = v1 + (v2 + v3 ).
(iii) There exists an element θ, called the zero vector, in V such that
v + θ = v for all v in V .
(iv) α ∈ R, v ∈ V ⇒ αv ∈ V .
(v) α ∈ R, v1 , v2 ∈ V ⇒ α(v1 + v2 ) = αv1 + αv2 .
(vi) α1 , α2 ∈ R, v ∈ V ⇒ (α1 +α2 )v = α1 v +α2 v and α1 (α2 v) = (α1 α2 )v.
(vii) v ∈ V ⇒ 0v = θ and 1v = v.
Note that from conditions (vi) and (vii) above, it follows that for any v in
V , v + (−1)v = 0 · v = θ. Thus for any v in V , (−1)v is the additive inverse
and is denoted by −v. Conditions (i), (ii), and (iii) are called respectively
commutativity, associativity, and the existence of an additive identity. Thus
V under the operation + is an Abelian (i.e., commutative) group.
Definition 3.3.1: A function f from V to R+, denoted by f(v) ≡ ‖v‖, is
called a norm if
(a) v1, v2 ∈ V ⇒ ‖v1 + v2‖ ≤ ‖v1‖ + ‖v2‖   (triangle inequality),
(b) α ∈ R, v ∈ V ⇒ ‖αv‖ = |α| ‖v‖   (scalar homogeneity),
(c) ‖v‖ = 0 iff v = θ.
A vector space V with a norm ‖ · ‖ defined on it is called a normed vector
space or normed linear space and is denoted as (V, ‖ · ‖). Let d(v1, v2) ≡
‖v1 − v2‖, v1, v2 ∈ V. Then from the definition of ‖ · ‖, it follows that d is a
metric on V , i.e., (V, d) is a metric space. Recall that a metric space (S, d)
is called complete if every Cauchy sequence {xn }n≥1 in S converges to an
element x in S.
Definition 3.3.2: A Banach space is a complete normed linear space (V, ‖ · ‖).
It was shown by S. Banach of Poland that all Lp (Ω, B, µ) spaces are
Banach spaces, provided p ≥ 1 and in particular, all Euclidean spaces
are Banach spaces. An example of a different kind is the space C[0, 1] of
all real valued continuous functions on [0, 1] with the usual operation of
pointwise addition and scalar multiplication, i.e., (f + g)(x) = f (x) + g(x)
and (αf )(x) = α · f (x) for all α ∈ R, 0 ≤ x ≤ 1, f , g ∈ C[0, 1] where the
norm (called the supnorm) is defined by ‖f‖ = sup{|f(x)| : 0 ≤ x ≤ 1}.
The verification of the fact that C[0, 1] with the supnorm is a Banach
space is left as an exercise (Problem 3.22). The space P of all polynomials
on [0, 1] is also a normed linear space under the above norm, but (P, ‖ · ‖) is
not complete (Problem 3.23). However, for each n ∈ N, the space Pn of all
polynomials on [0, 1] of degree ≤ n is a Banach space under the supnorm
(Problem 3.26).
Definition 3.3.3: Let V be a vector space. A subset W ⊂ V is called a
subspace of V if v1, v2 ∈ W, α1, α2 ∈ R ⇒ α1 v1 + α2 v2 ∈ W. If (V, ‖ · ‖) is
a normed vector space and W is a subspace of V, then (W, ‖ · ‖) is also a
normed vector space. If W is closed in (V, ‖ · ‖), then W is called a closed
subspace of V.
Remark 3.3.1: If (V, ‖ · ‖) is a Banach space and W is a closed subspace
of V, then (W, ‖ · ‖) is also a Banach space.
3.3.2 Linear transformations
Let (Vi, ‖ · ‖_i), i = 1, 2, be two normed linear spaces over R.
Definition 3.3.4: A function T from V1 to V2 is called a linear transformation
or linear operator if α1, α2 ∈ R, x, y ∈ V1 ⇒ T(α1 x + α2 y) = α1 T(x) + α2 T(y).
Definition 3.3.5: A linear operator T from (V1, ‖ · ‖_1) to (V2, ‖ · ‖_2) is
called bounded if ‖T‖ ≡ sup{‖Tx‖_2 : ‖x‖_1 < 1} < ∞, i.e., the image of
the unit ball in (V1, ‖ · ‖_1) is contained in a ball of finite radius centered at
the zero in V2.
Here is a summary of some important results on this topic. By linearity
of T, T(x/‖x‖_1) = (1/‖x‖_1) T(x) for any x ≠ 0. It follows that T is bounded iff
there exists k < ∞ such that for any x ∈ V1, ‖Tx‖_2 ≤ k‖x‖_1. Clearly,
then k can be taken to be ‖T‖. Also by linearity, if T is bounded, then
‖Tx1 − Tx2‖ = ‖T(x1 − x2)‖ ≤ ‖T‖ ‖x1 − x2‖ and so the map T is continuous
(indeed, uniformly so). It turns out that if a linear operator T is continuous
at some x0 in V1, then T is continuous on all of V1 and is bounded (Problem
3.28 (a)).
Now let B(V1, V2) be the space of all bounded linear operators from
(V1, ‖·‖_1) to (V2, ‖·‖_2). For T1, T2 ∈ B(V1, V2) and α1, α2 in R, let (α1 T1 + α2 T2)
be defined by (α1 T1 + α2 T2)(x) ≡ α1 T1(x) + α2 T2(x) for all x in V1. Then
it can be verified that (α1 T1 + α2 T2) also belongs to B(V1, V2) and that
‖T‖ ≡ sup{‖Tx‖_2 : ‖x‖_1 ≤ 1}                                        (3.1)
is a norm on B(V1, V2). Thus (B(V1, V2), ‖ · ‖) is also a normed linear space.
If (V2, ‖ · ‖_2) is complete, then it can be shown that (B(V1, V2), ‖ · ‖) is
also a Banach space (Problem 3.28 (b)). In particular, if (V2, ‖ · ‖_2) is the
real line, the space (B(V1, R), ‖ · ‖) is a Banach space.
3.3.3 Dual spaces
Definition 3.3.6: The space of all bounded linear functions from (V1, ‖·‖)
to R (also called bounded linear functionals), denoted by V1∗, is called the
dual space of V1.
Thus, for any normed linear space (V1, ‖·‖_1) (that need not be complete),
the dual space (V1∗, ‖ · ‖) is always a Banach space, where ‖T‖ ≡ sup{|Tx| :
‖x‖_1 < 1} for T ∈ V1∗. If (V1, ‖ · ‖_1) = Lp(Ω, F, µ) for some measure
space (Ω, F, µ) and 1 ≤ p < ∞, by the Riesz representation theorem (see
Theorem 3.2.3), the dual space may be identified with Lq(Ω, F, µ) where
q is the conjugate of p, i.e., 1/p + 1/q = 1. However, as pointed out earlier in
Section 3.2, this is not true for p = ∞. That is, the dual of L∞(Ω, F, µ) is
not L1(Ω, F, µ) unless (Ω, F, µ) is a measure space where Ω is a finite set
{ω1, ω2, . . . , ωk} and F = P(Ω). An example for the p = ∞ case can be
constructed for the space ℓ∞ of all bounded sequences of real numbers (cf.
The representation of the dual space of the Banach space C[0, 1] with
supnorm is in terms of finite signed measures (cf. Section 4.2).
Theorem 3.3.1: (Riesz ). Let T : C[0, 1] → R be linear and bounded.
Then there exist two finite measures µ1 and µ2 on [0, 1] such that for any
f ∈ C[0, 1],
T(f) = ∫ f dµ1 − ∫ f dµ2.
For a proof see Royden (1988) or Rudin (1987) (see also Problem 3.27).
3.3.4 Hilbert space
A vector space V over R is called a real innerproduct space if there exists
a function f : V × V → R, denoted by f(x, y) ≡ ⟨x, y⟩ (and called the
innerproduct), that satisfies
(i) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V,
(ii) (linearity) ⟨α1 x1 + α2 x2, y⟩ = α1⟨x1, y⟩ + α2⟨x2, y⟩ for all α1, α2 ∈ R,
     x1, x2, y ∈ V,
(iii) ⟨x, x⟩ ≥ 0 for all x ∈ V and ⟨x, x⟩ = 0 iff x = θ, the zero vector of V.
Using the fact that the quadratic function ϕ(t) = ⟨x + ty, x + ty⟩ =
⟨x, x⟩ + 2t⟨x, y⟩ + t²⟨y, y⟩ is nonnegative for all t ∈ R, one gets the Cauchy-
Schwarz inequality
|⟨x, y⟩| ≤ √(⟨x, x⟩⟨y, y⟩)  for all x, y ∈ V.
Now setting ‖x‖ = √⟨x, x⟩ and using the Cauchy-Schwarz inequality, one
verifies that ‖x‖ is a norm on V and thus (V, ‖ · ‖) is a normed linear space.
Further, the function ⟨x, y⟩ from V × V to R is continuous (Problem 3.29)
under the norm ‖(x1, x2)‖ = ‖x1‖ + ‖x2‖, (x1, x2) ∈ V × V.
Definition 3.3.7: Let (V, ⟨·, ·⟩) be a real innerproduct space. It is called
a Hilbert space if (V, ‖ · ‖) is a Banach space, i.e., if it is complete.
It was seen in Section 3.2 that for any measure space (Ω, F, µ), the space
L²(Ω, F, µ) of all equivalence classes of functions f : Ω → R satisfying
∫ |f|² dµ < ∞ is a complete innerproduct space with the innerproduct
⟨f, g⟩ = ∫ fg dµ and hence a Hilbert space. It turns out that every Hilbert
space H is an L²(Ω, F, µ) for some (Ω, F, µ). (The axiom of choice or its
equivalent, the Hausdorff maximality principle, is required for a proof of
this. See Rudin (1987).) This is in contrast to the Banach space case, where
every Lp(Ω, F, µ) with p ≥ 1 is a Banach space but not conversely, i.e.,
not every Banach space is an Lp(Ω, F, µ).
Next, for each x in a Hilbert space H, let Tx : H → R be defined by
Tx(y) = ⟨x, y⟩. By the defining properties of ⟨x, y⟩ and the Cauchy-Schwarz
inequality, it is easy to verify that Tx is a bounded linear function on H,
i.e.,
Tx(α1 y1 + α2 y2) = α1 Tx(y1) + α2 Tx(y2)  for all α1, α2 ∈ R, y1, y2 ∈ H    (3.2)
and
|Tx(y)| ≤ ‖x‖ ‖y‖  for all y ∈ H.                                   (3.3)
Thus Tx ∈ H∗, the dual space. It is an important result (see Theorem 3.3.3
below) that every T ∈ H∗ is equal to Tx for some x in H and ‖T‖ = ‖x‖.
Thus H∗ can be identified with H.
Definition 3.3.8: Let (V, ⟨·, ·⟩) be an inner product space. Two vectors
x, y in V are said to be orthogonal, written x ⊥ y, if ⟨x, y⟩ = 0.
A collection B ⊂ V is called orthogonal if x, y ∈ B, x ≠ y ⇒ ⟨x, y⟩ = 0.
The collection B is called orthonormal if it is orthogonal and, in addition,
for all x in B, ‖x‖ = 1. Note that if x ⊥ y, then ‖x − y‖² = ⟨x − y, x − y⟩ =
⟨x, x⟩ + ⟨y, y⟩ = ‖x‖² + ‖y‖², and so if B is an orthonormal set, then
x, y ∈ B ⇒ either x = y or ‖x − y‖ = √2. Thus, if V is separable under the
metric d(x, y) = ‖x − y‖ (i.e., there exists a countable set D ⊂ V such that
for every x in V and ε > 0, there exists a d ∈ D such that ‖x − d‖ < ε) and
if B ⊂ V is an orthonormal system, then the open balls S_b of radius 1/(2√2)
around the points b ∈ B are such that the sets {S_b ∩ D : b ∈ B} are disjoint
and nonempty. Thus B is countable.
Now let (V, ⟨·, ·⟩) be a separable innerproduct space and B ⊂ V be an
orthonormal system.
Definition 3.3.9: The Fourier coefficients of a vector x in V with respect
to the orthonormal set B is the set {⟨x, b⟩ : b ∈ B}.
Since V is separable, B is countable. Let B = {bi : i ∈ N}. For a given
x ∈ V, let ci = ⟨x, bi⟩, i ≥ 1, and let xn ≡ Σ_{i=1}^n ci bi, n ∈ N. The sequence
{xn}_{n≥1} is called the partial sum sequence of the Fourier expansion of the
vector x w.r.t. the orthonormal set B.
A natural question is: when does {xn}_{n≥1} converge to x? By the linearity
property in the definition of the innerproduct ⟨·, ·⟩, it follows that
0 ≤ ‖x − xn‖² = ⟨x − xn, x − xn⟩ = ⟨x, x⟩ − 2⟨x, xn⟩ + ⟨xn, xn⟩
and
⟨x, xn⟩ = Σ_{i=1}^n ci ⟨x, bi⟩ = Σ_{i=1}^n ci².
Since {bi}_{i≥1} are orthonormal,
‖xn‖² = ⟨xn, xn⟩ = Σ_{i=1}^n ci² = ⟨x, xn⟩.
Thus,
0 ≤ ‖x − xn‖² = ‖x‖² − ‖xn‖² = ‖x‖² − Σ_{i=1}^n ci²,
leading to
Proposition 3.3.2: (Bessel’s inequality). Let {bi}_{i≥1} be orthonormal in
an innerproduct space (V, ⟨·, ·⟩). Then, for any x in V,
Σ_{i=1}^∞ ⟨x, bi⟩² ≤ ‖x‖².                                            (3.4)
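A toy illustration of Fourier coefficients and Bessel’s inequality (added here, not part of the original text): in the finite-dimensional Hilbert space R^n with the usual innerproduct, an orthonormal basis obtained from a QR factorization plays the role of {bi}. For a proper subcollection the sum of squared coefficients is at most ‖x‖², and, anticipating (3.6)–(3.7) below, equality and exact reconstruction hold for the full (complete) basis.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 20
    # An orthonormal basis of R^n (a toy Hilbert space) from a QR factorization.
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    basis = [Q[:, i] for i in range(n)]

    x = rng.normal(size=n)
    c = np.array([np.dot(x, b) for b in basis])     # Fourier coefficients <x, b_i>

    # Bessel: for any subcollection, sum of c_i^2 is at most ||x||^2.
    print(np.sum(c[:7] ** 2) <= np.dot(x, x) + 1e-12)   # True
    # Completeness (Parseval): the full basis gives equality, and the partial
    # sums x_n = sum c_i b_i reconstruct x.
    print(np.isclose(np.sum(c ** 2), np.dot(x, x)))      # True
    x_rec = sum(ci * bi for ci, bi in zip(c, basis))
    print(np.allclose(x_rec, x))                          # True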
Now let (V, ⟨·, ·⟩) be a Hilbert space. Since for m > n,
‖xn − xm‖² = Σ_{i=n+1}^m ⟨x, bi⟩²,
it follows from Bessel’s inequality that {xn}_{n≥1} is a Cauchy sequence, and
since V is complete, there is a y in V such that xn → y. This implies (by the
continuity of ⟨x, y⟩) that ⟨x, bi⟩ = lim_{n→∞} ⟨xn, bi⟩ = ⟨y, bi⟩ ⇒ ⟨x − y, bi⟩ = 0
for all i ≥ 1. Thus, it follows that ⟨x, bi⟩ = ⟨y, bi⟩ for all i ≥ 1. The last
relation implies y = x iff the set {bi}_{i≥1} satisfies the property that
⟨z, bi⟩ = 0 for all i ≥ 1 ⇒ ‖z‖ = 0.                                 (3.5)
This property is called the completeness of B. Thus B ≡ {bi}_{i≥1} is a
complete orthonormal set for a Hilbert space H ≡ (V, ⟨·, ·⟩) iff for every
vector x,
Σ_{i=1}^∞ ci² = ‖x‖²,                                                (3.6)
where ci = ⟨x, bi⟩, i ≥ 1, which in turn holds iff
‖x − Σ_{i=1}^n ci bi‖ → 0  as n → ∞.                                 (3.7)
Conversely, if {ci}_{i≥1} is a sequence of real numbers such that Σ_{i=1}^∞ ci² < ∞,
then the sequence {xn ≡ Σ_{i=1}^n ci bi}_{n≥1} is Cauchy and hence converges to
an x in V. Thus the Hilbert space H can be identified with the space ℓ²
of all square summable sequences {{ci}_{i≥1} : Σ_{i=1}^∞ ci² < ∞}, in the sense
that the map ϕ : x → {ci}_{i≥1}, where ci = ⟨x, bi⟩, i ≥ 1, preserves the
algebraic structure as well as the innerproduct, i.e., ϕ is a linear operator
from H to ℓ² and ⟨ϕ(x), ϕ(y)⟩ = ⟨x, y⟩ for all x, y ∈ H. Such a ϕ is called
an isometric isomorphism between H and ℓ². Note also that ℓ² is simply
L²(Ω, F, µ) where Ω ≡ N, F = P(N), and µ is the counting measure. It can
be shown (using the axiom of choice) that every separable Hilbert space
does possess a complete orthonormal system, i.e., an orthonormal basis.
Next some examples are given. Here, unless otherwise indicated, H denotes the Hilbert space and B denotes an orthonormal basis of H.
Example 3.3.1:
(a) H ≡ ℓ² = {(x1, x2, . . .) : xi ∈ R, Σ_{i=1}^∞ xi² < ∞}.
    B ≡ {ei : i ≥ 1} where ei ≡ (0, 0, . . . , 1, 0, . . .) with 1 in the ith
    position and 0 elsewhere.
(b) H ≡ L²([0, 2π], B([0, 2π]), µ) where µ(A) = (1/2π) m(A), m(·) being
    Lebesgue measure.
    B ≡ {cos nx : n = 0, 1, 2, . . .} ∪ {sin nx : n = 1, 2, . . .}. (For a proof,
    see Chapter 5.)
(c) Let H ≡ L²(R, B(R), µ) where µ is a finite measure such that
    ∫ |x|^k dµ < ∞ for all k = 1, 2, . . ..
    Let B1 ≡ {1, x, x², . . .} and B be the orthonormal set generated by
    applying the Gram-Schmidt procedure to B1 (see Problem 3.31). It
    can be shown that B is a basis for H (Problem 3.39). When µ is the
    standard normal distribution, the elements of B are called Hermite
    polynomials.
For one more example, i.e., Haar functions, see Problem 3.40.
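A numerical sketch of Example 3.3.1 (c) and Problem 3.31 (illustrative only, not from the original text; integrals over [0, 1] are approximated by a Riemann sum on a fine grid, and the grid size is an arbitrary discretization choice): applying Gram-Schmidt to {1, x, x², x³} in L²([0, 1], B([0, 1]), m) yields, up to normalization, the shifted Legendre polynomials, and the resulting vectors are orthonormal with respect to the discretized innerproduct.

    import numpy as np

    # Discretize [0, 1] so that <f, g> = integral of f g dm is a Riemann sum.
    x = np.linspace(0.0, 1.0, 10_001)
    dx = x[1] - x[0]
    inner = lambda f, g: np.sum(f * g) * dx

    # Gram-Schmidt applied to B1 = {1, x, x^2, x^3} (Problem 3.31).
    monomials = [x ** k for k in range(4)]
    basis = []
    for b in monomials:
        e = b - sum(inner(b, q) * q for q in basis)   # subtract projections on earlier e_j
        e = e / np.sqrt(inner(e, e))                   # normalize
        basis.append(e)

    # e_1 is proportional to x - 1/2, e_2 to x^2 - x + 1/6, and so on.
    gram = np.array([[inner(ei, ej) for ej in basis] for ei in basis])
    print(np.allclose(gram, np.eye(4), atol=1e-6))     # True: orthonormal system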
Theorem 3.3.3: (Riesz representation). Let H be a separable Hilbert
space. Then every bounded linear functional T : H → R can be represented
as T ≡ Tx0 for some x0 ∈ H, where Tx0(y) ≡ ⟨y, x0⟩.
Proof: Let B = {bi }i≥1 be an orthonormal basis for H. Let ci ≡ T (bi ),
i ≥ 1. Then, for n ≥ 1,
Σ_{i=1}^n ci² = T(Σ_{i=1}^n ci bi)   (by the linearity of T)
⇒ |Σ_{i=1}^n ci²| ≤ ‖T‖ ‖Σ_{i=1}^n ci bi‖ = ‖T‖ (Σ_{i=1}^n ci²)^{1/2}
⇒ Σ_{i=1}^n ci² ≤ ‖T‖²
⇒ Σ_{i=1}^∞ ci² ≤ ‖T‖² < ∞.
Thus {xn ≡ Σ_{i=1}^n ci bi}_{n≥1} is Cauchy in H and hence converges to an
x0 in H. By the continuity of T, for any y, Ty = lim_{n→∞} Tyn, where
yn ≡ Σ_{i=1}^n ⟨y, bi⟩ bi, n ≥ 1. But
Tyn = Σ_{i=1}^n ⟨y, bi⟩ ci = ⟨y, Σ_{i=1}^n ci bi⟩ = ⟨y, xn⟩, by the linearity of T.
Again by continuity of ⟨y, x⟩, it follows that Ty = ⟨y, x0⟩.           □
A sufficient condition for an L2 (Ω, F, µ) to be separable is that there
exists an at most countable family A ≡ {Aj }j≥1 of sets in F such that
F = σ⟨A⟩ and µ(Aj) > 0 for each j. This holds for any σ-finite measure µ
on (R^k, B(R^k)) (Problem 3.38).
Remark 3.3.2: Assuming the axiom of choice, the Riesz representation
theorem remains valid for any Hilbert space, separable or not (Problem
3.43).
3.4 Problems
3.1 Prove Proposition 3.1.7.
(Hint: Use (1.4) repeatedly.)
3.2 Let (Ω, F, µ) be a measure space with µ(Ω) ≤ 1 and f : Ω → (a, b) ⊂
     R be in L1(Ω, F, µ). Let φ : (a, b) → R be convex. Show that if
     c ≡ ∫ f dµ ∈ (a, b), φ(f) ∈ L1(Ω, F, µ), and cφ′₊(c) ≥ 0, then
     µ(Ω) φ(∫ f dµ) ≤ ∫ φ(f) dµ.
3.3 Prove Proposition 3.1.10.
(Hint: Apply Jensen’s inequality with Ω ≡ {1, 2, . . . , k}, F = P(Ω),
P ({i}) = pi , f (i) = ai , i = 1, 2, . . . , k, and φ(x) = ex to get (i).
Deduce (ii) from (i) and Remark 3.1.3. For (iii), consider φ(x) = |x|p .)
3.4 Give an example of a convex function φ on (0, 1) with a finite number
of points where it is not differentiable. Can this be extended to the
countable case? Uncountable case?
     (Hint: Note that φ′₊(·) and φ′₋(·) are both monotone and hence have
     at most a countable number of discontinuity points.)
3.5 Let φ : (a, b) → R be convex.
(a) Using the definition and induction, show that
         φ(Σ_{i=1}^n pi xi) ≤ Σ_{i=1}^n pi φ(xi)
for any n ≥ 2, x1 , x2 . . . , xn in (a, b) and {p1 , p2 , . . . , pn }, a probability distribution.
(b) Use (a) to prove Jensen’s inequality for any bounded φ.
3.6 Show that a function φ : R → R is convex iff
     φ(∫_{[0,1]} f dm) ≤ ∫_{[0,1]} φ(f) dm
for every bounded Borel measurable function f : [0, 1] → R, where
m(·) is the Lebesgue measure.
3.7 Let φ be convex on (a, b) and ψ : R → R be convex and nondecreasing.
Show that ψ ◦ φ is convex on (a, b).
3.8 Let X be a nonnegative random variable on some probability space.
     (a) Show that (EX)(E(1/X)) ≥ 1. What does this say about the
         correlation between X and 1/X?
(b) Let f, g : R+ → R+ be Borel measurable and such that
f (x)g(x) ≥ 1 for all x in R+ . Show that Ef (X)Eg(X) ≥ 1.
3.9 Prove Corollary 3.1.13 using Hölder’s inequality applied to an appropriate measure space.
3.10 Extend Hölder’s inequality as follows. Let 1 < pi < ∞ and
     fi ∈ L^{pi}(Ω, F, µ), i = 1, 2, . . . , k. Suppose Σ_{i=1}^k 1/pi = 1. Show that
     ∫ |Π_{i=1}^k fi| dµ ≤ Π_{i=1}^k ‖fi‖_{pi}.
(Hint: Use Proposition 3.1.10 (ii).)
3.11 Verify Minkowski’s inequality for p = 1 and p = ∞.
3.12 (a) Find (Ω, F, µ), 0 < p < 1, f , g ∈ Lp (Ω, F, µ) such that
         (∫ |f + g|^p dµ)^{1/p} > (∫ |f|^p dµ)^{1/p} + (∫ |g|^p dµ)^{1/p}.
     (b) Prove (1.18) for 0 < p < 1 with ‖f‖_p = ∫ |f|^p dµ.
3.13 Let (Ω, F, µ) be a measure space. Let {Ak}_{k≥1} ⊂ F and
     Σ_{k=1}^∞ µ(Ak) < ∞. Show that µ(lim sup_{k→∞} Ak) = 0, where
     lim sup_{k→∞} Ak = ∩_{n=1}^∞ ∪_{j≥n} Aj = {ω : ω ∈ Aj for infinitely many j ≥ 1}.
3.14 Establish Theorem 3.2.2 for p = ∞.
     (Hint: For each k ≥ 1, choose nk ↑ such that ‖f_{n_{k+1}} − f_{nk}‖_∞ < 2^{−k}.
     Show that there exists a set A with µ(A^c) = 0 and for ω in A,
     |f_{n_{k+1}}(ω) − f_{nk}(ω)| < 2^{−k} for all k ≥ 1, and now proceed as in the
     proof for the case 0 < p < ∞.)
3.15 Let f, g ∈ Lp(Ω, F, µ), 0 < p < 1. Show that d(f, g) = ∫ |f − g|^p dµ
is a metric and (Lp (Ω, F, µ), d) is complete.
3.16 Let (Ω, F, µ) be a measure space and f : Ω → R be F-measurable.
     Let A_f = {p : 0 < p < ∞, ∫ |f|^p dµ < ∞}.
     (a) Show that p1, p2 ∈ A_f, p1 < p2 implies [p1, p2] ⊂ A_f.
         (Hint: Use ∫_{|f|≥1} |f|^p dµ ≤ ∫_{|f|≥1} |f|^{p2} dµ and
         ∫_{|f|≤1} |f|^p dµ ≤ ∫_{|f|≤1} |f|^{p1} dµ for any p1 < p < p2.)
     (b) Let ψ(p) = log ∫ |f|^p dµ for p ∈ A_f. By (a), it is known that A_f
         is connected, i.e., it is an interval. Show that ψ is convex in the
         open interior of A_f.
         (Hint: Use Hölder’s inequality.)
     (c) Give examples to show that A_f could be a closed interval, an
         open interval, and a semi-open interval.
     (d) If 0 < p1 < p < p2, show that
         ‖f‖_p ≤ max{‖f‖_{p1}, ‖f‖_{p2}}.
         (Hint: Use (b).)
     (e) Show that if ∫ |f|^r dµ < ∞ for some 0 < r < ∞, then ‖f‖_p →
         ‖f‖_∞ as p → ∞.
         (Hint: Show first that for any K > 0, µ(|f| > K) > 0 ⇒
         lim inf_{p→∞} ‖f‖_p ≥ K. If ‖f‖_∞ < ∞ and µ(Ω) < ∞, use the fact that
         ‖f‖_p ≤ ‖f‖_∞ (µ(Ω))^{1/p}, and reduce the general case, under the
         hypothesis that ∫ |f|^p dµ < ∞ for some p, to this case.)
3.17 Let X be a random variable on a probability space (Ω, F, µ). Recall
     that Eh(X) = ∫ h(X) dµ if h(X) ∈ L1(Ω, F, µ).
     (a) Show that (E|X|^{p1}) ≤ (E|X|^{p2})^{p1/p2} for any 0 < p1 < p2 < ∞.
     (b) Show that ‘=’ holds in (a) iff |X| is a constant a.e. (µ).
     (c) Show that if E|X| < ∞, then E|log |X|| < ∞ and E|X|^r < ∞
         for all 0 < r < 1, and (1/r) log(E|X|^r) → E log |X| as r → 0.
3.18 Let X be a nonnegative random variable.
(a) Show that EX log X ≥ (EX)(E log X).
     (b) Show that √(1 + (EX)²) ≤ E(√(1 + X²)) ≤ 1 + EX.
3.19 Let Ω = N, F = P(N), and let µ be the counting measure. Denote
     Lp(Ω, F, µ) for this case by ℓp.
     (a) Show that ℓp is the set of all sequences {xn}_{n≥1} such that
         Σ_{n=1}^∞ |xn|^p < ∞.
     (b) For the following sequences, find all p > 0 such that they belong
         to ℓp:
         (i) xn ≡ 1/n, n ≥ 1.
         (ii) xn = 1/(n(log(n + 1))²), n ≥ 1.
3.20 For 1 ≤ p < ∞, prove the Riesz representation theorem for ℓp. That
     is, show that if T is a bounded linear functional from ℓp → R, then
     there exists a y = {yi}_{i≥1} ∈ ℓq such that for any x = {xi}_{i≥1} in ℓp,
     T(x) = Σ_{i=1}^∞ xi yi.
     (Hint: Let yi = T(ei) where ei = {ei(j)}_{j≥1}, ei(j) = 1 if i = j, 0 if
     i ≠ j. Use the fact |T(x)| ≤ ‖T‖ ‖x‖_p to show that for each n ∈ N,
     (Σ_{i=1}^n |yi|^q) ≤ ‖T‖^q < ∞.)
3.21 Let Ω = R, F = B(R), µ = µ_F where F is a cdf on R. If f(x) ≡ x²,
     find A_f = {p : 0 < p < ∞, f ∈ Lp(R, B(R), µ_F)} for the following
     cases:
     (a) F(x) = Φ(x), the N(0, 1) cdf, i.e., Φ(x) ≡ (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du, x ∈ R.
     (b) F(x) = (1/π) ∫_{−∞}^x 1/(1 + u²) du, x ∈ R.
3.22 Show that C[0, 1] with the supnorm (i.e., with ‖f‖ = sup{|f(x)| : 0 ≤
     x ≤ 1}) is a Banach space.
     (Hint: To verify completeness, let {fn}_{n≥1} be a Cauchy sequence
     in C[0, 1]. Show that for each 0 ≤ x ≤ 1, {fn(x)}_{n≥1} is a Cauchy
     sequence in R. Let f(x) = lim_{n→∞} fn(x). Now show that sup{|fn(x) −
     f(x)| : 0 ≤ x ≤ 1} ≤ lim_{m→∞} ‖fn − fm‖. Conclude that fn converges to
     f uniformly on [0, 1] and that f ∈ C[0, 1].)
3.23 Show that the space (P, ‖ · ‖) of all polynomials on [0, 1] with the
     supnorm is a normed linear space that is not complete.
     (Hint: Let g(t) = 1/(1 − t/2) for 0 ≤ t ≤ 1. Find a sequence of polynomials
     {fn}_{n≥1} in P that converge to g in supnorm.)
3.24 Show that the function f(v) ≡ ‖v‖ from a normed linear space (V, ‖·‖)
     to R+ is continuous.
3.25 Let (V, ‖ · ‖) be a normed linear space. Let S = {v : ‖v‖ < 1}. Show
that S is an open set in V .
3.26 Show that the space Pk of all polynomials on [0, 1] of degree ≤ k
     is a Banach space under the supnorm, i.e., under ‖f‖ = sup{|f(x)| :
     0 ≤ x ≤ 1}.
     (Hint: Let pn(x) = Σ_{j=0}^k a_{nj} x^j be a sequence of elements in Pk that
     converge in supnorm to some f(·). Show that {a_{n1}}_{n≥1} converges and,
     recursively, {a_{ni}}_{n≥1} converges for each i.)
3.27 Let µ be a finite measure on [0, 1]. Verify that T_µ(f) ≡ ∫ f dµ is a
     bounded linear functional on C[0, 1] and that ‖T_µ‖ = µ([0, 1]).
3.28 Let (Vi, ‖ · ‖_i), i = 1, 2, be two normed linear spaces over R.
     (a) Let T : V1 → V2 be a linear operator. Show that if for some x0,
         ‖Tx − Tx0‖ → 0 as x → x0, then T is continuous on V1 and
         hence bounded.
     (b) Show that if (V2, ‖ · ‖_2) is complete, then B(V1, V2) ≡ {T | T :
         V1 → V2, T linear and bounded} is complete under the operator
         norm defined in (3.1).
In the following, H will denote a real Hilbert space.
3.29 (a) Use the Cauchy-Schwarz inequality to show that the function
         f(x, y) = ⟨x, y⟩ is continuous from H × H → R.
     (b) (Parallelogram law). Show that in an innerproduct space
         (V, ⟨·, ·⟩), for any x, y ∈ V,
         ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²),
         where ‖x‖² = ⟨x, x⟩.
3.30 (a) Let {Qn(x)}_{n≥0} be defined on [0, 2π] by
         Qn(x) = cn ((1 + cos x)/2)^n,
         where cn is such that (1/2π) ∫_0^{2π} Qn(x) dx = 1.
         Clearly, Qn(·) ≥ 0.
         (i) Verify that for each δ > 0,
             sup{Qn(x) : δ ≤ x ≤ 2π − δ} → 0  as n → ∞.
         (ii) Use this to show that if f ∈ C[0, 2π] and if
              Pn(t) ≡ (1/2π) ∫_0^{2π} Qn(t − s) f(s) ds, n ≥ 0,              (4.1)
              then Pn → f uniformly on [0, 2π].
(b) Use this to give a proof of the completeness of the class C of
trigonometric functions.
(c) Show that if f ∈ L1 ([0, 2π]), then Pn (·) converges to f in
L1 ([0, 2π]).
(d) Let {µn (·)}n≥1 be a sequence of probability measures on
(R, B(R)) such that for each δ > 0, µn ({x : |x| ≥ δ}) → 0
         as n → ∞. Let f : R → R be Borel measurable. Let fn(x) ≡
         ∫ f(x − y) µn(dy), n ≥ 1. Assuming that fn(·) is well defined and
         Borel measurable, show that
         (i) f(·) continuous at x0 and bounded ⇒ fn(x0) → f(x0).
         (ii) f(·) uniformly continuous and bounded on R ⇒ fn → f
              uniformly on R.
         (iii) f ∈ Lp(R, B(R), m), 0 < p < ∞, m(·) = Lebesgue measure
               ⇒ ∫ |fn − f|^p dm → 0.
(iv) Show that (iii) ⇒ (c).
3.31 (Gram-Schmidt procedure). Let B ≡ {bn : n ∈ N} be a collection of
     nonzero vectors in H. Set
     e1 = b1/‖b1‖,   ẽ2 = b2 − ⟨b2, e1⟩e1,   e2 = ẽ2/‖ẽ2‖
     (provided ‖ẽ2‖ ≠ 0), and so on.
     If ‖ẽn‖ = 0 for some n ∈ N, then delete bn. Let E ≡ {ej : 1 ≤ j < k},
     k ≤ ∞, be the collection of vectors reached this way.
     (a) Show that E is an orthonormal system.
     (b) Let H_B denote the closed linear subspace generated by B, i.e.,
         H_B ≡ {x : x ∈ H, there exists xn of the form Σ_{j=1}^n aj bj,
         aj ∈ R, such that ‖xn − x‖ → 0}.
         Show that H_B is a Hilbert space and E is a basis for H_B.
3.32 Let H = L²(R, B(R), µ), where µ is a probability measure. Let
     B ≡ {1, x, x², . . .}. Assume that ∫ |x|^k dµ < ∞ for all k ∈ N. Apply
     the Gram-Schmidt procedure in Problem 3.31 to the set B for
     the following cases and evaluate e1, e2, e3.
(a) µ = Uniform [0, 1] distribution.
(b) µ = standard normal distribution.
(c) µ = Exponential (1) distribution.
The orthonormal basis E obtained this way is called Orthogonal Polynomials w.r.t. the given measure. (See Szego (1939).)
3.33 Let B ⊂ H be an orthonormal system. Show that for any x in H,
     {b : ⟨x, b⟩ ≠ 0} is at most countable.
     (Hint: Show first that if {y_α : α ∈ I} is a collection of nonnegative
     real numbers such that for some C < ∞, Σ_{α∈F} y_α ≤ C for all F ⊂ I,
     F finite, then {α : y_α > 0} is at most countable, and apply this to the
Bessel inequality.)
3.34 Let B ⊂ H. Define B^⊥ ≡ {x : x ∈ H, ⟨x, b⟩ = 0 for all b ∈ B}.
Show that B ⊥ is a closed subspace of H.
3.35 Let B ⊂ H be a closed subspace of H.
(a) Using the fact that every Hilbert space admits an orthonormal
basis, show that every x in H can be uniquely decomposed as
         x = y + z                                                  (4.2)
         where y ∈ B and z ∈ B^⊥ and ‖x‖² = ‖y‖² + ‖z‖².
(b) Let PB : H → B be defined by PB x = y where x admits the
decomposition in (4.2) above. Verify that PB is a bounded linear
operator from H to B and is of norm 1 if B has at least one
nonzero vector. (The operator PB is called the projection onto
B.)
(c) Verify that PB (PB x) = PB x for all x in H.
3.36 Let H be separable and {xn}_{n≥1} ⊂ H be such that {‖xn‖}_{n≥1} is
     bounded by some C < ∞. Show that there exist a subsequence
     {x_{nj}}_{j≥1} ⊂ {xn}_{n≥1} and an x0 in H, such that for every y in H,
     ⟨x_{nj}, y⟩ → ⟨x0, y⟩.
     (Hint: Fix an orthonormal basis B ≡ {bn}_{n≥1} ⊂ H. Let a_{ni} =
     ⟨xn, bi⟩, n ≥ 1, i ≥ 1. Using Σ_{i=1}^∞ a_{ni}² ≤ C for all n and the Bolzano-
     Wierstrass property, show that
     (a) there exists {nj}_{j≥1} such that lim_{j→∞} a_{nj i} = ai exists for all i ≥ 1,
         and Σ_{i=1}^∞ ai² < ∞,
     (b) lim_{n→∞} Σ_{i=1}^n ai bi ≡ x0 exists in H,
     (c) ⟨x_{nj}, y⟩ → ⟨x0, y⟩ for all y in H.)
3.37 Let (V, ⟨·, ·⟩) be an innerproduct space. Verify that ⟨·, ·⟩ is bilinear,
     i.e., for α1, α2, β1, β2 ∈ R, x1, x2, y1, y2 in V,
     ⟨α1 x1 + α2 x2, β1 y1 + β2 y2⟩ = α1β1⟨x1, y1⟩ + α1β2⟨x1, y2⟩ + α2β1⟨x2, y1⟩ + α2β2⟨x2, y2⟩.
State and prove an extension to more than two vectors.
3.38 Let (Ω, F, µ) be a measure space. Suppose that there exists an at
     most countable family A ≡ {Aj}_{j≥1} ⊂ F such that F = σ⟨A⟩ and
     µ(Aj) > 0 for each j ≥ 1. Then show that for 0 < p < ∞, Lp(Ω, F, µ)
     is separable.
     (Hint: Show first that for any A ∈ F with µ(A) < ∞, and ε > 0,
     there exists a countable subcollection A1 of A such that µ(A △ B) < ε,
     where B = ∪{Aj : Aj ∈ A1}. Now consider the class of functions
     {Σ_{i=1}^n ci I_{Ai}, n ≥ 1, Ai ∈ A, ci ∈ Q}.)
3.39 Show that B in Example 3.3.1 (c) is a basis for H.
     (Hint: Using Theorem 2.3.14, prove that the set of all polynomials
     is dense in H.)
3.40 (Haar functions). For x in R let h(x) = I_{[0,1/2)}(x) − I_{[1/2,1)}(x).
     Let h_{00}(x) ≡ I_{[0,1)}(x) and for k ≥ 1, 0 ≤ j < 2^{k−1}, let h_{kj}(x) ≡
     2^{(k−1)/2} h(2^{k−1}x − j), 0 ≤ x < 1.
     (a) Verify that the family {h_{kj}(·), k ≥ 0, 0 ≤ j < 2^{k−1}} is an
         orthonormal set in L²([0, 1], B([0, 1]), m), where m(·) is Lebesgue
         measure.
     (b) Verify that this family is complete by completing the following
         two proofs:
         (i) Show that for the indicator function f of a dyadic interval of the
             form [k/2^n, ℓ/2^n), k < ℓ, the following identity holds:
             ∫ f² dm = (ℓ − k)/2^n = Σ_{k,j} (∫ f h_{kj} dm)².
             Now use the fact that linear combinations of such f’s are dense
             in L²[0, 1].
         (ii) For each f ∈ L²([0, 1], B([0, 1]), m) such that f is orthogonal
              to the Haar functions, F(t) ≡ ∫_{[0,t]} f dm, 0 ≤ t ≤ 1,
              is continuous and satisfies F(j/2^n) = 0 for all 0 ≤ j ≤ 2^n,
              n = 1, 2, . . . and hence F ≡ 0, implying f = 0 a.e.
3.41 Let H be a Hilbert space over R and M be a closed subspace of H.
Let v0 ∈ H. Show that
     min{‖v − v0‖ : v ∈ M} = max{⟨v0, u⟩ : u ∈ M^⊥, ‖u‖ = 1},
     where M^⊥ is the orthogonal complement of M, i.e., M^⊥ ≡ {u :
     ⟨v, u⟩ = 0 for all v ∈ M}.
(Hint: Use Problem 3.35 (a).)
3.42 Let B be an orthonormal set in a Hilbert space H.
     (a) (i) Show that for any x in H and any finite set {bi : 1 ≤ i ≤ k} ⊂ B, k < ∞,
             Σ_{i=1}^k ⟨x, bi⟩² ≤ ‖x‖².
         (ii) Conclude that for all x in H, B_x ≡ {b : ⟨x, b⟩ ≠ 0, b ∈ B}
              is at most countable.
     (b) Show that the following are equivalent:
         (i) B is complete, i.e., x ∈ H, ⟨x, b⟩ = 0 for all b ∈ B ⇒ x = 0.
         (ii) For all x ∈ H, there exists an at most countable set B_x ≡
              {bi : i ≥ 1} such that ‖x‖² = Σ_{i=1}^∞ ⟨x, bi⟩².
         (iii) For all x ∈ H, ε > 0, there exists a finite set
               {b1, b2, . . . , bk} ⊂ B such that
               ‖x − Σ_{i=1}^k ⟨x, bi⟩ bi‖ < ε.
         (iv) If B ⊂ B1, B1 an orthonormal set in H ⇒ B = B1.
3.43 Extend Theorem 3.3.3 to any Hilbert space assuming that the axiom
of choice holds.
(Hint: Using the axiom of choice or its equivalent, the Hausdorff
maximality principle, it can be shown that every Hilbert space H
admits an orthonormal basis B (see Rudin (1987)). Now let T be a
bounded linear functional from H to R. Let f (b) ≡ T (b) for b in B.
     Verify that Σ_{i=1}^k |f(bi)|² ≤ ‖T‖² for every finite collection {bi : 1 ≤
     i ≤ k} ⊂ B. Conclude that D ≡ {b : f(b) ≠ 0} is countable. Let
     x0 ≡ Σ_{b∈D} f(b) b. Now use the proof of Theorem 3.3.3 to show that
     T(x) ≡ ⟨x, x0⟩ for all x in H.)
3.44 Let (V, ‖·‖) be a normed linear space. Let {Tn}_{n≥1} and T be bounded
     linear operators from V to V. The sequence {Tn}_{n≥1} is said to converge
     (a) weakly to T if for each w in V∗, the dual of V, and each v in V,
         w(Tn(v)) → w(T(v)),
     (b) strongly if for each v in V, ‖Tn v − Tv‖ → 0,
     (c) uniformly if sup{‖Tn v − Tv‖ : ‖v‖ ≤ 1} → 0.
     Let Vp = Lp(R, B(R), µ_L), 1 ≤ p ≤ ∞. Let {hn}_{n≥1} ⊂ R be such that
     hn ≠ 0, hn → 0, as n → ∞. Let (Tn f)(·) ≡ f(· + hn), Tf(·) ≡ f(·).
     Verify that
     (i) {Tn}_{n≥1} and T are bounded linear operators on Vp, 1 ≤ p ≤ ∞,
     (ii) for 1 ≤ p < ∞, {Tn}_{n≥1} converges to T weakly,
     (iii) for 1 ≤ p < ∞, {Tn} converges to T strongly,
     (iv) for 1 ≤ p < ∞, {Tn} does not converge to T uniformly, by
          showing that for all n, ‖Tn − T‖ = 1,
     (v) for p = ∞, show that Tn does not converge weakly to T.
4 Differentiation
4.1 The Lebesgue-Radon-Nikodym theorem
Definition 4.1.1: Let (Ω, F) be a measurable space and let µ and ν be
two measures on (Ω, F). The measure µ is said to be dominated by ν or
absolutely continuous w.r.t. ν, written µ ≪ ν, if
ν(A) = 0 ⇒ µ(A) = 0  for all A ∈ F.                                (1.1)
Example 4.1.1: Let m be the Lebesgue measure on (R, B(R)) and let µ
be the standard normal distribution, i.e.,
µ(A) ≡ ∫_A (1/√(2π)) e^{−x²/2} m(dx),  A ∈ B(R).
Then m(A) = 0 ⇒ µ(A) = 0 and hence µ ≪ m.
Example 4.1.2: Let Z+ ≡ {0, 1, 2, . . .} denote the set of all nonnegative
integers. Let ν be the counting measure on Ω = Z+ and µ be the Poisson (λ)
distribution for 0 < λ < ∞, i.e.,
ν(A) = number of elements in A   and   µ(A) = Σ_{j∈A} e^{−λ} λ^j / j!
for all A ∈ P(Ω), the power set of Ω. Since ν(A) = 0 ⇔ A = ∅ ⇔ µ(A) = 0,
it follows that µ ≪ ν and ν ≪ µ.
Example 4.1.3: Let f be a nonnegative measurable function on a measure
space (Ω, F, ν). Let
µ(A) ≡ ∫_A f dν  for all A ∈ F.                                    (1.2)
Then, µ is a measure on (Ω, F) and ν(A) = 0 ⇒ µ(A) = 0 for all A ∈ F
and hence µ ≪ ν.
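A small sketch tying Examples 4.1.2 and 4.1.3 together (illustrative only, not part of the original text; λ = 2.5, the set A, and the simulation size are arbitrary choices): with ν the counting measure on Z+ and h(j) = e^{−λ} λ^j / j!, the set function A → ∫_A h dν reproduces the Poisson(λ) probabilities, i.e., h is a candidate density of µ w.r.t. ν.

    import numpy as np
    from math import exp, factorial

    lam = 2.5
    h = lambda j: exp(-lam) * lam ** j / factorial(j)   # candidate dmu/dnu on Z_+

    A = {0, 2, 3, 7}
    mu_A = sum(h(j) for j in A)           # integral of h over A w.r.t. counting measure nu

    # Monte Carlo check that this really is the Poisson(lam) measure of A:
    rng = np.random.default_rng(5)
    draws = rng.poisson(lam, size=500_000)
    print(mu_A, np.mean(np.isin(draws, list(A))))       # the two numbers should be close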
The Radon-Nikodym theorem is a sort of converse to Example 4.1.3. It
says that if µ and ν are σ-finite measures (see Section 1.2) on a measurable
space (Ω, F) and if µ ≪ ν, then there is a nonnegative measurable function
f on (Ω, F) such that (1.2) holds.
Definition 4.1.2: Let (Ω, F) be a measurable space and let µ and ν be
two measures on (Ω, F). Then, µ is called singular w.r.t. ν and written as
µ ⊥ ν if there exists a set B ∈ F such that
µ(B) = 0  and  ν(B^c) = 0.                                         (1.3)
Note that µ is singular w.r.t. ν implies that ν is singular w.r.t. µ. Thus,
the notion of singularity between two measures µ and ν is symmetric but
that of absolute continuity is not. Note also that if µ and ν are mutually
singular and B satisfies (1.3), then for all A ∈ F,
µ(A) = µ(A ∩ B^c)  and  ν(A) = ν(A ∩ B).                           (1.4)
Example 4.1.4: Let µ be the Lebesgue measure on (R, B(R)) and ν be
defined as ν(A) = # elements in A ∩ Z where Z is the set of integers. Then
ν(Z^c) = 0  and  µ(Z) = 0,
and hence (1.3) holds with B = Z. Thus µ and ν are mutually singular.
Another example is the pair m and µ_c on [0, 1], where µ_c is the Lebesgue-
Stieltjes measure generated by the Cantor function (cf. Section 4.5) and m
is the Lebesgue measure.
Example 4.1.5: Let µ be the Lebesgue measure restricted to (−∞, 0] and
ν be the Exponential(1) distribution. That is, for any A ∈ B(R),
µ(A) = the Lebesgue measure of A ∩ (−∞, 0];
ν(A) = ∫_{A∩(0,∞)} e^{−x} dx.
Then, µ((0, ∞)) = 0 and ν((−∞, 0]) = 0, and (1.3) holds with B = (0, ∞).
Suppose that µ and ν are two finite measures on a measurable space
(Ω, F). H. Lebesgue showed that µ can be decomposed as a sum of two
measures, i.e.,
µ = µ_a + µ_s,
where µ_a ≪ ν and µ_s ⊥ ν. The next theorem is the main result of this
section and it combines the above decomposition result of Lebesgue with
the Radon-Nikodym theorem mentioned earlier.
Theorem 4.1.1: Let (Ω, F) be a measurable space and let µ1 and µ2 be
two σ-finite measures on (Ω, F).
(i) (The Lebesgue decomposition theorem). The measure µ1 can be
    uniquely decomposed as
    µ1 = µ1a + µ1s,                                                 (1.5)
    where µ1a and µ1s are σ-finite measures on (Ω, F) such that µ1a ≪ µ2
    and µ1s ⊥ µ2.
(ii) (The Radon-Nikodym theorem). There exists a nonnegative measurable
     function h on (Ω, F) such that
     µ1a(A) = ∫_A h dµ2  for all A ∈ F.                              (1.6)
Proof: Case 1: Suppose that µ1 and µ2 are finite measures. Let µ be
the measure µ = µ1 + µ2 and let H = L2 (µ). Define a linear function T on
H by
T(f) = ∫ f dµ1.                                                    (1.7)
Then, by the Cauchy-Schwarz inequality applied to the functions f and
g ≡ 1,
|T(f)| ≤ (∫ f² dµ1)^{1/2} (µ1(Ω))^{1/2} ≤ (∫ f² dµ)^{1/2} (µ1(Ω))^{1/2}.
This shows that T is a bounded linear functional on H with ‖T‖ ≤ M ≡
(µ1(Ω))^{1/2}. By the Riesz representation theorem (cf. Theorem 3.3.3 and
Remark 3.3.2), there exists a g ∈ L²(µ) such that
T(f) = ∫ fg dµ                                                      (1.8)
for all f ∈ L²(µ). Let f = I_A for A in F. Then, (1.7) and (1.8) yield
µ1(A) = T(I_A) = ∫_A g dµ.
But 0 ≤ µ1(A) ≤ µ(A) for all A ∈ F. Hence the function g in L²(µ) satisfies
0 ≤ ∫_A g dµ ≤ µ(A)  for all A ∈ F.                                 (1.9)
Let A1 = {0 ≤ g < 1}, A2 = {g = 1}, A3 = {g ∉ [0, 1]}. Then (1.9)
implies that µ(A3) = 0 (see Problem 4.1). Now define the measures µ1a(·)
and µ1s(·) by
µ1a(A) ≡ µ1(A ∩ A1),   µ1s(A) ≡ µ1(A ∩ A2),   A ∈ F.                (1.10)
Next it will be shown that µ1a ≪ µ2 and µ1s ⊥ µ2, thus establishing (1.5).
By (1.7) and (1.8), for all f ∈ H,
∫ f dµ1 = ∫ fg dµ = ∫ fg dµ1 + ∫ fg dµ2
⇒ ∫ f(1 − g) dµ1 = ∫ fg dµ2.                                        (1.11)
Setting f = I_{A2} yields
0 = µ2(A2).
From (1.10), since µ1s(A2^c) = 0, it follows that µ1s ⊥ µ2. Now fix n ≥ 1
and A ∈ F. Let f = I_{A∩A1}(1 + g + . . . + g^{n−1}). Then (1.11) implies that
∫_{A∩A1} (1 − g^n) dµ1 = ∫_{A∩A1} g(1 + g + . . . + g^{n−1}) dµ2.
Now letting n → ∞ and using the MCT on both sides yields
µ1a(A) = ∫_A I_{A1} (g/(1 − g)) dµ2.                                 (1.12)
Setting h ≡ (g/(1 − g)) I_{A1} completes the proof of (1.5) and (1.6).
Case 2: Now suppose that µ1 and µ2 are σ-finite. Then there exists a
countable partition {Dn}_{n≥1} ⊂ F of Ω such that µ1(Dn) and µ2(Dn) are
both finite for all n ≥ 1. Let µ1^{(n)}(·) ≡ µ1(· ∩ Dn) and µ2^{(n)}(·) ≡ µ2(· ∩ Dn).
Then applying ‘Case 1’ to µ1^{(n)} and µ2^{(n)} for each n ≥ 1, one gets measures
µ1a^{(n)}, µ1s^{(n)} and a function hn such that
µ1^{(n)}(·) ≡ µ1a^{(n)}(·) + µ1s^{(n)}(·),                              (1.13)
where, for A in F, µ1a^{(n)}(A) = ∫_A hn dµ2^{(n)} = ∫_A hn I_{Dn} dµ2 and µ1s^{(n)}(·) ⊥ µ2^{(n)}.
Since µ1(·) = Σ_{n=1}^∞ µ1^{(n)}(·), it follows from (1.13) that
µ1(·) = µ1a(·) + µ1s(·),                                            (1.14)
where µ1a(A) ≡ Σ_{n=1}^∞ µ1a^{(n)}(A) and µ1s(·) = Σ_{n=1}^∞ µ1s^{(n)}(·). By the MCT,
µ1a(A) = ∫_A h dµ2,  A ∈ F,
where h ≡ Σ_{n=1}^∞ hn I_{Dn}.
Clearly, µ1a ≪ µ2. The verification of the singularity of µ1s and µ2 is
left as an exercise (Problem 4.2).
It remains to prove the uniqueness of the decomposition. Let µ1 = µa +
µs and µ1 = µ′a + µ′s be two decompositions of µ1 where µa and µ′a are
absolutely continuous w.r.t. µ2 and µs and µ′s are singular w.r.t. µ2. By
definition, there exist sets B and B′ in F such that
µ2(B) = 0, µ2(B′) = 0,  and  µs(B^c) = 0, µ′s(B′^c) = 0.
Let D = B ∪ B′. Then µ2(D) = 0 and µs(D^c) ≤ µs(B^c) = 0. Similarly,
µ′s(D^c) ≤ µ′s(B′^c) = 0. Also µ2(D) = 0 implies µa(D) = 0 = µ′a(D). Thus
for any A ∈ F,
µa(A) = µa(A ∩ D^c)  and  µ′a(A) = µ′a(A ∩ D^c).
Also
µs(A ∩ D^c) ≤ µs(A ∩ B^c) = 0,
µ′s(A ∩ D^c) ≤ µ′s(A ∩ B′^c) = 0.
Thus, µ1(A ∩ D^c) = µa(A ∩ D^c) + µs(A ∩ D^c) = µa(A ∩ D^c) = µa(A) and
µ1(A ∩ D^c) = µ′a(A ∩ D^c) + µ′s(A ∩ D^c) = µ′a(A ∩ D^c) = µ′a(A). Hence,
µa(A) = µ1(A ∩ D^c) = µ′a(A) for every A ∈ F. That is, µa = µ′a and hence,
µs = µ′s.                                                           □
Remark 4.1.1: In Theorem 4.1.1, the hypothesis of σ-finiteness cannot be
dropped. For example, let µ be the Lebesgue measure and ν be the counting
measure on [0, 1]. Then µ ≪ ν but there does not exist a nonnegative F-
measurable function h such that µ(A) = ∫_A h dν. To see this, if possible,
suppose that for some h ∈ L1(ν), µ(A) = ∫_A h dν for all A ∈ F. Note that
µ([0, 1]) = 1 implies that ∫_{[0,1]} h dν < ∞ and hence, that B ≡ {x : h(x) > 0}
is countable (Problem 4.3). But µ being the Lebesgue measure, µ(B) = 0
and µ(B^c) = 1. Since by definition, h ≡ 0 on B^c, this implies 1 = µ(B^c) =
∫_{B^c} h dν = 0, leading to a contradiction. However, if ν is σ-finite and µ ≪ ν
(µ not necessarily σ-finite), then the Radon-Nikodym theorem holds, i.e.,
there exists a nonnegative F-measurable function h such that
µ(A) = ∫_A h dν  for all A ∈ F.
For a proof, see Royden (1988), Chapter 11.
Definition 4.1.3: Let µ and ν be measures on a measurable space (Ω, F)
and let h be a nonnegative measurable function such that
µ(A) = ∫_A h dν  for all A ∈ F.
Then h is called the Radon-Nikodym derivative of µ w.r.t. ν and is written
as
dµ/dν = h.
If µ(Ω) < ∞ and there exist two nonnegative F-measurable functions h1
and h2 such that
µ(A) = ∫_A h1 dν = ∫_A h2 dν
for all A ∈ F, then h1 = h2 a.e. (ν) and thus the Radon-Nikodym derivative
dµ/dν is unique up to equivalence a.e. (ν). This also extends to the case when
µ is σ-finite.
The following proposition is easy to verify (cf. Problem 4.4).
Proposition 4.1.2: Let ν, µ, µ1 , µ2 , . . . be σ-finite measures on a measurable space (Ω, F).
(i) If µ1 ≪ µ2 and µ2 ≪ µ3, then µ1 ≪ µ3 and
    dµ1/dµ3 = (dµ1/dµ2)(dµ2/dµ3)   a.e. (µ3).
(ii) Suppose that µ1 and µ2 are dominated by µ3. Then for any α, β ≥ 0,
     αµ1 + βµ2 is dominated by µ3 and
     d(αµ1 + βµ2)/dµ3 = α dµ1/dµ3 + β dµ2/dµ3   a.e. (µ3).
(iii) If µ ≪ ν and dµ/dν > 0 a.e. (ν), then ν ≪ µ and
      dν/dµ = (dµ/dν)^{−1}   a.e. (µ).
(iv) Let {µn}_{n≥1} be a sequence of measures and {αn}_{n≥1} be a sequence
     of positive real numbers, i.e., αn > 0 for all n ≥ 1. Define µ = Σ_{n=1}^∞ αn µn.
     (a) Then, µ ≪ ν iff µn ≪ ν for each n ≥ 1 and in this case,
         dµ/dν = Σ_{n=1}^∞ αn dµn/dν   a.e. (ν).
     (b) µ ⊥ ν iff µn ⊥ ν for all n ≥ 1.
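Parts (i) and (iii) of Proposition 4.1.2 are transparent on a finite space, where Radon-Nikodym derivatives are simply ratios of point masses. The following sketch is illustrative only and not from the original text; the space {0, . . . , 7} and the random densities are arbitrary choices. It checks the chain rule and the reciprocal rule.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 8
    mu3 = rng.uniform(0.5, 2.0, size=n)       # a strictly positive measure on {0,...,7}
    g = rng.uniform(0.5, 3.0, size=n)          # dmu2/dmu3
    f = rng.uniform(0.5, 3.0, size=n)          # dmu1/dmu2
    mu2 = g * mu3                              # mu2({j}) = g(j) mu3({j})
    mu1 = f * mu2                              # mu1({j}) = f(j) mu2({j})

    # Chain rule (Proposition 4.1.2 (i)): dmu1/dmu3 = (dmu1/dmu2)(dmu2/dmu3) = f*g
    print(np.allclose(mu1, (f * g) * mu3))     # True
    # Part (iii): since dmu2/dmu3 = g > 0, also mu3 << mu2 with dmu3/dmu2 = 1/g
    print(np.allclose(mu3, (1.0 / g) * mu2))   # True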
4.2 Signed measures
Let µ1 and µ2 be two finite measures on a measurable space (Ω, F). Let
ν(A) ≡ µ1(A) − µ2(A)  for all A ∈ F.                                (2.1)
Then ν : F → R satisfies the following:
(i) ν(∅) = 0.
(ii) For any {An}_{n≥1} ⊂ F with Ai ∩ Aj = ∅ for i ≠ j, and with
     Σ_{i=1}^∞ |ν(Ai)| < ∞,
     ν(∪_{i≥1} Ai) = Σ_{i=1}^∞ ν(Ai).                                 (2.2)
(iii) Let
      ‖ν‖ ≡ sup{Σ_{i=1}^∞ |ν(Ai)| : {An}_{n≥1} ⊂ F, Ai ∩ Aj = ∅ for i ≠ j, ∪_{n≥1} An = Ω}.   (2.3)
      Then, ‖ν‖ is finite.
Note that (iii) holds because ‖ν‖ ≤ µ1(Ω) + µ2(Ω) < ∞.
Definition 4.2.1: A set function ν : F → R satisfying (i), (ii), and (iii)
above is called a finite signed measure.
The above example shows that the difference of two finite measures is
a finite signed measure. It will be shown below that every finite signed
measure can be expressed as the difference of two finite measures.
Proposition 4.2.1: Let ν be a finite signed measure on (Ω, F). Let
|ν|(A) ≡ sup{Σ_{n=1}^∞ |ν(An)| : {An}_{n≥1} ⊂ F, Ai ∩ Aj = ∅ for i ≠ j, ∪_{n≥1} An = A}.   (2.4)
Then |ν|(·) is a finite measure on (Ω, F).
Proof: That |ν|(Ω) < ∞ follows from part (iii) of the definition. Thus it is
enough to verify that |ν| is countably additive. Let {An}_{n≥1} be a countable
family of disjoint sets in F. Let A = ∪_{n≥1} An. By the definition of |ν|, for
all ε > 0 and n ∈ N, there exists a countable family {A_{nj}}_{j≥1} of disjoint
sets in F with An = ∪_{j≥1} A_{nj} such that Σ_{j=1}^∞ |ν(A_{nj})| > |ν|(An) − ε/2^n.
Hence,
Σ_{n=1}^∞ Σ_{j=1}^∞ |ν(A_{nj})| > Σ_{n=1}^∞ |ν|(An) − ε.
Note that {A_{nj}}_{n≥1, j≥1} is a countable family of disjoint sets in F such that
A = ∪_{n≥1} An = ∪_{n≥1} ∪_{j≥1} A_{nj}. It follows from the definition of |ν| that
|ν|(A) ≥ Σ_{n=1}^∞ Σ_{j=1}^∞ |ν(A_{nj})| > Σ_{n=1}^∞ |ν|(An) − ε.
Since this is true for all ε > 0, it follows that
|ν|(A) ≥ Σ_{n=1}^∞ |ν|(An).                                          (2.5)
To get the opposite inequality, let {Bj}_{j≥1} be a countable family of disjoint
sets in F such that ∪_{j≥1} Bj = A = ∪_{n≥1} An. Since Bj = Bj ∩ A =
∪_{n≥1} (Bj ∩ An) and ν satisfies (2.2),
ν(Bj) = Σ_{n=1}^∞ ν(Bj ∩ An)  for all j ≥ 1.
Thus
Σ_{j=1}^∞ |ν(Bj)| ≤ Σ_{j=1}^∞ Σ_{n=1}^∞ |ν(Bj ∩ An)| = Σ_{n=1}^∞ Σ_{j=1}^∞ |ν(Bj ∩ An)|.   (2.6)
Note that for each An, {Bj ∩ An}_{j≥1} is a countable family of disjoint
sets in F such that An = ∪_{j≥1} (Bj ∩ An). Hence from (2.4), it follows
that |ν|(An) ≥ Σ_{j=1}^∞ |ν(Bj ∩ An)| and hence, Σ_{n=1}^∞ |ν|(An) ≥
Σ_{n=1}^∞ Σ_{j=1}^∞ |ν(Bj ∩ An)|. From (2.6), it follows that Σ_{n=1}^∞ |ν|(An) ≥
Σ_{j=1}^∞ |ν(Bj)|. This being true for every such family {Bj}_{j≥1}, it follows
from (2.4) that
|ν|(A) ≤ Σ_{i=1}^∞ |ν|(Ai),                                          (2.7)
and with (2.5), this completes the proof.                            □
Definition 4.2.2: The measure |ν| defined by (2.4) is called the total
variation measure of the signed measure ν.
Next, define the set functions
ν+ ≡ (|ν| + ν)/2,   ν− ≡ (|ν| − ν)/2.                               (2.8)
It can be verified that ν+ and ν− are both finite measures on (Ω, F).
Definition 4.2.3: The measures ν+ and ν− are called the positive and
negative variation measures of the signed measure ν, respectively.
It follows from (2.8) that
ν = ν+ − ν−.                                                        (2.9)
Thus every finite signed measure ν on (Ω, F) is the difference of two finite
measures, as claimed earlier.
Note that both ν+ and ν− are dominated by |ν| and all three measures
are finite. By the Radon-Nikodym theorem (Theorem 4.1.1), there exist
functions h1 and h2 in L1(Ω, F, |ν|) such that
dν+/d|ν| = h1   and   dν−/d|ν| = h2.                                (2.10)
This and (2.9) imply that for any A in F,
ν(A) = ∫_A h1 d|ν| − ∫_A h2 d|ν| = ∫_A h d|ν|,                       (2.11)
where h = h1 − h2 . Thus every finite signed measure ν on (Ω, F) can be
expressed as
ν(A) = ∫_A f dµ,  A ∈ F,                                            (2.12)
for some finite measure µ on (Ω, F) and some f ∈ L1 (Ω, F, µ).
Conversely, it is easy to verify that a set function ν defined by (2.12) for
some finite measure µ on (Ω, F) and some f ∈ L1 (Ω, F, µ) is a finite signed
measure (cf. Problem 4.6). This leads to the following:
Theorem 4.2.2:

(i) A set function ν on a measurable space (Ω, F) is a finite signed measure iff there exist two finite measures µ1 and µ2 on (Ω, F) such that ν = µ1 − µ2.

(ii) A set function ν on a measurable space (Ω, F) is a finite signed measure iff there exist a finite measure µ on (Ω, F) and an f ∈ L1(Ω, F, µ) such that for all A in F,

  ν(A) = ∫_A f dµ.
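For a signed measure on a finite set, the objects in Theorem 4.2.2 can be computed by hand. The following is a minimal illustrative sketch (the finite sample space and the weights are not from the text): it builds ν = µ1 − µ2 pointwise, forms |ν|, ν+ and ν− as in (2.8), and checks that ν = ν+ − ν− and ‖ν‖ = |ν|(Ω).

    # Illustrative sketch: Jordan decomposition of a discrete signed measure nu = mu1 - mu2.
    # On a finite set, |nu|({w}) = |nu({w})|, nu+ = (|nu| + nu)/2 and nu- = (|nu| - nu)/2 pointwise.
    mu1 = {0: 0.5, 1: 0.1, 2: 0.3, 3: 0.1}    # a finite measure (arbitrary weights)
    mu2 = {0: 0.2, 1: 0.4, 2: 0.1, 3: 0.3}    # another finite measure
    omega = sorted(mu1)

    nu       = {w: mu1[w] - mu2[w] for w in omega}
    abs_nu   = {w: abs(nu[w]) for w in omega}               # total variation measure |nu|
    nu_plus  = {w: (abs_nu[w] + nu[w]) / 2 for w in omega}  # positive variation, as in (2.8)
    nu_minus = {w: (abs_nu[w] - nu[w]) / 2 for w in omega}  # negative variation, as in (2.8)

    def value(m, A):
        # value of the (signed) measure m on the subset A of omega
        return sum(m[w] for w in A)

    A = [0, 2]
    assert abs(value(nu, A) - (value(nu_plus, A) - value(nu_minus, A))) < 1e-12
    print("nu(A) =", value(nu, A), "  ||nu|| =", value(abs_nu, omega))
    # nu+ is carried by {w : nu({w}) >= 0} and nu- by its complement, so they are mutually singular.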
Definition 4.2.4: Let ν be a finite signed measure on a measurable space (Ω, F). A set A ∈ F is called a positive set for ν if for any B ⊂ A, B ∈ F,
ν(B) ≥ 0. A set A ∈ F is called a negative set for ν if for any B ⊂ A with
B ∈ F, ν(B) ≤ 0.
Let h be as in (2.11). Let
Ω+ = {ω : h(ω) ≥ 0} and
Ω− = {ω : h(ω) < 0}.
(2.13)
From (2.11), it follows that for all A in F, ν(A∩Ω+ ) ≥ 0 and ν(A∩Ω− ) ≤ 0.
Thus Ω+ is a positive set and Ω− is a negative set for ν. Furthermore,
Ω+ ∪ Ω− = Ω and Ω+ ∩ Ω− = ∅. Summarizing this discussion, one gets the
following theorem.
Theorem 4.2.3: (Hahn decomposition theorem). Let ν be a finite signed
measure on a measurable space (Ω, F). Then there exist a positive set Ω+
and a negative set Ω− for ν such that Ω = Ω+ ∪ Ω− and Ω+ ∩ Ω− = ∅.
Let Ω+ and Ω− be as in (2.13). It can be verified (Problem 4.8) that for any B ∈ F, if B ⊂ Ω+, then ν(B) = |ν|(B). By (2.11), this implies that for all A in F,

  ∫_{A∩Ω+} h d|ν| = |ν|(A ∩ Ω+).

It follows that h = 1 a.e. (|ν|) on Ω+. Similarly, h = −1 a.e. (|ν|) on Ω−. Thus, the measures ν+ and ν−, defined in (2.8), reduce to

  ν+(A) = ∫_A (1 + h)/2 d|ν| = ∫_{A∩Ω+} (1 + h)/2 d|ν| + ∫_{A∩Ω−} (1 + h)/2 d|ν| = |ν|(A ∩ Ω+),

and similarly

  ν−(A) = |ν|(A ∩ Ω−).
Note that ν + and ν − are both finite measures that are mutually singular.
This particular decomposition of ν as
ν = ν+ − ν−
is known as the Jordan decomposition of ν. It will now be shown that
this decomposition is minimal and that it is unique in the class of signed
measures with mutually singular components. Suppose there exist two finite
measures µ1 and µ2 on (Ω, F) such that ν = µ1 − µ2 . For any A ∈ F,
ν + (A) = ν(A ∩ Ω+ ) = µ1 (A ∩ Ω+ ) − µ2 (A ∩ Ω+ ) ≤ µ1 (A ∩ Ω+ ) ≤ µ1 (A) and
ν − (A) = −ν(A ∩ Ω− ) = µ2 (A ∩ Ω− ) − µ1 (A ∩ Ω− ) ≤ µ2 (A ∩ Ω− ) ≤ µ2 (A).
Thus, ν + ≤ µ1 and ν − ≤ µ2 . Clearly, since both µ1 and ν + are finite
measures on (Ω, F), µ1 − ν + is also a finite measure. Similarly, µ2 − ν − is
also a finite measure. Also, since µ1 − µ2 = ν = ν + − ν − , it follows that
µ1 −ν + = µ2 −ν − = λ, say. Thus, for any decomposition of ν as µ1 −µ2 with
µ1 , µ2 finite measures, it holds that µ1 = ν + + λ and µ2 = ν − + λ, where
λ is a measure on (Ω, F). Thus, ν = ν + − ν − is a minimal decomposition
in the sense that in this case λ = 0. Now suppose µ1 and µ2 are mutually
singular, i.e., there exist Ω1 , Ω2 ∈ F such that Ω1 ∩ Ω2 = ∅, Ω1 ∪ Ω2 = Ω,
and µ1 (Ω2 ) = 0 = µ2 (Ω1 ). Since µ1 ≥ λ and µ2 ≥ λ, it follows that
λ(Ω2 ) = 0 = λ(Ω1 ). Thus λ = 0 and µ1 = ν + and µ2 = ν − .
Summarizing the above discussion yields:
Theorem 4.2.4: Let ν be a finite signed measure on a measurable space
(Ω, F) and let µ1 and µ2 be two finite measures on (Ω, F) such that ν =
µ1 − µ2 . Then there exists a finite measure λ such that µ1 = ν + + λ and
µ2 = ν − + λ with λ = 0 iff µ1 and µ2 are mutually singular.
Let

  S ≡ {ν : ν is a finite signed measure on (Ω, F)}.

Also, for any α ∈ R, let α+ = max(α, 0) and α− = max(−α, 0). Note that for ν1, ν2 in S and α1, α2 ∈ R,

  α1ν1 + α2ν2 = (α1+ − α1−)(ν1+ − ν1−) + (α2+ − α2−)(ν2+ − ν2−)
              = (α1+ν1+ + α1−ν1− + α2+ν2+ + α2−ν2−) − (α1+ν1− + α1−ν1+ + α2+ν2− + α2−ν2+)
              = λ1 − λ2,   say,
where λ1 and λ2 are both finite measures. It now follows from Theorem
4.2.2 that α1 ν1 + α2 ν2 ∈ S. Thus, S is a vector space over R.
Now it will be shown that ‖ν‖ ≡ |ν|(Ω) is a norm on S and that (S, ‖·‖) is a Banach space.

Definition 4.2.5: For a finite signed measure ν on a measurable space (Ω, F), the total variation norm of ν is defined by ‖ν‖ ≡ |ν|(Ω).

Proposition 4.2.5: Let S ≡ {ν : ν a finite signed measure on (Ω, F)}. Then ‖ν‖ ≡ |ν|(Ω), ν ∈ S, is a norm on S.
Proof: Let ν1, ν2 ∈ S, α1, α2 ∈ R and λ = α1ν1 + α2ν2. For any A ∈ F and any disjoint {Ai}i≥1 ⊂ F with A = ∪_{i≥1} Ai,

  |λ(Ai)| ≤ |α1||ν1(Ai)| + |α2||ν2(Ai)|   for all i ≥ 1
  ⇒ Σ_{i≥1} |λ(Ai)| ≤ |α1| Σ_{i≥1} |ν1(Ai)| + |α2| Σ_{i≥1} |ν2(Ai)| ≤ |α1||ν1|(A) + |α2||ν2|(A).

Taking the supremum over all such {Ai}i≥1 yields

  |λ|(A) ≤ |α1||ν1|(A) + |α2||ν2|(A),   i.e.,   |λ|(·) ≤ |α1||ν1|(·) + |α2||ν2|(·),
  ⇒ ‖λ‖ ≡ |λ|(Ω) ≤ |α1||ν1|(Ω) + |α2||ν2|(Ω) = |α1|‖ν1‖ + |α2|‖ν2‖.

Taking α1 = α2 = 1 yields

  ‖ν1 + ν2‖ ≤ ‖ν1‖ + ‖ν2‖,

i.e., the triangle inequality holds.
Next, taking α2 = 0 yields ‖α1ν1‖ ≤ |α1|‖ν1‖. To get the opposite inequality, note that for α1 ≠ 0, ν1 = (1/α1)(α1ν1) and so ‖ν1‖ ≤ |1/α1| ‖α1ν1‖. Hence, |α1| ‖ν1‖ ≤ ‖α1ν1‖. Thus, for any α1 ≠ 0, ‖α1ν1‖ = |α1|‖ν1‖. Finally, ‖ν‖ = 0 ⇒ |ν|(Ω) = 0 ⇒ |ν|(A) = 0 for all A ∈ F ⇒ ν(A) = 0 for all A ∈ F, i.e., ν is the zero measure. Thus ‖·‖ is a norm on S. □
Proposition 4.2.6: (S, ‖·‖) is complete.
Proof: Let {νn}n≥1 be a Cauchy sequence in (S, ‖·‖). Note that for each A ∈ F, |νn(A) − νm(A)| ≤ |νn − νm|(A) ≤ ‖νn − νm‖. Hence, for each A ∈ F, {νn(A)}n≥1 is a Cauchy sequence in R and hence

  ν(A) ≡ lim_{n→∞} νn(A)   exists.

It will be shown that ν(·) is a finite signed measure and ‖νn − ν‖ → 0 as n → ∞. Let {Ai}i≥1 ⊂ F, Ai ∩ Aj = ∅ for i ≠ j, and A = ∪_{i≥1} Ai. Let xn ≡ {νn(Ai)}i≥1, n ≥ 1, and let x0 ≡ {ν(Ai)}i≥1. Note that each xn ∈ ℓ1, where ℓ1 ≡ {x : x = {xi}i≥1 ⊂ R, Σ_{i≥1} |xi| < ∞}. For x ∈ ℓ1, let ‖x‖1 = Σ_{i=1}^∞ |xi|. Then ‖xn − xm‖1 = Σ_{i≥1} |νn(Ai) − νm(Ai)| ≤ |νn − νm|(A) ≤ |νn − νm|(Ω) → 0 as n, m → ∞. But ℓ1 is complete under ‖·‖1. So there exists x* ∈ ℓ1 such that ‖xn − x*‖1 → 0. Since xni ≡ νn(Ai) → ν(Ai) for all i ≥ 1, it follows that x*i = ν(Ai) for all i ≥ 1 and that Σ_{i=1}^∞ |ν(Ai)| < ∞. Also, for all n ≥ 1, νn(A) = Σ_{i=1}^∞ νn(Ai). Since Σ_{i≥1} |νn(Ai) − ν(Ai)| → 0, νn(A) ≡ Σ_{i≥1} νn(Ai) → Σ_{i≥1} ν(Ai) as n → ∞. But νn(A) → ν(A). Thus, ν(A) = Σ_{i=1}^∞ ν(Ai). Also for any countable partition {Ai}i≥1 ⊂ F of Ω,

  Σ_{i=1}^∞ |ν(Ai)| = lim_{n→∞} Σ_{i=1}^∞ |νn(Ai)| ≤ lim_{n→∞} ‖νn‖ < ∞.

Thus |ν|(Ω) < ∞ and hence, ν ∈ S. Finally,

  ‖νn − ν‖ = sup { Σ_{i=1}^∞ |νn(Ai) − ν(Ai)| : {Ai}i≥1 ⊂ F is a disjoint partition of Ω }.

But for every countable partition {Ai}i≥1 ⊂ F,

  Σ_{i=1}^∞ |νn(Ai) − ν(Ai)| = lim_{m→∞} Σ_{i=1}^∞ |νn(Ai) − νm(Ai)| ≤ lim_{m→∞} ‖νn − νm‖.

Thus, ‖νn − ν‖ ≤ lim_{m→∞} ‖νn − νm‖ and hence, lim_{n→∞} ‖νn − ν‖ ≤ lim_{n→∞} lim_{m→∞} ‖νn − νm‖ = 0. Hence, νn → ν in S. □
Definition 4.2.6: (Integration w.r.t. signed measures). Let µ be a finite signed measure on a measurable space (Ω, F) and |µ| be its total variation measure as in Definition 4.2.2. Then, for any f ∈ L1(Ω, F, |µ|), ∫ f dµ is defined as

  ∫ f dµ = ∫ f dµ+ − ∫ f dµ−,

where µ+ and µ− are the positive and negative variations of µ as defined in (2.8).
Proposition 4.2.7: Let µ be a finite signed measure on a measurable space (Ω, F). Let µ = λ1 − λ2 where λ1 and λ2 are finite measures. Let f ∈ L1(Ω, F, λ1 + λ2). Then f ∈ L1(Ω, F, |µ|) and

  ∫ f dµ = ∫ f dλ1 − ∫ f dλ2.   (2.14)

Proof: Left as an exercise (Problem 4.13).
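On a finite set, Definition 4.2.6 and Proposition 4.2.7 can be checked by direct summation. The sketch below is illustrative only (the weights and the function are invented): it integrates f against a discrete signed measure µ using µ+ and µ−, and compares with ∫ f dλ1 − ∫ f dλ2 for a non-minimal decomposition µ = λ1 − λ2.

    # Illustrative sketch: integration w.r.t. a discrete signed measure (Definition 4.2.6).
    lam1 = {0: 0.6, 1: 0.2, 2: 0.7}           # finite measure lambda_1 (arbitrary weights)
    lam2 = {0: 0.1, 1: 0.5, 2: 0.3}           # finite measure lambda_2
    f    = {0: 2.0, 1: -1.0, 2: 0.5}          # an integrable function on the finite space

    mu       = {w: lam1[w] - lam2[w] for w in lam1}     # mu = lambda_1 - lambda_2
    mu_plus  = {w: max(mu[w], 0.0) for w in mu}          # positive variation
    mu_minus = {w: max(-mu[w], 0.0) for w in mu}         # negative variation

    int_via_variations = (sum(f[w] * mu_plus[w] for w in mu)
                          - sum(f[w] * mu_minus[w] for w in mu))   # Definition 4.2.6
    int_via_lambdas = (sum(f[w] * lam1[w] for w in mu)
                       - sum(f[w] * lam2[w] for w in mu))          # right side of (2.14)

    assert abs(int_via_variations - int_via_lambdas) < 1e-12
    print("integral of f with respect to mu =", int_via_variations)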
4.3 Functions of bounded variation
From the construction of the Lebesgue-Stieltjes measures on (R, B(R)) discussed in Chapter 1, it is seen that to every nondecreasing right continuous function F : R → R, there is a (Radon) measure µF on (R, B(R)) such that µF((a, b]) = F(b) − F(a) for all a < b, and conversely. If µ1 and µ2 are two Radon measures, with F1 and F2 the corresponding nondecreasing right continuous functions, and µ = µ1 − µ2, let

  G(x) ≡ µ((0, x]) for x > 0,   −µ((x, 0]) for x < 0,   0 for x = 0
       = F1(x) − F2(x) − (F1(0) − F2(0)) for x > 0,   (F1(0) − F2(0)) − (F1(x) − F2(x)) for x < 0,   0 for x = 0.
Thus to every finite signed measure µ on (R, B(R)), there corresponds a
function G(·) that is the difference of two right continuous nondecreasing
and bounded functions. The converse is also easy to establish. A characterization of such a function G(·) without any reference to measures is given
below.
Definition 4.3.1: Let f : [a, b] → R, where −∞ < a < b < ∞. Then for any partition Q = {a = x0 < x1 < x2 < . . . < xn = b}, n ∈ N, the positive, negative and total variations of f with respect to Q are respectively defined as

  P(f, Q) ≡ Σ_{i=1}^n (f(xi) − f(xi−1))+,
  N(f, Q) ≡ Σ_{i=1}^n (f(xi) − f(xi−1))−,
  T(f, Q) ≡ Σ_{i=1}^n |f(xi) − f(xi−1)|.
It is easy to verify that (i) if f is nondecreasing, then
P (f, Q) = T (f, Q) = f (b) − f (a)
and N (f, Q) = 0
and that (ii) for any f ,
P (f, Q) + N (f, Q) = T (f, Q).
Definition 4.3.2: Let f : [a, b] → R, where −∞ < a < b < ∞. The positive, negative and total variations of f over [a, b] are respectively defined as

  P(f, [a, b]) ≡ sup_Q P(f, Q),   N(f, [a, b]) ≡ sup_Q N(f, Q),   T(f, [a, b]) ≡ sup_Q T(f, Q),
where the supremum in each case is taken over all finite partitions Q of
[a, b].
Definition 4.3.3: Let f : [a, b] → R, where −∞ < a < b < ∞. Then, f
is said to be of bounded variation on [a, b] if T (f, [a, b]) < ∞. The set of all
such functions is denoted by BV [a, b].
As remarked earlier, if f is nondecreasing, then T (f, Q) = f (b) − f (a) for
each Q and hence T (f, [a, b]) = f (b) − f (a). It follows that if f = f1 − f2 ,
where both f1 and f2 are nondecreasing, then f ∈ BV [a, b]. A natural
question is whether the converse is true. The answer is yes, as shown by
the following result.
Theorem 4.3.1: Let f ∈ BV[a, b]. Let f1(x) ≡ P(f, [a, x]) and f2(x) ≡ N(f, [a, x]). Then f1 and f2 are nondecreasing on [a, b] and for all a ≤ x ≤ b,

  f(x) = f(a) + f1(x) − f2(x).
Proof: That f1 and f2 are nondecreasing follows from the definition. It is
enough to verify that if f ∈ BV [a, b], then
f (b) − f (a) = P (f, [a, b]) − N (f, [a, b]),
as this can be applied to [a, x] for a ≤ x < b. For each finite partition Q of
[a, b],
  f(b) − f(a) = Σ_{i=1}^n (f(xi) − f(xi−1)) = P(f, Q) − N(f, Q).
Thus P (f, Q) = f (b) − f (a) + N (f, Q). By taking supremum over all finite
partitions Q, it follows that
P (f, [a, b]) = f (b) − f (a) + N (f, [a, b]).
If f ∈ BV [a, b], this yields f (b) − f (a) = P (f, [a, b]) − N (f, [a, b]).
!
Remark 4.3.1: Since T(f, Q) = P(f, Q) + N(f, Q) = 2P(f, Q) − (f(b) − f(a)), it follows that if f ∈ BV[a, b], then

  T(f, [a, b]) = 2P(f, [a, b]) − (f(b) − f(a)) = P(f, [a, b]) + N(f, [a, b]).
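The quantities P, N and T of Definitions 4.3.1 and 4.3.2 are easy to approximate numerically over finer and finer partitions. The following is an illustrative sketch (the example function f = sin on [0, 2π] is not from the text); it also checks the identities of Theorem 4.3.1 and Remark 4.3.1 on each partition.

    # Illustrative sketch: P(f,Q), N(f,Q), T(f,Q) over a uniform partition Q of [a,b],
    # together with the identities T = P + N and f(b) - f(a) = P - N.
    import math

    def variations(f, a, b, n):
        xs = [a + (b - a) * i / n for i in range(n + 1)]
        diffs = [f(xs[i]) - f(xs[i - 1]) for i in range(1, n + 1)]
        P = sum(max(d, 0.0) for d in diffs)      # positive variation over Q
        N = sum(max(-d, 0.0) for d in diffs)     # negative variation over Q
        T = sum(abs(d) for d in diffs)           # total variation over Q
        return P, N, T

    f, a, b = math.sin, 0.0, 2 * math.pi
    for n in (4, 64, 1024):
        P, N, T = variations(f, a, b, n)
        assert abs(T - (P + N)) < 1e-12
        assert abs((f(b) - f(a)) - (P - N)) < 1e-9
        print(n, P, N, T)    # here P(f,[a,b]) = N(f,[a,b]) = 2 and T(f,[a,b]) = 4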
Corollary 4.3.2: A function f ∈ BV [a, b] iff there exists a finite signed
measure µ on (R, B(R)) such that f (x) = µ([a, x]), a ≤ x ≤ b.
The proof of this corollary is left as an exercise.
128
4. Differentiation
Remark 4.3.2: Some observations on functions of bounded variations are
listed below.
(a) Let f = I_Q where Q is the set of rationals. Then for any [a, b], a < b, P(f, [a, b]) = N(f, [a, b]) = ∞ and so f ∉ BV[a, b]. This holds for f = I_D for any set D such that both D and D^c are dense in R.
(b) Let f be Lipschitz on [a, b]. That is, |f (x) − f (y)| ≤ K|x − y| for all
x, y in [a, b] where K ∈ (0, ∞) is a constant. Then, f ∈ BV [a, b].
(c) Let f be differentiable in (a, b) and continuous on [a, b] and f′(·) be bounded in (a, b). Then by the mean value theorem, f is Lipschitz and hence, f is in BV[a, b].

(d) Let f(x) = x² sin(1/x), 0 < x ≤ 1, and let f(0) = 0. Then f is continuous on [0, 1], differentiable on (0, 1) with f′ bounded on (0, 1), and hence f ∈ BV[0, 1].
(e) Let g(x) = x² sin(1/x²), 0 < x ≤ 1, g(0) = 0. Then g is continuous on [0, 1], differentiable on (0, 1), but g′ is not bounded on (0, 1). This by itself does not imply that g ∉ BV[0, 1], since being Lipschitz is only a sufficient condition. But it turns out that g ∉ BV[0, 1]. To see this, let xn = [(2n + 1)π/2]^{−1/2}, n = 0, 1, 2, . . .. Then Σ_{i=1}^n |g(xi) − g(xi−1)| ≥ Σ_{i=1}^n 1/((2i + 1)π/2) and hence T(g, [0, 1]) = ∞.
(f) It is known (see Royden (1988), Chapter 4) that if f : [a, b] → R is nondecreasing, then f is differentiable a.e. (m) on (a, b) and ∫_{[a,b]} f′ dm ≤ f(b) − f(a), where m denotes the Lebesgue measure. This implies that if f ∈ BV[a, b], then f is differentiable a.e. (m) on (a, b) and so ∫_{[a,b]} |f′| dm ≤ T(f, [a, b]).
4.4 Absolutely continuous functions on R
Definition 4.4.1: A function F : R → R is absolutely continuous (a.c.) if for all ε > 0, there exists δ > 0 such that if Ij = [aj, bj], j = 1, 2, . . . , k (k ∈ N) are disjoint and Σ_{j=1}^k (bj − aj) < δ, then Σ_{j=1}^k |F(bj) − F(aj)| < ε.

By the mean value theorem, it follows that if F is differentiable and F′(·) is bounded, then F is a.c. Also note that F a.c. implies F is uniformly continuous.
Definition 4.4.2: A function F : [a, b] → R is absolutely continuous if the function F̃, defined by

  F̃(x) = F(x) if a ≤ x ≤ b,   F̃(x) = F(a) if x < a,   F̃(x) = F(b) if x > b,

is absolutely continuous.
Thus F (x) = x is a.c. on R. Any polynomial is a.c. on any bounded
interval but not necessarily on all of R. For example, F (x) = x2 is a.c. on
any bounded interval but not a.c. on R, since it is not uniformly continuous
on R.
The main result of this section is the following result due to H. Lebesgue,
known as the fundamental theorem of Lebesgue integral calculus.
Theorem 4.4.1: A function F : [a, b] → R is absolutely continuous iff there is a function f : [a, b] → R such that f is Lebesgue measurable and integrable w.r.t. m and such that

  F(x) = F(a) + ∫_{[a,x]} f dm   for all a ≤ x ≤ b,   (4.1)

where m is the Lebesgue measure.
Proof: First consider the "if part." Suppose that (4.1) holds. Since ∫_{[a,b]} |f| dm < ∞, for any ε > 0, there exists a δ > 0 such that (cf. Proposition 2.5.8)

  m(A) < δ ⇒ ∫_A |f| dm < ε.   (4.2)

Thus, if Ij = (aj, bj) ⊂ [a, b], j = 1, 2, . . . , k are disjoint and such that Σ_{j=1}^k (bj − aj) < δ, then

  Σ_{j=1}^k |F(bj) − F(aj)| ≤ ∫_{∪_{j=1}^k Ij} |f| dm < ε,

since m(∪_{j=1}^k Ij) ≤ Σ_{j=1}^k (bj − aj) < δ and (4.2) holds. Thus, F is a.c.
Next consider the "only if part." It is not difficult to verify (Problem 4.18) that F a.c. implies F is of bounded variation on any finite interval [a, b] and both the positive and the negative variations of F on [a, b] are a.c. as well. Hence, it suffices to establish (4.1) assuming that F is a.c. and nondecreasing. Let µF be the Lebesgue-Stieltjes measure generated by F̃ as in Definition 4.4.2. It will now be shown that µF is absolutely continuous w.r.t. the Lebesgue measure. Fix ε > 0. Let δ > 0 be chosen so that for (aj, bj) ⊂ [a, b], j = 1, 2, . . . , k,

  Σ_{j=1}^k (bj − aj) < δ ⇒ Σ_{j=1}^k |F(bj) − F(aj)| < ε.
Let A ∈ Mm, A ⊂ (a, b), and m(A) = 0. Then, there exists a countable collection of disjoint open intervals {Ij = (aj, bj) : Ij ⊂ [a, b]}j≥1 such that

  A ⊂ ∪_{j≥1} Ij   and   Σ_{j≥1} (bj − aj) < δ.

Thus

  µF(A ∩ ∪_{j=1}^k Ij) ≤ µF(∪_{j=1}^k Ij) ≤ Σ_{j=1}^k µF(Ij) = Σ_{j=1}^k (F(bj) − F(aj)) < ε

for all k ∈ N. Since A ⊂ ∪_{j≥1} Ij, by the m.c.f.b. property of µF(·), µF(A) = lim_{k→∞} µF(A ∩ ∪_{j=1}^k Ij) ≤ ε. This being true for any ε > 0, it follows that µF(A) = 0. Since F is continuous, µF({a, b}) = 0 and hence µF((a, b)^c) = 0. Thus, µF ≪ m, i.e., µF is dominated by m. Now, by the Radon-Nikodym theorem (cf. Theorem 4.1.1 (ii)), there exists a nonnegative measurable function f such that A ∈ Mm implies µF(A) = ∫_{A∩[a,b]} f dm and, in particular, for a ≤ x ≤ b,

  µF([a, x]) = F(x) − F(a) = ∫_{[a,x]} f dm,

i.e., (4.1) holds. □
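The representation (4.1) can be illustrated numerically. The sketch below is illustrative only (the choice f(t) = 1/(2√t), which is integrable on [0, 1] but unbounded near 0, is not from the text): it recovers F(x) = √x from a Riemann sum of f and then shows the defining behavior of absolute continuity on a union of very short intervals near 0.

    # Illustrative sketch of (4.1): f(t) = 1/(2*sqrt(t)) is integrable on [0,1] though
    # unbounded near 0, and F(x) = integral over [0,x] of f dm equals sqrt(x).
    import math

    def f(t):
        return 0.5 / math.sqrt(t)

    def F_numeric(x, n=200000):
        # midpoint Riemann sum approximating the integral of f over [0, x]
        h = x / n
        return sum(f((i + 0.5) * h) for i in range(n)) * h

    x = 0.49
    print(F_numeric(x), math.sqrt(x))       # the two values agree to a few decimal places

    # Absolute continuity in action: short disjoint intervals near 0 have small total
    # F-increment, even though the difference quotients of F are unbounded near 0.
    intervals = [(k * 1e-4, k * 1e-4 + 1e-7) for k in range(1, 51)]
    print(sum(b - a for a, b in intervals),
          sum(math.sqrt(b) - math.sqrt(a) for a, b in intervals))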
The representation (4.1) of an absolutely continuous F can be strengthened as follows:
Theorem 4.4.2: Let F : R → R satisfy (4.1). Then F is differentiable a.e. (m) and F′(·) = f(·) a.e. (m).
For a proof of this result, see Royden (1988), Chapter 4.
The relation between the notion of absolute continuity of a distribution
function F : R → R and that of the associated Lebesgue-Stieltjes measure
µF w.r.t. Lebesgue measure m will be discussed now.
Let F : R → R be a distribution function, i.e., F is nondecreasing and right continuous. Let µF be the associated Lebesgue-Stieltjes measure such that µF((a, b]) = F(b) − F(a) for all −∞ < a < b < ∞. Recall that F is said to be absolutely continuous on an interval [a, b] if for each ε > 0, there exists a δ > 0 such that for any finite collection of intervals Ij = (aj, bj), j = 1, 2, . . . , n, contained in [a, b],

  Σ_{j=1}^n (bj − aj) < δ ⇒ Σ_{j=1}^n (F(bj) − F(aj)) < ε.
Recall also that µF is absolutely continuous w.r.t. the Lebesgue measure m
if for A ∈ B(R), m(A) = 0 ⇒ µF(A) = 0. A natural question is whether, if F is absolutely continuous on every interval [a, b] ⊂ R, µF is absolutely continuous w.r.t. m, and conversely. The answer is yes.
Theorem 4.4.3: Let F : R → R be a nondecreasing function and let µF be the associated Lebesgue-Stieltjes measure. Then F is absolutely continuous on [a, b] for all −∞ < a < b < ∞ iff µF ≪ m, where m is the Lebesgue measure on (R, B(R)).

Proof: Suppose that µF ≪ m. Then by Theorem 4.1.1, there exists a nonnegative measurable function h such that

  µF(A) = ∫_A h dm   for all A in B(R).

Hence, for any a < b in R and a < x < b,

  F(x) − F(a) ≡ µF((a, x]) = ∫_{(a,x]} h dm.

This implies the absolute continuity of F on [a, b] as shown in Theorem 4.4.1.
Conversely, if F is absolutely continuous on [a, b] for all −∞ < a < b < ∞, then as shown in the proof of the "only if" part of Theorem 4.4.1, µF(A ∩ [a, b]) = 0 if m(A ∩ [a, b]) = 0. Thus, if m(A) = 0, then for all −∞ < a < b < ∞, m(A ∩ [a, b]) = 0 and hence µF(A ∩ [a, b]) = 0, and hence µF(A) = 0, i.e., µF ≪ m. □
Recall that a measure µ on (Rk , B(Rk )) is a Radon measure if µ(A) <
∞ for every bounded Borel set A. In the following, let m(·) denote the
Lebesgue measure on Rk .
Definition 4.4.3: A Radon measure µ on (R^k, B(R^k)) is differentiable at x ∈ R^k with derivative (Dµ)(x) if for any ε > 0, there is a δ > 0 such that

  | µ(A)/m(A) − (Dµ)(x) | < ε

for every open ball A such that x ∈ A and diam(A) ≡ sup{‖y − z‖ : y, z ∈ A}, the diameter of A, is less than δ.
Theorem 4.4.4: Let µ be a Radon measure on (R^k, B(R^k)). Then

(i) µ is differentiable a.e. (m), Dµ(·) is Lebesgue measurable and ≥ 0 a.e. (m), and for all bounded Borel sets A ∈ B(R^k),

  ∫_A Dµ(·) dm ≤ µ(A).

(ii) Let µa(A) ≡ ∫_A Dµ(·) dm, A ∈ B(R^k). Let µs(·) be the unique measure on B(R^k) such that for all bounded Borel sets A,

  µs(A) = µ(A) − µa(A).

Then

  µs ⊥ m   and   Dµs(·) = 0   a.e. (m).
For a proof, see Rudin (1987).
Remark 4.4.1: By the uniqueness of the Lebesgue decomposition, it follows that a Radon measure µ on B(R^k) is ⊥ m iff Dµ(·) = 0 a.e. (m) and is ≪ m iff µ(A) = ∫_A Dµ(·) dm for all A ∈ B(R^k).

Let f : R^k → R+ be integrable w.r.t. m on bounded sets. Let µ(A) ≡ ∫_A f dm for A ∈ B(R^k). Then µ(·) is a Radon measure that is ≪ m, and hence by Theorem 4.4.4,

  Dµ(x) = f(x)   for almost all x (m).

That is, for almost all x (m), for each ε > 0, there is a δ > 0 such that

  | (1/m(A)) ∫_A f dm − f(x) | < ε

for all open balls A such that x ∈ A and diam(A) < δ.
It turns out that a stronger result holds.
Theorem 4.4.5: For almost all x (m), for each ε > 0, there is a δ > 0 such that

  (1/m(A)) ∫_A |f − f(x)| dm < ε

for all open balls A such that x ∈ A and diam(A) < δ (see Problems 4.23, 4.24).
Theorem 4.4.6: (Change of variables formula in R^k, k > 1). Let V be an open set in R^k. Let T ≡ (T1, T2, . . . , Tk) be a map from R^k → R^k such that for each i, Ti : R^k → R and ∂Ti(·)/∂xj exists on V for all 1 ≤ i, j ≤ k. Suppose that the Jacobian JT(·) ≡ det((∂Ti(·)/∂xj)) is continuous and positive on V. Suppose further that T(V) is a bounded open set W in R^k and that T is (1 − 1) and T^{−1} is continuous. Then

(i) For every Borel set E ⊂ V, T(E) is a Borel set ⊂ W.

(ii) ν(·) ≡ m(T(·)) is a measure on B(V) and ν ≪ m with

  dν(·)/dm = JT(·).

(iii) For any h ∈ L1(W, m),

  ∫_W h dm = ∫_V h(T(·)) JT(·) dm.

(iv) λ(·) ≡ mT^{−1}(·) is a measure on B(W) and λ ≪ m with

  dλ/dm = |JT(T^{−1}(·))|^{−1}.

(v) For any µ ≪ m on B(V), the measure ψ(·) ≡ µT^{−1}(·) is dominated by m with

  dψ/dm (·) = (dµ/dm)(T^{−1}(·)) · [JT(T^{−1}(·))]^{−1}   on W.

For a proof see Rudin (1987), Chapter 7.
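Part (iii) of Theorem 4.4.6 is easy to sanity-check numerically on a simple map. The sketch below is illustrative only (the map T, the test function h, and the use of numpy are assumptions, not from the text): with k = 2, T(u, v) = (u², v) on V = (1, 2) × (0, 1), one has JT(u, v) = 2u > 0 and W = T(V) = (1, 4) × (0, 1).

    # Illustrative check of the change of variables formula (Theorem 4.4.6 (iii)).
    import numpy as np

    def h(x, y):
        return np.exp(-x) * (1.0 + y)      # an integrable test function on W

    n = 1000
    # integral of h over W = (1,4) x (0,1), midpoint Riemann sum
    xs = np.linspace(1, 4, n, endpoint=False) + 3.0 / (2 * n)
    ys = np.linspace(0, 1, n, endpoint=False) + 1.0 / (2 * n)
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    lhs = h(X, Y).sum() * (3.0 / n) * (1.0 / n)

    # integral of h(T(u,v)) * J_T(u,v) over V = (1,2) x (0,1)
    us = np.linspace(1, 2, n, endpoint=False) + 1.0 / (2 * n)
    U, V2 = np.meshgrid(us, ys, indexing="ij")
    rhs = (h(U**2, V2) * 2 * U).sum() * (1.0 / n) * (1.0 / n)

    print(lhs, rhs)    # the two Riemann sums agree closely, as (iii) predicts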
4.5 Singular distributions
4.5.1 Decomposition of a cdf

Recall that a cumulative distribution function (cdf) on R is a function F : R → [0, 1] that is nondecreasing and right continuous, with F(−∞) = 0 and F(∞) = 1. In Section 2.2, it was shown that any cdf F on R can be written as

  F = αFd + (1 − α)Fc,   (5.1)

where Fd and Fc are discrete and continuous cdfs, respectively. In this section, the cdf Fc will be further decomposed into singular continuous and absolutely continuous cdfs.
Definition 4.5.1: A cdf F is singular if F′ ≡ 0 almost everywhere w.r.t. the Lebesgue measure on R.
Example 4.5.1: The cdfs of Binomial, Poisson, or any integer valued random variables are singular.
It is known (cf. Royden (1988), Chapter 5) that a monotone function F : R → R is differentiable almost everywhere w.r.t. the Lebesgue measure and its derivative F′ satisfies

  ∫_a^b F′(x) dx ≤ F(b) − F(a)   (5.2)

for any −∞ < a < b < ∞.
For x ∈ R, let F̃ac(x) ≡ ∫_{−∞}^x Fc′(t) dt and F̃sc(x) ≡ Fc(x) − F̃ac(x). If β̃ ≡ ∫_{−∞}^∞ Fc′(t) dt = F̃ac(∞) = 0, then Fc′(t) = 0 a.e. and so Fc is singular continuous. If β̃ = 1, then Fc = F̃ac and so Fc is absolutely continuous. If 0 < α < 1 and 0 < β̃ < 1, then F can be written as

  F = αFd + βFac + γFsc,   (5.3)
where β = (1 − α)β̃, γ = (1 − α)(1 − β̃), Fac = β̃ −1 F̃ac , Fsc = (1 − β̃)−1 F̃sc ,
and Fd is as in (5.1). Note that Fd , Fac , Fsc are all cdfs and α, β, γ are
nonnegative numbers adding up to 1. Summarizing the above discussions,
one has the following:
Proposition 4.5.1: Given any cdf F , there exist nonnegative constants α,
β, γ and cdfs Fd , Fac , Fsc satisfying (a) α + β + γ = 1, and (b) Fd is
discrete, Fac is absolutely continuous, Fsc is singular continuous, such that
the decomposition (5.3) holds.
It can be shown that the constants α, β, and γ are uniquely determined,
and that when 0 < α < 1, the decomposition (5.1) is unique, and that when
0 < α, β, γ < 1, the decomposition (5.3) is unique. The decomposition
(5.3) also has a probabilistic interpretation. Any random variable X can
be realized as a randomized choice over three random variables Xd , Xac ,
and Xsc having cdfs Fd , Fac , and Fsc , respectively, and with corresponding
randomization probabilities α, β, and γ. For more details see Problem 6.15
in Chapter 6.
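This randomized-choice interpretation is easy to simulate. The sketch below is illustrative only (the weights and the three component distributions, including the use of the Cantor distribution of Section 4.5.3 as a singular continuous part, are invented for the example and the series for the singular part is truncated).

    # Illustrative sketch of the randomized choice behind F = alpha*Fd + beta*Fac + gamma*Fsc.
    import random

    alpha, beta, gamma = 0.2, 0.5, 0.3        # arbitrary weights summing to 1

    def sample_discrete():                    # Fd: point mass at 0
        return 0.0

    def sample_ac():                          # Fac: Uniform(0, 1)
        return random.random()

    def sample_sc(n_digits=40):               # Fsc: Cantor distribution, X = sum of 2*delta_i/3^i
        return sum(2 * random.randint(0, 1) / 3.0**i for i in range(1, n_digits + 1))

    def sample_X():
        u = random.random()
        if u < alpha:
            return sample_discrete()
        elif u < alpha + beta:
            return sample_ac()
        return sample_sc()

    xs = [sample_X() for _ in range(10000)]
    print("fraction exactly 0 (discrete part):", sum(x == 0.0 for x in xs) / len(xs))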
4.5.2 Cantor ternary set
Recall the construction of the Cantor set from Section 1.3.
Let I0 = [0, 1] denote the unit interval. If one deletes the open middle third of I0, then one gets two disjoint closed intervals I11 = [0, 1/3] and I12 = [2/3, 1]. Proceeding similarly with the closed intervals I11 and I12, one gets four disjoint intervals I21 = [0, 1/9], I22 = [2/9, 1/3], I23 = [2/3, 7/9], I24 = [8/9, 1], and so on. Thus, at each step, deleting the open middle third of the closed intervals constructed in the previous step, one is left with 2^n disjoint closed intervals, each of length 3^{−n}, after n steps. Let Cn = ∪_{j=1}^{2^n} Inj and C = ∩_{n=1}^∞ Cn. By construction, Cn+1 ⊂ Cn for each n and the Cn's are closed sets. With m(·) denoting Lebesgue measure, one has m(C0) = 1 and m(Cn) = 2^n 3^{−n} = (2/3)^n.

Definition 4.5.2: The set C ≡ ∩_{n=1}^∞ Cn is called the Cantor ternary set or simply the Cantor set.

Since m(C0) = 1, by m.c.f.a., m(C) = lim_{n→∞} m(Cn) = lim_{n→∞} (2/3)^n = 0. Thus, the Cantor set C has zero Lebesgue measure. Next, let U1 = U11 = (1/3, 2/3) be the deleted interval at the first stage, U2 = U21 ∪ U22 = (1/9, 2/9) ∪ (7/9, 8/9) be the union of the deleted intervals at the second stage, and similarly, Un = ∪_{j=1}^{2^{n−1}} Unj at stage n. Thus C^c = U = ∪_{n=1}^∞ ∪_{j=1}^{2^{n−1}} Unj is open and m(C^c) = 1. Since C ∪ C^c = [0, 1] and C^c is open, it follows
that C is nonempty. In fact, C is uncountably infinite as will be shown
now. To do this, one needs the concept of p-nary expansion of numbers in [0,1]. Fix a positive integer p > 1. For each x in [0,1), let a1(x) = ⌊px⌋, where ⌊t⌋ = n if n ≤ t < n + 1. Thus a1(x) ≤ px < a1(x) + 1 and a1(x) ∈ {0, 1, . . . , p − 1}, i.e., a1(x)/p ≤ x < a1(x)/p + 1/p. Thus, if k/p ≤ x < (k + 1)/p for some k = 0, 1, 2, . . . , p − 1, then a1(x) = k. Next, let x1 ≡ x − a1(x)/p and a2(x) = ⌊p²x1⌋. Then, x1 ∈ [0, 1/p) and

  a2(x)/p² ≤ x1 = x − a1(x)/p < a2(x)/p² + 1/p²

and a2(x) ∈ {0, 1, 2, . . . , p − 1}. Next, let 0 ≤ x2 ≡ x − a1(x)/p − a2(x)/p², a3(x) = ⌊p³x2⌋, and so on. After k such iterations one gets

  0 ≤ x − Σ_{i=1}^k ai(x)/p^i < 1/p^k,

where ai(x) ∈ {0, 1, 2, . . . , p − 1} for all i. Since 1/p^k → 0 as k → ∞, one gets the p-nary expansion of x in [0,1) as

  x = Σ_{i=1}^∞ ai(x)/p^i,   ai(x) ∈ {0, 1, 2, . . . , p − 1}.   (5.4)
Notice that if x = k/p for some k ∈ {0, 1, 2, . . . , p − 1}, then in the above expansion a1(x) = k and ai(x) = 0 for i ≥ 2, and the expansion terminates. But since Σ_{i=1}^∞ (p − 1)/p^i = 1, one may also write x = k/p = (k − 1)/p + Σ_{i=2}^∞ (p − 1)/p^i, this being an expansion which is nonterminating and recurring. It can be shown that for all x in [0,1) of the form x = ℓ/p^m for some positive integers ℓ and m, there are exactly two expansions such that one terminates and the other is nonterminating and recurring. For all other x in [0,1), the p-nary expansion is nonterminating and nonrecurring and is unique.

The decimal expansion corresponds with the case p = 10, the binary expansion with the case p = 2, and the ternary expansion with the case p = 3. Here the convention of choosing only a nonterminating expansion for each x in [0,1) is used. Thus, for example, for p = 3, 1/3 will be replaced by Σ_{i=2}^∞ 2/3^i, so that a1(1/3) = 0, ai(1/3) = 2 for i ≥ 2. Similarly, for p = 3, x = 7/9 = 2/3 + 1/9 will be replaced by 2/3 + 0/9 + Σ_{i=3}^∞ 2/3^i, so that a1(7/9) = 2, a2(7/9) = 0, ai(7/9) = 2 for i ≥ 3. By taking p = 2, i.e., the binary expansion, it is seen that every x in [0,1) can be uniquely represented as x = Σ_{i=1}^∞ δi(x)/2^i, where δi(x) ∈ {0, 1} for all i. Thus, the interval [0,1) is in one-to-one correspondence with the set of all sequences of 0's and 1's.
It is not difficult to prove the following (Problem 4.31).
Theorem 4.5.2: A number x belongs to the Cantor set C iff in (5.4), for
all i ≥ 1, ai (x) is either 0 or 2.
Corollary 4.5.3: The Cantor set C is in one-to-one correspondence with
the set of all sequences of 0’s and 1’s and hence is in one-to-one correspondence with the unit interval [0,1].
Remark 4.5.1: Thus the Cantor ternary set C is a closed subset of [0,1],
its Lebesgue measure m(C) = 0, and its cardinality is the same as that of
[0,1]. Further, it is nowhere dense, i.e., its complement U is dense in the
sense that for every open interval (a, b) ⊂ [0, 1], U ∩ (a, b) is nonempty.
It is also possible to get a Cantor-like set Cα with (Lebesgue) measure α, 0 < α < 1, by following the above iterative procedure, deleting at each stage intervals of length that is a fraction (1 − α)/3 of the full interval (Problem 4.32).
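Theorem 4.5.2 also gives a practical membership test: x lies in the Cantor set iff some ternary expansion of x uses only the digits 0 and 2. The sketch below is illustrative only (the greedy digit-choosing algorithm and the finite digit cutoff are assumptions made for the example); after max_digits undecided steps the point is within 3^{−max_digits} of C.

    # Illustrative (finite-precision) membership test for the Cantor set via ternary digits.
    from fractions import Fraction

    def in_cantor(x, max_digits=60):
        """Does x in [0,1] admit a ternary expansion with digits in {0, 2}?"""
        r = Fraction(x)
        for _ in range(max_digits):
            if r == 0 or r == 1:
                return True                   # remaining digits can all be 0 (or all 2)
            if r < Fraction(1, 3):
                r *= 3                        # choose digit 0
            elif r > Fraction(2, 3):
                r = 3 * r - 2                 # choose digit 2
            elif r == Fraction(1, 3):
                return True                   # 1/3 = 0.0222... in base 3
            elif r == Fraction(2, 3):
                return True                   # 2/3 = 0.2000... in base 3
            else:
                return False                  # forced digit 1: x lies in a deleted middle third
        return True                           # undecided after max_digits steps

    print(in_cantor(Fraction(1, 3)))   # True
    print(in_cantor(Fraction(1, 2)))   # False (1/2 = 0.111... in base 3)
    print(in_cantor(Fraction(1, 4)))   # True  (1/4 = 0.020202... in base 3)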
4.5.3 Cantor ternary function
The Cantor ternary function F : [0, 1] → [0, 1] is defined as follows: For n ≥ 1, let {Unj : j = 1, . . . , 2^{n−1}} denote the set of "deleted" intervals at step n in the definition of the Cantor set C. Define F on C^c = U by

  F(x) = 1/2 on U11 = (1/3, 2/3),
  F(x) = 1/4 on U21 = (1/9, 2/9),
  F(x) = 3/4 on U22 = (7/9, 8/9),
and so on. It can be checked that F is uniformly continuous on U and has a continuous extension to I0 = [0, 1]. The extension of the function F (also denoted by F) maps [0,1] onto [0,1] and is continuous and nondecreasing. Further, on U, it is differentiable with derivative F′ ≡ 0. Since m(U^c) = 0, F is a singular cdf (cf. Definition 4.5.1). It can be shown that if Σ_{i=1}^∞ ai(x)/3^i is the ternary expansion of x ∈ (0, 1), then

  F(x) = Σ_{i=1}^{N(x)−1} ai(x)/2^{i+1} + 1/2^{N(x)},   (5.5)

where N(x) = inf{i : i ≥ 1, ai(x) = 1}. For example, x = 1/3 = Σ_{i=2}^∞ 2/3^i ⇒ N(x) = ∞, a1(x) = 0, ai(x) = 2 for i ≥ 2 ⇒ F(x) = Σ_{i=2}^∞ 1/2^i = 1/2, while x = 4/9 = 1/3 + 0/3² + Σ_{i=3}^∞ 2/3^i ⇒ N(x) = 1 ⇒ F(x) = 1/2.

It can be shown that if {δn}n≥1 is a sequence of independent {0, 1}-valued random variables with P(δ1 = 0) = 1/2 = P(δ1 = 1), then the above F is the cdf of the random variable X = Σ_{i=1}^∞ 2δi/3^i, which lies in the Cantor set w.p. 1.
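Formula (5.5) can be evaluated directly from the ternary digits: each digit 2 before the first digit 1 contributes 1/2^i, and the first digit 1 contributes 1/2^i and stops the sum. The sketch below is illustrative only (the digit-extraction loop and the finite digit cutoff are assumptions; x should lie in [0, 1), and n digits give an error of at most 2^{−n}).

    # Illustrative evaluation of the Cantor ternary function via formula (5.5).
    from fractions import Fraction

    def cantor_F(x, n=50):
        r = Fraction(x)
        value = Fraction(0)
        for i in range(1, n + 1):
            r *= 3
            a = int(r)                         # i-th ternary digit a_i(x), in {0, 1, 2}
            r -= a
            if a == 1:
                # N(x) = i in (5.5): F is constant on the deleted interval containing x
                return value + Fraction(1, 2**i)
            value += Fraction(a // 2, 2**i)    # digit 2 contributes 1/2^i, digit 0 nothing
        return value

    print(cantor_F(Fraction(1, 3)))   # 1/2, as in the example following (5.5)
    print(cantor_F(Fraction(4, 9)))   # 1/2
    print(cantor_F(Fraction(1, 4)))   # approximately 1/3, since 1/4 = 0.020202... in base 3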
4.6 Problems
4.1 Show that in the proof of Theorem 4.1.1, µ(A3) = 0 where A3 = {g ∉ [0, 1]} and g satisfies (1.9).
(Hint: Apply (1.9) separately to A31 = {g > 1} and A32 = {g < 0}.)
4.2 Verify that µ1s, defined in (1.14), and µ2 are singular.
(Hint: For each n, by Case 1, there exists a gn in L2(Dn, Fn, µ^(n)), where µ^(n) = µ1^(n) + µ2^(n) and Fn ≡ {A ∩ Dn : A ∈ F}, such that 0 ≤ ∫_A gn dµ^(n) ≤ µ^(n)(A) for all A in Fn. Let A1n = {w : w ∈ Dn, 0 ≤ gn(w) < 1}, A2n = {w : w ∈ Dn, gn(w) = 1}, and A2 = ∪_{n≥1} A2n. Show that µ2(A2) = 0 and µ1s(A) = Σ_{n=1}^∞ µ1n(A ∩ A2n) and hence µ1s(A2^c) = 0.)
4.3 Let ν be the counting measure on [0, 1] and suppose ∫_{[0,1]} h dν < ∞ for some nonnegative function h. Show that B = {x : h(x) > 0} is countable.
(Hint: Let Bn = {x : h(x) > 1/n}. Show that Bn is a finite set for each n ∈ N.)
4.4 Prove Proposition 4.1.2.
4.5 Find the Lebesgue decomposition of µ w.r.t. ν and the Radon-Nikodym derivative dµa/dν in the following cases, where µa is the absolutely continuous component of µ w.r.t. ν.
(a) µ = N (0, 1), ν = Exponential(1)
(b) µ = Exponential(1), ν = N (0, 1)
(c) µ = µ1 + µ2 , where µ1 = N (0, 1), µ2 = Poisson(1) and ν =
Cauchy(0, 1).
(d) µ = µ1 + µ2 , ν = Geometric(p), 0 < p < 1, where µ1 = N (0, 1)
and µ2 = Poisson(1).
(e) µ = µ1 + µ2 , ν = ν1 + ν2 where µ1 = N (0, 1), µ2 = Poisson(1),
ν1 = Cauchy(0, 1) and ν2 = Geometric(p), 0 < p < 1.
(f) µ = Binomial (10, 1/2), ν = Poisson (1).
The measures referred to above are defined in Tables 4.6.1 and 4.6.2,
given at the end of this section.
4.6 Let (Ω, F, µ) be a measure space and f ∈ L1(Ω, F, µ). Let

  νf(A) ≡ ∫_A f dµ   for all A ∈ F.

(a) Show that νf is a finite signed measure.
(b) Show that ‖νf‖ = ∫_Ω |f| dµ and for A ∈ F, νf+(A) = ∫_A f+ dµ, νf−(A) = ∫_A f− dµ, and |νf|(A) = ∫_A |f| dµ.
4.7 (a) Let µ1 and µ2 be two finite measures such that both are dominated by a σ-finite measure ν. Show that the total variation measure of the signed measure µ ≡ µ1 − µ2 is given by |µ|(A) = ∫_A |h1 − h2| dν, where for i = 1, 2, hi = dµi/dν.
(b) Conclude that if µ1 and µ2 are two measures on a countable set Ω ≡ {ωi}i≥1 with F ≡ P(Ω), then |µ|(A) = Σ_{ωi ∈ A} |µ1({ωi}) − µ2({ωi})|.
(c) Show that if µn is the Binomial (n, pn) measure and µ is the Poisson (λ) measure, 0 < λ < ∞, then as n → ∞, |µn − µ|(·) → 0 uniformly on P(Z+) iff npn → λ.
(Hint: Show that for each i ∈ Z+ ≡ {0, 1, 2, . . .}, µn({i}) → µ({i}) and use Scheffe's theorem.)
4.8 Let ν be a finite signed measure on a measurable space (Ω, F) and let |ν| be the total variation measure corresponding to ν. Show that for any B ∈ F with B ⊂ Ω+,

  |ν|(B) = ν(B),

where Ω+ is as defined in (2.13).
(Hint: For any set A ⊂ Ω+, A ∈ F,

  ν(A) = ∫_A h d|ν| = ∫_{A∩Ω+} h d|ν| ≥ 0.)
4.9 Show that the Banach space S of finite signed measures on (N, P(N)) is isomorphic to ℓ1, the Banach space of absolutely convergent sequences {xn}n≥1 in R.
4.10 Let µ1 and µ2 be two probability measures on (Ω, F).
(a) Show that

  ‖µ1 − µ2‖ = 2 sup{|µ1(A) − µ2(A)| : A ∈ F}.

(Hint: For any A ∈ F, {A, A^c} is a partition of Ω and so ‖µ1 − µ2‖ ≥ |µ1(A) − µ2(A)| + |µ1(A^c) − µ2(A^c)| = 2|µ1(A) − µ2(A)|, since µ1 and µ2 are probability measures. For the opposite inequality, use the Hahn decomposition of Ω w.r.t. µ1 − µ2 and the fact that ‖µ1 − µ2‖ = |(µ1 − µ2)(Ω+)| + |(µ1 − µ2)(Ω−)|.)
(b) Show that ‖µ1 − µ2‖ is also equal to

  sup { |∫ f dµ1 − ∫ f dµ2| : f ∈ B(Ω, R) },

where B(Ω, R) is the collection of all F-measurable functions from Ω to R such that sup{|f(ω)| : ω ∈ Ω} ≤ 1.
4.11 Let (Ω, F) be a measurable space.
(a) Let {µn}n≥1 be a sequence of finite measures on (Ω, F). Show that there exists a probability measure λ such that µn ≪ λ for all n ≥ 1.
(Hint: Consider λ(·) = Σ_{n=1}^∞ 2^{−n} µn(·)/µn(Ω).)
(b) Extend (a) to the case where the {µn}n≥1 are σ-finite.
(c) Conclude that for any sequence {νn}n≥1 of finite signed measures on (Ω, F), there exists a probability measure λ such that |νn| ≪ λ for all n ≥ 1.
4.12 Let {µn}n≥1 be a sequence of finite measures on a measurable space (Ω, F). Show that there exists a finite measure µ on (Ω, F) such that ‖µn − µ‖ → 0 iff there is a finite measure λ dominating µ and µn, n ≥ 1, such that the Radon-Nikodym derivatives fn ≡ dµn/dλ → f ≡ dµ/dλ in measure on (Ω, F, λ) and µn(Ω) → µ(Ω).
4.13 (a) Let µ1 and µ2 be two finite measures on (Ω, F). Let µ1 = µ1a + µ1s be the Lebesgue-Radon-Nikodym decomposition of µ1 w.r.t. µ2 as in Theorem 4.1.1. Show that if µ = µ1 − µ2, then for all A ∈ F,

  |µ|(A) = ∫_A |h − 1| dµ2 + µ1s(A),   where h = dµ1a/dµ2

is the Radon-Nikodym derivative of µ1a w.r.t. µ2. Conclude that if µ1 ⊥ µ2, then |µ|(·) = µ1(·) + µ2(·), and if µ1 ≪ µ2, then |µ|(A) = ∫_A |dµ1/dµ2 − 1| dµ2.
(b) Compute |µ|(·) and ‖µ‖ if µ = µ1 − µ2 for the following cases:
(i) µ1 = N(0, 1), µ2 = N(1, 1)
(ii) µ1 = Cauchy (0,1), µ2 = N(0, 1)
(iii) µ1 = N(0, 1), µ2 = Poisson (λ).
(c) Establish Proposition 4.2.7.
4.14 Give another proof of the completeness of (S, ‖·‖) by verifying the following steps.
(a) For any sequence {νn}n≥1 in S, there is a finite measure λ and {fn}n≥1 ⊂ L1(Ω, F, λ) such that

  νn(A) = ∫_A fn dλ   for all A ∈ F, for all n ≥ 1.

(b) {νn}n≥1 Cauchy in S is the same as {fn}n≥1 Cauchy in L1(Ω, F, λ), and hence the completeness of (S, ‖·‖) follows from the completeness of L1(Ω, F, λ).
4.15 Let f, g ∈ BV [a, b].
(a) Show that P (f + g; [a, b]) ≤ P (f ; [a, b]) + P (g; [a, b]) and that
the same is true for N (·; ·) and T (·; ·).
(b) Show that for any c ∈ R,
P (cf ; [a, b]) = |c|P (f ; [a, b])
and do the same for N (·; ·) and T (·; ·).
(c) For any a < c < b, P (f ; [a, b]) = P (f ; [a, c]) + P (f ; [c, b]).
4.16 Let {fn }n≥1 ⊂ BV [a, b] and let limn fn (x) = f (x) for all x in
[a, b]. Show that P (f ; [a, b]) ≤ limn→∞ P (fn ; [a, b]) and do the same
for N (·; ·) and T (·; ·).
4.17 Let f ∈ BV [a, b]. Show that f is continuous except on an at most
countable set.
4.18 Let F : [a, b] → R be a.c. Show that it is of bounded variation.
(Hint: By the definition of a.c., for ε = 1, there is a δ1 > 0 such that Σ_{j=1}^k |aj − bj| < δ1 ⇒ Σ_{j=1}^k |F(aj) − F(bj)| < 1. Let M be an integer > (b − a)/δ1 + 1. Show that T(F, [a, b]) ≤ M.)
4.19 Let F be an absolutely continuous nondecreasing function on R. Let µF be the Lebesgue-Stieltjes measure corresponding to F. Show that for any h ∈ L1(R, MµF, µF),

  ∫ h dµF = ∫ h f dm,

where f is a nonnegative measurable function such that F(b) − F(a) = ∫_{[a,b]} f dm for any a < b.
4.20 Let F : [a, b] → R be absolutely continuous with F′(·) > 0 a.e. on [a, b], where −∞ < a < b < ∞. Let F(a) = c and F(b) = d. Let m(·) denote the Lebesgue measure on R. Show the following:
(a) (Change of variables formula). For any g : [c, d] → R that is Lebesgue measurable and integrable w.r.t. m,

  ∫_{[c,d]} g dm = ∫_{[a,b]} g(F) F′ dm.

(b) For any Borel set E ⊂ [a, b], F(E) is also a Borel set.
(c) ν(·) ≡ m(F(·)) is a measure on B([a, b]) and ν ≪ m with dν/dm (·) = F′(·).
(d) λ(·) ≡ mF^{−1}(·) is a measure on B([c, d]) and λ ≪ m with dλ/dm (·) = [F′(F^{−1}(·))]^{−1}.
(e) For any measure µ ≪ m on B([a, b]), the measure ψ(·) ≡ µF^{−1}(·) is dominated by m with

  dψ/dm = (dµ/dm)(F^{−1}(·)) · [F′(F^{−1}(·))]^{−1}.

(f) Establish (a) assuming that g and F′ are both continuous, noting that both integrals reduce to Riemann integrals.
(Hint:
(i) Verify (a) for g = I_{[α,β]}, c < α < β < d, and approximate by step functions.
(ii) Show that F is (1 − 1) and F^{−1}(·) is continuous and hence Borel measurable.
(iii) Show that ν(·) = µF, the Lebesgue-Stieltjes measure corresponding to F.
(iv) Use the fact that for any c ≤ α ≤ β ≤ d,

  ψ([α, β]) = µ([γ, δ]), where γ = F^{−1}(α), δ = F^{−1}(β),
            = ∫_{[γ,δ]} g dm, where g = dµ/dm,
            = ∫_{[γ,δ]} [g(F^{−1}(F(·))) / F′(F^{−1}(F(·)))] F′(·) dm
            = ∫_{[α,β]} [g(F^{−1}(·)) / F′(F^{−1}(·))] dm   by (a). )
4.21 Let F : R → R be absolutely continuous on every finite interval.
(a) Show that the f in (4.1) can be chosen independently of the interval [a, b].
(b) Further, if f is integrable over R, then lim_{x→−∞} F(x) ≡ F(−∞) and lim_{x→∞} F(x) ≡ F(∞) exist and F(x) = F(−∞) + ∫_{(−∞,x)} f dµL for all x in R.
(c) Give an example where F : R → R is a.c., but f is not integrable over R.
4.22 Let F : R → R be absolutely continuous on bounded intervals. Let {Ij : 1 ≤ j ≤ k ≤ ∞} be a collection of disjoint intervals such that ∪_{j=1}^k Ij ≡ R and on each Ij, F′(·) > 0 a.e. or F′(·) < 0 a.e. w.r.t. m.
(a) Show that for any h ∈ L1(R, m),

  ∫_R h dm = ∫_R h(F(·)) |F′(·)| dm.

(b) Show that if µ is a measure on (R, B(R)) dominated by m, then the measure µF^{−1}(·) is also dominated by m and

  dµF^{−1}/dm (y) = Σ_{xj ∈ D(y)} f(xj)/|F′(xj)|,

where f(·) = dµ/dm and D(y) = {xj : xj ∈ Ij, F(xj) = y}.
(c) Let µ be the N(0, 1) measure on (R, B(R)), i.e.,

  dµ/dm (x) = (1/√(2π)) e^{−x²/2},   −∞ < x < ∞.

Let F(x) = x². Find dµF^{−1}/dm.
4.23 Let f : R → R be integrable w.r.t. m on bounded intervals. Show that for almost all x0 in R (w.r.t. m),

  lim_{a↑x0, b↓x0} (1/(b − a)) ∫_a^b |f(x) − f(x0)| dx = 0.

(Hint: For each rational r, by Theorem 4.4.4,

  lim_{a↑x0, b↓x0} (1/(b − a)) ∫_a^b |f(x) − r| dx = |f(x0) − r|   a.e. (m).

Let Ar denote the set of x0 for which this fails to hold. Let A = ∪_{r∈Q} Ar. Then m(A) = 0. For any x0 ∉ A and any ε > 0, choose a rational r such that |f(x0) − r| < ε and now show that

  lim_{a↑x0, b↓x0} (1/(b − a)) ∫_a^b |f(x) − f(x0)| dx < ε. )
4.24 Use the hint to the above problem to establish Theorem 4.4.5.
4.25 Let (Ω, B) be a measurable space. Let {µn}n≥1 and µ be σ-finite measures on (Ω, B). For each n ≥ 1, let µn = µna + µns be the Lebesgue decomposition of µn w.r.t. µ, with µna ≪ µ and µns ⊥ µ. Let λ = Σ_{n≥1} µn, λa = Σ_{n≥1} µna, λs = Σ_{n≥1} µns. Show that λa ≪ µ and λs ⊥ µ and that λ = λa + λs is the Lebesgue decomposition of λ w.r.t. µ.

4.26 Let {µn}n≥1 be Radon measures on (R^k, B(R^k)) and m be the Lebesgue measure on R^k. Show that if λ = Σ_{n=1}^∞ µn is also a Radon measure, then

  Dλ = Σ_{n=1}^∞ Dµn   a.e. (m).

(Hint: Use Theorem 4.4.4 and the uniqueness of the Lebesgue decomposition.)
4.27 Let Fn, n ≥ 1 be a sequence of nondecreasing functions from R → R. Let F(x) = Σ_{n≥1} Fn(x) < ∞ for all x ∈ R. Show that F(·) is nondecreasing and

  F′(·) = Σ_{n≥1} Fn′(·)   a.e. (m).
4.28 Let E be a Lebesgue measurable set in R. The metric density of E at x is defined as

  DE(x) ≡ lim_{δ↓0} m(E ∩ (x − δ, x + δ))/(2δ),

if it exists. Show that DE(·) = IE(·) a.e. (m).
(Hint: Consider the measure λE(·) ≡ m(E ∩ ·) on the Lebesgue σ-algebra. Show that λE ≪ m and find λE′(·) (cf. Definition 4.4.2).)
4.29 Let F, G : [a, b] → R be both absolutely continuous. Show that H = FG is also absolutely continuous on [a, b] and that

  ∫_{[a,b]} F dG + ∫_{[a,b]} G dF = F(b)G(b) − F(a)G(a).
4.30 Let (Ω, F, µ) be a finite measure space. Fix 1 ≤ p < ∞. Let T : Lp(µ) → R be a bounded linear functional as defined in (3.2.10) (cf. Section 3.2). Complete the following outline of a proof of Theorem 3.2.3 (Riesz representation theorem).
(a) Let ν(A) ≡ T(IA), A ∈ F. Verify that ν(·) is a signed measure on (Ω, F).
(b) Verify that |ν| ≪ µ.
(c) Let g ≡ dν/dµ. Show that g ∈ Lq(µ) where q = p/(p − 1) if 1 < p < ∞ and q = ∞ if p = 1.
(d) Show that T = Tg.
4.31 Prove Theorem 4.5.2 and Corollary 4.5.3.
4.32 For 0 < α < 1, construct a Cantor like set Cα as described in Remark
4.5.1.
4.33 Show that the Cantor ternary function can be expressed as in (5.5).
4.34 Let (Ω, F, µ) be a σ-finite measure space.
(a) Let G be a σ-algebra ⊂ F. Let f : Ω → R+ be ⟨F, B(R)⟩-measurable. Show that there exists a g : Ω → R+ that is ⟨G, B(R)⟩-measurable and such that ν(A) ≡ ∫_A f dµ = ∫_A g dµ for all A in G.
(Hint: Apply Theorem 4.1.1 (b) to the measures ν and µ restricted to G. When µ is a probability measure, g is called the conditional expectation of f given G (cf. Chapter 12).)
(b) Now suppose G = σ⟨{Ai}i≥1⟩ where {Ai}i≥1 is a partition of Ω by sets in F. Determine g(·) explicitly on each Ai such that 0 < µ(Ai) < ∞.
TABLE 4.6.1. Some discrete univariate distributions.

  Measure µ                            µ(A), A ∈ B(R)
  Bernoulli (p), 0 < p < 1             Σ_{i=0}^1 p^i (1 − p)^{1−i} IA(i)
  Binomial (n, p), 0 < p < 1, n ∈ N    Σ_{i=0}^n C(n, i) p^i (1 − p)^{n−i} IA(i)
  Geometric (p), 0 < p < 1             Σ_{i=1}^∞ p(1 − p)^{i−1} IA(i)
  Poisson (λ), 0 < λ < ∞               Σ_{i=0}^∞ e^{−λ} (λ^i/i!) IA(i)
TABLE 4.6.2. Some standard absolutely continuous distributions. Here m denotes the Lebesgue measure on (R, B(R)).

  Measure µ                                   µ(A), A ∈ B(R)
  Uniform (a, b), −∞ < a < b < ∞              m(A ∩ [a, b])/(b − a)
  Exponential (β), β ∈ (0, ∞)                 ∫_A (1/β) exp(−x/β) I_(0,∞)(x) m(dx)
  Gamma (α, β), α, β ∈ (0, ∞)                 ∫_A (1/(Γ(α)β^α)) x^{α−1} exp(−x/β) I_(0,∞)(x) m(dx), where Γ(a) = ∫_0^∞ x^{a−1} e^{−x} dx, a ∈ (0, ∞)
  Beta (α, β), α, β ∈ (0, ∞)                  ∫_A (1/B(α, β)) x^{α−1} (1 − x)^{β−1} I_(0,1)(x) m(dx), where B(a, b) = Γ(a)Γ(b)/Γ(a + b), a, b ∈ (0, ∞)
  Cauchy (γ, σ), γ ∈ R, σ ∈ (0, ∞)            ∫_A (1/(πσ)) σ²/(σ² + (x − γ)²) m(dx)
  Normal (γ, σ²), γ ∈ R, σ ∈ (0, ∞)           ∫_A (1/(√(2π) σ)) exp(−(x − γ)²/(2σ²)) m(dx)
  Lognormal (γ, σ²), γ ∈ R, σ ∈ (0, ∞)        ∫_A (1/(√(2π) σ x)) e^{−(log x − γ)²/(2σ²)} I_(0,∞)(x) m(dx)
5
Product Measures, Convolutions, and
Transforms
5.1 Product spaces and product measures
Given two measure spaces (Ωi , Fi , µi ), i = 1, 2, is it possible to construct a
measure µ ≡ µ1 ×µ2 on a σ-algebra on the product space Ω1 ×Ω2 such that
µ(A × B) = µ1 (A)µ2 (B) for A ∈ F1 and B ∈ F2 ? This section is devoted
to studying this question.
Definition 5.1.1: Let (Ωi , Fi ), i = 1, 2 be measurable spaces.
(a) Ω1 × Ω2 ≡ {(ω1 , ω2 ) : ω1 ∈ Ω1 , ω2 ∈ Ω2 }, the set of all ordered pairs,
is called the (Cartesian) product of Ω1 and Ω2 .
(b) The set A1 × A2 with A1 ∈ F1 , A2 ∈ F2 is called a measurable rectangle. The collection of measurable rectangles will be denoted by C.
(c) The product σ-algebra of F1 and F2 on Ω1 × Ω2, denoted by F1 × F2, is the smallest σ-algebra generated by C, i.e.,

  F1 × F2 ≡ σ⟨{A1 × A2 : A1 ∈ F1, A2 ∈ F2}⟩.
(d) (Ω1 × Ω2 , F1 × F2 ) is called the product measurable space.
Starting with the definition of µ on the class C of measurable rectangles
by
µ(A1 × A2 ) = µ1 (A1 )µ2 (A2 )
(1.1)
for all A1 ∈ F1 , A2 ∈ F2 , one can extend it to a measure on the algebra
A of all finite unions of disjoint measurable rectangles in a natural way.
Indeed, the extension to A is obtained simply by assigning the µ-measure
of a finite union of disjoint measurable rectangles as the sum of the µ-measures of the corresponding individual measurable rectangles. Then, by
the extension theorem (cf. Theorem 1.3.3), it can be further extended to
a complete measure µ on a σ-algebra containing F1 × F2, defined in (c)
above. However, to evaluate µ(A) for an arbitrary A in F1 × F2 that is not a measurable rectangle and to evaluate ∫ h dµ for arbitrary measurable
functions h, some further work is needed. For the case of the product of
two measure spaces, here an alternate approach that yields a direct way of
computing these quantities is presented. The extension theorem approach
will be used for the more general case of products of finitely many measure
spaces in Section 5.3. The case of products of infinitely many probability
spaces will be discussed in Chapter 6.
Definition 5.1.2:
(a) Let A ∈ F1 × F2 . Then, for any ω1 ∈ Ω1 , the set
A1ω1 ≡ {ω2 ∈ Ω2 : (ω1 , ω2 ) ∈ A}
(1.2)
is called the ω1 -section of A and for any ω2 ∈ Ω2 , the set A2ω2 ≡
{ω1 ∈ Ω1 : (ω1 , ω2 ) ∈ A} is called the ω2 -section of A.
(b) If f : Ω1 × Ω2 → Ω3 is a ⟨F1 × F2, F3⟩-measurable mapping from Ω1 × Ω2 into some measurable space (Ω3, F3), then the ω1-section of f is the function f1ω1 : Ω2 → Ω3, given by

  f1ω1(ω2) = f(ω1, ω2),   ω2 ∈ Ω2.   (1.3)
Similarly, one may define the ω2-section of f by f2ω2(ω1) = f(ω1, ω2), ω1 ∈ Ω1. The following result shows that the ω1-section of an F1 × F2-measurable function is F2-measurable when considered as a function of ω2 ∈ Ω2.
Proposition 5.1.1: Let (Ω1 × Ω2, F1 × F2) be a product space, A ∈ F1 × F2 and let f : Ω1 × Ω2 → Ω3 be a ⟨F1 × F2, F3⟩-measurable function.

(i) For every ω1 ∈ Ω1, A1ω1 ∈ F2 and for every ω2 ∈ Ω2, A2ω2 ∈ F1.

(ii) For every ω1 ∈ Ω1, f1ω1 is ⟨F2, F3⟩-measurable and for every ω2 ∈ Ω2, f2ω2 is ⟨F1, F3⟩-measurable.
Proof: Fix ω1 ∈ Ω1. Define the function g : Ω2 → Ω1 × Ω2 by

  g(ω2) = (ω1, ω2),   ω2 ∈ Ω2.

Note that for any measurable rectangle A ≡ A1 × A2 ∈ F1 × F2, A1ω1 = A2 if ω1 ∈ A1 and A1ω1 = ∅ if ω1 ∉ A1, and hence g^{−1}(A1 × A2) ∈ F2. Since the class of all measurable rectangles generates F1 × F2, for fixed ω1 in Ω1, g is ⟨F2, F1 × F2⟩-measurable. Therefore, for A in F1 × F2, A1ω1 = g^{−1}(A) ∈ F2 and for f as given, f1ω1 = f ∘ g is ⟨F2, F3⟩-measurable. This proves (i) and (ii) for the ω1-sections. The proof for the ω2-sections is similar. □
Now suppose µ1 and µ2 are measures on (Ω1, F1) and (Ω2, F2), respectively. Then, for any set A ∈ F1 × F2 and for all ω1 ∈ Ω1, A1ω1 ∈ F2 and hence µ2(A1ω1) is well defined. If this were an F1-measurable function, then one might define a set function on F1 × F2 by

  µ12(A) = ∫_{Ω1} µ2(A1ω1) µ1(dω1).   (1.4)

And similarly, reversing the order of µ1 and µ2, one might define a second set function

  µ21(A) = ∫_{Ω2} µ1(A2ω2) µ2(dω2),   (1.5)

provided that µ1(A2ω2) is F2-measurable. Note that for the measurable rectangles A = A1 × A2, µ12(A) = µ1(A1)µ2(A2) = µ21(A) and thus both µ12 and µ21 coincide (with the product measure µ) on the class C of all measurable rectangles. This implies that if the product measure µ is unique on F1 × F2, and µ12 and µ21 are measures on F1 × F2, then µ12, µ21 and µ must coincide on the σ-algebra F1 × F2. Then, one can evaluate µ(A) for any set A ∈ F1 × F2 using either of the relations (1.4) or (1.5). The following result makes this heuristic discussion rigorous.
Theorem 5.1.2: Let (Ωi , Fi , µi ), i = 1, 2 be σ-finite measure spaces. Then,
(i) for all A ∈ F1 × F2 , the functions µ2 (A1ω1 ) and µ1 (A2ω2 ) are F1 and F2 -measurable, respectively.
(ii) The set functions µ12 and µ21 , given by (1.4) and (1.5) respectively,
are measures on F1 × F2 , satisfying µ12 (A) = µ21 (A) for all A ∈
F1 × F2 .
(iii) Further, µ12 = µ21 ≡ µ is σ-finite and it is the only measure satisfying
µ(A1 × A2 ) = µ1 (A1 )µ2 (A2 )
for all A1 × A2 ∈ C.
Proof: First suppose that µ1 and µ2 are finite measures. Define

  L = {A ∈ F1 × F2 : µ2(A1ω1) is a ⟨F1, B(R)⟩-measurable function of ω1}.
For A = Ω1 × Ω2, µ2(A1ω1) ≡ µ2(Ω2) for all ω1 ∈ Ω1 and hence Ω1 × Ω2 ∈ L. Next, let A, B ∈ L with B ⊂ A. Then, it is easy to check that (A \ B)1ω1 = A1ω1 \ B1ω1. Since µ2 is a finite measure and A, B ∈ L, it follows that µ2((A \ B)1ω1) = µ2(A1ω1 \ B1ω1) = µ2(A1ω1) − µ2(B1ω1) is ⟨F1, B(R)⟩-measurable. Thus, A \ B ∈ L. Finally, let {Bn}n≥1 ⊂ L be such that Bn ⊂ Bn+1 for all n ≥ 1. Then for any ω1 ∈ Ω1, (Bn)1ω1 ⊂ (Bn+1)1ω1 for all n ≥ 1. Hence by the MCT,

  ∞ > µ2((∪_{n≥1} Bn)1ω1) = µ2(∪_{n≥1} (Bn)1ω1) = lim_{n→∞} µ2((Bn)1ω1)

for all ω1 ∈ Ω1. By Proposition 2.1.5, this implies that µ2((∪_{n≥1} Bn)1ω1) is ⟨F1, B(R)⟩-measurable, and hence ∪_{n≥1} Bn ∈ L. Thus, L is a λ-system. For A = A1 × A2 ∈ C, µ2(A1ω1) = µ2(A2) IA1(ω1) and hence C ⊂ L. Since C is a π-system, by Corollary 1.1.3, it follows that L = F1 × F2. Thus, µ2(A1ω1), considered as a function of ω1, is ⟨F1, B(R)⟩-measurable for all A ∈ F1 × F2, proving (i).
Next, consider part (ii). By part (i), µ12 is a well-defined set function on
F1 × F2 . It is easy to check that µ12 is a measure on F1 × F2 (Problem
5.1). Similarly, µ21 is a well-defined measure on F1 × F2 . Since µ12 (A) =
µ21 (A) = µ1 (A1 )µ2 (A2 ) for all A = A1 × A2 ∈ C and C is a π-system
generating F1 × F2 , it follows from Theorem 1.2.4 that µ12 (A) = µ21 (A)
for all A ∈ F1 ×F2 . This proves (ii) for the case where µi (Ωi ) < ∞, i = 1, 2.
Next, suppose that the µi's are σ-finite. Then, there exist disjoint sets {Bin}n≥1 ⊂ Fi such that ∪_{n≥1} Bin = Ωi and µi(Bin) < ∞ for all n ≥ 1, i = 1, 2. Define the finite measures

  µin(D) = µi(D ∩ Bin),   D ∈ Fi,

for n ≥ 1, i = 1, 2. The arguments above with µi replaced by µin imply that for any A ∈ F1 × F2, µ2n(A1ω1) is ⟨F1, B(R)⟩-measurable for all n ≥ 1. Since µ2 is a measure on F2, µ2(A1ω1) = Σ_{n=1}^∞ µ2n(A1ω1) and hence, considered as a function of ω1, it is ⟨F1, B(R̄)⟩-measurable for all A ∈ F1 × F2. Thus, the set function µ12 of (1.4) is well defined in the σ-finite case as well. Similarly, µ21 of (1.5) is a well-defined set function on F1 × F2. Let µ12^(m,n) and µ21^(m,n), respectively, denote the set functions defined by (1.4) and (1.5) with µ1 replaced by µ1m and µ2 replaced by µ2n, m ≥ 1, n ≥ 1. Then, by repeated use of the MCT,

  µ12(A) = ∫_{Ω1} µ2(A1ω1) µ1(dω1)
         = Σ_{m=1}^∞ ∫_{B1m} ( Σ_{n=1}^∞ µ2(A1ω1 ∩ B2n) ) µ1(dω1)
         = Σ_{m=1}^∞ Σ_{n=1}^∞ ∫_{B1m} µ2(A1ω1 ∩ B2n) µ1(dω1)
         = Σ_{m=1}^∞ Σ_{n=1}^∞ µ12^(m,n)(A),   A ∈ F1 × F2,   (1.6)

and similarly,

  µ21(A) = Σ_{n=1}^∞ Σ_{m=1}^∞ µ21^(m,n)(A),   A ∈ F1 × F2.   (1.7)

Since µ12^(m,n) and µ21^(m,n) are (finite) measures, it is easy to check that µ12 and µ21 are measures on F1 × F2. Also, by the finite case, µ12^(m,n)(A1 × A2) = µ21^(m,n)(A1 × A2) for all n ≥ 1, m ≥ 1 and hence

  µ12(A1 × A2) = µ21(A1 × A2)   for all A1 × A2 ∈ C.

Next note that {B1m × B2n : m ≥ 1, n ≥ 1} is a partition of Ω1 × Ω2 by F1 × F2 sets and by (1.6) and (1.7), for all m ≥ 1, n ≥ 1,

  µ12(B1m × B2n) = µ1(B1m)µ2(B2n) = µ21(B1m × B2n) < ∞.

Hence, µ12 and µ21 are σ-finite on F1 × F2. Since µ12 and µ21 agree on C and C is a π-system generating the product σ-algebra, it follows that µ12(A) = µ21(A) for all A ∈ F1 × F2 and it is the unique measure satisfying µ(A1 × A2) = µ1(A1)µ2(A2) for all A1 × A2 ∈ C. This completes the proof of the theorem. □
Remark 5.1.1: In the above theorem, the σ-finiteness condition on the
measures µ1 and µ2 cannot be dropped. For example, let Ω1 = Ω2 = [0, 1],
F1 = F2 = B([0, 1]), µ1 = the Lebesgue measure and µ2 = the counting
measure. Clearly µ2 is not σ-finite since [0, 1] is uncountable. Let A be the
diagonal set in the product space [0, 1]×[0, 1], i.e., A = {(ω1 , ω2 ) ∈ Ω1 ×Ω2 :
ω1 = ω2 }. Then A ∈ F1 × F2 , A1ω1 = {ω1 } and A2ω2 = {ω2 }. Further,
µ2(A1ω1) = 1 for all ω1 and µ1(A2ω2) = 0 for all ω2. Thus µ12(A) = ∫_{Ω1} µ2(A1ω1) µ1(dω1) = 1, but µ21(A) = ∫_{Ω2} µ1(A2ω2) µ2(dω2) = 0, and hence µ12(A) ≠ µ21(A).
Remark 5.1.2: Although this approach allows one to compute the product
measure µ1 × µ2 of a set A ∈ F1 × F2 , the measure space (Ω1 × Ω2 , F1 ×
F2 , µ1 × µ2 ) may not be complete even if both (Ωi , Fi , µi ), i = 1, 2 are
complete (Problem 5.2). However, the approach based on the extension
theorem yields a product measure space that is complete. See Remark 5.2.2
for further discussion on the topic.
Definition 5.1.3:
(a) The unique measure µ on F1 × F2 defined in Theorem 5.1.2 (iii) is
called the product measure and is denoted by µ1 × µ2 .
(b) The measure space (Ω1 × Ω2 , F1 × F2 , µ1 × µ2 ) is called the product
measure space.
Formulas (1.4) and (1.5) give two different ways of evaluating µ1 × µ2 (A)
for an A ∈ F1 ×F2 . In the next section, integration of a measurable function
f : Ω1 × Ω2 → R w.r.t. the product measure µ1 × µ2 is considered and the
above approach is extended to justify evaluation of the integral iteratively.
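For finite Ω1 and Ω2, the section formulas (1.4) and (1.5) can be evaluated by direct summation, and they agree even when A is not a rectangle. The following is a small illustrative sketch (the spaces, weights, and the set A are invented for the example).

    # Illustrative check of (1.4) and (1.5) on finite spaces: the two iterated sums agree
    # and both equal (mu1 x mu2)(A), even when A is not a measurable rectangle.
    mu1 = {"a": 0.5, "b": 1.5}                  # a finite measure on Omega1
    mu2 = {0: 2.0, 1: 0.25, 2: 1.0}             # a finite measure on Omega2

    A = {("a", 0), ("a", 2), ("b", 1)}          # not a rectangle

    # mu12(A): integrate the omega1-section measures against mu1
    mu12 = sum(mu1[w1] * sum(mu2[w2] for w2 in mu2 if (w1, w2) in A) for w1 in mu1)
    # mu21(A): integrate the omega2-section measures against mu2
    mu21 = sum(mu2[w2] * sum(mu1[w1] for w1 in mu1 if (w1, w2) in A) for w2 in mu2)

    print(mu12, mu21)
    assert abs(mu12 - mu21) < 1e-12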
5.2 Fubini-Tonelli theorems
Let f : Ω1 × Ω2 → R be a ⟨F1 × F2, B(R)⟩-measurable function. Relations (1.4) and (1.5) suggest that the integral of f w.r.t. µ1 × µ2 may be evaluated as iterated integrals, using the formulas

  ∫_{Ω1×Ω2} f(ω1, ω2) µ1 × µ2 (d(ω1, ω2)) = ∫_{Ω2} [ ∫_{Ω1} f(ω1, ω2) µ1(dω1) ] µ2(dω2)   (2.1)

and

  ∫_{Ω1×Ω2} f(ω1, ω2) µ1 × µ2 (d(ω1, ω2)) = ∫_{Ω1} [ ∫_{Ω2} f(ω1, ω2) µ2(dω2) ] µ1(dω1).   (2.2)
Here, the left sides of both (2.1) and (2.2) are simply the integral of f
on the space Ω = Ω1 × Ω2 w.r.t. the measure µ = µ1 × µ2 . The expressions
on the right sides of (2.1) and (2.2) are, however, iterated integrals, where
integrals of sections of f are evaluated first and then the resulting sectional
integrals are integrated again to get the final expression. Conditions for
the validity of (2.1) and (2.2) are provided by the Fubini-Tonelli theorems
stated below.
Theorem 5.2.1: (Tonelli's theorem). Let (Ωi, Fi, µi), i = 1, 2 be σ-finite measure spaces and let f : Ω1 × Ω2 → R+ be a nonnegative F1 × F2-measurable function. Then

  g1(ω1) ≡ ∫_{Ω2} f(ω1, ω2) µ2(dω2) : Ω1 → R̄   is ⟨F1, B(R̄)⟩-measurable   (2.3)

and

  g2(ω2) ≡ ∫_{Ω1} f(ω1, ω2) µ1(dω1) : Ω2 → R̄   is ⟨F2, B(R̄)⟩-measurable.   (2.4)

Further,

  ∫_{Ω1×Ω2} f dµ = ∫_{Ω1} g1 dµ1 = ∫_{Ω2} g2 dµ2,   (2.5)

where µ = µ1 × µ2.
Proof: If f = IA for some A in F1 × F2, the result follows from Theorem 5.1.2. By the linearity of integrals, the result now holds for all simple nonnegative functions f. For a general nonnegative function f, there exists a sequence {fn}n≥1 of nonnegative simple functions such that fn(ω1, ω2) ↑ f(ω1, ω2) for all (ω1, ω2) ∈ Ω1 × Ω2. Write g1n(ω1) = ∫_{Ω2} fn(ω1, ω2) µ2(dω2). Then, g1n is F1-measurable for all n ≥ 1, the g1n's are nondecreasing, and by the MCT,

  g1(ω1) ≡ ∫_{Ω2} f(ω1, ω2) µ2(dω2) = lim_{n→∞} ∫_{Ω2} fn(ω1, ω2) µ2(dω2) = lim_{n→∞} g1n(ω1)   (2.6)

for all ω1 ∈ Ω1. Thus, by Proposition 2.1.5, g1 is ⟨F1, B(R̄)⟩-measurable. Since (2.5) holds for simple functions, ∫ fn dµ = ∫ g1n dµ1 for all n ≥ 1. Hence, by repeated applications of the MCT, it follows that

  ∫ f dµ = lim_{n→∞} ∫ fn dµ = lim_{n→∞} ∫ g1n dµ1 = ∫ (lim_{n→∞} g1n) dµ1 = ∫ g1 dµ1.

The proofs of (2.4) and the second equality in (2.5) are similar. □
Theorem 5.2.2: (Fubini's theorem). Let (Ωi, Fi, µi), i = 1, 2 be σ-finite measure spaces and let f ∈ L1(Ω1 × Ω2, F1 × F2, µ1 × µ2). Then there exist sets Bi ∈ Fi, i = 1, 2 such that

(i) µi(Ωi \ Bi) = 0 for i = 1, 2,

(ii) for ω1 ∈ B1, f(ω1, ·) ∈ L1(Ω2, F2, µ2),

(iii) the function

  g1(ω1) ≡ ∫_{Ω2} f(ω1, ω2) µ2(dω2) for ω1 in B1,   and   g1(ω1) ≡ 0 for ω1 in B1^c,

is F1-measurable and

  ∫_{Ω1} g1 dµ1 = ∫_{Ω1×Ω2} f d(µ1 × µ2),   (2.7)

(iv) for ω2 ∈ B2, f(·, ω2) ∈ L1(Ω1, F1, µ1),

(v) the function

  g2(ω2) ≡ ∫_{Ω1} f(ω1, ω2) µ1(dω1) for ω2 in B2,   and   g2(ω2) ≡ 0 for ω2 in B2^c,

is F2-measurable and

  ∫_{Ω2} g2 dµ2 = ∫_{Ω1×Ω2} f d(µ1 × µ2).   (2.8)
Remark 5.2.1: An informal statement of the above theorem is as follows. If f is integrable on the product space, then the sectional integrals ∫_{Ω2} f(ω1, ·) dµ2 and ∫_{Ω1} f(·, ω2) dµ1 are well defined a.e., and their integrals w.r.t. µ1 and µ2, respectively, are equal to the integral of f w.r.t. the product measure µ1 × µ2.
Proof: By Tonelli's theorem,

  ∫_{Ω1×Ω2} |f| d(µ1 × µ2) = ∫_{Ω1} ( ∫_{Ω2} |f(ω1, ω2)| µ2(dω2) ) µ1(dω1).

So ∫_{Ω1×Ω2} |f| d(µ1 × µ2) < ∞ implies that µ1(B1^c) = 0, where B1 = {ω1 : ∫ |f(ω1, ·)| dµ2 < ∞}. Also, by Tonelli's theorem,

  g11(ω1) ≡ ∫_{Ω2} f+(ω1, ·) dµ2   and   g12(ω1) ≡ ∫_{Ω2} f−(ω1, ·) dµ2

are both F1-measurable and

  ∫_{Ω1} g11 dµ1 = ∫_{Ω1×Ω2} f+ d(µ1 × µ2),   ∫_{Ω1} g12 dµ1 = ∫_{Ω1×Ω2} f− d(µ1 × µ2).   (2.9)

Since g1 defined in (iii) can be written as g1 = (g11 − g12) IB1, g1 is F1-measurable. Also,

  ∫_{Ω1} |g1| dµ1 ≤ ∫_{Ω1} g11 dµ1 + ∫_{Ω1} g12 dµ1 = ∫_{Ω1×Ω2} f+ d(µ1 × µ2) + ∫_{Ω1×Ω2} f− d(µ1 × µ2) < ∞.

Further, as ∫_{Ω1×Ω2} |f| d(µ1 × µ2) < ∞, by (2.9), g11 and g12 ∈ L1(Ω1, F1, µ1). Noting that µ1(B1^c) = 0, one gets

  ∫_{Ω1} g1 dµ1 = ∫_{Ω1} (g11 − g12) IB1 dµ1 = ∫_{Ω1} g11 IB1 dµ1 − ∫_{Ω1} g12 IB1 dµ1 = ∫_{Ω1} g11 dµ1 − ∫_{Ω1} g12 dµ1,

which, by (2.9), equals ∫_{Ω1×Ω2} f+ d(µ1 × µ2) − ∫_{Ω1×Ω2} f− d(µ1 × µ2) = ∫_{Ω1×Ω2} f d(µ1 × µ2). Thus, (ii) and (iii) of the theorem have been established, as well as (i) for i = 1. The proofs of (iv) and (v) and that of (i) for i = 2 are similar. □
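Fubini's theorem can be sanity-checked numerically for an integrable, sign-changing f on a product of intervals: the two iterated Riemann sums approximate the same double integral. The sketch below is illustrative only (the test function, the domain (0, 2) × (0, 2), and the use of numpy are assumptions).

    # Illustrative numerical check of Fubini's theorem for f(x, y) = (x - 2y) exp(-x - y).
    import numpy as np

    n = 2000
    xs = (np.arange(n) + 0.5) * (2.0 / n)       # midpoints in (0, 2)
    ys = xs.copy()

    def f(x, y):
        return (x - 2 * y) * np.exp(-x - y)

    X, Y = np.meshgrid(xs, ys, indexing="ij")
    vals = f(X, Y)
    # integrate over y first, then x
    over_y_then_x = (vals.sum(axis=1) * (2.0 / n)).sum() * (2.0 / n)
    # integrate over x first, then y
    over_x_then_y = (vals.sum(axis=0) * (2.0 / n)).sum() * (2.0 / n)
    print(over_y_then_x, over_x_then_y)          # the two iterated integrals agree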
An application of the Fubini-Tonelli theorems gives an integration by parts formula. Let F1 and F2 be two nondecreasing right continuous functions on an interval [a, b]. Let µi be the Lebesgue-Stieltjes measure on B([a, b]) corresponding to Fi, i = 1, 2. The 'integration by parts' formula allows one to write ∫_{(a,b]} F1(x) dF2(x) ≡ ∫_{(a,b]} F1 dµ2 in terms of ∫_{(a,b]} F2(x) dF1(x) ≡ ∫_{(a,b]} F2 dµ1.
Theorem 5.2.3: Let F1, F2 be two nondecreasing right continuous functions on [a, b] with no common points of discontinuity in (a, b]. Then

  ∫_{(a,b]} F1(x) dF2(x) = F1(b)F2(b) − F1(a)F2(a) − ∫_{(a,b]} F2(x) dF1(x).   (2.10)

Proof: Note that ((a, b], B((a, b]), µi), i = 1, 2 are finite measure spaces. Consider the product space ((a, b] × (a, b], B((a, b] × (a, b]), µ1 × µ2). Define the sets

  A = {(x, y) : a < x ≤ y ≤ b},   B = {(x, y) : a < y ≤ x ≤ b},   and   C = {(x, y) : a < x = y ≤ b}.

For notational simplicity, write Ax and Ay for the x-section and the y-section of A, respectively, and similarly for the sets B and C. Since F1 and F2 have no common points of discontinuity, by Theorem 5.1.2,

  µ1 × µ2(C) = ∫_{(a,b]} µ2(Cx) µ1(dx) = ∫_{(a,b]} µ2({x}) µ1(dx) = 0   (2.11)

(see Problem 5.3). And by Theorem 5.1.2,

  µ1 × µ2(A) = ∫_{(a,b]} µ1(Ay) µ2(dy) = ∫_{(a,b]} µ1((a, y]) µ2(dy) = ∫_{(a,b]} [F1(y) − F1(a)] dF2(y)   (2.12)

and similarly,

  µ1 × µ2(B) = ∫_{(a,b]} µ2(Bx) µ1(dx) = ∫_{(a,b]} [F2(x) − F2(a)] dF1(x).   (2.13)

Next note that (µ1 × µ2)((a, b] × (a, b]) = µ1((a, b]) · µ2((a, b]) = [F1(b) − F1(a)][F2(b) − F2(a)]. Hence, by (2.11)–(2.13),

  [F1(b) − F1(a)][F2(b) − F2(a)] = (µ1 × µ2)((a, b] × (a, b])
    = µ1 × µ2(A) + µ1 × µ2(B) − µ1 × µ2(C)
    = ∫_{(a,b]} [F1(y) − F1(a)] dF2(y) + ∫_{(a,b]} [F2(x) − F2(a)] dF1(x),

which yields (2.10), thereby completing the proof of Theorem 5.2.3. □
If F1 and F2 are absolutely continuous with nonnegative densities f1 and
f2 w.r.t. the Lebesgue measure on (a, b], then (2.10) yields
% b
% b
F1 (x)f2 (x)dx = F1 (b)F2 (b) − F1 (a)F2 (a) −
F2 (x)f1 (x)dx. (2.14)
a
a
If f1 and f2 are any two Lebesgue integrable function on (a, b] that are
not necessarily nonnegative, one can decompose fi as fi = fi+ − fi− and
apply (2.14) to fi+ ’s and fi− ’s separately. Then, by linearity, it follows that
the relation (2.14) also holds for the given f1 and f2 . Thus, the standard
‘integration by parts’ formula is a special case of Theorem 5.2.3. Relations
(2.10) and (2.14) can be extended to unbounded intervals under suitable
conditions on F1 and F2 (Problem 5.5).
Remark 5.2.2: The measure space (Ω1 ×Ω2 , F1 ×F2 , µ1 ×µ2 ) constructed
using the integrals in (1.4) and (1.5) is not necessarily complete. As mentioned at the beginning of this section, the approach based on the extension
theorem (applied to the set function defined in (1.1) on the algebra A of
finite disjoint unions of measurable rectangles) does yield a complete measure space (Ω1 ×Ω2 , M, λ) such that F1 ×F2 ⊂ M
& and λ(A) = (µ1 ×µ2 )(A)
for all A in F1 × F2 . To compute the integral f dλ of an M-measurable
function f w.r.t. λ, formula (2.5) in Tonelli’s theorem and formulas (2.7)
and (2.8) in Fubini’s theorem continue to be valid with some modifications.
In the following, a modified statement of Fubini’s theorem is presented.
Similar modification also holds for Tonelli’s theorem (cf. Royden (1988),
Chapter 12, Section 4).
Theorem 5.2.4: Let (Ωi , Fi , µi ), i = 1, 2 be σ-finite measure spaces
and let f ∈ L1 (Ω1 × Ω2 , M, µ1 × µ2 ). Let (Ωi , F̄i , µ̄i ) be the completion
of (Ωi , Fi , µi ), i = 1, 2. Then there exist sets Bi in F̄i , i = 1, 2 such that
5.3 Extensions to products of higher orders
157
(i) µ̄i (Bic ) = 0, i = 1, 2,
(ii) for ω1 ∈ B1 , f (ω1 , ·) is F̄2 -measurable,
&
(iii) Ω2 |f (ω1 , ·)|dµ̄2 < ∞ for all ω1 ∈ B1 ,
/ &
f (ω1 , ·)dµ̄2 for ω1 in B,
Ω2
(iv) g1 (ω1 ) ≡
0
for ω1 in B1c
is F̄1 -measurable,
&
&
(v) Ω1 g1 dµ̄1 = Ω1 ×Ω2 f dλ.
Further, a similar statement holds for i = 2.
5.3 Extensions to products of higher orders
Definition 5.3.1: Let (Ωi , Fi ), i = 1, . . . , k be measurable spaces, 2 ≤
k < ∞. Then
(a) ×ki=1 Ωi = Ω1 × . . . × Ωk = {(ω1 , . . . , ωi ) : ω :∈ Ωi for i = 1, . . . , k} is
called the product set of Ω1 , . . . , Ωk .
(b) A set of the form A1 × . . . × Ak with Ai ∈ Fi , 1 ≤ i ≤ k is called a
measurable rectangle. The product σ-algebra on ×ki=1 Ωi , denoted by
×ki=1 Fi , is the σ-algebra generated by the collection of all measurable
rectangles, i.e.,
×ki=1 Fi = σ1{A1 × . . . × Ak : Ai ∈ Fi , 1 ≤ i ≤ k}2.
(c) (×ki=1 Ωi , ×ki=1 Fi ) is called the product space or the product measurable
space. When (Ωi , Fi ) = (Ω, F) for all 1 ≤ i ≤ k, the product space
will be denoted by (Ωk , F k ).
To define the product measure, one starts with a natural set function
on C, the class of all measurable rectangles, extends it to the algebra A of
finite disjoint unions of measurable rectangles by linearity, and then uses
the extension theorem (Theorem 1.3.3) to obtain a measure on the product
space. The details of this construction are now described below.
Define a set function µ on C by
µ(A) =
k
K
µ(Ai )
(3.1)
i=1
where A = A1 × A2 × ×Ak with Ai ∈ Fi , i = 1, 2, . . . , k. #
Next extend it
m
to the algebra A by linearity. If B ∈ A is of the form B = j=1 Bj where
158
5. Product Measures, Convolutions, and Transforms
{Bj : 1, . . . , m} ⊂ C are disjoint, then define µ(B) by
µ(B) =
m
)
µ(Bj ).
(3.2)
j=1
If the set B ∈ A admits two different representations as finite unions of
measurable rectangles then it is not difficult to verify that the value of µ(B)
remains unchanged. Next it is shown that µ is a measure on A.
Proposition 5.3.1:. Let µ be as defined by (3.1) and (3.2) above on the
algebra A. Then, it is countably additive on A.
#
Proof: Let {Bn }∞
n=1 ⊂ A be disjoint such that B =
n≥1 Bn also belongs
to A. It is enough to show that
µ(B) =
∞
)
µ(Bn ).
(3.3)
n=1
#,
Let B and {Bn }∞
n=1 admit the representations B =
i=1 Ai and Bj =
#,j
r=1 Ajr where 2 and 2j are (finite) integers, Ai , Ajr ∈ C and each of the
,j
, j ≥ 1 is disjoint. Then each Ai can be
collections {Ai },i=1 and {Ajr }r=1
written as
,n
9
8*
* *
Bn =
(Ai ∩ Anr ).
Ai = Ai ∩
n≥1 r=1
n≥1
Suppose it is shown that for each i ≥ 1,
µ(Ai ) =
,j
∞ )
)
j=1 r=1
µ(Ai ∩ Ajr ).
Then, by (3.2),
µ(B)
=
,
)
µ(Ai )
i=1
=
,j
, )
∞ )
)
i=1 j=1 r=1
=
,j
,
∞ )
)
)
j=1 r=1 i=1
=
,j
∞ )
)
j=1 r=1
µ(Ai ∩ Ajr )
µ(Ai ∩ Ajr )
µ(B ∩ Ajr ),
(3.4)
5.3 Extensions to products of higher orders
159
where the last equality follows from the representation of B ∩ Ajr as
#,
i=1 (Ai ∩ Ajr ) and the fact that Ai ∩ Ajr ∈ C for all i, j, r. Since
Ajr ⊂ Bj ⊂ B, the above yields
,j
∞ )
)
µ(Ajr ) =
j=1 r=1
∞
)
µ(Bj )
(by (3.2))
j=1
which establishes (3.3).
Thus, it#remains to show (3.4). This is implied by the following:
Let C = n≥1 Cn where {Cn }∞
n=1 is a collection of disjoint measurable
rectangles and C is also a measurable rectangle. Then
µ(C) =
∞
)
µ(Ci ).
(3.5)
i=1
Let C =#A1 × A2 × ×Ak and Ci = Ai1 × Ai2 × ×Aik , i = 1, 2, . . ..
Since C = n≥1 Cn and {Cn }∞
n=1 are disjoint, this implies IC (ω1 , . . . , ωk ) =
$∞
i=1 ICi (ω1 , . . . , ωk ), for all (ω1 , . . . , ωk ) ∈ Ω1 × . . . × Ωk . That is,
k
K
IAj (ωj ) =
j=1
∞ K
k
)
IAij (ωj )
(3.6)
i=1 j=1
for all (ω1 , . . . , ωk ) ∈ Ω1 × . . . × Ωk . Integration of both sides of (3.6) w.r.t.
µ1 over Ω1 yields
µ1 (A1 )
-K
k
IAj (ωj )
j=2
.
=
∞
)
µ1 (Ai1 )
i=1
k
K
IAij (ωj ),
j=2
and by iteration
k
K
µj (Aj ) =
j=1
Hence, (3.5) follows.
∞ K
k
)
µj (Aij ).
i=1 j=1
!
By the extension theorem (Theorem 1.3.3), there exists a σ-algebra M
and a measure µ̄ such that
(i) (×ki=1 Ωi , M, µ̄) is complete,
(ii) ×ki=1 Fi ⊂ M,
(iii) µ̄(A) = µ(A)
and
for all A ∈ A.
Thus, the above procedure yields an extension µ̄ of µ on A, and this extension is unique when all the µi ’s are σ-finite. Further, under the hypothesis that µ1 , . . . , µk are σ-finite, the analogs of formulas (1.4) and (1.5) for
160
5. Product Measures, Convolutions, and Transforms
computing the product measure of a set via the iterated integrals are valid.
More generally, the Tonelli-Fubini theorems extend to the k-fold product
spaces in an obvious manner. For example, if (Ωi , Fi , µi ) are σ-finite measure spaces for i = 1, . . . , k and f is a nonnegative measurable function on
(×ki=1 Ωi , ×ki=1 Fi , µ̄), then
%
% %
%
f dµ̄ =
···
f (ω1 , ω2 , ωk )µi1 (dωi1 ) . . . µik (dωik )
Ωi1
Ωi2
Ωik
for any permutation (i1 , i2 , . . . , ik ) of (1, 2, . . . , k).
Definition 5.3.2: The measure space (×ki=1 Ωi , M, µ̄) is called the complete product measure space and µ̄ the complete product measure.
Remark 5.3.1: If (Ωi , Fi , µi ) ≡ (R, L, m) where L is the Lebesgue σalgebra and m is the Lebesgue measure, then the above extension coincides
with the (Rk , Lk , mk ), where Lk is the Lebesgue σ-algebra in Rk and mk
is the k dimensional Lebesgue measure.
5.4 Convolutions
5.4.1
Convolution of measures on (R, B(R))
!
"
In this section, convolution of measures on R, B(R) is discussed. From
this, one can easily obtain convolution of sequences, convolution of functions in L1 (R) and convolution of functions with measures.
!
"
Proposition 5.4.1: Let µ and λ be two σ-finite measures on R, B(R) .
For any Borel set A in B(R), let
% %
(4.1)
(µ ∗ λ)(A) ≡
IA (x + y)µ(dx)λ(dy).
!
"
Then (µ ∗ λ)(·) is a measure on R, B(R) .
Proof: Let h(x, y) ≡ x + y for (x, y) ∈ R × R. Then h : R × R → R
is continuous and hence is 1B(R) × B(R), B(R)2-measurable. Consider the
measure (µ × λ)h−1 (·) on 1R, B(R)2 induced by the map h and the measure
µ × λ on 1R × R, B(R) × B(R)2. Clearly
(µ ∗ λ)(·) = (µ × λ)h−1 (·)
and hence the proposition is proved.
!
"
Definition 5.4.1: For any two σ-finite measures µ and λ on R, B(R) ,
the measure (µ ∗ λ)(A) defined in (4.1) is called the convolution of µ and
λ.
!
5.4 Convolutions
161
The following proposition is easy to verify.
!
"
Proposition 5.4.2: Let µ1 , µ2 , µ3 be σ-finite measures on R, B(R) .
Then
(i) (Commutativity) µ1 ∗ µ2 = µ2 ∗ µ1 ,
(ii) (Associativity) µ1 ∗ (µ2 ∗ µ3 ) = (µ1 ∗ µ2 ) ∗ µ3 ,
(iii) (Distributive) µ1 ∗ (µ2 + µ3 ) = µ1 ∗ µ2 + µ1 ∗ µ3 ,
(iv) (Identity element) µ1 ∗ δ{0} = µ1
where δ{0} (·) is the delta measure at 0, i.e.,
/
1 if 0 ∈ A,
δ{0} (A) =
0 if 0 )∈ A
for all A ∈ B(R).
k
Remark
" (Extension to R ). Definition 5.4.1 extends to measures
! k 5.4.1:
k
on R , B(R ) for any integer k ≥ 1 as well as to any space that is a commutative group under addition such as the sequence space R∞ , the function
spaces C[0, 1] and C[0, ∞). These are relevant in the study of stochastic processes.
Remark 5.4.2: (Sums of independent random variables). It will be seen
in Chapter 7 that if X and Y are two independent random variables on a
probability space (Ω, F, P ), then PX ∗ PY = PX+Y where for any random
variable Z on
! (Ω, F,"P ), PZ is the distribution of Z, i.e., the probability
measure on R, B(R) induced by P and Z, i.e., PZ (·) = P Z −1 (·) on B(R).
Remark 5.4.3: (Extension
to
!
" signed measures). Let µ and λ be two finite
signed measures on R, B(R) as defined in Section 4.2. Let µ = µ+ − µ− ,
λ = λ+ − λ− be the Jordan decomposition of µ and λ, respectively. Then
the product measure µ × λ can be defined as the signed measure
C B
C
B
µ × λ ≡ (µ+ × λ+ ) + (µ− × λ− ) − (µ+ × λ− ) + (µ− + λ+ )
≡ γ1 − γ2 , say.
(4.2)
This is well defined since the measures (µ+ × λ+ ) + (µ− × λ− ) and (µ+ ×
λ− )+(µ− ×λ+ ) are both finite measures. Now the definition of convolution
of measures in (4.1) carries over to signed measures using the definition of
integration w.r.t. signed measures discussed in Section 4.2.
Definition
5.4.2:
Let µ and ν be (finite) signed measures on a measurable
!
"
space R, B(R) . The convolution of µ and ν, denoted by µ ∗ ν is the signed
measure defined by
% %
(µ ∗ ν)(A) =
IA (x + y)d(µ × ν)
162
5. Product Measures, Convolutions, and Transforms
% %
=
IA (x + y)dγ1 −
% %
IA (x + y)dγ2
(4.3)
where γ1 and γ2 are as in (4.2).
5.4.2
Convolution of sequences
Definition 5.4.3: Let ã ≡ {an }n≥0 and b̃ ≡ {bn }n≥0 be two sequences
of real numbers. Then the convolution of ã and b̃ denoted by ã ∗ b̃ is the
sequence
n
)
aj bn−j , n ≥ 0.
(4.4)
(ã ∗ b̃)(n) =
j=0
$∞
$∞
If j=0 |aj | < ∞ and j=0 |bj | < ∞, then ã ∗ b̃ corresponds to the convolution of the signed measures ã and b̃ on Z+ , defined by ã(i) = ai , b̃(i) = bi ,
i ∈ Z+ .
Example 5.4.1:
(a) Let bn ≡ 1 for n ≥ 0. Then for any {an }n≥0 , (ã ∗ b̃)(n) =
(b) Fix 0 < p < 1, k1 , k2 positive integers. For n ∈ Z+ , let
- .
k1 n
an ≡
p (1 − p)k1 −n I[0,k1 ] (n)
n
- .
k2 n
bn ≡
p (1 − p)k2 −n I[0,k2 ] (n).
n
" n
!
2
p (1 − p)k2 −n I[0,k1 +k2 ] (n).
Then (ã ∗ b̃)(n) = k1 +k
n
$n
j=0
aj .
(c) Fix 0 < λ1 , λ2 < ∞. Let
an
≡
bn
≡
λn1
, n = 0, 1, 2, . . .
n!
λn
e−λ2 2 , n = 0, 1, 2, . . . .
n!
e−λ1
Then (ã ∗ b̃)(n) = e−(λ1 +λ2 )
(λ1 +λ2 )n
,
n!
n = 0, 1, 2, . . . .
The verification of these claims is left as an exercise (Problem 5.20).
A useful technique to determine ã ∗ b̃ is the use of generating functions.
This will be discussed in Section 5.5.
5.4.3
Convolution of functions in L1 (R)
Let L1 (R) ≡ L1 (R, B(R),&m) where m(·) is the Lebesgue
& measure. In the
following, for f ∈ L1 (R), f dm will also be written as f (x)dx.
5.4 Convolutions
163
Proposition 5.4.3: Let f , g ∈ L1 (R). Then for almost all x (w.r.t. m),
%
|f (x − u)| |g(u)|du < ∞.
(4.5)
Proof: Let k(x, u) = f (x − u)g(u). Since π(x, u) ≡ x − u is a continuous
map from R × R → R, it is 1B(R) × B(R), B(R)2-measurable. Also, since
f and g are Borel measurable, it follows that k : R × R → R is 1B(R) ×
Also note that by the translation invariance of m,
&B(R), B(R)2-measurable.
&
|f (x − u)|dx = |f (x)|dx for any Borel measurable f : R → R. Hence,
by Tonelli’s theorem,
% %
% '%
(
|k(x, u)|dxdu =
|k(x, u)|dx du
%
'%
(
=
|g(u)|
|f (x)|dx du
= 7g71 7f 71 < ∞.
(4.6)
Also by Tonelli’s theorem
% %
% '%
(
|k(x, u)|dxdu =
|k(x, u)|du dx.
By (4.6), this yields
%
|k(x, u)|du < ∞
a.e. (m)
which is the same as (4.5).
!
Definition 5.4.4: Let f , g ∈ L1 (R). Then the convolution of f and g,
denoted by f ∗ g is the function defined a.e. (m) by
%
(f ∗ g)(x) ≡ f (x − u)g(u)du.
(4.7)
Note that by (4.5) this is well defined.
Proposition 5.4.4: Let f , g ∈ L1 (R). Then
(i) f ∗ g = g ∗ f
(ii) f ∗ g ∈ L1 (R) and 7f ∗ g71 ≤ 7f 71 7g71
(
(' &
'&
&
gdm .
f dm
(iii) f ∗ g dm =
Proof: For (i), use the change of variable u → x − u and the translation
invariance of m. For (ii) use (4.6). For (iii), use Fubini’s theorem.
!
164
5. Product Measures, Convolutions, and Transforms
It is easy to see that if µ and λ denote the signed measures with RadonNikodym derivatives f and g, respectively, w.r.t. m, then µ ∗ λ is the signed
measure with Radon-Nikodym derivative f ∗ g w.r.t. m.
Example 5.4.2: Here are two examples from probability theory.
(a) (Convolutions of Uniform [0, 1] densities). Let f = g = I[0,1] . Then
/
x
0≤x≤1
(f ∗ g)(x) =
2 − x 1 ≤ x ≤ 2.
(b) (Convolutions of N (0, 1) densities). Let f (x) ≡ g(x) ≡
−∞ < x < ∞. Then (f ∗ g)(x) =
5.4.4
√1 √1
2π 2
2
− x4
e
√1
2π
e−
x2
2
,
, −∞ < x < ∞.
Convolution of functions and measures
&
Let f ∈ L1 (R) and λ be a signed measure. Let µ(A) ≡ A f dm, A ∈ B(R).
Then it can be shown that µ∗λ is dominated by m and its Radon-Nikodym
derivative is given by
%
f ∗ λ(x) ≡ f (x − u)λ(du), x ∈ R.
(4.8)
Note that f ∗λ in (4.8) is well
! defined
" for any nonnegative Borel measurable
f and any measure λ on R, B(R) .
5.5 Generating functions and Laplace transforms
Definition 5.5.1: Let {an }n≥0 be a sequence in R and let ρ =
"−1
!
lim |an |1/n
. The power series
n→∞
A(s) ≡
∞
)
an sn ,
(5.1)
n=0
defined for all s in (−ρ, ρ), is called the generating function of {an }n≥0 .
Note that the power series in (5.1) converges absolutely for |s| < ρ and
ρ is called the radius of convergence of A(s) (cf. Appendix A.2).
The usefulness of generating functions is given by the following:
Proposition 5.5.1: Let {an }n≥0 and {bn }n≥0 be two sequences in R. Let
{cn }n≥0 be the convolution of {an } and {bn }. That is,
cn =
n
)
j=0
aj bn−j , n ≥ 0.
5.5 Generating functions and Laplace transforms
Then
C(s) = A(s)B(s)
165
(5.2)
for all s in (−ρ, ρ), where A(·), B(·), and C(·) are, respectively, the
generating function of the sequences {an }n≥0 , {bn }n≥0 and {cn }n≥0 and
"−1
"−1
!
!
and ρb = lim |bn |1/n
.
ρ = min(ρa , ρb ) with ρa = lim |an |1/n
n→∞
n→∞
Proof: By Tonelli’s theorem applied to the counting measure on the product space (Z+ × Z+ ),
.
-)
∞
n
)
n
|s|
|aj | |bn−j |
=
n=0
∞
)
j=0
j=0
|aj | |s|j
-)
∞
n=j
.
|s|n−j |bn−j |
= A(|s|)B(|s|) < ∞ if |s| < ρ.
Now by Fubini’s theorem, (5.2) follows.
!
It is easy to verify the claims in Example 5.4.1 by using the above proposition and the uniqueness of the power series coefficients (Problem 5.20).
Proposition 5.5.2: (Renewal sequences). Let {an }n≥0 , {bn }n≥0 , {pn }n≥0
be sequences in R such that
an = bn +
n
)
j=0
and p0 = 0. Then
A(s) =
an−j pj , n ≥ 0
(5.3)
B(s)
1 − P (s)
for all s such that |s| < ρ and P (s) )= 1, where ρ = min(ρb , ρp ), ρb =
"−1
"−1
!
!
lim |bn |1/n
, ρp = lim |pn |1/n
.
n→∞
n→∞
For applications of this to renewal theory, see Chapter 8.
Definition 5.5.2: (Laplace transform). Let f : [0, ∞) → R be Borel
measurable. The function
%
esx f (x)dx,
(5.4)
(Lf )(s) ≡
[0,∞)
defined for all s in R such that
%
[0,∞)
esx |f (x)|dx < ∞,
is called the Laplace transform of f .
(5.5)
166
5. Product Measures, Convolutions, and Transforms
It is easily seen that if (5.5) holds for some s = s0 , then it does for all
s < s0 . The analog of Proposition 5.5.1 is the following.
Proposition 5.5.3: Let f , g ∈ L1 (R). Then
L(f ∗ g)(s) = Lf (s)Lg(s)
(5.6)
for all s such that (5.5) holds for both f and g.
transform). Let µ be a measure on
!Definition
" 5.5.3: (Laplace-Stieltjes
!
"
R, B(R) such that µ (−∞, 0) = 0. The function
%
µ∗ (s) ≡ Lµ(s) ≡
esx µ(dx)
[0,∞)
is called the Laplace-Stieltjes transform of µ. Clearly, µ∗ (s) is well defined
for all s in R. However, µ∗ (s0 ) = ∞ implies µ∗ (s) = ∞ for s ≥ s0 .
!
"
Proposition
!
" 5.5.4:! Let µ "and λ be measures on R, B(R) such that
µ (−∞, 0) = 0 = λ (−∞, 0) . Then
L(µ ∗ λ)(s) = Lµ(s)Lλ(s) for all s in R.
For an inversion formula to obtain a probability measure µ from Lµ(·),
see Feller (1966), Section 13.4.
5.6 Fourier series
!
"
In this section, Lp [0, 2π] stands for Lp [0, 2π], B([0, 2π]), m where m(·) is
Lebesgue measure and 0 < p < ∞.
Definition 5.6.1: For f ∈ L1 [0, 2π], n ≥ 0, let
% 2π
1
an ≡
f (x) cos nx dx
2π 0
% 2π
1
bn ≡
f (x) sin nx dx,
2π 0
sn (f, x) ≡ a0 +
n
)
(aj cos jx + bj sin jx).
(6.1)
(6.2)
j=1
Then {an , bn , n = 0, 1, 2, . . .} are called the Fourier coefficients of f and
the sequence {sn (f, x) : n = 0, 1, 2, . . .} is called the partial sum sequence
of the Fourier series expansion of f .
Since C[0, 2π] ⊂ L2 [0, 2π] ⊂ L1 [0, 2π], the Fourier coefficients and the
partial sum sequence of the Fourier series are well defined for f in C[0, 2π]
5.6 Fourier series
167
and f in L2 [0, 2π]. J. Fourier introduced these series in the early 19th
century to approximate certain functions exhibiting a periodic behavior
that arose in the study of physics and mechanics and in particular in the
theory of heat conduction (see Körner (1989) and Bhatia (2003)).
A natural question is: In what sense does sn (f, x) approximate f (x)? It
turns out that one can prove the strongest results for f in C[0, 2π], less
stronger results for f in L2 [0, 2π], and finally, for f in L1 [0, 2π].
In the early 19th century it was believed that for any f in C[0, 2π],
sn (f, x) converged to f (x) for all x in [0, 2π]. But in 1876, D. Raymond
constructed a continuous function f for which limn→∞ |sn (f ; x0 )| = ∞ for
some x0 . However, Fejer showed in 1903 that for f in C[0, 2π], sn (f, ·) does
converge to f uniformly in the Cesaro sense.
Theorem 5.6.1: (Fejer). Let f ∈ C[0, 2π] and sn (f, ·), n ≥ 0 be as in
Definition 3.4.1. Let
Dn (f, ·) =
n−1
1)
sj (f, ·), n ≥ 1.
n j=0
(6.3)
Then
Dn (f, ·) → f (·) uniformly on [0, 2π] as
n → ∞.
(6.4)
For the proof of this result, the following result of some independent
interest is needed.
Lemma 5.6.2: (Fejer ). Let for m ≥ 1
m−1
2 )
(m − j) cos jx, x ∈ R.
m j=1
Km (x) ≡ 1 +
(6.5)
Then
(i)
Km (x) =
(ii) For δ > 0,



1
m
-
sin mx
2
sin x
2
m
.2
sup{Km (x) : δ ≤ |x| ≤ 2π − δ} → 0
(iii)
1
2π
%
0
2π
if x )= 0
(6.6)
if x = 0 .
as
Km (x)dx = 1.
m → ∞.
(6.7)
(6.8)
168
5. Product Measures, Convolutions, and Transforms
Proof: Clearly, (iii) follows from (6.5) on integration. Also, (ii) follows
from (i) since for δ ≤ x ≤ 2π − δ,
Km (x) ≤
1
1
→0
m (sin 2δ )2
as
m → ∞.
To establish (i) note that using Euler’s formula (cf. Section A.3, Appendix)
eιx = cos x + ι sin x, Km (x) can be written as
Km (x)
=
=
=
(m−1)
)
1
m
j=−(m−1)
(m − |j|)eιjx
2(m−1)
1 )
(m − |j − (m − 1)|)eι(j−(m−1))x
m j=0
1
m
- m−1
)
e
k=0
For x )= 0,
Km (x)
ι(k− m−1
2 )x
.2
.
1 ' − ι(m−1)x 1 − eιmx (2
2
,
e
m
1 − eιx
- − ιmx
ιmx .2
1 e 2 −e 2
,
ιx
ιx
m
e− 2 − e 2
(2
1 ' sin mx
2
.
x
m sin 2
=
=
=
For x = 0, (6.5) yields
Km (0)
=
1+
=
1+
m−1
2 )
(m − j)
m j=1
2 (m − 1)m
= m.
m
2
!
Proof of Theorem 5.6.1: Let f ∈ C[0, 2π]. Let {an , bn }n≥0 , sn (f, ·),
Dm (f, ·) be as in (6.1), (6.2), and (6.3). Then from the definition of
an , bn , n ≥ 0, it follows that
Dm (f, ·)
=
1
2π
=
1
2π
%
0
%
0
2π
2π
Km (x − u)f (u)du
f (x − u)Km (u)du
(6.9)
5.6 Fourier series
169
where Km (·) is defined in (6.5) and f (·) is extended to all of R periodically
with period 2π. Since f ∈ C[0, 2π], given # > 0, there exists a δ > 0 such
that
x, y ∈ [0, 2π], |x − y| < δ ⇒ |f (x) − f (y)| < #.
Also from (6.7), for this δ > 0, there exist an m0 such that m ≥ m0 ⇒
sup{Km (x) : δ ≤ x ≤ 2π −δ} < #. Now (6.9) yields, for x ∈ [0, 2π], m ≥ m0
% 2π
1
|Dm (f, x) − f (x)| ≤
|f (x − u) − f (x)|Km (u)du
2π 0
%
1
=
|f (x − u) − f (x)|Km (u)du
2π
(δ≤u≤2π−δ)c
+
≤
1
2π
%
(δ≤u≤2π−δ)
|f (x − u) − f (x)|Km (u)du
1
(# + 27f 7#2π)
2π
where 7f 7 = sup{|f (x)| : x ∈ [0, 2π]}. Thus, m ≥ m0 ⇒ sup{|Dm (f, x) −
/)
. Since # > 0 is arbitrary, the proof of
f (x)| : x ∈ [0, 2π]} ≤ # (1+4π/f
2π
Theorem 5.6.1 is complete.
!
An immediate consequence of Theorem 5.6.1 is the completeness of the
trigonometric functions.
Theorem 5.6.3: The collection T0 ≡ {cos nx : n = 0, 1, 2, . . .} ∪ {sin nx :
n = 1, 2, . . .} is a complete orthogonal system for L2 [0, 2π].
Proof: By Theorem 5.6.1 for each f ∈ C[0, 2π] and # > 0, there exists a
finite linear combination Dm (f, ·) of {cos nx : n = 0, 1, 2, . . .} and {sin nx :
n = 1, 2, . . .} such that
% 2π
|f − Dm (f, ·)|2 dx
0
≤ 2π(sup{|f (x) − Dm (f, x)| : x ∈ [0, 2π]})2 < #2 .
Also, from Theorem 2.3.14 it is known that given any g ∈ L2 [0, 2π], and
& 2π
any # > 0, there is a f ∈ C[0, 2π] such that 0 |f − g|2 dx < #2 . Thus, for
any g ∈ L2 [0, 2π], # > 0, there is a f ∈ C[0, 2π] and an m ≥ 1 such that
7g − Dm (f, ·)72 < 2#.
That is, the set T of all finite linear combinations of the functions in the
class T0 is dense in L2 [0, 2π]. Further, it is easy to verify that h1 , h2 ∈ T0 ,
h1 )= h2 implies
%
2π
0
h1 (x)h2 (x)dx = 0,
170
5. Product Measures, Convolutions, and Transforms
i.e., T0 is an orthogonal family. Since T0 is orthogonal and T is dense in
!
L2 [0, 2π], T0 is complete.
Definition 5.6.2: A function in T is called a trigonometric polynomial.
Thus, the above theorem says that trigonometric polynomials are dense
in L2 [0, 2π]. Completeness of T in L2 [0, 2π] and the results of Section 3.3
lead to
Theorem 5.6.4: Let f ∈ L2 [0, 2π]. Let {(an , bn ), n = 0, 1, 2, . . .} and
{sn (f, ·)}n≥0 be the associated Fourier coefficient sequences and partial sum
sequence of the Fourier series for f as in Definition 5.6.1. Then
(i) sn (f, ·) → f in L2 [0, 2π],
(ii)
& 2π 2
1
2
2
n=0 (ăn + b̆n ) = 2π 0 |f | dx where
& 2π
1
c2n = 2π
(cos nx)2 dx = 12 , and for
0
&
2π
1
1
2
2π 0 (sin nx) dx = 2 .
$∞
for n ≥ 0, ăn = an /cn , with
n ≥ 1, b̆n = dbnn with d2n =
& 2π
1
f g dx = a0 α0 +
(iii) Further,
if f , g ∈( L2 [0, 2π], then 2π
0
'
$∞
bn βn
an αn
+ d2 , where {(an , bn ) : n = 0, 1, 2, . . .} and
n=1
c2n
n
{(αn , βn ) : n = 0, 1, 2, . . .} are, respectively, the Fourier coefficients
of f and g.
Clearly (ii) above is a restatement of Bessel’s equality. Assertion (iii) is
known as the Parseval identity. As for convergence pointwise or almost
everywhere, A. N. Kolmogorov showed in 1926 (see Körner (1989)) that
there exists an f ∈ L1 [0, 2π] such that limn→∞ |sn (f, x)| = ∞ everywhere
on [0, 2π]. This led to the belief that for f )∈ C[0, 2π], the mean square convergence of (i) in Theorem 5.6.4 cannot be improved upon. But L. Carleson
showed in 1964 (see Körner (1989)) that for f in L2 [0, 2π], sn (f, ·) → f (·)
almost everywhere. Finally, turning to L1 [0, 2π], one has the following:
Theorem 5.6.5:
Let f ∈ L1 [0, 2π]. Let {(an , bn ) : n ≥ 0} be as in (6.1)
$∞
and satisfy n=0 (|an | + |bn |) < ∞. Let sn (f, ·) be as in (6.2). Then sn (f, ·)
converges uniformly on [0, 2π] and the limit coincides with f almost everywhere.
$∞
Proof: Note that
n=0 (|an | + |bn |) < ∞ implies that the sequence
{sn (f, ·)}n≥0 is a Cauchy sequence in the Banach space C[0, 2π] with the
sup-norm. Thus, there exists a g in C[0, 2π] such that sn (f, ·) → g uniformly on [0, 2π]. It is easy to check that this implies that g and f have
the same Fourier coefficients. Set h = g − f . Then h ∈ L1 [0, 2π] and the
Fourier coefficients of h are all zero. This implies that h is orthogonal to
the members of the class T , which in turn yields that h is orthogonal to all
5.6 Fourier series
171
continuous functions in C[0, 2π], i.e.,
% 2π
h(x)k(x)dx = 0
0
for all k ∈ C[0, 2π]. Since h ∈ L1 [0, 2π] and for any interval A ⊂ [0, 2π],
there exists a sequence {kn }n≥1 of uniformly bounded continuous functions,
such that kn → IA a.e. (m), by the DCT,
%
%
h(x)IA (x)dx = lim
h(x)kn (x)dx = 0.
n→∞
This in turn implies that h = 0 a.e., i.e., g = f a.e.
!
Remark 5.6.1: If f ∈ L2 [0, 2π], the Fourier coefficients {an , bn } are square
summable and hence go to zero as n → ∞. What if f ∈ L1 [0, 2π]?
If f ∈ L1 [0, 2π], one can assert the following:
Theorem 5.6.6: (Riemann-Lebesgue lemma). Let f ∈ L1 [0, 2π]. Then
% 2π
% 2π
lim
f (x) cos nx dx = 0 = lim
f (x) sin nx dx.
n→∞
n→∞
0
0
Proof: The lemma holds if f = IA for any interval A ⊂ [0, 2π] and since
step functions (i.e., linear combinations of indicator functions of intervals)
!
are dense in L1 [0, 2π], the lemma is proved.
It can be shown that the mapping f → {(an , bn )}n≥0 from L1 [0, 2π] to
bivariate sequences that go to zero as n → ∞ is one-to-one but not onto
(Rudin (1987), Chapter 5).
Remark 5.6.2: (The complex case). Let T ≡ {z : z = eιθ , 0 ≤ θ ≤ 2π}
be the unit circle in the complex plane C. Every function g : T → C can be
identified with a function f on R by f (t) = g(eιt ). Clearly, f (·) is periodic
on R with period 2π. In the rest of this section, for 0 < p < ∞, Lp (T )
will stand for the
& collection of all Borel measurable functions f : [0, 2π]
to C such that [0,2π] |f |p dm < ∞ where m(·) is the Lebesgue measure. A
trigonometric polynomial is a function of form
f (·) ≡
k
)
n=−k
αn eιnx ≡ a0 +
k
)
(an cos nx + bn sin nx),
n=1
k < ∞, where {αn }, {an }n≥0 , and {bn }n≥0 are sequences of complex numbers.
The completeness of the trigonometric polynomials proved in Theorem
5.6.3 implies that the family {eιnx : n = 0, ±1, ±2, . . .} is a complete orthonormal basis for L2 (T ), which is a complex Hilbert space.
172
5. Product Measures, Convolutions, and Transforms
Thus Theorem 5.6.4 carries over to this case.
Theorem 5.6.7:
(i) Let f ∈ L2 (T ). Then,
k
)
n=−k
where
αn ≡
1
2π
Further,
∞
)
in
αn eιnx → f
n=−∞
%
2π
0
L2 (T )
f (x)e−ιns dx, n ∈ Z.
%
1
2π
|αn |2 =
2π
0
|f |2 dm.
(ii) For any sequence {αn }n∈Z of complex numbers such that
:
;
$k
$∞
2
ιnx
conn=−∞ |αn | < ∞, the sequence fk (x) ≡
n=−k αn e
k≥1
2
verges in L (T ) to a unique f such that
αn =
1
2π
%
2π
0
(iii) For any f , g ∈ L2 (T ),
∞
)
αn β̄n =
n=−∞
1
2π
f (x)e−ιnx dx.
%
2π
0
f (x)g(x)dx
where
1
fˆ(n) =
2π
αn
=
βn
= ĝ(n) =
1
2π
%
2π
0
% 2π
0
f (x)e−ιnx dx,
g(x)e−ιnx dx, n ∈ Z.
Further,
∞
)
n=−∞
|αn βn | < ∞.
(iv) L2 (T ) is isomorphic to 22 (Z), the Hilbert space of all square summable
sequences of complex numbers on Z.
Similarly, Theorem 5.6.5 carries over to the complex case.
5.7 Fourier transforms on R
173
Theorem 5.6.8: Let f ∈ L1 (T ). Suppose
∞
)
|fˆ(n)| < ∞
%
f (x)e−ιnx dx, n ∈ Z.
n=−∞
where
1
fˆ(n) =
2π
Then
2π
0
sn (f, x) ≡
n
)
fˆ(j)e−ιjx
j=−n
converges uniformly on [0, 2π] and the limit coincides with f a.e. and hence
f is continuous a.e.
5.7 Fourier transforms on R
In this section and in Section 5.8, let Lp (R) stand for
%
|f |p dm < ∞}
Lp (R) ≡ {f : f : R → C, Borel measurable,
R
where m(·)&is Lebesgue measure. Also, for f ∈ L1 (R),
written as f (x)dx. Let
C0 ≡ {f : f : R → C, continuous and
&
R
(7.1)
f dm will often be
lim f (x) = 0}.
|x|→∞
Definition 5.7.1: For f ∈ L1 (R), t ∈ R,
%
fˆ(t) ≡ f (x)e−ιtx dx
(7.2)
(7.3)
is called the Fourier transform of f .
Proposition 5.7.1: Let f ∈ L1 (R) and fˆ(·) be as in (7.3). Then
(i) fˆ(·) ∈ C0 .
(ii) If fa (x) ≡ f (x − a), a ∈ R, then fˆa (t) = eιta fˆ(t), t ∈ R.
Proof:
(i) For any t ∈ R, tn → t ⇒ eιtn x f (x) → eιtx f (x) for all x ∈ R and since
|eιtn x f (x)| ≤ |f (x)| for all n and x, by the DCT fˆ(tn ) → fˆ(t). To
show that fˆ(t) → 0 as |t| → ∞, the same proof as that of Theorem
5.6.6 works. Thus, it holds if f = I[a,b] , for a, b ∈ R, a < b and since
the step functions are dense in L1 (R), it holds for all f ∈ L1 (R).
174
5. Product Measures, Convolutions, and Transforms
(ii) This is a consequence of the translation invariance of m(·), i.e., m(A+
a) = m(A) for all A ∈ B(R), a ∈ R.
!
The continuity of fˆ(·) can be strengthened to differentiability if, in addition to f ∈ L1 (R), xf (x) ∈ L1 (R). More generally, if f ∈ L1 (R) and
xk f (x) ∈ L1 (R) for some k ≥ 1, then fˆ(·) is differentiable k-times with all
derivatives fˆ(r) (t) → 0 as |t| → ∞ for r ≤ k (Problem 5.22).
Proposition 5.7.2: Let f , g ∈ L1 (R) and f ∗ g be their convolution as
defined in (4.7). Then
f!
∗ g = fˆĝ.
(7.4)
Proof:
f!
∗ g(t)
=
=
=
%
−ιtx
e
R
% '%
R
%
R
R
'%
R
(
f (x − u)g(u)du dx
(
e−ιt(x−u) f (x − u)e−ιtu g(u)du dx
(ft ∗ gt )(x)dx
(7.5)
where ft (x) = e−ιtx f (x), gt (x) = e−ιtx g(x). Thus, by Proposition 5.4.4
(' %
(
'%
ft (x)dx
gt (x)dx
f!
∗ g(t) =
R
=
fˆ(t)ĝ(t).
R
!
The process of recovering f from fˆ (i.e., that of finding an inversion
formula) can be developed along the lines of Fejer’s theorem (Theorem
5.6.1).
Theorem 5.7.3: (Fejer’s theorem). Let f ∈ L1 (R), fˆ(·) be as in (7.3)
and
% T
1
ST (f, x) ≡
(7.6)
fˆ(t)eιtx dt, T ≥ 0,
2π −T
%
1 R
ST (f, x)dT, R ≥ 0.
(7.7)
DR (f, x) ≡
R 0
(i) If f is continuous at x0 and f is bounded on R, then
lim DR (f, x0 ) = f (x0 ).
R→∞
(7.8)
(ii) If f is uniformly continuous and bounded on R, then
lim DR (f, ·) = f (·) uniformly on R.
R→∞
(7.9)
5.7 Fourier transforms on R
(iii) As R → ∞,
DR (f, ·) → f (·)
L1 (R).
in
175
(7.10)
(iv) If f ∈ Lp (R), 1 ≤ p < ∞, then as R → ∞,
DR (f, ·) → f (·)
in
Lp (R).
(7.11)
Corollary 5.7.4: (Uniqueness theorem). If f and g ∈ L1 (R) and fˆ(·) =
ĝ(·), then f = g a.e. (m).
Proof: Let h = f − g. Then h ∈ L1 (R) and ĥ(·) ≡ 0. Thus, ST (h, ·) ≡ 0
and DR (h, ·) ≡ 0 where ST (h, ·) and DR (h, ·) are as in (7.6) and (7.7).
Hence by Theorem 5.7.3 (iii), h = 0 a.e. (m), i.e., f = g a.e. (m).
!
Corollary 5.7.5: (Inversion formula). Let f ∈ L1 (R) and fˆ ∈ L1 (R).
Then
%
1
(7.12)
fˆ(t)eιtx dx a.e. (m).
f (x) =
2π R
Proof: Since fˆ ∈ L1 (R), by the DCT,
%
1
fˆ(t)eιtx dt
ST (f, x) →
2π R
as
T →∞
for all x in R and hence DR (f, x) has the same limit as R → ∞. Now (7.12)
follows from (7.10).
!
The following results, i.e., Lemma 5.7.6 and Lemma 5.7.7, are needed for
the proof of Theorem 5.7.3. The first one is an analog of Lemma 5.6.2.
Lemma 5.7.6: (Fejer ). For R > 0, let
KR (x) ≡
1 1
2π R
%
R
0
Then
(i)
KR (x) =
@
'%
T
−T
(
eιtx dt dT.
1 (1−cos Rx)
,
π
x2
x )= 0
R
2π
x = 0,
(7.13)
(7.14)
and hence KR (·) ≥ 0.
(ii) For δ > 0,
%
|x|≥δ
KR (x)dx → 0
as
R → ∞.
(7.15)
176
5. Product Measures, Convolutions, and Transforms
(iii)
%
∞
KR (x)dx = 1.
(7.16)
−∞
Proof:
(i) KR (0) =
1
2πR
&R
0
(2T )dT =
KR (x)
1
2π R.
For x )= 0 and R > 0,
%
=
1
2πR
=
1 2(1 − cos Rx)
.
2πR
x2
R
0
2 sin T x
dT
x
(ii) For δ > 0,
%
0≤
KR (x)dx
|x|≥δ
%
≤
2
2πR
=
2 1
→0
πR δ
|x|≥δ
1
dx
x2
as
R → ∞.
(iii)
%
∞
KR (x)dx
=
−∞
=
%
2 1 ∞ ' 1 − cos Rx (
dx
π R 0
x2
%
2 ∞ ' 1 − cos u (
du.
π 0
u2
Now
%
∞
1 − cos u
du
u2
0
% L'
1 − cos u (
= lim
du (by the MCT)
L→∞ 0
u2
% L'% u
(1
= lim
sin x dx 2 du
L→∞ 0
u
0
% L
'% L 1
(
= lim
sin x
du
dx (by Fubini’s theorem)
2
L→∞ 0
x u
%
' % L sin x
(
1 L
= lim
dx −
sin x dx
L→∞
x
L 0
0
% L
J% L
J
sin x
J
J
= lim
dx since J
sin x dxJ ≤ 1.
L→∞ 0
x
0
5.7 Fourier transforms on R
Thus,
%
∞
1 − cos u
du =
u2
0
(cf. Problem 5.9). Hence (iii) follows.
%
0
∞
π
sin x
dx =
x
2
Lemma 5.7.7: Let f ∈ Lp (R), 0 < p < ∞. Then
% ∞
|f (x − u) − f (x)|p dx → 0 as |u| → 0.
177
(7.17)
!
(7.18)
−∞
Proof: The lemma holds if f ∈ CK , i.e., if f is continuous on R (with
values in C) and vanishes outside a bounded interval. By Theorem 2.3.14,
such functions are dense in Lp (R). So given f ∈ Lp (R), 0 < p < ∞ and
# > 0, let g ∈ CK be such that
%
|f − g|p dm < #.
For any 0 < p < ∞, there is a 0 < cp < ∞ such that for all x, y, z ∈ (0, ∞),
|x + y + z|p ≤ cp (|x|p + |y|p + |z|p ). Then,
%
|f (x − u) − f (x)|p dx
%
!
≤ cp
|f (x − u) − g(x − u)|p + |g(x − u) − g(x)|p
%
"
+ |g(x) − f (x)|p dx
%
'
(
= cp 2# + |g(x − u) − g(x)|p du.
So
lim
|u|→0
%
|f (x − u) − f (x)|p dx ≤ cp 2#.
Since # > 0 is arbitrary, the lemma is proved.
Proof of Theorem 5.7.3: From (7.7)
% R '% T
'% ∞
( (
1
ιtx
DR (f, x) ≡
e
e−ιty f (y)dy dt dT
2πR 0
−T
−∞
% R '% T '% ∞
( (
1 1
eιtu f (x − u)du dt dT.
=
2π R 0
−T
−∞
!
Now Fubini’s theorem yields
% ∞
' 1 1 % R '% T
( (
DR (f, x) =
f (x − u)
eιtu dt dR du
2π R 0
−∞
−T
% ∞
f (x − u)KR (u)du
(7.19)
=
−∞
178
5. Product Measures, Convolutions, and Transforms
where KR (·) is as in (7.13).
Now let f be continuous at x0 and bounded on R by Mf . Fix # > 0 and
choose δ > 0 such that |x − x0 | < δ ⇒ |f (x) − f (x0 )| < #. From (7.16) and
(7.19),
%
!
"
DR (f, x0 ) − f (x0 ) =
f (x0 − u) − f (x0 ) KR (u)du
implying
|DR (f, x0 ) − f (x0 )|
≤
%
|u|<δ
+
%
|f (x0 − u) − f (x0 )|KR (u)du
|f (x0 − u) − f (x0 )|KR (u)du
%
< # + 2Mf
KR (u)du.
|u|≥δ
|u|≥δ
Now from (7.15), it follows that
lim |DR (f, x0 ) − f (x0 )| ≤ #,
R→∞
proving (i).
The proof of (ii) is similar to this and is omitted.
Clearly (iv) implies (iii). To establish (iv), note that for
& 1 ≤ p < ∞, by
Jensen’s inequality (which applies since KR (u) ≥ 0 and KR (u)du = 1),
for x in R,
%
J!
"Jp
|DR (f, x) − f (x)|p ≤ J f (x − u) − f (x) J KR (u)du
and hence
% '%
%
(
|f (x − u) − f (x)|p dx KR (u)du.
|DR (f, x) − f (x)|p dx ≤
Now (7.18) and the arguments in the proof of (i) yield (iv).
!
5.8 Plancherel transform
If f ∈ L2 (R), it need not be in L1 (R) and so the Fourier transform is not
defined. However, it is possible to extend the definition using an approximation procedure due to Plancherel.
&
Proposition 5.8.1: Let f ∈ L1 (R) ∩ L2 (R) and let fˆ(t) ≡ f (x)e−ιtx dx,
t ∈ R. Then, fˆ ∈ L2 (R) and
%
%
1
|f |2 dm =
|fˆ|2 dm.
(8.1)
2π
5.8 Plancherel transform
179
Proof: Let f˜(x) = f (−x) and g = f ∗ f˜. Since f and f˜ are in L1 (R), g is
well defined and is in L1 (R). Further by Cauchy-Schwarz inequality,
%
|f (x1 − u) − f (x2 − u)| |f˜(u)|du
|g(x1 ) − g(x2 )| ≤
(1/2 ' %
(
'%
2
|f˜(u)|2 du .
≤
|f (x1 − u) − f (x2 − u)| du
By Lemma 5.7.7, the right side goes to zero uniformly as |x1 − x2 | → 0.
Thus g(·) is uniformly continuous. Also, by Cauchy-Schwarz inequality,
J
J%
J
J
|g(x)| = J f (x − u)f˜(u)duJ
(1/2 ' %
(1/2
'%
2
≤
|f (u)| du
|f˜(u)|2 du
%
=
|f (u)|2 du,
and hence g(·) is bounded. Thus, g is continuous, bounded, and integrable
on R. By Theorem 5.7.3 (Fejer’s theorem)
DR (g, 0) → g(0)
But
g(0) =
%
as R → ∞.
(8.2)
%
(8.3)
f (u)f (−u)du =
|f |2 dm.
ˆ
By Proposition 5.7.2, ĝ(t) = fˆ(t)f˜(t). Also, note that
%
ˆ
˜
f (t) =
f˜(x)e−ιtx dx
%
=
f (−x)e−ιtx dx
fˆ(t)
=
and hence ĝ(t) = |fˆ(t)|2 ≥ 0. But
DR (g, 0)
=
=
Since ĝ(·) ≥ 0, by the MCT,
%
0
R
%
%
(
1 1 R' T
ĝ(t)dt dT
2π R 0
−T
% R
'
|t| (
1
ĝ(t) 1 −
dt.
π 0
R
% ∞
'
|t| (
dt ↑
ĝ(t) 1 −
ĝ(t)dt
R
0
as
R ↑ ∞.
180
5. Product Measures, Convolutions, and Transforms
Thus, lim DR (g, 0) =
R→∞
1
π
&∞
0
ĝ(t)dt. Since ĝ(−t) = ĝ(t) for all t in R,
1
lim DR (g, 0) =
R→∞
2π
Clearly, (8.2)–(8.4) imply (8.1).
%
∞
1
ĝ(t)dt =
2π
−∞
%
|fˆ(t)|2 dt.
(8.4)
!
Proposition 5.8.2: Let f ∈ L2 (R) and fn (·) ≡ f I[−n,n] (·), n ≥ 1.
Then, {fn }n≥1 , {fˆn }n≥1 are both Cauchy in L2 (R) and hence, convergent
in L2 (R).
Proof: Since for each n ≥ 1, fn ∈ L2 (R) ∩ L1 (R), by Proposition 5.8.1,
%
1
7fn1 − fn2 722 =
|fˆn1 − fˆn2 |2 dm.
(8.5)
2π
Since f ∈ L2 (R), fn → f in L2 (R), {fn }n≥1 is Cauchy in L2 (R). By (8.5)
{fˆn }n≥1 is also Cauchy in L2 (R).
!
Definition 5.8.1: Let f ∈ L2 (R). The Plancherel transform of f , denoted
by fˆ, is defined as limn→∞ fˆn , where
% n
ˆ
e−ιtx f (x)dx
(8.6)
fn (t) =
2
−n
and the limit is taken in L (R).
Theorem 5.8.3: Let f ∈ L2 (R) and fˆ be its Plancherel transform. Then
(i)
%
|f |2 dm =
1
2π
%
|fˆ|2 dm.
(8.7)
(ii) For f ∈ L1 (R) ∩ L2 (R), the Plancherel transform coincides with the
Fourier transform.
Proof: From Propositions 5.8.1 and 5.8.2 and the definition of fˆ,
%
%
%
%
1
1
|fn |2 dm = |f |2 dm
|fˆ|2 dm = lim
|fˆn |2 dm = lim
n→∞ 2π
n→∞
2π
proving (i). If f ∈ L1 (R) ∩ L2 (R), fˆn defined in (8.6) converges as n → ∞
pointwise to the Fourier transform of f (by the DCT). But by Definition
5.7.1, fˆn converges in L2 (R) to the Plancherel transform of f . So (ii) follows.
!
It can also be shown that for f , fˆ as in the above theorem, the following
inversion formula holds:
% n
1
(8.8)
fˆ(t)eιtx dt → f (x) in L2 (R)
2π −n
5.9 Problems
181
and that the map f → fˆ is a Hilbert space isomorphism of L2 (R) onto
L2 (R). For a proof, see Rudin (1987), Chapter 9. This in turn implies that
if f , g ∈ L2 (R) with Plancherel transforms fˆ and ĝ, respectively, then
%
%
1
fˆ ĝ¯ dm.
(8.9)
f ḡ dm =
2π
This is known as the Parseval identity.
5.9 Problems
5.1 Verify that if µ1 and µ2 are finite measures on (Ω1 , F1 ) and (Ω2 , F2 ),
respectively, then µ12 (·) and µ21 (·), defined by (1.4) and (1.5), respectively, are measures on (Ω1 × Ω2 , F1 × F2 ).
(Hint: Use the MCT.)
5.2 Let (Ω, F, µ) be a complete measure space such that F )= P(Ω) and
for some A0 ∈ F with A0 )= ∅, µ(A0 ) = 0. Let B ∈ P(Ω) \ F. Then
show that (µ × µ)(Ω × A0 ) = 0 but B × A0 )∈ F × F. Conclude that
(Ω×Ω, F ×F, µ×µ) is not complete. (An example of such a measure
space (Ω, F, µ) is the space ([0, 1], ML , m) where ML is the Lebesgue
σ-algebra on [0, 1] and m is the Lebesgue measure.)
5.3 Let µi , i = 1, 2 be two σ-finite measures on (R, B(R)). Let Di = {x :
µi ({x}) > 0}, i = 1, 2.
(a) Show that D1 ∪ D2 is countable.
(b) Let φi (x) = µi ({x}) x ∈ R, i = 1, 2. Show that φi is Borel
measurable for i = 1, 2.
&
$
(c) Show that φ1 dµ2 = z∈D1 ∩D2 φ1 (z)φ2 (z).
(d) Deduce (2.11) from (c).
5.4 Extend Theorem 5.2.3 as follows. Let F1 , F2 be two nondecreasing
right continuous functions on [a, b]. Then
%
%
%
F1 dF2 +
F2 dF1 = F1 (b)F2 (b) − F1 (a)F2 (a) +
φ1 dµ2 ,
(a,b]
(a,b]
(a,b]
where φ1 is as in Problem 5.3.
5.5 Let Fi : R → R be nondecreasing and right continuous, i = 1, 2.
Show that if limb↑∞ F1 (b)F2 (b) = λ1 and lima↓−∞ F1 (a)F2 (a) = λ2
exist and are finite, then
%
%
%
F1 dF2 + F2 dF1 = λ1 − λ2 + φ1 dµ2
R
R
where φ1 is as in Problem 5.3.
R
182
5. Product Measures, Convolutions, and Transforms
5.6 Let (Ω, F, µ) be a σ-finite measure
space
and f be a nonnegative
&
&
measurable function. Then, Ω f dµ = [0,∞) µ({f ≥ t})dt.
(Hint: Consider the product space of (Ω, F, µ) with (R+ , B(R+ ), m)
and apply Tonelli’s theorem to the function g(ω, t) = I(f (ω) ≥ t),
after showing that g is F × B(R+ ))-measurable.)
5.7 Let (Ω, F, P ) be a probability space and X : Ω → R+ be a random
variable.
(a) Show that for any h : R+ → R+ that is absolutely continuous,
%
%
h(X)dP = h(0) +
h$ (t)P (X ≥ t)dt
Ω
[0,∞)
%
= h(0) +
h$ (t)P (X > t)dt.
(0,∞)
(b) Show that for any 0 < p < ∞,
%
%
p
X dP =
ptp−1 P (X ≥ t)dt.
Ω
[0,∞)
(c) Show that for any 0 < p < ∞,
%
%
1
−p
X dP =
ψX (t)tp−1 dt,
Γ(p) [0,∞)
Ω
&
where Γ(p) =
t ∈ R+ .
[0,∞)
e−t tp−1 dt, p > 0, and ψX (t) =
&
Ω
e−tX dP ,
(Hint: (a) Apply Tonelli’s theorem to the function f (t, ω) ≡
h$ (t)I(X(w) ≥ t) on the product measure space ([0, ∞) ×
Ω, B([0, ∞)) × F, m × P ), where m is Lebesgue measure on
(R+ , B(R+ )).)
5.8 Let g : R+ → R+ and f : R2 → R+ be Borel measurable. Let
A = {(x, y) : x ≥ 0, 0 ≤ y ≤ g(x)}.
(a) Show that A ∈ B(R2 ).
(b) Show that
% '%
R+
[0,g(x)]
%
(
f (x, y)m(dy) m(dx) = f IA dm(2) ,
where m(2) is Lebesgue measure on (R2 , B(R2 )) and m(·) is
Lebesgue measure on R.
5.9 Problems
183
(c) If g is continuous and strictly increasing show that the two integrals in (b) equal
% '%
(
f (x, y)m(dx) m(dy).
R+
[g −1 (y),∞)
5.9 (a) For 1 < A < ∞, let
hA (t) =
%
A
0
e−xt sin x dx, t ≥ 0.
Use integration by parts to show that
|hA (t)| ≤
and
hA (t) →
1
+ e−t
1 + t2
1
1 + t2
as
A → ∞.
(b) Show using Fubini’s theorem that for 0 < A < ∞,
% A
% ∞
sin x
dx.
hA (t)dt =
x
0
0
(c) Conclude using the DCT that
% A
% ∞
sin x
1
lim
dx =
dt.
A→∞ 0
x
1
+
t2
0
(d) Using Theorem 4.4.1 and the fact that φ(x) ≡ tan x is a (1–1)
strictly monotone map from (0, π2 ) to (0, ∞) having the inverse
1
map ψ(·) with derivative ψ $ (t) ≡ 1+t
2 , 0 < t < ∞, conclude
that
% ∞
π
1
dt = .
2
1+t
2
0
& ∞ −x2 /2
Lπ
5.10 Show that I ≡ 0 e
dx = 2 .
& ∞ & ∞ (x2 +y2 )
(Hint: By Tonelli’s theorem, I 2 = 0 0 e− 2 dxdy. Now use
the change of variables x = r cos θ, y = r sin θ, 0 < r < ∞, 0 < θ <
π
2 .)
!
"
5.11 Let µ be a finite measure on R, B(R) . Let f , g : R → R+ be
nondecreasing. Show that
%
'%
(' %
(
µ(R) f g dµ ≥
f dµ
gdµ .
!
"!
"
(Hint: Consider h(x1 , x2 ) = f (x1 ) − f (x2 ) g(x1 ) − g(x2 ) on R2
and integrate w.r.t. µ × µ.)
184
5. Product Measures, Convolutions, and Transforms
!
"
5.12 Let µ and λ be σ-finite measures on R, B(R) . Recall that
% %
ν(A) ≡ (µ ∗ λ)(A) ≡
IA (x + y)dµdλ, A ∈ B(R).
(a) Show that for any Borel measurable f : R → R+ , f (x + y) is
1B(R) × B(R), B(R)2-measurable from R × R → R and
%
% %
f dν =
f (x + y)dµdλ.
(b) Show that
ν(A) =
%
R
µ(A − t)λ(dt), A ∈ B(R).
(c) Suppose there exist countable sets Bλ , Bµ such that µ(Bµc ) =
0 = λ(Bλc ). Show that there exists a countable set Bν such that
ν(Bνc ) = 0.
(d) Suppose that µ({x}) = 0 for all x in R. Show that ν({x}) = 0
for all x in R.
dµ
(e) Suppose that µ ; m with dm
= h. Show that ν ; m and find
dν
dm in terms of h, µ and λ.
(f) Suppose that µ ; m and λ ; m. Show that
dµ dλ
dν
=
∗
.
dm
dm dm
5.13 (Convolution of cdfs). Let Fi , i = 1, 2 be cdfs on R. Recall that a cdf
F on R is a function from R → R+ such that it is nondecreasing, right
continuous with F (x) → 0 as x → −∞ and F (x) → 1 as x → ∞.
&
(a) Show that (F1 ∗ F2 )(x) ≡ R F1 (x − u)dF2 (u) is well defined and
is a cdf on R.
(b) Show also that (F1 ∗ F2 )(·) = (F2 ∗ F1 )(·).
&
(c) Suppose t ∈ R is such that' etx dFi (x)('
< ∞ for i = 1,(2. Show
& tx
& tx
& tx
e dF1 (x)
e dF2 (x) .
that e d(F1 ∗ F2 )(x) =
5.14 Let f , g ∈ L1 (R, B(R), m).
(a) Show that if f is continuous and bounded on R, then so is f ∗ g.
(b) Show that if f is differentiable with a bounded derivative on R,
then so is f ∗ g.
(Hint: Use the DCT.)
5.9 Problems
185
5.15 Let f ∈ L1 (R), g ∈ Lp (R), 1 ≤ p ≤ ∞.
(a) Show that if 1 ≤ p < ∞, then for all x in R
Jp ' %
J%
(
(p−1 ' %
J
J
|g(x−u)|p |f (u)|du
|f |dm
J |f (u)g(x−u)|duJ ≤
and hence that
(f ∗ g)(x) ≡
%
f (u)g(x − u)du
is well defined a.e. (m) and
7f ∗ g7p ≤ 7f 71 7g7p
with “=” holding iff either f = 0 a.e. or g = 0 a.e.
(b) Show that if p = ∞ then
7f ∗ g7∞ ≤ 7f 71 7g7∞
and “=” can hold for some nonzero f and g.
(Hint for (a): Use Jensen’s inequality with probability measure dµ =
|f |dm
/f /1 if 7f ||1 > 0.)
5.16 Let 1 ≤ p ≤ ∞ and q = 1 − p1 . Let f ∈ Lp (R), g ∈ Lq (R).
(a) Show that f ∗ g is well defined and uniformly continuous.
(b) Show that if 1 < p < ∞,
lim (f ∗ g)(x) = 0.
|x|→∞
(Hint: For (a) use Hölder’s inequality and Lemma 5.7.7. For (b)
approximate g by simple functions.)
5.17 Let g : R → R be infinitely differentiable and be zero outside a
bounded interval.
&
(a) Let f : R → R be Borel measurable and A |f |dm < ∞ for all
bounded intervals A in R. Show that f ∗ g is well defined and
infinitely differentiable.
(b) Show that for any f ∈ L1 (R), there exist a sequence {gn }n≥1 of
such functions such that f ∗ gn → f in L1 (R).
5.18 For f ∈ L1 (R), let fσ = f ∗φσ where φσ (x) =
x ∈ R.
2
x
√ 1 e− 2σ2
2πσ
, 0 < σ < ∞,
186
5. Product Measures, Convolutions, and Transforms
(a) Show that fσ is infinitely differentiable.
(b) Show that if f is continuous and zero outside a bounded interval,
then fσ converges to g uniformly on R as σ → 0.
(c) Show that if f ∈ Lp (R), 1 ≤ p < ∞, then fσ → f in Lp (R) as
σ → 0.
5.19 Let f ∈ Lp (R), 1 ≤ p < ∞ and h(x) =
bounded Borel set and
&
A+x
f (u)du where A is a
A + x ≡ {y : y = a + x, a ∈ A}.
(a) Show that h = f ∗ g for some g bounded and with bounded
support.
(b) Show that h(·) is continuous and that lim h(x) = 0.
|x|→∞
!
(Hint: For 1 <"p < ∞, use Hölder’s inequality and show that m (A+
x1 ) + (A + x2 ) → 0 as |x1 − x2 | → 0.)
5.20 (a) Verify the claims in Examples 5.4.1 directly.
(b) Verify the same using generating functions.
5.21 Let f be a probability
density on R, i.e., f is nonnegative, Borel
&
measurable and f dm = 1. Show that |fˆ(t)| < 1 for all t )= 0.
"
&!
1 − cos t0 (x − θ)
(Hint: If |fˆ(t0 )| = 1 for some t0 )= 0, show that
f (x)dx = 0 for some θ.)
5.22 (a) Let f ∈ L1 (R) and xk f (x) ∈ L1 (R) for some k ≥ 1. Show that
fˆ(·) is k-times differentiable on R with all derivatives fˆ(r) (t) → 0
as |t| → ∞ for r ≤ k.
&
(b) Let f ∈ L1 (R). Suppose |tfˆ(t)|dt < ∞. Show that there
exists a function g : R → R such that it is differentiable,
lim|x|→∞ (|g(x)| + |g $ (x)|) = 0 and g = f a.e. (m). Extend this
&
to the case where |tk fˆ(t)|dt < ∞ for some k > 1.
& −ιtx
1
(Hint: Consider g(x) = 2π
fˆ(t)dt.)
e
5.23 Let f (x) =
2
x
√1 e− 2
2π
, x ∈ R.
(a) Show that fˆ(·) is real valued, differentiable and satisfies the ordinary differential equation fˆ$ (t) + tfˆ(t) = 0, t ∈ R and fˆ(0) = 1.
Find fˆ(t).
5.9 Problems
187
(b) For µ in R, σ > 0, let
fµ,σ (x) = √
1 x−u 2
1
e− 2 ( σ ) .
2πσ
Find fˆµ,σ and verify that for any (µi , σi ), i = 1, 2,
fµ1 ,σ1 ∗ fµ2 ,σ2 = fµ1 +µ2 ,σ12 +σ22 .
(Hint for (b): Use Fourier transforms and uniqueness.)
5.24 (Rate of convergence of Fourier series). Consider the function h(x) =
π
2 − |x| in −π ≤ x ≤ π.
& π −ιnx
1
e
h(x)dx, n = 0, ±1, ±2.
(a) Find ĥ(n) ≡ 2π
−π
$∞
(b) Show that n=−∞ |ĥ(n)| < ∞.
$+n
ιjx
(c) Show that Sn (h, x) ≡
converges to h(x) unij=−n ĥ(j)e
formly on [−π, π].
(d) Verify that
sup{|Sn (h, x) − h(x)| : −π ≤ x ≤ π} ≤
and
|Sn (h, 0) − h(0)| ≥
2
1
, n≥2
π (n − 1)
2
1
.
π (n + 2)
(Remark: This example shows that the Fourier series of a function
can converge very slowly such as in this example where the rate of
decay is n1 .)
5.25 Using Fejer’s theorem (Theorem 5.6.1) prove Wierstrass’ theorem on
uniform approximation of a continuous function on a bounded closed
interval by a polynomial.
(Hint: Show that on bounded intervals a trigonometric polynomial
can be approximated uniformly by a polynomial using the power
series representation of sine and cosine functions (see Section A.3).
5.26 Evaluate
lim
n→∞
%
n
−n
sin λx ιtx
e dx, 0 < λ < ∞, 0 < x < ∞.
x
(Hint: For 0 < λ < ∞, sinxλx ∈ L2 (R) and it is the Fourier transform
of f (t) = I[−λ,λ] (·). Now apply Plancherel theorem. Alternatively use
&n
Fubini’s theorem and the fact lim −n siny y dy exists in R.)
n→∞
188
5. Product Measures, Convolutions, and Transforms
!
"c
5.27 Find an example of a function f ∈ L2 (R) ∩ L1 (R) such that its
Plancherel transform fˆ ∈ L1 (R).
(Hint: Examine Problem 5.26.)
6
Probability Spaces
6.1 Kolmogorov’s probability model
Probability theory provides a mathematical model for random phenomena,
i.e., those involving uncertainty. First one identifies the set Ω of possible
outcomes of (random) experiment associated with the phenomenon. This
set Ω is called the sample space, and an individual element ω of Ω is called
a sample point. Even though the outcome is not predictable ahead of time,
one is interested in the “chances” of some particular statement to be valid
for the resulting outcome. The set of ω’s for which a given statement is
valid is called an event. Thus, an event is a subset of Ω. One then identifies
a class F of events, i.e., a class F of subsets of Ω (not necessarily all of
P(Ω), the power set of Ω), and then a set function P on F such that for A
in F, P (A) represents the “chance” of the event A happening. Thus, it is
reasonable to impose the following conditions on F and P :
(i) A ∈ F ⇒ Ac ∈ F
(i.e., if one can define the probability of an event A, then the probability of A not happening is also well defined).
(ii) A1 , A2 ∈ F ⇒ A1 ∪ A2 ∈ F
(i.e., if one can define the probabilities of A1 and A2 , then the probability of at least one of A1 or A2 happening is also well defined).
(iii) for all A in F, 0 ≤ P (A) ≤ 1, P (∅) = 0, and P (Ω) = 1.
190
6. Probability Spaces
(iv) A1 , A2 ∈ F, A1 ∩ A2 = ∅ ⇒ P (A1 ∪ A2 ) = P (A1 ) + P (A2 )
(i.e., if A1 and A2 are mutually exclusive events, then the probability of at least one of the two happening is simply the sum of the
probabilities).
The above conditions imply that F is an algebra and P is a finitely
additive set function. Next, as explained in Section 1.2, it is natural to
require that F be closed under monotone increasing unions and P be
monotone continuous from below. That is, if {An }n≥1 is a sequence
of events in F such that An implies An+1 (i.e., An ⊂ An+1 ) for all
n ≥ 1, then the probability of at least one of the An ’s happening
is well defined and is the limit of the corresponding probabilities.
In other words, the following conditions on F and P must hold in
addition to (i)–(iv) above:
(v) An ∈ F, A#
n ⊂ An+1 for all n = 1, 2, . . .
P (An ) ↑ P ( n≥1 An ).
⇒
#
n≥1
An , ∈ F and
As noted in Section 1.2, conditions (i)–(v) imply that (Ω, F, P ) is a measure
space, i.e., F is a σ-algebra and P is a measure on F with P (Ω) = 1. That
is, (Ω, F, P ) is a probability space. This is known as Kolmogorov’s probability model for random phenomena (see Kolmogorov (1956), Parthasarathy
(2005)). Here are some examples.
Example 6.1.1: (Finite sample spaces). Let Ω ≡ {ω1 , ω2 , . . . , ωk }, 1 ≤
$k
k < ∞, F ≡ P(Ω), the power set of Ω and P (A) = i=1 pi IA (ωi ) where
$
k
{pi }ki=1 are such that pi ≥ 0 and i=1 pi = 1. This is a probability model
for random experiments with finitely many possible outcomes.
An important application of this probability model is finite population
sampling. Let {U1 , U2 , . . . , UN } be a finite population of N units or objects.
These could be individuals in a city, counties in a state, etc. In a typical
sample survey procedure, one chooses a subset of size n (1 ≤ n ≤ N ) from
this population.
! " Let Ω denote the collection of all possible subsets of size n.
Here k = N
n , each ωi is a sample of size n and pi is the selection probability
of ωi . The assignment of {pi }ki=1 is determined by a given sampling scheme.
For example, in simple random sampling without replacement, pi = k1 for
i = 1, 2, . . . , k.
Other examples include coin tossing, rolling of dice, bridge hands, and
acceptance sampling in statistical quality control (Feller (1968)).
Example 6.1.2: (Countably infinite sample spaces).
Let Ω ≡ {ω1 , ω2 , . . .}
$∞
∞
p
I
be a countable set, F
=
P(Ω),
and
P
(A)
≡
i
A (ωi ) where {pi }i=1
i=1
$∞
satisfy pi ≥ 0 and
i=1 pi = 1. It is easy to verify that (Ω, F, P ) is a
probability space. This is a probability model for random experiments with
countably infinite number of outcomes. For example, the experiment of
tossing a coin until a “head” is produced leads to such a probability space.
6.2 Random variables and random vectors
191
Example 6.1.3: (Uncountable sample spaces).
(a) (Random variables). Let Ω = R, F = B(R), P = µF , the LebesgueStieltjes measure corresponding to a cdf F , i.e., corresponding to a
function F : R → R that is nondecreasing, right continuous, and
satisfies F (−∞) = 0, F (+∞) = 1. See Section 1.3. This serves as a
model for a single random variable X.
(b) (Random vectors). Let Ω = Rk , F = B(Rk ), P = µF , the LebesgueStieltjes measure corresponding to a (multidimensional) cdf F on Rk
where k ∈ N. See Section 1.3. This is a model for a random vector
(X1 , X2 , . . . , Xk ).
(c) (Random sequences). Let Ω = R∞ ≡ R × R × . . . be the set of all
sequences {xn }n≥1 of real numbers. Let C be the class of all finite
dimensional sets of the form A × R × R × . . ., where A ∈ B(Rk )
for some 1 ≤ k < ∞. Let F be the σ-algebra generated by C. For
each 1 ≤ k < ∞, let µk be a probability measure on B(Rk ) such
that µk+1 (A × R) = µk (A) for all A ∈ B(Rk ). Then there exists a
probability measure µ on F such that µ(A × R × R × . . .) = µk (A)
if A ∈ B(Rk ). (This will be shown later as a special case of the
Kolmogorov’s consistency theorem in Section 6.3.) This will be a
model for a sequence {Xn }n≥1 of random variables such that for
each k, 1 ≤ k < ∞, the distribution of (X1 , X2 , . . . , Xk ) is µk .
6.2 Random variables and random vectors
Recall the following definitions introduced earlier in Sections 2.1 and 2.2.
Definition 6.2.1: Let (Ω, F, P ) be a probability space and X : Ω → R
be 1F, B(R)2-measurable, that is, X −1 (A) ∈ F for all A ∈ B(R). Then, X
is called a random variable on (Ω, F, P ).
Recall that X : Ω → R is 1F, B(R)2-measurable iff for all x ∈ R, {ω :
X(ω) ≤ x} ∈ F.
Definition 6.2.2: Let X be a random variable on (Ω, F, P ). Let
FX (x) ≡ P ({ω : X(ω) ≤ x}), x ∈ R.
(2.1)
Then FX (·) is called the cumulative distribution function (cdf) of X.
Definition 6.2.3: Let X be a random variable on (Ω, F, P ). Let
PX (A) ≡ P (X −1 (A))
for all A ∈ B(R).
(2.2)
Then the probability measure PX is called the probability distribution of
X.
192
6. Probability Spaces
Note that PX is the measure induced by X on (R, B(R)) under P and
that the Lebesgue-Stieltjes measure µFX on B(R) corresponding with the
cdf FX of X is the same as PX .
Definition 6.2.4: Let (Ω, F, P ) be a probability space, k ∈ N and X :
Ω → Rk be 1F, B(Rk )2-measurable, i.e., X −1 (A) ∈ F for all A ∈ B(Rk ).
Then X is called a (k-dimensional) random vector on (Ω, F, P ).
Let X = (X1 , X2 , . . . , Xk ) be a random vector with components Xi ,
i = 1, 2, . . . , k. Then each Xi is a random variable on (Ω, F, P ). This follows
from the fact that the coordinate projection maps from Rk to R, given by
πi (x1 , x2 , . . . , xk ) ≡ xi , 1 ≤ i ≤ k
are continuous and hence, are Borel measurable. Conversely, if for 1 ≤ i ≤
k, Xi is a random variable on (Ω, F, P ), then X = (X1 , X2 , . . . , Xk ) is a
random vector (cf. Proposition 2.1.3).
Definition 6.2.5: Let X be a k-dimensional random vector on (Ω, F, P )
for some k ∈ N. Let
FX (x) ≡ P ({ω : X1 (ω) ≤ x1 , X2 (ω) ≤ x2 , . . . , Xk (ω) ≤ xk })
(2.3)
for x = (x1 , x2 , . . . , xk ) ∈ Rk . Then FX (·) is called the joint cumulative
distribution function (joint cdf) of the random vector X.
Definition 6.2.6: Let X be a k-dimensional random vector on (Ω, F, P )
for some k ∈ N. Let
PX (A) = P (X −1 (A))
for all A ∈ B(Rk ).
(2.4)
The probability measure PX is called the (joint) probability distribution of
X.
As in the case k = 1, the Lebesgue-Stieltjes measure µFX on B(Rk )
corresponding to the joint cdf FX is the same as PX .
Next, let X = (X1 , X2 , . . . , Xk ) be a random vector. Let Y =
(Xi1 , Xi2 , . . . , Xir ) for some 1 ≤ i1 < i2 < . . . < ir ≤ k and some 1 ≤ r ≤ k.
Then, Y is also a random vector. Further, the joint cdf of Y can be obtained from FX by setting the components xj , j )∈ {i1 , i2 , . . . , ir } equal
to ∞. Similarly, the probability distribution PY can be obtained from PX
as an induced measure from the projection map π(x) = (xi1 , xi2 , . . . , xir ),
x ∈ Rk . For example, if (i1 , i2 , . . . , ir ) = (1, 2, . . . , r), r ≤ k, then
FY (y1 , . . . , yr ) = FX (y1 , . . . , yr , ∞, . . . , ∞), (y1 , . . . , yr ) ∈ Rr
and
PY (A) = PX (A × R(k−r) ), A ∈ B(Rr ).
6.2 Random variables and random vectors
193
Definition 6.2.7: Let X = (X1 , X2 , . . . , Xk ) be a random vector on
(Ω, F, P ). Then, for each i = 1, . . . , k, the cdf FXi and the probability
distribution PXi of the random variable Xi are called the marginal cdf and
the marginal probability distribution of Xi , respectively.
It is clear that the distribution of X determines the marginal distribution PXi of Xi for all i = 1, 2, . . . , k. However, the marginal distributions
{PXi : i = 1, 2, . . . , k} do not uniquely determine the joint distribution PX ,
without additional conditions, such as independence (see Problem 6.1).
Definition 6.2.8: Let X be a random variable on (Ω, F, P ). The expected
value of X, denoted by EX or E(X), is defined as
%
XdP,
(2.5)
EX =
Ω
provided
& the integral
& is well defined. That is, at least one of the two quantities X + dP and X − dP is finite.
If X is a random variable on (Ω, F, P ) and h : R → R is Borel measurable, then Y = h(X) is also a random variable on (Ω, F, P ). The expected
value of Y may be computed as follows.
Proposition 6.2.1: (The change of variable formula). Let X be a random
variable on (Ω, F, P ) and h : R → R be Borel measurable. Let Y = h(X).
Then
%
%
%
|Y |dP =
|h(x)|PX (dx) =
|y|PY (dy).
(i)
Ω
(ii) If
&
Ω
R
R
|Y |dP < ∞, then
%
%
%
Y dP =
h(x)PX (dx) =
yPY (dy).
Ω
R
R
(2.6)
Proof: If h = IA for A in B(R), the proposition follows from the definition
of PX . By linearity, this extends to a nonnegative and simple function h
and by the MCT, to any nonnegative measurable h, and hence to any
measurable h.
!
Remark 6.2.1: Proposition 6.2.1 shows that the expectation of Y can be
computed in three different ways, i.e., by integrating Y with respect to P on
Ω or by integrating h(x) on R with respect to the probability distribution
PX of the random variable X or by integrating y on R with respect to the
probability distribution PY of the random variable Y .
Remark 6.2.2: If the function h is nonnegative, then the relation EY = ∫_R h(x) PX(dx) is valid even if EY = ∞.
Definition 6.2.9: For any positive integer n, the nth moment µn of a
random variable X is defined by
µn ≡ EX n ,
(2.7)
provided the expectation is well defined.
Definition 6.2.10: The variance of a random variable X is defined as
Var(X) = E(X − EX)2 , provided EX 2 < ∞.
Definition 6.2.11: The moment generating function (mgf ) of a random
variable X is defined by
MX (t) ≡ E(etX )
for all t ∈ R.
(2.8)
Since etX is always nonnegative, E(etX ) is well defined but could be
infinity. Proposition 6.2.1 gives a way of computing the moments and the
mgf of X without explicitly computing the distribution of X k or etX . As
an illustration, consider the case of a random variable X defined on the
probability space (Ω, F, P ) with Ω = {H, T }n , n ∈ N, F = the power set
of Ω and P = the probability distribution defined by
P({ω}) = p^{X(ω)} q^{n−X(ω)},
where 0 < p < 1, q = 1 − p, and X(ω) = the number of H's in ω. By the change of variable formula, the mgf of X is given by
MX(t) ≡ ∫ e^{tx} PX(dx) = Σ_{r=0}^{n} e^{tr} (n choose r) p^r q^{n−r} = (pe^t + q)^n,
since PX, the probability distribution of X, is supported on {0, 1, 2, . . . , n} with PX({r}) = (n choose r) p^r q^{n−r}. Note that PX is the Binomial (n, p) distribution. Here, MX(t) is computed using the distribution of X, i.e., using the middle term in (2.6) only.
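As a quick numerical cross-check (an illustrative sketch, not part of the text), one can compute E(e^{tX}) directly from the Binomial(n, p) probabilities, i.e., via the middle term of (2.6), and compare it with the closed form (pe^t + q)^n; the parameter values below are arbitrary.

```python
# Sketch: compare the mgf of X ~ Binomial(n, p) computed by summing over the
# support {0, 1, ..., n} with the closed form (p e^t + q)^n.
from math import comb, exp

n, p, t = 10, 0.3, 0.7
q = 1 - p

mgf_by_sum = sum(exp(t * r) * comb(n, r) * p**r * q**(n - r) for r in range(n + 1))
mgf_closed_form = (p * exp(t) + q)**n

print(mgf_by_sum, mgf_closed_form)   # the two values agree up to rounding
```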
The connection between the mgf MX (·) and the moments of a random
variable X is given in the following propositions.
Proposition 6.2.2: Let X be a nonnegative random variable and t ≥ 0. Then
MX(t) ≡ E(e^{tX}) = Σ_{n=0}^{∞} (t^n/n!) µn,   (2.9)
where µn is as in (2.7).
Proof: Since e^{tX} = Σ_{n=0}^{∞} t^n X^n / n! and X is nonnegative, (2.9) follows from the MCT. □
Proposition 6.2.3: Let X be a random variable and let MX(t) be finite for all |t| < ε, for some ε > 0. Then

(i) E|X|^n < ∞ for all n ≥ 1,

(ii) MX(t) = Σ_{n=0}^{∞} (t^n/n!) µn for all |t| < ε,

(iii) MX(·) is infinitely differentiable on (−ε, +ε) and for r ∈ N, the rth derivative of MX(·) is
M_X^{(r)}(t) = Σ_{n=0}^{∞} (t^n/n!) µ_{n+r} = E(e^{tX} X^r) for |t| < ε.   (2.10)
In particular,
M_X^{(r)}(0) = µr = EX^r.   (2.11)

Proof: Since MX(t) < ∞ for all |t| < ε,
E(e^{|tX|}) ≤ E(e^{tX}) + E(e^{−tX}) < ∞ for |t| < ε.   (2.12)
Also, e^{|tX|} ≥ |t|^n |X|^n / n! for all n ∈ N and hence (i) follows by choosing a t in (0, ε). Next note that |Σ_{j=0}^{n} (tx)^j/j!| ≤ e^{|tx|} for all x in R and all n ∈ N. Hence, by (2.12) and the DCT, (ii) follows.
Turning to (iii), since MX(·) admits a power series expansion convergent in |t| < ε, it is infinitely differentiable in |t| < ε and the derivatives of MX(·) can be found by term-by-term differentiation of the power series (see Rudin (1976), Chapter 9). Hence,
M_X^{(r)}(t) = (d^r/dt^r) Σ_{n=0}^{∞} (t^n/n!) µn = Σ_{n=0}^{∞} (µn/n!) d^r(t^n)/dt^r = Σ_{n=r}^{∞} µn t^{n−r}/(n − r)! = Σ_{n=0}^{∞} (t^n/n!) µ_{n+r}.
The verification of the second equality in (2.10) is left as an exercise (see Problem 6.4). □
Remark 6.2.3: If the mgf MX(·) is finite for |t| < ε for some ε > 0, then by part (ii) of the above proposition, MX(t) has a power series expansion in t around 0 and µn/n! is simply the coefficient of t^n. For example, if X has a N(0, 1) distribution, then for all t ∈ R,
MX(t) = ∫_{−∞}^{∞} e^{tx} (1/√(2π)) e^{−x²/2} dx = e^{t²/2} = Σ_{k=0}^{∞} (t²)^k / (k! 2^k).   (2.13)
Thus,
µn = 0 if n is odd,
µn = (2k)!/(k! 2^k) if n = 2k, k = 1, 2, . . .
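The moment formula above is easy to verify numerically. The following sketch (not from the text) integrates x^n φ(x) over a wide grid and compares the result with (2k)!/(k! 2^k); the grid and truncation point are illustrative choices.

```python
# Sketch: numerical check of mu_{2k} = (2k)!/(k! 2^k) for the N(0,1) distribution.
import numpy as np
from math import factorial

x = np.linspace(-12, 12, 200001)
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

for k in range(1, 5):
    n = 2 * k
    numeric = np.trapz(x**n * phi, x)          # approximates E X^n
    exact = factorial(2 * k) / (factorial(k) * 2**k)
    print(n, round(numeric, 6), exact)
# Odd moments integrate to (numerically) zero by symmetry.
```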
Remark 6.2.4: If MX(t) is finite for |t| < ε for some ε > 0, then the mgf determines all the moments {µn}n≥1 of X and also its probability distribution. However, in general, the sequence {µn}n≥1 of moments of X need not determine the distribution of X uniquely.
Table 6.2.1 gives the mean, variance, and the mgf of a number of standard
probability distributions on the real line.
For future reference, some of the inequalities established in Section 3.1
are specialized for random variables and collected below without proofs.
Proposition 6.2.4: (Markov's inequality). Let X be a random variable on (Ω, F, P). Then for any φ : R⁺ → R⁺ nondecreasing and any t > 0 with φ(t) > 0,
P(|X| ≥ t) ≤ E(φ(|X|)) / φ(t).   (2.14)
In particular,

(i) for r > 0, t > 0,
P(X ≥ t) ≤ P(|X| ≥ t) ≤ E|X|^r / t^r,   (2.15)

(ii) for any t ≥ 0,
P(|X| ≥ t) ≤ E(e^{θ|X|}) / e^{θt} for any θ > 0, and hence
P(|X| ≥ t) ≤ inf_{θ>0} E(e^{θ|X|}) / e^{θt}.   (2.16)
Proposition 6.2.5: (Chebychev’s inequality). Let X be a random variable
with EX² < ∞, EX = µ, Var(X) = σ². Then for any k > 0,
P(|X − µ| ≥ kσ) ≤ 1/k².   (2.17)
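As an illustration (not from the text), the following Monte Carlo sketch compares P(|X − µ| ≥ kσ), estimated by simulation for an Exponential random variable, with the Chebychev bound 1/k²; the distribution and the values of k are arbitrary choices.

```python
# Sketch: Monte Carlo check of Chebychev's inequality (2.17) for
# X ~ Exponential(beta = 1), which has mu = 1 and sigma = 1.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = 1.0, 1.0

for k in [1.5, 2.0, 3.0, 4.0]:
    tail = np.mean(np.abs(x - mu) >= k * sigma)   # estimate of P(|X - mu| >= k sigma)
    print(k, tail, 1 / k**2)                      # the estimate never exceeds the bound
```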
TABLE 6.2.1. Mean, variance, mgf of the distributions listed in Tables 4.6.1 and 4.6.2.

Distribution | Mean | Variance | mgf M(t)
Bernoulli (p), 0 < p < 1 | p | p(1 − p) | (1 − p) + pe^t, t ∈ R
Binomial (n, p), p ∈ (0, 1), n ∈ N | np | np(1 − p) | ((1 − p) + pe^t)^n, t ∈ R
Geometric (p), p ∈ (0, 1) | 1/p | (1 − p)/p² | pe^t / (1 − (1 − p)e^t), t ∈ (−∞, − log(1 − p))
Poisson (λ), λ ∈ (0, ∞) | λ | λ | exp(λ(e^t − 1)), t ∈ R
Uniform (a, b), −∞ < a < b < ∞ | (a + b)/2 | (b − a)²/12 | (e^{bt} − e^{at})/((b − a)t), t ∈ R \ {0}; M(0) = 1
Exponential (β), β ∈ (0, ∞) | β | β² | 1/(1 − βt), t ∈ (−∞, 1/β)
Gamma (α, β), α, β ∈ (0, ∞) | αβ | αβ² | (1 − βt)^{−α}, t ∈ (−∞, 1/β)
Beta (α, β), α, β ∈ (0, ∞) | α/(α + β) | αβ/[(α + β)²(α + β + 1)] | 1 + Σ_{k=1}^{∞} (∏_{r=0}^{k−1} (α + r)/(α + β + r)) t^k/k!, t ∈ R
Normal (γ, σ²), γ ∈ R, σ ∈ (0, ∞) | γ | σ² | exp(γt + t²σ²/2), t ∈ R
Cauchy (γ, σ), γ ∈ R, σ ∈ (0, ∞) | not defined, since E|X| = ∞ | not defined, since E|X|² = ∞ | ∞ for all t ≠ 0
Lognormal (γ, σ²), γ ∈ R, σ ∈ (0, ∞) | e^{γ+σ²/2} | e^{2(γ+σ²)} − e^{2γ+σ²} | ∞ for all t > 0
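A sketch (not from the text) of how one row of Table 6.2.1 can be checked by simulation; here the Gamma(α, β) row is used with arbitrary illustrative parameter values, and numpy's gamma sampler is parameterized as (shape, scale) = (α, β).

```python
# Sketch: Monte Carlo check of the Gamma(alpha, beta) row of Table 6.2.1:
# mean alpha*beta, variance alpha*beta^2, mgf (1 - beta*t)^(-alpha) for t < 1/beta.
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, t = 2.5, 0.8, 0.5          # note t < 1/beta = 1.25
x = rng.gamma(alpha, beta, size=2_000_000)

print(x.mean(), alpha * beta)                           # mean
print(x.var(), alpha * beta**2)                         # variance
print(np.exp(t * x).mean(), (1 - beta * t)**(-alpha))   # mgf at t
```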
Proposition 6.2.6: (Jensen’s inequality). Let X be a random variable
with P (a < X < b) = 1 for −∞ ≤ a < b ≤ ∞. Let φ : (a, b) → R be convex
on (a, b). Then
Eφ(X) ≥ φ(EX),
provided E|X| < ∞ and E|φ(X)| < ∞.
(2.18)
Proposition 6.2.7: (Hölder's inequality). Let X and Y be random variables on (Ω, F, P) with E|X|^p < ∞, E|Y|^q < ∞, 1 < p < ∞, 1 < q < ∞, 1/p + 1/q = 1. Then
E|XY| ≤ (E|X|^p)^{1/p} (E|Y|^q)^{1/q},   (2.19)
with equality holding iff P(c1|X|^p = c2|Y|^q) = 1 for some 0 ≤ c1, c2 < ∞.
Proposition 6.2.8: (Cauchy-Schwarz inequality). Let X and Y be random variables on (Ω, F, P) with E|X|² < ∞, E|Y|² < ∞. Then
|Cov(X, Y)| ≤ √Var(X) √Var(Y),   (2.20)
where Cov(X, Y) = EXY − EXEY. If Var(X) > 0, then equality holds in (2.20) iff P(Y = aX + b) = 1 for some constants a, b in R (Problem 6.6).
Proposition 6.2.9: (Minkowski’s inequality). Let X and Y be random
variables on (Ω, F, P ) such that E|X|p < ∞, E|Y |p < ∞ for some 1 ≤ p <
∞. Then
(E|X + Y |p )1/p ≤ (E|X|p )1/p + (E|Y |p )1/p .
(2.21)
Definition 6.2.12: (Product moments of random vectors). Let X = (X1, X2, . . . , Xk) be a random vector. The product moment of order r = (r1, r2, . . . , rk), with ri being a nonnegative integer for each i, is defined as
µr ≡ µ_{r1,r2,...,rk} ≡ E(X1^{r1} X2^{r2} · · · Xk^{rk}),   (2.22)
provided E|X1^{r1} · · · Xk^{rk}| < ∞. The joint moment generating function (joint mgf) of a random vector X = (X1, X2, . . . , Xk) is defined by
M_{X1,...,Xk}(t1, t2, . . . , tk) ≡ E(e^{t1 X1 + t2 X2 + ··· + tk Xk}),   (2.23)
for all t1, t2, . . . , tk in R.
As in the case of a random variable, if the joint mgf M_{X1,X2,...,Xk}(t1, . . . , tk) is finite for all (t1, t2, . . . , tk) with |ti| < ε for all i = 1, 2, . . . , k for some ε > 0, then an analog of Proposition 6.2.3 holds. For example, the following assertions are valid (cf. Problem 6.4):

(i) E|Xi|^n < ∞ for all i = 1, 2, . . . , k and n ≥ 1.

(ii) For t = (t1, . . . , tk) ∈ R^k and r = (r1, r2, . . . , rk) ∈ Z₊^k, let
t^r = t1^{r1} t2^{r2} · · · tk^{rk},   r! = r1! r2! · · · rk!,  and  µr = E X1^{r1} X2^{r2} · · · Xk^{rk}.   (2.24)
Then,
MX(t1, . . . , tk) = Σ_{r ∈ Z₊^k} (t^r / r!) µr   (2.25)
for all t = (t1, t2, . . . , tk) ∈ (−ε, +ε)^k.

(iii) For any r = (r1, . . . , rk) ∈ Z₊^k,
(d^r/dt^r) MX(t) |_{t=0} = µr,   (2.26)
where d^r/dt^r = ∂^{r1} ∂^{r2} · · · ∂^{rk} / (∂t1^{r1} ∂t2^{r2} · · · ∂tk^{rk}).
6.3 Kolmogorov’s consistency theorem
In the previous section, the case of a single random variable and that of a
finite dimensional random vector were discussed. The goal of this section is
to discuss infinite families of random variables such as a random sequence
{Xn }n≥1 or a random function {X(t) : 0 ≤ t < T }, 0 ≤ T ≤ ∞. For example, Xn could be the population size of the nth generation of a randomly
evolving biological population, and X(t) could be the temperature at time
t in a chemical reaction over a period [0, T ]. An example from modeling
of spatial random phenomenon is a collection {X(s) : s ∈ S} of random
variables X(s) where S is a specified region such as the U.S., and X(s) is
the amount of rainfall at location s ∈ S during a specified month.
Let (Ω, F, P ) be a probability space and {Xα : α ∈ A} be a collection of random variables defined on (Ω, F, P ), where A is a nonempty
set. Then for any (α1 , α2 , . . . , αk ) ∈ Ak , 1 ≤ k < ∞, the random vector
(Xα1 , Xα2 , . . . , Xαk ) has a joint probability distribution µ(α1 ,α2 ,...,αk ) over
(Rk , B(Rk )).
Definition 6.3.1: A (real valued) stochastic process with index set A is
a family {Xα : α ∈ A} of random variables defined on a probability space
(Ω, F, P ).
Example 6.3.1: (Examples of stochastic processes). Let Ω = [0, 1], F =
B([0, 1]), P = the Lebesgue measure on [0, 1]. Let A1 = {1, 2, 3, . . .}, A2 =
[0, T ], 0 < T < ∞. For ω ∈ Ω, n ∈ A1 , t ∈ A2 , let
Xn(ω) = sin 2πnω,
Yt(ω) = sin 2πtω,
Zn(ω) = the nth digit in the decimal expansion of ω,
V_{n,t}(ω) = Xn²(ω) + Yt²(ω).
Then {Xn : n ∈ A1 }, {Zn : n ∈ A1 }, {Vn,t : (n, t) ∈ A1 × A2 }, {Yt : t ∈ A2 }
are all stochastic processes.
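A small sketch (not from the text) of the "random function" point of view in Example 6.3.1: drawing ω uniformly from (0, 1) plays the role of the Lebesgue measure P on Ω = [0, 1], and each fixed ω gives one realization n ↦ Xn(ω), n ↦ Zn(ω).

```python
# Sketch: one sample path of two of the processes in Example 6.3.1.
import math
import random

random.seed(0)
omega = random.random()                      # a point of Omega = [0, 1]

def X(n, w):                                 # X_n(omega) = sin(2 pi n omega)
    return math.sin(2 * math.pi * n * w)

def Z(n, w):                                 # Z_n(omega) = nth decimal digit of omega
    return int(w * 10**n) % 10

print(omega)
print([round(X(n, omega), 3) for n in range(1, 6)])
print([Z(n, omega) for n in range(1, 6)])
```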
Note that a real valued stochastic process {Xα : α ∈ A} may also be
viewed as a random real valued function on the set A by the identification
ω → f (ω, ·), where f (ω, α) = Xα (ω) for α in A.
Definition 6.3.2: The family {µ(α1 ,α2 ,...,αk ) (·) ≡ P ((Xα1 , . . . , Xαk ) ∈ ·):
(α1 , α2 , . . . , αk ) ∈ Ak , 1 ≤ k < ∞} of probability distributions is called
the family of finite dimensional distributions (fdds) associated with the
stochastic process {Xα : α ∈ A}.
This family of finite dimensional distributions satisfies the following consistency conditions: For any (α1 , α2 , . . . , αk ) ∈ Ak , 2 ≤ k < ∞, and any
B1 , B2 , . . . , Bk in B(R),
C1: µ(α1 ,α2 ,...,αk ) (B1 ×· · ·×Bk−1 ×R) = µ(α1 ,α2 ,...,αk−1 ) (B1 ×· · ·×Bk−1 );
C2: For any permutation (i1 , i2 , . . . , ik ) of (1, 2, . . . , k),
µ(αi1 ,αi2 ,...,αik ) (Bi1 ×Bi2 ×· · ·×Bik ) = µ(α1 ,...,αk ) (B1 ×B2 ×· · ·×Bk ) .
To verify C1, note that
µ(α1 ,α2 ,...,,αk ) (B1 × B2 × · · · × Bk−1 × R)
= P (Xα1 ∈ B1 , Xα2 ∈ B2 , . . . , Xαk−1 ∈ Bk−1 , Xαk ∈ R)
=
P (Xα1 ∈ B1 , Xα2 ∈ B2 , . . . , Xαk−1 ∈ Bk−1 )
= µ(α1 ,α2 ,...,αk−1 ) (B1 × B2 × · · · × Bk−1 ).
Similarly, to verify C2, note that
µ(αi1 ,αi2 ,...,αik ) (Bi1 × Bi2 · · · × Bik )
= P (Xαi1 ∈ Bi1 , Xαi2 ∈ Bi2 , . . . , Xαik ∈ Bik )
= P (Xα1 ∈ B1 , Xα2 ∈ B2 , . . . , Xαk ∈ Bk )
= µ(α1 ,α2 ,...,αk ) (B1 × B2 × · · · × Bk ).
A natural question is that given a family of probability distributions QA ≡
{ν(α1 ,α2 ,...,αk ) : (α1 , α2 , . . . , αk ) ∈ Ak , 1 ≤ k < ∞} on finite dimensional
Euclidean spaces, does there exist a real valued stochastic process {Xα :
α ∈ A} such that its family of finite dimensional distributions coincides
with QA ?
Kolmogorov (1956) showed that if QA satisfies C1 and C2, then such a
stochastic process does exist. This is known as Kolmogorov’s consistency
theorem (also known as Kolmogorov’s existence theorem).
Theorem 6.3.1: (Kolmogorov’s consistency theorem). Let A be a
nonempty set. Let QA ≡ {ν(α1 ,α2 ,...,αk ) : (α1 , α2 , . . . , αk ) ∈ Ak , 1 ≤ k < ∞}
be a family of probability distributions such that for each (α1 , α2 , . . . , αk ) ∈
Ak , 1 ≤ k < ∞,
6.3 Kolmogorov’s consistency theorem
201
(i) ν(α1 ,α2 ,...,αk ) is a probability distribution on (Rk , B(Rk )),
(ii) C1 and C2 hold, i.e., for all B1 , B2 , . . . , Bk ∈ B(R), 2 ≤ k < ∞,
ν(α1 ,α2 ,...,αk ) (B1 × B2 × · · · × Bk−1 × R)
= ν(α1 ,α2 ,...,αk−1 ) (B1 × B2 × · · · × Bk−1 )
(3.1)
and for any permutation (i1 , i2 , . . . , ik ) of (1, 2, . . . , k),
ν(αi1,αi2,...,αik)(Bi1 × Bi2 × · · · × Bik)
= ν(α1,α2,...,αk)(B1 × B2 × · · · × Bk).
(3.2)
Then, there exists a probability space (Ω, F, P ) and a stochastic process
XA ≡ {Xα : α ∈ A} on (Ω, F, P ) such that QA is the family of finite
dimensional distributions associated with XA .
Remark 6.3.1: Thus the above theorem says that given the family QA satisfying conditions (i) and (ii), there exists a real valued function f on A × Ω such that for each ω, f(·, ω) is a function on A and for each (α1, α2, . . . , αk) ∈ A^k, the vector (f(α1, ω), f(α2, ω), . . . , f(αk, ω)) is a random vector with probability distribution ν(α1,α2,...,αk). This random function point of view is useful in dealing with functionals of the form M(ω) ≡ sup{f(α, ω) : α ∈ A}. For example, if A1 = {1, 2, . . .}, then one might consider functionals such as lim_{n→∞} f(n, ω), lim_{n→∞} (1/n) Σ_{j=1}^{n} f(j, ω), Σ_{j=1}^{∞} f(j, ω), etc. Since the random functionals are not fully determined
by f (α, ω) for finitely many α’s, it is not possible to compute probabilities of events defined in terms of these functionals from the knowledge of
the finite dimensional distribution of (f (α1 , ω), . . . , f (αk , ω)) for a given
(α1 , . . . , αk ), no matter how large k is. Kolmogorov’s consistency theorem
allows one to compute these probabilities given all finite dimensional distributions (provided that the functionals satisfy appropriate measurability
conditions).
Given a probability measure µ on (R, B(R)), now consider the problem
of constructing a probability space (Ω, F, P ) and a random variable X on
it with distribution µ. A natural solution is to set the sample space Ω to be
R, the σ-algebra F to be B(R), and the probability measure P to be µ and
the random variable X to be the identity map X(ω) ≡ ω. Similarly, given
a probability measure µ on (Rk , B(R)k ), one can set the sample space Ω to
be Rk and the σ-algebra F to be B(Rk ) and the probability measure P to
be µ and the random vector X to be the identity map.
Arguing in the same fashion, given a family QA of finite dimensional
distributions with index set A, to construct a stochastic process {Xα : α ∈
A} with index set A on some probability space (Ω, F, P ), it is natural to set
the sample space Ω to be RA , the collection of all real valued functions on
A, F to be a suitable σ-algebra that includes all finite dimensional events,
P to be an appropriate probability measure that yields QA , and X to be
the identity map.
These considerations lead to the following definitions.
Definition 6.3.3: Let A be a nonempty set. Then RA ≡ {f | f : A → R},
the collection of all real valued functions on A.
If A is a finite set {a1 , a2 , . . . , ak }, then RA can be identified with Rk by
associating each f ∈ RA with the vector (f (a1 ), f (a2 ), . . . , f (ak )) in Rk .
If A is a countably infinite set {a1 , a2 , a3 , . . .}, then RA can be similarly
identified with R∞ , the set of all sequences {x1 , x2 , x3 , . . .} of real numbers.
If A is the interval [0, 1], then RA is the collection of all real valued functions
on [0, 1].
Definition 6.3.4: Let A be a nonempty set. A subset C ⊂ RA is called a
finite dimensional cylinder set (fdcs) if there exists a finite subset A1 ⊂ A,
say, A1 ≡ {α1 , α2 , . . . , αk }, 1 ≤ k < ∞ and a Borel set B in B(Rk ) such
that C = {f : f ∈ RA and (f (α1 ), f (α2 ), . . . , f (αk )) ∈ B}. The set B is
called a base for C.
The collection of all finite dimensional cylinder sets will be denoted by
C.
The name cylinder is motivated by the following example:
Example 6.3.2: Let A = {1, 2, 3} and C = {(x1 , x2 , x3 ) : x21 + x22 ≤ 1}.
Then C is a cylinder (in the usual sense of the English word), but with
infinite height and depth. According to Definition 6.3.4, C is also a cylinder
in R3 with the unit circle in R2 as its base.
Examples 6.3.3 and 6.3.4 below are examples of fdcs, whereas Example
6.3.5 is an example of a set that is not a fdcs.
Example 6.3.3: Let A = {1, 2} and C = {(x1, x2) : |sin 2πx1| ≤ 1/√2}.

Example 6.3.4: Let A = {1, 2, 3, . . .} and C = {(x1, x2, x3, . . .) : x30²/10 − x42²/5 + x17²/4 ≤ 10}.
Example 6.3.5: Let A = {1, 2, 3, . . .} and D = {(x1, x2, x3, . . .) : xj ∈ R for all j ≥ 1 and lim_{n→∞} (1/n) Σ_{j=1}^{n} xj exists}. Then D is not a finite dimensional cylinder set (Problem 6.8).
Proposition 6.3.2: Let A be a nonempty set and C be the collection of
all finite dimensional cylinder sets in RA . Then C is an algebra.
Proof: Let C1, C2 ∈ C and let
C1 = {f : f ∈ R^A and (f(α1), f(α2), . . . , f(αk)) ∈ B1},
C2 = {f : f ∈ R^A and (f(β1), f(β2), . . . , f(βj)) ∈ B2},
for some A1 = {α1, α2, . . . , αk} ⊂ A, A2 = {β1, β2, . . . , βj} ⊂ A, B1 ∈ B(R^k), B2 ∈ B(R^j), 1 ≤ k < ∞, 1 ≤ j < ∞. Let A3 = A1 ∪ A2 = {γ1, γ2, . . . , γℓ}, where without loss of generality (γ1, γ2, . . . , γk) = (α1, α2, . . . , αk) and (γ_{ℓ−j+1}, . . . , γ_{ℓ−1}, γℓ) = (β1, β2, . . . , βj). Then C1 and C2 may be expressed as
C1 = {f : f ∈ R^A and (f(γ1), f(γ2), . . . , f(γℓ)) ∈ B̃1},
C2 = {f : f ∈ R^A and (f(γ1), . . . , f(γℓ)) ∈ B̃2},
where B̃1 = B1 × R^{ℓ−k} and B̃2 = R^{ℓ−j} × B2. Thus, C1 ∪ C2 = {f : f ∈ R^A and (f(γ1), . . . , f(γℓ)) ∈ B̃1 ∪ B̃2}. Since both B̃1 and B̃2 lie in B(R^ℓ), C1 ∪ C2 ∈ C.
Next note that C1^c = {f : f ∈ R^A and (f(α1), . . . , f(αk)) ∈ B1^c}. Since B1^c ∈ B(R^k), it follows that C1^c ∈ C. Thus, C is an algebra.
!
Remark 6.3.2: If A is a finite nonempty set, the collection C is also a
σ-algebra.
Definition 6.3.5: Let A be a nonempty set. Let ℛ^A be the σ-algebra generated by the collection C. Then ℛ^A is called the product σ-algebra on R^A.
Remark 6.3.3: If A = {1, 2, 3, . . .} ≡ N and R^N is identified with the set R^∞ of all sequences of real numbers, then the product σ-algebra ℛ^N coincides with the Borel σ-algebra B(R^∞) on R^∞ under the metric
d(x, y) = Σ_{j=1}^{∞} (1/2^j) |xj − yj| / (1 + |xj − yj|)   (3.3)
for x = (x1, x2, . . .), y = (y1, y2, . . .) in R^∞ (Problem 6.9).
Definition 6.3.6: Let A be a nonempty set. For any (α1 , α2 , . . . , αk ) ∈ Ak ,
1 ≤ k < ∞, the projection map π(α1 ,...,αk ) from RA to Rk is defined by
π(α1,α2,...,αk)(f) = (f(α1), f(α2), . . . , f(αk)).   (3.4)
In particular, for α ∈ A,
πα(f) = f(α)   (3.5)
is called a co-ordinate map.
The projection map π_{A1} for any arbitrary subset A1 ⊂ A may be similarly defined. The next proposition follows from the definition of ℛ^A.
Proposition 6.3.3:
(i) For each α ∈ A, the map πα from R^A to R is ⟨ℛ^A, B(R)⟩-measurable.
(ii) For any (α1, α2, . . . , αk) ∈ A^k, 1 ≤ k < ∞, the map π(α1,α2,...,αk) from R^A to R^k is ⟨ℛ^A, B(R^k)⟩-measurable.
Proof of Theorem 6.3.1: Let Ω = R^A and F ≡ ℛ^A. Define a set function P on C by
P(C) = ν(α1,α2,...,αk)(B)   (3.6)
for a C in C with representation
C = {ω : ω ∈ R^A, (ω(α1), ω(α2), . . . , ω(αk)) ∈ B}.   (3.7)
The main steps in the proof are
(i) to show that P(C) as defined in (3.6) is independent of the representation (3.7) of C, and
(ii) to show that P(·) is countably additive on C.
Next, by the Caratheodory extension theorem (Theorem 1.3.3), there exists a unique extension of P (also denoted by P) to F such that (Ω, F, P) is a probability space. Defining Xα(ω) ≡ πα(ω) = ω(α) for α in A yields a stochastic process {Xα : α ∈ A} on the probability space
(R^A, ℛ^A, P) ≡ (Ω, F, P)
with the family QA as its set of finite dimensional distributions. Hence, it remains to establish (i) and (ii). Let C ∈ C admit two representations:
C ≡ {ω : (ω(α1), ω(α2), . . . , ω(αk)) ∈ B1} ≡ π⁻¹_(α1,α2,...,αk)(B1)
and
C ≡ {ω : (ω(β1), ω(β2), . . . , ω(βj)) ∈ B2} ≡ π⁻¹_(β1,β2,...,βj)(B2)
for some A1 = {α1, α2, . . . , αk} ⊂ A, 1 ≤ k < ∞, and some A2 = {β1, β2, . . . , βj} ⊂ A, 1 ≤ j < ∞, B1 ∈ B(R^k) and B2 ∈ B(R^j). Let A3 = A1 ∪ A2 = {γ1, γ2, . . . , γℓ} and w.l.o.g., let (γ1, γ2, . . . , γk) = (α1, α2, . . . , αk) and (γ_{ℓ−j+1}, γ_{ℓ−j+2}, . . . , γ_{ℓ−1}, γℓ) = (β1, β2, . . . , βj). Then C may be represented as
C = π⁻¹_(γ1,γ2,...,γℓ)(B̃1) = π⁻¹_(γ1,γ2,...,γℓ)(B̃2)
where B̃1 = B1 × R^{ℓ−k} and B̃2 = R^{ℓ−j} × B2. Note that (ω(γ1), . . . , ω(γℓ)) ∈ B̃1 iff ω ∈ C iff (ω(γ1), . . . , ω(γℓ)) ∈ B̃2 and thus
B̃1 = B̃2.   (3.8)
Next by the first consistency condition (3.1) and induction,
ν(γ1,γ2,...,γℓ)(B̃1) = ν(α1,α2,...,αk)(B1).   (3.9)
Also by (3.2), for B2 of the form B21 × B22 × · · · × B2j with B2i ∈ B(R) for all 1 ≤ i ≤ j,
ν(γ1,γ2,...,γℓ)(R^{ℓ−j} × B2) = ν(γ_{ℓ−j+1},...,γℓ,γ1,γ2,...,γ_{ℓ−j})(B2 × R^{ℓ−j}).
Now note that
(a) ν(γ1,γ2,...,γℓ)(R^{ℓ−j} × B) and ν(γ_{ℓ−j+1},...,γℓ,γ1,γ2,...,γ_{ℓ−j})(B × R^{ℓ−j}), considered as set functions defined for B ∈ B(R^j), are probability measures on B(R^j),
(b) they coincide on the class Γ of sets of the form B = B21 × B22 × · · · × B2j with B2i ∈ B(R) for all i, and
(c) the class Γ is a π-class and it generates B(R^j).
Hence, by the uniqueness theorem (Theorem 1.3.6),
ν(γ1,γ2,...,γℓ)(R^{ℓ−j} × B) = ν(γ_{ℓ−j+1},...,γℓ,γ1,γ2,...,γ_{ℓ−j})(B × R^{ℓ−j})   (3.10)
for all B ∈ B(R^j).
Again by (3.1) and induction,
ν(γ_{ℓ−j+1},...,γℓ,γ1,γ2,...,γ_{ℓ−j})(B2 × R^{ℓ−j}) = ν(γ_{ℓ−j+1},...,γℓ)(B2) = ν(β1,β2,...,βj)(B2).   (3.11)
Since B̃2 = R^{ℓ−j} × B2, by (3.10) and (3.11),
ν(γ1,γ2,...,γℓ)(B̃2) = ν(β1,β2,...,βj)(B2).
Now from (3.8) and (3.9) it follows that
ν(α1,...,αk)(B1) = ν(γ1,γ2,...,γℓ)(B̃1) = ν(γ1,γ2,...,γℓ)(B̃2) = ν(β1,β2,...,βj)(B2),
thus establishing (i).
To establish (ii), it needs to be shown that
(ii)a P(C1 ∪ C2) = P(C1) + P(C2) if C1, C2 ∈ C and C1 ∩ C2 = ∅;
(ii)b Cn ∈ C, Cn ⊃ Cn+1 for all n, ∩_{n≥1} Cn = ∅ ⇒ P(Cn) ↓ 0.
Let C1 = π⁻¹_(α1,...,αk)(B1) and C2 = π⁻¹_(β1,...,βj)(B2) for B1 ∈ B(R^k), B2 ∈ B(R^j), {α1, . . . , αk} ⊂ A and {β1, . . . , βj} ⊂ A, 1 ≤ j, k < ∞. As in the proof of Proposition 6.3.2, C1 and C2 may be represented as
Ci = π⁻¹_(γ1,γ2,...,γℓ)(B̃i), i = 1, 2,
where B̃i ∈ B(R^ℓ). Since C1 and C2 are disjoint by hypothesis, it follows that B̃1 and B̃2 are disjoint. Also, since
P(Ci) = ν(γ1,γ2,...,γℓ)(B̃i), i = 1, 2,
and ν(γ1,...,γℓ)(·) is a measure on B(R^ℓ), it follows that
P(C1 ∪ C2) = ν(γ1,...,γℓ)(B̃1 ∪ B̃2) = ν(γ1,...,γℓ)(B̃1) + ν(γ1,...,γℓ)(B̃2) = P(C1) + P(C2),
thus proving (ii)a.
To prove (ii)b, note that for any sequence {Cn}n≥1 ⊂ C, there exists a countable set A1 = {α1, α2, . . . , αn, . . .}, an increasing sequence {kn}n≥1 of positive integers and a sequence of Borel sets {Bn}n≥1 such that Bn ∈ B(R^{kn}) and Cn = π⁻¹_(α1,α2,...,αkn)(Bn) for all n ∈ N. Now suppose that {Cn}n≥1 is decreasing. It will be shown that if lim_{n→∞} P(Cn) = δ > 0, then ∩_{n≥1} Cn ≠ ∅. For each n, by the regularity of measures (Corollary 1.3.5), there exists a compact set Gn ⊂ Bn such that
ν(α1,...,αkn)(Bn \ Gn) < δ/2^{n+1}.
Let Dn = π⁻¹_(α1,α2,...,αkn)(Gn). Then P(Cn \ Dn) < δ/2^{n+1}. Let Hn = ∩_{j=1}^{n} Dj. Then {Hn}n≥1 is decreasing and
P(Cn \ Hn) = P(Cn ∩ Hn^c) = P(∪_{j=1}^{n} (Cn ∩ Dj^c)) ≤ Σ_{j=1}^{n} P(Cn \ Dj) ≤ Σ_{j=1}^{n} P(Cj \ Dj)  (since {Cn}n≥1 is decreasing)
≤ Σ_{j=1}^{n} δ/2^{j+1} < δ/2.
Since P(Cn) ↓ δ > 0, Hn ⊂ Cn, and P(Cn \ Hn) < δ/2, it follows that P(Hn) > δ/2 for all n ≥ 1. This implies Hn ≠ ∅ for each n. It will now be shown that ∩_{n≥1} Hn ≠ ∅. Let {ωn}n≥1 be a sequence of elements
from Ω = RA such that for each n, ωn ∈ Hn . Then, since {Hn }n≥1
is a decreasing sequence, for each 1 ≤ j < ∞, ωn ∈ Hj for n ≥ j.
This implies that the vector (ωn (α1 ), ωn (α2 ), . . . , ωn (αkj )) ∈ Gj for all
n ≥ j. Since G1 is compact, there exists a subsequence {n1i }i≥1 such that
limi→∞ ωn1i (α1 ) = ω(α1 ) exists. Next, since G2 is compact, there exists a
further sequence {n2i }i≥1 of {n1i }i≥1 such that limi→∞ ωn2i (α2 ) = ω(α2 )
exists. Proceeding this way and applying the usual ‘diagonal method,’ a
subsequence {ni }i≥1 is obtained such that limi→∞ ωni (αj ) = ω(αj ) for
all 1 ≤ j < ∞. Let ω(α) = 0 for α ∉ {α1, α2, . . .}. Since for each j, Gj is compact, (ω(α1), ω(α2), . . . , ω(αkj)) ∈ Gj and hence ω ∈ Hj. Thus, ω ∈ ∩_{j≥1} Hj ⊂ ∩_{j≥1} Cj, implying ∩_{j≥1} Cj ≠ ∅. The proof of the theorem is now complete.
!
When the index set A is countable and identified with the set N ≡
{1, 2, 3, . . .}, it is possible to give a simpler formulation of the consistency
conditions.
Theorem 6.3.4: Let {µn }n≥1 be a sequence of probability measures such
that
(i) for each n ∈ N, µn is a probability measure on (Rn , B(Rn )),
(ii) for each n ∈ N, µn+1 (B × R) = µn (B) for all B ∈ B(Rn ).
Then there exists a stochastic process {Xn : n ≥ 1} on a probability space
(Ω, F, P ) with Ω = R∞ , F = B(R∞ ) such that for each n ≥ 1, the probability distribution P(X1 ,X2 ,...,Xn ) of the random vector (X1 , X2 , . . . , Xn ) is
µn .
Proof: For any {i1 , i2 , . . . , ik } ⊂ N, let j1 < j2 < · · · < jk be the increasing rearrangement of i1 , i2 , . . . , ik . Then there exists a permutation
(r1 , r2 , . . . , rk ) of (1, 2, . . . , k) such that j1 = ir1 , j2 = ir2 , . . . , jk = irk .
Now define
ν(j1,j2,...,jk)(·) ≡ µ_{jk} π⁻¹_(j1,j2,...,jk)(·),
where π(j1,j2,...,jk)(x1, . . . , x_{jk}) = (x_{j1}, x_{j2}, . . . , x_{jk}) for all (x1, x2, . . . , x_{jk}) ∈ R^{jk}.
Next define
ν(i1 ,i2 ,...,ik ) (B1 × B2 × . . . × Bk ) ≡ ν(j1 ,j2 ,...,jk ) (Br1 × Br2 × . . . × Brk )
where Bi ∈ B(R) for all i, 1 ≤ i ≤ k. It can be verified that this family of
finite dimensional distributions
QN ≡ {ν(i1 ,i2 ,...,ik ) (·) : {i1 , i2 , . . . , ik } ⊂ N, 1 ≤ k < ∞}
(3.12)
satisfies the consistency conditions (3.1) and (3.2) of Theorem 6.3.1 and
hence the assertion follows.
!
Example 6.3.6: (Sequence of independent random variables). Let
{Fn }n≥1 be a sequence of cdfs on R. Consider the problem of constructing
a sequence {Xn }n≥1 of random variables on a probability space (Ω, F, P )
such that (i) for each n ∈ N, Xn has cdf Fn and (ii) for any n ∈ N and any
{i1 , i2 , . . . , in } ⊂ N, the random variables {Xi1 , Xi2 , . . . , Xin } are independent, i.e.,
P(Xi1 ≤ x1, Xi2 ≤ x2, . . . , Xin ≤ xn) = ∏_{j=1}^{n} Fij(xj)   (3.13)
for all x1, x2, . . . , xn in R.
This problem can be solved by using Theorem 6.3.4. Let µn be the Lebesgue-Stieltjes probability measure on (R^n, B(R^n)) corresponding to the distribution function
F_{1,2,...,n}(x1, x2, . . . , xn) ≡ ∏_{j=1}^{n} Fj(xj),  x1, . . . , xn ∈ R.
It is easy to verify that the family {µn : n ≥ 1} satisfies (i) and (ii) of Theorem 6.3.4. Hence, there exists a probability measure P on the sequence space Ω ≡ R^∞ equipped with the σ-algebra F ≡ B(R^∞) and random variables Xn(ω) ≡ πn(ω) ≡ ω(n), for ω = (ω(1), ω(2), . . .) in R^∞, n ≥ 1, such that (3.13) holds.
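Theorem 6.3.4 guarantees the existence of such a sequence; on a computer, a finite segment of it can be realized by applying the quantile functions Fn⁻¹ to iid Uniform(0, 1) draws. The sketch below (not the construction used in the text) takes Fn to be the cdf of an exponential distribution with mean n, purely as an illustrative choice.

```python
# Sketch: a finite segment of an independent sequence {X_n} with X_n ~ F_n,
# obtained by applying quantile functions to iid Uniform(0,1) draws.
# Illustrative choice: F_n = Exponential cdf with mean n, so F_n^{-1}(u) = -n*log(1-u).
import numpy as np

rng = np.random.default_rng(2)

def F_inv(n, u):
    return -n * np.log1p(-u)        # quantile function of the Exponential with mean n

u = rng.uniform(size=10)            # iid Uniform(0,1)
x = np.array([F_inv(n, u[n - 1]) for n in range(1, 11)])
print(np.round(x, 3))               # X_1, ..., X_10, independent with X_n ~ F_n
```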
Example 6.3.7: (Family of independent random variables). Given a family {Fα : α ∈ A} of cdfs on R for some index set A, a construction similar to
Example 6.3.6, but using Theorem 6.3.1 yields the existence of a real valued
stochastic process {Xα : α ∈ A} such that for any {α1 , α2 , . . . , αn } ⊂ A,
1 ≤ n < ∞, the random variables {Xα1 , Xα2 , . . . , Xαn } are independent,
i.e., (3.13) holds.
Example 6.3.8: (Markov chains). Let Q = ((qij )) be a k × k stochastic
matrix for some 1 < k < ∞. That is,
(a) for all 1 ≤ i, j ≤ k, qij ≥ 0 and
(b) for each 1 ≤ i ≤ k, Σ_{j=1}^{k} qij = 1.
Let p = (p1, p2, . . . , pk) be a probability vector, i.e., for all i, pi ≥ 0, and Σ_{i=1}^{k} pi = 1. Consider the problem of constructing a sequence {Xn}n≥1 of random variables such that for each n ∈ N,
P(X1 = j1, X2 = j2, . . . , Xn = jn) = p_{j1} q_{j1 j2} · · · q_{j_{n−1} j_n}   (3.14)
for 1 ≤ ji ≤ k, i = 1, 2, . . . , n.
Let µn be the discrete probability distribution determined by the right
side of (3.14), that is,
µn ({(j1 , j2 , . . . , jn )}) = pj1 qj1 j2 . . . qjn−1 jn
for all (j1 , . . . , jn ) such that 1 ≤ ji ≤ k for all 1 ≤ i ≤ n. It is easy to
verify that {µn }n≥1 satisfies the conditions of Theorem 6.3.4 and hence
there exists a sequence {Xn}n≥1 of random variables satisfying (3.14). It
may be verified that (3.14) is equivalent to
P (Xn+1 = jn+1 |X1 = j1 , . . . , Xn = jn )
= qjn jn+1 = P (Xn+1 = jn+1 |Xn = jn )
(3.15)
for all n ≥ 1, 1 ≤ ji ≤ k, i = 1, 2, . . . , n + 1 provided P (X1 = j1 , . . . , Xn =
jn ) > 0 and P (X1 = j) = pj for 1 ≤ j ≤ k. This says that the conditional
distribution of Xn+1 given X1 , X2 , . . . , Xn depends only on Xn . This property is known as the Markov property, and the sequence {Xn }n≥1 is called
a Markov chain with state space S ≡ {1, 2, . . . , k} and time homogeneous
transition probability matrix ((qij )).
When the state space S = {1, 2, . . .}, the above construction goes over
with minor notational modifications.
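For a concrete illustration (not from the text), the following sketch simulates a chain with a given p and Q on a three-point state space and checks one instance of (3.14) by Monte Carlo; the numerical values of p and Q are arbitrary.

```python
# Sketch: simulate a Markov chain with initial distribution p and transition
# matrix Q on S = {1, 2, 3} (coded 0, 1, 2), then compare the Monte Carlo
# estimate of P(X1 = 1, X2 = 2, X3 = 2) with p_1 * q_{12} * q_{22} from (3.14).
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.5, 0.3, 0.2])
Q = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])

def sample_path(n):
    path = [rng.choice(3, p=p)]
    for _ in range(n - 1):
        path.append(rng.choice(3, p=Q[path[-1]]))
    return path

n_rep = 100_000
hits = sum(sample_path(3) == [0, 1, 1] for _ in range(n_rep))
print(hits / n_rep, p[0] * Q[0, 1] * Q[1, 1])   # the two numbers should be close
```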
Next consider the case S = R. A function Q : R × B(R) → [0, 1] is called
a probability transition function if
(i) for each x in R, Q(x, ·) is a probability measure on (R, B(R)) and
(ii) for each B in B(R), Q(·, B) is a Borel measurable function on R.
Let µ be a probability distribution on (R, B(R)). Using Theorem 6.3.4, it
can be shown that there exists a stochastic process {Xn }n≥1 such that
P(X1 ∈ B1, X2 ∈ B2, . . . , Xn ∈ Bn)
= ∫_{B1} ∫_{B2} · · · ∫_{B_{n−1}} Q(x_{n−1}, Bn) Q(x_{n−2}, dx_{n−1}) · · · Q(x1, dx2) µ(dx1),   (3.16)
where the right side of (3.16) is a well-defined probability measure on (R^n, B(R^n)) (Problem 6.18). Such a sequence {Xn}n≥1 is called a Markov chain with state space R, initial distribution µ, and transition probability function Q. For more on Markov chains, see Chapter 14.
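A transition function Q can be realized on a computer by a sampling rule for Q(x, ·). The sketch below (an illustration, not the general construction) takes µ = N(0, 1) and Q(x, ·) = N(ρx, 1), which gives the familiar autoregressive recursion X_{n+1} = ρX_n + Z_{n+1} with iid N(0, 1) innovations.

```python
# Sketch: a real-valued Markov chain with initial distribution mu = N(0, 1)
# and transition function Q(x, .) = N(rho * x, 1); one concrete choice of
# (mu, Q) in (3.16), not the general construction.
import numpy as np

rng = np.random.default_rng(4)
rho, n_steps = 0.8, 10

x = rng.normal(0.0, 1.0)                 # X_1 ~ mu
path = [x]
for _ in range(n_steps - 1):
    x = rng.normal(rho * x, 1.0)         # X_{n+1} | X_n = x  ~  Q(x, .)
    path.append(x)

print(np.round(path, 3))
```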
Example 6.3.9: (Gaussian processes). Let A be a nonempty set and
{Xα : α ∈ A} be a stochastic process. Such a process is called Gaussian
if for {α1, α2, . . . , αk} ⊂ A and real numbers t1, t2, . . . , tk, the random variable Σ_{i=1}^{k} ti Xαi has a univariate normal distribution (with possibly zero variance). For such a process, the functions µ(α) ≡ EXα and σ(α, β) ≡ Cov(Xα, Xβ) are called the mean and covariance functions, respectively.
Since Var(Σ_{i=1}^{k} ti Xαi) ≥ 0, it follows that for any t1, t2, . . . , tk,
Σ_{i=1}^{k} Σ_{j=1}^{k} ti tj σ(αi, αj) ≥ 0.   (3.17)
This property of the covariance function σ(·, ·) is called nonnegative definiteness.
A natural question is: Given functions µ : A → R and σ : A × A → R
such that σ is symmetric and satisfies (3.17), does there exist a Gaussian
process {Xα : α ∈ A} with µ(·) and σ(·, ·) as its mean and covariance
functions, respectively? The answer is yes and it follows from Theorem 6.3.1
by defining the family QA of finite dimensional distributions as follows. Let
ν(α1 ,α2 ,...,αk ) be the unique probability distribution on (Rk , B(Rk )) with the
moment generating function
M(α1,α2,...,αk)(s1, s2, . . . , sk) = exp( Σ_{i=1}^{k} si µ(αi) + (1/2) Σ_{i=1}^{k} Σ_{j=1}^{k} si sj σ(αi, αj) )   (3.18)
for s1, s2, . . . , sk in R. If the matrix Σ ≡ ((σ(αi, αj))), 1 ≤ i, j ≤ k, is positive definite, i.e., it is such that in (3.17) equality holds iff ti = 0 for all i, then ν(α1,...,αk)(·) can be shown to be a probability measure that is absolutely continuous w.r.t. m_k, the Lebesgue measure on R^k, with density
(2π)^{−k/2} |Σ|^{−1/2} exp( − Σ_{i=1}^{k} Σ_{j=1}^{k} (xi − µ(αi)) σ̃ij (xj − µ(αj)) / 2 ),
where Σ̃ ≡ ((σ̃ij)) = Σ⁻¹, the inverse of Σ, and |Σ| = the determinant of Σ.
The verification of conditions (3.1) and (3.2) for this family is left as an
exercise (Problem 6.12).
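For a finite set of indices, the finite dimensional distribution (3.18) is just a multivariate normal with mean vector (µ(αi)) and covariance matrix Σ = ((σ(αi, αj))). The sketch below (not from the text) builds Σ for the illustrative choice µ ≡ 0 and σ(s, t) = min(s, t), checks nonnegative definiteness numerically, and draws one sample.

```python
# Sketch: the finite dimensional distribution (3.18) for a Gaussian process with
# mean function 0 and covariance function sigma(s, t) = min(s, t) (illustrative),
# evaluated at a few indices.
import numpy as np

rng = np.random.default_rng(5)
alphas = np.array([0.2, 0.5, 0.9, 1.3])

mean = np.zeros(len(alphas))
cov = np.minimum.outer(alphas, alphas)          # Sigma_{ij} = sigma(alpha_i, alpha_j)

print(np.linalg.eigvalsh(cov) >= -1e-12)        # nonnegative definiteness, cf. (3.17)
sample = rng.multivariate_normal(mean, cov)     # one draw from nu_(alpha_1,...,alpha_k)
print(np.round(sample, 3))
```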
Remark 6.3.4: Kolmogorov’s consistency theorem (Theorem 6.3.1) remains valid when the real line R is replaced by a complete separable metric space S. More specifically, let A be a nonempty set and for
{α1 , α2 , . . . , αk } ⊂ A, 1 ≤ k < ∞, let ν(α1 ,α2 ,...,αk ) (·) be a probability measure on (Sk , B(Sk )). If the family QA ≡ {ν(α1 ,α2 ,...,αk ) : {α1 , α2 , . . . , αk } ⊂
A, 1 ≤ k < ∞} satisfies the natural analogs of (3.1) and (3.2), then
there exists a probability measure P on (Ω ≡ SA , F ≡ (B(S))A ) and
an S-valued stochastic process {Xα : α ∈ A} on (Ω, F, P ) such that
ν(α1 ,α2 ,...,αk ) (·) = P (Xα1 , Xα2 , . . . XαK )−1 (·). Here SA is the set of all S
valued functions on A, (B(S))A is the σ-algebra generated by the cylinder
sets of the form
C = {f : f : A → S, f (αi ) ∈ Bi , i = 1, 2, . . . , k}
where {α1 , α2 , . . . , αk } ⊂ A, Bi ∈ B(S), 1 ≤ i ≤ k, 1 ≤ k < ∞, and
also Xα (ω) is the projection map Xα (ω) ≡ ω(α). The main step in the
proof of Theorem 6.3.1 was to establish the countable additivity of the set
function P on the algebra C of finite dimensional cylinder sets. This in turn
depended upon the fact that any probability measure µ on (Rk , B(Rk )) for
1 ≤ k < ∞ is regular, i.e., for every Borel set B in B(Rk ) and for every
# > 0, there exists a compact set G ⊂ B such that µ(B\G) < #. If S is a
Polish space, then any probability measure on (Sk , (B(S))k ), 1 ≤ k < ∞ is
regular (see Billingsley (1968)), and hence, the main steps in the proof of
Theorem 6.3.1 go through in this case.
Remark 6.3.5: (Limitations of Theorem 6.3.1). In this construction, Ω =
RA is rather large and the σ-algebra F ≡ (B(R))A is not large enough to
include many events of interest when the index set A is uncountable. In
fact, it can be shown that F coincides with the class of all sets G ⊂ Ω that
depend only on a countable number of coordinates of ω. More precisely,
the following holds.
Proposition 6.3.5: The σ-algebra
F = {G : G = π⁻¹_{A1}(B) for some B in B(R^∞) and A1 ⊂ A, A1 countable}.   (3.19)
Proof: Verify that the right side of (3.19) is a σ-algebra containing the class C of cylinder sets and also that it is contained in F.
!
For example, if A = [0, 1], then the set C[0, 1] of all continuous functions
from [0, 1] → R is not a member of F ≡ (B(R))A . Similarly, if M (ω) ≡
sup{|ω(α)| : α ∈ [0, 1]}, then the set {M (ω) ≤ 1} is not in F = (B(R))[0,1] .
When A is an interval in R, this difficulty can be overcome in several ways.
One approach pioneered by J.L. Doob is the notion of separable stochastic
processes (Doob (1953)). Another approach pioneered by Kolmogorov and
Skorohod is to restrict Ω to the class of all continuous functions or functions
that are right continuous and have left limits (Billingsley (1968)). For more
on stochastic processes, see Chapter 15.
Independent Random Experiments
If E1 and E2 are two random experiments with associated probability
spaces (Ω1 , F1 , P1 ) and (Ω2 , F2 , P2 ), it is possible to model the experiment
of performing both E1 and E2 independently by the product probability
space (Ω1 × Ω2 , F1 × F2 , P1 × P2 ) (see Chapter 5). The same idea carries
over to an arbitrary collection {Eα : α ∈ A} of random experiments. It
is possible to think of a grand experiment E in which all the Eα ’s are
independent components by considering the product probability space
(×α∈A Ωα , ×α∈A Fα , ×α∈A Pα ) ≡ (Ω, F, P )
(3.20)
where (Ωα , Fα , Pα ) is the probability space corresponding to Eα . Here Ω ≡
×α∈A Ωα is the collection of all functions ω on A such that ω(α) ∈ Ωα ,
F ≡ ×α∈A Fα is the σ-algebra generated by finite dimensional cylinder sets
of the form
C = {ω : ω(αi ) ∈ Bαi , i = 1, 2, . . . , k},
(3.21)
1 ≤ k < ∞, {α1 , α2 , . . . , αk } ⊂ A, Bαi ∈ Fαi and P ≡ ×α∈A Pα is the
probability measure on F such that for C of (3.21),
P(C) = ∏_{i=1}^{k} Pαi(Bαi).   (3.22)
The proof of the existence of such a P on F is an application of the extension theorem (Theorem 1.3.3). The verification of countable additivity on
the class C of cylinder sets is not difficult. See Kolmogorov (1956).
6.4 Problems
6.1 Let µ1 = µ2 be the probability distribution on Ω = {1, 2} with
µ1 ({1}) = 1/2. Find two distinct probability distributions on Ω × Ω
with µ1 and µ2 as the set of marginals.
6.2 Let Ω = (0, 1), F = B((0, 1)) and P be the Lebesgue measure on
(0,1). Let X(ω) = − log ω, h(x) = x2 and Y = h(X). Find PX and
PY and evaluate EY by applying the change of variables formula
(Proposition 6.2.1).
6.3 In the change of variables formula, one of the three integrals is usually
easier to evaluate than the other two. In this problem, in part (a),
the first integral is easier to evaluate than the other two while in part
(b), the second one is easier.
(a) Let Z ∼ N (0, 1), X = Z 2 , and Y = e−X .
(i) Find the distributions PX and PY on (R, B(R)).
(ii) Compute the integrals
∫_R e^{−z²} φ(z) dz,   ∫_R e^{−x} PX(dx)   and   ∫_R y PY(dy),
where φ(z) = (1/√(2π)) e^{−z²/2}, −∞ < z < ∞. Verify that all three integrals agree.
(b) Let X1, X2, . . . , Xk be iid N(0, 1) random variables. Let Y = (X1 + · · · + Xk) and Z = Y².
(i) Find the distributions of Y and Z.
(ii) Evaluate ∫_{R^k} (x1 + · · · + xk)² dP_{X1,...,Xk}(x1, . . . , xk), ∫_R y² PY(dy), and ∫_{R⁺} z PZ(dz).
(c) Let X1, X2, . . . , Xk be independent Binomial (ni, p), i = 1, 2, . . . , k, random variables. Let Y = (X1 + · · · + Xk).
(i) Find the distribution PY of Y.
(ii) Evaluate ∫_{R^k} (x1 + · · · + xk) dP_{X1,...,Xk}(x1, . . . , xk) and ∫_R y PY(dy).
6.4 Let X be a random variable such that MX(t) ≡ E(e^{tX}) < ∞ for |t| < ε for some ε > 0.
(a) Show that E(e^{tX} |X|^r) < ∞ for all r > 0 and |t| < ε.
(b) Show that M_X^{(r)}(t), the rth derivative of MX(t) for r ∈ N, satisfies M_X^{(r)}(t) = E(e^{tX} X^r) for |t| < ε.
(c) Verify (2.25).
(Hint: (a) First show that for t1 ∈ (−ε, ε), there exists a t2 ∈ (−ε, ε) such that |t1| < |t2| < ε and for some C < ∞, e^{t1 x} |x|^r ≤ C e^{|t2 x|} for all x in R.
(b) Verify that for all x ∈ R, |e^x − 1| ≤ |x| e^{|x|}. Now use (a) and the DCT to show that MX(t) is differentiable and M_X^{(1)}(t) = E(e^{tX} X) for all |t| < ε. Now complete the proof by induction.)
6.5 Let X be a random variable.
(a) Show that φ(r) ≡ (E|X|r )1/r is nondecreasing on (0, ∞).
(b) Show that φ(r) ≡ log E|X|r is convex in (0, r0 ) if E|X|r0 < ∞.
(c) Let M = sup{x : P(|X| > x) > 0}. Show that
(i) lim_{r↑∞} φ(r) = M.
(ii) lim_{n→∞} E|X|^{n+1} / E|X|^n = M.
(Hint: For M < ∞, note that E|X|^r ≥ (M − ε)^r P(|X| > M − ε) for any ε > 0.)
6.6 Show that if equality holds in (2.20), then there exist constants a and
b such that P (Y = aX + b) = 1.
(Hint: Show that there exist a constant a such that Var(Y − aX) =
0.)
6.7 Determine C and its base B explicitly in Examples 6.3.3 and 6.3.4.
6.8 (a) Show that D in Example 6.3.5 is not a finite dimensional cylinder set.
(Hint: Note that lim_{n→∞} (1/n) Σ_{j=1}^{n} xj is not determined by the values of finitely many xi's.)
(b) Find three other such examples of sets D in R∞ that are not
finite dimensional cylinder sets.
6.9 Establish the assertion in Remark 6.3.3 by completing the following
steps:
(a) Show that the coordinate map fn(x) ≡ xn from R^∞ to R is continuous under the metric d of (3.3). (Conclude, using Example 1.1.6, that ℛ^N ⊂ B(R^∞).)
(b) Let C1 ≡ {A : A = (a1, b1) × · · · × (ak, bk) × R^∞, −∞ ≤ ai < bi ≤ ∞, 1 ≤ i ≤ k, for some k < ∞} and C2 ≡ {A : A is an open ball in (R^∞, d)}. Show that σ⟨C2⟩ ⊂ σ⟨C1⟩.
(c) Show that σ⟨C2⟩ = B(R^∞) by showing that every open set in (R^∞, d) is a countable union of open balls.
6.10 Show that the family QN defined in (3.12) satisfies the consistency
conditions (3.1) and (3.2) of Theorem 6.3.1.
6.11 Verify that the family of finite dimensional distributions defined by
the right side of (3.14) satisfies the conditions of Theorem 6.3.4.
6.12 Verify that the family of distributions defined in (3.18) satisfies conditions (3.1) and (3.2) of Theorem 6.3.1.
(Hint: Use the fact that for any k ≥ 1, any µ = (µ1 , µ2 , . . . , µk ) ∈ Rk ,
and any nonnegative definite k × k matrix Σ ≡ ((σij ))k×k , there is a
unique probability distribution ν such that for any s = (s1 , s2 , . . . , sk )
in R^k,
∫_{R^k} exp( Σ_{i=1}^{k} si xi ) ν(dx) = exp( Σ_{i=1}^{k} si µi + (1/2) Σ_{i=1}^{k} Σ_{j=1}^{k} si sj σij ).
Observe that this implies that for s = (s1, s2, . . . , sk) in R^k, the induced distribution (under ν) on R by the map g(x) = Σ_{i=1}^{k} si xi from R^k → R is univariate normal with mean Σ_{i=1}^{k} si µi and variance Σ_{i=1}^{k} Σ_{j=1}^{k} si sj σij.)
6.13 Show that the set D ≡ C[0, 1] of continuous functions from [0, 1] to
R is not a member of the σ-algebra F ≡ (B(R))^{[0,1]}.
(Hint: If D ∈ F, then by Proposition 6.3.5, D is of the form π⁻¹_{A1}(B) for some B in B(R^∞), where A1 ⊂ [0, 1] is countable. Show that for any such A1 and B, there exist functions f : [0, 1] → R such that f ∈ π⁻¹_{A1}(B) but f is not continuous on [0, 1].)
6.14 Show that K ≡ {ω : ω ∈ R^{[0,1]}, sup_{0≤α≤1} |ω(α)| < 1} is not in F ≡ (B(R))^{[0,1]}.
(Hint: Observe that sup_{0≤α≤1} |ω(α)| is not determined by the values of ω(α) for countably many α's.)
6.15 Let {µi}i≥1 be a sequence of probability distributions on (R, B(R)) and let µ be a probability distribution on N with pi ≡ µ({i}), i ≥ 1.
(a) Verify that ν(·) ≡ Σ_{i≥1} pi µi(·) is a probability distribution on (R, B(R)).
(b) (i) Show that there exists a probability space (Ω, F, P ) and a
collection of independent random variables {J, X1 , X2 , . . .}
on (Ω, F, P ) such that for each i ≥ 1, Xi has distribution
µi and J ∼ µ.
(ii) Let Y = XJ , i.e., Y (ω) ≡ XJ(ω) (ω). Show that Y is a
random variable on (Ω, F, P ) and Y ∼ ν.
6.16 Let F be a cdf on R and let F be decomposed as
F = αFd + βFac + γFsc
where α, β, γ ∈ [0, 1] and α + β + γ = 1 and Fd , Fac , Fsc are discrete,
absolutely continuous, and singular continuous cdfs on R (cf. (4.5.3)).
Show that there exist independent random variables X1 , X2 , X3 and
J on some probability space such that X1 ∼ Fd , X2 ∼ Fac , X3 ∼ Fsc ,
P (J = 1) = α, P (J = 2) = β, P (J = 3) = γ and XJ ∼ F , where ∼
means “has cdf”.
6.17 Let µ be a probability measure on (R, B(R)). Let, for each x in R, F(x, ·) be a cdf on (R, B(R)). Let ψ(x, t) ≡ inf{y : F(x, y) ≥ t}, for
x in R, 0 < t < 1. Assume that ψ(·, ·) : R × (0, 1) → R is measurable.
Let X and U be independent random variables on some probability
space (Ω, F, P ) such that X ∼ µ and U ∼ uniform (0,1).
(a) Show that Y = ψ(X, U ) is a random variable.
(b) Show that P(Y ≤ y) = ∫_R F(x, y) µ(dx). (The distribution of Y
is called a mixture of distributions with µ as the mixing distribution. This is of relevance in Bayesian statistical inference.)
6.18 (a) Let (Si, 𝒮i), i = 1, 2, be two measurable spaces. Let µ be a probability measure on (S1, 𝒮1) and let Q : S1 × 𝒮2 → [0, 1] be such that for each x in S1, Q(x, ·) is a probability measure on (S2, 𝒮2) and for each B in 𝒮2, Q(·, B) is 𝒮1-measurable. Define
ν(B1 × B2) ≡ ∫_{B1} Q(x, B2) µ(dx)
on C ≡ {B1 × B2 : Bi ∈ 𝒮i, i = 1, 2}. Show that ν can be extended to be a probability measure on σ⟨C⟩ ≡ 𝒮1 × 𝒮2.
(b) Let µ and Q be as in Example 6.3.8 (cf. (3.16)). For each n ≥ 1
let νn be a set function defined by the recursive scheme
ν1(·) = µ(·),
νn+1(A × B) = ∫_A Q(x, B) νn(dx),  A ∈ B(R^n), B ∈ B(R).
Show that for each n, νn can be extended to be a probability measure on (R^n, B(R^n)). (Thus the right side of (3.16) is defined to be νn(B1 × B2 × · · · × Bn).)
6.19 (Bayesian paradigm). Consider the setup in Problem 6.18 (a). Let λ(B) ≡ ν(S1 × B) = ∫_{S1} Q(x, B) µ(dx) for all B in 𝒮2.
(a) Verify that λ is a probability measure on (S2, 𝒮2).
(b) Now fix B1 in 𝒮1. Show that there exists a function Q̃(·, B1) : S2 → [0, 1] that is ⟨𝒮2, B(R)⟩-measurable such that
ν(B1 × B2) = ∫_{B2} Q̃(x, B1) λ(dx).
(Hint: Apply the Radon-Nikodym theorem to the pair ν(B1 × ·) and λ(·).)
(c) Let Ω = S1 × S2, F = σ⟨C⟩. For ω = (s1, s2), let θ(ω) = s1 and X(ω) = s2. Think of θ as the parameter, X as the data, Q(θ, ·) as the distribution of X given θ, µ(·) as the prior distribution of θ and Q̃(x, B1) as the posterior probability that θ is in B1 given the data X = x. Compute Q̃(x, B1) when (Si, 𝒮i) = (R, B(R)), i = 1, 2, µ(·) ∼ N(0, 1), Q(θ, ·) ∼ N(θ, 1).
6.20 Let X be a random variable on some probability space (Ω, F, P ).
Recall that a random variable X is
(a) discrete if there is a finite or countable set D ≡ {aj : 1 ≤ j ≤
k ≤ ∞} such that P (X ∈ D) = 1,
(b) continuous if for every x ∈ R, P (X = x) = 0 or equivalently the
cdf FX (·) is continuous on all of R,
(c) absolutely continuous if there exists a nonnegative Borel measurable function fX(·) on R such that for any −∞ < a < b < ∞,
P(a < X ≤ b) = ∫_{(a,b]} fX(·) dm,
or equivalently the induced measure P X⁻¹ is ≪ m,
(d) singular if P X⁻¹ ⊥ m or equivalently F′X(·) = 0 a.e. m,
(e) singular continuous if it is singular and continuous.
Let g : R → R be Borel measurable and Y = g(X).
(a) Show that if X is discrete then so is Y but not conversely.
(b) Show that if X is continuous and g is (1–1) on the range of X,
then Y is continuous.
(c) Show that if X is absolutely continuous with pdf fX(·) and g is absolutely continuous on bounded intervals such that g′(·) > 0 a.e. (m), then Y is also absolutely continuous with pdf
fY(y) = fX(g⁻¹(y)) / g′(g⁻¹(y)).
(d) Let X be as in (c) above. Suppose g is absolutely continuous on bounded intervals and there exist disjoint intervals {Ij}1≤j≤k, 1 ≤ k ≤ ∞, such that ∪_{1≤j≤k} Ij = R and for each j, either g′(·) > 0 a.e. (m) on Ij or g′(·) < 0 a.e. (m) on Ij. Show that Y is also absolutely continuous with pdf
fY(y) = Σ_{xj ∈ D(y)} fX(xj) / |g′(xj)|
where D(y) ≡ {xj : xj ∈ Ij, g(xj) = y}.
(e) Use (c) to compute the pdf of Y when
(i) X ∼ N(0, 1), g(x) = e^x.
(ii) X ∼ N(0, 1), g(x) = x².
(iii) X ∼ N(0, 1), g(x) = sin 2πx.
(iv) X ∼ exp(1), g(x) = e^{−x}.
6.21 (Simple random sampling without replacement). Let S ≡ {1, 2, . . . , m}, 1 < m < ∞. Fix 1 ≤ n ≤ m. Choose an element X1 from S such that the probability that X1 = j is 1/m for all j ∈ S. Next, choose an element X2 from S − {X1} such that the probability that X2 = j is 1/(m − 1) for j ∈ S − {X1}. Continue this procedure for n steps. Write the outcome as the ordered vector ω ≡ (X1, X2, . . . , Xn).
(a) Identify the sample space Ω, the σ-algebra F and the probability
measure P for this experiment.
(b) Show that for any permutation σ of {1, 2, . . . , n}, the random
vector Yσ = (Xσ(1) , Xσ(2) , . . . , Xσ(n) ) has the same distribution
as (X1 , X2 , . . . , Xn ).
(c) Conclude that {Xi}1≤i≤n are identically distributed and that EXi, Cov(Xi, Xj), i ≠ j, are independent of i and j and compute them.
(d) Answer the same questions (a)–(c) if the sampling is changed to with replacement, i.e., at each stage i, P(Xi = j) = 1/m for all j ∈ S.
(e) In (d), let D be the number of distinct units in the sample. Find
E(D) and Var(D).
6.22 Let X be a nonnegative random variable. Show that
√(1 + (EX)²) ≤ E√(1 + X²) ≤ 1 + EX.
(Note that f(x) ≡ √(1 + x²) is convex on [0, ∞) and bounded by 1 + x.)
6.23 Let X and Y be nonnegative random variables defined on a probability space (Ω, F, P). Suppose X · Y ≥ 1 w.p. 1. Show that
EX · EY ≥ 1.
(Hint: Use Cauchy-Schwarz on √X √Y.)
6.24 Let µ be a probability measure on (R, B(R)). Show that there is a random variable X on the Lebesgue space ([0, 1], B([0, 1]), m) such that m X⁻¹ ≡ µ, where m is the Lebesgue measure. Extend this to (R^k, B(R^k)), where k is an integer > 1.
(Note: This is true for any Polish space, i.e., a complete separable metric space; see Billingsley (1968).)
7
Independence
7.1 Independent events and random variables
Although a probability space is nothing more than a measure space with the
measure of the whole space equal to one, probability theory is not merely
a subset of measure theory. A distinguishing and fundamental feature of
probability theory is the notion of independence.
Definition 7.1.1:
Let (Ω, F, P ) be a probability space and
{B1 , B2 , . . . , Bn } ⊂ F be a finite collection of events.
(i) B1, B2, . . . , Bn are called independent w.r.t. P, if
P( ∩_{j=1}^{k} Bij ) = ∏_{j=1}^{k} P(Bij)   (1.1)
for all {i1 , i2 , . . . , ik } ⊂ {1, 2, . . . , n}, 1 ≤ k ≤ n.
(ii) B1, B2, . . . , Bn are called pairwise independent w.r.t. P if P(Bi ∩ Bj) = P(Bi)P(Bj) for all i, j, i ≠ j.
Note that a collection B1 , B2 , . . . , Bn of events may be independent with
respect to one probability measure P but not with respect to another measure P′. Note also that pairwise independence does not imply independence
(Problem 7.1).
Definition 7.1.2: Let (Ω, F, P ) be a probability space. A collection of
events {Bα , α ∈ A} ⊂ F is called independent w.r.t. P if for every finite
subcollection {α1, α2, . . . , αk} ⊂ A, 1 ≤ k < ∞,
P( ∩_{i=1}^{k} Bαi ) = ∏_{i=1}^{k} P(Bαi).   (1.2)
Definition 7.1.3: Let (Ω, F, P ) be a probability space. Let A be a
nonempty set. For each α in A, let Gα ⊂ F be a collection of events.
Then the family {Gα : α ∈ A} is called independent w.r.t P if for every
choice of Bα in Gα for α in A, the collection of events {Bα : α ∈ A} is
independent w.r.t. P as in Definition 7.1.2.
Definition 7.1.4: Let (Ω, F, P ) be a probability space and let {Xα : α ∈
A} be a collection of random variables on (Ω, F, P ). Then the collection
{Xα : α ∈ A} is called independent w.r.t. P if the family of σ-algebras
{σ⟨Xα⟩ : α ∈ A} is independent w.r.t. P, where σ⟨Xα⟩ is the σ-algebra generated by Xα, i.e.,
σ⟨Xα⟩ ≡ {Xα⁻¹(B) : B ∈ B(R)}.   (1.3)
Note that the collection {Xα : α ∈ A} is independent iff for any
{α1 , α2 , . . . , αk } ⊂ A, and Bi ∈ B(R), for i = 1, 2, . . . , k, 1 ≤ k < ∞,
P(Xαi ∈ Bi, i = 1, 2, . . . , k) = ∏_{i=1}^{k} P(Xαi ∈ Bi).   (1.4)
It turns out that if (1.4) holds for all Bi of the form Bi = (−∞, xi ], xi ∈ R,
then it holds for all Bi ∈ B(R), i = 1, 2, . . . , k. This follows from the
proposition below.
Proposition 7.1.1: Let (Ω, F, P ) be a probability space. Let A be a
nonempty set. Let Gα ⊂ F be a π-system for each α in A. Let {Gα : α ∈ A}
be independent w.r.t. P. Then the family of σ-algebras {σ⟨Gα⟩ : α ∈ A} is also independent w.r.t. P.
Proof: Fix 2 ≤ k < ∞, {α1, α2, . . . , αk} ⊂ A, Bi ∈ Gαi, i = 1, 2, . . . , k − 1. Let
L ≡ {B : B ∈ σ⟨Gαk⟩, P(B1 ∩ · · · ∩ Bk−1 ∩ B) = (∏_{i=1}^{k−1} P(Bi)) P(B)}.   (1.5)
It is easy to verify that L is a λ-system. By hypothesis, L contains the π-system Gαk. Hence by the π-λ theorem (cf. Theorem 1.1.2), L = σ⟨Gαk⟩. Iterating the above argument k times completes the proof.
!
Corollary 7.1.2: A collection {Xα : α ∈ A} of random variables
on a probability space (Ω, F, P ) is independent w.r.t. P iff for any
{α1 , α2 , . . . , αk } ⊂ A and any x1 , x2 , . . . , xk in R, the joint cdf Fα1 ,α2 ,...,αk
of (Xα1 , Xα2 , . . . , Xαk ) is the product of the marginal cdfs Fαi , i.e.,
F_{α1,α2,...,αk}(x1, x2, . . . , xk) ≡ P(Xαi ≤ xi, i = 1, 2, . . . , k) = ∏_{i=1}^{k} P(Xαi ≤ xi) = ∏_{i=1}^{k} Fαi(xi).   (1.6)
Proof: For the 'if' part, let Gα ≡ {Xα⁻¹((−∞, x]) : x ∈ R}, α ∈ A. Now apply Proposition 7.1.1. The 'only if' part is easy.
!
Remark 7.1.1: If the probability distribution of (Xα1 , Xα2 , . . . , Xαk ) is
absolutely continuous w.r.t. the Lebesgue measure mk on Rk , then (1.6)
and hence the independence of {Xα1 , Xα2 , . . . , Xαk } is equivalent to the
condition that
f_{α1,α2,...,αk}(x1, x2, . . . , xk) = ∏_{i=1}^{k} fαi(xi),   (1.7)
a.e. (mk ), where f(α1 ,α2 ,...,αk ) is the joint density of (Xα1 , Xα2 , . . . , Xαk ),
and fαi is the marginal density of Xαi , i = 1, 2, . . . , k. See Problem 7.18.
Proposition 7.1.3:
Let (Ω, F, P ) be a probability space and let
{X1 , X2 , . . . , Xk }, 2 ≤ k < ∞ be a collection of random variables on
(Ω, F, P ).
(i) Then {X1, X2, . . . , Xk} is independent iff
E ∏_{i=1}^{k} hi(Xi) = ∏_{i=1}^{k} E hi(Xi)   (1.8)
for all bounded Borel measurable functions hi : R → R, i = 1, 2, . . . , k.
(ii) If X1, X2 are independent and E|X1| < ∞, E|X2| < ∞, then
E|X1X2| < ∞  and  EX1X2 = EX1 EX2.   (1.9)
Proof:
(i) If (1.8) holds, then taking hi = I_{Bi} with Bi ∈ B(R), i = 1, 2, . . . , k, yields the independence of {X1, X2, . . . , Xk}. Conversely, if {X1, X2, . . . , Xk} are independent, then (1.8) holds for hi = I_{Bi} for Bi ∈ B(R), i = 1, 2, . . . , k, and hence for simple functions {h1, h2, . . . , hk}. Now (1.8) follows from the BCT.
(ii) Note that by the change of variable formula (Proposition 6.2.1),
E|X1X2| = ∫_{R²} |x1 x2| dP_{X1,X2}(x1, x2),   E|Xi| = ∫_R |xi| dP_{Xi}(xi), i = 1, 2,
where P_{X1,X2} is the joint distribution of (X1, X2) and P_{Xi} is the marginal distribution of Xi, i = 1, 2. Also, by the independence of X1 and X2, P_{X1,X2} is equal to the product measure P_{X1} × P_{X2}. Hence, by Tonelli's theorem,
E|X1X2| = ∫_{R²} |x1 x2| dP_{X1,X2}(x1, x2) = ∫_{R²} |x1 x2| dP_{X1}(x1) dP_{X2}(x2) = ( ∫_R |x1| dP_{X1}(x1) ) ( ∫_R |x2| dP_{X2}(x2) ) = E|X1| E|X2| < ∞.
Now using Fubini's theorem, one gets (1.9).
!
Remark 7.1.2: Note that the converse to (ii) above need not hold. That is,
if X1 and X2 are two random variables such that E|X1 | < ∞, E|X2 | < ∞,
E|X1 X2 | < ∞, and EX1 X2 = EX1 EX2 , then X1 and X2 need not be
independent.
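A standard illustration of this point (a numerical sketch, not from the text): if X ∼ N(0, 1) and Y = X², then EXY = EX³ = 0 = EX · EY, so (1.9) holds, yet X and Y are clearly dependent.

```python
# Sketch: X ~ N(0,1) and Y = X^2 satisfy E(XY) = E(X)E(Y) (both about 0) but are
# not independent; e.g. P(X > 1, Y > 1) = P(X > 1) differs from P(X > 1) P(Y > 1).
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(1_000_000)
y = x**2

print(np.mean(x * y), np.mean(x) * np.mean(y))                        # both near 0
print(np.mean((x > 1) & (y > 1)), np.mean(x > 1) * np.mean(y > 1))    # these differ
```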
7.2 Borel-Cantelli lemmas, tail σ-algebras, and
Kolmogorov’s zero-one law
In this section some basic results on classes of independent events are established. These will play an important role in proving laws of large numbers
in Chapter 8.
Definition 7.2.1: Let (Ω, F) be a measurable space and {An}n≥1 be a sequence of sets in F. Then
lim sup_{n→∞} An ≡ ∩_{k=1}^{∞} ( ∪_{n≥k} An )   (2.1)
lim inf_{n→∞} An ≡ ∪_{k=1}^{∞} ( ∩_{n≥k} An ).   (2.2)
Proposition 7.2.1: Both lim sup An and lim inf An ∈ F and
lim sup An = {ω : ω ∈ An for infinitely many n},
lim inf An = {ω : ω ∈ An for all but a finite number of n}.
Proof: Since {An}n≥1 ⊂ F and F is a σ-algebra, Bk = ∪_{n≥k} An ∈ F for each k ∈ N and hence lim sup An ≡ ∩_{k=1}^{∞} Bk ∈ F. Next,
ω ∈ lim sup An ⟺ ω ∈ Bk for all k = 1, 2, . . .
⟺ for each k, there exists nk ≥ k such that ω ∈ A_{nk}
⟺ ω ∈ An for infinitely many n.
The proof for lim inf An is similar. □
In probability theory, lim sup An is referred to as the event that "An happens infinitely often (i.o.)" and lim inf An as the event that "all but finitely many An's happen."
Example 7.2.1: Let Ω = R, F = B(R), and let
An = [0, 1/n] for n odd,   An = [1 − 1/n, 1] for n even.
Then lim sup An = {0, 1}, lim inf An = ∅.
The following result on the probabilities of lim sup An and lim inf An is very useful in probability theory.
Theorem 7.2.2: Let (Ω, F, P) be a probability space and {An}n≥1 be a sequence of events in F. Then

(a) (The first Borel-Cantelli lemma). If Σ_{n=1}^{∞} P(An) < ∞, then P(lim sup An) = 0.

(b) (The second Borel-Cantelli lemma). If Σ_{n=1}^{∞} P(An) = ∞ and {An}n≥1 are pairwise independent, then P(lim sup An) = 1.

Remark 7.2.1: This result is also called a zero-one law as it asserts that for pairwise independent events {An}n≥1, P(lim sup An) = 0 or 1 according as Σ_{n=1}^{∞} P(An) < ∞ or = ∞.
Proof:
(a) Let Zn ≡ Σ_{j=1}^{n} I_{Aj}. Then Zn ↑ Z ≡ Σ_{j=1}^{∞} I_{Aj} and by the MCT, EZn ≡ Σ_{j=1}^{n} P(Aj) ↑ EZ. Thus, Σ_{j=1}^{∞} P(Aj) < ∞ ⇒ EZ < ∞ ⇒ Z < ∞ w.p. 1 ⇒ P(Z = ∞) = 0. But the event lim sup An = {Z = ∞} and so (a) follows.
(b) Without loss of generality, assume P(Aj) > 0 for some j. Let Jn = Zn/EZn for n ≥ j, where Zn is as above. Then EJn = 1 and by the pairwise independence of {An}n≥1, the variance of Jn is
Var(Jn) = Σ_{j=1}^{n} P(Aj)(1 − P(Aj)) / (EZn)² ≤ 1/(EZn).
If Σ_{j=1}^{∞} P(Aj) = ∞, then EZn = Σ_{j=1}^{n} P(Aj) ↑ ∞, by the MCT. Thus EJn ≡ 1, Var(Jn) → 0 as n → ∞. By Chebychev's inequality, for all ε > 0,
P(|Jn − 1| > ε) ≤ Var(Jn)/ε² → 0 as n → ∞.
Thus, Jn → 1 in probability and hence there exists a subsequence {nk}k≥1 such that J_{nk} → 1 w.p. 1 (cf. Theorem 2.5.2). Since EZ_{nk} ↑ ∞, this implies that Z_{nk} → ∞ w.p. 1. But {Zn}n≥1 is nondecreasing in n and hence Zn ↑ ∞ w.p. 1. Now since lim sup An = {Z = ∞}, it follows that P(lim sup An) = P(Z = ∞) = 1. □
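A numerical illustration of the dichotomy in Theorem 7.2.2 (a sketch, not from the text): for independent events An with P(An) = 1/n² the total number of occurrences is finite w.p. 1, while for P(An) = 1/n infinitely many occur w.p. 1; counting occurrences up to a finite horizon N already suggests the difference.

```python
# Sketch: independent events A_n with P(A_n) = 1/n^2 (summable) versus 1/n
# (divergent); count how many occur among the first N.
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
n = np.arange(1, N + 1)
u = rng.uniform(size=N)

print(np.sum(u < 1 / n**2))   # roughly sum 1/n^2 ~ 1.64 occurrences in total
print(np.sum(u < 1 / n))      # roughly sum 1/n ~ log N + 0.577, grows with N
```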
Proposition 7.2.3: Let {Xn}n≥1 be a sequence of random variables on some probability space (Ω, F, P).

(a) If Σ_{n=1}^{∞} P(|Xn| > ε) < ∞ for each ε > 0, then P(lim_{n→∞} Xn = 0) = 1.

(b) If {Xn}n≥1 are pairwise independent and P(lim_{n→∞} Xn = 0) = 1, then Σ_{n=1}^{∞} P(|Xn| > ε) < ∞ for each ε > 0.
n=1
Proof:
$∞
(a) Fix # > 0. Let An = {|Xn | > #}, n ≥ 1. Then n=1 P (An ) < ∞ ⇒
P (lim An ) = 0, by the first Borel-Cantelli lemma (Theorem 7.2.2 (a)).
But
(lim An )c
=
{ω : there exists n(ω) < ∞ such that for all
n ≥ n(ω), w )∈ An }
= {ω : there exists n(ω) < ∞ such that |Xn (ω)| ≤ #
= B$ , say.
for all n ≥ n(ω)}
7.2 Borel-Cantelli lemmas, tail σ-algebras, and Kolmogorov’s zero-one law
Thus,
that
$∞
n=1
P (An ) < ∞ ⇒ P (B$ ) = 1. Let B =
Since P (B ) ≤
c
3
+∞
r=1
225
B r1 . Now note
∞
4 ,
ω : lim |Xn (ω)| = 0 =
B r1 .
n→∞
$∞
r=1
r=1
P (B 1 ) = 0, P (B) = 1.
c
r
$∞
(b) Let {Xn }n≥1 be pairwise independent and n=1 P (|Xn | > #0 ) = ∞
for some #0 > 0. Let An = {|Xn | > #0 }. Since {Xn }n≥1 are pairwise
independent, so are {An }n≥1 . By the second Borel-Cantelli lemma
P (lim An ) = 1.
But ω ∈ lim An ⇒ lim supn→∞ |Xn | ≥ #0 and hence
P (limn→∞ |Xn | = 0) = 0. This contradicts the hypothesis that
P (lim supn→∞ |Xn | = 0) = 1.
!
Definition 7.2.2: The tail σ-algebra of a sequence of random variables {X_n}_{n≥1} on a probability space (Ω, F, P) is

  T = ⋂_{n=1}^∞ σ⟨{X_j : j ≥ n}⟩

and any A ∈ T is called a tail event. Further, any T-measurable random variable is called a tail random variable (w.r.t. {X_n}_{n≥1}).
Tail events are determined by the behavior of the sequence {X_n}_{n≥1} for large n, and they remain unchanged if any finite subcollection of the X_n's is dropped or replaced by another finite set of random variables. Events
such as {lim supn→∞ Xn < x} or {limn→∞ Xn = x}, x ∈ R, belong to T .
A remarkable result of Kolmogorov is that for any sequence of independent
random variables, any tail event has probability zero or one.
Theorem 7.2.4: (Kolmogorov’s 0-1 law ). Let {Xn }n≥1 be a sequence of
independent random variables on a probability space (Ω, F, P ) and let T be
the tail σ-algebra of {Xn }n≥1 . Then P (A) = 0 or 1 for all A ∈ T .
Remark 7.2.2: Note that in Proposition 7.2.3, the event A ≡ {lim_{n→∞} X_n = 0} belongs to T, and hence, by the above theorem, P(A) = 0 or 1. Thus, proving that P(A) ≠ 1 is equivalent to proving P(A) = 0. Kolmogorov's 0-1 law only restricts the possible values of the probability of a tail event like A to 0 or 1, while the Borel-Cantelli lemmas (Theorem 7.2.2) provide a tool for determining which of the two values actually obtains. On the other hand, note that Theorem 7.2.2 requires only pairwise independence of {A_n}_{n≥1}, but Kolmogorov's 0-1 law requires the full independence of the sequence {X_n}_{n≥1}.
Proof: For n ≥ 1, define the σ-algebras F_n and T_n by F_n = σ⟨{X_1, ..., X_n}⟩ and T_n = σ⟨{X_{n+1}, X_{n+2}, ...}⟩. Since X_n, n ≥ 1, are independent, F_n is independent of T_n for all n ≥ 1. Since, for each n, T = ⋂_{m=n}^∞ T_m is a sub-σ-algebra of T_n, this implies that F_n is independent of T for all n ≥ 1 and hence A ≡ ⋃_{n=1}^∞ F_n is independent of T. It is easy to check that A is an algebra (and hence, is a π-system). Hence, by Proposition 7.1.1, σ⟨A⟩ is independent of T. Since T is also a sub-σ-algebra of σ⟨A⟩ = σ⟨{X_n : n ≥ 1}⟩, this implies that T is independent of itself. Hence for any B ∈ T,

  P(B ∩ B) = P(B) · P(B),

which implies P(B) = 0 or 1.   □
Definition 7.2.3: Let (Ω, F, P) be a probability space and let X : Ω → R̄ be an ⟨F, B(R̄)⟩-measurable mapping. (Recall the definition of B(R̄) from (2.1.4).) Then X is called an extended real-valued random variable or an R̄-valued random variable.

Corollary 7.2.5: Let T be the tail σ-algebra of a sequence of independent random variables {X_n}_{n≥1} on (Ω, F, P) and let X be a ⟨T, B(R̄)⟩-measurable R̄-valued random variable from Ω to R̄. Then, there exists c ∈ R̄ such that

  P(X = c) = 1.
Proof: If P(X ≤ x) = 0 for all x ∈ R, then P(X = +∞) = 1. Hence, suppose that B ≡ {x ∈ R : P(X ≤ x) ≠ 0} ≠ ∅. Since {X ≤ x} ∈ T for all x ∈ R, P(X ≤ x) = 1 for all x ∈ B. Define c = inf{x : x ∈ B}. Check that P(X = c) = 1.   □
An immediate implication of Corollary 7.2.5 is that for any sequence
of independent random variables {Xn }n≥1 , the R̄-valued random variables
lim supn→∞ Xn and lim inf n→∞ Xn are degenerate, i.e., they are constants
w.p. 1.
Example 7.2.2: Let {X_n}_{n≥1} be a sequence of independent random variables on (Ω, F, P) with EX_n = 0, EX_n² = 1 for all n ≥ 1. Let S_n = X_1 + ... + X_n, n ≥ 1, and Φ(x) = ∫_{−∞}^x (2π)^{−1/2} exp(−y²/2) dy, x ∈ R. If P(S_n ≤ √n x) → Φ(x) for all x ∈ R, then

  lim sup_{n→∞} S_n/√n = +∞  a.s.   (2.3)

To show this, let S = lim sup_{n→∞} S_n/√n. First it will be shown that S is ⟨T, B(R̄)⟩-measurable. For any m ≥ 1, define the variables T_{m,n} = (X_{m+1} + ... + X_n)/√n and S_{m,n} = (X_1 + ... + X_m)/√n, n > m. Note that for any fixed m ≥ 1, T_{m,n} is σ⟨X_{m+1}, ...⟩-measurable and S_{m,n}(ω) → 0 as n → ∞ for all ω ∈ Ω. Hence, for any m ≥ 1,

  S = lim sup_{n→∞} (S_{m,n} + T_{m,n}) = lim sup_{n→∞} T_{m,n}

is σ⟨X_{m+1}, X_{m+2}, ...⟩-measurable. Thus, S is measurable with respect to T = ⋂_{m=1}^∞ σ⟨X_{m+1}, X_{m+2}, ...⟩. Hence, by Theorem 7.2.4, P(S = +∞) ∈ {0, 1}.

If possible, now suppose that P(S = +∞) = 0. Then, by Corollary 7.2.5, there exists c ∈ [−∞, ∞) such that P(S = c) = 1. Let A_n = {S_n > √n x}, n ≥ 1, with x = c + 1. Then,

  0 < 1 − Φ(x) = lim_{n→∞} P(A_n)
              ≤ lim_{n→∞} P(⋃_{m≥n} A_m)
              = P(⋂_{n=1}^∞ ⋃_{m=n}^∞ A_m)
              = P(S_n/√n > x i.o.)
              ≤ P(S ≥ c + 1) = 0.

This shows that P(S = +∞) must be 1. Also see Problem 7.16.
Remark 7.2.3: It will be shown in Chapter 11 that if {X_i}_{i≥1} are independent and identically distributed (iid) random variables with EX_1 = 0 and EX_1² = 1, then

  P(S_n/√n ≤ x) → Φ(x)  for all x in R.

(This is known as the central limit theorem.) Indeed, a stronger result known as the law of the iterated logarithm holds, which says that for such {X_i}_{i≥1},

  lim sup_{n→∞} S_n/√(2n log log n) = +1,  w.p. 1.
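The contrast between the √n scaling of the central limit theorem and the √(2n log log n) scaling of the law of the iterated logarithm can be watched in a simulation. The Python sketch below (an illustration only, assuming numpy and arbitrary ±1 steps and horizon) tracks the running maxima of S_n/√n and of S_n/√(2n log log n) along one long random walk.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10**6
    steps = rng.choice([-1.0, 1.0], size=n)      # iid steps with EX = 0, EX^2 = 1
    s = np.cumsum(steps)
    idx = np.arange(3, n + 1)                    # start at n = 3 so log log n > 0
    sn = s[2:]

    ratio_clt = sn / np.sqrt(idx)
    ratio_lil = sn / np.sqrt(2 * idx * np.log(np.log(idx)))

    for k in [10**3, 10**4, 10**5, 10**6 - 3]:
        print(f"n ~ {k:>7}: max S_n/sqrt(n) = {ratio_clt[:k].max():5.2f}, "
              f"max S_n/sqrt(2n log log n) = {ratio_lil[:k].max():5.2f}")

The first running maximum keeps creeping upward (slowly, in line with (2.3)), while the second one levels off near 1, as the law of the iterated logarithm predicts.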
7.3 Problems
7.1 Give an example of three events A1 , A2 , A3 on some probability space
such that they are pairwise independent but not independent.
(Hint: Consider iid random variables X_1, X_2, X_3 with P(X_1 = 1) = 1/2 = P(X_1 = 0) and the events A_1 = {X_1 = X_2}, A_2 = {X_2 = X_3}, A_3 = {X_3 = X_1}.)
7.2 Let {X_α : α ∈ A} be a collection of independent random variables on some probability space (Ω, F, P). For any subset B ⊂ A, let X_B ≡ {X_α : α ∈ B}.

(a) Let B be a nonempty proper subset of A. Show that the collections X_B and X_{B^c} are independent, i.e., the σ-algebras σ⟨X_B⟩ and σ⟨X_{B^c}⟩ are independent w.r.t. P.

(b) Let {B_γ : γ ∈ Γ} be a partition of A by nonempty proper subsets B_γ. Show that the family of σ-algebras {σ⟨X_{B_γ}⟩ : γ ∈ Γ} is independent w.r.t. P.
7.3 Let X_1, X_2 be iid standard exponential random variables, i.e.,

  P(X_1 ∈ A) = ∫_{A∩(0,∞)} e^{−x} dx,  A ∈ B(R).

Let Y_1 = min(X_1, X_2) and Y_2 = max(X_1, X_2) − Y_1. Show that Y_1 and Y_2 are independent. Generalize this to the case of three iid standard exponential random variables.
7.4 Let Ω = (0, 1), F = B((0, 1)), the Borel σ-algebra on (0,1), and let P be the Lebesgue measure on (0, 1). For each ω ∈ (0, 1), let ω = ∑_{i=1}^∞ X_i(ω)/2^i be the nonterminating binary expansion of ω.

(a) Show that {X_i}_{i≥1} are iid Bernoulli (1/2) random variables, i.e., P(X_1 = 0) = 1/2 = P(X_1 = 1).
(Hint: Let s_i ∈ {0, 1}, i = 1, 2, ..., k, k ∈ N. Show that the set {ω : 0 < ω < 1, X_i(ω) = s_i, 1 ≤ i ≤ k} is an interval of length 2^{−k}.)

(b) Show that

  Y_1 = ∑_{i=1}^∞ X_{2i−1}/2^i   (3.1)
  Y_2 = ∑_{i=1}^∞ X_{2i}/2^i   (3.2)

are independent Uniform (0,1) random variables.

(c) Using the fact that the set N × N of lattice points (m, n) is in one-to-one correspondence with N itself, construct a sequence {Y_i}_{i≥1} of iid Uniform (0,1) random variables such that for each j, Y_j is a function of {X_i}_{i≥1}.

(d) For any cdf F, show that the random variable X(ω) ≡ F^{−1}(ω) has cdf F, where

  F^{−1}(u) = inf{x : F(x) ≥ u}  for 0 < u < 1.   (3.3)

(e) Let {F_i}_{i≥1} be a sequence of cdfs on R. Using (c), construct a sequence {Z_i}_{i≥1} of independent random variables on (Ω, F, P) such that Z_i has cdf F_i, i ≥ 1.

(f) Show that the cdf of the random variable W ≡ ∑_{i=1}^∞ 2X_i/3^i is the Cantor function (cf. Section 4.5).

(g) Let p > 1 be a positive integer. For each ω ∈ (0, 1), let ω ≡ ∑_{i=1}^∞ V_i(ω)/p^i be the nonterminating p-ary expansion of ω. Show that {V_i}_{i≥1} are iid and determine the distribution of V_1.
7.5 Let {X_i}_{i≥1} be a Markov chain with state space S = {0, 1} and transition probability matrix

  Q = [ q_0  p_0 ]
      [ p_1  q_1 ],   where p_i = 1 − q_i, 0 < q_i < 1, i = 0, 1.

Let τ_1 = min{j : X_j = 0} and τ_{k+1} = min{j : j > τ_k, X_j = 0}, k = 1, 2, .... Note that τ_k is the time of the kth visit to the state 0.

(a) Show that {τ_{k+1} − τ_k : k ≥ 1} are iid random variables and independent of τ_1.

(b) Show that P_i(τ_1 < ∞) = 1 and hence P_i(τ_k < ∞) = 1 for all k ≥ 2, i = 0, 1, where P_i denotes the probability distribution with X_1 = i w.p. 1.
(Hint: Show that ∑_{k=1}^∞ P(τ_1 > k | X_1 = i) < ∞ for i = 0, 1 and use the Borel-Cantelli lemma.)

(c) Show also that E_i(e^{θ_0 τ_1}) < ∞ for some θ_0 > 0, i = 0, 1, where E_i denotes the expectation under P_i.
7.6 Let X_1 and X_2 be independent random variables.

(a) Show that for any p > 0,

  E|X_1 + X_2|^p < ∞ iff E|X_1|^p < ∞ and E|X_2|^p < ∞.

Show that this is false if X_1 and X_2 are not independent.
(Hint: Use Fubini's theorem to conclude that E|X_1 + X_2|^p < ∞ implies that E|X_1 + x_2|^p < ∞ for some x_2 and hence E|X_1|^p < ∞.)

(b) Show that if E(X_1² + X_2²) < ∞, then

  Var(X_1 + X_2) = Var(X_1) + Var(X_2).   (3.4)

Show by an example that (3.4) need not imply the independence of X_1 and X_2. Show also that if X_1 and X_2 take only two values each and (3.4) holds, then X_1 and X_2 are independent.
7.7 Let X_1 and X_2 be two random variables on a probability space (Ω, F, P).

(a) Show that, if

  P(X_1 ∈ (a_1, b_1), X_2 ∈ (a_2, b_2)) = P(X_1 ∈ (a_1, b_1)) P(X_2 ∈ (a_2, b_2))   (3.5)

for all a_1, b_1, a_2, b_2 in a dense set D in R, then X_1 and X_2 are independent.
(Hint: Show that (3.5) implies that the joint cdf of (X_1, X_2) is the product of the marginal cdfs of X_1 and X_2 and use Corollary 7.1.2.)

(b) Let f_i : R → R, i = 1, 2, be two one-one functions such that both f_i and f_i^{−1} are Borel measurable, i = 1, 2. Show that X_1 and X_2 are independent iff f_1(X_1) and f_2(X_2) are independent. Conclude that X_1 and X_2 are independent iff e^{X_1} and e^{X_2} are independent.
7.8 (a) Let X_1 and X_2 be two independent bounded random variables. Show that

  E(p_1(X_1) p_2(X_2)) = (E p_1(X_1))(E p_2(X_2))   (3.6)

where p_1(·) and p_2(·) are polynomials.

(b) Show that if X_1 and X_2 are bounded random variables and (3.6) holds for all polynomials p_1(·) and p_2(·), then X_1 and X_2 are independent.
(Hint: Use the facts that (i) continuous functions on a bounded closed interval [a, b] can be approximated uniformly by polynomials, and (ii) for any interval (c, d) ⊂ [a, b], any random variable X and ε > 0, there exists a continuous function f on [a, b] such that E|f(X) − I_{(c,d)}(X)| < ε, provided P(X = c or d) = 0.)
7.9 Let {X_n}_{n≥1} be a sequence of iid random variables on a probability space (Ω, F, P). Let R = R(ω) be the radius of convergence of the power series ∑_{n=1}^∞ X_n r^n.

(a) Show that R is a tail random variable.
(Hint: Note that R = 1 / lim sup_{n→∞} |X_n|^{1/n}.)

(b) Show that if E(log |X_1|)^+ = ∞, then R = 0 w.p. 1, and if E(log |X_1|)^+ < ∞, then R ≥ 1 w.p. 1.
(Hint: Apply the Borel-Cantelli lemmas to A_n = {|X_n| > λ^n} for each λ > 1.)
7.10 Let {A_n}_{n≥1} be a sequence of events in (Ω, F, P) such that

  ∑_{n=1}^∞ P(A_n ∩ A_{n+1}^c) < ∞

and lim_{n→∞} P(A_n) = 0. Show that

  P(lim sup_{n→∞} A_n) = 0.

Show also that lim_{n→∞} P(A_n) = 0 can be replaced by lim_{n→∞} P(⋂_{j≥n} A_j) = 0.
(Hint: Let B_n = A_n ∩ A_{n+1}^c, n ≥ 1, B = lim sup B_n, A = lim sup A_n. Show that A ∩ B^c ⊂ lim inf A_n.)
7.11 For any nonnegative random variable X, show that E|X| < ∞ iff ∑_{n=1}^∞ P(|X| > εn) < ∞ for every ε > 0.
7.12 Let {X_i}_{i≥1} be a sequence of pairwise independent and identically distributed random variables.

(a) Show that lim_{n→∞} X_n/n = 0 w.p. 1 iff E|X_1| < ∞.
(Hint: E|X_1| < ∞ ⟺ ∑_{n=1}^∞ P(|X_n| > εn) < ∞ for all ε > 0.)

(b) Show that E(log |X_1|)^+ < ∞ iff |X_n|^{1/n} → 1 w.p. 1.
7.13 Let {X_i}_{i≥1} be a sequence of identically distributed random variables and let M_n = max{|X_j| : 1 ≤ j ≤ n}.

(a) If E|X_1|^α < ∞ for some α ∈ (0, ∞), then show that

  M_n/n^{1/α} → 0  w.p. 1.   (3.7)

(Hint: Fix ε > 0. Let A_n = {|X_n| > ε n^{1/α}}. Apply the first Borel-Cantelli lemma.)

(b) Show that if {X_i}_{i≥1} are iid satisfying (3.7) for some α > 0, then E|X_1|^α < ∞.
(Hint: Apply the second Borel-Cantelli lemma.)
7.14 Let X_1 and X_2 be independent random variables with distributions μ_1 and μ_2. Let Y = X_1 + X_2.

(a) Show that the distribution μ of Y is the convolution μ_1 ∗ μ_2, as defined by

  (μ_1 ∗ μ_2)(A) = ∫_R μ_1(A − x) μ_2(dx)

(cf. Problem 5.12).

(b) Show that if X_1 has a continuous distribution, then so does Y.

(c) Show that if X_1 has an absolutely continuous distribution, then so does Y, and that the density function of Y is given by

  (dμ/dm)(x) ≡ f_Y(x) = ∫ f_{X_1}(x − u) μ_2(du),

where f_{X_1}(x) ≡ (dμ_1/dm)(x) is the probability density of X_1.

(d) Show that Y has a discrete distribution iff both X_1 and X_2 are discrete.
7.15 (AR(1) series). Let {X_n}_{n≥0} be a sequence of random variables such that for some ρ ∈ R,

  X_{n+1} = ρ X_n + ε_{n+1},  n ≥ 0,

where {ε_n}_{n≥1} are independent and identically distributed and independent of X_0.

(a) Show that if |ρ| < 1 and E(log |ε_1|)^+ < ∞, then

  X̂_n ≡ ∑_{j=0}^n ρ^j ε_{j+1}  converges w.p. 1

to a limit X̂_∞, say.

(b) Show that under the hypothesis of (a), for any bounded continuous function h : R → R and for any distribution of X_0,

  E h(X_n) → E h(X̂_∞).

(Hint: Show that for each n ≥ 1, X_n − ρ^n X_0 and X̂_n have the same distribution.)
7.16 Establish the following generalization of Example 7.2.2. Let {X_n}_{n≥1} be a sequence of independent random variables on some probability space (Ω, F, P). Suppose there exist sequences {a_n}_{n≥1}, {x_n}_{n≥1} such that a_n ↑ ∞, x_n ↑ ∞ and for each k < ∞, lim_n P(S_n ≤ a_n x_k) ≡ F(x_k) exists and is < 1. Show that lim sup_{n→∞} S_n/a_n = +∞ a.s.
7.17 (a) Let {X_i}_{i=1}^n be random variables on a probability space (Ω, F, P) and let P(X_1, X_2, ..., X_n)^{−1}(·) be dominated by the product measure μ × μ × ··· × μ, where μ is a σ-finite measure on (R, B(R)), with Radon-Nikodym derivative f(x_1, x_2, ..., x_n). Show that {X_i}_{i=1}^n are independent w.r.t. P iff f(x_1, x_2, ..., x_n) = ∏_{i=1}^n h_i(x_i) for all (x_1, x_2, ..., x_n) ∈ R^n, where for each i, h_i : R → R is Borel measurable.

(b) Use (a) to show that if (X_1, X_2) has an absolutely continuous distribution with density f(x_1, x_2), then X_1 and X_2 are independent iff

  f(x_1, x_2) = f_1(x_1) f_2(x_2),

where f_i(·) is the density of X_i.

(c) Using (a) or otherwise, conclude that if X_i, i = 1, 2, are both discrete random variables, then X_1 and X_2 are independent iff

  P(X_1 = a, X_2 = b) = P(X_1 = a) P(X_2 = b)

for all a and b.
7.18 Let {X_n}_{n≥1} be a sequence of independent random variables such that for n ≥ 1, P(X_n = 1) = 1/n = 1 − P(X_n = 0). Show that X_n →^p 0 but not w.p. 1.
7.19 Let (Ω, F, P) be a probability space.

(a) Suppose there exist events A_1, A_2, ..., A_k that are independent with 0 < P(A_i) < 1, i = 1, 2, ..., k. Show that |Ω| ≥ 2^k, where for any set A, |A| is the number of elements in A.

(b) Let {X_i}_{i=1}^k be independent random variables such that X_i takes n_i distinct values with positive probability. Show that |Ω| ≥ ∏_{i=1}^k n_i.

(c) Show that there exists a probability space (Ω, F, P) with |Ω| = 2^k and k independent events A_1, A_2, ..., A_k in F such that 0 < P(A_i) < 1, i = 1, 2, ..., k.
7.20 (a) Let Ω ≡ {(x_1, x_2) : x_1, x_2 ∈ R, x_1² + x_2² ≤ 1} be the unit disc in R². Let F ≡ B(Ω), the Borel σ-algebra on Ω, and let P be the normalized Lebesgue measure, i.e., P(A) ≡ m(A)/π, A ∈ F. For ω = (x_1, x_2), let

  X_1(ω) = x_1,  X_2(ω) = x_2,

and let (R(ω), θ(ω)) be the polar representation of ω. Show that the random variables R and θ are independent but X_1 and X_2 are not.

(b) Formulate and establish an extension of the above to the unit sphere in R³.
7.21 Let X_1, X_2, X_3 be iid random variables such that P(X_1 = x) = 0 for all x ∈ R.

(a) Show that for any permutation σ of (1,2,3),

  P(X_{σ(1)} > X_{σ(2)} > X_{σ(3)}) = 1/3!.

(b) Show that for any i = 1, 2, 3,

  P(X_i = max_{1≤j≤3} X_j) = 1/3.

(c) State and prove a generalization of (a) and (b) to random variables {X_i : 1 ≤ i ≤ n} such that the joint distribution of (X_1, X_2, ..., X_n) is the same as that of (X_{σ(1)}, X_{σ(2)}, ..., X_{σ(n)}) for any permutation σ of {1, 2, ..., n} and P(X_1 = x) = 0 for all x ∈ R.
7.22 Let f, g : R → R be monotone nondecreasing. Show that for any random variable X,

  E f(X) g(X) ≥ E f(X) E g(X),

provided all the expectations exist.
(Hint: Let Y be independent of X with the same distribution. Note that Z = (f(X) − f(Y))(g(X) − g(Y)) ≥ 0 w.p. 1.)
7.23 Let X_1, X_2, ..., X_n be random variables on some probability space (Ω, F, P). Show that if P(X_1, X_2, ..., X_n)^{−1}(·) ≪ m^n, the Lebesgue measure on R^n, then for each i, P X_i^{−1}(·) ≪ m. Give an example to show that the converse is not true. Show also that if P(X_1, X_2, ..., X_n)^{−1}(·) ≪ m^n, then {X_1, X_2, ..., X_n} are independent iff f_{(X_1,X_2,...,X_n)}(x_1, x_2, ..., x_n) = ∏_{i=1}^n f_{X_i}(x_i), where the f's are the respective pdfs.
8
Laws of Large Numbers
When measuring a physical quantity such as the mass of an object, it is commonly believed that the average of several measurements is more reliable than a single one. Similarly, in applications of statistical inference, when estimating a population mean μ, a random sample {X_1, X_2, ..., X_n} of size n is drawn from the population, and the sample average X̄_n ≡ (1/n) ∑_{i=1}^n X_i is used as an estimator of the parameter μ. This is based on the idea that as n gets large, X̄_n will be close to μ in some suitable sense. In many time-evolving physical systems {f(t) : 0 ≤ t < ∞}, where f(t) is an element of the phase space S, "time averages" of the form (1/T) ∫_0^T h(f(t)) dt (where h is a bounded function on S) converge, as T gets large, to a "space average" of the form ∫_S h(x) π(dx) for some appropriate measure π on S. The above three are examples of a general phenomenon known as the law of large numbers. This chapter is devoted to a systematic development of this topic for sequences of independent random variables and also to some important refinements of the law of large numbers.
8.1 Weak laws of large numbers
Let {Z_n}_{n≥1} be a sequence of random variables on a probability space (Ω, F, P). Recall that the sequence {Z_n}_{n≥1} is said to converge in probability to a random variable Z if for each ε > 0,

  lim_{n→∞} P(|Z_n − Z| ≥ ε) = 0.   (1.1)

This is written as Z_n →^p Z. The sequence {Z_n}_{n≥1} is said to converge with probability one or almost surely (a.s.) to Z if there exists a set A in F such that P(A) = 1 and for all ω in A,

  lim_{n→∞} Z_n(ω) = Z(ω).   (1.2)

This is written as Z_n → Z w.p. 1 or Z_n → Z a.s.
Definition 8.1.1: A sequence {X_n}_{n≥1} of random variables on a probability space (Ω, F, P) is said to obey the weak law of large numbers (WLLN) with normalizing sequences of real numbers {a_n}_{n≥1} and {b_n}_{n≥1} if

  (S_n − a_n)/b_n →^p 0  as  n → ∞,   (1.3)

where S_n = ∑_{i=1}^n X_i for n ≥ 1.

The following theorem says that if {X_n}_{n≥1} is a sequence of iid random variables with EX_1² < ∞, then it obeys the weak law of large numbers with a_n = nEX_1 and b_n = n.
Theorem 8.1.1: Let {X_n}_{n≥1} be a sequence of iid random variables such that EX_1² < ∞. Then

  X̄_n ≡ (X_1 + ... + X_n)/n →^p EX_1.   (1.4)

Proof: By Chebychev's inequality, for any ε > 0,

  P(|X̄_n − EX_1| > ε) ≤ Var(X̄_n)/ε² = (1/ε²) · (σ²/n),   (1.5)

where σ² = Var(X_1). Since σ²/(nε²) → 0 as n → ∞, (1.4) follows.   □
Corollary 8.1.2: Let {X_n}_{n≥1} be a sequence of iid Bernoulli (p) random variables, i.e., P(X_1 = 1) = p = 1 − P(X_1 = 0). Let

  p̂_n = #{i : 1 ≤ i ≤ n, X_i = 1}/n,  n ≥ 1,   (1.6)

where for a finite set A, #A denotes the number of elements in A. Then p̂_n →^p p.

Proof: Check that EX_1 = p and p̂_n = X̄_n.   □

This says that one can estimate the probability p of getting a "head" for a coin by tossing it n times and calculating the proportion of "heads." This is also the basis of public opinion polls. Since the proof of Theorem 8.1.1 depended only on Chebychev's inequality, the following generalization is immediate (Problem 8.1).
Theorem 8.1.3: Let {X_n}_{n≥1} be a sequence of random variables on a probability space such that

(i) EX_n² < ∞ for all n ≥ 1,

(ii) EX_iX_j = (EX_i)(EX_j) for all i ≠ j (i.e., {X_n}_{n≥1} are uncorrelated),

(iii) (1/n²) ∑_{i=1}^n σ_i² → 0 as n → ∞, where σ_i² = Var(X_i), i ≥ 1.

Then

  X̄_n − μ̄_n →^p 0,   (1.7)

where μ̄_n ≡ (1/n) ∑_{i=1}^n EX_i.

Corollary 8.1.4: Let {X_n}_{n≥1} satisfy (i) and (ii) of the above theorem and let the sequence {σ_n²}_{n≥1} be bounded. Let μ̄_n ≡ (1/n) ∑_{i=1}^n EX_i → μ as n → ∞. Then X̄_n →^p μ.
An Application to Real Analysis
Let f : [0, 1] → R be a continuous function. K. Weierstrass showed that
f can be approximated uniformly over [0, 1] by polynomials. S.N. Bernstein
constructed a special class of such polynomials. A proof of Bernstein’s result
using the WLLN (Theorem 8.1.1) is given below.
Theorem 8.1.5: Let f : [0, 1] → R be a continuous function. Let

  B_{n,f}(x) ≡ ∑_{r=0}^n f(r/n) \binom{n}{r} x^r (1 − x)^{n−r},  0 ≤ x ≤ 1,   (1.8)

be the Bernstein polynomial of order n for the function f. Then

  lim_{n→∞} sup{|f(x) − B_{n,f}(x)| : 0 ≤ x ≤ 1} = 0.

Proof: Since f is continuous on the closed and bounded interval [0, 1], it is uniformly continuous and hence for any ε > 0, there exists a δ_ε > 0 such that

  |x − y| < δ_ε ⇒ |f(x) − f(y)| < ε.   (1.9)

Fix x in [0, 1]. Let {X_n}_{n≥1} be a sequence of iid Bernoulli (x) random variables. Let p̂_n be as in (1.6). Then B_{n,f}(x) = E f(p̂_n). Hence,

  |f(x) − B_{n,f}(x)| ≤ E|f(p̂_n) − f(x)|
    = E{|f(p̂_n) − f(x)| I(|p̂_n − x| < δ_ε)} + E{|f(p̂_n) − f(x)| I(|p̂_n − x| ≥ δ_ε)}
    ≤ ε + 2‖f‖ P(|p̂_n − x| ≥ δ_ε),

where ‖f‖ = sup{|f(x)| : 0 ≤ x ≤ 1}. But by Chebychev's inequality,

  P(|p̂_n − x| ≥ δ_ε) ≤ Var(p̂_n)/δ_ε² = x(1 − x)/(nδ_ε²) ≤ 1/(4nδ_ε²)  for all 0 ≤ x ≤ 1.

Thus, sup{|f(x) − B_{n,f}(x)| : 0 ≤ x ≤ 1} ≤ ε + 2‖f‖/(4nδ_ε²). Letting n → ∞ first and then ε ↓ 0 completes the proof.   □
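Bernstein polynomials are simple to evaluate, which makes Theorem 8.1.5 easy to check numerically. The Python sketch below (an illustration only, assuming numpy; the test function f(x) = |x − 1/2| and the grid are arbitrary choices) computes sup_{0≤x≤1} |f(x) − B_{n,f}(x)| on a fine grid for increasing n.

    import numpy as np
    from math import comb

    def bernstein(f, n, x):
        """Evaluate B_{n,f}(x) = sum_r f(r/n) C(n,r) x^r (1-x)^(n-r) at the points x."""
        r = np.arange(n + 1)
        coeffs = np.array([comb(n, k) for k in r], dtype=float)
        x = np.atleast_1d(x)[:, None]
        return (f(r / n) * coeffs * x**r * (1 - x)**(n - r)).sum(axis=1)

    f = lambda x: np.abs(x - 0.5)
    grid = np.linspace(0.0, 1.0, 1001)
    for n in [5, 20, 80, 320]:
        err = np.max(np.abs(f(grid) - bernstein(f, n, grid)))
        print(f"n = {n:>4}: sup |f - B_n,f| ~ {err:.4f}")

The error decreases slowly (roughly like n^{-1/2} for this nonsmooth f), which is consistent with the Chebychev-based bound in the proof.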
8.2 Strong laws of large numbers

Definition 8.2.1: A sequence {X_n}_{n≥1} of random variables on a probability space (Ω, F, P) is said to obey the strong law of large numbers (SLLN) with normalizing sequences of real numbers {a_n}_{n≥1} and {b_n}_{n≥1} if

  (S_n − a_n)/b_n → 0  as  n → ∞,  w.p. 1,   (2.1)

where S_n = ∑_{i=1}^n X_i for n ≥ 1.
The following theorem says that if {X_n}_{n≥1} is a sequence of iid random variables with EX_1^4 < ∞, then the strong law of large numbers holds with a_n = nEX_1 and b_n = n. This result is referred to as Borel's SLLN.

Theorem 8.2.1: (Borel's SLLN). Let {X_n}_{n≥1} be a sequence of iid random variables such that EX_1^4 < ∞. Then

  X̄_n ≡ (X_1 + X_2 + ... + X_n)/n → EX_1  w.p. 1.   (2.2)

Proof: Fix ε > 0 and let A_n ≡ {|X̄_n − EX_1| ≥ ε}, n ≥ 1. To establish (2.2), by Proposition 7.2.3 (a), it suffices to show that

  ∑_{n=1}^∞ P(A_n) < ∞.   (2.3)

By Markov's inequality,

  P(A_n) ≤ E|X̄_n − EX_1|^4 / ε^4.   (2.4)

Let Y_i = X_i − EX_1 for i ≥ 1. Since the X_i's are independent, it is easy to check that

  E|X̄_n − EX_1|^4 = (1/n^4) E(∑_{i=1}^n Y_i)^4
                   = (1/n^4) (n EY_1^4 + 3n(n − 1)(EY_1²)²)
                   = O(n^{−2}).

By (2.4) this implies (2.3).   □
The following two results are easy consequences of the above theorem.

Corollary 8.2.2: Let {X_n}_{n≥1} be a sequence of iid random variables that are bounded, i.e., there exists a C < ∞ such that P(|X_1| ≤ C) = 1. Then X̄_n → EX_1 w.p. 1.

Corollary 8.2.3: Let {X_n}_{n≥1} be a sequence of iid Bernoulli(p) random variables. Then

  p̂_n ≡ #{i : 1 ≤ i ≤ n, X_i = 1}/n → p  w.p. 1.   (2.5)
An application of the above result yields the following theorem on the uniform convergence of the empirical cdf to the true cdf.

Theorem 8.2.4: (Glivenko-Cantelli). Let {X_n}_{n≥1} be a sequence of iid random variables with a common cdf F(·). Let F_n(·), the empirical cdf based on {X_1, X_2, ..., X_n}, be defined by

  F_n(x) ≡ (1/n) ∑_{j=1}^n I(X_j ≤ x),  x ∈ R.   (2.6)

Then,

  Δ̃_n ≡ sup_x |F_n(x) − F(x)| → 0  w.p. 1.   (2.7)

Remark 8.2.1: Note that by applying Corollary 8.2.3 to the sequence of Bernoulli random variables {Y_n ≡ I(X_n ≤ x)}_{n≥1}, one may conclude that F_n(x) → F(x) w.p. 1 for each fixed x. So the main thrust of this theorem is the uniform convergence of F_n to F on R, w.p. 1. It can be shown that (2.7) holds for sequences {X_n}_{n≥1} that are identically distributed and only pairwise independent. The proof is based on Etemadi's SLLN (Theorem 8.2.7) below.
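The uniform convergence in (2.7) can be watched directly in a simulation. The sketch below (an illustrative computation assuming Python with numpy; the standard exponential distribution is an arbitrary choice for F) computes Δ̃_n = sup_x |F_n(x) − F(x)|, evaluating the supremum at the jump points of F_n, where it is attained.

    import numpy as np

    rng = np.random.default_rng(3)
    F = lambda x: 1.0 - np.exp(-x)          # cdf of the standard exponential

    for n in [100, 1000, 10000, 100000]:
        x = np.sort(rng.exponential(size=n))
        # F_n jumps at the order statistics; compare F with F_n just before and at each jump
        upper = np.arange(1, n + 1) / n     # F_n(x_(i))
        lower = np.arange(0, n) / n         # F_n(x_(i)-)
        delta = np.max(np.maximum(np.abs(upper - F(x)), np.abs(F(x) - lower)))
        print(f"n = {n:>7}:  sup_x |F_n(x) - F(x)| ~ {delta:.4f}")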
The proof of Theorem 8.2.4 makes use of the following two lemmas.
Lemma 8.2.5: (Scheffé's theorem: a generalized version). Let (Ω, F, μ) be a measure space and let {f_n}_{n≥1} and f be nonnegative μ-integrable functions such that, as n → ∞, (i) f_n → f a.e. (μ) and (ii) ∫ f_n dμ → ∫ f dμ. Then ∫ |f − f_n| dμ → 0 as n → ∞.

Proof: See Theorem 2.5.4.   □

For any bounded monotone function H : R → R, define

  H(∞) ≡ lim_{x↑∞} H(x),  H(−∞) ≡ lim_{x↓−∞} H(x).
Lemma 8.2.6: (Pólya's theorem). Let {G_n}_{n≥1} and G be a collection of bounded nondecreasing functions from R to R such that G(·) is continuous on R and

  G_n(x) → G(x)  for all x in D ∪ {−∞, +∞},

where D is dense in R. Then Δ_n ≡ sup{|G_n(x) − G(x)| : x ∈ R} → 0. That is, G_n → G uniformly on R.
Proof: Fix ε > 0. By the definitions of G(∞) and G(−∞), there exist C_1 and C_2 in D such that

  G(C_1) − G(−∞) < ε  and  G(∞) − G(C_2) < ε.   (2.8)

Since G(·) is continuous, it is uniformly continuous on [C_1, C_2] and so there exists a δ > 0 such that

  x, y ∈ [C_1, C_2], |x − y| < δ ⇒ |G(x) − G(y)| < ε.   (2.9)

Also, there exist points a_1 = C_1 < a_2 < ... < a_k = C_2, 1 < k < ∞, in D such that

  max{(a_{i+1} − a_i) : 1 ≤ i ≤ k − 1} < δ.

Let a_0 = −∞, a_{k+1} = ∞. By the convergence of G_n(·) to G(·) on D ∪ {−∞, ∞},

  Δ_{n1} ≡ max{|G_n(a_i) − G(a_i)| : 0 ≤ i ≤ k + 1} → 0   (2.10)

as n → ∞. Now note that for any x in [a_i, a_{i+1}], 1 ≤ i ≤ k − 1, by the monotonicity of G_n(·) and G(·), and by (2.9) and (2.10),

  G_n(x) − G(x) ≤ G_n(a_{i+1}) − G(a_i)
              ≤ G_n(a_{i+1}) − G(a_{i+1}) + G(a_{i+1}) − G(a_i)
              ≤ Δ_{n1} + ε,

and similarly,

  G_n(x) − G(x) ≥ −Δ_{n1} − ε.

Thus

  sup{|G_n(x) − G(x)| : a_1 ≤ x ≤ a_k} ≤ Δ_{n1} + ε.   (2.11)

For x < a_1, by (2.8) and (2.10),

  |G_n(x) − G(x)| ≤ |G_n(x) − G_n(−∞)| + |G_n(−∞) − G(−∞)| + |G(−∞) − G(x)|
                 ≤ (G_n(a_1) − G_n(−∞)) + |G_n(−∞) − G(−∞)| + ε
                 ≤ |G_n(a_1) − G(a_1)| + |G(a_1) − G(−∞)| + 2|G_n(−∞) − G(−∞)| + ε
                 ≤ 3Δ_{n1} + 2ε.

Similarly, for x > a_k,

  |G_n(x) − G(x)| ≤ 3Δ_{n1} + 2ε.

Combining the above with (2.11) yields

  Δ_n ≤ 3Δ_{n1} + 2ε.

By (2.10),

  lim sup_{n→∞} Δ_n ≤ 2ε,

and ε > 0 being arbitrary, the proof is complete.   □
Proof of Theorem 8.2.4: First note that Δ̃_n = sup_{x∈Q} |F_n(x) − F(x)| and hence, it is a random variable. Let B ≡ {b_j : j ∈ J} be the set of jump discontinuity points of F with the corresponding jump sizes {p_j : j ∈ J}, where J is a subset of N. Let p = ∑_{j∈J} p_j.

Note that

  F_n(x) = (1/n) ∑_{i=1}^n I(X_i ≤ x)
         = (1/n) ∑_{i=1}^n I(X_i ≤ x, X_i ∈ B) + (1/n) ∑_{i=1}^n I(X_i ≤ x, X_i ∉ B)
         = F_{nd}(x) + F_{nc}(x), say.   (2.12)

Then, F_{nd}(x) = ∑_{j∈J} p̂_{nj} I(b_j ≤ x), where

  p̂_{nj} = #{i : 1 ≤ i ≤ n, X_i = b_j}/n.

Let p̂_n = ∑_{j∈J} p̂_{nj} = (1/n) · #{i : 1 ≤ i ≤ n, X_i ∈ B}. By Corollary 8.2.3, for each j ∈ J,

  p̂_{nj} → p_j w.p. 1  and  p̂_n → p w.p. 1.

Since B is countable, there exists a set A_0 in F such that P(A_0) = 1 and for all ω in A_0, p̂_{nj} → p_j for all j ∈ J and ∑_{j∈J} p̂_{nj} = p̂_n → p = ∑_{j∈J} p_j. By Lemma 8.2.5 (applied with μ being the counting measure on the set J), it follows that for ω in A_0,

  ∑_{j∈J} |p̂_{nj} − p_j| → 0.   (2.13)

Let F_d(x) ≡ ∑_{j∈J} p_j I(b_j ≤ x), x ∈ R. Then,

  sup_{x∈R} |F_{nd}(x) − F_d(x)| ≤ ∑_{j∈J} |p̂_{nj} − p_j|,   (2.14)

which → 0 as n → ∞ for all ω in A_0, by (2.13).

Next let

  F_c(x) ≡ F(x) − F_d(x),  x ∈ R.

Then, it is easy to check that F_c(·) is continuous and nondecreasing on R, F_c(−∞) = 0 and F_c(∞) = 1 − p.

Again, by Corollary 8.2.3, there exists a set A_1 in F such that P(A_1) = 1 and for all ω in A_1,

  F_{nc}(x) → F_c(x)

for all rational x in R and

  F_{nc}(∞) ≡ 1 − p̂_n → 1 − p = F_c(∞).

Also, F_{nc}(−∞) = 0 = F_c(−∞). So by Lemma 8.2.6, with D = Q, for ω in A_1,

  sup_{x∈R} |F_{nc}(x) − F_c(x)| → 0 as n → ∞.   (2.15)

Since P(A_0 ∩ A_1) = 1, the theorem follows from (2.12)–(2.15).   □
Borel's SLLN for iid random variables requires that E|X_1|^4 < ∞. Kolmogorov (1956) improved on this significantly by using his "3-series" theorem and reduced the moment condition to E|X_1| < ∞. More recently, N. Etemadi (1981) improved this further by assuming only that the {X_n}_{n≥1} are pairwise independent and identically distributed with E|X_1| < ∞. More precisely, he proved the following.
Theorem 8.2.7: (Etemadi's SLLN). Let {X_n}_{n≥1} be a sequence of pairwise independent and identically distributed random variables with E|X_1| < ∞. Then

  X̄_n → EX_1  w.p. 1.   (2.16)
Proof: The main steps in the proof are
(I) reduction to the nonnegative case,
(II) proof of convergence of Ȳn along a geometrically growing subsequence
using the Borel-Cantelli lemma and Chebychev’s inequality, where
Ȳn is the average of certain truncated versions of X1 , . . . , Xn , and
extending the convergence from the geometric subsequence to the
full sequence.
Step I: Since the {X_n}_{n≥1} are pairwise independent and identically distributed with E|X_1| < ∞, it follows that {X_n^+}_{n≥1} and {X_n^−}_{n≥1} are both sequences of pairwise independent and identically distributed nonnegative random variables with EX_1^+ < ∞ and EX_1^− < ∞. Also, since

  X̄_n = (1/n) ∑_{i=1}^n X_i = ((1/n) ∑_{i=1}^n X_i^+) − ((1/n) ∑_{i=1}^n X_i^−),

it is enough to prove the theorem under the assumption that the X_i's are nonnegative.
Step II: Now let the X_i's be nonnegative and let

  Y_i = X_i I(X_i ≤ i),  i ≥ 1.

Then,

  ∑_{i=1}^∞ P(X_i ≠ Y_i) = ∑_{i=1}^∞ P(X_i > i) = ∑_{i=1}^∞ P(X_1 > i)
    ≤ ∑_{i=1}^∞ ∫_{i−1}^i P(X_1 > t) dt = ∫_0^∞ P(X_1 > t) dt = EX_1 < ∞.

Hence, by the Borel-Cantelli lemma,

  P(X_i ≠ Y_i infinitely often) = 0.

This implies that w.p. 1, X_i = Y_i for all but finitely many i's and hence, it suffices to show that

  Ȳ_n ≡ (1/n) ∑_{i=1}^n Y_i → EX_1  w.p. 1.   (2.17)
Next, EY_i = E(X_i I(X_i ≤ i)) = E(X_1 I(X_1 ≤ i)) → EX_1 (by the MCT) and hence

  EȲ_n = (1/n) ∑_{i=1}^n EY_i → EX_1  as n → ∞.   (2.18)
Suppose for the moment that for each fixed 1 < ρ < ∞, it is shown that

  Ȳ_{n_k} → EX_1  as k → ∞, w.p. 1,   (2.19)

where n_k = ⌊ρ^k⌋ = the greatest integer less than or equal to ρ^k, k ∈ N. Then, since the Y_i's are nonnegative, for any n and k satisfying ρ^k ≤ n < ρ^{k+1}, one gets

  (1/n) ∑_{i=1}^{n_k} Y_i ≤ Ȳ_n = (1/n) ∑_{j=1}^n Y_j ≤ (1/n) ∑_{i=1}^{n_{k+1}} Y_i

  ⟹ (n_k/n) Ȳ_{n_k} ≤ Ȳ_n ≤ (n_{k+1}/n) Ȳ_{n_{k+1}}

  ⟹ (1/ρ) Ȳ_{n_k} ≤ Ȳ_n ≤ ρ Ȳ_{n_{k+1}}.

From (2.19), it follows that

  (1/ρ) EX_1 ≤ lim inf_{n→∞} Ȳ_n ≤ lim sup_{n→∞} Ȳ_n ≤ ρ EX_1  w.p. 1.

Since this is true for each 1 < ρ < ∞, by taking ρ = 1 + 1/r for r = 1, 2, ..., it follows that

  EX_1 ≤ lim inf_{n→∞} Ȳ_n ≤ lim sup_{n→∞} Ȳ_n ≤ EX_1  w.p. 1,

establishing (2.17).
It now remains to prove (2.19). By (2.18), it is enough to show that

  Ȳ_{n_k} − EȲ_{n_k} → 0  as k → ∞, w.p. 1.   (2.20)

By Chebychev's inequality and the pairwise independence of the variables {Y_n}_{n≥1}, for any ε > 0,

  P(|Ȳ_{n_k} − EȲ_{n_k}| > ε) ≤ (1/ε²) Var(Ȳ_{n_k}) = (1/ε²)(1/n_k²) ∑_{i=1}^{n_k} Var(Y_i)
                              ≤ (1/ε²)(1/n_k²) ∑_{i=1}^{n_k} EY_i².

Thus,

  ∑_{k=1}^∞ P(|Ȳ_{n_k} − EȲ_{n_k}| > ε) ≤ (1/ε²) ∑_{k=1}^∞ (1/n_k²) ∑_{i=1}^{n_k} EY_i²
                                        = (1/ε²) ∑_{i=1}^∞ EY_i² (∑_{k : n_k ≥ i} 1/n_k²).   (2.21)
Since n_k = ⌊ρ^k⌋ > ρ^{k−1} for 1 < ρ < ∞,

  ∑_{k : n_k ≥ i} 1/n_k² ≤ ∑_{k : ρ^{k−1} ≥ i} 1/ρ^{2(k−1)} ≤ C_1/i²   (2.22)

for some constant C_1, 0 < C_1 < ∞.

Next, since the X_i's are identically distributed,

  ∑_{i=1}^∞ EY_i²/i² = ∑_{i=1}^∞ EX_1² I(0 ≤ X_1 ≤ i)/i²
    = ∑_{i=1}^∞ ∑_{j=1}^i EX_1² I(j − 1 < X_1 ≤ j)/i²
    = ∑_{j=1}^∞ (EX_1² I(j − 1 < X_1 ≤ j)) ∑_{i=j}^∞ i^{−2}
    ≤ ∑_{j=1}^∞ (j EX_1 I(j − 1 < X_1 ≤ j)) · C_2 j^{−1}
    = C_2 EX_1 < ∞,   (2.23)

for some constant C_2, 0 < C_2 < ∞.

Now (2.21)–(2.23) imply that

  ∑_{k=1}^∞ P(|Ȳ_{n_k} − EȲ_{n_k}| > ε) < ∞

for each ε > 0. By the Borel-Cantelli lemma and Proposition 7.2.3 (a), (2.20) follows and the proof is complete.   □
The following corollary is immediate from the above theorem.

Corollary 8.2.8: (Extension to the vector case). Let {X_n = (X_{n1}, ..., X_{nk})}_{n≥1} be a sequence of k-dimensional random vectors defined on a probability space (Ω, F, P) such that for each i, 1 ≤ i ≤ k, the sequence {X_{ni}}_{n≥1} is pairwise independent and identically distributed with E|X_{1i}| < ∞. Let μ = (EX_{11}, EX_{12}, ..., EX_{1k}) and let f : R^k → R be continuous at μ. Then

(i) X̄_n ≡ (X̄_{n1}, X̄_{n2}, ..., X̄_{nk}) → μ w.p. 1, where X̄_{ni} = (1/n) ∑_{j=1}^n X_{ji} for 1 ≤ i ≤ k.

(ii) f(X̄_n) → f(μ) w.p. 1.
Example 8.2.1: Let (X_n, Y_n), n = 1, 2, ..., be a sequence of bivariate iid random vectors with EX_1² < ∞, EY_1² < ∞. Then the sample correlation coefficient ρ̂_n, defined by

  ρ̂_n ≡ [(1/n) ∑_{i=1}^n X_iY_i − X̄_n Ȳ_n] / √{[(1/n) ∑_{i=1}^n (X_i − X̄_n)²] [(1/n) ∑_{i=1}^n (Y_i − Ȳ_n)²]},

is a strongly consistent estimator of the population correlation coefficient ρ, defined by

  ρ = Cov(X_1, Y_1) / √(Var(X_1) Var(Y_1)),

i.e., ρ̂_n → ρ w.p. 1. This follows from the above corollary by taking f : R^5 → R to be

  f(t_1, t_2, t_3, t_4, t_5) = (t_5 − t_1t_2)/√((t_3 − t_1²)(t_4 − t_2²))  for t_3 > t_1², t_4 > t_2²,
  f(t_1, t_2, t_3, t_4, t_5) = 0  otherwise,

and the vector (X_{n1}, X_{n2}, ..., X_{n5}) to be

  X_{n1} = X_n, X_{n2} = Y_n, X_{n3} = X_n², X_{n4} = Y_n², X_{n5} = X_nY_n.
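As a numerical check of Example 8.2.1 (an added illustration only; the bivariate model Y = 0.6X + noise below is an arbitrary choice, and numpy is assumed), the sample correlation coefficient ρ̂_n, written exactly as the ratio of sample moments above, settles down to the population value ρ as n grows.

    import numpy as np

    rng = np.random.default_rng(4)
    a = 0.6
    rho = a / np.sqrt(a**2 + 1.0)            # population correlation of (X, aX + Z)

    for n in [10**2, 10**4, 10**6]:
        x = rng.standard_normal(n)
        y = a * x + rng.standard_normal(n)
        num = np.mean(x * y) - np.mean(x) * np.mean(y)
        den = np.sqrt((np.mean(x**2) - np.mean(x)**2) * (np.mean(y**2) - np.mean(y)**2))
        print(f"n = {n:>8}: rho_hat = {num/den:+.4f}   (rho = {rho:+.4f})")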
Corollary 8.2.9: (Extension to the pairwise m-dependent case). Let {X_n}_{n≥1} be a sequence of random variables on a probability space (Ω, F, P) such that for an integer m, 1 ≤ m < ∞, and for each i, 1 ≤ i ≤ m, the random variables {X_i, X_{i+m}, X_{i+2m}, ...} are identically distributed and pairwise independent with E|X_i| < ∞. Then

  X̄_n → (1/m) ∑_{i=1}^m EX_i  w.p. 1.

The proof is left as an exercise (Problem 8.2). For an application of the above result to a discussion on normal numbers, see Problem 8.15.
Example 8.2.2: (IID Monte Carlo). Let (S, S, π) be a probability space, f ∈ L¹(S, S, π) and λ = ∫_S f dπ. Let {X_n}_{n≥1} be a sequence of iid S-valued random variables with distribution π. Then, the IID Monte Carlo approximation to λ is defined as

  λ̂_n ≡ (1/n) ∑_{i=1}^n f(X_i).

Note that by the SLLN, λ̂_n → λ w.p. 1.

An extension of this to the case where {X_i}_{i≥1} is a Markov chain, known as Markov chain Monte Carlo (MCMC), is discussed in Chapter 14.
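A concrete instance of Example 8.2.2 (an added sketch assuming Python with numpy; the choices of π and f are arbitrary): with π the Uniform(0,1) distribution and f(x) = x², the IID Monte Carlo estimate λ̂_n approaches λ = ∫₀¹ x² dx = 1/3.

    import numpy as np

    rng = np.random.default_rng(5)
    f = lambda x: x**2                      # integrand; lambda = integral of x^2 over (0,1) = 1/3

    x = rng.random(10**6)                   # iid draws from pi = Uniform(0,1)
    partial_means = np.cumsum(f(x)) / np.arange(1, x.size + 1)
    for n in [10**2, 10**4, 10**6]:
        print(f"n = {n:>8}: lambda_hat = {partial_means[n - 1]:.6f}   (lambda = {1/3:.6f})")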
8.3 Series of independent random variables
Let {Xn }n≥1 be a sequence of independent random variables on a probability space (Ω, F, P ). The$
goal of this section is to investigate the conver∞
gence $
of the infinite series n=1 Xn , i.e., that of the partial sum sequence,
n
Sn = i=1 Xi , n ≥ 1.
The main result of this section is Kolmogorov’s 3-series theorem (Theorem 8.3.5). The following two inequalities play a fundamental role in the
proof of this theorem and also have other important applications.
Theorem 8.3.1: Let {X_j : 1 ≤ j ≤ n} be a collection of independent random variables. Let S_i = ∑_{j=1}^i X_j for 1 ≤ i ≤ n.

(i) (Kolmogorov's first inequality). Suppose that EX_j = 0 and EX_j² < ∞, 1 ≤ j ≤ n. Then, for 0 < λ < ∞,

  P(max_{1≤i≤n} |S_i| ≥ λ) ≤ Var(S_n)/λ².   (3.1)

(ii) (Kolmogorov's second inequality). Suppose that there exists a constant C ∈ (0, ∞) such that P(|X_j − EX_j| ≤ C) = 1 for 1 ≤ j ≤ n. Then, for any 0 < λ < ∞,

  P(max_{1≤i≤n} |S_i| ≤ λ) ≤ (2C + 4λ)²/Var(S_n).
Proof: Let A = {max_{1≤i≤n} |S_i| ≥ λ} and let

  A_1 = {|S_1| ≥ λ},
  A_j = {|S_1| < λ, |S_2| < λ, ..., |S_{j−1}| < λ, |S_j| ≥ λ}

for j = 2, ..., n. Then A_1, ..., A_n are disjoint, ⋃_{j=1}^n A_j = A and P(A) = ∑_{j=1}^n P(A_j). Since EX_j = 0 for all j,

  Var(S_n) = ES_n² ≥ E(S_n² I_A) = ∑_{j=1}^n E(S_n² I_{A_j})
           = ∑_{j=1}^n E{((S_n − S_j)² + S_j² + 2(S_n − S_j)S_j) I_{A_j}}
           ≥ ∑_{j=1}^n E(S_j² I_{A_j}) + 2 ∑_{j=1}^{n−1} E((S_n − S_j)S_j I_{A_j}).   (3.2)

Note that since {X_1, ..., X_n} are independent, (S_n − S_j) ≡ ∑_{i=j+1}^n X_i and S_j I_{A_j} are independent for 1 ≤ j ≤ n − 1. Hence,

  E((S_n − S_j)S_j I_{A_j}) = E(S_n − S_j) E(S_j I_{A_j}) = 0.

Also on A_j, S_j² ≥ λ². Therefore, by (3.2),

  Var(S_n) ≥ ∑_{j=1}^n λ² P(A_j) = λ² P(A).

This establishes (i). For a proof of (ii), see Chung (1974), p. 117.   □
Remark 8.3.1: Recall that Chebychev's inequality asserts that for λ > 0, P(|S_n| ≥ λ) ≤ Var(S_n)/λ², and thus Kolmogorov's first inequality is significantly stronger. Kolmogorov's first inequality has an extension, known as Doob's maximal inequality, to a class of dependent random variables called martingales (see Chapter 13). The next inequality is due to P. Levy.

Definition 8.3.1: For any random variable X, a real number c is called a median of X if

  P(X < c) ≤ 1/2 ≤ P(X ≤ c).   (3.3)

Such a c always exists. It can be verified that c_0 ≡ inf{x : P(X ≤ x) ≥ 1/2} is a median. Note that if c is a median of X and α is a real number, then αc is a median of αX and α + c is a median of α + X. Further, if P(|X| ≥ α) < 1/2 for some α > 0, then any median c of X satisfies |c| ≤ α (Problem 8.4).
Theorem 8.3.2: (Levy's inequality). Let X_j, j = 1, ..., n, be independent random variables. Let S_j = ∑_{i=1}^j X_i, and let c_{j,n} be a median of (S_j − S_n) for 1 ≤ j ≤ n, where c_{n,n} is set equal to 0. Then, for any 0 < λ < ∞,

(i) P(max_{1≤j≤n} (S_j − c_{j,n}) ≥ λ) ≤ 2P(S_n ≥ λ);

(ii) P(max_{1≤j≤n} |S_j − c_{j,n}| ≥ λ) ≤ 2P(|S_n| ≥ λ).
Proof: Let

  A_j = {S_j − S_n ≤ c_{j,n}}  for 1 ≤ j ≤ n,
  B = {max_{1≤j≤n} (S_j − c_{j,n}) ≥ λ},
  B_1 = {S_1 − c_{1,n} ≥ λ},
  B_j = {S_i − c_{i,n} < λ for 1 ≤ i ≤ j − 1, S_j − c_{j,n} ≥ λ},

for j = 2, ..., n. Then B_1, ..., B_n are disjoint and ⋃_{j=1}^n B_j = B. Since X_1, ..., X_n are independent, A_j and B_j are independent for each j = 1, 2, ..., n, and since c_{j,n} is a median of S_j − S_n, P(A_j) ≥ 1/2. Also, for each j, A_j = {S_j − c_{j,n} ≤ S_n}, and hence on A_j ∩ B_j, S_n ≥ λ holds. Thus,

  P(S_n ≥ λ) ≥ ∑_{j=1}^n P(A_j ∩ B_j) = ∑_{j=1}^n P(A_j) P(B_j) ≥ (1/2) P(⋃_{j=1}^n B_j) = (1/2) P(B),

proving part (i). Now part (ii) follows by applying part (i) to both {X_i}_{i=1}^n and {−X_i}_{i=1}^n.   □
Recall that if a sequence {Y_n}_{n≥1} of random variables converges w.p. 1, then it converges in probability as well. A remarkable result of P. Levy is that if {S_n}_{n≥1} is the sequence of partial sums of independent random variables and {S_n}_{n≥1} converges in probability, then {S_n}_{n≥1} must converge w.p. 1 as well. The proof of this uses Levy's inequality proved above.
Theorem 8.3.3: Let {X_n}_{n≥1} be a sequence of independent random variables. Let S_n = ∑_{j=1}^n X_j for 1 ≤ n < ∞ and let {S_n}_{n≥1} converge in probability to a random variable S. Then S_n → S w.p. 1.

Proof: Recall that a sequence {x_n}_{n≥1} of real numbers converges iff it is Cauchy, iff δ_n ≡ sup{|x_k − x_ℓ| : k, ℓ ≥ n} → 0 as n → ∞. Let

  Δ̃_n ≡ sup{|S_k − S_ℓ| : k, ℓ ≥ n}  and  Δ_n ≡ sup{|S_k − S_n| : k ≥ n}.

Then Δ̃_n ≤ 2Δ_n and Δ̃_n is decreasing in n. Suppose it is shown that

  Δ_n →^p 0.   (3.4)

Then Δ̃_n →^p 0 and hence there is a subsequence {n_k}_{k≥1} such that Δ̃_{n_k} → 0 as k → ∞ w.p. 1. Since Δ̃_n is decreasing in n, this implies that Δ̃_n → 0 w.p. 1. Thus it suffices to establish (3.4). Fix 0 < ε < 1. Let

  S_{n,ℓ} = S_{n+ℓ} − S_n  for ℓ ≥ 1,
  Δ_{n,k} = max{|S_{n,ℓ}| : 1 ≤ ℓ ≤ k},  k ≥ 1.

Note that for each n ≥ 1, {Δ_{n,k}}_{k≥1} is a nondecreasing sequence with lim_{k→∞} Δ_{n,k} = Δ_n and hence, for any n ≥ 1,

  P(Δ_n > ε) = lim_{k→∞} P(Δ_{n,k} > ε).   (3.5)

Levy's inequality (Theorem 8.3.2) will now be used to bound P(Δ_{n,k} > ε) uniformly in k. Since S_n →^p S, for any η > 0, there exists an n_0 ≥ 1 such that for all n ≥ n_0,

  P(|S_n − S| > η) < η.

This implies that for all k ≥ ℓ ≥ n_0,

  P(|S_k − S_ℓ| > 2η) < 2η.   (3.6)

If 0 < η < 1/4, then the medians of S_k − S_ℓ for k ≥ ℓ ≥ n_0 are bounded by 2η. Hence, for n ≥ n_0 and k ≥ 1, applying Levy's inequality (i.e., the above theorem) to {X_i : n + 1 ≤ i ≤ n + k},

  P(Δ_{n,k} > ε) = P(max_{1≤j≤k} |S_{n,j}| > ε)
               ≤ P(max_{1≤j≤k} |S_{n,j} − c_{n+j,n+k}| ≥ ε − 2η)
               ≤ 2P(|S_{n,k}| ≥ ε − 2η).

Now, choosing 0 < η < ε/4, (3.6) yields P(Δ_{n,k} > ε) < 4η < ε for all n ≥ n_0, k ≥ 1. Then, by (3.5), P(Δ_n > ε) ≤ ε for all n ≥ n_0. Hence, (3.4) holds.   □
The following result on the convergence of infinite series of independent random variables is an immediate consequence of the above theorem.

Theorem 8.3.4: (Khinchine-Kolmogorov's 1-series theorem). Let {X_n}_{n≥1} be a sequence of independent random variables on a probability space (Ω, F, P) such that EX_n = 0 for all n ≥ 1 and ∑_{n=1}^∞ EX_n² < ∞. Then S_n ≡ ∑_{j=1}^n X_j converges in mean square and almost surely, as n → ∞.

Proof: For any n, k ∈ N,

  E(S_n − S_{n+k})² = Var(S_n − S_{n+k}) = ∑_{j=n+1}^{n+k} Var(X_j) = ∑_{j=n+1}^{n+k} EX_j²,

by independence. Since ∑_{n=1}^∞ EX_n² < ∞, {S_n}_{n≥1} is a Cauchy sequence in L²(Ω, F, P) and hence converges in mean square to some S in L²(Ω, F, P). This implies that S_n →^p S, and by the above theorem S_n → S w.p. 1.   □

Remark 8.3.2: It is possible to give another proof of the above theorem using Kolmogorov's inequality. See Problem 8.5.
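A classical application of Theorem 8.3.4 is the random harmonic series ∑_n ε_n/n with iid signs ε_n = ±1: here EX_n = 0 and ∑_n EX_n² = ∑_n 1/n² < ∞, so the series converges almost surely even though ∑_n 1/n diverges. The Python sketch below (an added illustration assuming numpy; the horizon and the number of paths are arbitrary) shows several sample paths of the partial sums settling to path-dependent limits.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 10**6
    ns = np.arange(1, n + 1)

    for path in range(3):
        signs = rng.choice([-1.0, 1.0], size=n)
        partial = np.cumsum(signs / ns)             # S_n = sum_{k<=n} eps_k / k
        checkpoints = [10**3, 10**4, 10**5, 10**6]
        print(f"path {path}:", ["%.5f" % partial[k - 1] for k in checkpoints])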
Theorem 8.3.5: (Kolmogorov's 3-series theorem). Let {X_n}_{n≥1} be a sequence of independent random variables on a probability space (Ω, F, P) and let S_n = ∑_{i=1}^n X_i, n ≥ 1. Then the sequence {S_n}_{n≥1} converges w.p. 1 iff the following 3 series converge for some 0 < c < ∞:

(i) ∑_{i=1}^∞ P(|X_i| > c) < ∞,

(ii) ∑_{i=1}^∞ E(Y_i) converges,

(iii) ∑_{i=1}^∞ Var(Y_i) < ∞,

where Y_i = X_i I(|X_i| ≤ c), i ≥ 1.
Proof: (Sufficiency). By (i) and the Borel-Cantelli lemma, P(X_i ≠ Y_i i.o.) = P(|X_i| > c i.o.) = 0. Hence {S_n}_{n≥1} converges w.p. 1 iff {T_n}_{n≥1} converges w.p. 1, where T_n = ∑_{i=1}^n Y_i, n ≥ 1. By (iii) and the 1-series theorem, the sequence {∑_{i=1}^n (Y_i − EY_i)}_{n≥1} converges w.p. 1. Hence, by (ii), {T_n}_{n≥1} converges w.p. 1 and hence {S_n}_{n≥1} converges w.p. 1.

(Necessity). Suppose {S_n}_{n≥1} converges w.p. 1. Fix 0 < c < ∞ and let Y_i = X_i I(|X_i| ≤ c), i ≥ 1. Since {S_n}_{n≥1} converges w.p. 1, X_n → 0 w.p. 1. Hence, w.p. 1, |X_i| ≤ c for all but a finite number of i's. If A_i ≡ {X_i ≠ Y_i} = {|X_i| > c}, then by the second Borel-Cantelli lemma,

  ∑_{i=1}^∞ P(A_i) < ∞,  establishing (i).
To establish (ii) and (iii), the following construction and the second inequality of Kolmogorov will be used. Without loss of generality, assume that there is another sequence {X̃_n}_{n≥1} of random variables on the same probability space (Ω, F, P) such that (a) {X̃_n}_{n≥1} are independent, (b) {X̃_n}_{n≥1} is independent of {X_n}_{n≥1}, and (c) for each n ≥ 1, X_n =_d X̃_n, i.e., X_n and X̃_n have the same distribution. Let

  Ỹ_i = X̃_i I(|X̃_i| ≤ c),  Z_i = Y_i − Ỹ_i,  i ≥ 1,
  T_n ≡ ∑_{i=1}^n Y_i,  T̃_n ≡ ∑_{i=1}^n Ỹ_i,  and  R_n ≡ ∑_{i=1}^n Z_i,  n ≥ 1.

Since {S_n ≡ ∑_{i=1}^n X_i}_{n≥1} converges w.p. 1, and X_i = Y_i for all but a finite number of i, {T_n}_{n≥1} converges w.p. 1. Since {Y_i}_{i≥1} and {Ỹ_i}_{i≥1} have the same distribution on R^∞, {T̃_n}_{n≥1} converges w.p. 1. Thus the difference sequence {R_n}_{n≥1} converges w.p. 1.

Next, note that {Z_n}_{n≥1} are independent random variables with mean 0 and are uniformly bounded by 2c. Applying Kolmogorov's second inequality (Theorem 8.3.1 (ii)) to {Z_j : m < j ≤ m + n} yields

  P(max_{m<j≤m+n} |R_j − R_m| ≤ ε) ≤ (2c + 4ε)² / ∑_{i=m+1}^{m+n} Var(Z_i)   (3.7)

for all m ≥ 1, n ≥ 1, 0 < ε < ∞.
Let Δ_m ≡ max_{j>m} |R_j − R_m|. Let n → ∞ in (3.7) to conclude that

  P(Δ_m ≤ ε) ≤ (2c + 4ε)² / ∑_{i=m+1}^∞ Var(Z_i).

Now suppose (iii) does not hold. Then, since Y_i and Ỹ_i are independent with the same distribution, Var(Z_i) = 2Var(Y_i) for all i ≥ 1, and thus ∑_{i=m+1}^∞ Var(Z_i) = ∞ and hence P(Δ_m ≤ ε) = 0 for each m ≥ 1, 0 < ε < ∞. This implies that P(Δ_m > ε) = 1 for each ε > 0 and hence that Δ_m = ∞ w.p. 1 for all m ≥ 1. This contradicts the convergence w.p. 1 of the sequence {R_n}_{n≥1}. Hence (iii) holds.

By the 1-series theorem, {∑_{i=1}^n (Y_i − EY_i)}_{n≥1} converges w.p. 1. Since {∑_{i=1}^n Y_i}_{n≥1} converges w.p. 1, ∑_{i=1}^∞ EY_i converges, establishing (ii). This completes the proof of the necessity part and of the theorem.   □
Remark 8.3.3: To go from the convergence w.p. 1 of {R_n}_{n≥1} to (iii), it suffices to show that if (iii) fails, then for each 0 < A < ∞, P(|R_n| ≤ A) → 0 as n → ∞. This can be established without the use of (3.7) by using the central limit theorem (to be proved later in Chapter 11), which shows that if Var(R_n) → ∞, then

  P(R_n/√(Var(R_n)) ≤ x) → Φ(x) ≡ (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt,

for all x in R. (Also see Billingsley (1995), p. 290.)
8.4 Kolmogorov and Marcinkiewicz-Zygmund SLLNs

For a sequence of independent and identically distributed random variables {X_n}_{n≥1}, Kolmogorov showed that {X_n}_{n≥1} obeys the SLLN with b_n = n iff E|X_1| < ∞. Marcinkiewicz and Zygmund generalized this result and proved a class of SLLNs for {X_n}_{n≥1} when E|X_1|^p < ∞ for some p ∈ (0, 2). The proof uses Kolmogorov's 3-series theorem and some results from real analysis. This approach is to be contrasted with Etemadi's proof of the SLLN, which uses a decomposition of the random variables {X_n}_{n≥1} into positive and negative parts and uses monotonicity of the sum to establish almost sure convergence along a subsequence by an application of the Borel-Cantelli lemma. The alternative approach presented in this section is also useful for proving SLLNs for sums of independent random variables that are not necessarily identically distributed.

The next three are preparatory results for Theorem 8.4.4.
Lemma 8.4.1: (Abel's summation formula). Let {a_n}_{n≥1} and {b_n}_{n≥1} be two sequences of real numbers. Then, for all n ≥ 2,

  ∑_{j=1}^n a_j b_j = A_n b_n − ∑_{j=1}^{n−1} A_j (b_{j+1} − b_j),   (4.1)

where A_k = ∑_{j=1}^k a_j, k ≥ 1.

Proof: Let A_0 = 0. Then a_j = A_j − A_{j−1}, j ≥ 1. Hence,

  ∑_{j=1}^n a_j b_j = ∑_{j=1}^n (A_j − A_{j−1}) b_j = ∑_{j=1}^n A_j b_j − ∑_{j=1}^n A_{j−1} b_j
                   = ∑_{j=1}^n A_j b_j − ∑_{j=1}^{n−1} A_j b_{j+1},

yielding (4.1).   □
Lemma 8.4.2: (Kronecker's lemma). Let {a_n}_{n≥1} and {b_n}_{n≥1} be sequences of real numbers such that 0 < b_n ↑ ∞ as n → ∞ and ∑_{j=1}^∞ a_j converges. Then,

  (1/b_n) ∑_{j=1}^n a_j b_j → 0  as  n → ∞.   (4.2)

Proof: Let A_k = ∑_{j=1}^k a_j, A ≡ ∑_{j=1}^∞ a_j = lim_{k→∞} A_k and R_k = A − A_k, k ≥ 1. Then, by Lemma 8.4.1, for n ≥ 2,

  ∑_{j=1}^n a_j b_j = A_n b_n − ∑_{j=1}^{n−1} A_j (b_{j+1} − b_j)
                   = A_n b_n − ∑_{j=1}^{n−1} (A − R_j)(b_{j+1} − b_j)
                   = A_n b_n − A ∑_{j=1}^{n−1} (b_{j+1} − b_j) + ∑_{j=1}^{n−1} R_j (b_{j+1} − b_j)
                   = A_n b_n − A b_n + A b_1 + ∑_{j=1}^{n−1} R_j (b_{j+1} − b_j)
                   = −R_n b_n + A b_1 + ∑_{j=1}^{n−1} R_j (b_{j+1} − b_j).   (4.3)

Since ∑_{n=1}^∞ a_n converges, R_n → 0 as n → ∞. Hence, given any ε > 0, there exists N = N_ε > 1 such that |R_n| ≤ ε for all n ≥ N. Since 0 < b_n ↑ ∞, for all n > N,

  |b_n^{−1} ∑_{j=1}^{n−1} R_j (b_{j+1} − b_j)| ≤ b_n^{−1} ∑_{j=1}^{N−1} |R_j| |b_{j+1} − b_j| + ε b_n^{−1} ∑_{j=N}^{n−1} (b_{j+1} − b_j)
                                              ≤ b_n^{−1} ∑_{j=1}^{N−1} |R_j| |b_{j+1} − b_j| + ε.

Now letting n → ∞ and then letting ε ↓ 0 yields

  lim sup_{n→∞} |b_n^{−1} ∑_{j=1}^{n−1} R_j (b_{j+1} − b_j)| = 0.

Hence, (4.2) follows from (4.3).   □
Lemma 8.4.3: For any random variable X,

  ∑_{n=1}^∞ P(|X| > n) ≤ E|X| ≤ ∑_{n=0}^∞ P(|X| > n).   (4.4)

Proof: For n ≥ 1, let A_n = {n − 1 < |X| ≤ n}. Define the random variables

  Y = ∑_{n=1}^∞ (n − 1) I_{A_n}  and  Z = ∑_{n=1}^∞ n I_{A_n}.

Then, it is clear that Y ≤ |X| ≤ Z, so that

  EY ≤ E|X| ≤ EZ.   (4.5)

Note that

  EY = ∑_{n=1}^∞ (n − 1) P(A_n) = ∑_{n=2}^∞ ∑_{j=1}^{n−1} P(A_n)
     = ∑_{j=1}^∞ ∑_{n=j+1}^∞ P(n − 1 < |X| ≤ n) = ∑_{j=1}^∞ P(|X| > j).

Similarly, one can show that EZ = ∑_{j=0}^∞ P(|X| > j). Hence, (4.4) follows.   □
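Inequality (4.4) can be sanity-checked numerically. The short sketch below (an added illustration; the Exponential(1) distribution is an arbitrary choice, for which E|X| = 1 and P(|X| > n) = e^{−n}, and numpy is assumed) evaluates the two bounding sums.

    import numpy as np

    # For X ~ Exponential(1): E|X| = 1 and P(|X| > n) = exp(-n).
    n = np.arange(1, 200)
    tail = np.exp(-n)                  # P(|X| > n), n >= 1
    lower = tail.sum()                 # sum_{n>=1} P(|X| > n)
    upper = 1.0 + tail.sum()           # sum_{n>=0} P(|X| > n); the n = 0 term equals 1
    print(f"{lower:.4f} <= E|X| = 1 <= {upper:.4f}")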
Theorem 8.4.4: (Marcinkiewicz-Zygmund SLLNs). Let {X_n}_{n≥1} be a sequence of identically distributed random variables and let p ∈ (0, 2). Write S_n = ∑_{i=1}^n X_i, n ≥ 1.

(i) If {X_n}_{n≥1} are pairwise independent and

  (S_n − nc)/n^{1/p}  converges w.p. 1   (4.6)

for some c ∈ R, then E|X_1|^p < ∞.

(ii) Conversely, if E|X_1|^p < ∞ and {X_n}_{n≥1} are independent, then (4.6) holds with c = EX_1 for p ∈ [1, 2) and with any c ∈ R for p ∈ (0, 1).

Corollary 8.4.5: (Kolmogorov's SLLN). Let {X_n}_{n≥1} be a sequence of iid random variables. Then,

  (S_n − nc)/n → 0  w.p. 1

for some c ∈ R iff E|X_1| < ∞, in which case c = EX_1.

Thus, Kolmogorov's SLLN corresponds to the special case p = 1 of Theorem 8.4.4. Note that compared with the WLLN and Borel's SLLN of Sections 8.1 and 8.2, Kolmogorov's SLLN presents a significant improvement in the moment condition, i.e., it assumes the finiteness of only the first absolute moment. Further, both Kolmogorov's SLLN and the Marcinkiewicz-Zygmund SLLN are proved under minimal moment conditions, since the corresponding moment conditions are shown to be necessary.
Proof of Theorem 8.4.4: (i) Suppose that (4.6) holds for some c ∈ R. Then,

  X_n/n^{1/p} = (S_n − S_{n−1})/n^{1/p}
             = (S_n − nc)/n^{1/p} − (S_{n−1} − (n − 1)c)/n^{1/p} + c/n^{1/p}
             → 0 as n → ∞, a.s.

Hence, P(|X_n/n^{1/p}| > 1 i.o.) = 0. By the second Borel-Cantelli lemma and by the pairwise independence of {X_n}_{n≥1}, this implies

  ∑_{n=1}^∞ P(|X_n|/n^{1/p} > 1) < ∞,

i.e.,

  ∑_{n=1}^∞ P(|X_1|^p > n) < ∞.

Hence, by Lemma 8.4.3, E|X_1|^p < ∞.
To prove (ii), suppose that E|X_1|^p < ∞ for some p ∈ (0, 2). For 1 ≤ p < 2, w.l.o.g. assume that EX_1 = 0. Next, define the variables Z_n = X_n I(|X_n|^p ≤ n), n ≥ 1. Then, by Lemma 8.4.3,

  ∑_{n=1}^∞ P(X_n ≠ Z_n) = ∑_{n=1}^∞ P(|X_n|^p > n) = ∑_{n=1}^∞ P(|X_1|^p > n) ≤ E|X_1|^p < ∞.

Hence, by the Borel-Cantelli lemma,

  P(X_n ≠ Z_n i.o.) = 0.   (4.7)

Note that, in view of (4.7), (4.6) holds with c = 0 if and only if

  n^{−1/p} ∑_{i=1}^n Z_i → 0  as  n → ∞,  w.p. 1.   (4.8)
Note that for any j ∈ N, θ > 1 and β ∈ (−∞, 0)\{−1},

  ∑_{n=j}^∞ n^{−θ} ≤ j^{−θ} + ∑_{n=j+1}^∞ ∫_{n−1}^n x^{−θ} dx = j^{−θ} + (1/(θ−1)) j^{−(θ−1)} ≤ (θ/(θ−1)) j^{−(θ−1)},   (4.9)

and similarly,

  ∑_{n=1}^j n^β ≤ (β/(β+1)) I(β < −1) + ((β + j^{β+1})/(β+1)) I(−1 < β < 0).
Now,

  ∑_{n=1}^∞ Var(Z_n/n^{1/p}) ≤ ∑_{n=1}^∞ EX_1² I(|X_1|^p ≤ n) · n^{−2/p}   (4.10)
    = ∑_{n=1}^∞ ∑_{j=1}^n EX_1² I(j − 1 < |X_1|^p ≤ j) · n^{−2/p}
    = ∑_{j=1}^∞ (∑_{n=j}^∞ n^{−2/p}) · EX_1² I(j − 1 < |X_1|^p ≤ j)
    ≤ (2/(2 − p)) ∑_{j=1}^∞ j^{−(2/p − 1)} · EX_1² I(j − 1 < |X_1|^p ≤ j)   (by (4.9))
    ≤ (2/(2 − p)) ∑_{j=1}^∞ j^{−(2/p − 1)} · E|X_1|^p I(j − 1 < |X_1|^p ≤ j) · (j^{1/p})^{2−p}
    = (2/(2 − p)) E|X_1|^p < ∞.
Hence, by Theorem 8.3.4, ∑_{n=1}^∞ (Z_n − EZ_n)/n^{1/p} converges w.p. 1. By Kronecker's lemma (viz. Lemma 8.4.2),

  n^{−1/p} ∑_{j=1}^n (Z_j − EZ_j) → 0  as  n → ∞,  w.p. 1.   (4.11)

Now consider the case p = 1. In this case, E|X_1| < ∞ and, by the DCT, EZ_n = EX_1 I(|X_1| ≤ n) → EX_1 = 0 as n → ∞. Hence, n^{−1} ∑_{i=1}^n EZ_i → 0. Part (ii) of the theorem now follows from (4.8) and (4.11) for p = 1.

Next consider the case p ∈ (0, 2), p ≠ 1. Using (4.9) and (4.10), one can show (cf. Problem 8.12) that

  n^{−1/p} ∑_{j=1}^n EZ_j → 0  as  n → ∞.   (4.12)

Hence, by (4.8), (4.11), and (4.12), one gets (4.6) with c = 0 for p ∈ (0, 2)\{1}. Finally, note that for p ∈ (0, 1) and any c ∈ R,

  (S_n − nc)/n^{1/p} = S_n/n^{1/p} − nc/n^{1/p} → 0  as n → ∞,  a.s.,

whenever S_n/n^{1/p} → 0 as n → ∞ w.p. 1. Hence, (4.6) holds with an arbitrary c ∈ R for p ∈ (0, 1). This completes the proof of part (ii) for p ∈ (0, 2)\{1} and hence of the theorem.   □
not necessarily identically distributed.
Theorem
Let {Xn }n≥1 be a sequence of independent random vari$8.4.6:
∞
ables. If n=1 E|Xn |αn /nαn < ∞ for some αn ∈ [1, 2], n ≥ 1, then
n−1
n
)
j=1
(Xj − EXj ) → 0 as
n → ∞,
w.p. 1.
(4.13)
Proof: W.l.o.g. suppose that EX_n = 0 for all n ≥ 1. Let Y_n = X_n I(|X_n| ≤ n)/n. Note that |EY_n| = |n^{−1}(EX_n − EX_n I(|X_n| > n))| = n^{−1}|EX_n I(|X_n| > n)|, n ≥ 1. Since 1 ≤ α_n ≤ 2,

  ∑_{n=1}^∞ {P(|X_n| > n) + |EY_n|} ≤ 2 ∑_{n=1}^∞ n^{−1} E|X_n| I(|X_n| > n) ≤ 2 ∑_{n=1}^∞ E|X_n|^{α_n}/n^{α_n} < ∞

and

  ∑_{n=1}^∞ Var(Y_n) ≤ ∑_{n=1}^∞ n^{−2} EX_n² I(|X_n| ≤ n) ≤ ∑_{n=1}^∞ n^{−α_n} E|X_n|^{α_n} < ∞.

Hence, by Kolmogorov's 3-series theorem, ∑_{n=1}^∞ (X_n/n) converges w.p. 1. Now the theorem follows from Lemma 8.4.2.   □

Corollary 8.4.7: Let {X_n}_{n≥1} be a sequence of independent random variables such that for some α ∈ [1, 2], ∑_{n=1}^∞ (n^{−α} E|X_n|^α) < ∞. Then (4.13) holds.
8.5 Renewal theory

8.5.1 Definitions and basic properties

Let {X_n}_{n≥0} be a sequence of nonnegative random variables that are independent and, for i ≥ 1, identically distributed with cdf F. Let S_n = ∑_{i=0}^n X_i for n ≥ 0. Imagine a system where a component in operation at time t = 0 lasts X_0 units of time and then is replaced by a new one that lasts X_1 units of time, which, at failure, is replaced by yet another new one that lasts X_2 units of time, and so on. The sequence {S_n}_{n≥0} represents the sequence of epochs when 'renewal' takes place and is called a renewal sequence. Assume that P(X_1 = 0) < 1. Then, since P(X_1 < ∞) = 1, it follows that for each n, P(S_n < ∞) = 1 and lim_{n→∞} S_n = ∞ w.p. 1 (Problem 8.16). Now define the counting process {N(t) : t ≥ 0} by the relation

  N(t) = k  if  S_{k−1} ≤ t < S_k  for k = 0, 1, 2, ...,   (5.1)

where S_{−1} = 0. Thus N(t) counts the number of renewals up to time t.
Definition 8.5.1: The stochastic process {N(t) : t ≥ 0} is called a renewal process with lifetime distribution F. The renewal sequence {S_n}_{n≥0} and the renewal process {N(t) : t ≥ 0} are called nondelayed or standard if X_0 has the same distribution as X_1, and are called delayed otherwise.

Since P(X_1 ≥ 0) = 1, {S_n}_{n≥0} is nondecreasing in n and for each t ≥ 0, the event {N(t) = k} = {S_{k−1} ≤ t < S_k} belongs to the σ-algebra σ⟨{X_j : 0 ≤ j ≤ k}⟩, and hence N(t) is a random variable. Using the nontriviality hypothesis that P(X_1 = 0) < 1, it is shown below that for each t > 0, the random variable N(t) has finite moments of all orders.

Proposition 8.5.1: Let P(X_1 = 0) < 1. Then there exist 0 < λ < 1 (not depending on t) and a constant C(t) ∈ (0, ∞) such that

  P(N(t) > k) ≤ C(t) λ^k  for all  k > 0.   (5.2)
Proof: For t > 0, k ∈ N,

  P(N(t) > k) = P(S_k ≤ t)
             = P(e^{−θS_k} ≥ e^{−θt})  for θ > 0
             ≤ e^{θt} E(e^{−θS_k})  (by Markov's inequality)
             = e^{θt} E(e^{−θX_0}) (E(e^{−θX_1}))^k.

By the BCT, lim_{θ↑∞} E(e^{−θX_1}) = P(X_1 = 0) < 1. Hence, there exists a θ large enough such that λ ≡ E(e^{−θX_1}) is less than one, thus completing the proof.   □

Corollary 8.5.2: There exists an s_0 > 0 such that the moment generating function (m.g.f.) E(e^{sN(t)}) < ∞ for all s < s_0 and t ≥ 0.

Proof: From (5.2), for any t > 0, it follows that P(N(t) = k) = O(λ^k) as k → ∞ for some 0 < λ < 1 and hence E(e^{sN(t)}) = ∑_{k=0}^∞ (e^s)^k P(N(t) = k) < ∞ for any s such that e^s λ < 1, i.e., for all s < s_0 ≡ −log λ.   □
From (5.1), it follows that for t > 0,

  S_{N(t)−1} ≤ t < S_{N(t)}

  ⇒ (S_{N(t)−1}/(N(t) − 1)) · ((N(t) − 1)/N(t)) ≤ t/N(t) ≤ S_{N(t)}/N(t).   (5.3)

Let A be the event that S_n/n → EX_1 as n → ∞ and let B be the event that N(t) → ∞ as t → ∞. Since S_n → ∞ w.p. 1, it follows that P(B) = 1. Also, by the SLLN, P(A) = 1. On the event C = A ∩ B, it holds that

  S_{N(t)}/N(t) → EX_1  as  t → ∞.

This together with (5.3) yields the following result.
Proposition 8.5.3: Suppose that P(X_1 = 0) < 1. Then,

  lim_{t→∞} N(t)/t = 1/EX_1  w.p. 1.   (5.4)
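Proposition 8.5.3 can be illustrated by simulating a renewal process. In the Python sketch below (an added illustration assuming numpy; the Gamma(2,1) lifetime distribution, with EX_1 = 2, and the time points are arbitrary choices), N(t)/t is computed for increasing t and approaches 1/EX_1 = 0.5.

    import numpy as np

    rng = np.random.default_rng(7)
    mean_life = 2.0
    lifetimes = rng.gamma(shape=2.0, scale=1.0, size=10**6)   # X_0, X_1, ... with E X = 2
    renewal_epochs = np.cumsum(lifetimes)                     # S_0, S_1, ...

    for t in [10.0, 100.0, 10000.0, 100000.0]:
        n_t = np.searchsorted(renewal_epochs, t, side="right")  # N(t) = #{j >= 0 : S_j <= t}
        print(f"t = {t:>9.0f}:  N(t)/t = {n_t / t:.4f}   (1/EX_1 = {1/mean_life:.4f})")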
Definition 8.5.2: The function U (t) ≡ EN (t) for the nondelayed process
is called the renewal function.
An explicit expression for U (·) is given by (5.13) below.
Next consider the convergence of EN(t)/t. By (5.4) and Fatou's lemma, one gets

  lim inf_{t→∞} EN(t)/t ≥ 1/EX_1.   (5.5)

It turns out that the lim inf_{t→∞} in (5.5) can be replaced by lim_{t→∞} and ≥ by equality. To do this it suffices to show that the family {N(t)/t : t ≥ k} is uniformly integrable for some k < ∞. This can be done by showing that E(N(t)/t)² is bounded in t (see Chung (1974), Chapter 5). An alternate approach is to bound the lim sup. For this one can use an identity known as Wald's equation (see also Chapter 13).
8.5.2 Wald's equation

Let {X_j}_{j≥1} be independent random variables with EX_j = 0 for all j ≥ 1. Also, let S_0 = 0, S_n = ∑_{j=1}^n X_j, n ≥ 1.

Definition 8.5.3: A positive integer valued random variable N is called a stopping time with respect to {X_j}_{j≥1} if for every j ≥ 1, the event {N = j} ∈ σ⟨{X_1, ..., X_j}⟩. A stopping time N is called bounded if there exists a K < ∞ such that P(N ≤ K) = 1.

Example 8.5.1: N ≡ min{n : ∑_{j=1}^n X_j ≥ 25} is a stopping time w.r.t. {X_j}_{j≥1}, but M ≡ max{n : ∑_{j=1}^n X_j ≥ 25} is not.
Proposition 8.5.4: Let {X_j}_{j≥1} be independent random variables with EX_j = 0. Let N be a bounded stopping time w.r.t. {X_j}_{j≥1}. Then

  E(|S_N|) < ∞  and  ES_N = 0.

Proof: Let K ∈ N be such that P(N ≤ K) = 1. Then |S_N| ≤ ∑_{j=1}^K |X_j| and hence E|S_N| < ∞. Next, S_N = ∑_{j=1}^K X_j I(N ≥ j) and hence

  ES_N = ∑_{j=1}^K E(X_j I(N ≥ j)).

But the event {N ≥ j} = {N ≤ j − 1}^c ∈ σ⟨{X_1, X_2, ..., X_{j−1}}⟩. Since X_j is independent of σ⟨X_1, X_2, ..., X_{j−1}⟩,

  E(X_j I(N ≥ j)) = 0  for 1 ≤ j ≤ K.

Thus ES_N = 0.   □
Corollary 8.5.5: Let {X_j}_{j≥1} be iid random variables with E|X_1| < ∞. Let N be a bounded stopping time w.r.t. {X_j}_{j≥1}. Then

  ES_N = (EN) EX_1.

Corollary 8.5.6: Let {X_j}_{j≥1} be iid nonnegative random variables with E|X_1| < ∞. Let N be a stopping time w.r.t. {X_j}_{j≥1}. Then

  ES_N = (EN) EX_1.

Proof: Let N_k = N ∧ k, k = 1, 2, .... Then N_k is a bounded stopping time. By Corollary 8.5.5,

  E(S_{N_k}) = (EN_k) EX_1.

Let k ↑ ∞. Then 0 ≤ S_{N_k} ↑ S_N and N_k ↑ N. By the MCT, ES_{N_k} ↑ ES_N and EN_k ↑ EN.   □
Theorem 8.5.7: (Wald's equation). Let {X_j}_{j≥1} be iid random variables with E|X_1| < ∞. Let N be a stopping time w.r.t. {X_j}_{j≥1} such that EN < ∞. Then

  ES_N = (EN) EX_1.

Proof: Let T_n = ∑_{j=1}^n |X_j|, n ≥ 1, and let N_k = N ∧ k, k = 1, 2, .... Then, by Corollary 8.5.5,

  E(S_{N_k}) = (EN_k) EX_1.

Also, |S_{N_k}| ≤ T_{N_k} and

  E T_{N_k} = (EN_k) E|X_1|.

Further, as k → ∞, N_k → N, S_{N_k} → S_N, T_{N_k} → T_N, and

  E T_{N_k} → E T_N = (EN) E|X_1| < ∞.

So, by the extended DCT (Theorem 2.3.11),

  ES_{N_k} → ES_N,  i.e.,  (EN_k) EX_1 → ES_N,  i.e.,  ES_N = (EN) EX_1.   □
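Wald's equation is easy to verify by simulation. The sketch below (an added illustration with numpy; the stopping rule "stop at the first n with S_n ≥ 5, capped at n = 200" and the Exponential(1) increments are arbitrary choices; the cap makes N a bounded stopping time, so the identity applies exactly) compares the Monte Carlo averages of S_N and of N·EX_1.

    import numpy as np

    rng = np.random.default_rng(8)
    n_rep, cap, level, mean_x = 20000, 200, 5.0, 1.0

    s_N, N = np.empty(n_rep), np.empty(n_rep)
    for i in range(n_rep):
        x = rng.exponential(mean_x, size=cap)
        s = np.cumsum(x)
        hit = np.flatnonzero(s >= level)
        n_stop = int(hit[0]) + 1 if hit.size else cap   # N = first n with S_n >= level, capped
        N[i], s_N[i] = n_stop, s[n_stop - 1]

    print(f"E S_N       ~ {s_N.mean():.4f}")
    print(f"E N * E X_1 ~ {N.mean() * mean_x:.4f}")

Both Monte Carlo averages come out essentially equal (around 6 for these parameters), as Wald's equation predicts.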
8.5.3 The renewal theorems

In this section, two versions of the renewal theorem will be proved. For this, the notation and concepts introduced in Sections 8.5.1 and 8.5.2 will be used without further explanation. Note that for each t > 0 and j = 0, 1, 2, ..., the event {N(t) = j} = {S_{j−1} ≤ t < S_j} belongs to σ⟨{X_0, ..., X_j}⟩. Thus, by Wald's equation (Theorem 8.5.7 above),

  E(S_{N(t)}) = (EN(t)) EX_1 + EX_0.

Let m ∈ (0, ∞) and X̃_i = min{X_i, m}, i ≥ 0. Let {S̃_n}_{n≥0} and {Ñ(t)}_{t≥0} be the associated renewal sequence and renewal process, respectively. Again, by Wald's equation,

  E(S̃_{Ñ(t)}) = (EÑ(t)) EX̃_1 + EX̃_0.

But since S̃_{Ñ(t)−1} ≤ t < S̃_{Ñ(t)}, it follows that S̃_{Ñ(t)} ≤ t + m and hence

  (EÑ(t)) EX̃_1 + EX̃_0 ≤ t + m.

This yields

  lim sup_{t→∞} EÑ(t)/t ≤ 1/EX̃_1.

Clearly, for all t > 0, Ñ(t) ≥ N(t) and hence

  lim sup_{t→∞} EN(t)/t ≤ 1/EX̃_1.   (5.6)

Since this is true for each m ∈ (0, ∞) and, by the MCT, EX̃_1 → EX_1 as m → ∞, it follows that

  lim sup_{t→∞} EN(t)/t ≤ 1/EX_1.

Combining this with (5.5) leads to the following result.
Theorem 8.5.8: (The weak renewal theorem).
Let {N (t) : t ≥ 0} be a
&
renewal process with distribution F . Let µ = [0,∞) xdF (x) ∈ (0, ∞). Then,
1
EN (t)
= .
t→∞
t
µ
(5.7)
lim
The above result is also valid when µ = ∞ when
1
µ
is interpreted as zero.
Definition 8.5.4: A random variable X (and its probability distribution)
is called arithmetic (or lattice) if there exists a ∈ R and d > 0 such that X−a
d
8.5 Renewal theory
265
is integer valued. The largest such d is called the span of (the distribution
of) X.
Definition 8.5.5: A random variable X (and its distribution distribution)
is called nonarithmetic (or nonlattice) if it is not arithmetic.
The weak renewal theorem (Theorem
! 8.5.8) implies" that EN (t) = t/µ +
o(t) as t → ∞. This suggests that E N (t + h) − N (t) = (t + h)/µ − t/µ +
o(t) = h/µ + o(t). A strengthening of the above result is as follows.
Theorem 8.5.9: (The strong renewal theorem). Let {N (t) : t ≥ 0} be a
renewal process with a nonarithmetic distribution F with a finite positive
mean µ. Then, for each h > 0,
!
" h
lim E N (t + h) − N (t) = .
t→∞
µ
(5.8)
Remark 8.5.1: Since
N (t) =
k−1
)
j=0
!
"
N (t − j) − N (t − j − 1) + N (t − k)
where k ≤ t < k + 1, the weak renewal theorem follows from the strong
renewal theorem.
The following are the “arithmetic versions” of Theorems 8.5.8 and 8.5.9.
Let {Xi }i≥0 be independent positive integer valued random
$nvariables such
that {Xi }i≥1 are iid with distribution {pj }j≥1 . Let Sn = j=0 Xj , n ≥ 0,
S−1 = 0. Let Nn = k if Sk−1 ≤ n < Sk , k = 0, 1, 2, . . .. Let
un
= P (there is a renewal at time n)
= P (Sk = n for some k ≥ 0).
Theorem 8.5.10: Let µ =
n
$∞
j=1
jpj ∈ (0, ∞). Then
1
1)
uj →
n j=0
µ
as
n → ∞.
(5.9)
$∞
Theorem 8.5.11: Let µ = j=1 jpj ∈ (0, ∞) and g.c.d. {k : pk > 0} = 1.
Then
1
as n → ∞.
(5.10)
un →
µ
For proofs of Theorems 8.5.9 and 8.5.11, see Feller (1966) for an analytic
proof or Lindvall (1992) for a proof using the coupling method. The proof
of Theorem 8.5.10 is similar to that of Theorem 8.5.8.
266
8. Laws of Large Numbers
8.5.4
Renewal equations
The above strong renewal theorems have many applications. These are via
what are known as renewal equations.
Let F (·) be a cdf such that F (0) = 0. Let B 0 ≡ {f | f : [0, ∞) → R, f
is Borel measurable and bounded on bounded intervals}.
Definition 8.5.6: A function a(·) is said to satisfy the renewal equation
with distribution F (·) and forcing function b(·) ∈ B 0 if a ∈ B 0 and
%
a(t − u)dF (u) for t ≥ 0.
(5.11)
a(t) = b(t) +
(0,t]
Theorem 8.5.12: Let F be a cdf such that F (0) = 0 and let b(·) ∈ B 0 .
Then there is a unique solution a0 (·) ∈ B 0 to (5.11) given by
%
b(t − u)U (du)
(5.12)
a0 (t) =
[0,t]
where U (·) is the Lebesgue-Stieltjes measure induced by the nondecreasing
function
∞
)
F (n) (t),
(5.13)
U (t) ≡
n=0
with F
(n)
(·), n ≥ 0 being defined by the relations
%
(n)
F (n−1) (t − u)dF (u), t ∈ R, n ≥ 1,
F (t) =
(0,t]
F (0) (t)
=
/
1
0
if t ≥ 0
t < 0.
It will be shown below that the function U (·) defined in (5.13) is the
renewal function EN (t) as in Definition 8.5.2.
Proof: For any function b ∈ B 0 and any nondecreasing right continuous
function G : [0, ∞) → R, let
%
b(t − u)dG(u).
(b ∗ G)(t) ≡
[0,t]
Then since F (0) = 0, the equation (5.11) can be rewritten as
a = b + a ∗ F.
(5.14)
Let {Xi }i≥1 be iid random variables with cdf F . Then
$n it is easy to verify
that F (n) (t) = P (Sn ≤ t), where S0 = 0, and Sn = i=1 Xi for n ≥ 1. Let
8.5 Renewal theory
267
{N (t) : t ≥ 0} be as defined by (5.1). Then, for t ∈ (0, ∞),
EN (t) =
∞
∞
∞
)
)
!
" )
P N (t) ≥ j =
P (Sj−1 ≤ t) =
F (n) (t) = U (t).
j=1
n=0
j=1
By Proposition 8.5.1, U (t) < ∞ for all t > 0 and is nondecreasing.
Since b ∈ B 0 for each 0 < t < ∞, a0 defined by (5.12) is well-defined. By
definition a0 = b ∗ U and by (5.13), a0 satisfies (5.14) and hence (5.11). If
a1 and a2 from B 0 are two solutions to (5.14) then ã ≡ a1 − a2 satisfies
ã = ã ∗ F
and hence
This implies
ã = ã ∗ F (n)
for all n ≥ 1.
M (t) ≡ sup{|ã(u)| : 0 ≤ u ≤ t} ≤ M (t)F (n) (t).
But F (n) (t) → 0 as n → ∞. Hence |ã| = 0 on (0, t] for each t. Thus
!
a0 = b ∗ U is the unique solution to (5.11).
The discrete or arithmetic analog of the renewal equation (5.11) is as
follows. Let {Xi }i≥1 be iid positive integer valued
$n random variables with
1. Let un =
distribution {pj }j≥1 . Let S0 = 0, and Sn = i=1 Xi for n ≥$
n
P (Sj = n for some j ≥ 0). Then, u0 = 1 and un satisfies un = j=1 pj un−j
for n ≥ 1. For any sequence {bj }j≥0 , the equation
an = bn +
n
)
an−j pj , n = 0, 1, 2, . . .
(5.15)
j=1
is called the discrete renewal equation. As in the general case, it can be
shown (Problem 8.17 (a)) that the unique solution to (5.15) is given by
an =
n
)
bn−j uj .
(5.16)
j=0
The following convergence results are easy to establish from Theorem 8.5.11
(Problem 8.17 (b)).
Theorem 8.5.13: (The key renewal theorem, discrete
$∞ case). Let {pj }j≥1
be aperiodic, i.e., g.c.d. {k : pk > 0} = 1 and µ ≡ j=1 jpj ∈ (0, ∞). Let
renewal sequence associated with {pj }j≥1 . That
{un }n≥0 be
$the
$∞ is, u0 = 1
n
and un = j=1 pj un−j for n ≥ 1. Let {bj }j≥0 be such that j=1 |bj | < ∞.
Let {an }n≥0 satisfy a0 = b0 and
an = bn +
∞
)
j=1
an−j pj n ≥ 1.
(5.17)
268
8. Laws of Large Numbers
Then an =
$∞
j=0 bj un−j , n ≥ 0 and lim an =
n→∞
∞
1)
bj .
µ j=0
The nonarithmetic analog of the above is as follows.
Definition 8.5.7: A function $
b(·) ∈ B 0 is directly Riemann integrable (dri)
∞
on [0, ∞) iff (i) $
for all h > 0, n=0 sup{|b(u)|
: nh ≤ u ≤ (n + 1)h} < ∞,
"
∞
and (ii) limh→0 n=0 h(mn (h) − mn (h) = 0 where
mn (h)
=
mn (h)
=
sup{b(u) : nh ≤ u ≤ (n + 1)h}
inf{b(u) : nh ≤ u ≤ (n + 1)h}.
Theorem 8.5.14: (The key renewal theorem, nonarithmetic case).
Let
F (·) be a nonarithmetic $
distribution with F (0) = 0 and µ =
&
∞
(n)
udF
(u)
<
∞.
Let
U
(·)
=
(·) be the renewal function asson=0 F
[0,∞)
ciated with F . Let b(·) ∈ B 0 be directly Riemann integrable.
Then the unique solution to the renewal equation
a=b+a∗F
(5.18)
is given by a = b ∗ U and
lim a(t) =
t→∞
where c(b) ≡ lim
h→0
∞
)
c(b)
µ
(5.19)
hmn (h).
n=0
Remark 8.5.2: A sufficient condition for b(·) to be dri is that it is Riemann integrable on bounded intervals and that there exists a nonincreasing
integrable function h(·) on [0, ∞) and a constant C such that |b(·)| ≤ Ch(·)
(Problem 8.18 (b)).
8.5.5
Applications
Here are two important applications of the above two theorems to a class
of stochastic processes known as regenerative processes.
Definition 8.5.8:
(a) A sequence of random variables {Yn }n≥0 is called regenerative if there
exists a renewal sequence {T!j }j≥0 such that the random cycles"and
cycle length variables ηj = {Yi : Tj ≤ i < Tj+1 }, Tj+1 − Tj for
j = 0, 1, 2, . . . are iid.
(b) A stochastic process {Y (t) : t ≥ 0} is called regenerative if there
exists a renewal sequence {Tj }j≥0 such that the random cycles and
8.5 Renewal theory
269
cycle length variables ηj ≡ {Y (t) : Tj ≤ t < Tj+1 , Tj+1 − Tj } for
j = 0, 1, 2, . . . are iid.
(c) In both (a) and (b), the sequence {Tj }j≥0 are called the regeneration
times.
Example 8.5.2: Let {Yn }n≥0 be a countable state space Markov chain
(see Chapter 14) that is irreducible and recurrent. Fix a state ∆. Let
T0
=
min{n : n > 0, Yn = ∆}
Tj+1
=
min{n : n > Tj , Yn = ∆}, n ≥ 0.
Then {Yn }n≥0 is regenerative (Problem 8.19).
Example 8.5.3: Let {Y (t) : t ≥ 0} be a continuous time Markov chain (see
Chapter 14) with a countable state space that is irreducible and recurrent.
Fix a state ∆. Let
T0
Tj+1
=
inf{t : t > 0, Y (t) = ∆}
=
inf{t : t > Tj , Y (t) = ∆}.
Then {Y (t) : t ≥ 0} is regenerative (Problem 8.19).
Theorem 8.5.15: Let {Yn }n≥0 be a regenerative sequence of random variables with some state space (S, S) where S is a σ-algebra on S with regeneration times {Tj }j≥0 . Let f : S → R be bounded and 1S, B(R)2-measurable.
Let
≡ Ef (Yn+T0 ),
an
bn
≡ Ef (YT0 +n )I(T1 > T0 + n).
(5.20)
Let µ = E(T1 − T0 ) ∈ (0, ∞) and g.c.d. {j : pj ≡ P (T1 − T0 = j) > 0} = 1.
Then
%
(i)
an → f (y)π(dy)
where π(A) ≡
(ii) In particular,
1
µ
E
'$
S
T1 −1
j=T0 IA (Yj )
(
, A ∈ S.
7P (Yn ∈ ·) − π(·)7 → 0
as
n → ∞,
(5.21)
where 7 · 7 denotes the total variation norm.
Proof: By the regenerative property, {an }n≥1 satisfies the renewal equation
n
)
an = bn +
an−j pj
j=0
270
8. Laws of Large Numbers
and $
hence, part (i) of the theorem follows from Theorem 8.5.13 and the
∞
fact n=0 bn = µπ(A).
prove (ii) note that ãn ≡ Ef (Yn ) = E(f (Yn )I(T0 > n)) +
$To
n
a
j=0 n−j P (T0 = j) and by DCT limn→∞ ãn = limn→∞ an .
It is not difficult to show that for any two probability measures µ and ν
on (S, S), the total variation norm
%
J
4
3J %
J
J
7µ − ν7 = sup J f dµ − f dν J : f ∈ B(S, R)
where B(S, R) = {f : f : S → R, F measurable, sup{|f (s)| : s ∈ S} ≤ 1}
(Problem 4.10 (b)). Thus,
7P (Yn+T0 ∈ ·) − π(·)7
%
J
3J
4
J
J
≤ sup JEf (Yn0 +T ) − f dπ J : f ∈ B(S, R) .
(5.22)
Now, for any f ∈ B(S, R) and any integer K ≥ 1, from Theorem 8.5.13,
%
J
J
J
J
JEf (Yn0 +T ) − f dπ J
≤
J
1 JJ
J
bj Jun−j − J + 2
µ
j=0
K
)
∞
)
j=(K+1)
P (T1 − T0 > j) ≡ δn , say (5.23)
where {bj } is defined in (5.20). Since E(T1 − T0 ) < ∞, given # > 0, there
exists a K such that
∞
)
j=(K+1)
By Theorem (8.5.11), un →
(5.22), (ii) follows.
P (T1 − T0 > j) < #/2.
1
µ.
Thus, in (5.23), lim δn ≤ # and so from
!
Theorem 8.5.16: Let {Y (t) : t ≥ 0} be a regenerative stochastic process
with state space (S, S) where S is a σ-algebra on S. Let f : S → R be
bounded and 1S, B(R)2-measurable. Let
=
a(t)
b(t) ≡
Ef (YT0 +t ), t ≥ 0,
Ef (YT0 +t )I(T1 > T0 + t), t ≥ 0.
Let µ = E(T1 − T0 ) ∈ (0, ∞) and the distribution of T1 − T0 be nonarithmetic. Then
%
(i)
a(t) → f (y)π(dy)
where π(A) =
1
µ
E
'&
T
T0
S
(
IA (Y (u))du , A ∈ S.
8.6 Ergodic theorems
271
(ii) In particular,
7P (Yt ∈ ·) − π(·)7 → 0
as
t→∞
(5.24)
where 7 · 7 is the total variation norm.
The proof of this is similar to that of the previous theorem but uses
Theorem 8.5.14.
!
8.6 Ergodic theorems
8.6.1
Basic definitions and examples
The law of large numbers proved in Section 8.2 states that if {Xi }i≥1
are pairwise independent and identically distributed and if h(·) is a Borel
measurable function, then
the time average, i.e.,
n
1 )
h(Xi )
n i=1
→ Eh(X1 ), i.e., space average w.p. 1
(6.1)
as n → ∞, provided E|h(X1 )| < ∞.
The goal of this section is to investigate how far the independence assumption can be relaxed.
Definition 8.6.1: (Stationary sequences). A sequence of random variables
{Xi }i≥1 on a probability space (Ω, F, P ) is called strictly stationary if for
each k ≥ 1 the joint distribution of (Xi+j : j = 1, 2, . . . , k) is the same for
all i ≥ 0.
Example 8.6.1: {Xi }i≥1 iid.
Example 8.6.2: Let {Xi }i≥1 be iid. Fix 1 ≤ 2 < ∞. Let h : R, → R be
a Borel function and Yi = h(Xi , Xi+1 , . . . , Xi+,−1 ), i ≥ 1. Then {Yi }i≥1 is
strictly stationary.
Example 8.6.3: Let {Xi }i≥1 be a Markov chain with a stationary distribution π. If X1 ∼ π then {Xi }i≥1 is strictly stationary (see Chapter
14).
It will be shown that if {Xi }i≥1 is a strictly stationary sequence that is
not a mixture of two other strictly stationary sequences, then (6.1) holds.
This is known as the ergodic theorem (Theorem 8.6.1 below).
Definition 8.6.2: (Measure preserving transformations). Let (Ω, F, P )
be a probability space and T : Ω → Ω be 1F, F 2 measurable. Then, T is
272
8. Laws of Large Numbers
called P -preserving (or simply measure preserving on (Ω, F, P )) if for all
A ∈ F, P (T −1 (A)) = P (A). That is, the random point T (ω) has the same
distribution as ω.
Let X be a real valued random variable on (Ω, F, P ). Let Xi ≡
X(T (i−1) (ω)) where T (0) (ω) = ω, T (i) (ω) = T (T (i−1) (ω)), i ≥ 1. Then
{Xi }i≥1 is a strictly stationary sequence.
It turns out that every strictly stationary sequence arises this way. Let
{Xi }i≥1 be a strictly stationary sequence defined on some probability space
(Ω, !F, P ). Let P̃ be the probability
measure induced by X̃ ≡ {Xi (ω)}i≥1
"
∞
∞
on Ω̃ ≡ R , F̃ ≡ B(R ) where R∞ is the space of all sequences of
real numbers and B(R∞ ) is the σ-algebra generated by finite dimensional
cylinder sets of the form {x : (xj : j = 1, 2, . . . , k) ∈ Ak }, 1 ≤ k < ∞, Ak ∈
B(Rk ).! Let T ": R∞ → R∞ be the unilateral (one sided) shift to the right,
i.e., T (xi )i≥1 = (xi )i≥2 . Then T is measure preserving on (Ω̃, F̃, P̃ ). Let
Y1 (ω̃) = x1 , and Yi (ω̃) = Y1 (T i−1 ω̃) = xi for i ≥ 2 if ω̃ = (x1 , x2 , x3 , . . .).
Then {Yi }i≥1 is a strictly stationary sequence on (Ω̃, F̃, P̃ ) and has the
same distribution as {Xi }i≥1 .
Example 8.6.4: Let Ω = [0, 1], F = B([0, 1]), P = Lebesgue measure.
Let T ω ≡ 2ω mod 1, i.e.,


2ω
if 0 ≤ ω < 12
2ω − 1 if 12 ≤ ω < 1
Tω =

0
ω = 1.
Then T is measure preserving since P ({ω : a < T ω < b}) = (b − a) for all
0 < a < b < 1 (Problem 8.20).
This example is an equivalent version of the iid sequence {δi }i≥1 of
$∞
be the biBernoulli (1/2) random variables. To see this, let ω = i=1 δi2(ω)
i
nary expansion of ω. Then {δi }i≥1 is iid Bernoulli (1/2) and T ω = 2ω mod
$∞
(ω)
(cf. Problem 7.4). Thus T corresponds with the unilateral
1 = i=2 δ2ii−1
shift to right on the iid sequence {δi }i≥1 . For this reason, T is called the
Bernoulli shift.
Example 8.6.5: (Rotation). Let Ω = {(x, y) : x2 + y 2 = 1} be !the unit
circle. Fix θ0 in" [0, 2π). If ω = (cos θ, sin θ), θ in [0, 2π) set T ω = cos(θ +
θ0 ), sin(θ + θ0 ) . That is, T rotates any point ω on Ω counterclockwise
through an angle θ0 . Then T is measure preserving w.r.t. the Uniform
distribution on [0, 2π].
Definition 8.6.3: Let (Ω, F, P ) be a probability space and T : Ω → Ω
be a 1F, F 2 measurable map. A set A ∈ F is T-invariant if A = T −1 A.
A set A ∈ F is almost T -invariant w.r.t. P if P (A + T −1 A) = 0 where
A1 + A2 = (A1 ∩ Ac2 ) ∪ (Ac1 ∩ A2 ) is the symmetric difference of A1 and A2 .
8.6 Ergodic theorems
273
It can be shown that A is almost T -invariant w.r.t. P iff there exists a
set A$ that is T -invariant and P (A + A$ ) = 0 (Problem 8.21).
Examples of T -invariant sets: are A1$
= {ω : T j ω ∈ A0 for infinitely many
;
n
j
i ≥ 1} where A0 ∈ F; A2 = ω : n1
j=1 h(T ω) converges as n → ∞
where h : Ω → R is a F measurable function.
On the" other hand, the event
!
{x : x1 ≤ 0} is not shift invariant in R∞ , B(R∞ ) nor is it almost shift
invariant if P̃ corresponds to the iid case with a nondegenerate distribution.
The collection I of T -invariant sets is a σ-algebra and is called the invariant σ-algebra. A function h : Ω → R is I-measurable iff h(ω) = h(T ω)
for all ω (Problem 8.22).
Definition 8.6.4: A measure preserving transformation T on a probability
space (Ω, F, P ) is ergodic or irreducible (w.r.t. P ) if A is T -invariant implies
P (A) = 0 or 1.
Definition 8.6.5: A stationary sequence of random variables {Xi }i≥1
is ergodic if the unilateral shift T is ergodic on the sequence space
(R∞ , B(R∞ ), P̃ ) where P̃ is the measure on R∞ induced by {Xi }i≥1 .
Example 8.6.6: Consider the above sequence space. Then A ∈ F̃ is invariant with respect
+∞ to the unilateral shift implies that A is in the tail
σ-algebra T ≡ n=1 σ(X̃j (ω), j ≥ n) (Problem 8.23). If {Xi }i≥1 are independent then by the Kolmogorov’s zero-one law, A ∈ T implies P (A) = 0
or 1. Thus, if {Xi }i≥1 are iid then it is ergodic.
On the other hand, mixtures of iid sequences are not ergodic as seen
below.
Example 8.6.7: Let {Xi }i≥1 and {Yi }i≥1 be two iid sequences with different distributions. Let δ be Bernoulli (p), 0 < p < 1 and independent of
both {Xi }i≥1 and {Yi }i≥1 . Let Zi ≡ δXi + (1 − δ)Yi , i ≥ 1. Then {Zi }i≥1
is a stationary sequence and is not ergodic (Problem 8.24).
The above example can be extended to mixtures of irreducible positive
recurrent discrete state space Markov chains (Problem 8.25 (a)). Another
example is Example 8.6.5, i.e., rotation of the circle when θ is rational
(Problem 8.25 (b)).
Remark 8.6.1: There is a simple example of a measure preserving transformation T that is ergodic but T 2 is not. Let Ω = {ω1 , ω2 }, ω1 )= ω2 . Let
T ω1 = ω2 , T ω2 = ω1 , P be the distribution P ({ω1 }) = P ({ω2 }) = 12 . Then
T is ergodic but T 2 is not (Problem 8.26).
274
8. Laws of Large Numbers
8.6.2
Birkhoff ’s ergodic theorem
Theorem 8.6.1: Let (Ω, F, P ) be a probability space, T : Ω → Ω be a
measure preserving ergodic map on (Ω, F, P ) and X ∈ L1 (Ω, F, P ). Then
%
n−1
1)
X(T j ω) → EX ≡
XdP
n j=0
Ω
(6.2)
w.p. 1 and in L1 as n → ∞.
Remark 8.6.2: A more general version is without the assumption of
T being ergodic. In this case, the right side of (6.2) is a random variable
Y (ω) that
is T -invariant, i.e., Y (ω) = Y (T (ω)) w.p. 1 and satisfies
&
&
XdP
=
Y
dP
for all T -invariant sets A. This Y is called the condiA
A
tional expectation of X given I, the σ-algebra of invariant sets (Chapter
13).
For a proof of this version, see Durrett (2004).
The proof of Theorem 8.6.1 depends on the following inequality.
Lemma 8.6.2: (Maximal ergodic inequality). Let T be measure preserving
on a probability space (Ω, F, P ) and X ∈ L1 (Ω, F, P ). Let S0 (ω) = 0,
$n−1
Sn (ω) = j=0 X(T j ω), n ≥ 1, Mn (ω) = max{Sj (ω) : 0 ≤ j ≤ n}. Then
""
!
!
E X(ω)I Mn (ω) > 0 ≥ 0.
Proof: By definition of Mn (ω), Sj (ω) ≤ Mn (ω) for 1 ≤ j ≤ n. Thus
X(ω) + Mn (T ω) ≥ X(ω) + Sj (T ω) = Sj+1 (ω).
Also, since Mn (T ω) ≥ 0,
X(ω) ≥ X(ω) − Mn (T ω) = S1 (ω) − Mn (T ω).
;
:
Thus X(ω) ≥ max Sj (ω) : 1 ≤: j ≤ n − Mn (T ω). ;For ω such
that Mn (ω) > 0, Mn (ω) = max Sj (ω) : 1 ≤ j ≤ n and hence
X(ω) ≥ Mn (ω) − Mn (T ω). Also, since X ∈ L1 (Ω, F, P ) it follows that
Mn ∈ L1 (Ω, F, P ) for all n ≥ 1. Taking expectations yields
""
!
!
E X(ω)I Mn (ω) > 0
!
!
""
≥ E Mn (ω) − Mn (T ω)I Mn (ω) > 0
!
""
!
≥ E Mn (ω) − Mn (T ω)I Mn (ω) ≥ 0 (since Mn (T ω) ≥ 0)
"
!
= E Mn (ω) − Mn (T ω) = 0,
since T is measure preserving.
!
8.6 Ergodic theorems
275
Remark 8.6.3: Note that the measure preserving property of T is used
only at the last step.
Proof of Theorem 8.6.1: W.l.o.g. assume that EX = 0. Let Z(ω) ≡
lim supn→∞ Snn(ω) . Fix # > 0 and set A$ ≡ {ω : Z(ω) > #}. It will be shown
that P (Aε ) = 0. Clearly, A$ is T invariant. Since T is ergodic, P (A$ ) = 0 or
1. Suppose P (A$ ) = 1. Let Y (ω) = X(ω)−#. Let Mn,Y (ω) ≡ max{Sj,Y (ω) :
$j−1
0 ≤ j ≤ n} where S0,Y (ω) ≡ 0, Sj,Y (ω) ≡ k=0 Y (T k ω), j ≥ 1. Then by
Lemma 8.6.2 applied to Y (ω)
""
!
!
E Y (ω)I Mn,Y (ω) > 0 ≥ 0.
But Bn ≡ {ω : Mn,Y (ω) > 0} = {ω : sup1≤j≤n 1j Sj,Y (ω) > 0}. Clearly,
Bn ↑ B ≡ {ω : sup1≤j<∞ 1j Sj,Y (ω) > 0}. Since 1j Sj,Y (ω) = 1j Sj (ω) − #
for j ≥ 1, B ⊃ A$ and since P (A$ ) = 1, it follows that P (B) = 1. Also
|Y | ≤ |X| + # ∈ L1 (Ω, F, P ). So by the bounded convergence theorem,
0 ≤ E(Y IBn ) → E(Y IB ) = EY = 0 − # < 0, which is a contradiction. Thus P (A$ ) = 0. This being true for every # > 0 it follows that
P (limn→∞ Snn(ω) ≤ 0) = 1. Applying this to −X(ω) yields
P
'
(
Sn (ω)
≥0 =1
n
n→∞
lim
"
!
and hence P limn→∞ Snn(ω) = 0 = 1.
To prove L1 -convergence, note that applying the above to X + and X −
yields
n
1) + i
fn (ω) ≡
X (T ω) → EX + (ω) w.p. 1.
n i=1
&
+
=
EX
Since T is measure preserving
fn (ω)dP
& (ω) for
all n. So by Scheffe’sJ theorem
(Lemma 8.2.5),
|fn (ω) −
J
+
J 1 $n X + (T i ω) − EX + J → 0. Similarly,
(ω)|dP
→
0,
i.e.,
E
EX
i=1
n
J $n
J
E J n1 i=1 X − (T i ω) − EX − J → 0. This yields L1 convergence.
!
Corollary 8.6.3: Let {Xi }i≥1 be a stationary ergodic sequence of Rk
valued random variables on some probability space (Ω, F, P ). Let h : Rk →
R be Borel measurable and let E|h(X1 , X2 , . . . , Xk )| < ∞. Then
n
1)
h(Xi , Xi+1 , . . . , Xi+k−1 ) → Eh(X1 , X2 , . . . , Xk )
n i=1
w.p. 1.
"
!
Proof: Consider the probability space Ω̃ = (Rk )∞ , F̃ ≡ B (Rk )∞ and
P̃ the probability measure induced by the map ω → (Xi (ω))i≥1 and the
unilateral shift map T̃ on Ω̃ defined by T̃ (xi )i≥1 = (xi )i≥2 . Then T̃ is
276
8. Laws of Large Numbers
measure preserving and ergodic. So the corollary follows from Theorem
8.6.1.
!
Remark 8.6.4: This corollary is useful in statistical time series analysis.
If {Xi }i≥1 is a real valued stationary ergodic sequence, then the mean m ≡
EX1 , variance Var(X1 ), and covariance Cov(X1 , X2 ) can all be estimated
consistently by the corresponding sample functions
- )
.2
n
n
1
1) 2
Xi −
Xi ,
n i=1
n i=1
.2
n
n
1)
1)
Xi Xi+1 −
Xi .
n i=1
n i=1
n
1)
Xi ,
n i=1
and
Further, the joint distribution of (X1 , X2 , . . . , Xk ) for any k ≥ 1, can
be estimated consistently
the corresponding empirical measure, i.e.,
$by
n
Ln (A1 , A2 , . . . , Ak ) ≡ n1 i=1 I(Xi+k ∈ Ak , j = 1, 2, . . . , k), which converges to
P (X1 ∈ A1 , X2 ∈ A2 , . . . , Xk ∈ Ak ) w.p. 1
where Ai ∈ B(R), i = 1, 2, . . . , k.
The next three results (Theorems 8.6.4–8.6.6) are consequences and extensions of the ergodic theorem, Theorem 8.6.1. For proofs, see Durrett
(2004).
The first one is the following result on the behavior of the log-likelihood
function of a stationary ergodic sequence of random variables with a finite
range.
Theorem 8.6.4: (Shannon-McMillan-Breiman theorem). Let {Xi }i≥1 be
a stationary ergodic sequence of random variables with values in a finite
set S ≡ {a1 , a2 , . . . , ak }. For each n, x1 , x2 , . . . , xn in S, let
p(xn | xn−1 , xn−2 , . . . , x1 ) = P (Xn = xn | Xj = xj , 1 ≤ j ≤ n − 1)
P (Xj = xj : 1 ≤ j ≤ n)
≡
P (Xj = xj : 1 ≤ j ≤ n − 1)
whenever the denominator is positive and let p(x1 , x2 , . . . , xn ) = P (X1 =
x1 , X2 = x2 , . . . , Xn = xn ). Then
1
log p(X1 , X2 , . . . , Xn ) = −H exists w.p. 1
n
!
"
where H ≡ limn→∞ E − log p(Xn | Xn−1 , Xn−2 , . . . , X1 ) is called the
entropy rate of {Xi }i≥1 .
lim
n→∞
Remark 8.6.5: In the iid case this is a consequence
of the strong law of
$k
large numbers, and H can be identified as j=1 (− log pj )pj where pj =
8.6 Ergodic theorems
277
P (X1 = aj ), 1 ≤ j ≤ k. This is called the Kolmogorov-Shannon entropy of
the distribution {pj : 1 ≤ j ≤ k}.
If {Xi }i≥1 is a stationary ergodic Markov chain, then again it is a consequence of the strong law of large numbers, and H can be identified with
k
k
)
" )
!
πi
(− log pij )pij
E − log p(X2 | X1 ) =
i=1
j=1
!
"
where π ≡ {πi : 1 ≤ i ≤ k} is the stationary distribution and P ≡ (pij ) is
the transition probability matrix of the Markov chain {Xi }i≥1 . See Problem
8.27.
A more general version of the ergodic Theorem 8.6.1 is the following.
Theorem 8.6.5: (Kingman’s subadditive ergodic theorem). Let {Xm,n :
0 ≤ m < n}n≥1 be a collection of random variables such that
(i) X0,m + Xm,n ≥ X0,n for all 0 ≤ m < n, n ≥ 1.
(ii) For all k ≥ 1, {Xnk,(n+1)k }n≥1 is a stationary sequence.
(iii) The sequence {Xm,m+k , k ≥ 1} has a distribution that does not depend on m ≥ 0.
+
(iv) EX0,1
< ∞ and for all n,
EX0,n
n
≥ γ0 , where γ0 > −∞.
Then
(i) lim
n→∞
(ii) lim
n→∞
EX0,n
n
X0,n
n
= inf
n≥1
EX0,n
n
≡ γ.
≡ X exists w.p. 1 and in L1 , and EX = γ.
(iii) If {Xnk,(n+1)k }n≥1 is ergodic for each k ≥ 1, then X ≡ γ w.p. 1.
A nice application of this is a result on products of random matrices.
Theorem 8.6.6: Let {Ai }i≥1 be a stationary sequence of k × k random
matrices with nonnegative entries. Let αm,n (i, j) be the (i, j)th entry in
Am+1 , · · · , An . Suppose E| log α1,2 (i, j)| < ∞ for all i, j. Then
1
n→∞ n
(i) lim
log α0,n (i, j) = η exists w.p. 1.
1
n→∞ n
log 7Am+1 · · · , An 7 = η w.p. 1, where for any k×k
: $k
;
matrix B ≡ ((bij )), 7B7 = max
j=1 |bij | : 1 ≤ i ≤ k .
(ii) For any m, lim
278
8. Laws of Large Numbers
Remark 8.6.6: A concept related to ergodicity is that of mixing. A measure preserving transformation T on a probability space (Ω, F, P ) is mixing
if for all A, B ∈ B
J
J
lim JP (A ∩ T −n B) − P (A)P (T −n B)J = 0.
n→∞
A stationary sequence of random variables {Xi }i≥1 is mixing if the unilateral shift on the sequence space R∞ induced by {Xi }i≥1 is mixing. If T is
mixing and A is T -invariant, then taking B = A in the above yields
P (A) = P 2 (A)
i.e., P (A) = 0 or 1. Thus, if T is mixing, then T is ergodic. Conversely, if
T is ergodic, then by Theorem 8.6.1, for any B in B
n
1)
IB (T j ω) → P (B) w.p. 1.
n j=1
$n
Integrating both sides over A w.r.t. P yields n1 j=1 P (A ∩ T −j B) →
P (A)P (B), i.e., T is mixing in an average sense, i.e., the Cesaro sense. A
sufficient condition for a stationary sequence to be mixing is that the tail
σ-algebra be trivial. If {Xi }i≥1 is a stationary irreducible Markov chain
with a countable state space, then it is mixing iff it is aperiodic.
For proofs of the above results, see Durrett (2004).
8.7 Law of the iterated logarithm
2
Let {Xn }n≥1 be a sequence of iid random variables with
$nEX1 = 0, EX1 =
1
1. The SLLN asserts that the sample mean X̄n = n i=1 Xi → 0 w.p. 1.
The central limit theorem
(to be proved later) asserts that for all −∞ <
√
a < b < ∞, P (a ≤ nX̄n ≤ b) → Φ(b)
$n− Φ(a) where Φ(·) is the standard
√
Normal cdf. This suggests that Sn = i=1 Xi is of the order magnitude n
Sn
get as a function
for large n. This raises the question of how large does √
n
√
of n. It turns out that it is of the order 2n log log n. More precisely, the
following holds:
Theorem 8.7.1: (Law of the iterated logarithm). Let {Xi (ω)}i≥1 be iid
random variables on a probability space
$n(Ω, F, P ) with mean zero and variance one. Let S0 (ω) = 0, Sn (ω) =
i=1 Xi (ω),
3
4 n ≥ 1. For each ω, let
A(ω) be the set of limit points of
[−1, +1]} = 1.
For a proof, see Durrett (2004).
S (ω)
√ n
2n log log n
n≥1
. Then P {ω : A(ω) =
8.8 Problems
279
A deep generalization of the above was obtained by Strassen (1964).
Theorem 8.7.2: Under the setup of Theorem 8.7.1, the following holds:
Sj (ω)
, j = 0, 1, 2, . . . , n and Yn (t, ω) be the function
Let Yn ( nj ; ω) = √2n log
log n
obtained by linearly interpolating the above values on [0, 1]. For each ω, let
B(ω) be the set of limit points of {Yn (·, ω)}n≥1 in the function space C[0, 1]
of all continuous functions on [0, 1] with the supnorm. Then
P {ω : B(ω) = K} = 1
:
where K ≡ f : f : [0, 1] → R, f is continuously differentiable, f (0) = 0
;
&1
and 12 0 (f $ (t))2 dt ≤ 1 .
8.8 Problems
8.1 Prove Theorem 8.1.3 and Corollary 8.1.4.
(Hint: Use Chebychev’s inequality.)
8.2 Let {Xn }n≥1 be a sequence of random variables on a probability
space (Ω, F, P ) such that for some m ∈ N and for each i = 1, . . . , m,
{Xi , Xi+m , Xi+2m , . . .} are identically distributed and pairwise independent. Furthermore, suppose that E(|X1 | + · · · + |Xm |) < ∞. Show
that
m
1 )
X n −→
EXi , w.p. 1.
m i=1
(Hint: Reduce the problem to nonnegative Xn ’s and apply Theorem
8.2.7 for each i = 1, . . . , m.)
8.3 Let f be a bounded measurable function
on [0,1] that is continuous
& 1 ' x1 +x2 +···+xn (
&1&1
1
dx1 dx2 . . . dxn .
at 2 . Evaluate lim 0 0 · · · 0 f
n
n→∞
8.4 Show that if P (|X| > α) < 12 for some real number α, then any
median of X must lie in the interval [−α, α].
8.5 Prove Theorem 8.3.4 using Kolmogorov’s first inequality (Theorem
8.3.1 (a)).
(Hint: Apply Theorem 8.3.1 to ∆n,k defined in the proof of Theorem
8.3.3 to establish (3.4).)
8.6 Let {Xn }n≥1 be a sequence of iid random variables with E|X1 |α < ∞
for some α > 0. Derive a necessary and sufficient
condition on α
$∞
for almost sure convergence of the series
X
sin 2πnt for all
n
n=1
t ∈ (0, 1).
280
8. Laws of Large Numbers
8.7 Show that for any given sequence of random variables {Xn }n≥1 , there
n
exists a sequence of real numbers {an }n≥1 ⊂ (0, ∞) such that X
an → 0
w.p. 1.
8.8 Let {Xn }n≥1 be a sequence of independent random variables with
P (Xn = 2) = P (Xn = nβ ) = an , P (Xn = an ) = 1 − 2an
1
for some
n ∈ (0, 3 ) and β ∈ R. Show that
$a∞
only if n=1 an < ∞.
$∞
n=1
Xn converges if and
8.9 Let {Xn }n≥1 be a sequence of iid random variables with E|X1 |p = ∞
$n
for some p ∈ (0, 2). Then P (lim sup |n−1/p i=1 Xi | = ∞) = 1.
n→∞
8.10 $
For any random variable X and any r ∈ (0, ∞), E|X|r < ∞ iff
∞
r−1
(log n)r P (|X| > n log n) < ∞.
n=1 n
$m
(Hint: Check that n=1 nr−1 (log n)r ∼ r−1 mr (log m)r as m → ∞.)
8.11 Let {Xn }n≥1 be a sequence$of independent random variables with
n
EXn = 0, EXn2 = σn2 , s2n = j=1 σj2 → ∞. Then, show that for any
a > 12 ,
n
)
2 −a
Xi → 0 w.p. 1.
s−2
n (log sn )
i=1
8.12 Show that for p ∈ (0, 2), p )= 1, (4.12) holds.
$∞
$∞
1/p
| ≤
(Hint: For p ∈ (1, 2),
n=1 |EZn /n
n=1 E|X1 |I(|X1 | >
$
$
∞
j
−1/p
p
n
·
E|X
|I(j
<
|X
≤ j + 1) ≤
n)n−1/p =
1
1|
j=1
n=1
$
∞
p
p
1/p
< ∞, by (4.10). For p ∈ (0, 1),
| ≤
n=1 |EZn /n
p−1 E|X1 |
$
∞ $∞
1
−1/p
p
p
(
n
)E|X
|I(j
−
1
<
|X
|
≤
j)
≤
E|X
|
,
by
1
1
1
j=1
n=j
1−p
(4.9).)
8.13 Let Yi = xi β + #i , i ≥ 1 where {#n }n≥1 is a sequence of iid random
vectors, {xn }n≥1 is a sequence of constants, and β ∈ R is a constant
$n
$n
2
(the regression parameter). Let β̂n = i=1
$nxi Yi /2 i=1 xi denote (the
−1
least squares) estimator of β. Let n
i=1 xi → c ∈ (0, ∞) and
E#1 = 0.
(a) If E|#1 |1+δ < ∞ for some δ ∈ (0, ∞), then show that
β̂n −→ β
as
n → ∞,
w.p. 1.
(8.1)
(b) Suppose sup{|xi | : i ≥ 1} < ∞ and E|#1 | < ∞. Show that (8.1)
holds.
8.8 Problems
281
8.14 (Strongly consistent estimation.) Let {Xi }i≥1 be random variables on
some probability space (Ω, F, P ) such that (i) for some integer m ≥ 1
the collections {Xi : i ≤ n} and {Xi : i ≥ n + m} are independent
for each n ≥ 1, and (ii) the distribution of {Xi+j ; 0 ≤ j ≤ k} is
independent of i, for all k ≥ 0.
(a) Show that for every 2 ≥ 1 and h : R, → R with
E|h(X1 , X2 , . . . , X, )| < ∞, there are functions {fn : Rn →
R}n≥1 such that fn (X1 , X2 , . . . , Xn ) → λ ≡ Eh(X1 , X2 , . . . , X, )
w.p. 1. In this case, one says λ is estimable from {Xi }i≥1 in a
strongly consistent manner.
(b) Now suppose the distribution µ(·) of X1 is a mixture of the
$k
form µ =
i=1 αi µi . Suppose there exist disjoint Borel sets
{Ai }1≤i≤k in R such that µ&i (Ai ) = 1 for each i. Show that
all the αi ’s as well as λi ≡ hi (x)dµi where hi ∈ L1 (µi ) are
estimable from {Xi }i≥1 in a strongly consistent manner.
8.15 (Normal numbers). Recall that in Section 4.5 it was shown that for
any positive integer p > 1 and for any 0 ≤ ω ≤ 1, it is possible to
write ω as
∞
)
Xi (ω)
ω=
(8.2)
pi
i=1
where for each i, Xi (ω) ∈ {0, 1, 2, . . . , p − 1}. Recall also that such an
expansion is unique except for ω of the form q/pn , q = 1, 2, . . . , pn −1,
n ≥ 1 in which case there are exactly two expansions, one of which is
recurring. In what follows, for such ω’s the recurrent expansion will
be the one used in (8.2). A number ω in [0,1] is called normal w.r.t.
the integer p if for every finite pattern a1 a2 . . . ak where k ≥ 1 is a
positive integer
$nand ai ∈ {0, 1, 2, . . . , p − 1} for 1 ≤ i ≤ k the relative
frequency n1 i=1 δi (ω) where
δi (ω) =
/
1 if Xi+j (ω) = aj+1 , j = 0, 1, 2, . . . , k − 1
0
otherwise
converges to p−k as n → ∞. A number ω in [0,1] is called absolutely
normal if it is normal w.r.t. p for every integer p > 1. Show that
the set A of all numbers ω in [0,1] that are absolutely normal has
Lebesgue measure one.
(Hint: Note that in (8.2), the function {Xi (ω)}i≥1 are iid random
variables. Now use Problem 8.14 repeatedly.)
8.16 Show that for the renewal sequence {Sn }∞
n=0 , if P (X1 > 0) > 0, then
lim Sn = ∞ w.p. 1.
n→∞
282
8. Laws of Large Numbers
8.17 (a) Show that {an }n≥0 of (5.16) is the unique solution to (5.15) by
using generating functions (cf. Section 5.5).
(b) Deduce Theorems 8.5.13 and 8.5.14 from Theorems 8.5.11 and
8.5.12, respectively.
(Hint: For Theorems 8.5.13 use the DCT , and for Theorem
8.5.14, show first that
k
)
!
"
mn (h) U ((n + 1)h) − U (nh)
k
)
!
"
mn (h) U ((n + 1)h) − U (nh) .)
n=0
≤ a(kh)
≤
n=0
8.18 (a) Let b(·) : [0, ∞) → R be dri. Show that b(·) is Riemann integrable on every bounded interval. Conclude that if b(·) is dri it
must be continuous almost everywhere w.r.t. Lebesgue measure.
(b) Let b(·) : [0, ∞) → R be Riemann integrable on [0, K] for each
K < ∞. Let h(·) : [0, ∞) → R+ be nonincreasing and integrable
w.r.t. Lebesgue measure and |b(·)| ≤ h(·) on [0, ∞). Show that
b(·) is dri.
8.19 Verify that the sequence {Yn }n≥0 in Example 8.5.2 and the process
{Y (t) : t ≥ 0} in Example 8.5.3 are both regenerative.
8.20 Show that the map T in Example 8.6.4 in Section 8.6 is measure
preserving.
!
"
(Hint: Show that for 0 < a < b < 1, P ω : T ω ∈ (a, b) = (b − a).)
8.21 Let T be a measure preserving map on a probability space (Ω, F, P ).
Show that A is almost T -invariant w.r.t. P iff there exists a set A1
such that A1 = T −1 A1 and P (A + A1 ) = 0.
#∞
(Hint: Consider A1 = n=0 T −n A. )
8.22 Show that a function h : Ω → R is I-measurable iff h(ω) = h(T ω)
for all ω where I is the σ-algebra of T -invariant sets.
"
!
8.23 Consider the sequence space R∞ , B(R∞ ) . Show that A ∈ B(R∞ )
is invariant w.r.t. the unilateral shift T implies that A is in the tail
σ-algebra.
8.24 In Example 8.6.7 of Section 8.6, show that {Zi }i≥1 is a stationary
sequence that is not ergodic.
(Hint: Assuming it is ergodic, derive a contradiction using the ergodic Theorem 8.6.1.)
8.8 Problems
283
8.25 (a) Extend Example 8.6.7 to the Markov chain case with two disjoint
irreducible positive recurrent subsets.
(b) Show that in Example 8.6.5, if θ0 is rational, then T is not
ergodic.
8.26 (a) Verify that in Remark 8.6.1, T is ergodic but T 2 is not.
(b) Construct a Markov chain with four states for which T is ergodic
but T 2 is not.
8.27 In Remark 8.6.5, prove the Shannon-McMillan-Breiman theorem directly for the Markov chain case.
.
- n−1
<
pXi Xi+1 p(X1 ).)
(Hint: Express p(X1 , X2 , . . . , Xn ) as
i=1
8.28 Let {Xi }i≥1 be iid Bernoulli (1/2) random variables. Let
W1
W2
=
=
∞
)
2X2i
i=1
∞
)
i=1
4i
X2i−1
.
4i
(a) Show that W1 and W2 are independent.
(b) Let A1 = {ω : ω ∈ (0, 1) such that in the expansion of ω in
base 4 only the digits 0 and 2 appear} and A2 = {ω : ω ∈
(0, 1) such that in the expansion of ω in base 4 only the digits
0 and 1 appear}. Show that m(A1 ) = m(A2 ) = 0 where m(·)
is Lebesgue measure and hence that the distribution of W1 and
W2 are singular w.r.t. m(·).
(c) Let W ≡ W1 + W2 . Then show that W has uniform (0,1) distribution.
(Hint: For (b) use the SLLN.)
Remark: This example shows that the convolution of two singular
probability measures can be absolutely continuous w.r.t. Lebesgue
measure.
8.29 Let {Xn }n≥1 be a sequence of pairwise independent and identically
distributed random variables with P (X1 ≤ x) = F (x), x ∈ R. Fix
0 < p < 1. Suppose that F (ζp + #) > p for all # > 0 where
ζp = F −1 (p) ≡ inf{x : F (x) ≥ p}.
Show that ζ̂n ≡ Fn−1$
(p) ≡ inf{x : Fn (x) ≥ p} converges to ζp w.p. 1
n
where Fn (x) ≡ n−1 i=1 I(Xi ≤ x), x ∈ R is the empirical distribution function of X1 , . . . , Xn .
284
8. Laws of Large Numbers
8.30 Let {Xi }i≥1$be random variables such that$EXi2 < ∞ for all i ≥ 1.
n
n
Suppose n1 i=1 EXi → 0 and an ≡ n12 j=0 (n − j)v(j) → 0 as
J
J
n → ∞ where v(j) = supi JCov(Xi , Xi+j )J.
(a) Show that X̄n −→p 0.
$∞
(b) Suppose further that n=1 an < ∞. Show that X̄n → 0 w.p. 1.
(c) Show that as n → ∞, v(n) → 0 implies an → 0 but the converse
need not hold.
8.31 Let
i }i≥1 be iid random variables with cdf F (·). Let Fn (x) ≡
${X
n
1
i=1 I(Xi ≤ x) be the empirical cdf. Suppose xn → x0 and F (·)
n
is continuous at x0 . Show that Fn (xn ) → F (x0 ) w.p. 1.
8.32 Let p be a positive integer > 1. Let {δi }i≥1 be iid random variable
$p−1
with distribution
$∞ δi P (δ1 = j) = pj , 0 ≤ j ≤ p − 1, pj ≥ 0, 0 pj = 1.
Let X = i=1 pi . Show that
(a) P (X ∈ (0, 1)) = 1.
(b) FX (x) ≡ P (X ≤ x) is continuous and strictly increasing in (0,1)
if 0 < pj < 1 for any 0 ≤ j ≤ p − 1.
(c) FX (·) is absolutely continuous iff pj =
which case FX (x) ≡ x, 0 ≤ x ≤ 1.
1
j
for all 0 ≤ j ≤ p − 1 in
8.33 (Random AR-series). Let {Xn }n≥0 be a sequence of random variables
such that
Xn+1 = ρn+1 Xn + #n+1 , n ≥ 0
where the sequence {(ρn , #n )}n≥1 are iid and independent of X0 .
(a) Show that if E(log |ρ1 |) < 0 and E(log |#1 |)+ < ∞ then
X̂n ≡
n
)
ρ1 ρ2 . . . ρj , #j+1
converges w.p. 1.
j=0
(b) Show that under the hypothesis of (a), for any bounded continuous function h : R → R and for any distribution of X0
Eh(Xn ) → Eh(X̂∞ ).
(Hint: Show by SLLN that there is a 0 < λ < 1 such that
ρ1 , ρ2 , . . . , ρj = 0(λj ) w.p. 1 as j → ∞ and by Borel-Cantelli
|#j | = 0(λ$j ) for some λ$ > 0 A λ$ λ < 1.)
8.34 (Iterated random functions). Let (S, ρ) be a complete separable metric space. Let (G, G) be a measurable space. Let f : G × S → S be
8.8 Problems
285
1G × B(S), B(S)2 measurable function. Let (Ω, F, P ) be a probability space and {θi }i≥1 be iid G-valued random variables on (Ω, F, P ).
Let X0 be an S-valued random variable on (Ω, F, P ) independent of
{θi }i≥1 . Define {Xn }n≥0 by the random iteration scheme,
X0 (x, ω) ≡ x
!
"
Xn+1 (x, ω) = f θn+1 (ω), Xn (x, ω) n ≥ 0.
(a) Show that for each n ≥ 0, the map Xn = S × Ω → S is 1B(S) ×
F, B(S)2 measurable.
(b) Let fn (x) ≡ fn (x, ω) ≡ f (θn (ω), x). Let X̂n (x, ω) =
f1 (f2 , . . . , fn (x)). Show that for each x and n, X̂n (x, ω) and
Xn (x, ω) have the same distribution.
(c) Now assume that for all ω, f (θ1 (ω), x) is Lipschitz from S to S,
i.e.,
d(f (θi (ω), x), f (θi (ω), y))
< ∞.
2i (ω) ≡ sup
d(x, y)
x.=y
Show that 2i (ω) is a random variable on (Ω, F, P ), i.e. that 2i (·) :
Ω → R+ is 1F, B(R)2 measurable.
(d) Suppose that E| log 21 (ω)| < ∞ and E log 21 (ω) < 0,
E| log d(f (θ1 , x), x)| < ∞ for all x. Show that limn X̂n (x, ω) =
X̂∞ (ω) exists w.p. 1 and is independent of x w.p. 1.
(Hint: Use Borel-Cantelli to show
{X̂n (x, ω)}n≥1 is Cauchy in (S, ρ).)
that
for
each
x,
(e) Under the hypothesis in (d) show that for any bounded continuous h : S → R and for any x ∈ S, limn→∞ Eh(Xn (x, ω)) =
Eh(X̂∞ (ω)).
(f) Deduce the results in Problems 7.15 and 8.33 as special cases.
8.35 (Extension of Gilvenko-Cantelli (Theorem 8.2.4) to the multivariate case). Let {Xn }n≥1 be a sequence of pairwise independent
k
and identically !distributed random vectors taking values
" in R with
cdf F (x) ≡ P X11 ≤ x1 , X12 ≤ x2 , . . . , X1k ≤ xk where X1 =
(X$
11 , X12 , . . . , X1k ) and x = (x1 , x2 , . . . , xk ) ∈ R. Let Fn (x) ≡
n
1
i=1 I(Xi ≤ x) be the empirical cdf based on {Xi }1≤i≤n . Show
n
that sup{|Fn (x) − F (x)| : x ∈ R} → 0 w.p. 1.
(Hint: First prove an extension of Polyā’s theorem (Lemma 8.2.6) to
the multivariate case.)
9
Convergence in Distribution
9.1 Definitions and basic properties
In this section, the notion of ‘convergence in distribution’ of a sequence of
random variables is discussed. The importance and usefulness of this notion lie in the following observation: if a sequence of random variables Xn
converges in distribution to a random variable X, then one may approximate the probabilities P (Xn ∈ A) by P (X ∈ A) for large n for a large
class of sets A ∈ B(R). In many situations, exact evaluation of P (Xn ∈ A)
is a more difficult task than the evaluation of P (X ∈ A). As a result, one
may work with the limiting value P (X ∈ A) instead of P (Xn ∈ A), when
n is large. As an example, consider the following problem from statistical
inference. Let Y1 , Y2 , . . . be a collection of iid random variables with a finite second moment. Suppose that one is interested in finding the observed
level of significance or the p-value for a statistical test of the hypotheses
H1 : µ )= 0 about the population mean
H0 : µ = 0 against an alternative$
n
−1
=
n
µ. If the test statistic
Ȳ
n
i=1 Yi is used and the test rejects H0
√
for large values of | nȲn |, √
then the p-value of the test can be found using
the function ψn (a) ≡ P0 (| nȲn | > a), a ∈ [0, ∞), where P0 denotes the
joint distribution of {Yn }n≥1 under µ = 0. Note that here, finding ψn (·) is
difficult, as it depends on the √
joint distribution of Y1 , . . . , Yn . If, however,
one knows that under µ = 0, nȲn converges in distribution to a normal
random variable Z (which is in fact guaranteed by the central limit theorem, see Chapter 11), then one may approximate ψn (a) by P (|Z| > a),
which can be found, e.g., by using a table of normal probabilities.
288
9. Convergence in Distribution
The formal definition of ‘convergence in distribution’ is given below.
Definition 9.1.1: Let Xn , n ≥ 0 be a collection of random variables and
let Fn denote the cdf of Xn , n ≥ 0. Then, {Xn }n≥1 is said to converge in
distribution to X0 , written as Xn −→d X0 , if
lim Fn (x) = F0 (x) for every
n→∞
x ∈ C(F0 )
(1.1)
where C(F0 ) = {x ∈ R : F0 is continuous at x}.
Definition 9.1.2: Let {µn }n≥0 be probability measures on (R, B(R)).
denoted
Then {µn }n≥1 is said to converge to µ0 weakly! or in distribution,
"
by µn −→d µ0 , if (1.1) holds with Fn (x) ≡ µn (−∞, x] , x ∈ R, n ≥ 0.
Unlike the notions of convergence in probability and convergence almost
surely, the notion of convergence in distribution does not require that the
random variables Xn , n ≥ 0 be defined on a common probability space.
Indeed, for each n ≥ 0, Xn may be defined on a different probability space
(Ωn , Fn , Pn ) and {Xn }n≥1 may converge in distribution to X0 . In such
a context, the notions of convergence of {Xn }n≥1 to X0 in probability
or almost surely are not well defined. Definition 9.1.1 requires only the
cdfs of Xn ’s to converge to that of X0 at each x ∈ C(F0 ) ⊂ R, but does
not require the (almost sure or in probability) convergence of the random
variables Xn ’s themselves.
Example 9.1.1: For n ≥ 1, let Xn

 0
nx
Fn (x) =

1
∼ Uniform (0, n1 ), i.e., Xn has the cdf
if x ≤ 0
if 0 < x <
if x ≥ n1
1
n
and let X0 be the degenerate random variable taking the value 0 with
probability 1, i.e., the cdf of X0 is
/
0 if x < 0
F0 (x) =
1 if x ≥ 0.
Note that the function F0 (x) is discontinuous only at x = 0. Hence,
C(F0 ) = R\{0}. It is easy to verify that for every x )= 0,
Fn (x) → F0 (x)
Hence, Xn −→d X0 .
as
n → ∞.
Example 9.1.2: Let {an }n≥1 and {bn }n≥1 be sequences of real numbers
such that 0 < bn < ∞ for all n ≥ 1. Let Xn ∼ N (an , bn ), n ≥ 1. Then, the
cdf of Xn is given by
'x − a (
n
, x∈R
(1.2)
Fn (x) = Φ
bn
9.1 Definitions and basic properties
289
&x
where Φ(x) = −∞ φ(t)dt and φ(x) = (2π)−1/2 exp(−x2 /2), x ∈ R. If
X0 ∼ N (a0 , b0 ) for some a0 ∈ R, b0 ∈ [0, ∞), then using (1.2), one can
show that Xn −→d X0 if and only if an → a0 and bn → b0 as n → ∞
(Problem 9.8).
Next some simple implications of Definition 9.1.1 are considered.
Proposition 9.1.1: If Xn −→p X0 , then Xn −→d X0 .
Proof: Let Fn denote the cdf of Xn , n ≥ 0. Fix x ∈ C(F0 ). Then, for any
# > 0,
P (Xn ≤ x) ≤ P (X0 ≤ x + #) + P (Xn ≤ x, X0 > x + #)
≤
P (X0 ≤ x + #) + P (|Xn − X0 | > #)
(1.3)
and similarly,
P (Xn ≤ x) ≥ P (X0 ≤ x − #) − P (|Xn − X0 | > #).
(1.4)
Hence, by (1.3) and (1.4),
F0 (x − #) − P (|Xn − X0 | > #) ≤ Fn (x) ≤ F0 (x + #) + P (|Xn − X0 | > #).
Since Xn −→p X0 , letting n → ∞, one gets
F0 (x − #) ≤ lim inf Fn (x) ≤ lim sup Fn (x) ≤ F0 (x + #)
n→∞
(1.5)
n→∞
for all # ∈ (0, ∞). Note that as x ∈ C(F0 ), F0 (x−) = F0 (x). Hence, letting
# ↓ 0 in (1.5), one has limn→∞ Fn (x) = F0 (x). This proves the result. !
As pointed out before, the converse of Proposition 9.1.1 is false in general.
The following is a partial converse. The proof follows from the definitions
of convergence in probability and convergence in distribution and is left as
an exercise (Problem 9.1).
Proposition 9.1.2: If Xn −→d X0 and P (X0 = c) = 1 for some c ∈ R,
then Xn −→p c.
Theorem 9.1.3: Let Xn , n ≥ 0 be a collection of random variables with
respective cdfs Fn , n ≥ 0. Then, Xn −→d X0 if and only if there exists a
dense set D in R such that
lim Fn (x) = F0 (x)
n→∞
for all
x ∈ D.
(1.6)
Proof: Since C(F0 )c has at most countably many points, the ‘only if’ part
follows. To prove the ‘if’ part, suppose that (1.6) holds. Fix x ∈ C(F0 ).
Then, there exist sequences {xn }n≥1 , {yn }n≥1 in D such that xn ↑ x and
yn ↓ x as n → ∞. Hence, for any k, n ∈ N,
Fn (xk ) ≤ Fn (x) ≤ Fn (yk ).
290
9. Convergence in Distribution
By (1.6), for every k ∈ N,
F0 (xk )
=
≤
lim Fn (xk ) ≤ lim inf Fn (x)
n→∞
n→∞
lim sup Fn (x) ≤ lim Fn (yk ) = F0 (yk ).
n→∞
n→∞
(1.7)
Since x ∈ C(F0 ), limk→∞ F0 (xk ) = F0 (x) = limk→∞ F0 (y). Hence, by
(1.7), limn→∞ Fn (x) exists and equals F0 (x). This completes the proof of
Theorem 9.1.3.
!
Theorem 9.1.4: (Polyā’s theorem). Let Xn , n ≥ 0 be random variables
with respective cdfs Fn , n ≥ 0. If F0 is continuous on R, then
J
J
sup JFn (x) − F0 (x)J → 0 as n → ∞.
x∈R
Proof: This is a special case of Lemma 8.2.6 and uses the following proposition.
!
Proposition 9.1.5: If a cdf F is continuous on R, then it is uniformly
continuous on R.
The proof of Proposition 9.1.5 is left as an exercise (Problem 9.2).
Theorem 9.1.6: (Slutsky’s theorem). Let {Xn }n≥1 and {Yn }n≥1 be two
sequences of random variables such that for each n ≥ 1, (Xn , Yn ) is defined
on a probability space (Ωn , Fn , Pn ). If Xn −→d X and Yn −→p a for some
a ∈ R, then
(i) Xn + Yn −→d X + a,
(ii) Xn Yn −→d aX, and
(iii) Xn /Yn −→d X/a, provided a )= 0.
Proof: Only a proof of part (i) is given here. The other parts may be
proved similarly. Let F0 denote the cdf of X. Then, the cdf of X + a is
given by F (x) = F0 (x − a), x ∈ R. Fix x ∈ C(F ). Then, x − a ∈ C(F0 ).
For any # > 0 (as in the derivations of (1.3) and (1.4)),
P (Xn + Yn ≤ x) ≤ P (|Yn − a| > #) + P (Xn + a − # ≤ x)
(1.8)
P (Xn + Yn ≤ x) ≥ P (Xn + a + # ≤ x) − P (|Y − a| > #).
(1.9)
and
Now fix # > 0 such that x − a − #, x − a + # ∈ C(F0 ). This is possible since
R\C(F0 ) is countable. Then, from (1.8) and (1.9), it follows that
lim sup P (Xn + Yn ≤ x)
n→∞
9.2 Vague convergence, Helly-Bray theorems, and tightness
≤
8
9
lim P ((Yn − a) > #) + P (Xn ≤ x − a + #)
291
n→∞
= F0 (x − a + #)
(1.10)
and similarly,
lim inf P (Xn + Yn ≤ x) ≥ F0 (x − a − #).
n→∞
(1.11)
Now letting # → 0+ in such a way that x − a ± # ∈ C(F0 ), from (1.10)
and (1.11), it follows that
F0 ((x − a)−) ≤ lim inf P (Xn + Yn ≤ x)
n→∞
≤ lim sup P (Xn + Yn ≤ x)
n→∞
≤ F0 (x − a).
Since x − a ∈ C(F0 ), (i) is proved.
!
9.2 Vague convergence, Helly-Bray theorems, and
tightness
One version of the Bolzano-Weirstrass theorem from real analysis states
that if A ⊂ [0, 1] is an infinite set, then there exists a sequence {xn }n≥1 ⊂ A
such that limn→∞ xn ≡ x exists in [0, 1]. Note that x need not be in A
unless A is closed. There is an analog of this for sub-probability measures
on (R, B(R)), i.e., for measures µ on (R, B(R)) such that µ(R) ≤ 1. First,
one needs a definition of convergence of sub-probability measures.
Definition 9.2.1: Let {µn }n≥1 , µ be sub-probability measures on
(R, B(R)). Then {µn }n≥1 is said to converge to µ vaguely, denoted by
µn −→v µ, if there exists a set D ⊂ R such that D is dense in R and
µn ((a, b]) → µ((a, b])
as n → ∞
for all a, b ∈ D.
(2.1)
Example 9.2.1: Let {Xn }n≥1 , X be random variables such that Xn
converges to X in distribution, i.e.,
Fn (x) ≡ P (Xn ≤ x) → F (x) ≡ P (X ≤ x)
(2.2)
for all x ∈ C(F ), the set of continuity points of F . Since the complement
of C(F ) is at most countable, (2.2) implies that µn −→v µ where µn (·) ≡
P (Xn ∈ ·) and µ(·) ≡ P (X ∈ ·).
Remark 9.2.1: It follows from above that if {µn }n≥1 , µ are probability
measures, then
µn −→d µ ⇒ µn −→v µ.
(2.3)
292
9. Convergence in Distribution
Conversely, it is not difficult to show that (Problem 9.4) if µn −→v µ
and µn and µ are probability measures, then µn −→d µ.
Example 9.2.2: Let µn be the probability measure corresponding to the
Uniform distribution on [−n, n], n ≥ 1. It is easy to show that µn −→v µ0 ,
where µ0 is the measure that assigns zero mass to all Borel sets. This shows
that if µn −→v µ, then µn (R) need not converge to µ(R). But if µn (R) does
converge to µ(R) and µ(R) > 0 and if µn −→v µ, then it can be shown
µ
n
that µ$n −→d µ$ where µ$n = µnµ(R)
and µ$ = µ(R)
.
Theorem 9.2.1: (Helly’s selection theorem). Let A be an infinite collection of sub-probability measures on (R, B(R)). Then, there exist a sequence
{µn }n≥1 ⊂ A and a sub-probability measure µ such that µn −→v µ.
Proof: Let D ≡ {rn }n≥1 be a countable dense set in R (for example, one
may take D = Q, the set of rationals or D = Dd , the set of all dyadic
rationals of the form {j/2n : j an integer, n a positive integer}). Let for
each x, A(x) ≡ {µ((−∞, x]) : µ ∈ A}. Then A(x) ⊂ [0, 1] and so by the
Bolzano-Weirstrass theorem applied to the set A(r1 ), one gets a sequence
{µ1n }n≥1 ⊂ A such that limn→∞ F1n (r1 ) ≡ F (r1 ) exists, where F1i (x) ≡
µ1i ((−∞, x]), x ∈ R. Next, applying the Bolzano-Weirstrass theorem to
{F1n (r2 )}n≥1 yields a further subsequence {µ2n }n≥1 ⊂ {µ1n }n≥1 ⊂ A
such that limn→∞ F2n (r2 ) ≡ F (r2 ) exists, where F2i (x) ≡ µ2i ((−∞, x]),
i ≥ 1. Continuing this way, one obtains a sequence of nested subsequences
{µjn }n≥1 , j = 1, 2, . . . such that for each j, limn→∞ Fjn (rj ) ≡ F (rj ) exists.
In particular, for the subsequence {µnn }n≥1 ,
lim Fnn (rj ) = F (rj )
(2.4)
F̃ (x) ≡ inf{F (r) : r > x, r ∈ D}.
(2.5)
n→∞
exists for all j. Now set
Then, F̃ (·) is a nondecreasing right continuous function on R (Problem 9.5)
and it equals F (·) on D. Let µ be the Lebesgue-Stieltjes measure generated
by F̃ . Since Fnn (x) ≤ 1 for all n and x, it follows that F̃ (x) ≤ 1 for all x
and hence that µ is a sub-probability measure. Suppose it is shown that
(2.4) also implies that
lim Fnn (x) = F̃ (x)
(2.6)
n→∞
for all x ∈ CF̃ , the set of continuity points of F̃ . Then, all a, b ∈ CF̃ ,
µnn ((a, b]) ≡ Fnn (b) − Fnn (a) → F̃ (b) − F̃ (a) = µ((a, b]) and hence that
µn −→v µ. To establish (2.6), fix x ∈ CF̃ and # > 0. Then, there is a δ > 0
such that for all x−δ < y < x+δ, F̃ (x)−# < F̃ (y) < F̃ (x)+#. This implies
that there exist x − δ < r < x < r$ < x + δ, r, r$ ∈ D and F̃ (x) − # <
F (r) ≤ F̃ (x) ≤ F (r$ ) < F̃ (x) + #. Since Fnn (r) ≤ Fnn (x) ≤ Fnn (r$ ), it
9.2 Vague convergence, Helly-Bray theorems, and tightness
293
follows that
F̃ (x) − # ≤ lim Fnn (x) ≤ lim Fnn (x) ≤ F̃ (x) + #,
n→∞
n→∞
establishing (2.6).
!
Next, some characterization results on vague convergence and convergence in distribution will be established. These can then be used to define
the notions of convergence of sub-probability measures on more general
metric spaces.
Theorem 9.2.2: (The first Helly-Bray theorem or the Helly-Bray theorem
for vague convergence). Let {µn }n≥1 and µ be sub-probability measures on
(R, B(R)). Then µn −→v µ iff
%
%
f dµn → f dµ
(2.7)
for all f ∈ C0 (R) ≡ {g | g : R → R is continuous and lim|x|→∞ g(x) = 0}.
Proof: Let µn −→v µ and let f ∈ C0 (R). Given # > 0, choose K large
such that |f (x)| < # for |x| > K. Since µn −→v µ, there exists a dense set
D ⊂ R such that µn ((a, b]) → µ((a, b]) for all a, b ∈ D. Now choose a, b ∈ D
such that a < −K and b > K. Since f is uniformly continuous on [a, b] and
D is dense in R, there exist points x0 = a < x1 < x2 < · · · < xm = b in D
such that supxi ≤x≤xi+1 |f (x) − f (xi )| < # for all 0 ≤ i < m. Now
%
f dµn =
%
(−∞,a]
f dµn +
m−1
)%
i=0
(xi ,xi+1 ]
f dµn +
%
(b,∞)
f dµn
and so
J%
J
m−1
)
J
J
J f dµn −
J
f
(x
)µ
((x
,
x
])
i n
i
i+1 J < 2# + # · µn ((a, b]) < 3#.
J
i=0
A similar approximation holds for
measures, it follows that
&
f dµ. Since µn , µ are sub-probability
J%
J
%
m
)
J
J
J
J
J f dµn − f dµJ < 6# + 7f 7
Jµn ((xi , xi+1 ]) − µ((xi , xi+1 ])J,
J
J
i=0
where 7f 7 = sup{|f (x)| : x ∈ R}. Letting n → ∞ and noting that µn −→v
µ and {xi }m
i=0 ⊂ D, one gets
J%
J
%
J
J
J
lim sup J f dµn − f dµJJ ≤ 6#.
n→∞
294
9. Convergence in Distribution
Since # > 0 is arbitrary, (2.7) follows and the proof of the “only if” part is
complete.
To prove the converse, let D be the set of points {x : µ({x}) = 0}. Fix
a, b ∈ D, a < b. Let # > 0. Let f1 be the function defined by

 1 if a ≤ x ≤ b
0 if x < a − # or x > b + #
f1 (x) =

linear on a − # ≤ x < a, b ≤ x ≤ b + #.
Then, f1 ∈ C0 (R) and by (2.7),
%
%
f1 dµn → f1 dµ.
&
&
f1 dµn and
f1 dµ ≤ µ((a − #, b + #]). Thus,
But µn ((a, b]) ≤
lim supn→∞ µn ((a, b]) ≤ µ((a − #, b + #]). Letting # ↓ 0 and noting that
a, b ∈ D, one gets
(2.8)
lim sup µn ((a, b]) ≤ µ((a, b]).
n→∞
A similar argument with f2 = 1 on [a + #, b − #] and 0 for x ≤ a and ≥ b
and linear in between, yields
lim inf µn ((a, b]) ≥ µ((a, b]).
n→∞
This with (2.8) completes the proof of the “if” part.
!
Theorem 9.2.3: (The second Helly-Bray theorem or the Helly-Bray theorem for weak convergence). Let {µn }n≥1 , µ be probability measures on
(R, B(R)). Then, µn −→d µ iff
%
%
f dµn → f dµ
(2.9)
for all f ∈ CB (R) ≡ {g | g : R → R, g is continuous and bounded}.
Proof: Let µn −→d µ. Let # > 0 and f ∈ CB (R) be given. Choose K large
such that µ((−K, K]) > 1 − #. Also, choose a < −K and b > K such that
µ({a}) = µ({b}) = 0, a, b ∈ D. Let a = x0 < x1 < < xm = b be chosen
so that x0 , . . . , xm ∈ D and
sup
xi ≤x≤xi+1
|f (x) − f (xi )| < #
&
&
&
&
for all i = 1, . . . , m−1. Since f dµn − f dµ = (−∞,a] f dµn − (−∞,a] f dµ+
&
&
&
$m−1 &
i=1 ( (xi ,xi+1 ] f dµn − (xi ,xi+1 ] f dµ) + (b,∞) f dµn − (b,∞) f dµ, it follows
that
J%
J
%
8!
J
J
"
J f dµn − f dµJ < 7f 7 µn ((−∞, a]) + µ((−∞, a])
J
J
9.2 Vague convergence, Helly-Bray theorems, and tightness
+
m−1
)
i=0
295
J
J
Jµn ((xi , xi+1 ]) − µ((xi , xi+1 ])J
9
+ µn ((b, ∞)) + µ((b, ∞)) .
Since, a, b, x0 , x1 , . . . , xm ∈ D,
J%
J
%
J
J
lim sup JJ f dµn − f dµJJ ≤ 7f 72(1 − µ((a, b])) ≤ 7f 72#.
n→∞
Since # > 0 is arbitrary, the “only if” part is proved.
Next consider the “if” part. Since C0 (R) ⊂ CB (R), (2.9) and Theorem
9.2.2 imply that µn −→v µ. As noted in Remark 9.2.1, if {µn }n≥1 , µ are
probability measures then µn −→v µ iff µn −→d µ. So the proof is complete.
!
Definition 9.2.2:
(a) A sequence of probability measures {µn }n≥1 on (R, B(R)) is called
tight if for any # > 0, there exists M = M$ ∈ (0, ∞) such that
!
"
sup µn [−M, M ]c < #.
(2.10)
n≥1
(b) A sequence of random variables {Xn }n≥1 is called tight or stochastically bounded if the sequence of probability distributions {µn }n≥1 of
{Xn }n≥1 is tight, i.e., given any # > 0, there exists M = M$ ∈ (0, ∞)
such that
(2.11)
sup P (|Xn | > M ) < #.
n≥1
Remark 9.2.3: In Definition 9.2.2 (b), the random variables Xn , n ≥ 1
need not be defined on a common probability space. If Xn is defined on a
probability space (Ωn , Fn , Pn ), n ≥ 1, then (2.11) needs to be replaced by
sup Pn (|Xn | > M ) < #.
n≥1
Example 9.2.3: Let Xn ∼ Uniform(n, n + 1). Then, for any given M ∈
(0, ∞),
P (|Xn | > M ) ≥ P (Xn > M ) = 1 for all n > M.
Consequently, for any M ∈ (0, ∞),
sup P (|Xn | > M ) = 1
n≥1
and the sequence {Xn }n≥1 cannot be stochastically bounded.
296
9. Convergence in Distribution
Example 9.2.4: For n ≥ 1, let
Xn ∼ Uniform(an , 2 + an ),
(2.12)
where an = (−1)n . Then, {Xn }n≥1 is stochastically bounded. Indeed,
|Xn | ≤ 3 for all n ≥ 1 and therefore, for any # > 0, (2.11) holds with
M = 3. Note that in this example, the sequence {Xn }n≥1 does not converge in distribution to a random variable X. From (2.12), it follows that
as k → ∞,
X2k −→d Uniform(1, 3), X2k−1 −→d Uniform(−1, 1).
(2.13)
Examples 9.2.3 and 9.2.4 highlight two important characteristics of a
tight sequence of random variables or probability measures. First, the notion of tightness of probability measures or random variables is analogous
to the notion of boundedness of a sequence of real numbers. For a sequence
of bounded real numbers {xn }n≥1 , all the xn ’s must lie in a bounded interval [−M, M ], M ∈ (0, ∞). For a sequence of random variables {Xn }n≥1 ,
the condition of tightness requires that given # > 0 arbitrarily small, there
exists an M = M$ in (0, ∞) such that for each n, Xn lies in [−M, M ] with
probability at least 1 − #. Thus, for a tight sequence of random variables,
no positive mass can escape to ±∞, which is contrary to what happens
with the random variables {Xn }n≥1 of Example 9.2.3.
The second property illustrated by Example 9.2.4 is that like a bounded
sequence of real numbers, a tight or stochastically bounded sequence of
random variables may not converge in distribution, but has one or more
convergent subsequences (cf. (2.13)). Indeed, the notion of tightness can
be characterized by this property, as shown by the following result. For
consistency with the other results in this section, it is stated in terms of
probability measures instead of random variables.
Theorem 9.2.4: Let {µn }n≥1 be a sequence of probability measure
on (R, B(R)). The sequence {µn }n≥1 is tight iff given any subsequence
{µni }i≥1 of {µn }n≥1 , there exists a further subsequence {µmi }i≥1 of
{µni }i≥1 and a probability measure µ on (R, B(R)) such that
µmi −→d µ
as
i → ∞.
(2.14)
Proof: Suppose that {µn}n≥1 is tight. Given any subsequence {µni}i≥1 of {µn}n≥1, by Helly's selection theorem (Theorem 9.2.1), there exists a sub-probability measure µ and a further subsequence {µmi}i≥1 of {µni}i≥1 such that
        µmi −→v µ.                                        (2.15)
Next, fix ε ∈ (0, 1). Since {µn}n≥1 is tight, there exists M ∈ (0, ∞) such that
        sup_{n≥1} µn([−M, M]^c) < ε.                      (2.16)
By (2.15) and (2.16), there exist a, b ∈ D, a < −M, b > M such that
        µ((a, b]) = lim_{i→∞} µmi((a, b])
                  ≥ lim inf_{i→∞} µmi([−M, M])
                  = 1 − lim sup_{i→∞} µmi([−M, M]^c)
                  ≥ 1 − ε.
Since ε ∈ (0, 1) is arbitrary, this shows that µ is a probability measure and hence, the 'only if' part is proved. Next, consider the 'if' part. Suppose {µn}n≥1 is not tight. Then, there exists ε0 ∈ (0, 1) such that for all M ∈ (0, ∞),
        sup_{n≥1} µn([−M, M]^c) > ε0.
Hence, for each k ∈ N, there exists nk ∈ N such that
        µnk([−k, k]^c) ≥ ε0.                              (2.17)
Since any finite collection of probability measures on (R, B(R)) is tight, it follows that {µnk : k ∈ N} is a countably infinite set. Hence, by the hypothesis, there exist a subsequence {µmi}i≥1 in {µnk : k ∈ N} and a probability measure µ such that
        µmi −→d µ   as i → ∞.                             (2.18)
Let a, b ∈ R be such that µ({a}) = 0 = µ({b}) and µ((a, b]^c) < ε0/2. By (2.18), there exists i0 ≥ 1 such that for all i ≥ i0,
        µmi((a, b]^c) < µ((a, b]^c) + ε0/2 < ε0.
Since (a, b]^c ⊃ [−k, k]^c for all k > max{|a|, |b|} and {µmi : i ≥ i0} ⊂ {µnk : k ∈ N}, this contradicts (2.17). Hence, {µn}n≥1 is tight.   □
Theorem 9.2.5: Let {µn }n≥1 , µ be probability measures on (R, B(R)). If
µn −→d µ, then {µn }n≥1 is tight.
Proof: Fix ε ∈ (0, ∞). Then, there exist a, b ∈ R such that µ({a}) = 0 = µ({b}) and µ((a, b]^c) < ε/2. Since µn −→d µ, there exists n0 ≥ 1 such that for all n ≥ n0,
        |µn((a, b]) − µ((a, b])| < ε/2.
Thus, for all n ≥ n0,
        µn((a, b]^c) ≤ µ((a, b]^c) + ε/2 < ε.             (2.19)
Also, for each i = 1, . . . , n0, there exists Mi ∈ (0, ∞) such that
        µi([−Mi, Mi]^c) < ε.                              (2.20)
Let M = max{Mi : 0 ≤ i ≤ n0}, where M0 = max{|a|, |b|}. Then by (2.19) and (2.20),
        sup_{n≥1} µn([−M, M]^c) < ε.
Thus, {µn}n≥1 is tight.   □
An easy consequence of Theorems 9.2.4 and 9.2.5 is the following characterization of weak convergence.
Theorem 9.2.6: Let {µn }n≥1 be a sequence of probability measures on
(R, B(R)). Then µn −→d µ iff {µn }n≥1 is tight and all weakly convergent
subsequences of {µn }n≥1 converge to the same limiting probability measure
µ.
Proof: If µn −→d µ, then any weakly convergent subsequence of {µn}n≥1 converges to µ and, by Theorem 9.2.5, {µn}n≥1 is tight. Hence, the 'only if' part follows. To prove the 'if' part, suppose that {µn}n≥1 is tight and that all weakly convergent subsequences of {µn}n≥1 converge to µ. Let {Fn}n≥1 and F denote the cdfs corresponding to {µn}n≥1 and µ, respectively. If possible, suppose that {µn}n≥1 does not converge in distribution to µ. Then, by definition, there exists x0 ∈ R with µ({x0}) = 0 such that Fn(x0) does not converge to F(x0) as n → ∞. Then, there exist ε0 ∈ (0, 1) and a subsequence {ni}i≥1 such that
        |Fni(x0) − F(x0)| ≥ ε0   for all i ≥ 1.           (2.21)
Since {µn}n≥1 is tight, there exist a subsequence {mi}i≥1 ⊂ {ni}i≥1 and a probability measure µ0 such that
        µmi −→d µ0   as   i → ∞.                          (2.22)
By hypothesis, µ0 = µ. Hence µ0({x0}) = µ({x0}) = 0 and, by (2.22),
        Fmi(x0) → F(x0)   as   i → ∞,
contradicting (2.21). Therefore, µn −→d µ.   □
For another proof of the ‘if’ part, see Problem 9.6.
Note that by Slutsky’s theorem, if Xn −→d X and Yn −→p 0, then
Xn Yn −→p 0. The following result gives a refinement of this.
Proposition 9.2.7: If {Xn }n≥1 is stochastically bounded and Yn −→p 0,
then Xn Yn −→p 0.
The proof is left as an exercise (Problem 9.7).
9.3 Weak convergence on metric spaces
The Helly-Bray theorems proved above suggest the following definitions of
vague convergence and convergence in distribution for measures on metric
spaces. Recall that (S, d) is called a metric space, if S is a nonempty set
and d is a function from S × S → [0, ∞) such that
(i) d(x, y) = d(y, x) for all x, y ∈ S,
(ii) d(x, y) = 0 iff x = y for all x, y ∈ S,
(iii) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ S.
A common example of a metric space is given by S = Rk and d(x, y), the Euclidean distance. A set G ⊂ S is open if for all x ∈ G, there exists an ε > 0 such that for any y in S, d(x, y) < ε ⇒ y ∈ G. The set
        B(x, ε) = {y : d(x, y) < ε}
is called the open ball of radius ε with center at x, x ∈ S, ε > 0. Recall that f : S → R is continuous if f⁻¹((a, b)) is open for every −∞ < a < b < ∞. A family G of open sets in S is called an open cover for a set B ⊂ S if for each x ∈ B, there exists a G ∈ G such that x ∈ G. A set K ⊂ S is called compact if given any open cover G for K, there is a finite subfamily G1 ⊂ G such that G1 is an open cover for K.
Let S be the Borel σ-algebra on S, i.e., let S be the σ-algebra generated
by the open sets in S. A measure on the measurable space (S, S) is often
simply referred to as a measure on (S, d).
Definition 9.3.1: Let {µn}n≥1 and µ be sub-probability measures on a metric space (S, d), i.e., {µn}n≥1 and µ are measures on (S, S) such that µn(S) ≤ 1 for all n ≥ 1 and µ(S) ≤ 1. Then {µn}n≥1 converges vaguely to µ (written as µn −→v µ) if
        ∫ f dµn → ∫ f dµ                                  (3.1)
for all f ∈ C0(S), where C0(S) ≡ {f | f : S → R, f is continuous and for every ε > 0, there exists a compact set K such that |f(x)| < ε for all x ∉ K}.
Definition 9.3.2: Let {µn}n≥1 and µ be probability measures on a metric space (S, d). Then {µn}n≥1 converges in distribution or converges weakly to µ (written as µn −→d µ) if
        ∫ f dµn → ∫ f dµ                                  (3.2)
for all f ∈ CB(S) ≡ {f | f : S → R, f is continuous and bounded}.
Recall that a sequence {xn}n≥1 in a metric space (S, d) is called Cauchy if for every ε > 0, there exists Nε such that n, m > Nε ⇒ d(xn, xm) < ε. A metric space (S, d) is complete if every Cauchy sequence {xn}n≥1 in S converges in S, i.e., given a Cauchy sequence {xn}n≥1, there exists an x in S such that d(xn, x) → 0 as n → ∞.
Example 9.3.1: For any k ∈ N, Rk with the Euclidean metric is complete, but the set of all rational vectors Qk with the Euclidean metric d(x, y) ≡ ‖x − y‖ is not complete. The set C[0, 1] of all continuous functions on [0, 1] is complete with the supremum metric d(f, g) = sup{|f(u) − g(u)| : 0 ≤ u ≤ 1}, but the set of all polynomials on [0, 1] is not complete under the same metric.
Recall that a set D is called dense in (S, d) if B(x, ε) ∩ D ≠ ∅ for all x ∈ S and for all ε > 0, where B(x, ε) is the open ball with center at x and radius ε. Also, (S, d) is called separable if there exists a countable dense set D ⊂ S.
Definition 9.3.3: A metric space (S, d) is called Polish if it is complete and separable.
Example 9.3.2: All Euclidean spaces with the Euclidean metric, as well as with the Lp metric for 1 ≤ p ≤ ∞, are complete. The space C[0, 1] of continuous functions on [0, 1] with the supremum metric is complete. All Lp-spaces over measure spaces with a σ-finite measure and a countably generated σ-algebra, 1 ≤ p ≤ ∞, are complete (cf. Chapter 3).
The following theorem gives several equivalent conditions for weak convergence of probability measures on a Polish space.
Theorem 9.3.1: Let (S, d) be Polish and {µn}n≥1, µ be probability measures. Then the following are equivalent:
(i) µn −→d µ.
(ii) For any open set G, lim inf_{n→∞} µn(G) ≥ µ(G).
(iii) For any closed set C, lim sup_{n→∞} µn(C) ≤ µ(C).
(iv) For all B ∈ S such that µ(∂B) = 0,
        lim_{n→∞} µn(B) = µ(B),
where ∂B is the boundary of B, i.e., ∂B = {x : for all ε > 0, B(x, ε) ∩ B ≠ ∅ and B(x, ε) ∩ B^c ≠ ∅}.
(v) For every uniformly continuous and bounded function f : S → R, ∫ f dµn → ∫ f dµ.
The proof uses the following fact.
Proposition 9.3.2: For every open set G in a metric space (S, d), there exists a sequence {fn}n≥1 of bounded continuous functions from S to [0, 1] such that as n ↑ ∞, fn(x) ↑ IG(x) for all x ∈ S.
Proof: Let Gn ≡ {x : d(x, G^c) > 1/n}, where for any set A in (S, d), d(x, A) ≡ inf{d(x, y) : y ∈ A}. Then, since G is open, d(x, G^c) > 0 for all x in G. Thus Gn ↑ G. For each n ≥ 1, let
        fn(x) ≡ d(x, G^c) / (d(x, G^c) + d(x, Gn)),   x ∈ S.    (3.3)
Check that (Problem 9.10) for each n, fn(x) is continuous on S, fn(x) = 1 on Gn and 0 on G^c, and 0 ≤ fn(x) ≤ 1 for all x in S. Further, fn(·) ↑ IG(·).   □
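For concreteness (an illustration beyond the text), the construction in (3.3) can be evaluated numerically on the real line. The following Python sketch takes G = (0, 1), approximates d(x, A) by the distance to a finite grid of points of A, and uses arbitrarily chosen test points; these discretization choices are assumptions of the sketch, not part of the proposition.

import numpy as np

# Crude discretization of d(x, A) = inf{|x - y| : y in A} by a finite grid of points of A.
def dist_to_set(x, points):
    return float(np.min(np.abs(x - points)))

# G = (0, 1) in R, so G^c = (-inf, 0] U [1, inf) and G_n = {x : d(x, G^c) > 1/n} = (1/n, 1 - 1/n).
grid_Gc = np.concatenate([np.linspace(-5.0, 0.0, 2001), np.linspace(1.0, 6.0, 2001)])

def f_n(x, n):
    # f_n(x) = d(x, G^c) / (d(x, G^c) + d(x, G_n)), as in (3.3)
    grid_Gn = np.linspace(1.0 / n + 1e-9, 1.0 - 1.0 / n - 1e-9, 2001)
    a = dist_to_set(x, grid_Gc)
    b = dist_to_set(x, grid_Gn)
    return a / (a + b)

# f_n equals 1 on G_n, vanishes off G, and increases toward the indicator of G as n grows.
for x in (-0.5, 0.05, 0.5, 0.95, 1.5):
    print(x, [round(f_n(x, n), 3) for n in (5, 10, 100)])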
Proof of Theorem 9.3.1: (i) ⇒ (ii): Let G be open. Choose {fn}n≥1 as in Proposition 9.3.2. Then for j ∈ N,
        µn(G) ≥ ∫ fj dµn  ⇒  lim inf_{n→∞} µn(G) ≥ lim inf_{n→∞} ∫ fj dµn = ∫ fj dµ
(by (i)). But lim_{j→∞} ∫ fj dµ = µ(G), by the bounded convergence theorem. Hence (ii) holds.
(ii) ⇔ (iii): Suppose (ii) holds. Let C be closed. Then G = C^c is open. So by (ii),
        lim inf_{n→∞} µn(C^c) ≥ µ(C^c)  ⇒  lim sup_{n→∞} µn(C) ≤ µ(C),
since µn and µ are probability measures. Thus, (iii) holds. Similarly, (iii) ⇒ (ii).
(iii) ⇒ (iv): For any B ∈ S, let B⁰ and B̄ denote, respectively, the interior and the closure of B. That is, B⁰ = {y : B(y, ε) ⊂ B for some ε > 0} and B̄ = {y : for some {xn}n≥1 ⊂ B, lim_{n→∞} xn = y}. Then, for any n ≥ 1,
        µn(B⁰) ≤ µn(B) ≤ µn(B̄)
and, by (ii) and (iii),
        µ(B⁰) ≤ lim inf_{n→∞} µn(B) ≤ lim sup_{n→∞} µn(B) ≤ µ(B̄).
But ∂B = B̄ \ B⁰ and so µ(∂B) = 0 implies µ(B⁰) = µ(B̄). Thus, lim_{n→∞} µn(B) = µ(B).
(iv) ⇒ (v) ⇒ (i): This will be proved for the case where S is the real line. For the general Polish case, see Billingsley (1968). Let F(x) ≡ µ((−∞, x]) and Fn(x) ≡ µn((−∞, x]), x ∈ R, n ≥ 1. Let x be a continuity point of F. Then µ({x}) = 0. Since ∂B = {x} for B = (−∞, x], by (iv),
        Fn(x) = µn((−∞, x]) → µ((−∞, x]) = F(x).
Thus, µn −→d µ. By Theorem 9.2.3, (i) holds and hence (v) holds.
(v) ⇒ (i): Note that in the proof of Theorem 9.2.2, the approximating functions f1 and f2 were both uniformly continuous. Hence, the assertion follows from Theorem 9.2.2 and Remark 9.2.1. This completes the proof of Theorem 9.3.1.   □
The following example shows that the inequality can be strict in (ii) and
(iii) of the above theorem.
Example 9.3.3: Let X be a random variable. Set Xn = X + 1/n, Yn = X − 1/n, n ≥ 1. Since Xn and Yn both converge to X w.p. 1, the distributions of Xn and Yn converge to that of X. Now suppose that there is a value x0 such that P(X = x0) > 0. Then,
        µn((−∞, x0)) ≡ P(Xn < x0) = P(X < x0 − 1/n) → P(X < x0) = µ((−∞, x0)),
        µn((−∞, x0]) = P(Xn ≤ x0) = P(X ≤ x0 − 1/n) → P(X < x0) < µ((−∞, x0]),
and
        νn((−∞, x0)) ≡ P(Yn < x0) = P(X < x0 + 1/n) → P(X ≤ x0) > P(X < x0).
Note that µn and νn both converge in distribution to µ. However, for the closed set (−∞, x0],
        lim sup_{n→∞} µn((−∞, x0]) < µ((−∞, x0]),
and for the open set (−∞, x0),
        lim inf_{n→∞} νn((−∞, x0)) > µ((−∞, x0)).
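A tiny numerical aside (not from the text) makes the strict inequalities concrete; the degenerate choice X ≡ 0 below is an arbitrary special case used only for illustration.

# Take X identically equal to 0, so x0 = 0 carries all the mass; then X_n = 1/n and
# Y_n = -1/n realize Example 9.3.3 (an arbitrary degenerate special case for illustration).
for n in (1, 10, 100):
    x_n, y_n = 1.0 / n, -1.0 / n
    mu_n_closed = 1.0 if x_n <= 0 else 0.0   # mu_n((-inf, 0]) = P(X_n <= 0)
    nu_n_open = 1.0 if y_n < 0 else 0.0      # nu_n((-inf, 0))  = P(Y_n < 0)
    print(n, mu_n_closed, nu_n_open)
# Every mu_n((-inf, 0]) is 0 while mu((-inf, 0]) = 1, and every nu_n((-inf, 0)) is 1 while
# mu((-inf, 0)) = 0: strict inequality in (ii) and (iii) of Theorem 9.3.1.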
Remark 9.3.1: Convergent sequences of probability distributions arise in a natural way in parametric families in mathematical statistics. For example, let µ(·; θ) denote the normal distribution with mean θ and variance 1. Then, θn → θ ⇒ µn(·) ≡ N(θn, 1) −→d N(θ, 1) ≡ µ(·). Similarly, let θ = (λ, Σ), where λ ∈ Rk and Σ is a k × k positive definite matrix. Let µ(·; θ) be the k-variate normal distribution with mean λ and variance-covariance matrix Σ. Then, µ(·; θ) is continuous in θ in the sense that if θn → θ in the Euclidean metric, then µ(·; θn) −→d µ(·; θ). Most parametric families in mathematical statistics possess this continuity property.
Definition 9.3.4: Let {µn}n≥1 be a sequence of probability measures on (S, S), where S is a Polish space and S is the Borel σ-algebra on S. Then {µn}n≥1 is called tight if for any ε > 0, there exists a compact set K such that
        sup_{n≥1} µn(K^c) < ε.                            (3.4)
A sequence of S-valued random variables {Xn}n≥1 is called tight or stochastically bounded if the sequence {µXn}n≥1 is tight, where µXn is the probability distribution of Xn on (S, S).
If S = Rk, k ∈ N, and {Xn}n≥1 is a sequence of k-dimensional random vectors, then, by Definition 9.3.4, {Xn}n≥1 is tight if and only if for every ε > 0, there exists M ∈ (0, ∞) such that
        sup_{n≥1} P(‖Xn‖ > M) < ε,                        (3.5)
where ‖·‖ denotes the usual Euclidean norm on Rk. Furthermore, if Xn = (Xn1, . . . , Xnk), n ≥ 1, then the tightness of {Xn}n≥1 is equivalent to the tightness of the k sequences of random variables {Xnj}n≥1, j = 1, . . . , k (Problem 9.9).
An analog of Theorem 9.2.4 holds for probability measures on (S, S)
when S is Polish.
Theorem 9.3.3: (Prohorov-Varadarajan theorem). Let {µn}n≥1 be a sequence of probability measures on (S, S), where S is a Polish space and S is the Borel σ-algebra on S. Then, {µn}n≥1 is tight iff given any subsequence {µni}i≥1 ⊂ {µn}n≥1, there exist a further subsequence {µmi}i≥1 of {µni}i≥1 and a probability measure µ on (S, S) such that
        µmi −→d µ   as i → ∞.                             (3.6)
For a proof of this result, see Section 1.6 of Billingsley (1968). This result is useful for proving weak convergence in function spaces (e.g., see Chapter 11, where a functional central limit theorem is stated).
9.4 Skorohod's theorem and the continuous mapping theorem
If {Xn}n≥1 is a sequence of random variables that converge to a random variable X in probability, then Xn does converge in distribution to X (cf. Proposition 9.1.1). Here is another proof of this fact using Theorem 9.2.3. Let f : R → R be bounded and continuous. Then Xn → X in probability implies that f(Xn) → f(X) in probability (Problem 9.13) and, by the BCT,
        ∫ f dµn = Ef(Xn) → Ef(X) = ∫ f dµ,
where µn(·) = P(Xn ∈ ·), n ≥ 1, and µ(·) = P(X ∈ ·). Hence, µn −→d µ. In particular, it follows that if Xn → X w.p. 1, then Xn −→d X. Skorohod's theorem is a sort of converse to this: if µn −→d µ, then there exist random variables Xn, n ≥ 1, and X such that Xn has distribution µn, n ≥ 1, X has distribution µ, and Xn → X w.p. 1.
Theorem 9.4.1: (Skorohod's theorem). Let {µn}n≥1, µ be probability measures on (R, B(R)) such that µn −→d µ. Let
        Xn(ω) ≡ sup{t : µn((−∞, t]) < ω},
        X(ω) ≡ sup{t : µ((−∞, t]) < ω}
for 0 < ω < 1. Then, Xn and X are random variables on ((0, 1), B((0, 1)), m), where m is the Lebesgue measure. Furthermore, Xn has distribution µn, n ≥ 1, X has distribution µ, and Xn → X w.p. 1.
Proof: For any cdf F(·), let F⁻¹(u) ≡ sup{t : F(t) < u}. Then for any u ∈ (0, 1) and t ∈ R, it can be verified that F⁻¹(u) ≤ t ⇔ F(t) ≥ u, and hence, if U is a Uniform (0,1) random variable (Problem 9.11),
        P(F⁻¹(U) ≤ t) = P(U ≤ F(t)) = F(t),
implying that F⁻¹(U) has cdf F(·). This shows that Xn, n ≥ 1, and X have the asserted distributions. It remains to show that
        Xn(ω) → X(ω)   w.p. 1.
Fix ω ∈ (0, 1) and let y < X(ω) be such that µ({y}) = 0. Now y < X(ω) ⇒ µ((−∞, y]) < ω. Since µn −→d µ and µ({y}) = 0, µn((−∞, y]) → µ((−∞, y]) and so µn((−∞, y]) < ω for large n. This implies that Xn(ω) ≥ y for large n and hence lim inf_{n→∞} Xn(ω) ≥ y. Since this is true for all y < X(ω) with µ({y}) = 0, and since the set of all such y's is dense in R, it follows that
        lim inf_{n→∞} Xn(ω) ≥ X(ω)   for all ω in (0, 1).
Next fix ε > 0 and y > X(ω + ε) with µ({y}) = 0. Then µ((−∞, y]) ≥ ω + ε. Since µ({y}) = 0, µn((−∞, y]) → µ((−∞, y]). Thus, for large n, µn((−∞, y]) ≥ ω. This implies that Xn(ω) ≤ y for large n and hence that lim sup_{n→∞} Xn(ω) ≤ y. Since this is true for all y > X(ω + ε) with µ({y}) = 0, it follows that
        lim sup_{n→∞} Xn(ω) ≤ X(ω + ε)   for every ε > 0
and hence that
        lim sup_{n→∞} Xn(ω) ≤ X(ω+).
Thus it has been shown that for all 0 < ω < 1,
        X(ω) ≤ lim inf_{n→∞} Xn(ω) ≤ lim sup_{n→∞} Xn(ω) ≤ X(ω+).
Since X(ω) is a nondecreasing function on (0, 1), it has at most a countable number of discontinuities and so
        lim_{n→∞} Xn(ω) = X(ω)   w.p. 1.   □
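As an aside, the quantile construction of Theorem 9.4.1 is easy to see numerically. The following Python sketch (using NumPy and SciPy) couples an arbitrarily chosen family µn = N(1/n, 1 + 1/n), with µ = N(0, 1), through one common uniform sample; the family and the sample size are assumptions of the illustration, not from the text.

import numpy as np
from scipy import stats

# One "common" sample of omega from the Lebesgue space ((0,1), B((0,1)), m).
rng = np.random.default_rng(0)
omega = rng.uniform(size=100_000)

def X_n(omega, n):
    # Quantile map of mu_n; here mu_n = N(1/n, 1 + 1/n), an arbitrary choice with mu_n ->d N(0,1).
    return stats.norm.ppf(omega, loc=1.0 / n, scale=np.sqrt(1.0 + 1.0 / n))

X = stats.norm.ppf(omega)   # quantile map of the limit mu = N(0, 1)

# Each X_n(omega) has distribution mu_n, yet all of them are coupled through the same omega,
# and X_n(omega) -> X(omega) pointwise, i.e., w.p. 1 -- the content of Theorem 9.4.1.
for n in (1, 10, 1000):
    print(n, float(np.max(np.abs(X_n(omega, n) - X))))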
An immediate consequence of the above theorem is the continuity of
convergence in distribution under continuous transformations.
Theorem 9.4.2: (The continuous mapping theorem). Let {Xn }n≥1 , X be
random variables such that Xn −→d X. Let f : R → R be Borel measurable
such that P (X ∈ Df ) = 0, where Df is the set of discontinuities of f . Then
f (Xn ) −→d f (X). In particular, this holds if f : R → R is continuous.
Remark 9.4.1: It can be shown that for any f : R → R, the set Df =
{x : f is discontinuous at x} ∈ B(R) (Problem 9.12). Thus, {X ∈ Df } ∈ F,
and P (X ∈ Df ) is well defined.
Proof: By Skorohod's theorem, there exist random variables {X̃n}n≥1, X̃ defined on the Lebesgue space (Ω = (0, 1), B((0, 1)), m = Lebesgue measure) such that X̃n =d Xn for n ≥ 1, X̃ =d X, and
        X̃n → X̃   w.p. 1.
Let A = {ω : X̃n(ω) → X̃(ω)} and B = {ω : X̃(ω) ∉ Df}. Then, P(A) = 1 = P(B) and so, for ω ∈ A ∩ B,
        f(X̃n(ω)) → f(X̃(ω)).
Thus, f(X̃n) → f(X̃) w.p. 1 and hence f(Xn) −→d f(X).   □
Another easy consequence of Skorohod's theorem is the Helly-Bray Theorem 9.2.3. Since X̃n → X̃ w.p. 1 and f is a bounded continuous function, f(X̃n) → f(X̃) w.p. 1, and so by the bounded convergence theorem, Ef(X̃n) → Ef(X̃).
Since X̃n =d Xn for n ≥ 1 and X̃ =d X, this is the same as saying that
        Ef(Xn) → Ef(X).
That is, ∫ f dµn → ∫ f dµ, where µn(·) = P(Xn ∈ ·), n ≥ 1, and µ(·) = P(X ∈ ·).
Remark 9.4.2: Skorohod's theorem is valid for any Polish space. Suppose that S is a Polish space and {µn}n≥1 and µ are probability measures on (S, S), where S is the Borel σ-algebra on S, such that µn −→d µ. Then there exist random variables Xn and X defined on the Lebesgue space ((0, 1), B((0, 1)), m = the Lebesgue measure) such that for all n ≥ 1, Xn has distribution µn, X has distribution µ, and Xn → X w.p. 1. For a proof, see Billingsley (1968).
9.5 The method of moments and the moment problem
9.5.1 Convergence of moments
Let {Xn}n≥1 and X be random variables such that Xn converges to X in distribution. Suppose for some k > 0, E|Xn|^k < ∞ for each n ≥ 1. A natural question is: When does this imply E|X|^k < ∞ and lim_{n→∞} E|Xn|^k = E|X|^k?
By Skorohod's theorem, one can assume w.l.o.g. that Xn → X w.p. 1. Then the results from Section 2.5 yield the following.
Theorem 9.5.1: Let {Xn}n≥1 and X be a collection of random variables such that Xn −→d X. Then, for each 0 < k < ∞, the following are equivalent:
(i) E|Xn|^k < ∞ for each n ≥ 1, E|X|^k < ∞, and E|Xn|^k → E|X|^k.
(ii) {|Xn|^k}n≥1 are uniformly integrable, i.e., for every ε > 0, there exists an Mε ∈ (0, ∞) such that
        sup_{n≥1} E(|Xn|^k I(|Xn| > Mε)) < ε.
Remark 9.5.1: Recall that a sufficient condition for the uniform integrability of {|Xn|^k}n≥1 is that
        sup_{n≥1} E|Xn|^ℓ < ∞   for some ℓ ∈ (k, ∞).
Example 9.5.1: Let Xn have the distribution P(Xn = 0) = 1 − 1/n, P(Xn = n) = 1/n for n = 1, 2, . . .. Then Xn −→d 0 but EXn = 1 does not go to 0. Note that {Xn}n≥1 is not uniformly integrable here.
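A short simulation (an aside, with arbitrary sample sizes and NumPy assumed) makes the failure of uniform integrability visible.

import numpy as np

rng = np.random.default_rng(0)

# X_n equals n with probability 1/n and 0 otherwise (Example 9.5.1): X_n ->d 0,
# yet E X_n = n * (1/n) = 1 for every n, because {X_n} is not uniformly integrable.
for n in (10, 1000, 100_000):
    x = np.where(rng.uniform(size=1_000_000) < 1.0 / n, n, 0)
    print(n, float((x == 0).mean()), float(x.mean()))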
Remark 9.5.2: In Theorem 9.5.1, under hypothesis (ii), it follows that
        E|Xn|^ℓ → E|X|^ℓ   for all real numbers ℓ ∈ (0, k)
and
        EXn^p → EX^p   for all positive integers p, 0 < p ≤ k.
9.5.2 The method of moments
Suppose that {Xn}n≥1 are random variables such that lim_{n→∞} EXn^k = mk < ∞ exists for all integers k = 0, 1, 2, . . .. Does there exist a random variable X such that Xn −→d X? The answer is 'yes' provided that the moments {mk}k≥1 determine the distribution of the random variable X uniquely.
Theorem 9.5.2: (Fréchet-Shohat theorem). Let {Xn}n≥1 be a sequence of random variables such that for each k ∈ N, lim_{n→∞} EXn^k ≡ mk exists and is finite. If the sequence {mk}k≥1 uniquely determines the distribution of a random variable X, then Xn −→d X.
Proof: Suppose that for some subsequence {nj}j≥1, the probability distributions {µnj}j≥1 of {Xnj}j≥1 converge vaguely to some µ. Since {EXnj²}j≥1 is a bounded sequence, {µnj}j≥1 is tight. Hence µ must be a probability distribution and, by Theorem 9.5.1, the moments of µ must coincide with {mk}k≥1. Since the sequence {mk}k≥1 determines the distribution uniquely, µ is the unique vague limit point of {µn}n≥1 and, by Theorem 9.2.6, µn −→d µ. So if X is a random variable with distribution µ, then Xn −→d X.   □
The above "method of moments" used to be a tool for proving convergence in distribution, e.g., for proving asymptotic normality of the Binomial (n, p) distribution. Since it requires the existence of all moments, this method is too restrictive and is of limited use. However, the question of when the moments determine a distribution is an interesting one and is discussed next.
9.5.3 The moment problem
Suppose {mk}k≥1 is a sequence of real numbers such that there is at least one probability measure µ on (R, B(R)) such that for all k ∈ N,
        mk = ∫ x^k µ(dx).
Does the sequence {mk}k≥1 determine µ uniquely? This is a part of the Hamburger moment problem, which includes seeking conditions under which a given sequence {mk}k≥1 is the moment sequence of a probability distribution.
The answer to the uniqueness question posed above is 'no,' as the following example shows.
Example 9.5.2: Let Y be a standard normal random variable and let X = exp(Y). Then X is said to have the log-normal distribution (which is a misnomer, as a more appropriate name would be something like expo-normal). Then X has the probability density function
        f(x) = (1/√(2π)) (1/x) exp(−[log x]²/2)   for x > 0,   and f(x) = 0 otherwise.    (5.1)
Consider now the family of functions
        fα(x) = f(x)(1 + α sin(2π log x))
with |α| ≤ 1. It is clear that fα(x) ≥ 0. Further, it is not difficult to check that for any α ∈ [−1, 1],
        ∫ x^r fα(x) dx = ∫ x^r f(x) dx
for all r = 0, 1, 2, . . .. Thus, the sequence of moments mk ≡ ∫ x^k f(x) dx does not determine the log-normal distribution (5.1).
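The equality of the moment sequences can also be checked numerically. The following sketch (an aside using SciPy quadrature after the substitution x = e^y; the numerical tolerances are whatever quad provides) compares the moments of f and f1 with the log-normal values e^{r²/2}.

import numpy as np
from scipy.integrate import quad

def moment(r, alpha):
    """r-th moment of f_alpha(x) = f(x)(1 + alpha*sin(2*pi*log x)), after substituting x = e^y."""
    integrand = lambda y: np.exp(r * y) * (1.0 + alpha * np.sin(2 * np.pi * y)) \
                          * np.exp(-y**2 / 2.0) / np.sqrt(2 * np.pi)
    val, _ = quad(integrand, -np.inf, np.inf)
    return val

# For every integer r the perturbation integrates to zero, so each f_alpha has the same
# moment sequence m_r = exp(r^2/2) as the log-normal density f, although the densities differ.
for r in range(5):
    print(r, round(moment(r, 0.0), 4), round(moment(r, 1.0), 4), round(np.exp(r**2 / 2.0), 4))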
A sufficient condition for uniqueness is Carleman's condition:
        Σ_{k=1}^∞ m_{2k}^{−1/(2k)} = ∞.                   (5.2)
For a proof, see Feller (1966) or Shohat and Tamarkin (1943).
Remark 9.5.3: A special case of the above is when
        lim sup_{k→∞} m_{2k}^{1/(2k)} = r ∈ [0, ∞).       (5.3)
In particular, if {mk}k≥1 is a moment sequence, then within the class of probability distributions µ that have bounded support and have {mk}k≥1 as their moment sequence, µ is uniquely determined. This is so since if M ≡ sup{x : µ([−x, x]) < 1}, then (Problem 9.27)
        m_{2k}^{1/(2k)} → M   as   k → ∞.                 (5.4)
More generally, if µ is a probability distribution on R such that ∫ e^{tx} dµ(x) < ∞ for all |t| < δ for some δ > 0, then all its moments are finite and (5.2) holds, and hence µ is uniquely determined by its moments (Problem 9.28). In particular, the normal and Gamma distributions are determined by their moments.
Remark 9.5.4: If {mk}k≥1 is a moment sequence of a distribution µ concentrated on [0, ∞), the problem of determining µ uniquely is known as the Stieltjes moment problem. If X is a random variable with distribution µ, let Y = δ√X, where δ is independent of X and takes the two values {−1, +1} with equal probability. Then Y has a symmetric distribution and, for all k ≥ 1,
        E|Y|^{2k} = E|X|^k.
The distribution of Y is uniquely determined (and hence that of √X and hence that of X) if
        lim sup_{k→∞} (EY^{2k})^{1/(2k)} / (2k) < ∞,
i.e.,
        lim sup_{k→∞} m_k^{1/(2k)} / (2k) < ∞.
9.6 Problems
9.1 If Xn −→d X0 and P (X0 = c) = 1 for some c ∈ R, then Xn −→p c.
9.2 If a cdf is continuous on R, then it is uniformly continuous on R.
    (Hint: Use the facts that (i) given any ε > 0, there exists M ∈ (0, ∞) such that F(−x) + 1 − F(x) < ε for all x > M, and (ii) F is uniformly continuous on [−M, M].)
9.3 Prove parts (ii) and (iii) of Theorem 9.1.6.
9.4 Let {µn }n≥1 , µ be probability measures on (R, B(R)) such that
µn −→v µ. Show that µn −→d µ.
9.5 Show that the function F̃ (·), defined in (2.5), is nondecreasing and
right continuous and that the function F (x) ≡ inf{F (r) : r ≥ x, r ∈
D} is nondecreasing and left continuous.
9.6 Give another proof of the 'if' part of Theorem 9.2.6 by using Theorem 9.2.1 and showing that for any f : R → R continuous and bounded and any subsequence {ni}i≥1, there exists a further subsequence {mj}j≥1 such that amj ≡ ∫ f dµmj → a ≡ ∫ f dµ and hence, an ≡ ∫ f dµn → a.
9.7 If {Xn }n≥1 is stochastically bounded and Yn −→p 0, then show that
Xn Yn −→p 0.
9.8 (a) Let Xn ∼ N(an, bn) for n ≥ 0, where bn > 0 for n ≥ 1, b0 ∈ [0, ∞), and an ∈ R for all n ≥ 0.
        (i) Show that if an → a0, bn → b0 as n → ∞, then Xn −→d X0.
        (ii) Show that if Xn −→d X0 as n → ∞, then an → a0 and bn → b0.
        (Hint: First show that {bn}n≥1 is bounded and then that {an}n≥1 is bounded and finally, that a0 and b0 are the only limit points of {an}n≥1 and {bn}n≥1, respectively.)
    (b) For n ≥ 1, let Xn ∼ N(an, Σn), where an ∈ Rk and Σn is a k × k positive definite matrix, k ∈ N. Then, {Xn}n≥1 is stochastically bounded if and only if {‖an‖}n≥1 and {‖Σn‖}n≥1 are bounded.
9.9 Let {Xjn }n≥1 , j = 1, . . . , k, k ∈ N be sequences of random variables.
Let Xn = (X1n , . . . , Xkn ), n ≥ 1. Show that the sequence of random
vectors {Xn }n≥1 is tight in Rk iff for each 1 ≤ j ≤ k, the sequence of
random variables {Xjn }n≥1 is tight in R.
9.10 Let (S, d) be a metric space.
(a) For any set A ⊂ S, let
d(x, A) ≡ inf{d(x, y) : y ∈ A}.
Show that for each A, d(·, A) is continuous on S.
(b) Let fn (·) be as in (3.3). Show that fn (·) is continuous on S and
fn (·) ↑ IG (·).
(Hint: Note that d(x, Gc ) + d(x, Gn ) > 0 for all x in S. )
9.11 For any cdf F , let F −1 (u) ≡ sup{t : F (t) < u}, 0 < u < 1. Show that
for any 0 < u0 < 1 and t0 in R,
F −1 (u0 ) ≤ t0 ⇔ F (t0 ) ≥ u0 .
(Hint: For ⇒, use the right continuity of F and for ⇐, use the
definition of sup.)
9.12 For a function f : Rk → R (k ∈ N), define Df = {x ∈ Rk : f is
discontinuous at x}. Show that Df ∈ B(Rk ).
9.13 If Xn −→p X and f : R → R is continuous, then f (Xn ) −→p f (X).
9.14 (The Delta method). Let {Xn}n≥1 be a sequence of random variables and {an}n≥1 ⊂ (0, ∞) be a sequence of constants such that an → ∞ as n → ∞ and
        an(Xn − θ) −→d Z
    for some random variable Z and for some θ ∈ R. Let H : R → R be a function that is differentiable at θ with derivative c. Show that
        an(H(Xn) − H(θ)) −→d cZ.
    (Hint: By Taylor's expansion, for any x ∈ R,
        H(x) = H(θ) + c(x − θ) + R(x)(x − θ),
    where R(x) → 0 as x → θ. Now use Problem 9.7 and Slutsky's theorem.)
9.15 Let X be a random variable with P (X = c) > 0 for some c ∈ R.
Give examples of two sequences {Xn }n≥1 and {Yn }n≥1 satisfying
Xn −→d X and Yn −→d X such that
        lim_{n→∞} P(Xn ≤ c) = P(X ≤ c)   but   lim_{n→∞} P(Yn ≤ c) ≠ P(X ≤ c).
    (Hint: Take Xn =d X, n ≥ 1, and Yn =d X + 1/n, n ≥ 1, say.)
9.16 Let {µn}n≥1, µ be probability measures on (R, B(R)) such that
        ∫ f dµn → ∫ f dµ   for all f ∈ F
    for some collection F of functions from R to R specified below. Does µn −→d µ if
    (a) F = {f | f : R → R, f is bounded and continuously differentiable on R with a bounded derivative}?
    (b) F = {f | f : R → R, f is bounded and infinitely differentiable on R}?
    (c) F ≡ {f | f is a polynomial with real coefficients} and ∫ |x|^k µ(dx) + ∫ |x|^k dµn < ∞ for all n, k ∈ N?
9.17 For any two cdfs F, G on R, define
        dL(F, G) = inf{ε > 0 : G(x − ε) − ε < F(x) < G(x + ε) + ε for all x ∈ R}.    (6.1)
    Verify that dL defines a metric on the collection of all probability distributions on (R, B(R)). The metric dL is called the Levy metric.
9.18 Let {µn}n≥1, µ be probability measures on (R, B(R)), with the corresponding cdfs {Fn}n≥1 and F. Show that µn −→d µ iff
        dL(Fn, F) → 0   as   n → ∞.
9.19 (a) Show that for any two cdfs F, G on R,
        dL(F, G) ≤ dK(F, G),                              (6.2)
    where
        dK(F, G) = sup_{x∈R} |F(x) − G(x)|                (6.3)
    (dK is called the Kolmogorov distance or metric between F and G).
    (b) Give examples where (i) equality holds in (6.2), and (ii) strict inequality holds in (6.2).
9.20 Let {µn}n≥1, µ be probability measures on (R, B(R)) such that µn −→d µ. Let {fa : a ∈ R} be a collection of bounded functions from R → R such that µ(Dfa) = 0 for all a ∈ R and |fa(x) − fb(x)| ≤ h(x)|b − a| for all a, b ∈ R and for some h : R → (0, ∞) with µ(Dh) = 0 and ∫ |h| dµ < ∞. Show that
        sup_{a∈R} | ∫ fa dµn − ∫ fa dµ | → 0   as n → ∞.
9.21 Let {Xn}n≥1, X be k-dimensional random vectors such that Xn −→d X. Let {An}n≥1 be a sequence of r × k matrices of real numbers and {bn}n≥1 ⊂ R^r, r ∈ N. Define Yn = AnXn + bn and Zn = AnXnXnᵀ, where Xnᵀ denotes the transpose of Xn. Suppose that An → A and bn → b. Show that
    (a) Yn −→d Y, where Y =d AX + b,
    (b) Zn −→d Z, where Z =d AXXᵀ.
    (Note: Here convergence in distribution of a sequence of r × k matrix-valued random variables may be interpreted by considering the corresponding rk-dimensional random vectors obtained by concatenating the rows of the r × k matrix side-by-side and using the definition of convergence in distribution for random vectors.)
9.22 Let µn, µ be probability measures on a countable set D ≡ {aj}j≥1 ⊂ R. Let pnj = µn({aj}), j ≥ 1, n ≥ 1, and pj = µ({aj}). Show that, as n → ∞, µn −→d µ iff pnj → pj for all j iff Σ_j |pnj − pj| → 0.
9.23 Let Xn ∼ Binomial(n, pn), n ≥ 1. Suppose npn → λ, 0 < λ < ∞. Show that Xn −→d X, where X ∼ Poisson(λ).
9.24 (a) Let Xn ∼ Geo(pn), i.e., P(Xn = r) = qn^{r−1} pn, r ≥ 1, where 0 < pn < 1 and qn = 1 − pn. Show that if pn → 0 as n → ∞, then
        pnXn −→d X,                                       (6.4)
    where X ∼ Exponential (1).
    (b) Fix a positive integer k. For n ≥ 1, let
        pnr = (r−1 choose k−1) pn^k qn^{r−k},   r ≥ k,
    where 0 < pn < 1, qn = 1 − pn.
        (i) Verify that for each n, {pnr}r≥k is a probability distribution, i.e., Σ_{r=k}^∞ pnr = 1.
        (ii) Let Yn be a random variable with distribution P(Yn = r) = pnr, r ≥ k. Show that if pn → 0 as n → ∞, then {pnYn}n≥1 converges in distribution, and identify the limit.
9.25 Let {Fn}n≥1 and {Gn}n≥1 be two sequences of cdfs on R such that, as n → ∞, Fn −→d F, Gn −→d G, where F and G are cdfs on R.
    (a) Show that for each n ≥ 1,
        Hn(x) ≡ (Fn ∗ Gn)(x) ≡ ∫_R Fn(x − y) dGn(y)
    is a cdf on R.
    (b) Show that, as n → ∞, Hn −→d H, where H = F ∗ G, by direct calculation and by Skorohod's theorem (i.e., Theorem 9.4.1) and Problem 7.14.
9.26 Let Yn have the discrete uniform distribution on the integers {1, 2, . . . , n}. Let Xn ≡ Yn/n and let X be a Uniform (0,1) random variable. Show that Xn −→d X using three different methods as follows:
    (a) the Helly-Bray theorem,
    (b) the method of moments,
    (c) using the cdfs.
9.27 Establish (5.4) in Remark 9.5.3.
    (Hint: Show that for any ε > 0, m_{2k}^{1/(2k)} ≥ (M − ε)(µ({x : |x| > M − ε}))^{1/(2k)}.)
9.28 Let µ be a probability distribution on R such that φ(t0) ≡ ∫ e^{t0|x|} dµ(x) < ∞ for some t0 > 0. Show that Carleman's condition (5.2) is satisfied.
    (Hint: Show that by Cramér's inequality (Corollary 3.1.5),
        m_{2k} ≤ 2k φ(t0) ∫_0^∞ x^{2k−1} e^{−t0 x} dx = φ(t0) 2k t0^{−2k} (2k − 1)!,
    and then use Stirling's approximation: 'n! ∼ √(2π) n^{n+1/2} e^{−n} as n → ∞' (Feller (1968)).)
9.29 (Continuity theorem for mgfs). Let {Xn}n≥1 and X be random variables such that for some δ > 0, the mgfs MXn(t) ≡ E(e^{tXn}) and MX(t) ≡ E(e^{tX}) are finite for all |t| < δ. Further, let MXn(t) → MX(t) for all |t| < δ. Show that Xn −→d X.
    (Hint: Show first that {Xn}n≥1 is tight and use the fact that, by Remark 9.5.3, the distribution of X is determined by MX(·).)
9.30 Let Xn ∼ Binomial(n, pn). Suppose npn → ∞. Let Yn = (Xn − npn)/√(npn(1 − pn)), n ≥ 1. Show that Yn −→d Y, where Y ∼ N(0, 1).
    (Hint: Use Problem 9.29.)
9.31 Use the continuity theorem for mgfs to establish (6.4) and the convergence in distribution of {pn Yn }n≥1 in Problem 9.24 (b)(ii).
9.32 Let {Xj, Vj : j ≥ 1} be a collection of random variables on some probability space (Ω, F, P) such that P(Vj ∈ N) = 1 for all j, Vj → ∞ w.p. 1, and Xj −→d X. Suppose that for each j, the random variable Vj is independent of the sequence {Xn}n≥1. Show that XVj −→d X.
    (Hint: Verify that for any bounded continuous function h : R → R,
        |Eh(XVj) − Eh(X)| ≤ 2‖h‖P(Vj ≤ N) + ∆N P(Vj > N),
    where ∆N = sup_{k>N} |Eh(Xk) − Eh(X)| and ‖h‖ = sup{|h(x)| : x ∈ R}.)
9.33 Let Xn −→d X and xn → x as n → ∞. If P(X = x) = 0, then show that P(Xn ≤ xn) → P(X ≤ x).
9.34 (Weyl's equi-distribution property). Let 0 < α < 1 be an irrational number. Let µn(·) be the measure defined by µn(A) ≡ (1/n) Σ_{j=0}^{n−1} I_A(jα mod 1), A ∈ B([0, 1]). Show that µn −→d Uniform (0,1).
    (Hint: Verify that ∫ f dµn → ∫_0^1 f(x) dx for all f of the form f(x) = e^{ι2πkx}, k ∈ Z, and then approximate a bounded continuous function f by trigonometric polynomials (cf. Section 5.6).)
9.35 (a) Let {Xi}i≥1 be iid random variables with the Uniform (0,1) distribution. Let Mn = max_{1≤i≤n} Xi. Show that n(1 − Mn) −→d Exponential (1).
    (b) Let {Xi}i≥1 be iid random variables such that λ ≡ sup{x : P(X1 ≤ x) < 1} < ∞, P(X1 = λ) = 0, and P(λ − x < X1 < λ) ∼ cx^α L(x) as x ↓ 0, where α > 0, c > 0, and L(·) is slowly varying at 0, i.e., lim_{x↓0} L(cx)/L(x) = 1 for all 0 < c < ∞. Let Mn = max_{1≤i≤n} Xi. Show that Yn ≡ n^{1/α}(λ − Mn) converges in distribution as n → ∞ and identify the limit.
9.36 Let {Xi }i≥1 be iid positive random variables such that P (X1 < x) ∼
cxα L(x) as x ↓ 0, where c, α and L(·) are as in Problem 9.35. Let
X1n ≡ min1≤i≤n Xi . Find {an }n≥1 ⊂ R+ such that Zn ≡ an X1n
converges in distribution to a nondegenerate limit and identify the
distribution. Specialize this to the cases where X1 has a pdf fX (·)
such that
(a) limx↓0 fX (x) = fX (0+) exists and is positive,
(b) X1 has a Beta (a, b) distribution.
10
Characteristic Functions
10.1 Definition and examples
Characteristic functions play an important role in studying (asymptotic)
distributional properties of random variables, particularly for sums of independent random variables. The main uses of characteristic functions are
(1) to characterize the probability distribution of a given random variable,
and (2) to establish convergence in distribution of a sequence of random
variables and to identify the limit distribution.
Definition 10.1.1:
(i) The characteristic function of a random variable X is defined as
        φX(t) = E exp(ιtX),   t ∈ R,                      (1.1)
where ι = √−1.
(ii) The characteristic function of a probability measure µ on (R, B(R)) is defined as
        µ̂(t) = ∫ exp(ιtx) µ(dx),   t ∈ R.                (1.2)
(iii) Let F be a cdf on R. Then, the characteristic function of F is defined as µ̂F(·), where µF is the Lebesgue-Stieltjes measure corresponding to F.
Note that the integrands in (1.1) and (1.2) are complex valued. Here and elsewhere, for any f1, f2 ∈ L1(Ω, F, µ), the integral of (f1 + ιf2) with respect to µ is defined as
        ∫ (f1 + ιf2) dµ = ∫ f1 dµ + ι ∫ f2 dµ.            (1.3)
Thus, the characteristic function of X is given by φX(t) = (E cos tX) + ι(E sin tX), t ∈ R. Since the functions cos tx and sin tx are bounded for every t ∈ R, φX(t) is well defined for all t ∈ R. Furthermore, φX(0) = 1 and for any t ∈ R,
        |φX(t)| = [(E cos tX)² + (E sin tX)²]^{1/2} ≤ [E(cos tX)² + E(sin tX)²]^{1/2} ≤ 1.    (1.4)
If equality holds in (1.4), i.e., if |φX(t0)| = 1 for some t0 ≠ 0, then the random variable is necessarily discrete, as shown by the following proposition.
Proposition 10.1.1: Let X be a random variable with characteristic function φX(·). Then the following are equivalent:
(i) |φX(t0)| = 1 for some t0 ≠ 0.
(ii) There exist a ∈ R, h ≠ 0 such that
        P(X ∈ {a + jh : j ∈ Z}) = 1.                      (1.5)
Proof: Suppose that (i) holds. Since |φX(t0)| = 1, there exists a0 ∈ R such that
        φX(t0) = e^{ιa0},   i.e.,   e^{−ιa0} φX(t0) = 1.
Let a = a0/t0. Since the characteristic function of (X − a) is given by e^{−ιat} φX(t), it follows that E exp(ιt0(X − a)) = 1. Equating the real parts, one gets
        E cos t0(X − a) = 1.                              (1.6)
Since |cos θ| ≤ 1 for all θ and cos θ = 1 if and only if θ = 2πn for some n ∈ Z, (1.6) implies that
        P(t0(X − a) ∈ {2πj : j ∈ Z}) = 1.                 (1.7)
Therefore, (ii) holds with h = 2π/|t0| and with a = a0/t0.
For the converse, note that with pj = P(X = a + jh), j ∈ Z,
        φX(t) = Σ_{j∈Z} exp(ιt(a + jh)) pj,   t ∈ R,
and hence |φX(2π/h)| = 1.   □
Definition 10.1.2: A random variable X satisfying (1.5) for some a ∈ R
and h > 0 is called a lattice random variable. In this case, the distribution
of X is also called lattice or arithmetic. If X is a nondegenerate lattice
random variable, then the largest h > 0 for which (1.5) holds is called the
span (of the probability distribution or of the characteristic function) of X.
An inspection of the proof of Proposition 10.1.1 shows that for a lattice random variable X with span h > 0, its characteristic function satisfies the relation
        |φX(2πj/h)| = 1   for all j ∈ Z.                  (1.8)
In particular, this implies that lim sup_{|t|→∞} |φX(t)| = 1. The next result shows that characteristic functions of random variables with absolutely continuous cdfs exhibit a very different limit behavior.
Proposition 10.1.2: Let X be a random variable with cdf F and characteristic function φX. If F is absolutely continuous, then
        lim_{|t|→∞} |φX(t)| = 0.                          (1.9)
Proof: Since F is absolutely continuous, the probability distribution µX of X has a density, say f, w.r.t. the Lebesgue measure m on R, and
        φX(t) = ∫ exp(ιtx) f(x) dx,   t ∈ R.
Fix ε ∈ (0, ∞). Since f ∈ L1(R, B(R), m), by Theorem 2.3.14, there exists a step function fε = Σ_{j=1}^k cj I_{(aj, bj)} with 1 ≤ k < ∞ and aj, bj, cj ∈ R for j = 1, . . . , k, such that
        ∫ |f − fε| dm < ε/2.                              (1.10)
Next note that for any t ≠ 0,
        | ∫ exp(ιtx) fε(x) dx | = | Σ_{j=1}^k cj ∫_{aj}^{bj} exp(ιtx) dx | ≤ Σ_{j=1}^k |cj| · 2/|t|.    (1.11)
Hence, by (1.10) and (1.11), it follows that
        |φX(t)| = | ∫ exp(ιtx) f(x) dx | ≤ ∫ |f − fε| dx + | ∫ exp(ιtx) fε(x) dx | < ε/2 + ε/2 = ε
for all |t| > tε, where tε = 4 Σ_{j=1}^k |cj| / ε. Thus (1.9) holds.   □
Note that the above proof shows that for any f ∈ L1(m), the Fourier transform
        f̂(t) ≡ ∫ e^{ιtx} f(x) dx,   t ∈ R,
satisfies lim_{|t|→∞} f̂(t) = 0. This is known as the Riemann-Lebesgue lemma (cf. Proposition 5.7.1).
Next, some basic results on smoothness properties of the characteristic
function are presented.
Proposition 10.1.3: Let X be a random variable with characteristic function φX (·). Then, φX (·) is uniformly continuous on R.
Proof: For t, h ∈ R,
        |φX(t + h) − φX(t)| = |E[exp(ι(t + h)X) − exp(ιtX)]| = |E exp(ιtX) · (e^{ιhX} − 1)| ≤ E|e^{ιhX} − 1| ≡ E∆(h), say,
where ∆(h) ≡ |exp(ιhX) − 1|. It is easy to check that |∆(h)| ≤ 2 and lim_{h→0} ∆(h) = 0 w.p. 1 (in fact, everywhere). Hence, by the BCT, E∆(h) → 0 as h → 0. Therefore,
        lim_{h→0} sup_{t∈R} |φX(t + h) − φX(t)| ≤ lim_{h→0} E∆(h) = 0    (1.12)
and hence, φX(·) is uniformly continuous on R.   □
Theorem 10.1.4: Let X be a random variable with characteristic function φX(·). If E|X|^r < ∞ for some r ∈ N, then φX(·) is r-times continuously differentiable and
        φX^{(r)}(t) = E(ιX)^r exp(ιtX),   t ∈ R.          (1.13)
For proving the theorem, the following bound on the function exp(ιx) is
useful.
Lemma 10.1.5: For any x ∈ R, r ∈ N,
        | exp(ιx) − Σ_{k=0}^{r−1} (ιx)^k / k! | ≤ min{ |x|^r / r!, 2|x|^{r−1} / (r − 1)! }.    (1.14)
Proof: Note that for any x ∈ R and for any r ∈ N,
        d^r/dx^r [exp(ιx)] = d^r/dx^r (cos x) + ι d^r/dx^r (sin x) = ι^r exp(ιx).    (1.15)
Hence, by (1.15) and Taylor's expansion (applied to the functions sin x and cos x of a real variable x), for any x ∈ R and r ∈ N,
        exp(ιx) = Σ_{k=0}^{r−1} (ιx)^k / k! + [(ιx)^r / (r − 1)!] ∫_0^1 (1 − u)^{r−1} exp(ιux) du.    (1.16)
Hence, for any x ∈ R and any r ∈ N,
        | exp(ιx) − Σ_{k=0}^{r−1} (ιx)^k / k! | ≤ |x|^r / r!.    (1.17)
Also, for r ≥ 2, using (1.17) with r replaced by r − 1, one gets
        | exp(ιx) − Σ_{k=0}^{r−1} (ιx)^k / k! | ≤ | exp(ιx) − Σ_{k=0}^{r−2} (ιx)^k / k! | + |x|^{r−1} / (r − 1)! ≤ 2|x|^{r−1} / (r − 1)!.    (1.18)
Hence, by (1.17) and (1.18), (1.14) holds for all x ∈ R, r ∈ N with r ≥ 2. For r = 1, (1.14) follows from (1.17) and the bound 'sup_x |exp(ιx) − 1| ≤ 2.'   □
Lemma 10.1.5 gives two upper bounds on the difference between the function exp(ιx) and its (r − 1)th order Taylor expansion around x = 0. For small values of |x|, the first bound (i.e., |x|^r / r!) is more accurate, whereas for large values of |x|, the other bound (i.e., 2|x|^{r−1} / (r − 1)!) is more accurate.
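The two regimes can be checked directly; the following Python sketch (an aside, with arbitrarily chosen values of x and r) compares the actual Taylor error against the bound in (1.14).

import numpy as np
from math import factorial

def taylor_error(x, r):
    """|exp(ix) - sum_{k<r} (ix)^k / k!| for real x."""
    partial = sum((1j * x) ** k / factorial(k) for k in range(r))
    return abs(np.exp(1j * x) - partial)

def bound(x, r):
    """min(|x|^r / r!, 2|x|^(r-1) / (r-1)!), the right side of (1.14)."""
    return min(abs(x) ** r / factorial(r), 2.0 * abs(x) ** (r - 1) / factorial(r - 1))

# The first term of the minimum is sharper for small |x|, the second for large |x|;
# in all cases the actual Taylor error stays below the bound.
for x in (0.1, 1.0, 10.0):
    for r in (1, 2, 3):
        assert taylor_error(x, r) <= bound(x, r) + 1e-12
        print(x, r, round(taylor_error(x, r), 6), round(bound(x, r), 6))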
Proof of Theorem 10.1.4: Let µ denote the probability distribution of X on (R, B(R)). Suppose that E|X| < ∞. First it will be shown that φX(·) is differentiable with φX^{(1)}(t) = E{ιX exp(ιtX)}, t ∈ R. Fix t ∈ R. For any h ∈ R, h ≠ 0,
        h⁻¹[φX(t + h) − φX(t)] = ∫ exp(ιtx) [(exp(ιhx) − 1)/h] µ(dx)
                               = ∫ exp(ιtx) [(exp(ιhx) − 1)/h − ιx] µ(dx) + ∫ ιx exp(ιtx) µ(dx)
                               ≡ ∫ ψh(x) µ(dx) + ∫ ιx exp(ιtx) µ(dx), say.    (1.19)
By Lemma 10.1.5 (with r = 2),
        |ψh(x)| ≤ min{ |h||x|²/2, 2|x| }   for all x ∈ R, h ≠ 0.    (1.20)
Hence, lim_{h→0} ψh(x) = 0 for each x ∈ R. Also, |ψh(x)| ≤ 2|x| and ∫ |x| µ(dx) = E|X| < ∞. Hence, by the DCT,
        lim_{h→0} ∫ ψh(x) µ(dx) = 0
and therefore, from (1.19), it follows that φX(·) is differentiable at t with φX^{(1)}(t) = ∫ ιx exp(ιtx) µ(dx) = E{ιX exp(ιtX)}.
Next suppose that the assertion of the theorem is true for some r ∈ N. To prove it for r + 1, note that for t ∈ R and h ≠ 0,
        h⁻¹[φX^{(r)}(t + h) − φX^{(r)}(t)] − E(ιX)^{r+1} exp(ιtX) = ∫ (ιx)^r ψh(x) µ(dx),    (1.21)
where ψh(x) is as in (1.19). Now using the bound (1.20) on ψh(x), the DCT, and the condition E|X|^{r+1} < ∞, one can show that the right side of (1.21) goes to zero as h → 0. By induction, this completes the proof of the theorem.   □
Proposition 10.1.6: Let X and Y be two independent random variables. Then
        φ_{X+Y}(t) = φX(t) · φY(t),   t ∈ R.              (1.22)
Proof: Follows from (1.3), Proposition 7.1.3, and the independence of X and Y.   □
For a complex number z = a + ιb, a, b ∈ R, let z̄ = a − ιb denote the complex conjugate of z and let
        Re(z) = a   and   Im(z) = b                       (1.23)
respectively denote the real and the imaginary parts of z.
Corollary 10.1.7: Let X be a random variable with characteristic function φX. Then, φ̄X, |φX|² and Re(φX) are characteristic functions, where Re(φX)(t) = Re(φX(t)), t ∈ R.
Proof: φ̄X(t) = E exp(−ιtX) = E exp(ιt(−X)), t ∈ R, so φ̄X is the characteristic function of −X. Next, let Y be an independent copy of X. Then, by (1.22), φ_{X−Y}(t) = |φX(t)|², t ∈ R.
Finally, Re(φX)(t) = ½(φX(t) + φ̄X(t)) = ∫ exp(ιtx) µ(dx), t ∈ R, where µ(A) = 2⁻¹[P(X ∈ A) + P(−X ∈ A)], A ∈ B(R).   □
Definition 10.1.3: A function φ : R → C, the set of complex numbers, is said to be nonnegative definite if for any k ∈ N, t1, t2, . . . , tk ∈ R, and α1, α2, . . . , αk ∈ C,
        Σ_{i=1}^k Σ_{j=1}^k αi ᾱj φ(ti − tj) ≥ 0.         (1.24)
Proposition 10.1.7: Let φ(·) be the characteristic function of a random variable X. Then φ is nonnegative definite.
Proof: Check that for k, {ti}, {αi} as in Definition 10.1.3,
        Σ_{i=1}^k Σ_{j=1}^k αi ᾱj φ(ti − tj) = E | Σ_{j=1}^k αj e^{ιtj X} |².   □
A converse to the above is known as the Bochner-Khinchine theorem, which states that if φ : R → C is nonnegative definite, continuous, and φ(0) = 1, then φ is the characteristic function of a random variable X. For a proof, see Chung (1974).
Another criterion for a function φ : R → C to be a characteristic function is due to Pólya. For a proof, see Chung (1974).
Proposition 10.1.8: (Pólya's criterion). Let φ : R → C satisfy φ(0) = 1, φ(t) ≥ 0, φ(t) = φ(−t) for all t ∈ R, and let φ(·) be nonincreasing and convex on [0, ∞). Then φ is a characteristic function.
10.2 Inversion formulas
Let F be a cdf and φ be its characteristic function. In this section, two
inversion formulas to get the cdf F from φ are presented. The first one is
from Feller (1966), and the second one is more standard.
Unless otherwise mentioned, for the rest of this section, X will be a random variable with cdf F and characteristic function φX and N a standard
normal random variable independent of X.
Lemma 10.2.1: Let g : R → R be a Borel measurable bounded function vanishing outside a bounded set and let ε ∈ (0, ∞). Then
        Eg(X + εN) = (1/2π) ∫∫ g(x) φX(t) e^{−ιtx} e^{−ε²t²/2} dt dx.    (2.1)
Proof: The integrand on the right is bounded by e^{−ε²t²/2} |g(x)| and so is integrable on R × R with respect to the Lebesgue measure on (R², B(R²)). Further, φX(t) = ∫ e^{ιty} dF(y) and e^{−t²/2} = (1/√(2π)) ∫ e^{ιtx} e^{−x²/2} dx, t ∈ R. By repeated applications of Fubini's theorem and the above two identities, the right side of (2.1) becomes
        ∫ g(x) [ (1/(√(2π) ε)) ∫ ( (ε/√(2π)) ∫ e^{ιt(y−x)} e^{−ε²t²/2} dt ) dF(y) ] dx
        [set s = εt]
        = ∫ g(x) [ (1/(√(2π) ε)) ∫ ( (1/√(2π)) ∫ e^{ιs(y−x)/ε} e^{−s²/2} ds ) dF(y) ] dx
        = ∫ g(x) [ (1/(√(2π) ε)) ∫ e^{−(y−x)²/(2ε²)} dF(y) ] dx.
Since X and N are independent and N has an absolutely continuous distribution w.r.t. the Lebesgue measure, X + εN also has an absolutely continuous distribution with density
        f_{X+εN}(x) = (1/(√(2π) ε)) ∫ e^{−(y−x)²/(2ε²)} dF(y),   x ∈ R.
Thus, the right side of (2.1) reduces to
        ∫ g(x) f_{X+εN}(x) dx = Eg(X + εN).   □
Corollary 10.2.2: Let g : R → R be continuous and let g(x) = 0 for all |x| > K, for some K, 0 < K < ∞. Then
        Eg(X) = ∫ g(x) dF(x) = lim_{ε→0+} (1/2π) ∫∫ g(x) e^{−ιtx} φX(t) e^{−ε²t²/2} dt dx.    (2.2)
Proof: This follows from Lemma 10.2.1, the fact that X + εN → X w.p. 1 as ε → 0, and the BCT.   □
Corollary 10.2.3: (Feller's inversion formula). Let a and b, −∞ < a < b < ∞, be two continuity points of F. Then
        F(b) − F(a) = lim_{ε→0+} ∫_a^b [ (1/2π) ∫ e^{−ιtx} φX(t) e^{−ε²t²/2} dt ] dx.    (2.3)
Proof: This follows from Lemma 10.2.1 and Theorem 9.4.2, since the function g(x) = 1 for a ≤ x ≤ b and 0 otherwise is continuous except at a and b, which are continuity points of F.   □
Corollary 10.2.4: If φX(t) is integrable w.r.t. the Lebesgue measure m on R, then F is absolutely continuous with density w.r.t. m given by
        f(x) = (1/2π) ∫ e^{−ιtx} φX(t) dt,   x ∈ R.       (2.4)
Proof: If φX is integrable, then
        (1/2π) ∫ φX(t) e^{−ιtx} e^{−ε²t²/2} dt
is bounded by (2π)⁻¹ ∫ |φX(t)| dt for all x ∈ R, and it converges to (2π)⁻¹ ∫ e^{−ιtx} φX(t) dt as ε → 0+ for each x ∈ R. Hence, by the BCT and Corollary 10.2.3, for any a, b, −∞ < a < b < ∞, that are continuity points of F,
        F(b) − F(a) = ∫_a^b [ (1/2π) ∫ φX(t) e^{−ιtx} dt ] dx.
Since F has at most countably many discontinuity points and F is right continuous, the above relation holds for all −∞ < a < b < ∞.   □
Remark 10.2.1: The integrability of φX in Corollary 10.2.4 is only a sufficient condition. The standard exponential distribution has characteristic
function (1−ιt)−1 which is not integrable but the distribution is absolutely
continuous.
Corollary 10.2.5: (Uniqueness). The characteristic function φX determines F uniquely.
Proof: Since a cdf F is uniquely determined by its values on the set of its
continuity points, this corollary follows from Corollary 10.2.3.
!
A more standard inversion formula is the following.
Theorem 10.2.6: Let F be a cdf on R and φ(t) ≡ ∫ e^{ιtx} dF(x), t ∈ R, be its characteristic function.
(i) For any a < b, a, b ∈ R, that are continuity points of F,
        lim_{T→∞} (1/2π) ∫_{−T}^{T} [(e^{−ιta} − e^{−ιtb}) / (ιt)] φ(t) dt = µF((a, b)),    (2.5)
where µF is the Lebesgue-Stieltjes measure generated by F.
(ii) For any a ∈ R,
        µF({a}) = lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−ιta} φ(t) dt.    (2.6)
A multivariate extension of part (i) and its proof are given in Section
10.4. See also Problem 10.4. For a proof of part (ii), see Problem 10.5 or
see Chung (1974) or Durrett (2004).
Remark 10.2.2: (Inversion formula for integer valued random variables). If X is integer valued with pk = P(X = k), k ∈ Z, then its characteristic function is the Fourier series
        φ(t) = Σ_{k∈Z} pk e^{ιtk},   t ∈ R.               (2.7)
Since ∫_{−π}^{π} e^{ιtj} dt = 2π if j = 0 and = 0 otherwise, multiplying both sides of (2.7) by e^{−ιtk}, integrating over t ∈ (−π, π), and using the DCT, one gets
        pk = (1/2π) ∫_{−π}^{π} φ(t) e^{−ιtk} dt,   k ∈ Z.    (2.8)
As a corollary to part (ii) of Theorem 10.2.6, one can deduce a criterion for a distribution to be continuous. Let µ be a probability distribution and let {pj} be its atoms, if any. Let α = Σ_j p_j². Let X and Y be two independent random variables with distribution µ and characteristic function φ(·). Then Z = X − Y has characteristic function |φ(·)|² and, by Theorem 10.2.6, part (ii),
        P(Z = 0) = lim_{T→∞} (1/2T) ∫_{−T}^{T} |φ(t)|² dt.
But P(Z = 0) = α. Hence, it follows that
        Σ_j p_j² = lim_{T→∞} (1/2T) ∫_{−T}^{T} |φ(t)|² dt.    (2.9)
Corollary 10.2.7: A distribution is continuous iff
        lim_{T→∞} (1/2T) ∫_{−T}^{T} |φ(t)|² dt = 0.       (2.10)
Some consequences of the uniqueness result (cf. Corollary 10.2.5) are the
following.
Corollary 10.2.8: For a random variable X, X and −X have the same distribution iff the characteristic function φX(t) of X is real valued for all t ∈ R.
Proof: If φX(t) is real, then
        φX(t) = ∫ (cos tx) dF(x)   for all t ∈ R,
where F is the cdf of X. So
        φX(t) = φX(−t) = E(e^{−ιtX}) = E(e^{ιt(−X)}).     (2.11)
Since the characteristic function of −X coincides with φX(t), the 'if' part follows.
To prove the 'only if' part, suppose that X and −X have the same distribution. Then, as in (2.11),
        φX(t) = φ_{−X}(t) = φX(−t) = φ̄X(t),
where for any complex number z = a + ιb, a, b ∈ R, z̄ ≡ a − ιb denotes its conjugate. Hence, φX(t) is real for all t ∈ R.   □
Example 10.2.1: The standard Cauchy distribution has density
        f(x) = (1/π) · 1/(1 + x²),   −∞ < x < ∞.          (2.12)
Its characteristic function is given by
        φ(t) = (1/π) ∫ e^{ιtx} / (1 + x²) dx = e^{−|t|},   t ∈ R.    (2.13)
To see this, let X1 and X2 be two independent copies of the standard exponential distribution. Since φ_{X1}(t) = (1 − ιt)⁻¹, t ∈ R, Y ≡ X1 − X2 has characteristic function
        φY(t) = |φ_{X1}(t)|² = (1 + t²)⁻¹,   t ∈ R.
Since φY is integrable, the density of Y is
        fY(y) = (1/2π) ∫ e^{−ιuy} / (1 + u²) du,   y ∈ R.
But by the convolution formula, fY(y) = ∫_0^∞ e^{−x} e^{−(y+x)} I_{(0,∞)}(y + x) dx = ½ e^{−|y|}, y ∈ R. So
        (1/π) ∫ e^{ιuy} / (1 + u²) du = e^{−|y|},   y ∈ R,
proving (2.13).
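As a numerical aside (not from the text), the inversion formula (2.4) applied to φ(t) = e^{−|t|} should return the Cauchy density (2.12); the following sketch uses SciPy quadrature, with the evaluation points chosen arbitrarily.

import numpy as np
from scipy.integrate import quad

# Invert phi(t) = e^{-|t|} through (2.4): f(x) = (1/2pi) int e^{-i t x} e^{-|t|} dt
#                                              = (1/pi) int_0^inf cos(t x) e^{-t} dt,
# which should reproduce the standard Cauchy density 1/(pi(1 + x^2)) of (2.12).
def density_from_cf(x):
    val, _ = quad(lambda t: np.cos(t * x) * np.exp(-t), 0, np.inf)
    return val / np.pi

for x in (0.0, 0.5, 1.0, 3.0):
    print(x, round(density_from_cf(x), 6), round(1.0 / (np.pi * (1 + x**2)), 6))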
10.3 Levy-Cramer continuity theorem
Characteristic functions are very useful in determining distributions, moments, and establishing various identities involving distributions. But by far their most important use is in establishing convergence in distribution. This is the content of a continuity theorem established by Paul Levy and H. Cramer. It says that the map ψ taking a distribution F to its characteristic function φ is a homeomorphism. That is, if Fn −→d F, then φn → φ and conversely. Here, the notion of convergence of φn to φ is that of uniform convergence on bounded intervals. The following result deals with the 'if' part.
Theorem 10.3.1: Let Fn, n ≥ 1, and F be cdfs with characteristic functions φn, n ≥ 1, and φ, respectively. Let Fn −→d F. Then, for each 0 < K < ∞,
        sup_{|t|≤K} |φn(t) − φ(t)| → 0   as n → ∞.
That is, φn converges to φ uniformly on bounded intervals.
Proof: By Skorohod's theorem, there exist random variables Xn, X defined on the Lebesgue space ([0, 1], B([0, 1]), m), where m(·) is the Lebesgue measure, such that Xn ∼ Fn, X ∼ F, and Xn → X w.p. 1. Now, for any t ∈ R,
        |φn(t) − φ(t)| = |E(e^{ιtXn} − e^{ιtX})|
                       ≤ E|1 − e^{ιt(X−Xn)}|
                       ≤ E(|1 − e^{ιt(X−Xn)}| I(|X − Xn| ≤ ε)) + 2P(|Xn − X| > ε).
Hence,
        sup_{|t|≤K} |φn(t) − φ(t)| ≤ sup_{|u|≤Kε} |1 − e^{ιu}| + 2P(|Xn − X| > ε).
Given K and δ > 0, choose ε ∈ (0, ∞) small such that
        sup_{|u|≤Kε} |1 − e^{ιu}| < δ.
Since for all ε > 0, P(|Xn − X| > ε) → 0 as n → ∞, it follows that
        lim_{n→∞} sup_{|t|≤K} |φn(t) − φ(t)| = 0.   □
The Levy-Cramer theorem is a converse to the above theorem. That is,
if φn → φ uniformly on bounded intervals, then Fn −→d F . Actually, it is
a stronger result than this converse. It says that it is enough to know that
φn converges pointwise to a limit φ that is continuous at 0. Then φ is the
characteristic function of some distribution F and Fn −→d F . The key to
this is that under the given hypotheses, the family {Fn }n≥1 is tight.
The next result relates the tail behavior of a probability measure to the
behavior of its characteristic function near the origin, which in turn will be
used to establish the tightness of {Fn }n≥1 .
Lemma 10.3.2: Let µ be a probability measure on R with characteristic function φ. Then, for each δ > 0,
        µ({x : |x|δ ≥ 2}) ≤ (1/δ) ∫_{−δ}^{δ} (1 − φ(u)) du.
Proof: Fix δ ∈ (0, ∞). Then, using Fubini's theorem and the fact that 1 − (sin x)/x ≥ 0 for all x, one gets
        ∫_{−δ}^{δ} (1 − φ(u)) du = ∫ [ ∫_{−δ}^{δ} (1 − e^{ιux}) du ] µ(dx)
                                 = ∫ [ 2δ − (2 sin δx)/x ] µ(dx)
                                 = 2δ ∫ [ 1 − (sin δx)/(δx) ] µ(dx)
                                 ≥ 2δ ∫_{{x : |xδ| ≥ 2}} ( 1 − 1/|xδ| ) µ(dx)
                                 ≥ δ µ({x : |x|δ ≥ 2}).   □
Lemma 10.3.3: Let {µn}n≥1 be a sequence of probability measures with characteristic functions {φn}n≥1. Let lim_{n→∞} φn(t) ≡ φ(t) exist for |t| ≤ δ0 for some δ0 > 0. Let φ(·) be continuous at 0. Then {µn}n≥1 is tight.
Proof: For any 0 < δ < δ0, by the BCT,
        (1/δ) ∫_{−δ}^{δ} (1 − φn(t)) dt → (1/δ) ∫_{−δ}^{δ} (1 − φ(t)) dt.
Also, by continuity of φ at 0,
        (1/δ) ∫_{−δ}^{δ} (1 − φ(t)) dt → 0   as   δ → 0.
Thus, given ε > 0, there exist a δε ∈ (0, δ0) and an Mε ∈ (0, ∞) such that for all n ≥ Mε,
        (1/δε) ∫_{−δε}^{δε} (1 − φn(t)) dt < ε.
By Lemma 10.3.2, this implies that for all n ≥ Mε,
        µn({x : |x| ≥ 2/δε}) < ε.
Now choose Kε > 2/δε such that
        µj({x : |x| ≥ Kε}) < ε   for 1 ≤ j ≤ Mε.
Then,
        sup_{n≥1} µn({x : |x| ≥ Kε}) < ε
and hence, {µn}n≥1 is tight.   □
Theorem 10.3.4: (Levy-Cramer continuity theorem). Let {µn}n≥1 be a sequence of probability measures on (R, B(R)) with characteristic functions {φn}n≥1. Let lim_{n→∞} φn(t) ≡ φ(t) exist for all t ∈ R and let φ be continuous at 0. Then φ is the characteristic function of a probability measure µ and µn −→d µ.
Proof: By Lemma 10.3.3, {µn}n≥1 is tight. Let {µnj}j≥1 be any subsequence of {µn}n≥1 that converges vaguely to a limit µ. By tightness, µ is a probability measure and, by Theorem 10.3.1, lim_{j→∞} φnj(t) is the characteristic function of µ. That is, φ is the characteristic function of µ. Since φ determines µ uniquely, all vague limit points of {µn}n≥1 coincide with µ and hence, by Theorem 9.2.6, µn −→d µ.   □
This theorem will be used extensively in the next chapter on central limit
theorems. For the moment, some easy applications are given.
Example 10.3.1: (Convergence of Binomial to Poisson). Let {Xn}n≥1 be a sequence of random variables such that Xn ∼ Binomial(Nn, pn) for all n ≥ 1. Suppose that as n → ∞, Nn → ∞, pn → 0 and Nn pn → λ, λ ∈ (0, ∞). Then
        Xn −→d X   where X ∼ Poisson(λ).                  (3.1)
To prove (3.1), note that the characteristic function φn of Xn is
        φn(t) = (pn e^{ιt} + 1 − pn)^{Nn} = (1 + pn(e^{ιt} − 1))^{Nn} = (1 + (Nn pn / Nn)(e^{ιt} − 1))^{Nn},   t ∈ R.
Next recall the fact that if {zn}n≥1 is a sequence of complex numbers such that lim_{n→∞} zn = z exists, then
        (1 + n⁻¹ zn)^n → e^z   as n → ∞.                  (3.2)
So φn(t) → e^{λ(e^{ιt} − 1)} for all t ∈ R. Since φ(t) ≡ e^{λ(e^{ιt} − 1)}, t ∈ R, is the characteristic function of a Poisson (λ) random variable, (3.1) follows.
A direct proof of (3.1) consists of showing that for each j = 0, 1, 2, . . .,
        P(Xn = j) ≡ (Nn choose j) pn^j (1 − pn)^{Nn − j} → P(X = j) = e^{−λ} λ^j / j!.
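The convergence of the characteristic functions can also be observed numerically; the following Python sketch (an aside, with λ = 2 and the grid of t values chosen arbitrarily) compares the Binomial and Poisson characteristic functions when Np = λ is held fixed.

import numpy as np

# Compare the Binomial(N, p) characteristic function with its Poisson(lambda) limit when
# N p = lambda is held fixed (lambda = 2 is an arbitrary choice), as in Example 10.3.1.
lam = 2.0
t = np.linspace(-10.0, 10.0, 9)

def phi_binomial(t, N, p):
    return (p * np.exp(1j * t) + 1.0 - p) ** N

phi_poisson = np.exp(lam * (np.exp(1j * t) - 1.0))

# The maximal discrepancy over this grid of t values shrinks as N grows.
for N in (10, 100, 10_000):
    print(N, float(np.max(np.abs(phi_binomial(t, N, lam / N) - phi_poisson))))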
Example 10.3.2: (Convergence of Binomial to Normal). Let Xn ∼ Binomial(Nn, pn) for all n ≥ 1. Suppose that as n → ∞, Nn → ∞ and sn² ≡ Nn pn(1 − pn) → ∞. Then
        Zn ≡ (Xn − Nn pn)/sn −→d N(0, 1).                 (3.3)
To prove (3.3), note that the characteristic function φn of Zn is
        φn(t) = [pn exp(ιt(1 − pn)/sn) + (1 − pn) exp(−ιtpn/sn)]^{Nn} ≡ (1 + zn(t)/Nn)^{Nn}, say,
where zn(t) = Nn[pn e^{ιt(1−pn)/sn} + (1 − pn) e^{−ιtpn/sn} − 1]. By (3.2), it suffices to show that for all t ∈ R,
        zn(t) → −t²/2   as   n → ∞.
By Lemma 10.1.5, for any x real,
        |e^{ιx} − 1 − ιx − (ιx)²/2| ≤ |x|³/3!.            (3.4)
Since sn → ∞, for any t ∈ R, with pn(t) ≡ tpn/sn and qn(t) ≡ t(1 − pn)/sn, one has
        zn(t) = Nn[pn(exp(ιt(1 − pn)/sn) + (1 − pn) exp(−ιtpn/sn) − 1]
              = Nn[pn(e^{ιqn(t)} − 1 − ιqn(t)) + (1 − pn)(e^{−ιpn(t)} − 1 + ιpn(t))]
              = Nn[(pn/2)(ιqn(t))² + ((1 − pn)/2)(ιpn(t))²] + Nn O(pn(1 − pn)|t|³/sn³)
              = −t²/2 + o(1)   as n → ∞.
This is known as the DeMoivre-Laplace CLT in the case Nn = n, pn = p, 0 < p < 1. The original proof was based on Stirling's approximation.
Example 10.3.3: (Convergence of Poisson to Normal). Let {Xn}n≥1 be a sequence of random variables such that for n ≥ 1, Xn ∼ Poisson(λn), λn ∈ (0, ∞). Let Yn = (Xn − λn)/√λn, n ≥ 1. If λn → ∞ as n → ∞, then
        Yn −→d N(0, 1).                                   (3.5)
To prove (3.5), note that the characteristic function φn of Yn is
        φn(t) = exp(−ιt√λn) exp(λn(exp(ιt/√λn) − 1)) = exp(λn(exp(ιt/√λn) − 1 − ιt/√λn)),
t ∈ R. Now using (3.4) again it is easy to show that for each t ∈ R,
        λn(exp(ιt/√λn) − 1 − ιt/√λn) → −t²/2   as n → ∞.
Hence, (3.5) follows.
10.4 Extension to Rk
Definition 10.4.1:
(a) Let X = (X1, . . . , Xk) be a k-dimensional random vector (k ∈ N). The characteristic function of X is defined as
        φX(t) = E exp(ιt · X) = E exp(ι Σ_{j=1}^k tj Xj),    (4.1)
t = (t1, . . . , tk) ∈ R^k, where t · x = Σ_{j=1}^k tj xj denotes the inner product of the two vectors t = (t1, . . . , tk), x = (x1, . . . , xk) ∈ R^k.
(b) For a probability measure µ on (R^k, B(R^k)), its characteristic function is defined as
        φ(t) = ∫_{R^k} exp(ιt · x) µ(dx).                 (4.2)
Note that for a linear combination L ≡ a1 X1 +· · ·+ak Xk , a1 , . . . , ak ∈ R,
of a set of random variables X1 , . . . , Xk , all defined on a common probability space, the characteristic functions of L and X = (X1 , . . . , Xk ) are
10.4 Extension to Rk
333
related by the identity
φL (λ)
=
-
E exp ιλ
k
)
aj Xj
j=1
= φX (λa),
.
(4.3)
λ ∈ R,
where a = (a1 , . . . , ak ). Thus, the characteristic function of a random vector X = (X1 , . . . , Xk ) is determined by the characteristic functions of all
its linear combinations and vice versa. It will now be shown that as in
the one-dimensional case, the characteristic function of a random vector X
uniquely determines its probability distribution. The following is a multivariate version of Theorem 10.2.6.
Theorem 10.4.1: Let X = (X1, . . . , Xk) be a random vector with characteristic function φX(·) and let A = (a1, b1] × ··· × (ak, bk] be a rectangle in R^k with −∞ < ai < bi < ∞ for all i = 1, . . . , k. If P(X ∈ ∂A) = 0, then

   P(X ∈ A) = lim_{T→∞} (2π)^{−k} ∫_{−T}^{T} ··· ∫_{−T}^{T} [∏_{j=1}^{k} hj(tj)] φX(t1, . . . , tk) dt1 . . . dtk,   (4.4)

where ∂A denotes the boundary of A and where hj(tj) ≡ [exp(−ιtj aj) − exp(−ιtj bj)](ιtj)^{−1} for tj ≠ 0 and hj(0) = bj − aj, 1 ≤ j ≤ k.
Proof: Consider the product space Ω = [−T, T]^k × R^k with the corresponding Borel σ-algebra F = B([−T, T]^k) × B(R^k) and the product measure µ = µ1 × µ2, where µ1 is the Lebesgue measure on ([−T, T]^k, B([−T, T]^k)) and µ2 is the probability distribution of X on (R^k, B(R^k)). Since the function

   f(t, x) ≡ ∏_{j=1}^{k} hj(tj) exp(ιt·x),   (t, x) ∈ Ω,

is integrable w.r.t. the product measure µ, by Fubini's theorem,

   I_T ≡ ∫_{−T}^{T} ··· ∫_{−T}^{T} [∏_{j=1}^{k} hj(tj)] φX(t1, . . . , tk) dt1 . . . dtk
       = ∫_{R^k} ∫_{−T}^{T} ··· ∫_{−T}^{T} ∏_{j=1}^{k} {hj(tj) exp(ιtj xj)} dt1 . . . dtk µ2(dx)
       = ∫_{R^k} ∏_{j=1}^{k} { ∫_{−T}^{T} [exp(ιtj(xj − aj)) − exp(ιtj(xj − bj))]/(ιtj) dtj } µ2(dx)
       = ∫_{R^k} ∏_{j=1}^{k} { 2 ∫_{0}^{T} sin(tj(xj − aj))/tj dtj − 2 ∫_{0}^{T} sin(tj(xj − bj))/tj dtj } µ2(dx),   (4.5)

using (1.3) and the fact that (sin θ)/θ and (cos θ)/θ are, respectively, even and odd functions of θ. It can be shown (Problem 10.8) that

   lim_{T→∞} ∫_{0}^{T} (sin t)/t dt = π/2.                            (4.6)

Hence, by the change of variables theorem, it follows that, for any c ∈ R,

   lim_{T→∞} ∫_{0}^{T} sin(tc)/t dt = 0 if c = 0;  π/2 if c > 0;  −π/2 if c < 0,   (4.7)

and

   sup_{T>0, c∈R} |∫_{0}^{T} sin(tc)/t dt| = sup_{T>0} |∫_{0}^{T} (sin u)/u du| ≡ K < ∞.   (4.8)

This implies that as T → ∞, the integrand in (4.5) converges to the function ∏_{j=1}^{k} gj(xj) for each x ∈ R^k, where

   gj(y) = π if y ∈ {aj, bj};  2π if y ∈ (aj, bj);  0 if y ∈ (−∞, aj) ∪ (bj, ∞).   (4.9)

Hence, by (4.5), (4.8), (4.9), and the BCT,

   lim_{T→∞} I_T = ∫_{R^k} ∏_{j=1}^{k} gj(xj) µ2(dx).

By the boundary condition P(X ∈ ∂A) = 0, the right side above equals (2π)^k P(X ∈ (a1, b1) × ··· × (ak, bk)), proving the theorem. □
Remark 10.4.1: The inversion formula (2.3) can also be extended to the
multivariate case.
Corollary 10.4.2: A probability measure on (R^k, B(R^k)) is uniquely determined by its characteristic function.

Proof: Let µ and ν be probability measures on (R^k, B(R^k)) with the same characteristic function φ(·), i.e.,

   φ(t) = ∫ exp(ιt·x) µ(dx) = ∫ exp(ιt·x) ν(dx),   t ∈ R^k.

Let A = {A : A = (a1, b1] × ··· × (ak, bk], −∞ < ai < bi < ∞, i = 1, . . . , k, µ(∂A) = 0 = ν(∂A)}. It is easy to verify that A is a π-class. Since there are only countably many rectangles (a1, b1] × ··· × (ak, bk] with µ(∂A) + ν(∂A) ≠ 0, A generates B(R^k). But, by Theorem 10.4.1,

   µ(A) = lim_{T→∞} (2π)^{−k} ∫_{−T}^{T} ··· ∫_{−T}^{T} [∏_{j=1}^{k} hj(tj)] φ(t1, . . . , tk) dt1 . . . dtk = ν(A)

for all A ∈ A. Hence, by Theorem 1.2.4, µ(B) = ν(B) for all B ∈ B(R^k), i.e., µ = ν. □
Corollary 10.4.3: A probability measure µ on (R^k, B(R^k)) is determined by its values assigned to the collection of half-spaces H ≡ {H : H = {x ∈ R^k : a·x ≤ c}, a ∈ R^k, c ∈ R}.

Proof: Let X be the identity mapping on R^k. Then, for any H = {x ∈ R^k : a·x ≤ c}, {X ∈ H} = {a·X ≤ c}. Thus, the values {µ(H) : H ∈ H} determine the probability distributions (and hence, the characteristic functions) of all linear combinations of X. Consequently, by (4.3), they determine the characteristic function of X. By Corollary 10.4.2, this determines µ uniquely. □
Theorem 10.4.4: Let {Xn}n≥1, X be k-dimensional random vectors. Then Xn →d X iff

   φXn(t) → φX(t)   for all t ∈ R^k.                                  (4.10)

Proof: Suppose that Xn →d X. Then, (4.10) follows from the continuous mapping theorem for weak convergence (cf. Theorem 9.4.2). Conversely, suppose (4.10) holds. Let Xn^{(j)} and X^{(j)} denote the jth components of Xn and X, respectively, j = 1, . . . , k. By (4.10), for any j ∈ {1, . . . , k},

   lim_{n→∞} E exp(ιλXn^{(j)}) = E exp(ιλX^{(j)})   for all λ ∈ R.

Hence, by Theorem 10.3.4,

   Xn^{(j)} →d X^{(j)}   for all j = 1, . . . , k.                     (4.11)

This implies that the sequence of random vectors {Xn}n≥1 is tight (Problem 9.9). Hence, by Theorem 9.3.3, given any subsequence {ni}i≥1, there exist a further subsequence {n'i}i≥1 ⊂ {ni}i≥1 and a random vector X0 such that Xn'_i →d X0 as i → ∞. By the 'only if' part, this implies

   φXn'_i(t) → φX0(t)   as i → ∞

for all t ∈ R^k. Thus, φX0(·) = φX(·) and, by the uniqueness of characteristic functions, X0 =d X. Thus, all convergent subsequences of {Xn}n≥1 have the same limit. By arguments similar to the proof of Theorem 9.2.6, Xn →d X. This completes the proof of the theorem. □
Theorem 10.4.4 shows that, as in the one-dimensional case, the (pointwise) convergence of the characteristic functions of a sequence of k-dimensional random vectors {Xn}n≥1 to a given characteristic function is
equivalent to convergence in distribution of the sequence {Xn }n≥1 . Since
the characteristic function of a random vector is determined by the characteristic functions of all its linear combinations, this suggests that one
may also be able to establish convergence in distribution of a sequence of
random vectors by considering the convergence of the sequences of linear
combinations that are one-dimensional random variables. This is indeed
true as shown by the following result.
Theorem 10.4.5: (Cramer-Wold device). Let {Xn}n≥1 be a sequence of k-dimensional random vectors and let X be a k-dimensional random vector. Then, Xn →d X iff for all a ∈ R^k,

   a·Xn →d a·X.                                                       (4.12)

Proof: Suppose Xn →d X. Then, for any a ∈ R^k, the function h(x) = a·x, x ∈ R^k, is continuous on R^k. Hence, (4.12) follows from Theorem 9.4.2. Conversely, suppose that (4.12) holds for all a ∈ R^k. By (4.3) and Theorem 10.3.1, this implies that as n → ∞,

   φXn(a) = φ_{a·Xn}(1) → φ_{a·X}(1) = φX(a)

for all a ∈ R^k. Hence, by Theorem 10.4.4, Xn →d X. □
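A small simulation in the spirit of the Cramer-Wold device may help fix ideas. The sketch below (Python with NumPy, not part of the original text; the bivariate exponential construction, the sample sizes, and the directions a are all illustrative choices) forms standardized bivariate row sums and checks that each one-dimensional projection a·Xn has approximately the variance a'Σa it would have under the normal limit; a full check would compare the empirical cdf of the projection with the corresponding normal cdf.

    # Sketch of checking multivariate convergence through projections (not from the book).
    import numpy as np

    rng = np.random.default_rng(1)
    reps, n = 5_000, 500
    Z = rng.exponential(size=(reps, n, 2)) - 1.0      # centered Exponential(1) coordinates
    Z[..., 1] = 0.5 * Z[..., 0] + Z[..., 1]           # introduce dependence between coordinates
    Xn = Z.sum(axis=1) / np.sqrt(n)                   # standardized row sums, shape (reps, 2)

    Sigma = np.array([[1.0, 0.5], [0.5, 1.25]])       # covariance matrix of a single summand
    for a in (np.array([1.0, 0.0]), np.array([1.0, -1.0]), np.array([0.3, 0.7])):
        proj = Xn @ a
        print(a, "sample var of a.Xn:", round(float(proj.var()), 3),
              "  a'Sigma a:", float(a @ Sigma @ a))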
Recall that a set of random variables X1, . . . , Xk defined on a common probability space is independent iff the joint cdf of X1, . . . , Xk is the product of the marginal cdfs of the Xi's. A similar characterization of independence can be given in terms of the characteristic functions, as shown by the following result. The proof is left as an exercise (Problem 10.16).

Proposition 10.4.6: Let X1, . . . , Xk (k ∈ N) be a collection of random variables defined on a common probability space. Then, X1, . . . , Xk are independent iff

   φ_{(X1,...,Xk)}(t1, . . . , tk) = ∏_{j=1}^{k} φXj(tj)

for all t1, . . . , tk ∈ R.
10.5 Problems
10.1 Let {Xn }n≥1 and {Yn }n≥1 be two sequences of random variables such
that for each n ≥ 1, Xn and Yn are defined on a common probability
space and Xn and Yn are independent. If Xn −→d X and Yn −→d Y ,
then show that
   Xn + Yn →d X0 + Y0,                                                (5.1)

where X0 =d X, Y0 =d Y (cf. Section 2.2) and X0 and Y0 are independent. Show by an example that (5.1) is false without the independence hypothesis.
10.2 Give an example of a nonlattice discrete distribution F on R supported by only a three point set.
10.3 Let F be an absolutely continuous cdf on R with density f and with
characteristic function φ. Show that if f has a derivative f (1) ∈ L1 (R),
then

   lim_{|t|→∞} |tφ(t)| = 0.

Generalize this result when f is r-times differentiable and the jth derivative f^{(j)} lies in L1(R) for j = 1, . . . , r.
10.4 Let F be a cdf on R with characteristic function φ. Show that for any
a < b, a, b ∈ R,
   lim_{T→∞} (1/2π) ∫_{−T}^{T} [exp(−ιta) − exp(−ιtb)](ιt)^{−1} φ(t) dt
      = µF((a, b)) + (1/2) µF({a, b}),                                 (5.2)

where µF denotes the Lebesgue-Stieltjes measure corresponding to F.
(Hint: Use (4.7) and the arguments in the proof of Theorem 10.4.1.)
10.5 Let φ be a characteristic function of a cdf F and let µF denote the
Lebesgue-Stieltjes measure corresponding to F .
(a) Show that for any a ∈ R and T ∈ (0, ∞),

   ∫_{−T}^{T} exp(−ιta) φ(t) dt = 2T µF({a}) + ∫_{x≠a} [exp(ιT(x − a)) − exp(−ιT(x − a))]/(ι(x − a)) µF(dx).   (5.3)

(b) Conclude from (5.3) that for any a ∈ R,

   F(a) − F(a−) = lim_{T→∞} (1/2T) ∫_{−T}^{T} exp(−ιta) φ(t) dt.       (5.4)
10.6 Let F be a cdf on R with characteristic function φ. If |φ| ∈ L2 (R),
then show that F is continuous.
(Hint: Use Corollary 10.1.7.)
10.7 Let {Fn }n≥1 , F be cdfs with characteristic functions {φn }n≥1 , φ,
respectively. Suppose that Fn −→d F .
(a) Give an example to show that φn may not converge to φ uniformly over all of R.
    (Hint: Try φn(t) ≡ e^{−t²/n}.)

(b) Let {µn}n≥1 and µ denote the Lebesgue-Stieltjes measures corresponding to {Fn}n≥1 and F, respectively. Suppose that {µn}n≥1 and µ are dominated by a σ-finite measure λ on (R, B(R)) with Radon-Nikodym derivatives fn = dµn/dλ, n ≥ 1, and f = dµ/dλ. If fn → f a.e. (λ), then show that

   sup_{t∈R} |φn(t) − φ(t)| → 0   as n → ∞.                            (5.5)
10.8 Let G(x, a) = (1 + a²)^{−1}(1 − e^{−ax}{a sin x + cos x}), x ∈ R, a ∈ R.

   (a) Show that for any a > 0, x0 ≥ 0,

       ∫_{0}^{x0} (sin x) e^{−ax} dx = G(x0, a).                        (5.6)

       (Hint: Compare the derivatives of the left and the right sides of (5.6) w.r.t. x0.)

   (b) Use Fubini's theorem to justify that for all T > 0,

       ∫_{0}^{T} ∫_{0}^{∞} (sin x) e^{−ax} da dx = ∫_{0}^{∞} ∫_{0}^{T} (sin x) e^{−ax} dx da.   (5.7)

   (c) Use (5.6), (5.7) and the identity ∫_{0}^{∞} e^{−ax} da = 1/x for x > 0 to conclude that for any T > 0,

       ∫_{0}^{T} (sin x)/x dx = ∫_{0}^{∞} G(T, a) da.                   (5.8)

   (d) Use the DCT and the fact that ∫_{0}^{∞} (1 + a²)^{−1} da = π/2 to conclude that the limit of the right side of (5.8) exists and equals π/2.
10.9 Let F1 , F2 , and F3 be three cdfs on R. Then show by an example
that F1 ∗ F2 = F1 ∗ F3 does not imply that F2 = F3. Here Fi ∗ Fj
denotes the convolution of Fi and Fj , 1 ≤ i, j ≤ 3.
(Hint: For F1 , consider a cdf whose characteristic function φ has a
bounded support.)
10.10 Let µ be a probability measure on R with characteristic function φ.
Prove that
   (2/π) ∫_{0}^{∞} [1 − Re(φ(t))] t^{−2} dt = ∫ |x| µ(dx).
10.11 Let φ be the characteristic function of a random variable X. If |φ(t)| =
1 = |φ(αt)| for some t ≠ 0 and α ∈ R irrational, then there exists
x0 ∈ R such that P (X = x0 ) = 1.
(Hint: Use Proposition 10.1.1.)
10.12 Show that for any characteristic function φ, {t ∈ R : |φ(t)| = 1} is
either {0} or countably infinite or all of R.
10.13 Let {Xn}n≥1 be a sequence of iid random variables with a nondegenerate distribution F. Suppose that there exist an ∈ (0, ∞) and bn ∈ R such that

   a_n^{−1} (Σ_{j=1}^{n} Xj − bn) →d Z                                 (5.9)

for some nondegenerate random variable Z.

   (a) Show that an → ∞ as n → ∞.
       (Hint: If an → a ∈ R, then E exp(ιtaZ) = lim_{n→∞} E exp(ιt(Σ_{j=1}^{n} Xj − bn)) = 0 for all except countably many t ∈ R, which leads to a contradiction.)

   (b) Show that as n → ∞,

       bn − bn−1 = o(an)   and   an/an−1 → 1.

       (Hint: Use (a) to show that (Σ_{j=1}^{n−1} Xj − bn)/an →d Z and, by (5.9), (Σ_{j=1}^{n−1} Xj − bn−1)/an−1 →d Z.)
10.14 Show that for every T ∈ (0, ∞), there exist two distinct characteristic functions φ1T and φ2T satisfying

   φ1T(t) = φ2T(t)   for all |t| ≤ T.

(Hint: Let φ1(t) = e^{−|t|}, t ∈ R, and for any T ∈ (0, ∞), define an even function φ2T(·) by

   φ2T(t) = φ1(t)                      for 0 ≤ t ≤ T,
          = φ1(T) + (t − T)(−φ1(T))    for T ≤ t < T + 1,
          = 0                          for t ≥ T + 1.

Now use Pólya's criterion.)
10.15 Show that φα(t) = exp(−|t|^α), t ∈ R, is a characteristic function for 0 < α ≤ 2.
10.16 Prove Proposition 10.4.6.
(Hint: The ‘only if’ part follows from (4.2) and Proposition 7.1.3.
The ‘if part’ follows by using the inversion formulas of Theorems
10.4.1 and 10.2.6 and the characterization of independence in terms
of cdfs (Corollary 7.1.2).)
10.17 Let {Xn}n≥0 be a collection of random variables with characteristic functions {φn}n≥0. Suppose that ∫ |φn(t)| dt < ∞ for all n ≥ 0 and that φn(·) → φ0(·) in L1(R) as n → ∞. Show that

   sup_{B∈B(R)} |P(Xn ∈ B) − P(X0 ∈ B)| → 0   as n → ∞.
10.18 Let φ(·) be a characteristic function on R such that φ(t) → 0 as |t| → ∞. Let X be a random variable with φ as its characteristic function. For each n ≥ 1, let Xn = k/n if k/n ≤ X < (k + 1)/n, k = 0, ±1, ±2, . . .. Show that if φn(t) ≡ E(e^{ιtXn}), then φn(t) → φ(t) for each t ∈ R, but for each n ≥ 1,

   sup{|φn(t) − φ(t)| : t ∈ R} = 1.
10.19 Let {δi}i≥1 be iid random variables with distribution

   P(δ1 = 1) = P(δ1 = −1) = 1/2.

Let Xn = Σ_{i=1}^{n} δi/2^i and X = limn→∞ Xn.

   (a) Find the characteristic function of Xn.

   (b) Show that the characteristic function of X is φX(t) ≡ (sin t)/t.
10.20 Let {Xk}k≥1 be iid random variables with pdf f(x) = (1/2) e^{−|x|}, x ∈ R. Show that Σ_{k=1}^{∞} (1/k) Xk converges w.p. 1 and compute its characteristic function.
(Hint: Note that the characteristic function of the standard Cauchy (0,1) distribution is e^{−|t|}.)
10.21 Establish an extension of formula (2.3) to the multivariate case.
11
Central Limit Theorems
11.1 Lindeberg-Feller theorems
The central limit theorem (CLT) is one of the oldest and most useful results
in probability theory. Empirical findings in applied sciences, dating back to
the 17th century, showed that the averages of laboratory measurements on
various physical quantities tended to have a bell-shaped distribution. The
CLT provides a theoretical justification for this observation. Roughly speaking, it says that under some mild conditions, the average of a large number
of iid random variables is approximately normally distributed. A version
of this result for 0–1 valued random variables was proved by DeMoivre
and Laplace in the early 18th century. An extension of this result to the
averages of iid random variables with a finite second moment was done in
the early 20th century. In this section, a more general set up is considered,
namely, that of the limit behavior of the row sums of a triangular array of
independent random variables.
Definition 11.1.1: For each n ≥ 1, let {Xn1 , . . . , Xnrn } be a collection
of random variables defined on a probability space (Ωn , Fn , Pn ) such that
Xn1 , . . . , Xnrn are independent. Then, {Xnj : 1 ≤ j ≤ rn }n≥1 is called a
triangular array of independent random variables.
Let {Xnj : 1 ≤ j ≤ rn}n≥1 be a triangular array of independent random variables. Define the row sums

   Sn = Σ_{j=1}^{rn} Xnj,   n ≥ 1.                                     (1.1)

Suppose that EX²nj < ∞ for all j, n. Write s²n = Var(Sn) = Σ_{j=1}^{rn} Var(Xnj), n ≥ 1. The following condition, introduced by Lindeberg, plays an important role in establishing convergence of (Sn − ESn)/sn to a standard normal random variable in distribution.

Definition 11.1.2: Let {Xnj : 1 ≤ j ≤ rn}n≥1 be a triangular array of independent random variables such that

   EXnj = 0,  0 < EX²nj ≡ σ²nj < ∞   for all 1 ≤ j ≤ rn, n ≥ 1.        (1.2)

Then, {Xnj : 1 ≤ j ≤ rn}n≥1 is said to satisfy the Lindeberg condition if for every ε > 0,

   lim_{n→∞} s_n^{−2} Σ_{j=1}^{rn} EX²nj I(|Xnj| > ε sn) = 0,          (1.3)

where s²n = Σ_{j=1}^{rn} σ²nj, n ≥ 1.
Example 11.1.1: Let {Xn}n≥1 be a sequence of iid random variables with EX1 = µ and Var(X1) = σ² ∈ (0, ∞). Consider the centered and scaled sample mean

   Tn = √n (X̄n − µ)/σ,   n ≥ 1,                                        (1.4)

where X̄n = n^{−1} Σ_{j=1}^{n} Xj. Note that Tn can be written as the row sum of a triangular array of independent random variables:

   Tn = Σ_{j=1}^{n} Xnj,                                               (1.5)

where Xnj = (Xj − µ)/(σ√n), 1 ≤ j ≤ n, n ≥ 1. Clearly, {Xnj : 1 ≤ j ≤ n}n≥1 satisfies (1.2) with σ²nj = EX²nj = Var(X1)/(nσ²) = 1/n for all 1 ≤ j ≤ n, and hence s²n = Σ_{j=1}^{n} σ²nj = 1 for all n ≥ 1. Now, for any ε > 0,

   s_n^{−2} Σ_{j=1}^{n} EX²nj I(|Xnj| > ε sn)
      = Σ_{j=1}^{n} E[ ((Xj − µ)/(σ√n))² I(|Xj − µ|/(σ√n) > ε) ]
      = n · (σ²n)^{−1} E[(X1 − µ)² I(|X1 − µ| > εσ√n)]
      = σ^{−2} E[(X1 − µ)² I(|X1 − µ| > εσ√n)] → 0   as n → ∞,

by the DCT, since E(X1 − µ)² < ∞. Thus, the triangular array {Xnj : 1 ≤ j ≤ n} of (1.5) satisfies the Lindeberg condition (1.3).
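The Lindeberg quantity computed in this example can be evaluated numerically for a concrete distribution. The sketch below (Python with NumPy, not part of the original text; Exponential(1) summands, the Monte Carlo sample size, and ε = 0.1 are illustrative choices) estimates σ^{−2} E[(X1 − µ)² I(|X1 − µ| > εσ√n)] for several n and shows it tending to 0.

    # Numerical look at the Lindeberg term of Example 11.1.1 (not from the book).
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.exponential(size=1_000_000)      # Monte Carlo sample of X1; here mu = sigma = 1
    mu, sigma, eps = 1.0, 1.0, 0.1
    d = x - mu
    for n in (10, 100, 1_000, 10_000):
        val = np.mean(d**2 * (np.abs(d) > eps * sigma * np.sqrt(n))) / sigma**2
        print(f"n = {n:6d}: Lindeberg term ~ {val:.5f}")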
The main result of this section is the following CLT for scaled row sums of a triangular array of independent random variables.

Theorem 11.1.1: (Lindeberg CLT). Let {Xnj : 1 ≤ j ≤ rn}n≥1 be a triangular array of independent random variables satisfying (1.2) and the Lindeberg condition (1.3). Then,

   Sn/sn →d N(0, 1),                                                   (1.6)

where Sn = Σ_{j=1}^{rn} Xnj and s²n = Var(Sn) = Σ_{j=1}^{rn} σ²nj.

As a direct consequence of Theorem 11.1.1 and Example 11.1.1, one gets the more familiar version of the CLT for the sample mean of iid random variables.

Corollary 11.1.2: (CLT for iid random variables). Let {Xn}n≥1 be a sequence of iid random variables with EX1 = µ and Var(X1) = σ² ∈ (0, ∞). Then,

   √n (X̄n − µ) →d N(0, σ²),                                            (1.7)

where X̄n = n^{−1} Σ_{j=1}^{n} Xj, n ≥ 1.
For proving the theorem, the following simple inequality will be used.

Lemma 11.1.3: For any m ∈ N and for any complex numbers z1, . . . , zm, ω1, . . . , ωm with |zj| ≤ 1, |ωj| ≤ 1 for all j = 1, . . . , m,

   |∏_{j=1}^{m} zj − ∏_{j=1}^{m} ωj| ≤ Σ_{j=1}^{m} |zj − ωj|.          (1.8)

Proof: Inequality (1.8) follows from the identity

   ∏_{j=1}^{m} zj − ∏_{j=1}^{m} ωj
      = (∏_{j=1}^{m} zj − (∏_{j=1}^{m−1} zj) ωm)
        + ((∏_{j=1}^{m−1} zj) ωm − (∏_{j=1}^{m−2} zj) ωm−1 ωm)
        + ··· + (z1 ∏_{j=2}^{m} ωj − ∏_{j=1}^{m} ωj).  □
Proof of Theorem 11.1.1: W.l.o.g., suppose that s²n = 1 for all n ≥ 1. (Otherwise, setting X̃nj ≡ Xnj/sn, 1 ≤ j ≤ rn, n ≥ 1, it is easy to check that for the triangular array {X̃nj : 1 ≤ j ≤ rn}n≥1 the variance of the nth row sum s̃²n ≡ Σ_{j=1}^{rn} Var(X̃nj) = 1 for all n ≥ 1, the Lindeberg condition holds, and s̃_n^{−1} Σ_{j=1}^{rn} X̃nj →d N(0, 1) iff (1.6) holds.) Then, by Theorem 10.3.4, it is enough to show that

   lim_{n→∞} E exp(ιtSn) = e^{−t²/2}   for all t ∈ R.                  (1.9)

For any ε > 0,

   ∆n ≡ max{EX²nj : 1 ≤ j ≤ rn}
      ≤ max{EX²nj I(|Xnj| > ε) + EX²nj I(|Xnj| ≤ ε) : 1 ≤ j ≤ rn}
      ≤ Σ_{j=1}^{rn} EX²nj I(|Xnj| > ε) + ε²
      = o(1) + ε²   as n → ∞, by the Lindeberg condition (1.3).

Hence,

   ∆n → 0   as n → ∞.                                                  (1.10)

Fix t ∈ R. Let φnj(·) denote the characteristic function of Xnj, 1 ≤ j ≤ rn, n ≥ 1. Note that by (1.10), there exists n0 ∈ N such that for all n ≥ n0, I1n ≡ max{|1 − t²σ²nj/2| : 1 ≤ j ≤ rn} ≤ 1. Next, noting that s²n = Σ_{j=1}^{rn} σ²nj = 1, by Lemma 11.1.3, for all n ≥ n0,

   |E exp(ιtSn) − e^{−t²/2}|
      ≤ |∏_{j=1}^{rn} φnj(t) − ∏_{j=1}^{rn} (1 − t²σ²nj/2)|
        + |∏_{j=1}^{rn} (1 − t²σ²nj/2) − ∏_{j=1}^{rn} exp(−t²σ²nj/2)|
      ≤ Σ_{j=1}^{rn} |φnj(t) − (1 − t²σ²nj/2)| + Σ_{j=1}^{rn} |exp(−t²σ²nj/2) − (1 − t²σ²nj/2)|
      ≡ I2n + I3n,  say.                                               (1.11)

It will now be shown that lim_{n→∞} Ikn = 0 for k = 2, 3.

First consider I2n. Since |exp(ιx) − [1 + ιx + (ιx)²/2]| ≤ min{|x|³/3!, |x|²} for all x ∈ R (cf. Lemma 10.1.5) and EXnj = 0 for all 1 ≤ j ≤ rn, for any ε > 0, by the Lindeberg condition, one gets

   I2n ≡ Σ_{j=1}^{rn} |φnj(t) − (1 − t²σ²nj/2)|
       = Σ_{j=1}^{rn} |E exp(ιtXnj) − [1 + ιtEXnj + (ιt)² EX²nj/2!]|
       ≤ Σ_{j=1}^{rn} E min{|tXnj|³/3!, |tXnj|²}
       ≤ Σ_{j=1}^{rn} E|tXnj|³ I(|Xnj| ≤ ε) + Σ_{j=1}^{rn} E(tXnj)² I(|Xnj| > ε)
       ≤ (|t|³ε/6) Σ_{j=1}^{rn} EX²nj + t² Σ_{j=1}^{rn} EX²nj I(|Xnj| > ε)
       = |t|³ε/6 + t² · o(1)   as n → ∞.                               (1.12)

Since ε ∈ (0, ∞) is arbitrary, I2n → 0 as n → ∞.

Next, consider I3n. Note that for any x ∈ R,

   |e^x − 1 − x| = |Σ_{k=2}^{∞} x^k/k!| ≤ x² Σ_{k=2}^{∞} |x|^{k−2}/k! ≤ x² e^{|x|}.

Hence, using (1.10) and the fact that s²n = 1, one gets

   I3n ≤ Σ_{j=1}^{rn} (t²σ²nj/2)² exp(t²σ²nj/2)
       ≤ (t⁴/4) exp(t²∆n/2) Σ_{j=1}^{rn} σ²nj · ∆n
       = (t⁴/4) exp(t²∆n/2) ∆n → 0   as n → ∞.                         (1.13)

Now (1.9) follows from (1.11), (1.12), and (1.13). This completes the proof of the theorem. □
Oftentimes, verification of the Lindeberg condition (1.3) becomes difficult, as one has to find the truncated second moments of the Xnj's. A simpler sufficient condition for the CLT is provided by Lyapounov's condition.

Definition 11.1.3: A triangular array {Xnj : 1 ≤ j ≤ rn}n≥1 of independent random variables satisfying (1.2) is said to satisfy Lyapounov's condition if there exists a δ ∈ (0, ∞) such that

   lim_{n→∞} s_n^{−(2+δ)} Σ_{j=1}^{rn} E|Xnj|^{2+δ} = 0,               (1.14)

where s²n = Σ_{j=1}^{rn} EX²nj.

Note that by Markov's inequality, if a triangular array {Xnj : 1 ≤ j ≤ rn}n≥1 satisfies Lyapounov's condition (1.14), then for any ε ∈ (0, ∞),

   s_n^{−2} Σ_{j=1}^{rn} EX²nj I(|Xnj| > εsn)
      ≤ s_n^{−2} Σ_{j=1}^{rn} E|Xnj|² (|Xnj|/(εsn))^δ → 0   as n → ∞.

Thus, {Xnj : 1 ≤ j ≤ rn}n≥1 satisfies the Lindeberg condition (1.3). This observation leads to the following result.

Corollary 11.1.4: (Lyapounov's CLT). Let {Xnj : 1 ≤ j ≤ rn}n≥1 be a triangular array of independent random variables satisfying (1.2) and Lyapounov's condition (1.14). Then, (1.6) holds, i.e.,

   Sn/sn →d N(0, 1).
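For the iid array of Example 11.1.1, Lyapounov's ratio (1.14) with δ = 1 works out to E|X1 − µ|³/(σ³√n). The short computation below (Python with NumPy, not part of the original text; Exponential(1) summands and the Monte Carlo sample size are illustrative choices) estimates the third absolute moment and tabulates the ratio as n grows.

    # Lyapounov's ratio (delta = 1) for iid summands; an illustrative computation (not from the book).
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.exponential(size=1_000_000)          # X1 ~ Exponential(1): mu = sigma = 1
    rho = np.mean(np.abs(x - 1.0) ** 3)          # ~ E|X1 - mu|^3
    for n in (10, 100, 1_000, 10_000):
        print(f"n = {n:6d}: Lyapounov ratio ~ {rho / np.sqrt(n):.5f}")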
It is clear that Lyapounov's condition is only a sufficient but not a necessary condition for the validity of the CLT. In contrast, under some regularity conditions on the triangular array {Xnj : 1 ≤ j ≤ rn}n≥1, which essentially say that the individual random variables Xnj are 'uniformly small', the Lindeberg condition is also a necessary condition for the CLT. This converse is due to W. Feller.

Theorem 11.1.5: (Feller's theorem). Let {Xnj : 1 ≤ j ≤ rn}n≥1 be a triangular array of independent random variables satisfying (1.2) such that for any ε > 0,

   lim_{n→∞} max_{1≤j≤rn} P(|Xnj| > εsn) = 0,                          (1.15)

where s²n = Σ_{j=1}^{rn} EX²nj. Let Sn = Σ_{j=1}^{rn} Xnj. If, in addition,

   Sn/sn →d N(0, 1),                                                   (1.16)

then {Xnj : 1 ≤ j ≤ rn}n≥1 satisfies the Lindeberg condition.

A triangular array {Xnj : 1 ≤ j ≤ rn}n≥1 satisfying (1.15) is called a null array. Thus, the converse of Theorem 11.1.1 holds for null arrays. It may be noted that there exist non-null arrays for which (1.16) holds but the Lindeberg condition fails (Problem 11.9).
Proof: W.l.o.g., suppose that s²n = 1 for all n ≥ 1. Next fix ε ∈ (0, ∞). Then, setting t0 = 4/ε, and noting that 1 = Σ_{j=1}^{rn} EX²nj and cos x ≥ 1 − x²/2 for all x ∈ R, one gets

   Σ_{j=1}^{rn} (E cos t0Xnj − 1) + t0²/2
      = Σ_{j=1}^{rn} E(t0²X²nj/2 − 1 + cos t0Xnj)
      ≥ Σ_{j=1}^{rn} E[(t0²X²nj/2 − 1 + cos t0Xnj) I(|Xnj| > ε)]
      ≥ Σ_{j=1}^{rn} E[(t0²X²nj/2 − 2) I(|Xnj| > ε)]
      ≥ (t0²/2 − 2/ε²) Σ_{j=1}^{rn} EX²nj I(|Xnj| > ε)
      = (6/ε²) Σ_{j=1}^{rn} EX²nj I(|Xnj| > ε).

Hence, the Lindeberg condition would hold if it is shown that for all t ∈ R,

   Σ_{j=1}^{rn} (E cos tXnj − 1) + t²/2 → 0   as n → ∞
   ⇔ exp(Σ_{j=1}^{rn} (E cos tXnj − 1)) → e^{−t²/2}   as n → ∞.        (1.17)

Let φnj(t) = E exp(ιtXnj), t ∈ R, denote the characteristic function of Xnj, 1 ≤ j ≤ rn, n ≥ 1. Note that E cos tXnj = Re(φnj(t)), where recall that for any complex number z, Re(z) denotes the real part of z, i.e., Re(z) = a if z = a + ιb, a, b ∈ R. Since the function h(z) = |z| is continuous on C and |exp(φnj(t) − 1)| = exp(E cos tXnj − 1), it follows that (1.17) holds if, for all t ∈ R,

   exp(Σ_{j=1}^{rn} (φnj(t) − 1)) → e^{−t²/2}   as n → ∞.

However, by (1.16), E exp(ιtSn) = ∏_{j=1}^{rn} φnj(t) → e^{−t²/2} for all t ∈ R. Hence, it is enough to show that

   I1n(t) ≡ |exp(Σ_{j=1}^{rn} [φnj(t) − 1]) − ∏_{j=1}^{rn} φnj(t)| → 0   as n → ∞   (1.18)

for all t ∈ R.

Note that for any ε ∈ (0, ∞), by (1.15) and the inequality |e^{ιx} − 1| ≤ min{2, |x|} for all x ∈ R, one has

   |φnj(t) − 1| = |E(exp(ιtXnj) − 1)| ≤ E min{|tXnj|, 2} ≤ 2P(|Xnj| > ε) + |t|ε

uniformly in j = 1, . . . , rn. Hence, letting n → ∞ and then ε ↓ 0, by (1.15), one gets

   I2n(t) ≡ max_{1≤j≤rn} |φnj(t) − 1| = o(1)   as n → ∞                (1.19)

for all t ∈ R. Further, by the inequality |e^{ιx} − 1 − ιx| ≤ |x|²/2, x ∈ R,

   Σ_{j=1}^{rn} |φnj(t) − 1| = Σ_{j=1}^{rn} |E exp(ιtXnj) − 1 − E(ιtXnj)|
      ≤ (t²/2) Σ_{j=1}^{rn} EX²nj = t² s²n/2 = t²/2                     (1.20)

uniformly in t ∈ R, n ≥ 1. Now fix t ∈ R. Then, by (1.19), there exists n0 ∈ N such that for all n ≥ n0, max_{1≤j≤rn} |φnj(t) − 1| ≤ 1. Hence, by the arguments in the proof of Lemma 11.1.3, and by the inequalities |e^z| ≤ Σ_{k=0}^{∞} |z|^k/k! = e^{|z|} and |e^z − 1 − z| = |Σ_{k=2}^{∞} z^k/k!| ≤ |z|² exp(|z|), z ∈ C, for all n ≥ n0, one has

   I1n(t) = |∏_{j=1}^{rn} exp(φnj(t) − 1) − ∏_{j=1}^{rn} φnj(t)|
      ≤ Σ_{j=1}^{rn} |exp(φnj(t) − 1) − φnj(t)| · ∏_{k≠j} |exp(φnk(t) − 1)|
      ≤ Σ_{j=1}^{rn} |exp(φnj(t) − 1) − φnj(t)| · exp(Σ_{j=1}^{rn} |φnj(t) − 1|)
      ≤ Σ_{j=1}^{rn} |exp(φnj(t) − 1) − 1 − (φnj(t) − 1)| · exp(t²/2)
      ≤ Σ_{j=1}^{rn} |φnj(t) − 1|² · exp(1 + t²/2)
      ≤ (max_{1≤j≤rn} |φnj(t) − 1|) (Σ_{j=1}^{rn} |φnj(t) − 1|) exp(1 + t²/2)
      ≤ I2n(t) · (t²/2) · exp(1 + t²/2)
      → 0   as n → ∞,

by (1.19) and (1.20). Hence, (1.18) holds. This completes the proof of the theorem. □
The following example is an application of the Lindeberg CLT for proving asymptotic normality of the least squares estimator of a regression parameter.

Example 11.1.2: Let

   Yj = xjβ + εj,   j = 1, 2, . . .                                    (1.21)

be a simple linear regression model, where {xn}n≥1 is a given sequence of real numbers, β ∈ R is the regression parameter and {εn}n≥1 is a sequence of iid random variables with Eε1 = 0 and Eε1² ≡ σ² ∈ (0, ∞). The least squares estimator of β based on Y1, . . . , Yn is given by

   β̂n = Σ_{j=1}^{n} xjYj / a²n,   n ≥ 1,

where a²n = Σ_{j=1}^{n} x²j. Suppose that the sequence {xn}n≥1 satisfies

   max_{1≤j≤n} {|xj|/an} → 0   as n → ∞.                               (1.22)

Then,

   an(β̂n − β) →d N(0, σ²).                                            (1.23)

To prove (1.23), note that by (1.21),

   an(β̂n − β) = an (Σ_{j=1}^{n} xjYj − a²nβ)/a²n = a_n^{−1} Σ_{j=1}^{n} xjεj ≡ Σ_{j=1}^{n} Xnj,  say,   (1.24)

where Xnj = xjεj/an, 1 ≤ j ≤ n, n ≥ 1. Note that EXnj = 0, EX²nj < ∞ and s²n ≡ Σ_{j=1}^{n} EX²nj = Σ_{j=1}^{n} x²j Eε²j / a²n = σ². Thus, {Xnj : 1 ≤ j ≤ n}n≥1 is a triangular array of independent random variables satisfying (1.2). Next, let mn = max{|xj|/an : 1 ≤ j ≤ n}, n ≥ 1. Then, by (1.22), for any δ ∈ (0, ∞),

   s_n^{−2} Σ_{j=1}^{n} EX²nj I(|Xnj| > δsn)
      = σ^{−2} a_n^{−2} Σ_{j=1}^{n} x²j Eε²j I(|xjεj/an| > δσ)
      ≤ σ^{−2} a_n^{−2} Σ_{j=1}^{n} x²j · Eε1² I(mn·|ε1| > δσ)
      = σ^{−2} Eε1² I(|ε1| > δσ·m_n^{−1})
      → 0   as n → ∞, by the DCT.

Thus, {Xnj : 1 ≤ j ≤ n}n≥1 satisfies the Lindeberg condition (1.3) and hence, by Theorem 11.1.1,

   Σ_{j=1}^{n} Xnj / σ →d N(0, 1),

which, in view of (1.24), implies (1.23).
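A quick simulation makes (1.23) concrete. The sketch below (Python with NumPy, not part of the original text; the design xj = √(j/n), the uniform errors, and the sample sizes are illustrative choices satisfying (1.22)) simulates the studentized least squares estimator and checks that an(β̂n − β) has approximately mean 0 and variance σ² = 1.

    # Simulation sketch for Example 11.1.2 (not from the book).
    import numpy as np

    rng = np.random.default_rng(4)
    n, beta, reps = 500, 2.0, 10_000
    x = np.sqrt(np.arange(1, n + 1) / n)                 # bounded design: max|x_j|/a_n -> 0
    an = np.sqrt(np.sum(x**2))
    eps = rng.uniform(-1.0, 1.0, size=(reps, n)) * np.sqrt(3.0)   # iid errors, mean 0, variance 1
    y = beta * x + eps
    beta_hat = y @ x / an**2
    t = an * (beta_hat - beta)
    print("sample mean:", round(float(t.mean()), 3), " sample var:", round(float(t.var()), 3))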
The next result gives a multivariate generalization of Theorem 11.1.1.

Theorem 11.1.6: (A multivariate version of the Lindeberg CLT). For each n ≥ 1, let {Xnj : 1 ≤ j ≤ rn} be a collection of independent k-dimensional random vectors satisfying

   EXnj = 0, 1 ≤ j ≤ rn,   and   Σ_{j=1}^{rn} E Xnj X'nj = Ik,

where Ik denotes the identity matrix of order k and for any vector x, x' denotes its transpose. Suppose that for every ε ∈ (0, ∞),

   lim_{n→∞} Σ_{j=1}^{rn} E‖Xnj‖² I(‖Xnj‖ > ε) = 0.

Then,

   Σ_{j=1}^{rn} Xnj →d N(0, Ik).

The proof is a consequence of Theorem 11.1.1 and the Cramer-Wold device (cf. Theorem 10.4.5) and is left as an exercise (Problem 11.17).
11.2 Stable distributions
If {Xn}n≥1 is a sequence of iid N(µ, σ²) random variables, then for each k ≥ 1, Sk ≡ Σ_{i=1}^{k} Xi has a N(kµ, kσ²) distribution. Similarly, if {Xn}n≥1 is a sequence of iid Cauchy (µ, σ) random variables, then for each k ≥ 1, Sk ≡ Σ_{i=1}^{k} Xi has a Cauchy (kµ, kσ) distribution. Thus, in both cases, for each k ≥ 1, there exist constants ak and bk such that Sk has the same distribution as akX1 + bk (Problem 11.21).

Definition 11.2.1: A nondegenerate random variable X is called stable if the above property holds, i.e., for each k ∈ N, there exist constants ak and bk such that

   Sk =d akX1 + bk,                                                    (2.1)

where X1, X2, . . . are iid random variables with the same distribution as X, and Sk = Σ_{i=1}^{k} Xi. In this case, the distribution FX of X is called a stable distribution.
There are two characterizations of stable distributions.

Theorem 11.2.1: A nondegenerate distribution F is stable iff there exist a sequence {Yn}n≥1 of iid random variables and constants {an}n≥1 and {bn}n≥1 such that a_k^{−1}(Σ_{i=1}^{k} Yi − bk) converges in distribution to F.

Theorem 11.2.2: A nondegenerate distribution F is stable iff its characteristic function φ(t) admits the representation

   φ(t) = exp( ιtc − b|t|^α (1 + ιλ sgn(t) ωα(t)) ),                    (2.2)

where ι = √−1, −1 ≤ λ ≤ 1, 0 < α ≤ 2, 0 ≤ b < ∞, and the functions ωα(t) and sgn(·) are defined as

   ωα(t) = tan(πα/2)      if α ≠ 1,
         = (2/π) log |t|   if α = 1,                                    (2.3)

and

   sgn(t) = 1 if t > 0;  −1 if t < 0;  0 if t = 0.
Remark 11.2.1: When α = 2, φ(t) in (2.2) reduces to exp(ιtc − bt²), which is the characteristic function of a normal random variable with mean c and variance 2b.

Remark 11.2.2: When α = 1, λ = 0, φ(t) reduces to exp(ιtc − b|t|), which is the characteristic function of a Cauchy (c, b) distribution.

Remark 11.2.3: Since |φ(t)| is integrable, the distribution F must be absolutely continuous. Apart from the normal and Cauchy distributions, for α = 1/2, λ = 1, the density is given by

   f(x) = (1/√(2π)) x^{−3/2} exp(−1/(2x)),   x > 0.                    (2.4)

For an explicit expression for the density of F in other cases, see Feller (1966), Section 17.6.

Definition 11.2.2: The parameter α in (2.2) is called the index of the stable distribution.
Remark 11.2.4: The parameter λ is related to the behavior of the ratio of the right tail of the distribution to the left tail through the relation

   lim_{x→∞} (1 − F(x))/F(−x) = (1 + λ)/(1 − λ),                        (2.5)

where for λ = 1, the ratio on the right side of (2.5) is defined to be +∞.

Definition 11.2.3: A function L : (0, ∞) → (0, ∞) is called slowly varying at ∞ if

   lim_{x→∞} L(cx)/L(x) = 1   for all 0 < c < ∞.                        (2.6)

A function f : (0, ∞) → (0, ∞) is called regularly varying at ∞ with index α ∈ R, α ≠ 0, if f(x) = x^α L(x) for all x ∈ (0, ∞), where L(·) is slowly varying at ∞. The functions L1(t) = log t, L2(t) = log(log t), L3(t) = (log t)² are slowly varying at ∞, but the function L4(t) = t^p is not so for p ≠ 0.
There is a companion result to Theorem 11.2.1 giving necessary and sufficient conditions for convergence of normalized sums of iid random variables to a stable distribution.

Theorem 11.2.3: Let F be a nondegenerate stable distribution with index α, 0 < α < 2. Then in order that a sequence {Yn}n≥1 of iid random variables admits sequences of constants {an}n≥1 and {bn}n≥1 such that

   (Σ_{i=1}^{n} Yi − bn)/an →d F,                                       (2.7)

it is necessary and sufficient that

   lim_{x→∞} P(Y1 > x)/P(|Y1| > x) ≡ θ ∈ [0, 1]                          (2.8)

exists and

   P(|Y1| > x) = x^{−α} L(x),                                           (2.9)

where L(·) is a slowly varying function at ∞. If (2.8) and (2.9) hold, then the normalizing constants {an}n≥1 and {bn}n≥1 may be chosen to satisfy

   n a_n^{−α} L(an) → 1   and   bn = nEY1 I(|Y1| ≤ an).                 (2.10)
Remark 11.2.5: The analog of Theorem 11.2.3 for the case α = 2, i.e., for the normal distribution, is the following.

Theorem 11.2.4: Let {Yn}n≥1 be iid random variables. In order that there exist constants {an}n≥1 and {bn}n≥1 such that

   (Σ_{i=1}^{n} Yi − bn)/an →d N(0, 1),                                 (2.11)

it is necessary and sufficient that

   x² P(|Y1| > x) / EY1² I(|Y1| ≤ x) → 0   as x → ∞.                    (2.12)

Remark 11.2.6: Note that condition (2.12) holds if EY1² < ∞. However, if P(|Y1| > x) ∼ C/x² as x → ∞, then EY1² = ∞ and the classical CLT (cf. Corollary 11.1.2) fails. However, in this case, (2.12) holds and Σ_{i=1}^{n} Yi is asymptotically normal with a suitable centering and scaling (different from √n) (Problem 11.20).
Here, only the proof of Theorem 11.2.1 will be given. Further, a proof of the sufficiency part of Theorem 11.2.3 is also outlined. For the rest, see Feller (1966) or Gnedenko and Kolmogorov (1968). For proving Theorem 11.2.1, the following result is needed.

Theorem 11.2.5: (Khinchine's theorem on convergence of types). Let {Wn}n≥1 be a sequence of random variables such that for some sequences {αn}n≥1 ⊂ [0, ∞) and {βn}n≥1 ⊂ R, both Wn and αnWn + βn converge in distribution to nondegenerate distributions G and H on R, respectively. Then limn→∞ αn = α and limn→∞ βn = β exist with 0 < α < ∞ and −∞ < β < ∞.

Proof: Let {W'n}n≥1 be a sequence of random variables such that for each n ≥ 1, Wn and W'n have the same distribution and Wn and W'n are independent. Then Yn ≡ Wn − W'n and Zn ≡ αn(Wn − W'n) both converge in distribution to nondegenerate limits, say G̃ and H̃. Indeed, G̃ = G ∗ G and H̃ = H ∗ H, where ∗ denotes convolution. This implies that {αn}n≥1 cannot have 0 or ∞ as limit points. Also, if 0 < α ≤ α' < ∞ are two limit points of {αn}n≥1, then H̃(x) = G̃(x/α) = G̃(x/α') for all x. Since G̃(·) is nondegenerate, α must equal α' and so limn→∞ αn exists in (0, ∞). This implies that limn→∞ βn exists in R. □
Proof of Theorem 11.2.1: The 'only if' part follows from the definition of F being stable, since one can take {Yn}n≥1 to be iid with distribution F.

For the 'if' part, let {Yn}n≥1 be iid random variables such that there exist constants {an}n≥1 and {bn}n≥1 such that as n → ∞,

   (Σ_{i=1}^{n} Yi − bn)/an →d F.

To show that F is stable, fix an integer r ≥ 1. Let {Xn}n≥1 be iid random variables with distribution F. Then as k → ∞,

   (Σ_{i=1}^{kr} Yi − bkr)/akr →d X1.

Also, the left side above equals

   Σ_{j=0}^{r−1} (ak/akr)·(Σ_{i=jk+1}^{(j+1)k} Yi − bk)/ak + (rbk − bkr)/akr = αkr Σ_{j=0}^{r−1} ηjk + βkr,  say,

where αkr = ak/akr, ηjk = (Σ_{i=jk+1}^{(j+1)k} Yi − bk)/ak, and βkr = (rbk − bkr)/akr. Since {ηjk : j = 0, 1, . . . , r − 1} are independent and for each j, ηjk →d Xj+1 as k → ∞, it follows that as k → ∞,

   Wk = Σ_{j=0}^{r−1} ηjk →d Σ_{j=0}^{r−1} Xj+1 = Σ_{j=1}^{r} Xj.

Also, as k → ∞,

   αkrWk + βkr →d X1.

Since F is nondegenerate, both X1 and Σ_{j=1}^{r} Xj are nondegenerate random variables. Thus, as k → ∞, Wk and αkrWk + βkr converge in distribution to nondegenerate random variables. Thus, by Khinchine's theorem on convergence of types proved above, it follows that αkr → α'r and βkr → β'r with 0 < α'r < ∞ and −∞ < β'r < ∞. This yields for each r ∈ N that Σ_{j=1}^{r} Xj has the same distribution as (1/α'r)(X1 − β'r), i.e., X1 is stable. □
Proof of the sufficiency part of Theorem 11.2.3: (Outline). The proof is based on the continuity theorem. The characteristic function of Tn ≡ (Sn − bn)/an is

   φn(t) = E(e^{ιt(Sn − bn)/an}) = (φ(t/an) e^{−ιtb'n/an})^n ≡ (1 + hn(t)/n)^n,

where b'n = bn/n and hn(t) = n(φ(t/an) e^{−ιtb'n/an} − 1). Let G(·) be the cdf of Y1. Then

   hn(t) = n ∫ (e^{ιt(y − b'n)/an} − 1) dG(y) = ∫ (e^{ιtx} − 1) µn(dx),

where µn(A) = nP(Y1 ∈ b'n + anA), A ∈ B(R). If A = (u, v], 0 < u < v < ∞, then

   nP(Y1 ∈ b'n + anA) = nP(anu + b'n < Y1 ≤ anv + b'n)
                      = nP(Y1 > anu + b'n) − nP(Y1 > anv + b'n).

By (2.8)–(2.10),

   nP(Y1 > anx) = [P(Y1 > anx)/P(Y1 > an)] nP(Y1 > an) → θx^{−α}   for x > 0.

By using (2.10), it can be shown that

   b'n/an = EY1 I(|Y1| ≤ an)/an = O(a_n^{1−α} L(an)/an) = o(1)   as n → ∞.

Hence, it follows that

   nP(Y1 > anu + b'n) − nP(Y1 > anv + b'n) → θ(u^{−α} − v^{−α}).

Similarly, for A = (−v, −u],

   nP(Y1 ∈ b'n + anA) → (1 − θ)(u^{−α} − v^{−α}).

This suggests that hn(t) should approach

   θα ∫_{0}^{∞} (e^{ιtx} − 1) x^{−(α+1)} dx + (1 − θ)α ∫_{−∞}^{0} (e^{ιtx} − 1) |x|^{−(α+1)} dx.

But there are integrability problems for |x|^{−(α+1)} near 0 and so a more careful analysis is needed. It can be shown that

   lim_{n→∞} hn(t) = ιtc + θα ∫_{0}^{∞} (e^{ιtx} − 1 − ιtx/(1 + x²)) x^{−(α+1)} dx
                       + (1 − θ)α ∫_{−∞}^{0} (e^{ιtx} − 1 − ιtx/(1 + x²)) |x|^{−(α+1)} dx,

where c is a constant. The right side is continuous at t = 0 and so the result follows by the continuity theorem. For details, see Feller (1966). □
Remark 11.2.7: By the necessity part of Theorem 11.2.3, every stable distribution F with index 0 < α < 2 must satisfy

   1 − F(x) = θx^{−α} L(x),   F(−x) = (1 − θ)x^{−α} L(x)

for large x, where 0 ≤ θ ≤ 1 and L(·) is slowly varying at ∞. This implies that F has moments of order p for all p < α. Distributions satisfying the above tail condition are called heavy tailed and arise in many applications. The Pareto distribution in economics is an example of a heavy tailed distribution.

Remark 11.2.8: One way to generate heavy tailed distributions is as follows. If Y is a positive random variable such that there exist 0 < c < ∞ and 0 < p < ∞ satisfying

   P(Y < y) ∼ cy^p   as y ↓ 0,

then the random variable X = Y^{−q} has the property

   P(X > x) = P(Y < x^{−1/q}) ∼ cx^{−p/q}   as x → ∞.

If p < 2q, then X has heavy tails. Thus, if {Yn}n≥1 are iid Gamma(1,2), then n^{−1} Σ_{i=1}^{n} Y_i^{−1} converges in distribution to a one-sided Cauchy distribution (Problem 11.15).
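The inversion construction in Remark 11.2.8 is easy to verify empirically. The sketch below (Python with NumPy, not part of the original text; Y ∼ Exponential(1), q = 2, and the sample size are illustrative choices, so here P(Y < y) ∼ y and X = Y^{−q} should have tail ∼ x^{−1/q}) compares the empirical tail of X with the predicted power law.

    # Illustration of Remark 11.2.8 (not from the book).
    import numpy as np

    rng = np.random.default_rng(5)
    q = 2.0
    Y = rng.exponential(size=2_000_000)
    X = Y ** (-q)
    for x in (10.0, 100.0, 1000.0):
        emp = float(np.mean(X > x))
        print(f"x = {x:7.0f}:  empirical P(X > x) ~ {emp:.5f},   x^(-1/q) = {x ** (-1.0 / q):.5f}")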
Definition 11.2.4: Let F and G be two probability distributions on R. Then G is said to belong to the domain of attraction of F if there exist a sequence of iid random variables {Yn}n≥1 with distribution G and constants {an}n≥1 and {bn}n≥1 such that

   (Σ_{i=1}^{n} Yi − bn)/an →d F.

Theorem 11.2.1 says that the only nondegenerate distributions F that admit a nonempty domain of attraction are the stable distributions.
11.3 Infinitely divisible distributions
Definition 11.3.1: A random variable X (and its distribution) is called infinitely divisible if for each integer k ∈ N, there exist iid random variables Xk1, Xk2, . . . , Xkk such that Σ_{j=1}^{k} Xkj has the same distribution as X.

Examples include constants (degenerate distributions), normal, Poisson, Cauchy, and Gamma distributions. But distributions with bounded support cannot be infinitely divisible unless they are degenerate. In fact, if X is infinitely divisible satisfying P(|X| ≤ M) = 1 for some M < ∞, then the Xki's in the above definition must satisfy P(|Xk1| < M/k) = 1 and so Var(Xk1) ≤ EX²k1 ≤ M²/k², implying Var(X) = k Var(Xk1) ≤ M²/k for each k ≥ 1. Hence Var(X) must be zero, and the random variable X is a constant w.p. 1.

The following results are easy to establish.
Theorem 11.3.1: (a) If X and Y are independent and infinitely divisible, then X + Y is also infinitely divisible. (b) If Xn is infinitely divisible for each n ∈ N and Xn →d X, then X is infinitely divisible.

Proof: (a) Follows from the definition. (b) For each k ≥ 1 and n ≥ 1, there exist iid random variables Xnk1, Xnk2, . . . , Xnkk such that Xn and Σ_{j=1}^{k} Xnkj have the same distribution. Now fix k ≥ 1. Then for any y > 0,

   (P(Xnk1 > y))^k = P(Xnkj > y for all j = 1, . . . , k) ≤ P(Xn > ky)

and similarly,

   (P(Xnk1 < −y))^k ≤ P(Xn ≤ −ky).

Since Xn →d X, the distributions of {Xn}n≥1 are tight and so are those of {Xnk1}∞n=1. So if Fk is a weak limit point of {Xnk1}∞n=1 and if {Ykj}_{j=1}^{k} are iid with distribution Fk, then X and Σ_{j=1}^{k} Ykj have the same distribution and so X is infinitely divisible. □
A large class of infinitely divisible distributions is generated by the compound Poisson family.

Definition 11.3.2: Let {Yn}n≥1 be iid random variables and let N be a Poisson (λ) random variable, independent of the {Yn}n≥1. The random variable X ≡ Σ_{i=1}^{N} Yi is said to have a compound Poisson distribution.

Theorem 11.3.2: A compound Poisson distribution is infinitely divisible.

Proof: Let X be a random variable as in Definition 11.3.2. For each k ≥ 1, let {Ni}_{i=1}^{k} be iid Poisson random variables with mean λ/k that are independent of {Yn}n≥1. Let

   Xkj = Σ_{i=Tj+1}^{Tj+1} Yi,   1 ≤ j ≤ k,

where T1 = 0, Tj = Σ_{i=1}^{j−1} Ni, 2 ≤ j ≤ k. Then {Xkj}_{j=1}^{k} are iid, and Σ_{j=1}^{k} Xkj and X are identically distributed, and so X is infinitely divisible. □
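The splitting used in this proof can be checked by simulation. The sketch below (Python with NumPy, not part of the original text; the N(1, 1) compounding distribution, λ = 3 and k = 4 are illustrative choices) generates a compound Poisson sample and the sum of k independent compound Poisson pieces with rate λ/k, and compares their first two moments, which should agree (λEY1 and λEY1²).

    # Simulation sketch of Theorem 11.3.2 (not from the book).
    import numpy as np

    rng = np.random.default_rng(6)

    def compound_poisson(lam, size):
        N = rng.poisson(lam, size=size)
        return np.array([rng.normal(1.0, 1.0, size=m).sum() for m in N])

    reps, lam, k = 50_000, 3.0, 4
    X = compound_poisson(lam, reps)
    X_split = sum(compound_poisson(lam / k, reps) for _ in range(k))
    for name, z in (("X", X), ("sum of k pieces", X_split)):
        print(f"{name:16s}: mean ~ {z.mean():.3f}, var ~ {z.var():.3f}")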
Although the converse to the above is not valid, it is known that every infinitely divisible distribution is the limit of a sequence of centered and scaled compound Poisson distributions. This is a consequence of a deep result giving an explicit formula for the characteristic function of an infinitely divisible distribution, which is stated below. For a proof of this result, see Feller (1966), Chung (1974), or Gnedenko and Kolmogorov (1968).

Theorem 11.3.3: (Levy-Khinchine representation theorem). Let X be an infinitely divisible random variable. Then its characteristic function φ(t) ≡ E(e^{ιtX}) is of the form

   φ(t) = exp( ιtc − βt²/2 + ∫_R (e^{ιtx} − 1 − ιtx/(1 + x²)) µ(dx) ),

where c ∈ R, β ≥ 0 and µ is a measure on (R, B(R)) such that µ({0}) = 0, ∫_{|x|≤1} x² µ(dx) < ∞ and µ({x : |x| > 1}) < ∞.

Corollary 11.3.4: Stable distributions are infinitely divisible.

Proof: The normal distribution corresponds to the case µ(·) ≡ 0 and β > 0. For nonnormal stable laws with index α < 2, set β = 0 and µ(dx) = θx^{−(α+1)} dx for x > 0 and (1 − θ)|x|^{−(α+1)} dx for x < 0. □
Corollary 11.3.5: Every infinitely divisible distribution is the limit of centered and scaled compound Poisson distributions.

Proof: Since the normal distribution can be obtained as a (weak) limit of centered and scaled Poisson distributions, it is enough to consider the case when β = 0, c = 0. Let µn(A) = µ(A ∩ {x : |x| > n^{−1}}), A ∈ B(R), and let

   φn(t) = exp( ∫ (e^{ιtx} − 1 − ιtx/(1 + x²)) µn(dx) )
         = exp( λn [ ∫ (e^{ιtx} − 1) µ̃n(dx) − ιtcn ] ),

where

   µ̃n(A) = µn(A)/µn(R), A ∈ B(R),   λn = µn(R),   and   cn = ∫ x/(1 + x²) µ̃n(dx).

Thus, φn(·) is a compound Poisson characteristic function centered at cn, with Poisson parameter λn and with the compounding distribution µ̃n. By the DCT, φn(t) → φ(t) for each t ∈ R. Hence by the Levy-Cramer continuity theorem, the result follows. □
Another characterization of infinitely divisible distributions is similar to that of stable distributions. Recall that a stable distribution is one that is the limit of normalized sums of iid random variables, and conversely.

Theorem 11.3.6: A random variable X is infinitely divisible iff it is the limit in distribution of a sequence {Xn}n≥1 where for each n, Xn is the sum of n iid random variables {Xnj}_{j=1}^{n}.

Thus X is infinitely divisible iff it is the limit in distribution of the row sums of a triangular array of random variables where in each row, all the random variables are iid.

Proof: The 'only if' part follows from the definition. For the 'if' part, fix k ≥ 1. Then Xk·n can be written as

   Xk·n = Σ_{j=1}^{k} Yjn,

where Yjn = Σ_{r=(j−1)n+1}^{jn} Xk·n,r, j = 1, 2, . . . , k. By hypothesis, Xk·n →d X. Now, {Yjn}_{j=1}^{k} are iid and it can be shown, as in the proof of Theorem 11.3.1, that for each i = 1, . . . , k, {Yin}∞n=1 is tight and hence converges in distribution to a limit Yi through a subsequence, and that X and Σ_{i=1}^{k} Yi have the same distribution. Thus, X is infinitely divisible. □
11.4 Refinements and extensions of the CLT
This section is devoted to studying some refinements and generalizations of the basic CLT results, such as the rate of convergence in the CLT, Edgeworth expansions and large deviations for sums of iid random variables, and also a generalization of the basic CLT to a functional version.

11.4.1 The Berry-Esseen theorem

Let X1, X2, . . . be a sequence of iid random variables with EX1 = µ and Var(X1) = σ² ∈ (0, ∞). Then, Corollary 11.1.2 and Pólya's theorem imply that

   ∆n ≡ sup_{x∈R} |P((Sn − nµ)/(σ√n) ≤ x) − Φ(x)| → 0   as n → ∞,      (4.1)

where Sn = X1 + ··· + Xn, n ≥ 1, and Φ(·) is the cdf of the N(0, 1) distribution. A natural question that arises in this context is "how fast does ∆n go to zero?" Berry (1941) and Esseen (1942) independently proved that ∆n = O(n^{−1/2}) as n → ∞, provided E|X1|³ < ∞. This result is referred to as the Berry-Esseen theorem.

Theorem 11.4.1: (The Berry-Esseen theorem). Let {Xn}n≥1 be a sequence of iid random variables with EX1 = µ, Var(X1) = σ² ∈ (0, ∞) and E|X1|³ < ∞. Then, for all n ≥ 1,

   ∆n ≡ sup_{x∈R} |P((Sn − nµ)/(σ√n) ≤ x) − Φ(x)| ≤ C · E|X1 − µ|³/(σ³√n),   (4.2)

where C ∈ (0, ∞) is a constant.

The value of the constant C ∈ (0, ∞) does not depend on n or on any characteristics of the distribution of X1. Indeed, the proof of Theorem 11.4.1 below shows that C ≤ √(2/π)·(5/2 + 12/π) < 5.05.
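The bound (4.2) can be compared with a Monte Carlo estimate of ∆n. The sketch below (Python with NumPy, not part of the original text; Exponential(1) summands, the grid, and the replication counts are illustrative choices, and the empirical supremum is itself subject to Monte Carlo error) estimates the sup-distance and prints the bound with the constant 5.05 quoted above.

    # Rough empirical look at the Berry-Esseen bound (not from the book).
    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(7)
    mu, sigma = 1.0, 1.0
    rho = float(np.mean(np.abs(rng.exponential(size=1_000_000) - mu) ** 3))  # ~ E|X1 - mu|^3

    grid = np.linspace(-3.0, 3.0, 241)
    Phi = np.array([0.5 * (1.0 + erf(g / sqrt(2.0))) for g in grid])
    for n in (5, 20, 100):
        reps = 100_000
        T = (rng.exponential(size=(reps, n)).sum(axis=1) - n * mu) / (sigma * np.sqrt(n))
        emp = np.searchsorted(np.sort(T), grid, side="right") / reps          # empirical cdf
        print(f"n = {n:4d}: sup-distance ~ {np.max(np.abs(emp - Phi)):.4f},"
              f"  bound ~ {5.05 * rho / (sigma**3 * np.sqrt(n)):.4f}")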
The following result plays an important role in the proof of Theorem 11.4.1.

Lemma 11.4.2: (A smoothing inequality). Let F be a cdf on R with ∫ x dF(x) = 0 and characteristic function ζ(t) = ∫ exp(ιtx) dF(x), t ∈ R. Let G : R → R be a differentiable function with derivative g such that lim_{|x|→∞} (F(x) − G(x)) = 0. Suppose that ∫ (1 + |x|)|g(x)| dx < ∞, ∫_{−∞}^{∞} x^r g(x) dx = ∫_{−∞}^{∞} x^r dF(x) for r = 0, 1, and |g(x)| ≤ C0 for all x ∈ R, for some C0 ∈ (0, ∞). Then, for any T ∈ (0, ∞),

   sup_{x∈R} |F(x) − G(x)| ≤ (1/π) ∫_{−T}^{T} |ζ(t) − ξ(t)|/|t| dt + 24C0/(πT),   (4.3)

where ξ(t) = ∫_{−∞}^{∞} exp(ιtx) g(x) dx, t ∈ R.

For a proof of Lemma 11.4.2, see Feller (1966).
For a proof of Lemma 11.4.2, see Feller (1966).
The next lemma deals with an expansion of the logarithm of the characteristic function of X, in a neighborhood of zero. Let z = reiθ , r ∈ (0, ∞),
θ ∈ [0, 2π) be the polar representation of a nonzero complex number z.
Then, the (principal branch of the) complex logarithm of z is defined as
log z = log r + iθ.
(4.4)
The function log z is infinitely differentiable on the set {z ∈ C : z = reiθ , r ∈
(0, ∞), 0 ≤ θ < 2π} and has a convergent Taylor’s series expansion around
1 on the unit disc:
log(1 + z) =
∞
)
k=1
z k /k
for |z| < 1.
(4.5)
Lemma 11.4.3: Let Y be a random variable with EY = 0, σ̃ 2 = EY 2 ∈
function φY (t) = E exp(ιtY ),
(0, ∞), ρ̃ = E|Y |3 < ∞B and characteristic
C
t ∈ R. Then, for all t ∈ − σ̃1 , σ̃1 ,
and
J
J
2 2J
J
J log φY (t) + t σ̃ J ≤ 5 |t|3 ρ̃
J
2 J 12
J
8
9JJ
2
3
J
J log φY (t) − (ιt) σ̃ 2 + (ιt) EY 3 J
J
J
2!
3!
0.
/
t4 σ̃ 4
|tY |3 (tY )4
,
.
+
≤ E min
3
24
4
(4.6)
(4.7)
11.4 Refinements and extensions of the CLT
363
Proof: Note that by Lemma 10.1.5,
2
2
J J !
J
"J
JφY (t) − 1J = JE exp(ιtY ) − 1 − ιtY J ≤ t EY ≤ 1
2
2
(4.8)
|t|C ≤ σ̃ −1 . In particular, log φY (t) is well defined for all t ∈
Bwhenever
−1
−1
− σ̃ , σ̃
.
By (4.5), (4.8), and Lemma 10.1.5, for |t| ≤ σ̃ −1 ,
J
J
2 2J
J
J log φY (t) + t σ̃ J
J
2 J
J
J
8
J
"9 t2 σ̃ 2 J
!
J
= JJ log 1 + φY (t) − 1 +
2 J
J
J
∞
8
J
Jk
t2 σ̃ 2 9JJ ) JJ
J
≤ JφY (t) − 1 −
+
φY (t) − 1J /k
J
2
k=2
J
J
∞
J
|tY |3 JJ 1 ' t2 σ̃ 2 (2 ) ' 1 (k−2
≤ E JJ(tY )2 ∧
+
3! J 2
2
2
k=2
3
≤
4 4
|t| ρ t σ̃
+
.
6
4
Now using the bounds |tσ̃| ≤ 1 and σ̃ 3 = (EY 2 )3/2 ≤ E|Y |3 = ρ̃, one
gets (4.6). The proof of (4.7) is similar and hence, it is left as an exercise
(Problem 11.27).
!
Proof of Theorem 11.4.1: W.l.o.g., set µ = 0 and σ = 1. Then, X1, X2, . . . are iid zero mean, unit variance random variables. Let X =d X1, ρ = E|X|³ and let φX(·) denote the characteristic function of X. It is easy to check that the conditions of Lemma 11.4.2 hold with F(x) = P(Sn/√n ≤ x), G(x) = Φ(x), x ∈ R, and C0 = 1/√(2π). Hence, by Lemma 11.4.2, with T = √n/ρ,

   ∆n ≤ (1/π) ∫_{−T}^{T} |φX^n(t/√n) − e^{−t²/2}|/|t| dt + 24ρ/(π√(2πn)).   (4.9)

By Lemma 11.4.3 (with Y = (X1 − µ)/σ and t replaced by t/√n),

   rn(t) ≡ |n log φX(t/√n) + t²/2| = n|log φX(t/√n) + (t/√n)²σ²/2| ≤ (5/12)·ρ|t|³/√n   (4.10)

for all |t| ≤ √n, n ≥ 1.

Since ρ = E|X1|³ ≥ (EX1²)^{3/2} = σ³ = 1, |T| ≤ √n. Hence, using the inequality |e^z − 1| ≤ |z|e^{|z|} for all z ∈ C and (4.10), one gets

   |φX^n(t/√n) − e^{−t²/2}|
      = |exp(n log φX(t/√n) + t²/2) − 1| · exp(−t²/2)
      ≤ |rn(t)| exp(|rn(t)|) · exp(−t²/2)
      ≤ (5ρ/(12√n)) |t|³ exp( −(t²/2)(1 − 5ρ|t|/(6√n)) )
      ≤ (5ρ/(12√n)) |t|³ exp(−t²/12)                                     (4.11)

for all ρ|t|/√n ≤ 1, i.e., for all |t| ≤ T, n ≥ 1. Since ∫_{−∞}^{∞} t² exp(−t²/12) dt < ∞, the theorem follows from (4.9) and (4.11) with C = √(2/π)(5/2 + 12/π). □
A striking feature of Theorem 11.4.1 is that the upper bound on ∆n in (4.2) is valid for all n ≥ 1. Also, under the conditions of Theorem 11.4.1, the rate O(1/√n) in (4.2) is the best possible, in the sense that there exist random variables for which ∆n is bounded below by a constant multiple of 1/√n (cf. Problem 11.29). Edgeworth expansions of the cdf of (Sn − nµ)/(σ√n), to be developed in the next section, can be used to show that for certain random variables X1 satisfying additional moment and symmetry conditions, ∆n may go to zero at a faster rate. (For example, consider X1 ∼ N(µ, σ²).) For iid sequences {Xn}n≥1 with E|X1|^{2+δ} < ∞ for some δ ∈ (0, 1], Theorem 11.4.1 can be strengthened to show that ∆n decreases at the rate O(n^{−δ/2}) as n → ∞ (cf. Chow and Teicher (1997), Chapter 9).
11.4.2 Edgeworth expansions

Recall from Chapter 10 that a random variable X1 is called lattice if there exist a ∈ R and h ∈ (0, ∞) such that

   P(X1 ∈ {a + ih : i ∈ Z}) = 1.                                        (4.12)

The largest h satisfying (4.12) is called the span of (the distribution of) X1. A random variable X1 is called nonlattice if it is not a lattice random variable. From Proposition 10.1.1, it follows that X1 is nonlattice iff

   |E exp(ιtX1)| < 1   for all t ≠ 0.                                   (4.13)

The next result gives an Edgeworth expansion for the cdf of (Sn − nµ)/(σ√n) with an error of order o(n^{−1/2}) for nonlattice random variables.
Theorem 11.4.4: Let {Xn}n≥1 be a sequence of iid random variables with EX1 = µ, Var(X1) = σ² ∈ (0, ∞) and E|X1|³ < ∞. Suppose, in addition, that X1 is nonlattice, i.e., it satisfies (4.13). Then,

   sup_{x∈R} |P((Sn − nµ)/(σ√n) ≤ x) − Φ(x) + (1/√n)·(µ3/(6σ³))(x² − 1)φ(x)| = o(n^{−1/2})   as n → ∞,   (4.14)

where φ(x) = (1/√(2π)) e^{−x²/2}, x ∈ R, and µ3 = E(X1 − µ)³.

The function

   en,2(x) ≡ Φ(x) − (1/√n)·(µ3/(6σ³))(x² − 1)φ(x),   x ∈ R,             (4.15)

is called a second order Edgeworth expansion for Tn ≡ (Sn − nµ)/(σ√n). The above theorem shows that the cdf of the normalized sum Tn can be approximated by the second order Edgeworth expansion with accuracy o(n^{−1/2}). It can be shown that if E|X1|⁴ < ∞ and X1 satisfies Cramer's condition:

   lim sup_{|t|→∞} |E exp(ιtX1)| < 1,                                   (4.16)

then the bound on the right side of (4.14) can be improved to O(n^{−1}). Note that for a symmetric random variable X1 having a finite fourth moment and satisfying (4.16), the second term in en,2(x) is zero and the rate of normal approximation becomes O(n^{−1}). Higher order Edgeworth expansions for Tn can be derived using (4.16) and arguments similar to those in the proof of Theorem 11.4.4, but the form of the expansion becomes more complicated. See Petrov (1975), Bhattacharya and Rao (1986), and Hall (1992) for detailed accounts of the Edgeworth expansion theory.
Proof of Theorem 11.4.4: W.l.o.g., let µ = 0 and σ = 1. In Lemma 11.4.2, take F(x) = P(Tn ≤ x) and G(x) = en,2(x), x ∈ R. Then, it is easy to verify that the conditions of Lemma 11.4.2 hold with g(x) = gn(x) ≡ φ(x) + (µ3/(6√n))(x³ − 3x)φ(x), x ∈ R. Using repeated differentiation on both sides of the identity (inversion formula)

   e^{−x²/2}/√(2π) = (1/2π) ∫_{−∞}^{∞} e^{−ιtx} e^{−t²/2} dt,   x ∈ R,

one can show that

   −(x³ − 3x) e^{−x²/2}/√(2π) = (d³/dx³)(e^{−x²/2}/√(2π)) = (1/2π) ∫_{−∞}^{∞} e^{−ιtx} (−ιt)³ e^{−t²/2} dt,

x ∈ R. Hence,

   ξn(t) ≡ ξ(t) = ∫ e^{ιtx} gn(x) dx = e^{−t²/2} (1 + (µ3/(6√n))(ιt)³),   t ∈ R.   (4.17)

Next, let ε ∈ (0, 1) be given. Set T = c√n, where

   c_ε ≡ c = 24 · sup{ (1 + (|µ3|/6)|x³ − 3x|) φ(x) : x ∈ R } / ε.

Then, by Lemma 11.4.2 and (4.17),

   ∆n,2 ≡ sup_{x∈R} |P((Sn − nµ)/(σ√n) ≤ x) − en,2(x)|
        ≤ (1/π) ∫_{−c√n}^{c√n} |φX^n(t/√n) − ξn(t)|/|t| dt + ε/√n,       (4.18)

where φX(t) = E exp(ιtX), t ∈ R, and X =d X1. Let ρ = E|X1|³. Let M ∈ (1, ∞) be such that E|X1|³ I(|X1| > M) ≤ ε/2. Then, setting δ = ε/(2Mρ), it follows that E[|X1|⁴ (|t|/√n) I(|X1| ≤ M)] ≤ Mδ E|X1|³ ≤ ε/2 for all |t| ≤ δ√n. Hence, for all |t| ≤ δ√n, by (4.7) of Lemma 11.4.3,

   rn,2(t) ≡ n |log φX(t/√n) − ((ιt)²/(2n) + (µ3/6)(ιt/√n)³)|
      ≤ n E min{ |tX1/√n|³/3, (tX1/√n)⁴/24 } + n·(t/√n)⁴/4
      ≤ (|t|³/(3√n)) [ E|X1|³ I(|X1| > M) + E|X1|⁴(|t|/√n) I(|X1| ≤ M) ] + (|t|³/√n)(|t|/(4√n))
      ≤ (|t|³/√n)(ε/3 + δ/4) ≤ ε|t|³/√n.                                 (4.19)

Also, note that for any complex numbers z, w,

   |e^z − 1 − w| ≤ |e^z − e^w| + |e^w − 1 − w| ≤ [|z − w| + |w|²/2] exp(|z| ∨ |w|).   (4.20)

Hence, by (4.10), (4.19), and (4.20), it follows that for all |t| ≤ δ√n,

   |φX^n(t/√n) − ξn(t)|
      = |exp(n log φX(t/√n) + t²/2) − 1 − (µ3/(6√n))(ιt)³| e^{−t²/2}
      ≤ [ rn,2(t) + (1/2)|(µ3/(6√n))t³|² ] exp( rn(t) ∨ |(µ3/(6√n))t³| ) e^{−t²/2}.

Since rn(t) ∨ |µ3t³/(6√n)| ≤ (5/12)t² for all |t| ≤ δ√n,

   ∫_{−δ√n}^{δ√n} |φX^n(t/√n) − ξn(t)|/|t| dt
      ≤ ∫_{−δ√n}^{δ√n} [ (ε/√n)t² + (µ3²/(72n))|t|⁵ ] exp(−t²/12) dt
      ≤ C1 · ε/√n                                                        (4.21)

for some C1 ∈ (0, ∞). By (4.13),

   sup{ |φX(t/√n)|^n : δ√n < |t| < c√n } ≤ θ^n

for some θ ∈ (0, 1). Hence,

   ∫_{δ√n<|t|<c√n} |φX^n(t/√n) − ξn(t)|/|t| dt
      ≤ 2θ^n log(c/δ) + (1/(δ√n)) ∫_{|t|>δ√n} e^{−t²/2} (1 + (ρ/√n)|t|³) dt
      = O(θ1^n)   as n → ∞                                               (4.22)

for some θ1 = θ1(δ) ∈ (0, 1). Since ε > 0 is arbitrary, the result follows from (4.18), (4.21), and (4.22). □
This section concludes with an analog of Theorem 11.4.4 for lattice random variables. Note that for X1 satisfying P(X1 ∈ {a + jh : j ∈ Z}) = 1, the normalized sample sum Tn takes values in the lattice {(na − nµ)/(σ√n) + jh/(σ√n) : j ∈ Z}. Hence, the cdf of Tn is a step function. The second order Edgeworth expansion for Tn is no longer a smooth function; the effect of the jumps of the cdf of Tn is now accounted for by adding a discontinuous function to the expansion en,2. Let

   Q(x) = x − ⌊x⌋ − 1/2,   x ∈ R,                                        (4.23)

where ⌊x⌋ denotes the largest integer not exceeding x, x ∈ R. It is easy to check that Q(x) is a periodic function of period 1 with values in [−1/2, 1/2), Q(x) is right continuous, and it has jumps of size 1 at the integer values. The second order Edgeworth expansion for Tn in the lattice case involves the function Q, as shown below.

Theorem 11.4.5: Let {Xn}n≥1 be a sequence of iid lattice random variables with span h > 0. Suppose that E|X1|³ < ∞ and σ² = Var(X1) ∈ (0, ∞). Then,

   sup_{x∈R} |P((Sn − nµ)/(σ√n) ≤ x) − ẽn,2(x)| = o(n^{−1/2})   as n → ∞,   (4.24)

where µ = EX1, Sn = X1 + ··· + Xn, n ≥ 1,

   ẽn,2(x) = Φ(x) − (1/√n) [ (µ3/(6σ³))(x² − 1) + (h/σ) Q( (xσ√n − nx0)/h ) ] φ(x),   (4.25)

x ∈ R, Q(x) is as in (4.23), φ(x) = (1/√(2π)) exp(−x²/2), x ∈ R, and x0 is a real number satisfying P(X1 = µ + x0) > 0.

For a proof of Theorem 11.4.5 and for related results, see Esseen (1945) and Bhattacharya and Rao (1986).
11.4.3 Large deviations

Let X1, X2, . . . be iid random variables with EX1 = µ. The SLLN implies that for any x > µ,

   P(X̄n > x) → 0   as n → ∞,                                            (4.26)

where X̄n = n^{−1} Σ_{i=1}^{n} Xi, n ≥ 1. If the Xi's were distributed as N(µ, σ²) for some µ ∈ R and σ² ∈ (0, ∞), then the left side of (4.26) equals Φ(−√n(x − µ)/σ). Now note that

   −log Φ(−√n(x − µ)/σ) ∼ −log[ exp(−nc1²/2)/(c1√(2πn)) ] ∼ nc1²/2,

where c1 = (x − µ)/σ ∈ (0, ∞). Large deviation bounds on the probability P(X̄n > x) assert that a similar behavior holds for many distributions other than the normal distribution on the (negative) logarithmic scale. The main result of this section is the following.

Theorem 11.4.6: Let {Xn}n≥1 be a sequence of iid nondegenerate random variables with

   φ(t) ≡ Ee^{tX1} < ∞   for all t > 0.                                  (4.27)

Let µ = EX1. Then, for all x ∈ (µ, θ),

   lim_{n→∞} n^{−1} log P(X̄n ≥ x) = −γ(x),                              (4.28)

where

   γ(x) = sup_{t>0} {tx − log φ(t)}   and   θ = sup{x ∈ R : P(X1 ≤ x) < 1}.   (4.29)

Note that under (4.27), EX1⁺ < ∞ and hence µ ≡ EX1 is well defined, and µ ∈ [−∞, ∞).
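The rate function γ(x) in (4.29) is easy to compute numerically, and for Bernoulli summands the probability P(X̄n ≥ x) can be evaluated exactly, so (4.28) can be checked without simulation. The sketch below (Python with NumPy, not part of the original text; Bernoulli(1/2) summands, x = 0.7, and the grid over t are illustrative choices) maximizes tx − log φ(t) over a grid and compares −γ(x) with n^{−1} log of the exact binomial tail.

    # Numerical illustration of Theorem 11.4.6 for Bernoulli(1/2) summands (not from the book).
    import numpy as np
    from math import comb, log

    def gamma(x):
        t = np.linspace(1e-6, 20.0, 200_000)
        return float(np.max(t * x - np.log(0.5 * (1.0 + np.exp(t)))))   # phi(t) = (1 + e^t)/2

    x = 0.7
    for n in (50, 200, 1000):
        tail = sum(comb(n, j) for j in range(int(np.ceil(n * x)), n + 1)) / 2**n
        print(f"n = {n:5d}: n^-1 log P(Xbar_n >= x) = {log(tail)/n:+.4f},   -gamma(x) = {-gamma(x):+.4f}")

The printed values approach −γ(0.7) ≈ −0.0823 as n grows, in agreement with (4.28).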
For proving the theorem, the following results are needed.
Lemma 11.4.7: Let X1 be a nondegenerate random variable satisfying
(4.27). Let µ = EX1 and let γ(x), θ be as in (4.29). Then,
11.4 Refinements and extensions of the CLT
369
(i) the function φ(t) is infinitely differentiable on (0, ∞) with φ(r) (t), the
rth derivative of φ(t) being given by
'
(
φ(r) (t) = E X1r etX1 , t ∈ (0, ∞), r ∈ N,
φ(1) (t)
t→∞ φ(t)
(ii) lim φ(t) = 1, lim φ(1) (t) = µ, and lim
t↓0
t↓0
(4.30)
= θ,
(iii) for every x ∈ (µ, θ), there exists a unique solution ax ∈ R to the
equation
x = φ$ (ax )/φ(ax )
(4.31)
such that γ(x) = xax − log φ(ax ).
Proof: Let F denote the cdf of X1. (i) Note that for any h ≠ 0,
h^{−1}[ φ(t + h) − φ(t) ] = ∫_{−∞}^{∞} ((e^{hx} − 1)/h) · e^{tx} dF(x).
As h → 0, the integrand converges to x e^{tx} for all x, t. Also, |(e^{hx} − 1)/h| ≤ Σ_{k=1}^{∞} |h|^{k−1} |x|^k / k! ≤ |x| e^{|hx|} for all h, x. Hence, for any x ∈ R, t ∈ (0, ∞), and 0 < |h| < t/2, the integrand is bounded above by
|x| e^{|hx|} e^{tx} = |x| e^{(t−|h|)x} I_{(−∞,0)}(x) + |x| e^{(t+|h|)x} I_{(0,∞)}(x)
≤ |x| e^{−t|x|/2} I_{(−∞,0)}(x) + |x| e^{3tx} I_{(0,∞)}(x)
≡ g(x), say.   (4.32)
Since ∫ g(x) dF(x) < ∞, by the DCT, it follows that
lim_{h→0} [ φ(t + h) − φ(t) ]/h
exists and equals ∫ x e^{tx} dF(x) for all t ∈ (0, ∞). Thus, φ(t) is differentiable on (0, ∞) with φ^{(1)}(t) = E( X1 e^{tX1} ), t ∈ (0, ∞). Now, using induction and similar arguments, one can complete the proof of part (i) (Problem 11.34).
Next consider (ii). Since e^{tx} ≤ I_{(−∞,0]}(x) + e^x I_{(0,∞)}(x) for all x ∈ R, t ∈ (0, 1), by the DCT, the first relation follows. For the second, note that
|x| e^{tx} I_{(−∞,0)}(x) ↑ |x| I_{(−∞,0]}(x) as t ↓ 0   (4.33)
and |x| e^{tx} ≤ |x| e^x for all 0 < t ≤ 1, x > 0. Hence, applying the MCT for x ∈ (−∞, 0] and the DCT for x ∈ (0, ∞), one obtains the second limit. Derivation of the third limit is left as an exercise (Problem 11.35).
370
11. Central Limit Theorems
To prove part (iii), fix x ∈ (µ, θ) and let γ(t) = tx − log φ(t), t ≥ 0. Then, for t ∈ (0, ∞),
γ^{(1)}(t) = x − φ^{(1)}(t)/φ(t)  and  −γ^{(2)}(t) = φ^{(2)}(t)/φ(t) − ( φ^{(1)}(t)/φ(t) )² = Var(Yt),   (4.34)
where Yt is a random variable with cdf P(Yt ≤ y) = ∫_{−∞}^{y} e^{tu} dF(u)/φ(t), y ∈ R. Since X1 is nondegenerate, so is Yt (for any t ≥ 0) and hence, Var(Yt) > 0. As a consequence, γ(·) is strictly concave on (0, ∞), and the supremum of γ(t) over (0, ∞) is attained at a solution of the equation γ^{(1)}(t) = 0, i.e., at t = ax satisfying (4.31). That such a solution exists and is unique follows from part (ii) and the facts that x > µ, that lim_{t↓0} φ^{(1)}(t)/φ(t) = µ (by (ii)) so that γ^{(1)}(0+) = x − µ > 0, and that φ^{(1)}(t)/φ(t) is continuous and strictly increasing on (0, ∞) with limit θ > x as t → ∞ (for any t ∈ (0, ∞), the derivative of φ^{(1)}(t)/φ(t) coincides with Var(Yt) = −γ^{(2)}(t), which is positive by (4.34)). This proves part (iii). □
Lemma 11.4.8: Let {Xn}n≥1 be as in Theorem 11.4.6. For t ∈ (0, ∞), let {Y_{t,n}}n≥1 be a sequence of iid random variables with cdf
P(Y_{t,1} ≤ y) = ∫_{−∞}^{y} e^{tu} dF(u)/φ(t), y ∈ R,
where F is the cdf of X1. Let νn and λn denote the probability distributions of Sn ≡ X1 + ··· + Xn and T_{n,t} = Y_{t,1} + ··· + Y_{t,n}, n ≥ 1, respectively. Then, for each n ≥ 1,
νn ≪ λn and (dνn/dλn)(x) = e^{−tx} φ(t)^n, x ∈ R.   (4.35)
Proof: The proof is by induction. Clearly, the assertion holds for n = 1. Next, suppose that (4.35) is true for some r ∈ N and let n = r + 1. Then, for any A ∈ B(R),
νn(A) = P( X1 + ··· + Xn ∈ A )
= ∫_{−∞}^{∞} P( X1 + ··· + X_{n−1} ∈ A − x ) dF(x)
= ∫_{−∞}^{∞} ∫_{A−x} (dν_{n−1}/dλ_{n−1})(u) dλ_{n−1}(u) dF(x)
= ∫_{−∞}^{∞} ∫_{A−x} e^{−tu} φ(t)^{n−1} dλ_{n−1}(u) dF(x)
= [φ(t)]^n ∫_{−∞}^{∞} ∫_{A−x} e^{−t(u+x)} dλ_{n−1}(u) dλ_1(x)
= [φ(t)]^n ∫_A e^{−tu} ( λ_{n−1} ∗ λ_1 )(du),
where ∗ denotes convolution. Since λ_{n−1} ∗ λ_1 = λn, the result follows. □
Proof of Theorem 11.4.6: Fix x ∈ (µ, θ). Note that by Markov's inequality, for any t > 0, n ≥ 1,
P(X̄n ≥ x) = P( e^{tX̄n} ≥ e^{tx} ) ≤ e^{−tx} E( e^{tX̄n} ) = exp( −tx + n log φ(t/n) ).
Hence,
n^{−1} log P( X̄n ≥ x ) ≤ −x · (t/n) + log φ(t/n) for all t > 0, n ≥ 1
⇒ lim sup_{n→∞} n^{−1} log P( X̄n ≥ x ) ≤ inf_{t>0} { −xt + log φ(t) } = −γ(x).   (4.36)
This yields the upper bound. Next it will be shown that
lim inf_{n→∞} n^{−1} log P( X̄n ≥ x ) ≥ −γ(x).   (4.37)
To that end, let {Y_{t,n}}n≥1, νn, and λn be as in Lemma 11.4.8. Also, let ax be as in (4.31). Then, for any y > x, t ∈ (ax, ∞), and n ≥ 1, by Lemma 11.4.8,
P( X̄n ≥ x ) = νn( [nx, ∞) )
= ∫_{[nx,∞)} e^{−tu} φ(t)^n dλn(u)
≥ ∫_{[nx,ny]} e^{−tu} φ(t)^n dλn(u)
≥ φ(t)^n e^{−tny} λn( [nx, ny] ).   (4.38)
Note that E Y_{t,1} = ∫ u dλ1(u) = φ^{(1)}(t)/φ(t). Since φ^{(1)}(·)/φ(·) is strictly increasing and continuous on (0, ∞), given y > x, there exists a t = t_y ∈ (ax, ∞) such that
y > φ^{(1)}(t)/φ(t) > φ^{(1)}(ax)/φ(ax) = x.   (4.39)
By the WLLN, for any y > x and t satisfying (4.39),
λn( [nx, ny] ) = P( x ≤ (Y_{t,1} + ··· + Y_{t,n})/n ≤ y ) → 1 as n → ∞.
Hence, from (4.38), it follows that
lim inf_{n→∞} n^{−1} log P( X̄n ≥ x ) ≥ −ty + log φ(t)
for all y > x and all t ∈ (ax, ∞) satisfying (4.39). Now, letting t ↓ ax first and then y ↓ x, one gets (4.37). This completes the proof of Theorem 11.4.6. □
Remark 11.4.1: If (4.27) holds and θ < ∞, then
P( X̄n ≥ θ ) = [ P(X1 = θ) ]^n,
so that
lim_{n→∞} n^{−1} log P( X̄n ≥ θ ) = log P(X1 = θ).
In this case, (4.28) holds for x = θ with γ(θ) = −log P(X1 = θ). For x > θ, (4.28) holds with γ(x) = +∞.
Remark 11.4.2: Suppose that there exists a t0 ∈ (0, ∞) such that, instead of (4.27), the following condition holds:
φ(t) = +∞ for all t > t0 and φ(t) < ∞ for all t ∈ (0, t0),
and φ′(t)/φ(t) increases to a finite limit θ0 as t ↑ t0. Then, θ must be +∞. In this case, it can be shown that (4.28) holds for all x ∈ (µ, θ0) (with the given definition of γ(x)) and that (4.28) holds for all x ∈ [θ0, ∞), with γ(x) ≡ t0 x − log φ(t0). See Theorem 9.6, Chapter 1, Durrett (2004).
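As a numerical illustration of Theorem 11.4.6 (a sketch, not part of the text; all function names are local choices): for X1 ∼ Bernoulli(1/2), the rate (4.29) works out to γ(x) = x log(2x) + (1 − x) log(2(1 − x)), and the exact binomial tail lets one compare −n^{−1} log P(X̄n ≥ x) with γ(x) directly.

```python
import math

def gamma_bernoulli_half(x):
    # gamma(x) = sup_{t>0} {tx - log phi(t)} with phi(t) = (1 + e^t)/2,
    # which equals the relative entropy x*log(2x) + (1-x)*log(2(1-x))
    return x * math.log(2 * x) + (1 - x) * math.log(2 * (1 - x))

def tail_prob(n, x):
    # exact P(X_bar_n >= x) = P(Binomial(n, 1/2) >= n*x)
    k0 = math.ceil(n * x)
    return sum(math.comb(n, k) for k in range(k0, n + 1)) / 2 ** n

x = 0.7
for n in (50, 200, 800):
    print(n, -math.log(tail_prob(n, x)) / n, gamma_bernoulli_half(x))
```

The printed ratios decrease toward γ(0.7) ≈ 0.0823 as n grows, in line with (4.28).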
11.4.4 The functional central limit theorem
Let {Xi}i≥1 be iid random variables with EX1 = 0, EX1² = 1. Let S0 = 0, Sn = Σ_{i=1}^n Xi, n ≥ 1. The central limit theorem says that as n → ∞,
Wn ≡ Sn/√n −→d N(0, 1).
Now let
Wn(j/n) = (1/√n) Sj, j = 0, 1, 2, . . . , n,   (4.40)
and for j/n ≤ t < (j + 1)/n, j = 0, 1, 2, . . . , n − 1, let
Wn(t) = Wn(j/n) + ( (t − j/n)/(1/n) ) [ Wn((j + 1)/n) − Wn(j/n) ]   (4.41)
be the function obtained by linear interpolation of {Wn(j/n) : 0 ≤ j ≤ n} on [0, 1]. Then, Wn(·) is a random element of the metric space S ≡ C[0, 1] of all real valued continuous functions on [0, 1] with the supremum metric ρ(f, g) ≡ sup{|f(t) − g(t)| : 0 ≤ t ≤ 1}. Let µn(·) be the probability distribution induced on C[0, 1] by Wn(·).
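The construction (4.40)-(4.41) is easy to simulate. The following sketch (illustrative only, assuming numpy is available; the function names are not from the text) builds the polygonal path and can be used to visualize the convergence asserted in Theorem 11.4.9 below.

```python
import numpy as np

def scaled_random_walk(n, rng):
    # values W_n(j/n) = S_j / sqrt(n), j = 0, 1, ..., n, as in (4.40)
    steps = rng.choice([-1.0, 1.0], size=n)   # any mean-0, variance-1 steps work
    s = np.concatenate(([0.0], np.cumsum(steps)))
    return s / np.sqrt(n)

def interpolate(values, t):
    # linear interpolation (4.41) of the grid values at a point t in [0, 1]
    n = len(values) - 1
    j = min(int(np.floor(n * t)), n - 1)
    frac = n * t - j
    return (1 - frac) * values[j] + frac * values[j + 1]

rng = np.random.default_rng(0)
w = scaled_random_walk(1000, rng)
print(interpolate(w, 0.5), w[500])   # the two agree at grid points
```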
By an application of the multivariate CLT it can be shown that for any k ∈ N and any 0 ≤ t1 ≤ t2 ≤ ··· ≤ tk ≤ 1, the joint distribution of (Wn(t1), Wn(t2), . . . , Wn(tk)) will converge to a k-variate normal distribution with mean vector (0, 0, . . . , 0) and covariance matrix Σ ≡ ((σij)), where σij = ti ∧ tj. It turns out that a C[0, 1]-valued random variable W(·), called the standard Brownian motion on [0, 1] (SBM [0, 1]), can be defined such that for any k ≥ 1 and any 0 ≤ t1 ≤ t2 ≤ ··· ≤ tk ≤ 1, (W(t1), . . . , W(tk)) has a k-variate normal distribution with mean vector (0, 0, . . . , 0) and covariance matrix Σ as above (see Chapter 15). Thus,
( Wn(t1), . . . , Wn(tk) ) −→d ( W(t1), . . . , W(tk) ).
If µ(·) is the probability distribution of W(·) on C[0, 1], then the above suggests that µn −→d µ. This is indeed true and is known as a functional central limit theorem. See Billingsley (1968, 1995) for more details.
Theorem 11.4.9: (Functional central limit theorem). Let {Xi}i≥1 be iid random variables with EX1 = 0, EX1² = 1. Let S0 = 0, Sn = Σ_{i=1}^n Xi, n ≥ 1, and for all n ≥ 1, let Wn(·) be the C[0, 1]-valued random element obtained by the linear interpolation of Wn(j/n) ≡ Sj/√n, j = 0, 1, 2, . . . , n, as defined in (4.41). Then
(i) there exists a C[0, 1]-valued random variable W(·) such that
Wn(·) −→d W(·) in C[0, 1],
(ii) W(·) is a Gaussian process with zero as its mean function and the covariance function C(s, t) = s ∧ t, 0 ≤ s, t ≤ 1.
Proof: (An outline). There are three steps.
Step I: The sequence of probability distributions {µn (·)}n≥1 on C[0, 1]
induced by {Wn (·)}n≥1 is tight (as defined in Chapter 9).
Step II: For any random element W̃ (·) of C[0, 1], the family of finite
dimensional joint distributions of {W̃ (t1 ), . . . , W̃ (tk )}, with k ranging over
N and 0 ≤ t1 ≤ t2 ≤ . . . ≤ tk ≤ 1 determines the probability distribution
of W̃ (·) in C[0, 1].
Step III: By Step I, every subsequence {µnj (·)} of {µn (·)} has a further
subsequence {µnjk (·)} that converges to some probability distribution µ
on C[0, 1]. (This is a generalization of Helly's selection theorem. It is
known as the Prohorov-Varadarajan theorem.) But then µ has the same
finite dimensional distribution as SBM [0,1] and hence, by Step II, the
measure µ is independent of the subsequence. Thus µn −→d µ. For a full
proof, see Billingsley (1968, 1995).
Corollary 11.4.10: For k ∈ N, let T : C[0, 1] → R^k be a continuous function from the metric space (C[0, 1], ρ) to R^k. Then
T( Wn(·) ) −→d T( W(·) ).
Some examples of such continuous functions on the metric space C[0, 1] are
T1(f) ≡ sup{f(x) : 0 ≤ x ≤ 1}, T2(f) ≡ inf{f(x) : 0 ≤ x ≤ 1}, and T3(f) ≡ sup{|f(x)| : 0 ≤ x ≤ 1}.   (4.42)
An application of Corollary 11.4.10 to the above choices of T yields
Corollary 11.4.11: Let {Xi}i≥1 be iid random variables with EX1 = 0, EX1² = 1, and let
Mn1 = max_{0≤j≤n} Sj, Mn2 = min_{0≤j≤n} Sj, and Mn3 = max_{0≤j≤n} |Sj|.
Then,
(1/√n) (Mn1, Mn2, Mn3) −→d (M1, M2, M3)
where
M1 = sup{W(x) : 0 ≤ x ≤ 1}, M2 = inf{W(x) : 0 ≤ x ≤ 1}, M3 = sup{|W(x)| : 0 ≤ x ≤ 1},
and where W(·) is the standard Brownian motion on [0, 1], as defined in Theorem 11.4.9.
Corollary 11.4.11 is useful in statistical inference in obtaining approximations to the sampling distributions of the statistics (Mn1 , Mn2 , Mn3 ) for
large n. The exact distribution of (M1 , M2 , M3 ) can be obtained by using
the reflection principle as discussed in Chapter 15.
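A quick Monte Carlo check of Corollary 11.4.11 (a sketch, not from the text; names are illustrative): by the reflection principle mentioned above, P(M1 > a) = 2P(W(1) > a) = 2(1 − Φ(a)) for a > 0, so Mn1/√n should have approximately this tail for large n.

```python
import math
import numpy as np

def tail_of_max(n, a, reps, rng):
    # Monte Carlo estimate of P(max_{0<=j<=n} S_j / sqrt(n) > a)
    count = 0
    for _ in range(reps):
        s = np.cumsum(rng.choice([-1.0, 1.0], size=n))
        if s.max() / math.sqrt(n) > a:
            count += 1
    return count / reps

rng = np.random.default_rng(1)
a = 1.0
est = tail_of_max(2000, a, 5000, rng)
limit = 2 * (1 - 0.5 * (1 + math.erf(a / math.sqrt(2))))  # 2 * (1 - Phi(a))
print(est, limit)   # both close to about 0.317
```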
11.4.5
Empirical process and Brownian bridge
Let {Ui}i≥1 be iid Uniform [0, 1] random variables. Let Fn(t) = (1/n) Σ_{i=1}^n I(Ui ≤ t), 0 ≤ t ≤ 1, be the empirical cdf of {U1, U2, . . . , Un}. Clearly, Fn(·) is a step function on [0, 1] with jumps of size 1/n at Un1 < Un2 < ··· < Unn, where (Un1, Un2, . . . , Unn) is the increasing rearrangement of (U1, U2, . . . , Un), i.e., the n order statistics based on (U1, U2, . . . , Un).
Let Yn(t) be the function obtained by linearly interpolating √n(Fn(t) − t), 0 ≤ t ≤ 1, from the values at the jump points (Un1, Un2, . . . , Unn). Then Yn(·) is a random element of the metric space (C[0, 1], ρ), the space of all real valued continuous functions on [0, 1] with the supremum metric ρ, where ρ(f, g) = sup{|f(t) − g(t)| : 0 ≤ t ≤ 1}. Let {B(t) : 0 ≤ t ≤ 1} be the standard Brownian motion, i.e., a C[0, 1]-valued random variable that is also a Gaussian process with mean zero and covariance function c(s, t) = s ∧ t, 0 ≤ s, t ≤ 1. Let B0(t) ≡ B(t) − tB(1), 0 ≤ t ≤ 1. Then B0(·) is a random element of C[0, 1]. It is also a Gaussian process with mean zero and covariance function c0(s, t) = s ∧ t − st − st + st = s ∧ t − st.
Theorem 11.4.12: Yn(·) −→d B0(·) in C[0, 1].
The proof is similar to that of Theorem 11.4.9. See Billingsley (1968, 1995) for details.
Definition 11.4.1: The process {B0(t) : 0 ≤ t ≤ 1} is called the Brownian bridge and the process {Yn(t) : 0 ≤ t ≤ 1} is called the empirical process based on {U1, U2, . . . , Un}.
Now recall that by the Glivenko-Cantelli theorem (cf. Chapter 8),
sup{|Fn(t) − t| : 0 ≤ t ≤ 1} → 0 w.p. 1.
Since T(f) ≡ sup{|f(x)| : 0 ≤ x ≤ 1} is a continuous map from C[0, 1] to R⁺, Theorem 11.4.12 leads to
Corollary 11.4.13: √n sup{|Fn(t) − t| : 0 ≤ t ≤ 1} −→d sup{|B0(t)| : 0 ≤ t ≤ 1}.
This, in turn, can be used to find the asymptotic distribution of the Kolmogorov-Smirnov statistic (see (4.43) below).
Let {Xi}i≥1 be iid random variables with a continuous distribution function F(·). Let Fn(x) ≡ (1/n) Σ_{i=1}^n I(Xi ≤ x) be the empirical distribution function based on (X1, X2, . . . , Xn). Let Ui = F(Xi), i = 1, 2, . . . , n. Then {Ui}i≥1 are iid Uniform (0, 1) random variables. It can be shown that the Kolmogorov-Smirnov statistic
KS(Fn) ≡ √n sup{|Fn(x) − F(x)| : x ∈ R}
has the same distribution as sup{|Yn(t)| : 0 ≤ t ≤ 1} and hence
KS(Fn) −→d M0 ≡ sup{|B0(t)| : 0 ≤ t ≤ 1}.   (4.43)
This can be used to test the hypothesis that F (·) is the cdf of {Xi }i≥1 by
rejecting it if KS(Fn ) is large. To decide on what is large, the distribution
of KS(Fn ) can be approximated by that of M0 . Thus, if the significance
level is α, 0 < α < 1, then one determines a value C0 such that P (M0 >
C0 ) = α and rejects the hypothesis that F is the distribution of {Xi }i≥1 if
KS(Fn ) > C0 and accepts it, otherwise.
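As an illustration of how (4.43) is used in practice (a sketch under the stated setup, not the authors' code; names are illustrative): compute KS(Fn) from a sample and approximate P(M0 > c) by the classical series P(M0 > c) = 2 Σ_{k≥1} (−1)^{k−1} exp(−2k²c²), a standard formula for the supremum of the Brownian bridge.

```python
import math
import numpy as np

def ks_statistic(sample, cdf):
    # KS(F_n) = sqrt(n) * sup_x |F_n(x) - F(x)| for a continuous cdf F
    x = np.sort(sample)
    n = len(x)
    f = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - f)
    d_minus = np.max(f - np.arange(0, n) / n)
    return math.sqrt(n) * max(d_plus, d_minus)

def kolmogorov_tail(c, terms=100):
    # P(M0 > c) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 c^2)
    return 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * c * c)
                   for k in range(1, terms + 1))

rng = np.random.default_rng(2)
sample = rng.exponential(size=500)
stat = ks_statistic(sample, lambda x: 1 - np.exp(-x))   # true cdf of Exp(1)
print(stat, kolmogorov_tail(stat))   # large tail probability: no evidence against F
```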
11.5 Problems
11.1 Show that the triangular array {Xnj : 1 ≤ j ≤ n}n≥1 , with Xnj as
in (1.24), is a null array, i.e., satisfies (1.15) iff (1.22) holds.
11.2 Construct an example of a triangular array {Xnj : 1 ≤ j ≤ rn }n≥1 of
independent random variables such that for any 1 ≤ j ≤ rn , n ≥ 1,
E|Xnj |α = ∞ for all α ∈ (0, ∞), but there exist sequences {an }n≥1 ⊂
(0, ∞) and {bn }n≥1 ⊂ R such that
Sn − bn
−→d N (0, 1).
an
11.3 Let {Xn }n≥1 be a sequence of independent random variables with
P (Xn = ±1) =
1
1
1
− √ , P (Xn = ±n2 ) = √ , n ≥ 1.
2 2 n
2 n
Find constants {an }n≥1 ⊂ (0, ∞) and {bn }n≥1 ⊂ R such that
$n
j=1 Xj − bn
−→d N (0, 1).
an
11.4 Let {Xn}n≥1 be a sequence of independent random variables such that for some α ≥ 1/2,
P(Xn = ±n^α) = n^{1−2α}/2 and P(Xn = 0) = 1 − n^{1−2α}, n ≥ 1.
Let Sn = Σ_{j=1}^n Xj and s²n = Var(Sn).
(a) Show that for all α ∈ [1/2, 1),
Sn/sn −→d N(0, 1).   (5.1)
(b) Show that (5.1) fails for α ∈ [1, ∞).
(c) Show that for α > 1, {Sn}n≥1 converges to a random variable S w.p. 1 and that sn → ∞.
11.5 Let {Xnj : 1 ≤ j ≤ rn}n≥1 be a triangular array of independent zero mean random variables satisfying the Lindeberg condition. Show that
E( max_{1≤j≤rn} X²nj / s²n ) → 0 as n → ∞,
where s²n = Σ_{j=1}^{rn} Var(Xnj), n ≥ 1.
$n
11.6 Let {Xn }n≥1 be a sequence of random variables. Let Sn = j=1 Xj
$n
and s2n = j=1 EXj2 < ∞, n ≥ 1. If s2n → ∞ then show that
lim
n→∞
⇐⇒
s−2
n
lim s−2
n
n→∞
n
)
j=1
n
)
j=1
EXj2 I(|Xj | > #sn ) = 0
for all # > 0
EXj2 I(|Xj | > #sj ) = 0
for all # > 0.
(Hint: Verify that for any δ > 0,
$
j:sj <δsn
EXj2 ≤ δ 2 s2n .)
11.7 For a sequence of random variables {Xn }n≥1 and for a real number
r > 2, show that
lim s−r
n
n→∞
⇐⇒
lim s−r
n
n→∞
where s2n =
n
)
j=1
n
)
j=1
$n
j=1
E|Xj |r I(|Xj | > #sn ) = 0
E|Xj |r = 0,
for all # ∈ (0, ∞),
(5.2)
EXj2 .
11.8 Let {Xn }n≥1 be a sequence of zero mean independent random variables satisfying (5.2) for r = 4.
(a) Show that
k
k
lim E(s−1
n Sn ) = EZ
n→∞
for all k = 2, 3, 4, where Z ∼ N (0, 1).
:! "4 ;
(b) Show that Ssnn
is uniformly integrable.
n≥1
! Sn "
(c) Show that lim Eh sn = Eh(Z) where h(·) : R → R is continn→∞
uous and h(x) = O(|x|4 ) as |x| → ∞.
11.9 Let {Xn}n≥1 be a sequence of independent random variables such that
P(Xn = ±1) = 1/4, P(Xn = ±n) = 1/(4n²), and P(Xn = 0) = (1/2)(1 − 1/n²), n ≥ 1.
(a) Show that the triangular array {Xnj : 1 ≤ j ≤ n}n≥1 with Xnj = Xj/√n, 1 ≤ j ≤ n, n ≥ 1, does not satisfy the Lindeberg condition.
(b) Show that there exists σ ∈ (0, ∞) such that
Sn/√n −→d N(0, σ²).
Find σ².
11.10 Let {Xj }j≥1 be independent random variables such that Xj has Uniform [−j, j] distribution. Show that the Lindeberg-Feller condition
holds for the triangular array Xnj = Xj /n3/2 , 1 ≤ j ≤ n, n ≥ 1.
11.11 (CLT for random sums). Let {Xi }i≥1 be iid random variables with
EX1 = 0, EX12 = 1. Let {Nn }n≥1 be a sequence of positive integer
valued random variables such that Nnn −→p c, 0 < c < ∞. Show that
S
√Nn −→d N (0, 1).
N
n
(Hint:
Use Kolmogorov’s first (inequality (cf. 8.3.1) to show that
'
|SNn −Snc |
√
> λ, |Nn − nc| < n# ≤ λ$2 for any # > 0, λ > 0.)
P
n
11.12 Let {N(t) : t ≥ 0} be the renewal process as defined in (5.1) of Section 8.5. Assume EX1 = µ ∈ (0, ∞), EX1² < ∞. Show that
(N(t) − t/µ)/√t −→d N(0, σ²)   (5.3)
for some 0 < σ² < ∞. Find σ².
(Hint: Use
( S_{N(t)} − N(t)µ )/√N(t) ≤ ( t − N(t)µ )/√N(t) ≤ ( S_{N(t)+1} − (N(t) + 1)µ )/√N(t) + µ/√N(t)
and the fact that N(t)/t → 1/µ w.p. 1.)
11.13 Let {N (t) : t ≥ 0} be as in the above problem. Give another proof of
(5.3) by using the CLT for {Sn }n≥1 and the relation P (N (t) > n) =
P (Sn < t) for all t, n.
11.14 Let {Xj }j≥1 be iid random variables with distribution P (X1 = 1) =
1/2 = P (X1 = −1). Show that there exist positive integer valued
S
random variables {rk }k≥1 such that rk → ∞ w.p. 1, but √rrkk does
not converge in distribution.
(Hint: Let r1 = min{n :
rk+1 = min{n : n >
Sn
rk , √
n
Sn
√
n
> 1} and for k ≥ 1, define recursively
> k + 1}.)
11.15 (CLT for sample quantiles). Let {Xi}i≥1 be iid random variables. Let 0 < p < 1 and let Yn ≡ Fn^{−1}(p) = inf{x : Fn(x) ≥ p}, where Fn(x) ≡ (1/n) Σ_{i=1}^n I(Xi ≤ x) is the empirical cdf based on X1, X2, . . . , Xn. Assume that the cdf F(x) ≡ P(X1 ≤ x) is differentiable at F^{−1}(p) ≡ inf{x : F(x) ≥ p} and that λp ≡ F′(F^{−1}(p)) > 0. Then show that √n( Yn − F^{−1}(p) ) −→d N(0, σ²), where σ² = p(1 − p)/λ²p.
(Hint: Use the identity P(Yn ≤ x) = P(Fn(x) ≥ p) for all x and p.)
11.16 (A coupon collector's problem). For each n ∈ N, let {Xni}i≥1 be iid random variables such that P(Xn1 = j) = 1/n, 1 ≤ j ≤ n. Let Tn0 = 1, Tn1 = min{j : j > 1, Xnj ≠ Xn1}, and Tn(i+1) = min{j : j > Tni, Xnj ∉ {X_{n,Tnk} : 0 ≤ k ≤ i}}. That is, Tni is the first time the sample has (i + 1) distinct elements. Suppose kn ↑ ∞ such that kn/n → θ, 0 < θ < 1. Show that for some an, bn,
( T_{n,kn} − an )/bn −→d N(0, 1).
(Hint: Let Ynj = Tnj − Tn(j−1), j = 1, 2, . . . , n − 1. Show that for each n, {Ynj}j=1,2,... are independent, with Ynj having a geometric distribution with parameter 1 − j/n. Now apply Lyapounov's criterion to the triangular array {Ynj : 1 ≤ j ≤ kn}.)
11.17 Prove Theorem 11.1.6.
11.18 Let {Xn }n≥1 be a sequence of iid random
variables with EXn = 0
$n
and EXn2 = σ 2 ∈ (0, ∞). Let Sn = j=1 Xj , n ≥ 1. For each k ∈ N,
find the limit distribution of the k-dimensional vector(s).
'
(
S −S
Sn S2n
√−Sn , . . . , nk √n(k−1) ,
(a) √
,
n
n
n
'
(
Snak
Sna1 Sna2
√ , √ ,..., √
(b)
, where 0 < a1 < a2 < · · · < ak < ∞ are
n
n
n
given real numbers,
(
'
S
−S
S3n
√−Sn , . . . , (k+1)n√ (k−1)n .
(c) S√2n
,
n
n
n
11.19 For any random variable X, show that EX 2 < ∞ implies
y 2 P (|x| > y)
→0
E(X 2 I(|x| ≤ y))
as y → ∞. Give an example to show that the converse is false.
(Hint: Consider a random variable X with pdf f (x) = c1 |x|−3 for
|x| > 2.)
11.20 Let {Xn }n≥1 be a sequence of iid random variables with common
distribution
%
P (X1 ∈ A) =
|x|−3 I(|x| > 1)dx, A ∈ B(R).
A
Find sequences {an }n≥1 ⊂ (0, ∞) and {bn }n≥1 ⊂ R such that
Sn − bn
−→d N (0, 1)
an
where Sn =
$n
j=1
Xj , n ≥ 1.
11.21 Show using characteristic functions that if $
X1 , X2 , . . . , Xk are iid
k
Cauchy (µ, σ 2 ) random variables, then Sk ≡ i=1 Xi has a Cauchy
(kµ, kσ) distribution.
11.22 Show that if a random variable Y1 has pdf f as in (2.4), then (2.9)
holds with α = 1/2.
$n
−1
−1
11.23 If {Yn }n≥1 are iid Gamma
−→d W , where
i=1 Yi
" then n
! 2 1 (1,2),
W has pdf fW (w) ≡ π 1+w2 · I(0,∞) (w).
11.24 Let X be a nonnegative random variable such that P (X ≤ x) ∼
xα L(x) as x ↓ 0 for some α > 0 and L(·) slowly varying at 0. Let
Y = X −β , β > 0. Show that
P (Y > y) ∼ y −γ L̃(y)
as
y↑∞
for some γ > 0 and L̃(·) slowly varying at ∞.
11.25 Let {Xi }i≥1 be iid Beta (m, n) random variables. Let Yi = Xi−β ,
β > 0, i ≥"1. Show that there exist sequences {an }n≥1 and {bn }n≥1
n
Y −a
such that i=1bni n −→d a stable law of order γ for some γ in (0, 2].
11.26 Let {Xi }i≥1 be iid Uniform [0, 1] random variables.
(a) Show that for each 0 < β < 12 , there exist constants µ and σ 2
such that
.
-)
n
1
−β
√
X − nµ −→d N (0, 1).
σ n i=1 i
(b) Show that for each 12 < β < 1, there exist a constant 0 < γ < 2
and sequences {an }n≥1 and {bn }n≥1 such that
1
bn
-)
n
i=1
Xi−β
− an
.
−→d
a stable law of order
γ.
11.27 Prove (4.7).
(Hint: Use (4.5) and Lemma 10.1.5.)
11.28 (a) Show that the Gamma (α, β) distribution is infinitely divisible,
0 < α, β < ∞.
!
"
(b) Let :µ& be a finite measure
on R, β(R) . Show that φ(t) ≡
;
exp
(eιtu − 1)µ(du) is the characteristic function of an infinitely divisible distribution.
11.29 Let {Xn }n≥1 be iid random variables with P (X1 = 0) = P (X1 =
1) = 12 . Show that there exists a constant C1 ∈ (0, ∞) such that
∆n ≥ C1 n−1/2
for all n ≥ 1,
where ∆n is as in (4.1).
11.30 Let X1 be a random variable such that the absolutely continuous
component βFac (·) in the decomposition (4.5.3) of the cdf F of X1 is
nonzero. Show that X1 satisfies Cramer’s condition (4.13).
(Hint: Use the Riemann-Lebesgue lemma.)
11.31 (Berry-Esseen theorem for sample quantile). Let {Xn }n≥1 be a collection of iid random variables with
$ncdf F (·). Let 0 < p < 1 and
Yn = Fn−1 (p), where Fn (x) = n−1 i=1 I(X1 ≤ x), x ∈ R. Suppose
that F (·) is twice differentiable in a neighborhood of ξp ≡ F −1 (p)
and F $ (ξp ) ∈ (0, ∞). Show that
J '√
J
(
J
J
n(Yn − ξp )/σp ≤ x − Φ(x)J = O(n−1/2 ) as n → ∞
sup JP
x∈R
!
"2
where σp = p(1 − p)/ F $ (ξp ) .
(Hint: Use the identity P (Yn ≤ x) =√P (Fn (x) ≥ p) for all x and
p, apply Theorem 11.4.1 √
to Fn (x) for n|x − ξp | ≤ log n, and use
monotonicity of cdfs for n|x − ξp | > log n. See Lahiri (1992) for
more details. Also, see Reiss (1974) for a different proof.)
11.32 (A moderate deviation bound). Let {Xn}n≥1 be a sequence of iid random variables with EX1 = µ, Var(X1) = σ² ∈ (0, ∞) and E|X1|³ < ∞. Show that
P( √n |X̄n − µ| > σ √(log n) ) = O(n^{−1/2}) as n → ∞.
(Hint: Apply Theorem 11.4.1.)
It can be shown that the bound on the right side is indeed o(n^{−1/2}) as n → ∞. For a more general version of this result, see Götze and Hipp (1978).
11.33 Show that the values of the functions en,2 (x) of (4.14) and ẽn,2 (x) of
(4.24) are not necessarily nonnegative for all x ∈ R.
11.34 Complete the proof of Lemma 11.4.7 (i).
(Hint: Suppose that for some r ∈ N, φ is r-times differentiable with
its rth derivative given by (4.30). Then, for t ∈ (0, ∞),
B
C
h−1 φ(r) (t + h) − φ(r) (t) =
%
∞
−∞
ehx − 1 r tx
· x e F (dx)
h
and the integrand is bounded by the integrable function |x|r g(x) for
all x ∈ R, 0 < |h| < t/2, where g(·) is as in (4.32). Now apply the
DCT.)
11.35 Under the conditions of Lemma 11.4.7, show that
φ(1) (t)
= θ.
t→∞ φ(t)
lim
(Hint: Consider the cases ‘θ ∈ R’ and ‘θ = ∞’ separately.)
11.36 Find the function γ(x) of (4.28) in each of the following cases:
(a) X1 ∼ N (µ, σ 2 ),
(b) X1 ∼ Gamma (α, β),
(c) X1 ∼ Uniform (0, 1).
11.37 Verify that the functions Ti , i = 1, 2, 3 defined by (4.42) are continuous on C[0, 1].
12
Conditional Expectation and
Conditional Probability
12.1 Conditional expectation: Definitions and
examples
This section motivates the definition of conditional expectation for random
variables with a finite variance through a mean square error prediction
problem. The definition is then extended to integrable random variables
by an approximation argument (cf. Definition 12.1.3). The more standard
approach of proving the existence of conditional expectation by the use of
Radon-Nikodym theorem is also outlined.
Let (X, Y ) be a bivariate random vector. A standard problem in regression analysis is to predict Y having observed X. That is, to find a function
h(X) that predicts Y . A common criterion for measuring the accuracy of
such a predictor is the mean squared error E(Y − h(X))2 . Under the assumption that E|Y |2 < ∞, it can be shown that there exists a unique
h0 (X) that minimizes the mean squared error.
Theorem 12.1.1: Let (X, Y) be a bivariate random vector. Let EY² < ∞. Then there exists a Borel measurable function h0 : R → R with E( h0(X) )² < ∞, such that
E( Y − h0(X) )² = inf{ E( Y − h(X) )² : h(X) ∈ H0 },   (1.1)
where H0 = { h(X) | h : R → R is Borel measurable and E( h(X) )² < ∞ }.
Proof: Let H be the space of all Borel measurable functions of (X, Y )
that have a finite second moment. Let H0 be the subspace of all Borel
measurable functions of X that have a finite second moment. It is known
that H0 is a closed subspace of H (Problem 12.1) and that for any Z in H,
there exists a unique Z0 in H0 such that
;
:
E(Z − Z0 )2 = min E(Z − Z1 )2 : Z1 ∈ H0 .
Further, Z0 is the unique random variable (up to equivalence w.p. 1) such
that
(1.2)
E(Z − Z0 )Z1 = 0 for all Z1 ∈ H0 .
A proof of this fact is given at the end of this section in Theorem 12.1.6.
If one takes Z to be Y , then this Z0 is the desired h0 (X).
!
Remark 12.1.1: The random variable Z0 in (1.2) is known as the projection of Y onto H0 .
It is known that for any random variable Y with EY 2 < ∞, the constant
c that minimizes E(Y − c)2 over all c ∈ R is c = EY , the expected value
of Y . By analogy with this, one is led to the following definition.
Definition 12.1.1: For any bivariate random vector (X, Y ) with EY 2 <
∞, the conditional expectation of Y given X, denoted as E(Y |X), is the
function h0 (X) of Theorem 12.1.1. Note that h0 (X) is determined up to
equivalence w.p. 1. Any such h0 (X) is called a version of E(Y |X).
From (1.2) in the proof of Theorem 12.1.1, by taking Z = Y, Z1 = IB(X), one finds that Z0 = h0(X) satisfies
E( Y IA ) = E( h0(X) IA )   (1.3)
for every event A of the form X^{−1}(B), where B ∈ B(R). Conversely, it can be shown that (1.3) implies (1.2), by the usual approximation procedure (Problem 12.1). From (1.3), the function h0(X) is determined w.p. 1. So one can take (1.3) to be the definition of h0(X). In statistics, the function E(Y|X) is called the regression of Y on X.
The function h0(x) can be determined explicitly in the following two special cases.
If X is a discrete random variable with values x1, x2, . . ., then (1.3) implies, by taking A = {X = xi}, that
h0(xi) = E( Y I(X = xi) )/P(X = xi), i = 1, 2, . . . .   (1.4)
Similarly, if (X, Y) has an absolutely continuous distribution with joint probability density f(x, y), it can be shown that w.p. 1, E(Y|X) = h0(X), where
h0(x) = [ ∫ y f(x, y) dy ] / fX(x)   (1.5)
if fX(x) > 0 and 0 otherwise, where fX(x) = ∫ f(x, y) dy is the probability density function of X (Problem 12.2).
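A small numerical sketch of formula (1.4) (illustrative names only, not from the text): for a discrete X, the quantity E[Y I(X = xi)]/P(X = xi) is exactly the within-group average of Y over {X = xi}, which is how E(Y|X) is typically estimated from data.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.integers(0, 3, size=10000)        # X takes values 0, 1, 2
y = x ** 2 + rng.normal(size=10000)       # Y = X^2 + noise, so E(Y|X=i) = i^2

for i in range(3):
    mask = (x == i)
    # empirical version of (1.4): E[Y I(X=i)] / P(X=i) = average of Y over {X=i}
    h0_i = y[mask].mean()
    print(i, h0_i)    # close to 0, 1, 4
```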
The definition of E(Y |X) can be generalized to the case when X is a
vector and more generally, as follows.
Theorem 12.1.2: Let (Ω, F, P ) be a probability space and G ⊂ F be a σalgebra. Let H ≡ L2 (Ω, F, P ) and H0 = L2 (Ω, G, P ). Then for any Y ∈ H,
there exist a Z0 ∈ H0 such that
E(Y − Z0 )2 = inf{E(Y − Z)2 : Z ∈ H0 }
(1.6)
and this Z0 is determined w.p. 1 by the condition
E(Y IA ) = E(Z0 IA )
for all
A ∈ G.
(1.7)
The proof is similar to that of Theorem 12.1.1.
Definition 12.1.2: The random variable Z0 in (1.7) is called the conditional expectation of Y given G and is written as E(Y |G).
When G = σ⟨X⟩, the σ-algebra generated by a random variable X, E(Y|G) reduces to E(Y|X) in Definition 12.1.1. The following properties of E(Y|G) are easy to verify by using the defining equation (1.7) (Problem 12.3).
Proposition 12.1.3: Let Y and G be as in Theorem 12.1.2.
(i) Y ≥ 0 w.p. 1 ⇒ E(Y|G) ≥ 0 w.p. 1.
(ii) Y1, Y2 ∈ H ⇒ E( (αY1 + βY2)|G ) = αE(Y1|G) + βE(Y2|G) for any α, β ∈ R.
(iii) Y1 ≥ Y2 w.p. 1 ⇒ E(Y1|G) ≥ E(Y2|G) w.p. 1.
Using a natural approximation procedure it is possible to extend the
definition of E(Y |G) to all random variables with just the first moment,
i.e., E|Y | < ∞. This is done in the following result.
Theorem 12.1.4: Let (Ω, F, P ) be a probability space and G ⊂ F be a
sub-σ-algebra. Let Y : Ω → R be a F-measurable random variable with
E|Y | < ∞. Then there exists a random variable Z0 : Ω → R that is Gmeasurable, E|Z0 | < ∞, and is uniquely determined (up to equivalence
w.p. 1) by
E(Y IA ) = E(Z0 IA ) for all A ∈ G.
(1.8)
Proof: Since Y can be written as Y = Y + − Y − , it is enough to consider
the case Y ≥ 0 w.p. 1. Let Yn = min{Y, n} for n = 1, 2, . . .. Then EYn2 < ∞
and by Theorem 12.1.2, Zn ≡ E(Yn |G) is well defined, it is G-measurable,
and satisfies
E(Yn IA ) = E(Zn IA ) for all A ∈ G.
(1.9)
Since 0 ≤ Yn ≤ Yn+1 , by Proposition 12.1.3, 0 ≤ Zn ≤ Zn+1 w.p. 1 and
so there exists a set B ∈ G such that P (B) = 0 and on B c , {Zn }n≥1 is
nondecreasing and nonnegative. Let Z0 = limn→∞ Zn on B c and 0 on B.
Then Z0 is G-measurable. Applying the MCT to both sides of (1.9), one
gets
E(Y IA ) = E(Z0 IA ) for all A ∈ G.
This proves the existence of a G-measurable Z0 satisfying (1.8). The
uniqueness follows from the fact that if Z0 and Z0$ are G-measurable with
E|Z0 | < ∞, E|Z0$ | < ∞ and
EZ0 IA = EZ0$ IA
for all A ∈ G,
then Z0 = Z0$ w.p. 1 (Problem 12.3).
!
Remark 12.1.2: An alternative to the proof of Theorem 12.1.4 above
leading to the definition of E(Y |G) is via the Radon-Nikodym theorem.
Here is an outline of this proof. Let Y be a nonnegative random variable
with E|Y | < ∞. Set µ(A) ≡ E(Y IA ) for all A ∈ G. Then µ is a measure
on (Ω, G) and it is dominated by PG , the restriction of P to G. By the
Radon-Nikodym theorem, there is a G-measurable function Z such that
%
E(Y IA ) = µY (A) =
ZdPG = EZIA .
A
Extension to the case when Y is real-valued with E|Y | < ∞, is via the
decomposition Y = Y + − Y − .
Remark 12.1.3: The arguments in the proof of Theorem 12.1.4 (and
Problem 12.3) show that the conclusion of the theorem holds for any nonnegative random variable Y for which EY may or may not be finite.
Definition 12.1.3: Let Y be a F-measurable random variable on a probability space (Ω, F, P ) such that either Y is nonnegative or E|Y | < ∞. A
random variable Z0 that is G-measurable and satisfies (1.8) is called the
conditional expectation of Y given G and is written as E(Y |G).
The following are some important consequences of (1.8):
(i) If Y is G-measurable then E(Y |G) = Y .
(ii) If G = F, then E(Y |G) = Y .
(iii) If G = {∅, Ω}, then E(Y |G) = EY .
(iv) By taking A to be Ω in (1.8),
EY = E( E(Y|G) ).
Furthermore, Proposition 12.1.3 extends to this case. When G = σ⟨X⟩ with X discrete, (1.4) holds provided E|Y| < ∞.
Part (iv) is useful in computing EY without explicitly determining the distribution of Y. Suppose E(Y|X) = m(X) and Em(X) is easy to compute but finding the distribution of Y is not so easy. Then EY can still be computed as Em(X). For example, let (X, Y) have a bivariate distribution with pdf
fX,Y(x, y) = (1/(√(2π)|x|)) e^{−(y−x)²/(2x²)} · (1/√(2π)) e^{−(x−1)²/2} if x ≠ 0, and fX,Y(x, y) = 0 if x = 0,
for (x, y) ∈ R². In this case, evaluating fY(y) is not easy. On the other hand, it can be verified that for each x ≠ 0,
m(x) ≡ ∫ y fX,Y(x, y) dy / fX(x) = x,
and that fX(x) = (1/√(2π)) e^{−(x−1)²/2}. Thus, EY = EX = 1. For more examples of this type, see Problem 12.29.
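A Monte Carlo check of the iterated-expectation computation above (a sketch; variable names are illustrative): under the stated joint density, X ∼ N(1, 1) and, given X = x ≠ 0, Y ∼ N(x, x²), so EY = E[E(Y|X)] = EX = 1 even though fY is awkward to write down.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=1.0, scale=1.0, size=200000)   # X ~ N(1, 1)
y = rng.normal(loc=x, scale=np.abs(x))            # given X = x, Y ~ N(x, x^2)
print(x.mean(), y.mean())                         # both close to 1
```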
The next proposition lists some useful properties of the conditional expectation.
Proposition 12.1.5: Let (Ω, F, P ) be a probability space and let Y be a
F-measurable random variable with E|Y | < ∞. Let G1 ⊂ G2 ⊂ F be two
sub-σ-algebras contained in F.
(i) Then
"
!
E(Y |G1 ) = E E(Y |G2 )|G1 .
(1.10)
(ii) For any bounded G1 -measurable random variable U ,
E(Y U |G1 ) = U E(Y |G1 ).
(1.11)
Proof: (i) Let A ∈ G1 , Z1 = E(Y |G1 ), and Z2 = E(Y |G2 ). Then E(Y IA ) =
E(Z1 IA ) by the definition of Z1 . Since G1 ⊂ G2 , A ∈ G2 and again by the
definition of Z2 ,
E(Y IA ) = E(Z2 IA ).
Thus,
E(Z2 IA ) = E(Z1 IA )
for all A ∈ G1
and by the definition of E(Z2 |G1 ), it follows that Z1 = E(Z2 |G1 ), proving
(i).
(ii) Let Z1 = E(Y |G1 ). If U = IB some B ∈ G1 , then for any A ∈ G1 ,
A ∩ B ∈ G1 and by (1.8),
EY IB IA = EY IA∩B = E(Z1 IA∩B ) = E(Z1 IB · IA ).
So in this case E(Y U |G1 ) = Z1 U . By linearity (Proposition 12.1.3 (ii)),
it extends to all U that are simple and G1 -measurable. For any bounded
G1 -measurable U , there exists a sequence of bounded, G1 -measurable, and
simple random variables {Un }n≥1 that converge to U uniformly. Hence, for
any A ∈ G1 and for n ≥ 1,
EY Un IA = EZ1 Un IA .
The bounded convergence theorem applied to both sides yields
EY U IA = EZ1 U IA .
Since Z1 and U are both G1 -measurable, (ii) follows.
!
Remark 12.1.4: If the random variable U in Proposition 12.1.5 is G1 measurable and E|Y U | < ∞, then part (ii) of the proposition holds. The
proof needs a more careful approximation (see Billingsley (1995), pp. 447).
An Approximation Theorem
Theorem 12.1.6: Let H be a real Hilbert space and M be a nonempty closed convex subset of H. Then for every v ∈ H, there is a unique u0 ∈ M such that
‖v − u0‖ = inf{ ‖v − u‖ : u ∈ M },   (1.12)
where ‖x‖² = ⟨x, x⟩, with ⟨x, y⟩ denoting the inner product in H.
Proof: Let δ = inf{ ‖v − u‖ : u ∈ M }. Then, δ ∈ [0, ∞). By definition, there exists a sequence {un}n≥1 ⊂ M such that
‖v − un‖ → δ.
Also note that in any inner product space, the parallelogram law holds, i.e., for any x, y ∈ H,
‖x + y‖² + ‖x − y‖² = 2( ‖x‖² + ‖y‖² ).
Thus
‖2v − (un + um)‖² + ‖un − um‖² = 2( ‖v − un‖² + ‖v − um‖² ).   (1.13)
Since M is convex, (un + um)/2 ∈ M, implying that ‖v − (un + um)/2‖² ≥ δ². This, with (1.13), implies that
lim sup_{m,n→∞} ‖un − um‖² = 0,
making {un}n≥1 a Cauchy sequence. Since H is a Hilbert space, there exists a u0 ∈ H such that {un}n≥1 converges to u0. Also, since M is closed, u0 ∈ M. Since ‖v − un‖ → δ, it follows that ‖v − u0‖ = δ.
To show the uniqueness, let u0′ ∈ M also satisfy ‖v − u0′‖ = δ. Then, as in (1.13),
‖v − (u0 + u0′)/2‖² + ‖(u0 − u0′)/2‖² = δ²,
implying ‖u0 − u0′‖ = 0. □
Remark 12.1.5: The above theorem holds if M is a closed subspace of H.
12.2 Convergence theorems
From Proposition 12.1.3, it is seen that E(Y |G) is monotone and linear
in Y , suggesting that it behaves like an ordinary expectation. A natural
question is whether under appropriate conditions, the basic convergence
results extend to conditional expectations (CE). The answer is ‘yes,’ as
shown by the following results.
Theorem 12.2.1: (Monotone convergence theorem for CE). Let (Ω, F, P )
be a probability space and G ⊂ F be a sub-σ-algebra of F. Let {Yn }n≥1 be
a sequence of nonnegative F-measurable random variables such that 0 ≤
Yn ≤ Yn+1 w.p. 1. Let Y ≡ lim Yn w.p. 1. Then
n→∞
lim E(Yn |G) = E(Y |G) w.p. 1.
n→∞
(2.1)
Proof: By Proposition 12.1.3 (i), Zn ≡ E(Yn |G) is monotone nondecreasing in n, w.p. 1, and so Z ≡ limn→∞ Zn exists w.p. 1. By the MCT, for all
A ∈ G,
E(Y IA ) = lim EYn IA = lim E(Zn IA ) = E(ZIA ).
n→∞
n→∞
Thus, Z = E(Y |G) w.p. 1, proving (2.1).
!
Theorem 12.2.2: (Fatou’s lemma for CE). Let {Yn }n≥1 be a sequence
of nonnegative random variables on a probability space (Ω, F, P ) and let G
be a sub-σ-algebra of F. Then
lim inf E(Yn |G) ≥ E(lim inf Yn |G).
n→∞
n→∞
(2.2)
Proof: Let Ỹn = inf j≥n Yj . Then {Ỹn }n≥1 is a sequence of nonnegative
nondecreasing random variables and limn→∞ Ỹn = lim inf n→∞ Yn . By the
previous theorem,
lim E(Ỹn |G) = E(lim inf Yn |G).
n→∞
n→∞
(2.3)
Also, since Ỹn ≤ Yj for each j ≥ n,
E(Ỹn |G) ≤ E(Yj |G)
for each
j ≥ n w.p. 1
implying that E(Ỹn |G) ≤ inf j≥n E(Yj |G) w.p. 1. The right side converges
!
to lim inf n→∞ E(Yn |G) w.p. 1. Now (2.2) follows from (2.3).
It is easy to deduce from Fatou’s lemma the following result (Problem
12.4).
Theorem 12.2.3: (Dominated convergence theorem for CE). Let {Yn }n≥1
and Y be random variables on a probability space (Ω, F, P ) and let G be
a sub-σ-algebra of F . Suppose that limn→∞ Yn = Y w.p. 1 and that there
exists a random variable Z such that |Yn | ≤ Z w.p. 1 and EZ < ∞. Then
lim E(Yn |G) = E(Y |G)
n→∞
w.p. 1.
(2.4)
Theorem 12.2.4: (Jensen's inequality for CE). Let φ : (a, b) → R be convex for some −∞ ≤ a < b ≤ ∞. Let Y be a random variable on a probability space (Ω, F, P) such that P(Y ∈ (a, b)) = 1 and E|φ(Y)| < ∞. Let G be a sub-σ-algebra of F. Then
φ( E(Y|G) ) ≤ E( φ(Y)|G ).   (2.5)
Proof: By the convexity of φ on (a, b), for any c, x ∈ (a, b),
φ(x) − φ(c) − (x − c)φ′₋(c) ≥ 0,   (2.6)
where φ′₋(c) is the left derivative of φ at c. Taking c = E(Y|G) and x = Y in (2.6), one gets
Z ≡ φ(Y) − φ( E(Y|G) ) − ( Y − E(Y|G) ) φ′₋( E(Y|G) ) ≥ 0.   (2.7)
Since E( φ(E(Y|G)) | G ) = φ( E(Y|G) ) and, by (1.11),
E[ ( Y − E(Y|G) ) φ′₋( E(Y|G) ) | G ] = φ′₋( E(Y|G) ) E[ Y − E(Y|G) | G ] = 0,
while, from (2.7), E(Z|G) ≥ 0, it follows that
E( φ(Y)|G ) ≥ φ( E(Y|G) ). □
The following inequalities are a direct consequence of Theorem 12.2.4.
Corollary 12.2.5: Let Y be a random variable on a probability space
(Ω, F, P ) and let G be a sub-σ-algebra of F.
(i) If EY² < ∞, then E(Y²|G) ≥ ( E(Y|G) )².
(ii) If E|Y|^p < ∞ for some p ∈ [1, ∞), then E( |Y|^p | G ) ≥ | E(Y|G) |^p.
Definition 12.2.1: Let EY 2 < ∞. The conditional variance of Y given
G, denoted by Var(Y |G), is defined as
Var(Y |G) = E(Y 2 |G) − (E(Y |G))2 .
(2.8)
This leads to the following formula for a decomposition of the variance
of Y , known as the Analysis of Variance formula.
Theorem 12.2.6: Let EY² < ∞. Then
Var(Y) = Var( E(Y|G) ) + E( Var(Y|G) ).   (2.9)
Proof: Var(Y) = E(Y − EY)². But Y − EY = Y − E(Y|G) + E(Y|G) − EY. Also, by (1.11),
E[ ( Y − E(Y|G) )( E(Y|G) − EY ) | G ] = ( E(Y|G) − EY ) E[ Y − E(Y|G) | G ] = ( E(Y|G) − EY ) · 0 = 0.
Thus, E[ ( Y − E(Y|G) )( E(Y|G) − EY ) ] = 0 and so
E(Y − EY)² = E( Y − E(Y|G) )² + E( E(Y|G) − EY )².   (2.10)
Now, noting that E[ Y E(Y|G) ] = E[ E( Y E(Y|G) | G ) ] = E[ ( E(Y|G) )² ], one gets
E( Y − E(Y|G) )² = EY² − 2E[ Y E(Y|G) ] + E[ ( E(Y|G) )² ]
= EY² − E[ ( E(Y|G) )² ]
= E[ E(Y²|G) ] − E[ ( E(Y|G) )² ]
= E( Var(Y|G) ).
Also, since E[ E(Y|G) ] = EY,
E( E(Y|G) − EY )² = Var( E(Y|G) ).
Hence, (2.9) follows from (2.10). □
Remark 12.2.1: E( Var(Y|G) ) is called the variance within and Var( E(Y|G) ) is the variance between. The above proof also shows that
E(Y − Z)² = E( Y − E(Y|G) )² + E( E(Y|G) − Z )²   (2.11)
for any random variable Z that is G-measurable. This is used to prove the Rao-Blackwell theorem in mathematical statistics (Lehmann and Casella (1998)) (Problem 12.27).
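The analysis-of-variance formula (2.9) is easy to check numerically in the discrete setting (a sketch with illustrative names, not from the text): condition on a discrete X, compute the within-group and between-group pieces, and compare with the total variance.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.integers(0, 4, size=100000)
y = 2.0 * x + rng.normal(scale=1 + x, size=100000)   # Var(Y|X=i) = (1+i)^2

cond_mean = np.array([y[x == i].mean() for i in range(4)])
cond_var = np.array([y[x == i].var() for i in range(4)])
p = np.array([(x == i).mean() for i in range(4)])

between = np.sum(p * (cond_mean - np.sum(p * cond_mean)) ** 2)   # Var(E(Y|X))
within = np.sum(p * cond_var)                                    # E(Var(Y|X))
print(y.var(), between + within)                                 # nearly equal
```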
12.3 Conditional probability
Let (Ω, F, P ) be a probability space and let G ⊂ F be a sub-σ-algebra.
Definition 12.3.1: For B ∈ F, the conditional probability of B given G,
denoted by P (B|G), is defined as
(3.1)
P (B|G) = E(IB |G).
Thus Z ≡ P (B|G) is a G-measurable function such that
P (A ∩ B) = E(ZIA )
for all A ∈ G.
(3.2)
Since 0 ≤ P (A∩B) ≤ P (A) for all A ∈ G, it follows that 0 ≤ P (B|G) ≤ 1
w.p. 1. It is easy to check that w.p. 1
P (Ω|G) = 1
and P (∅|G) = 0.
Also, by linearity, if B1 , B2 ∈ F, B1 ∩ B2 = ∅, then
P (B1 ∪ B2 |G) = P (B1 |G) + P (B2 |G)
w.p. 1.
This suggests that, w.p. 1, P(B|G) is countably additive as a set function in B; that is, that there exists a set A0 ∈ G such that P(A0) = 0 and for all ω ∉ A0, the map B ↦ P(B|G)(ω) is countably additive. However, this is not true in general. For a given collection {Bn}n≥1 of disjoint sets in F, there is an exceptional set A0 such that P(A0) = 0 and for ω ∉ A0,
P( ∪_{n≥1} Bn | G )(ω) = Σ_{n=1}^{∞} P(Bn|G)(ω).
However, this A0 depends on {Bn}n≥1 and, as the collection varies, these exceptional sets can form an uncountable collection whose union may not be contained in a set of probability zero.
Definition 12.3.2: Let (Ω, F, P ) be a probability space and G be a subσ-algebra of F. A function µ : F × Ω → [0, 1] is called a regular conditional
probability on F given G if
(i) for all B ∈ F, µ(B, ω) = P (B|G) w.p. 1, and
(ii) for all ω ∈ Ω, µ(B, ω) is a probability measure on (Ω, F).
If a regular conditional probability (r.c.p.) µ(·, ·) exists on F given G,
then conditional expectation of Y given G can be computed as
%
E(Y |G)(ω) = Y (ω $ )µ(dω $ , ω) w.p. 1
for all Y such that E|Y | < ∞. The proof of this is via standard approximation using simple random variables (Problem 12.15).
A sufficient condition for the existence of r.c.p. is provided by the following result.
Theorem 12.3.1: Let (Ω, F, P ) be a probability space. Let S be a Polish
space and S be its Borel σ-algebra. Let X be an S-valued random variable
on (Ω, F). Then for any σ-algebra G ⊂ F, there is a regular conditional
probability on σ1X2 given G, where σ1X2 = {X −1 (D) : D ∈ S}.
Proof: (for S = R). Let Q = {rj } be the set of rationals. Let F (rj , ω) =
P (X ≤ rj |G)(ω) w.p. 1. Then, there is a set A0 ∈ G such that P (A0 ) = 0
and for ω )∈ A0 , F (r, ω) is monotone nondecreasing on Q. For x ∈ R, set
/
sup{F (r, ω) : r ≤ x} if ω )∈ A0
F (x, ω) ≡
F0 (x)
if ω ∈ A0 ,
where F0 (x) is a fixed cdf (say, F0 = Φ, the standard normal cdf). Then,
F (x, ω) is a cdf in x for each ω and for each x, F (x, ·) is G-measurable.
Let µ(B, ω) be the Lebesgue-Stieltjes measure induced by F (·, ω). Then
it can be checked using the π − λ theorem (Theorem 1.1.2) that µ(·, ·) is a
regular conditional probability on σ1X2 given G (Problem 12.16).
!
Remark 12.3.1: When F = σ1X2, the regular conditional probability on
F given G is also called the regular conditional probability distribution of
X given G.
Remark 12.3.2: For a proof for the general Polish case, see Durrett (2004)
and Parthasarathy (1967).
12.4 Problems
12.1 Let (X, Y ) be a bivariate random vector with EY 2 < ∞. Let H =
L2 (R2 , B(R), PX,Y ) and H0 = {h(X) | h : R → R is Borel measurable
and Eh(X)2 < ∞}. Suppose that for some h(X) ∈ H0 ,
EY IA = Eh(X)IA
for all A ∈ σ1X2.
Show that E(Y − h(X))Z1 = 0 for all Z1 ∈ H0 . Show also that H0 is
a closed subspace of H.
(Hint: For any Z1 ∈ H0 , there exists a sequence of simple random
variables {Wn }n≥1 ⊂ H0 such that |Wn | ≤ |Z1 | and Wn → Z1 a.s.
Now, apply the DCT. For the second part, use the fact that f : Ω → R
is σ1X2-measurable iff there is a Borel measurable function h : R → R
such that f = h(X).)
12.2 Let (X, Y ) be a bivariate random vector that has an absolutely continuous distribution on (R2 , B(R2 )) w.r.t. the Lebesgue measure with
density f(x, y). Suppose that E|Y| < ∞. Show that a version of E(Y|X) is given by h0(X), where, with fX(x) = ∫ f(x, y) dy,
h0(x) = [ ∫ y f(x, y) dy ] / fX(x) if fX(x) > 0, and h0(x) = 0 otherwise.
(Hint: Verify (1.8) for all A ∈ σ⟨X⟩.)
12.3 Let Z1 and Z2 be two random variables on a probability space
(Ω, G, P ).
(a) Suppose that E|Z1 | < ∞, E|Z2 | < ∞ and
EZ1 IA = EZ2 IA
for all A ∈ G.
(4.1)
Show that P (Z1 = Z2 ) = 1.
(b) Suppose that Z1 and Z2 are nonnegative and (4.1) holds. Show
that P (Z1 = Z2 ) = 1.
(c) Prove Proposition 12.1.3.
(Hint: (a) Consider (4.1) with A1 = {Z1 − Z2 > 0} and A2 =
{Z1 − Z2 < 0} and conclude that P (A1 ) = 0 = P (A2 ).
(b) Let A1n = {Z1 ≤ n, Z2 ≤ n, Z1 − Z2 > 0} and A2n = {Z1 ≤
≥ 1. Then, by (4.1), P (A1n ) = 0 = P (A2n )
n, Z2 ≤ n, Z1 −Z2 < 0}, n #
for all n ≥ 1. But A1 = n≥1 Ain , i = 1, 2, . . . , where Ai ’s are as
above.)
12.4 Prove Theorem 12.2.3.
12.5 Let Xi be a ki -dimensional random vector, ki ∈ N, i = 1, 2 such that
X1 and X2 are independent. Let h : Rk1 +k2 → [0, ∞) be a Borel
measurable function. Show that
"
!
E h(X1 , X2 ) | X1 = g(X1 )
(4.2)
k1
where g(x) = Eh(x, X
J 2 ), x ∈ R J . Show that (4.2) is also valid for a
real valued h with E Jh(X1 , X2 )J < ∞.
(Hint: Let k = k1 + k2 , Ω = Rk , F = B(Rk ), P = PX1 × PX2 . Verify
(1.8) for all A ∈ {A1 × Rk2 : A1 ∈ B(Rk1 )} ≡ σ1X1 2.)
12.6 Let X be a random variable on a probability space (Ω, F, P ) with
EX 2 < ∞ and let G ⊂ F be a sub-σ-field.
(a) Show that for any A ∈ G,
J%
J '%
(1/2
J
J
E(X|G)dP J ≤
E(X 2 |G)dP
.
J
A
A
(4.3)
(b) Show that (4.3) is valid for all A ∈ F.
12.7 Let f : (Rk , B(Rk ), P ) → (R, B(R)) be an integrable function where
P (A) = 2
−k
%
exp
A
-
−
k
)
i=1
.
|xi | dx1 , . . . , dxk , A ∈ B(Rk ).
For each of the following cases, find a version of E(f |G) and justify
your answer:
(a) G = σ1{A ∈ B(Rk ) : A = −A}2,
(b) G = σ1{(j1 , j1 + 1] × · · · × (jk , jk + 1] : j1 , . . . , jk ∈ Z}2,
(c) G = σ1{B × {0} : B ∈ B(Rk−1 )}2.
12.8 Let (Ω, F, P ) be a probability space and G = {∅, B, B c , Ω} for some
B ∈ F with P (B) ∈ (0, 1). Determine P (A|G) for A ∈ F.
12.9 Let {Xn : n ∈ Z} be a collection of independent random variables
with E|X0 | < ∞. Show that
(a) E(X0 | X1 , . . . , Xn ) = EX0 for any n ∈ N,
(b) E(X0 | X−n , . . . , X−1 ) = EX0 for any n ∈ N,
(c) E(X0 | X1 , X2 , . . .) = EX0 = E(X0 | . . . , X−2 , X−1 ).
12.10 Let X be a random variable on a probability space (Ω, F, P ) with
E|X| < ∞ and let C be a π-system such that σ1C2 = G ⊂ F. Suppose
that there exists a G-measurable function f : Ω → R such that
%
%
f dP =
XdP for all A ∈ C.
A
A
Show that f = E(X|G).
12.11 Let X and Y be integrable random variables
& on (Ω, F,
& P ) and let C
be a semi-algebra, C ⊂ F. Suppose that A XdP ≤ A Y dP for all
A ∈ C. Show that
E(X|G) ≤ E(Y |G)
where G = σ1C2.
12.12 Let X, Y ∈ L2 (Ω, F, P ). If E(X|Y ) = Y and E(Y |X) = X, then
P (X = Y ) = 1.
(Hint: Show that E(X − Y )2 = EX 2 − EY 2 .)
12.13 Let {Xn }n≥1 , X be a collection of random variables on (Ω, F, P ) and
let G be a sub-σ-algebra of F. If limn→∞ E|Xn − X|r = 0 for some
r ≥ 1, then
J
Jr
lim E JE(Xn |G) − E(X|G)J = 0.
n→∞
12.14 Let X, Y ∈ L2 (Ω, F, P ) and let G be a sub-σ-algebra of F. Show that
E{Y E(X|G)} = E{XE(Y |G)}.
12.15 Let Y be an integrable random variable& on (Ω, F, P ) and let µ be a
r.c.p. on F given G. Show that h(ω) ≡ Y (ω1 )µ(dω1 , ω), ω ∈ Ω is a
version of E(Y |G).
(Hint: Prove this first for Y = IA , A ∈ F. Extend to simple functions
by linearity. Use the DCT for CE for the general case.)
12.16 Complete the proof of Theorem 12.3.1 for S = R.
12.17 Let (Ω, F, P ) be a probability space, G be a sub-σ-algebra of F, and
let {An }n≥1 ⊂ F be a collection of disjoint sets. Show that
P
-*
n≥1
An |G
.
=
∞
)
n=1
P (An |G).
Definition 12.4.1: Let G be a σ-algebra and let {Gλ : λ ∈ Λ} be a
collection of subsets of F in a probability space (Ω, F, P ). Then, {Gλ : λ ∈
Λ} is called conditionally independent given G if for any λ1 , . . . , λk ∈ Λ,
k ∈ N,
k
" K
!
P (Ai |G)
P A1 ∩ · · · ∩ Ak |G =
i=1
for all A1 ∈ G1 , . . . , Ak ∈ Gk . A collection of random variables {Xλ : λ ∈ Λ}
on (Ω, F, P ) is called conditionally independent given G if {σ1Xλ 2 : λ ∈ Λ}
is conditionally independent given G.
12.18 Let G1 , G2 , G3 be three sub-σ-algebras of F. Recall that Gi ∨ Gj =
σ1Gi ∪ Gj 2, 1 ≤ i )= j ≤ 3. Show that G1 and G2 are conditionally
independent given G3 iff
P (A1 |G2 ∨ G3 ) = P (A1 |G3 )
for all A1 ∈ G,
iff E(X|G2 ∨ G3 ) = E(X|G3 )
for every X ∈ L1 (Ω, G1 ∨ G3 , P ).
12.19 Let G1 , G2 , G3 be sub-σ-algebra of F. Show that if G1 ∨ G3 is independent of G2 , then G1 and G2 are conditionally independent given
G3 .
12.20 Give an example where
!
"
!
"
E E(Y |X1 ) | X2 )= E E(Y |X2 ) | X1 .
12.21 Let X be an Exponential (1) random variable. For t > 0, let Y1 =
min{X, t} and Y2 = max{X, t}. Find E(X|Yi ) i = 1, 2.
(Hint: Verify that σ1Y1 2 is the σ-algebra generated by the collection
{X −1 (A) : A ∈ B(R), A ⊂ [0, t)} ∪ {X −1 [t, ∞)}.)
12.22 Let (X, Y ) be a bivariate random vector with a joint pdf w.r.t. the
Lebesgue measure f (x, y). Show that E(X|X + Y ) = h(X + Y ) where
h(z) = [ ∫ x f(x, z − x) dx / ∫ f(x, z − x) dx ] · I_{(0,∞)}( ∫ f(x, z − x) dx ).
12.23 Let {Xi }i≥1 be iid random variables with E|X1 | < ∞. Show that for
any n ≥ 1,
!
" X1 + X2 + · · · + Xn
E X1 | (X1 + X2 + · · · + Xn ) =
.
n
!
"
(Hint: Show that E Xi | (X1 + · · · + Xn ) is the same for all 1 ≤ i ≤
n.)
Definition 12.4.2: A finite collection of random variables {Xi : 1 ≤
i ≤ n} on a probability space (Ω, F, P ) is said to be exchangeable if for
any permutation (i1 , i2 , . . . , in ) of (1, 2, . . . , n), the joint distribution of
(Xi1 , Xi2 , . . . , Xin ) is the same as that of (X1 , X2 , . . . , Xn ). A sequence
of radom variables {Xi }i≥1 on a probability space (Ω, F, P ) is said to be
exchangeable if for any finite n, the collection {Xi : 1 ≤ i ≤ n} is exchangeable.
12.24 Let {Xi : 1 ≤ i ≤ n+1} be a finite collection of random variables such
that conditional Xn+1 , {X1 , X2 , . . . , Xn } are iid. Show that {Xi : 1 ≤
i ≤ n} is exchangeable.
12.25 Let {Xi : 1 ≤ i ≤ n} be exchangeable. Suppose E|X1 | < ∞. Show
that
" X1 + X2 + · · · + Xn
!
.
E X1 | (X1 + · · · + Xn ) =
n
12.26 Let (X1 , X2 , X3 ) be random variables such that
P (X2 ∈ · | X1 )
P (X3 ∈ · | X1 , X2 )
=
=
p1 (X1 , ·)
and
p2 (X2 , ·)
where for each i = 1, 2, pi (x, ·) is a probability transition function on
R as defined in Example 6.3.8. Suppose pi (x, ·) admits a pdf fi (x, ·)
i = 1, 2, . . .. Show that
P (X1 ∈ · | X2 , X3 ) = P (X1 ∈ · | X2 ).
(This says that if {X1 , X2 , X3 } has the Markov property, then so does
{X3 , X2 , X1 }.)
12.27 (Rao-Blackwell theorem). Let Y ∈ L2 (Ω, F, P ) and G ⊂ F be a subσ-algebra. Show that there exists Z ∈ L2 (Ω, G, P ) such that EZ =
EY and Var(Z) ≤ Var(Y ).
(Hint: Consider Z = E(Y |G).)
12.28 Let (X, Y ) have an absolutely continuous bivariate distribution with
density fX,Y (x, y). Show that there is a regular conditional probability on σ1Y 2 given σ1X2 and that this probability measure induces an
absolutely continuous distribution on R. Find its density.
12.29 Suppose, in the above problem,
fX,Y (x, y) =
1 ' y − m(x) (
φ
g(x)
σ(x)
σ(x)
where m(·), σ(·), φ(·), and g(·) are all Borel measurable functions
on R to R with σ, φ, and g being nonnegative and φ and g being
probability densities.
(a) Find the marginal probability densities fX (·) and fY (·) of X
and Y , respectively. Set up the integrals for EX and EY .
(b) Using the conditioning argument in Proposition 12.1.5, show
that
%
'%
(' %
(
EY = m(x)g(x)dx +
uφ(u)du
σ(x)g(x)dx
(assuming that all the integrals are well defined).
(c) Find similar expressions for EY 2 and E(etY ).
12.30 Let X, Y , Z ∈ L1 (Ω, F, P ). Suppose that
E(X|Y ) = Z, E(Y |Z) = X, E(Z|X) = Y.
Show that X = Y = Z w.p. 1.
12.31 Let X, Y ∈ L2 (Ω, F, P ). Suppose E|Y |4 < ∞. Show that
;
:
min E|X − (a + bY + cY 2 )|2 : a, b, c ∈ R
:
= max E(XZ) : Z ∈ L2 (Ω, F, P ), EZ = 0, EZY = 0,
;
EZY 2 = 0, EZ 2 = 1 .
12.32 Let X ∈ L2 (Ω, F, P ) and G be a sub-σ-algebra of F
(a) Show that
;
:
min E(X − Y )2 : Y ∈ L2 (Ω, G, P )
;
:
= max (EXZ)2 : EZ 2 = 1, E(Z|G) = 0 .
(b) Find a random variable Z such that E(Z|G) = 0 w.p. 1 and
ρ ≡ corr(X, Z) is maximized.
13
Discrete Parameter Martingales
13.1 Definitions and examples
This section deals with a class of stochastic processes called martingales.
Martingales arise in a natural way in many problems in probability and
statistics. It provides a more general framework than the case of independent random variables where results, like the SLLN, the CLT, and other
convergence theorems, can be established. Much of the discrete parameter
martingale theory was developed by the great American mathematician
J. L. Doob, whose book (Doob (1953)) has been very influential.
Definition 13.1.1: Let (Ω, F, P ) be a probability space and let N =
{1, . . . , n0 } be a nonempty subset of N = {1, 2, . . .}, n0 ≤ ∞.
(a) A collection {Fn : n ∈ N } of sub-σ-algebras of F is called a filtration
if Fn ⊂ Fn+1 for all 1 ≤ n < n0 .
(b) A collection of random variables {Xn : n ∈ N } is said to be adapted
to the filtration {Fn : n ∈ N } if Xn is Fn -measurable for all n ∈ N .
(c) Given a filtration {Fn : n ∈ N } and random variables {Xn : n ∈ N },
the collection {(Xn , Fn ) : n ∈ N } is called a martingale if
(i) {Xn : n ∈ N } is adapted to {Fn : n ∈ N },
(ii) E|Xn | < ∞ for all n ∈ N , and
(iii) for all 1 ≤ n < n0 ,
E(Xn+1 |Fn ) = Xn .
(1.1)
When N = N, there is no maximum element in N . In this case, Definition 13.1.1 is to be interpreted by setting n0 = +∞ in parts (a) and
(c) (iii). A similar convention applies to Definition 13.1.2 below. Also, recall that equalities and inequalities involving conditional expectations are
interpreted as being valid events w.p. 1.
If {(Xn , Fn ) : n ∈ N } is a martingale, then {Xn : n ∈ N } is also said to
be a martingale w.r.t. (the filtration) {Fn : n ∈ N }. Also {Xn : n ∈ N } is
called a martingale if it is a martingale w.r.t. some filtration. Observe that
if {Xn : n ∈ N } is a martingale w.r.t. any given filtration {Fn : n ∈ N },
it is also a martingale w.r.t. the natural filtration {Xn : n ∈ N }, where
Xn = σ1{X1 , . . . , Xn }2, n ∈ N . Clearly, {Xn : n ∈ N } is adapted to {Xn :
n ∈ N }. To see that E(Xn+1 |Xn ) = Xn for all 1 ≤ n < n0 , note that
Xn ⊂ Fn for all n ∈ N and hence,
E(Xn+1 |Xn )
= E(E(Xn+1 |Fn ) | Xn )
= E(Xn |Xn ) = Xn .
(1.2)
Thus, {(Xn , Xn ) : n ∈ N } is a martingale.
A classic interpretation of martingales in the context of gambling is given
as follows. Let Xn represent the fortune of a gambler at the end of the
nth play and let Fn be the information available to the gambler up to
and including the nth play. Then, Fn contains the knowledge of all events
like {Xj ≤ r} for r ∈ R, j ≤ n, making Xn measurable w.r.t. Fn . And
Condition (iii) in Definition 13.1.1 (c) says that given all the information
up until the end of the nth play, the expected fortune of the gambler at the
end of the (n + 1)th play remains unchanged. Thus a martingale represents
a fair game. In situations where the game puts the gambler in a favorable or
unfavorable position, one may express that by suitably modifying condition
(iii), yielding what are known as sub- and super-martingales, respectively.
Definition 13.1.2: Let {Fn : n ∈ N } be a filtration and {Xn : n ∈ N } be
a collection of random variables in L1 (Ω, F, P ) adapted to {Fn : n ∈ N }.
Then {(Xn , Fn ) : n ∈ N } is called a sub-martingale if
E(Xn+1 |Fn ) ≥ Xn
for all
1 ≤ n < n0 ,
(1.3)
for all
1 ≤ n < n0 .
(1.4)
and a super-martingale
E(Xn+1 |Fn ) ≤ Xn
Suppose that {(Xn , Fn ) : n ∈ N } is a sub-martingale. Then A ∈ Fn
implies that A ∈ Fn+1 ⊂ . . . ⊂ Fn+k for every k ≥ 1, n + k ∈ N and hence,
by (1.3),
∫_A Xn dP ≤ ∫_A E(Xn+1|Fn) dP = ∫_A Xn+1 dP ≤ ··· ≤ ∫_A Xn+k dP.   (1.5)
Therefore, E(Xn+k |Fn ) ≥ Xn and, by taking A = Ω in (1.5), EXn+k ≥
EXn. Thus, the expected values of a sub-martingale are nondecreasing. For
a martingale, by (1.2), equality holds at every step of (1.5) and hence,
E(Xn+k |Fn ) = Xn , EXn+k = EXn
(1.6)
for all k ≥ 1, n, n + k ∈ N . Thus, in a fair game, the expected fortune of
the gambler remains constant over time.
Here are some examples.
Example 13.1.1: (Random walk). Let Z1, Z2, . . . be a sequence of iid random variables on a probability space (Ω, F, P) with finite mean µ = EZ1 and let Fn = σ⟨Z1, . . . , Zn⟩, n ≥ 1. Let Xn = Z1 + ··· + Zn, n ≥ 1. Then, for all n ≥ 1, σ⟨Xn⟩ ⊂ Fn and E|Xn| < ∞. Also,
E(Xn+1|Fn) = E( (Z1 + ··· + Zn+1) | Z1, . . . , Zn ) = Z1 + ··· + Zn + EZn+1 (by independence) = Xn + µ,
so that E(Xn+1|Fn) = Xn if µ = 0, E(Xn+1|Fn) > Xn if µ > 0, and E(Xn+1|Fn) < Xn if µ < 0. Thus, {(Xn, Fn) : n ≥ 1} is a martingale if µ = 0, a sub-martingale if µ ≥ 0 and a super-martingale if µ ≤ 0.
Example 13.1.2: (Random walk continued). Let {Zn}n≥1 and {Fn}n≥1 be as in Example 13.1.1 and let EZ1² < ∞. Let Yn = Σ_{i=1}^n (Zi − µ)² and Ỹn = Yn − nσ², where σ² = Var(Z1). Then, check that {(Yn, Fn) : n ≥ 1} is a sub-martingale and {(Ỹn, Fn) : n ≥ 1} is a martingale (Problem 13.3).
Example 13.1.3: (Doob’s martingale). Let Z be an integrable random
variable and let {Fn : n ≥ 1} be a filtration both defined on a probability
space (Ω, F, P ). Define
Xn = E(Z|Fn ), n ≥ 1.
(1.7)
Then, clearly, Xn is integrable and Fn -measurable for all n ≥ 1. Also,
E(Xn+1 |Fn )
= E(E(Z|Fn+1 ) | Fn )
= E(Z|Fn ) = Xn .
Thus, {(Xn , Fn ) : n ≥ 1} is a martingale.
Example 13.1.4: (Generation of a martingale from a given sequence of
random variables). Let {Yn }n≥1 ⊂ L1 (Ω, F, P ) be an arbitrary collection of
integrable random variables and let Fn = σ1Y1 , . . . , Yn 2, n ≥ 1. For n ≥ 1,
define
n
)
Xn =
{Yj − E(Yj |Fj−1 )}
(1.8)
j=1
where F0 ≡ {∅, Ω}.
Then, for each n ≥ 1, Xn is integrable and Fn -measurable. Also, for
n ≥ 1,
E(Xn+1 |Fn )
=
=
n+1
)
j=1
n
)
j=1
=
E([Yj − E(Yj |Fj−1 )] | Fn )
[Yj − E(Yj |Fj−1 )] + [E(Yn+1 |Fn ) − E(Yn+1 |Fn )]
Xn .
Hence {(Xn , Fn ) : n ≥ 1} is a martingale. Thus, one can construct a martingale sequence starting from any arbitrary sequence of integrable$
random
n
variables. When {Yn }n≥1 are iid with EY1 = 0, (1.8) yields Xn = j=1 Yj
and one gets the martingale sequence of Example 13.1.1.
Example 13.1.5: (Branching process). Let {ξnk : n ≥ 1, k ≥ 1} be a double array of iid nonnegative integer valued random variables with Eξnk = µ ∈ (0, ∞). One may think of ξnk as the number of offspring of the kth individual at time n in an evolving population. Let Zn denote the size of the population at time n, n ≥ 0. If one considers the evolution of the population starting with a single individual at time n = 0, then
Z0 = 1 and Zn = Σ_{k=1}^{Z_{n−1}} ξnk, n ≥ 1.
The sequence Z0, Z1, . . . is called a branching process (cf. Chapter 18). Let Fn = σ⟨Z1, . . . , Zn⟩, n ≥ 1. Then, for n ≥ 1,
E(Zn+1|Fn) = E( Σ_{k=1}^{Zn} ξ_{n+1,k} | Zn ) = µ Zn   (1.9)
and therefore, {(Zn, Fn) : n ≥ 1} is a martingale, sub-martingale or super-martingale according as µ = 1, µ ≥ 1 or µ ≤ 1. One can define a new sequence {Xn}n≥1 from {Zn}n≥1 such that {Xn}n≥1 is a martingale w.r.t. Fn for all values of µ ∈ (0, ∞). Let
Xn = µ^{−n} Zn, n ≥ 1.   (1.10)
Then, using (1.9), it is easy to check that {(Xn, Fn) : n ≥ 1} is a martingale.
Example 13.1.6: (Likelihood ratio). Let Y1 , Y2 , . . . be a collection of random variables on a probability space (Ω, F, P ) and let Fn = σ1Y1 , . . . , Yn 2,
n ≥ 1. Let Q be another probability measure on F. Suppose that under
both P and Q, the joint distributions of (Y1 , . . . , Yn ) are absolutely continuous w.r.t. the Lebesgue measure λn on Rn , n ≥ 1. Denote the corresponding
densities by pn (y1 , . . . , yn ) and qn (y1 , . . . , yn ), n ≥ 1 and for simplicity, suppose that pn (y1 , . . . , yn ) is everywhere positive. Then, a likelihood ratio for
discriminating between the probability measures P and Q on the basis of
the observations Y1 , . . . , Yn , is given by
Xn = qn (Y1 , . . . , Yn )/pn (Y1 , . . . , Yn ), n ≥ 1.
A higher value of Xn is supposed to provide “evidence” in favor of Q
(over P ) as the “true” probability measure determining the distribution of
(Y1 , . . . , Yn ). It will now be shown that {Xn }n≥1 is a martingale w.r.t. Fn ,
n ≥ 1 under P . Clearly, Xn is Fn -measurable for all n ≥ 1 and
%
%
qn (Y1 , . . . , Yn )
|Xn |dP =
dP
Ω pn (Y1 , . . . , Yn )
%
qn (y1 , . . . , yn )
pn (y1 , . . . , yn )dλn
=
p
Rn n (y1 , . . . , yn )
%
=
qn (y1 , . . . , yn )dλn = Q(Ω) = 1 < ∞,
Rn
so that Xn is integrable (w.r.t. P ) for all n ≥ 1. Noting that the sets in the
σ-algebra Fn are given by {(Y1 , . . . , Yn ) ∈ B} for B ∈ B(Rn ), one has, for
any n ≥ 1,
%
%
qn+1 (Y1 , . . . , Yn+1 )
dP
Xn+1 dP =
p
{(Y1 ,...,Yn )∈B}
{(Y1 ,...,Yn+1 )∈B×R} n+1 (Y1 , . . . , Yn+1 )
%
qn+1 (y1 , . . . , yn+1 )
=
p
B×R n+1 (y1 , . . . , yn+1 )
=
%
pn+1 (y1 , . . . , yn+1 )dλn+1
qn (y1 , . . . , yn )dλn
B
=
%
B
qn (y1 , . . . , yn )
pn (y1 , . . . , yn )dλn
pn (y1 , . . . , yn )
404
13. Discrete Parameter Martingales
=
%
{(Y1 ,...,Yn )∈B}
Xn dP,
implying that E(Xn+1 |Fn ) = Xn , n ≥ 1. This shows that {(Xn , Fn ) : n ≥
1} is a martingale for any arbitrary sequence of random variables {Yn }n≥1
under P . However, {(Xn , Fn ) : n ≥ 1} may not be a martingale under Q.
Example 13.1.7: (Radon-Nikodym derivatives). Let Ω = (0, 1], F =
B(0, 1], the Borel σ-algebra on (0, 1] and let P denote the Lebesgue measure
on (0, 1]. For n ≥ 1, let Fn be the σ-algebra generated by the partition
{((k − 1)2−n , k2n ], k = 1, . . . , 2n } of (0, 1] by dyadic intervals. Let ν be a
finite measure on (Ω, F). Let νn be the restriction of ν to Fn and Pn be
the restriction of P to Fn , for each n ≥ 1. As Fn consists of all disjoint
unions of the intervals ((k − 1)2−n , k2−n ], 1 ≤ k ≤ 2n , Pn (A) = 0 for some
A ∈ Fn iff A = ∅. Consequently, νn (A) = 0 whenever Pn (A) = 0, A ∈ Fn ,
i.e., νn ; Pn . Let Xn denote the Radon-Nikodym derivative of νn w.r.t.
Pn , given by
2n 8 '
( 9
)
Xn =
ν ((k − 1)2−n , k2−n ] 2n I((k−1)2−n ,k2−n ] .
(1.11)
k=1
Clearly, Xn is Fn -measurable and P -integrable. It is easy to check that
E(Xn+1 |Fn ) = Xn for all n ≥ 1. Hence {(Xn , Fn ) : n ≥ 1} is a martingale
on (Ω, F, P ). Note that the absolute continuity of νn w.r.t. Pn (on Fn )
holds for each 1 ≤ n < ∞ even though the measure ν may not be absolutely
continuous w.r.t. P on F = B((0, 1]).
Proposition 13.1.1:
(Convex functions of martingales and submartingales). Let φ : R → R be a convex function and let N =
{1, 2, . . . n0 } ⊂ N be a nonempty subset.
(i) If {(Xn , Fn ) : n ∈ N } is a martingale with E|φ(Xn )| < ∞ for all
n ∈ N , then {(φ(Xn ), Fn ) : n ∈ N } is a sub-martingale.
(ii) If {(Xn , Fn ) : n ∈ N } is a sub-martingale, E|φ(Xn )| < ∞ for all n ∈
N , and in addition, φ is nondecreasing, then {(φ(Xn ), Fn ) : n ∈ N }
is a sub-martingale.
Proof: By the conditional Jensen’s inequality (Theorem 12.2.4) for all
1 ≤ n < n0 ,
(1.12)
E(φ(Xn+1 )|Fn ) ≥ φ(E(Xn+1 |Fn )).
Parts (i) and (ii) follow from (1.12) on using the martingale and submartingale properties of {Xn }n∈N , respectively.
!
Proposition 13.1.2: (Doob’s decomposition of a sub-martingale). Let
{(Xn , Fn ) : n ∈ N } be a sub-martingale for some N = {1, . . . , n0 } ⊂ N.
Then, there exist two sets of random variables {Yn : n ∈ N } and {Zn : n ∈
N } satisfying Xn = Yn + Zn , n ∈ N such that
13.2 Stopping times and optional stopping theorems
405
(i) {(Yn , Fn ) : n ∈ N } is a martingale
(ii) For all n ∈ N , Zn+1 ≥ Zn ≥ 0 w.p. 1 and Zn is Fn−1 -measurable,
where F0 = {∅, Ω}.
(iii) If {Xn : n ∈ N } are L1 -bounded, i.e., M ≡ max{E|Xn | : n ∈ N } <
∞, then so are {Yn : n ∈ N } and {Zn : n ∈ N }.
Proof: Define the difference variables ∆n ’s by
∆1 = X1 , and ∆n = Xn − Xn−1 , n ≥ 2, n ∈ N.
$n
Note that Xn = j=1 ∆j , n ∈ N , and E(∆n |Fn−1 ) ≥ 0 for all n ≥ 2,
n ∈ N . Now, set
Y1
=
∆1 , Yn = Xn −
and
Z1
=
0, Zn =
n
)
j=2
n
)
j=2
E(∆j |Fj−1 ), n ≥ 2, n ∈ N
E(∆j |Fj−1 ), n ≥ 2, n ∈ N.
Check that the requirements (i) and (ii) above hold. To prove the L1 boundedness, notice that by (ii), for any n ≥ 2, n ∈ N ,
E|Zn | = EZn = E
1)
n
j=2
2 )
n
E(∆j |Fj−1 ) =
E(∆j )
j=2
= EXn − EX1 ≤ 2M.
(1.13)
Also, Xn = Yn + Zn for all n ∈ N implies that
|Yn | ≤ |Xn | + |Zn |, n ∈ N.
Hence, (iii) follows.
(1.14)
!
13.2 Stopping times and optional stopping
theorems
In the following (and elsewhere), ‘n ≥ 1’ is used as an alternative for the
statement ‘n ∈ N’ or equivalently for 1 ≤ n < ∞.
Definition 13.2.1: Let (Ω, F, P ) be a probability space, {Fn }n≥1 be a
filtration and T be a F-measurable random variable taking values in the
set N̄ ≡ N ∪ {∞} = {1, 2, . . . , } ∪ {∞}.
406
13. Discrete Parameter Martingales
(a) T is called a stopping time w.r.t. {Fn }n≥1 if
{T = n} ∈ Fn
for all n ≥ 1.
(2.1)
(b) T is called a finite or proper stopping time w.r.t. {Fn }n≥1 (under P )
if
P (T < ∞) = 1.
(2.2)
Given a filtration {Fn }n≥1 , define the σ-algebra
*
Fn 2.
F∞ = σ1
(2.3)
n≥1
#
Since {T = +∞}c = n≥1 {T = n} ∈ F∞ , (2.1) is equivalent to ‘{T =
n} ∈ Fn for all 1 ≤ n ≤ ∞’. It is also easy to check (Problem 13.7) that T
is a stopping time w.r.t. {Fn }n≥1 iff
{T ≤ n} ∈ Fn
for all n ≥ 1
(2.4)
iff {T > n} ∈ Fn for all n ≥ 1. However, the condition
‘{T ≥ n} ∈ Fn
for all n ≥ 1’
(2.5)
does not always imply that T is a stopping time w.r.t. {Fn }n≥1 (cf. Problem
13.7). Note that for a stopping time T w.r.t. {Fn }n≥1 ,
{T ≥ n} = {T ≤ n − 1}c ∈ Fn−1
for n ≥ 2,
and {T ≥ 1} = Ω. Since Fn−1 ⊂ Fn for all n ≥ 2, (2.5) is a weaker
requirement than T being a stopping time w.r.t {Fn }n≥1 .
Proposition 13.2.1: Let T be a stopping time w.r.t. {Fn }n≥1 and let F∞
be as in (2.3). Define
FT = {A ∈ F∞ : A ∩ {T = n} ∈ Fn
for all
n ≥ 1}.
(2.6)
Then, FT is a σ-algebra.
Proof: Left as an exercise (Problem 13.8).
!
If T is a stopping time w.r.t. {Fn }n≥1 , then for any m ∈ N, {T = m} ∩
{T = n} = ∅ ∈ Fn for all n )= m and {T = m}∩{T = n} = {T = m} ∈ Fm
for n = m. Thus, {T = m} ∈ FT for all m ≥ 1 and hence, σ1T 2 ⊂ FT . But
the reverse inclusion may not hold as shown below.
Example 13.2.1: Let T ≡ m for some given integer m ≥ 1. Then, T is a
stopping time w.r.t. any filtration {Fn }n≥1 . For this T ,
A ∈ FT ⇒ A ∩ {T = m} ∈ Fm ⇒ A ∈ Fm ,
13.2 Stopping times and optional stopping theorems
407
so that FT ⊂ Fm . Conversely, suppose A ∈ Fm . Then, A ∩ {T = n} = ∅ ∈
Fn for all n )= m, and A ∩ {T = m} = A ∈ Fm for n = m. Thus, Fm = FT .
But σ1T 2 = {Ω, ∅}.
Example 13.2.2: Let {Xn }n≥1 be a sequence of random variables adapted
to a filtration {Fn }n≥1 and let {Bn }n≥1 be a sequence of Borel sets in R.
Define the random variable
;
:
(2.7)
T = inf n ≥ 1 : Xn ∈ Bn .
Then, T (ω) < ∞ if Xn (ω) ∈ Bn for some n ∈ N and T (ω) = +∞ if
Xn (ω) )∈ Bn for all n ∈ N. Since, for any n ≥ 1,
;
:
{T = n} = X1 )∈ B1 , . . . , Xn−1 )∈ Bn−1 , Xn ∈ Bn ∈ Fn ,
T is a stopping time w.r.t. {Fn }n≥1 . Now, define a new random variable
XT by
@
Xm
if T = m
XT =
(2.8)
lim sup Xn if T = ∞.
n→∞
The, XT ∈ R̄ and for any n ≥ 1 and r ∈ R,
{XT ≤ r} ∩ {T = n} = {Xn ≤ r} ∩ {T = n} ∈ Fn .
Also, {XT = ±∞} ∩ {T = n} = {Xn = ±∞} ∩ {T = n} = ∅ for all n ≥ 1.
Hence, it follows that XT is 1FT , B(R̄)2-measurable.
Example 13.2.3: Let {Yn }n≥1 be a sequence of iid random variables
with EY1 = µ. Let Xn = (Y1 + . . . + Yn ), n ≥ 1 denote the random walk
corresponding to {Yn }n≥1 . For x > 0, let
:
√ ;
(2.9)
Tx = inf n ≥ 1 : Xn > nµ + x n .
exceeds
Then, Tx is the√first time the sequence of partial sums {Xn }n≥1 √
the level nµ + x n and is a special case of (2.7) with Bn = (nµ + x n, ∞),
n ≥ 1. Consequently, Tx is a stopping time w.r.t. Fn = σ1Y1 , Y2 , . . . , Yn 2,
n ≥ 1. Note that if EY12 < ∞, by the law of iterated logarithm (cf. 8.7),
lim sup L
n→∞
i.e.,
Xn − nµ
2σ 2 n log log n
Xn > nµ + C
L
n log log n
=1
w.p. 1,
infinitely often w.p. 1
for some constant C > 0. Thus, P (Tx < ∞) = 1 and hence, Tx is a finite
stopping time. This random variable Tx arises in sequential probability ratio
tests (SPRT) for testing hypotheses on the mean of a (normal) population.
See Woodroofe (1982), Chapter 3.
408
13. Discrete Parameter Martingales
Definition 13.2.2: Let {Fn }n≥0 be a filtration in a probability space
(Ω, F, P ). A betting sequence w.r.t. {Fn }n≥0 is a sequence {Hn }n≥1 of
nonnegative random variables such that for each n ≥ 1, Hn is Fn−1 measurable. The following result says that there is no betting scheme that can
beat a gambling system, i.e., convert a fair one into a favorable one or the
other way around.
Theorem 13.2.2: (Betting theorem). Let {Fn }n≥0 be a filtration in a
probability space. Let {Hn }n≥0 be a betting sequence w.r.t. {Fn }n≥0 . For
an adapted sequence
{Xn , Fn }n≥0 let {Yn }n≥0 be defined by Y0 = X0 ,
$n
Yn = Y0 + j=1 (Xj − Xj−1 )Hj , n ≥ 1. Let E|(Xj − Xj−1 )Hj | < ∞
for j ≥ 1. Then,
(i) {Xn , Fn }n≥0 a martingale ⇒ {Yn , Fn }n≥0 is also a martingale,
(ii) {Xn , Fn }n≥0 a sub-martingale ⇒ {Yn , Fn }n≥0 is also a submartingale,
Proof: Clearly, for all n ≥ 1, E|Yn | < ∞ and Yn is Fn -measurable.
Further,
J "
!
"
!
E Yn+1 JFn
= Yn + E (Xn+1 − Xn )Hn+1 | Fn
"
!
= Yn + Hn+1 E (Xn+1 − Xn ) | Fn
since Hn+1 is Fn+1 -measurable. Now the theorem follows from the defining
properties of {Xn , Fn }n≥0 .
!
The above theorem leads to the following results known as Doob’s optional stopping theorems.
Theorem 13.2.3: (Doob’s optional stopping theorem I ). Let {Xn , Fn }n≥0
be a sub-martingale. Let T be a stopping time w.r.t. {Fn }n≥0 . Let X̃n ≡
XT ∧n , n ≥ 0. Then {X̃n , Fn }n≥0 is also a sub-martingale and hence
E X̃n ≥ EX0 for all n ≥ 1.
Proof: For any A ∈ B(R) and n ≥ 0,
X̃n−1 (A)
=
{ω : X̃n ∈ A}
-*
.
n
=
{ω : Xj ∈ A, T = j} ∪ {ω : Xn ∈ A, T > n}.
j=1
Since T is a stopping time w.r.t. {Fn }n≥0 the right side above belongs to
$n
Fn for each n ≥ 0. Next, |X̃n | ≤ j=1 |Xj | and hence E|X̃n | < ∞.
Finally, let Hj = 1 if j ≤ T and 0 if j > T . Since for all j ≥ 1,
{ω : Hj = 1} = {ω : T ≤ j − 1}c ∈ Fj−1 , {Hj }j≥1 is a betting sequence
$n
w.r.t. {Fn }n≥0 . Also, X̃n = X0 + j=1 (Xj − Xj−1 )Hj . Now the betting
theorem (Theorem 13.2.2) implies the present theorem.
!
13.2 Stopping times and optional stopping theorems
409
Remark 13.2.1: If {Xn , Fn }n≥0 is a martingale, then both {Xn , Fn }n≥0
and {−Xn , Fn }n≥0 are sub-martingales, and hence the above theorem implies that if {Xn , Fn }n≥0 is a martingale, then so is {X̃n , Fn }n≥0 , and
hence E X̃n = EXT ∧n = EX0 = EXn for all n ≥ 1.
This suggests the question that if P (T < ∞) = 1, then on letting n → ∞,
does E X̃n → EXT ? Consider the following example. Let {Xn }n≥0 denote
the symmetric simple random walk on the integers with X0 = 0. Let T =
inf{n : n ≥ 1, Xn = 1}. Then P (T < ∞) = 1 and E X̃n = EXT ∧n =
EX0 = 0 but XT = 1 w.p. 1 and hence E X̃n )→ EXT = 1. So, clearly some
additional hypothesis is needed.
Theorem 13.2.4:
(Doob’s optional stopping theorem II ).
Let
{Xn , Fn }n≥0 be a martingale. Let T be a stopping time w.r.t. {Fn }n≥0 .
Suppose P (T < ∞) = 1 and there is a 0 < K < ∞ such that for all n ≥ 0
|XT ∧n | ≤ K w.p. 1.
Then EXT = EX0 .
Proof: Since P (T < ∞) = 1, XT ∧n → XT w.p. 1 and |XT | ≤ K < ∞
!
and hence E|XT | < ∞. Thus, E|XT − XT ∧n | ≤ 2KP (T > n) → 0.
Remark 13.2.2: Since
XT = XT I(T ≤n) + Xn I(T >n)
and
EXT I(T ≤n)
=
=
n
)
j=0
n
)
j=0
E(Xj : T = j)
E(X0 : T = j) = E(X0 : T ≤ n)
"
!
it follows that if E |Xn |I(T >n) → 0 as n → ∞ and P (T < ∞) = 1 then
EXT = EX0 .
A stronger version of Doob’s optional stopping theorem is given below
in Theorem 13.2.6.
Proposition 13.2.5: Let S and T be two stopping times w.r.t. {Fn }n≥1
with S ≤ T . Then, FS ⊂ FT .
Proof: For any A ∈ FS and n ≥ 1,
A ∩ {T = n}
= A ∩ {T = n} ∩ {S ≤ n}
1*
2
n
=
A ∩ {S = k} ∩ {T = n}
k=1
∈
Fn ,
410
13. Discrete Parameter Martingales
since A ∩ {S = k} ∈ Fk for all 1 ≤ k ≤ n. Thus, A ∈ FT , proving the
result.
!
Theorem 13.2.6: (Doob’s optional stopping theorem III ).
Let
{Xn , Fn }n≥1 be a sub-martingale and S and T be two finite stopping times
w.r.t. {Fn }n≥1 such that S ≤ T . If XS and XT are integrable and if
lim inf E|Xn |I(|Xn | > T ) = 0,
n→∞
(2.10)
then
E(XT |FS ) ≥ XS
a.s.
(2.11)
If, in addition, {Xn }n≥1 is a martingale, then equality holds in (2.11).
Thus, Theorem 13.2.6 shows that if a martingale (or a sub-martingale)
is stopped at random
time points S and
; T with S ≤ T , then under very
:
mild conditions, (XS , FS ), (XT , FT ) continues to have the martingale
(sub-martingale, respectively) property.
Proof: To show (2.11), it is enough to show that
%
(XT − XS )dP ≥ 0 for all A ∈ FS .
(2.12)
A
Fix A ∈ FS . Let {nk }k≥1 be a subsequence along which the “lim inf” is
attained in (2.10). Let Tk = min{T, nk } and Sk = min{S, nk }, k ≥ 1. The
proof of (2.12) involves showing that
%
(XTk − XSk )dP ≥ 0 for all k ≥ 1
(2.13)
A
and
lim
k→∞
%
A
B
C
(XT − XS ) − (XTk − XSk ) dP = 0.
(2.14)
Consider (2.13). Since Sk ≤ Tk ≤ nk ,
XTk − XSk
=
=
Tk
)
n=Sk +1
nk
)
n=2
(Xn − Xn−1 )
(Xn − Xn−1 )I(Sk + 1 ≤ n ≤ Tk ).
(2.15)
Note that for all 2 ≤ n ≤ nk , {Tk ≥ n} = {Tk ≤ n − 1}c = {T ≤ n − 1}c ∈
Fn−1 and {Sk + 1 ≤ n} = {Sk ≤ n − 1} = {S ≤ n − 1} ∈ Fn−1 . Also,
since A ∈ FS , Bn ≡ A ∩ {Sk + 1 ≤ n ≤ Tk } = (A ∩ {Sk + 1 ≤ n}) ∩ {Tk ≥
n} ∈ Fn−1 for all 2 ≤ n ≤ nk . Hence, by the sub-martingale property of
13.2 Stopping times and optional stopping theorems
{Xn }n≥1 , from (2.15),
%
A
(XTk − XSk )dP
=
=
nk %
)
n=2 A∩{Sk +1≤n≤Tk }
nk %
)
n=1
Bn
≥ 0.
411
(Xn − Xn−1 )dP
B
C
E(Xn |Fn−1 ) − Xn−1 dP
(2.16)
This proves (2.13). To prove (2.14), note that by (2.10) and the integrability
of XS and XT and the DCT,
J% B
C JJ
J
(XT − XS ) − (XTk − XSk ) dP J
lim J
k→∞
≤ lim
k→∞
≤ lim
k→∞
≤ lim
k→∞
= 0,
A
%
B
8%
C
|XT − XTk | + |XS − XSk | dP
{T >nk }
8%
{T >nk }
(|XT | + |Xnk |)dP +
|XT |dP + 2
%
{T >nk }
%
{S>nk }
(|XS | + |Xnk |)dP
|Xnk |dP +
%
{S>nk }
9
|XS |dP
9
since {S > nk } ⊂ {T > nk } and {T > nk } ↓ ∅ as k → ∞. This proves the
theorem for the case when {Xn }n≥1 is a sub-martingale. When {Xn }n≥1 is
a martingale, equality holds in the last line of (2.16), which implies equality
in (2.13) and hence, in (2.12). This completes the proof.
!
Remark 13.2.3: If there exists a t0 < ∞ such that P (T ≤ t0 ) = 1, then
(2.10) holds.
Corollary 13.2.7: Let {Xn , Fn }n≥1 be a sub-martingale and let T be a
finite stopping time w.r.t. {Fn }n≥1 such that E|XT | < ∞ and (2.10) holds.
Then,
EXT ≥ EX1 .
(2.17)
If, in addition, {Xn }n≥1 is a martingale, then equality holds in (2.17).
Proof: Follows from Theorem 13.2.6 by setting S ≡ 1.
!
Corollary 13.2.8: Let {Xn , Fn }n≥1 be a sub-martingale. Let {Tn }n≥1 be
a sequence of stopping times such that
(i) for all n ≥ 1, Tn ≤ Tn+1 w.p. 1,
(ii) for all n ≥ 1, there exist a nonrandom tn ∈ (0, ∞) such that P (Tn ≤
tn ) = 1.
412
13. Discrete Parameter Martingales
Let Gn ≡ FTn , Yn ≡ XTn , n ≥ 1. Then {Yn , Gn }n≥1 is a sub-martingale. If
{Xn , Fn }n≥1 is a martingale, then {Yn , Gn }n≥1 is a martingale.
Proof: Use Theorem 13.2.6 and Remark 13.2.3.
!
Corollary 13.2.9: Let {Xn , Fn }n≥1 be a sub-martingale. Let T be a stopping time. Let
Tn = min{T, n}, n ≥ 1.
Then {XTn , FTn }n≥1 is a sub-martingale.
Note that this is a stronger version of Theorem 13.2.3 as FTn ⊂ Fn for
all n ≥ 1.
Theorem 13.2.10: (Doob’s maximal inequality). Let {Xn , Fn }n≥1 be a
sub-martingale and let Mm = max{X1 , . . . , Xm }, m ∈ N. Then, for any
m ∈ N and x ∈ (0, ∞),
P (Mm > x) ≤
+
+
EXm
EXm
I(Mm > x)
≤
.
x
x
(2.18)
Proof: Fix m ≥ 1, x > 0. Define a random variable S by
/
inf{k : 1 ≤ k ≤ m, Xk > x} on A
S=
m
on Ac
where A = {Xk > x for some 1 ≤ k ≤ m} = {Mm > x}. Then it is easy
to check that S is a stopping time w.r.t. {Fn }n≥1 and S ≤ m. Set T ≡ m.
Then (2.10) holds and hence, by Theorem 13.2.6,
:
;
(XS , FS ), (Xm , Fm )
is a sub-martingale.
#m
#m
Note that A = {Mm > x} = k=1 {Mm > x, S = k} = k=1 {XS > x, S =
k} = {XS > x} ∈ FS . Hence, by Markov’s inequality,
%
%
1
1
P (A) = P (XS > x) ≤
XS dP ≤
Xm dP
x {XS >x}
x A
%
+
EXm
1
+
.
Xm
dP ≤
≤
x A
x
!
Remark 13.2.4: An alternative proof of (2.18) is as follows. Let A1 =
{X1 > x}, Ak = {X1 ≤ x, X2 ≤ x, . . . , Xk−1 ≤ x, Xk > x} for k =
#k
2, . . . , m. Then i=1 Ai = A ≡ {Mm > x} and Ak ∈ Fk for all k. Now for
x > 0,
m
m
)
1)
P (Mm > x) =
P (Ak ) ≤
E(Xk IAk ).
x
k=1
k=1
13.2 Stopping times and optional stopping theorems
413
By the sub-martingale property of {Xn , Fn }n≥1 ,
E(Xk IAk ) ≤ E(Xm IAk )
for k ≤ m.
Thus
.
m
)
1
E Xm
IAk
x
P (Mm > x) ≤
k=1
1
E(Xm IA )
x
1 ! + " 1 ! +"
E Xm IA ≤ E Xm .
x
x
≤
≤
Theorem 13.2.11: (Doob’s Lp -maximal inequality for sub-martingales).
Let {Xn , Fn }n≥1 be a sub-martingale and let Mn = max{Xj : 1 ≤ j ≤ n}.
Then, for any p ∈ (1, ∞),
E(Mn+ )p ≤
-
p
p−1
.p
E(Xn+ )p ≤ ∞.
(2.19)
Proof: If E(Xn+ )p = ∞, then (2.19) holds trivially. Let E(Xn+ )p < ∞.
Since for p > 1, φ(x) = (x+ )p is a convex nondecreasing function on R.
Hence, {(Xn+ )p , Fn }n≥1 is a sub-martingale and E(Xj+ )p ≤ E(Xn+ )p < ∞
$n
for all j ≤ n. Since (Mn+ )p ≤ j=1 (Xj+ )p , this implies that E(Mn+ )p < ∞.
For any nonnegative random variable Y and p > 0, by Tonelli’s theorem,
EY p
= pE
-%
Y
- %0 ∞
xp−1 dx
.
= pE
xp−1 I(Y > x)dx
0
% ∞
=
pxp−1 P (Y > x)dx.
.
0
Thus,
E(Mn+ )p
=
=
%
∞
0
%
0
∞
pxp−1 P (Mn+ > x)dx
pxp−1 P (Mn > x)dx.
By Theorem 13.2.10, for x > 0
P (Mn > x) ≤
1
E(Xn+ I(Mn > x)),
x
414
13. Discrete Parameter Martingales
and hence
E(Mn+ )p
≤
=
≤
%
0
∞
pxp−2 E(Xn+ I(Mn > x))dx
p
E(Xn+ Mnp−1 )
(p − 1)
.
p
(E(Xn+ )p )1/p (E(Mn(p−1)q )1/q
p−1
(by Holder’s inequality) where q is the conjugate of p, i.e. q =
Thus,
'
(1/p - p .'
(1/p
+ p
E(Mn )
≤
,
E(Xn+ )p
p−1
proving (2.19).
p
(p−1) .
!
Corollary 13.2.12: Let {Xn , Fn }n≥1 be a martingale and let M̃n =
sup{|Xj | : 1 ≤ j ≤ n}. Then, for p ∈ (1, ∞),
.p '
(
p
E |Xn |p .
E M̃np ≤
p−1
Proof: Since {|Xn |, Fn }n≥1 is a sub-martingale, this follows from Theorem 13.2.11.
!
Theorem 13.2.13:
(Doob’s L log L maximal inequality for submartingales). Let {Xn , Fn }n≥1 be a sub-martingale and Mn = max{Xj :
1 ≤ j ≤ n}. Then
.'
(
e
EMn+ ≤
(2.20)
1 + E(Xn+ log Xn+ ) ,
e−1
where 0 log 0 is interpreted as 0.
Proof: As in the proof of Theorem 13.2.11,
% ∞
P (Mn+ > x)dx
EMn+ =
0
% ∞
1
≤ 1+
E(Xn+ I(Mn+ > x))dx
x
1
= 1 + E(Xn+ log Mn+ ) .
For x > 0, y > 0,
(2.21)
y
.
x
y
x
Now x log x = yφ( y ) where φ(t) ≡ −t log t, t > 0. It can be verified φ(t)
attains its maximum 1e at t = 1e . Thus
x log y = x log x + x log
y
x log y ≤ x log x + .
e
13.2 Stopping times and optional stopping theorems
415
So
EMn+
.
(2.22)
e
If EXn+ log Xn+ = ∞, (2.20) is trivially true. If EXn+ log Xn+ < ∞, then as
in the proof of Theorem 13.2.11, it can be shown that EMn+ < ∞. Hence,
the theorem follows from (2.21) and (2.22).
!
EXn+ log Mn+ ≤ EXn+ log Xn+ +
A special case of Theorem 13.2.10 is the maximal inequality of Kolmogorov (cf. Section 8.3) as shown by the following example.
Example 13.2.4: Let {Yn }n≥1 be a sequence of independent random
variables with EYn = 0 and E|Yn |α < ∞ for all n ≥ 1 for some α ∈ (1, ∞).
Let Sn = Y1 +. . .+Yn , n ≥ 1. Then, φ(x) ≡ |x|α , x > 0 is a convex function,
and hence, by Proposition 13.1.1, Xn ≡ φ(|Sn |), n ≥ 1 is a sub-martingale
w.r.t. Fn = σ1{Y1 , . . . , Yn }2, n ≥ 1. Now, by Theorem 13.2.10, for any
x > 0 and m ≥ 1,
(
'
(
'
= P max Xn > xα
P max |Sn | > x
1≤n≤m
1≤n≤m
+
≤ x EXm
−α
≤ x E|Sm |α .
−α
(2.23)
Kolmogorov’s inequality corresponds to the case where α = 2.
Another application of the optimal stopping theorem yields the following
useful result.
Theorem 13.2.14: (Wald’s lemmas). Let {Yn }n≥1 be a sequence of iid
random variables and let {Fn }n≥1 be a filtration such that
(i) Yn is Fn -measurable and (ii) Fn and σ1{Yk : k ≥ n + 1}2 are
independent for all n ≥ 1.
(2.24)
Also, let T be a finite stopping time w.r.t. {Fn }n≥1 and E|T | < ∞. Let
Sn = Y1 + . . . + Yn , n ≥ 1. Then,
(a) E|Y1 | < ∞ implies
EST = (EY1 )(ET ).
(2.25)
E(ST − T EY1 )2 = Var(Y1 )E(T ).
(2.26)
(b) EY12 < ∞ implies
Proof: W.l.o.g., suppose that EY1 = 0. Then, {Sn , Fn }n≥1 is a martingale.
By Corollary 13.2.7, (2.25) would follow if one showed that (2.10) holds with
$n
$T
Xn = Sn and that E|ST | < ∞. Since |Sn | ≤ i=1 |Yi | ≤ i=1 |Yi | on the
$T
set {T ≥ n}, both these conditions would hold if E( i=1 |Yi |) < ∞. Now,
416
13. Discrete Parameter Martingales
by the MCT and the independence of Yi and {T ≥ i} = {T ≤ i−1}c ∈ Fi−1
for i ≥ 2 and the fact that {T ≥ 1} = Ω, it follows that
E
T
)
i=1
|Yi | =
=
E
-)
∞
i=1
∞
)
i=1
.
|Yi |I(i ≤ T )
E|Yi |I(i ≤ T )
= E|Y1 |
∞
)
i=1
P (T ≥ i)
= E|Y1 |E|T | < ∞.
(2.27)
This proves (a).
To prove (b), set σ 2 = Var(Y1 ) and note that EY1 = 0 ⇒ {Sn2 −
nσ 2 , Fn }n≥1 is a martingale. Let Tn = T ∧ n, n ≥ 1. Then, Tn is a bounded
stopping time w.r.t. {Fn }n≥1 and hence, by Theorem 13.2.6,
E[ST2n − (ETn )σ 2 ] = E(S12 − σ 2 ) = 0 for all n ≥ 1.
(2.28)
Thus, (2.26) holds with T replaced by Tn . Since T is a finite stopping time,
Tn ↑ T < ∞ w.p. 1 and therefore, STn → ST as n → ∞, w.p. 1. Now
applying Fatou’s lemma and the MCT, from (2.28), one gets
EST2 ≤ lim inf EST2n = lim inf (ETn )σ 2 = (ET )σ 2 .
n→∞
n→∞
(2.29)
Also, note that for any n ≥ 1
E(ST2 − ST2n )
=
=
E(ST2 − Sn2 )I(T > n)
E[(ST − Sn )2 + 2Sn (ST − Sn )]I(T > n)
≥ 2ESn (ST − Sn )I(T > n)
= 2ESn (ST1n − Sn )
=
2E[Sn · E{(ST1n − Sn )|Fn }],
(2.30)
where T1n = max{T, n}. Since ET1n ≤ ET + n < ∞, and {T1n > k} =
{T > k} for all k > n, the conditions of Theorem 13.2.6 hold with Xn =
Sn , S = n and T = T1n . Hence, E(ST1n − Sn |Fn ) = 0 a.s. and by (2.30),
EST2 ≥ EST2n for all n ≥ 1. Now letting n → ∞ and using (2.28), one gets
!
EST2 ≥ (ET )σ 2 , as in (2.29). This completes the proof of (b).
This section is concluded with the statement of an inequality relating the
pth moment of a martingale to the (p/2)th moment of its squared variation.
Theorem 13.2.15: (Burkholder’s inequality). Let {Xn , Fn }n≥1 be a martingale sequence. Let ξj = Xj − Xj−1 , α ≥ 1, with X0 = 0. Then for any
13.3 Martingale convergence theorems
417
p ∈ [2, ∞), there exist positive constants Ap and Bp such that
E|Xn |p ≤ Ap E
and
-)
n
ξi2
i=1
.p/2
/ -)
0
.
n
n
" p/2 )
!
E ξi2 |Fi−1
+
E|ξi |p .
E|Xn |p ≤ Bp E
i=1
i=1
For a proof, see Chow and Teicher (1997).
13.3 Martingale convergence theorems
The martingale (or sub- or super-martingale) property of a sequence of
random variables {Xn }n≥1 implies, under some mild additional conditions,
a remarkable regularity, namely, that {Xn }n≥1 converges w.p. 1 as n →
∞. For example, any nonnegative super-martingale converges w.p. 1. Also
any sub-martingale {Xn }n≥1 for which {E|Xn |}n≥1 is bounded converges
w.p. 1. Further, if {E|Xn |p }n≥1 is bounded for some p ∈ (1, ∞), then Xn
converges w.p. 1 and in Lp as well.
The proof of these assertions depend crucially on an ingenious inequality
due to Doob. Recall that one way to prove that a sequence of real numbers
{xn }n≥1 converges as n → ∞ is to show that it does not oscillate too much
as n → ∞. That is, for all a < b, the number of times the sequence goes
from below a to above b is finite. This number is referred to as the number
of upcrossings from a to b. Doob’s upcrossing lemma (see Theorem 13.3.1
below) shows that for a sub-martingale, the mean of the upcrossings can be
bounded above. First, a formal definition of upcrossings of a given sequence
{xj : 1 ≤ j ≤ n} of real numbers from level a to level b with a < b is given.
Let
N1
N2
=
=
min{j : 1 ≤ j ≤ n, xj ≤ a}
min{j : N1 < j ≤ n, xj ≥ b}
and, define recursively,
N2k−1
N2k
=
=
min{j : N2k−2 < j ≤ n, xj ≤ a}
min{j : N2k−1 < j ≤ n, xj ≥ b}.
If any of these sets on the right side is empty, all subsequent ones will be
empty as well and the corresponding
: Nk ’s will ;not be well defined. If N1 or
N2 is not well defined, then set U {xj }nj=1 ; a, b , the number of upcrossings
of the interval (a, b) by {xj }j=1 equal to zero. Otherwise let N, be the last
418
13. Discrete Parameter Martingales
:
;
one that is well defined. Set U {xj }nj=1 ; a, b =
is odd.
,
2
if 2 is even and
,−1
2
if 2
Theorem 13.3.1: (Doob’s upcrossing lemma). Let {Xj , Fj }nj=1 be a sub:
;
martingale and let a < b be real numbers. Let Un ≡ U {Xj }nj=1 ; a, b .
Then
EXn+ + |a|
E(Xn − a)+ − E(X1 − a)+
≤
.
(3.1)
EUn ≤
(b − a)
(b − a)
Proof: Consider first the special case when Xj ≥ 0 w.p. 1 for all j ≥ 1
and a = 0. Let Ñ0 = 1. Let


 Nj if j = 2k, k ≤ Un or if
j = 2k − 1, k ≤ Un ,
Ñj =

 n
otherwise.
If j is odd and j + 1 ≤ 2Un , then
XÑj+1 ≥ b > 0.
If j is odd and j + 1 ≥ 2Un + 2, then
XÑj+1 = Xn = XÑj .
$
Thus j odd (XÑj+1 − XÑj ) ≥ bUn . It is easy to check that {Ñj }nj=1 are
stopping times. By Theorem 13.2.6,
E(XÑj+1 − XÑj ) ≥ 0
for j = 1, 2, . . . , n.
Thus,
E(Xn − X1 )
=
E
- n−1
)
j=0
≥
(XÑj+1 − XÑj )
bEUn + E
≥ bEUn .
' )
j
even
.
(XÑj+1 − XÑj )
(
(3.2)
Hence, both inequalities of (3.1) hold for the special case.
Now for the general case, let Yj ≡ (Xj − a)+ ,: 1 ≤ j ≤ n. Then
;
{Yj , Fj }nj=1 is a nonnegative sub-martingale and Un {Yj }nj=1 , 0, b − a ≡
:
;
Un {Xj }nj=1 , a, b . Thus, from (3.2)
EUn
≤
=
E(Yn − Y1 )
(b − a)
E((Xn − a)+ ) − E((X1 − a)+ )
,
(b − a)
13.3 Martingale convergence theorems
419
proving the first inequality of (3.1). The second inequality follows by noting
!
that (x − a)+ ≤ x+ + |a| for any x, a ∈ R.
The first convergence theorem is an easy consequence of the above theorem.
Theorem 13.3.2: Let {Xn , Fn }n≥1 be a sub-martingale such that
sup EXn+ < ∞.
n≥1
Then {Xn }n≥1 converges to a finite limit X∞ w.p. 1 and E|X∞ | < ∞.
Proof: Let
A = {ω : lim inf Xn < lim sup Xn },
n→∞
n→∞
and for a < b, let
A(a, b) = {ω : lim inf Xn < a < b < lim sup Xn }.
n→∞
n→∞
Then, A = ∪A(a, b) where the union is taken over all rationals a, b such
that a < b. To establish convergence of {Xn }n≥1 it suffices to show that
P (A(a, b)) =
: 0 for each a;< b, as this implies P (A) = 0. Fix a < b and
let Un = U {Xj }nj=1 ; a, b . For ω ∈ A(a, b), Un → ∞ as n → ∞. On the
other hand, by the upcrossing lemma
EUn ≤
EXn+ + |a|
(b − a)
and by hypothesis, supn≥1 EXn+ < ∞, implying that
sup EUn < ∞.
n≥1
C
B
By the MCT, E limn→∞ Un = limn→∞ EUn , and hence
lim Un < ∞ w.p. 1.
n→∞
Thus, P (A(a, b)) = 0 for all a < b, and hence limn→∞ Xn = X∞ exists
w.p. 1. By Fatou’s lemma
E|X∞ | ≤ lim E|Xn | ≤ sup E|Xn | .
n→∞
n≥1
But E|Xn | = 2E(Xn+ ) − EXn ≤ 2EXn+ − EX1 , as {Xn , Fn }n≥1 a
sub-martingale implies EXn ≥ EX1 . Thus, supn≥1 EXn+ < ∞ implies
!
supn≥1 E|Xn | < ∞. So, E|X∞ | < ∞ and hence |X∞ | < ∞ w.p. 1.
Corollary 13.3.3: Let {Xn , Fn )n≥1 be a nonnegative super-martingale.
Then {Xn }n≥1 converges to a finite limit w.p. 1.
420
13. Discrete Parameter Martingales
Proof: Since {−Xn , Fn }n≥1 is a nonpositive sub-martingale, supn≥1
E(−Xn )+ = 0 < ∞. By Theorem 13.3.2, {−Xn }n≥1 converges to a finite limit w.p. 1.
!
Corollary 13.3.4: Every nonnegative martingale converges w.p. 1.
A natural question is that if a sub-martingale converges w.p. 1 to a finite
limit, does it do so in L1 or in Lp for p > 1. It turns out that if a submartingale is Lp bounded for some p > 1, then it converges in Lp . But this
is false for p = 1 as the following examples show.
Example 13.3.1: (Gambler’s ruin problem).
Let {Sn }n≥1 be the simple
$n
ξ
,
n
≥ 1, where {ξn }n≥1 is a
symmetric random walk, i.e., Sn =
i
i=1
sequence of iid random variables with P (ξ1 = 1) = 12 = P (ξ1 = −1). Let
N = inf{n : n ≥ 1, Sn = 1}.
As noted earlier, N is a finite stopping time and that {Sn }n≥1 is a martingale. Let Xn = SN ∧n , n ≥ 1. Then by the optional sampling theorem,
{Xn }n≥1 is a martingale. Clearly, limn→∞ Xn ≡ X∞ = SN = 1 exists w.p.
1. But EXn ≡ 0 while EX∞ = 1 and so Xn does not converge to X∞ in
L1 .
Example 13.3.2: Suppose that {ξn }n≥1 is a sequence of iid nonnegative
random variables with Eξ1 = 1. Let Xn = Πni=1 ξi , n ≥ 1. Then {Xn }n≥1
is a nonnegative martingale and hence converges w.p. 1 to X∞ , say. If
P (ξ1 = 1) < 1, it can be shown that X∞ = 0 w.p. 1. Thus, Xn " X∞ in
L1 . In particular, {Xn }n≥1 is not UI (Problem 13.19).
Example 13.3.3: If {Zn }n≥0 is a$branching process with offspring dis∞
tribution {pj }j≥0 and mean m = j=1 jpj then Xn ≡ Zn /mn (cf. 1.9)
exists w.p. 1. It is
is a nonnegative martingale and hence limn Xn = X∞ $
∞
known that Xn converges to X∞ in L1 iff m > 1 and j=1 j log pj < ∞
(cf. Chapter 18). See also Athreya and Ney (2004).
Theorem 13.3.5: Let {Xn , Fn }n≥1 be a sub-martingale. Then the following are equivalent:
(i) There exists a random variable X∞ in L1 such that Xn → X∞ in L1 .
(ii) {Xn }n≥1 is uniformly integrable.
Proof: Clearly, (i) ⇒ (ii) for any sequence of integrable random variables
{Xn }n≥1 . Conversely, if (ii) holds, then {E|Xn |}n≥1 is bounded and hence
by Theorem 13.3.2, Xn → X∞ w.p. 1 and by uniform integrability, Xn →
!
X∞ in L1 , i.e., (i) holds.
13.3 Martingale convergence theorems
421
Remark 13.3.1: Let (ii) of Theorem 13.3.5 hold. For any A ∈ Fn and
m > n, by the sub-martingale property
E(Xn IA ) ≤ E(Xm IA ).
By uniform integrability, for any A ∈ F,
EXn IA → EX∞ IA
as
n → ∞.
This implies
# that {Xn , Fn }n≥1 ∪ {X∞ , F∞ } is a sub-martingale, where
F∞ = σ1 n≥1 Fn 2. That is, the sub-martingale is closable at right. Further,
EXn → EX∞ . Conversely, it can be shown that if there exists a random
variable X∞ , measurable w.r.t. F∞ , such that
(a) E|X∞ | < ∞,
(b) {Xn , Fn }n≥1 ∪ {X∞ , F∞ } is a sub-martingale, and
(c) EXn → EX∞ ,
then by (a) and (b), {Xn }n≥1 is uniformly integrable and (i) of Theorem
13.3.5 holds.
Corollary 13.3.6: If {Xn , Fn }n≥1 is a martingale, then it is closable at
right iff {Xn }n≥1 is uniformly integrable iff Xn converges in L1 .
This follows from the previous remark since for a martingale, EXn is
constant for 1 ≤ n ≤ ∞.
Remark 13.3.2: A sufficient condition for a sequence {Xn }∞
n≥1 of
random variables to be uniformly integrable is that there exists a random variable M such that EM < ∞ and |Xn | ≤ M w.p. 1 for all
n ≥ 1. Suppose that {Xn }n≥1 is a nonnegative sub-martingale and
M = supn≥1 Xn = limn→∞ Mn where Mn = sup1≤j≤n Xj . By the MCT,
EM = limn→∞ EMn . But by Doob’s L log L maximal inequality (Theorem
13.2.13),
"9
!
e 8
1 + E Xn (log Xn )+ ,
EMn ≤
e−1
for all n ≥ 1. Thus, if {Xn }n≥1 is a nonnegative sub-martingale and
supn≥1 E(Xn (log Xn )+ ) < ∞, then EM < ∞ and hence {Xn }n≥1 is uniformly integrable and converges w.p. 1 and in L1 . Similarly, if {Xn }n≥1 is
a martingale such that supn≥1 E(|Xn |(log |Xn |)+ ) < ∞, then {Xn }n≥1 is
uniformly integrable.
L1 Convergence of the Doob Martingale
Definition 13.3.1: Let X be a random variable on a probability space
(Ω, F, P ) and {Fn }n≥1 a filtration in F. Let E|X| < ∞ and Xn ≡
422
13. Discrete Parameter Martingales
E(X|Fn ), n ≥ 1. Then {Xn , Fn }n≥1 is called a Doob martingale (cf. Example 13.1.3).
For a Doob martingale, E|Xn | ≤ E|X| and it can be shown that {Xn }n≥1
is uniformly integrable (Problem 13.20). Hence, lim
#n→∞ Xn exists w.p. 1
and in L1 , and equals E(X|F∞ ), where F∞ = σ1 n≥1 Fn 2. This may be
summarized as:
Theorem 13.3.7: Let {Fn }n≥1 be a filtration and let X be an F∞ measurable with E|X| < ∞. Then
E(X|Fn ) → X
w.p. 1 and in
L1 .
Corollary 13.3.8: Let {Fn }n≥1 be a filtration and F∞ = σ1
(i) For any A ∈ F∞ , one has
P (A|Fn ) → IA
#
n≥1
Fn 2.
w.p. 1.
(ii) For any random variable X with E|X| < ∞,
E(X|Fn ) → E(X|F∞ )
w.p. 1.
Proof: Take X = IA for (i) and in Theorem 13.3.7, replace X by E(X|F∞ )
for (ii).
!
Kolmogorov’s zero-one law (Theorem 7.2.4) is an easy consequence of
this. If {ξn }n≥1 are independent random variables and A is a tail event
and Fn ≡ σ1ξj : 1 ≤ j ≤ n2, then P (A|Fn ) = P (A) for each n and hence
P (A) = IA w.p. 1, i.e., P (A) = 0 or 1.
Theorem 13.3.9: (Lp convergence of sub-martingales, p > 1). Let
{Xn , Fn }n≥1 be a nonnegative sub-martingale. Let 1 < p < ∞ and
supn≥1 E|Xn |p < ∞. Then limn→∞ Xn = X∞ exists w.p. 1 and in Lp ,
and {(Xn , Fn )}n≥1 ∪ {X∞ , F∞ } is a Lp -bounded sub-martingale.
Proof: By Doob’s maximal Lp inequality (Theorem 13.2.11), for any n ≥
1,
.p
.p
p
p
p
p
p
EXn ≤
sup EXm
,
(3.3)
EMn ≤
p−1
p − 1 m≥1
where Mn = max{Xj : 1 ≤ j ≤ n}. Let M = lim Mn . Then (3.3) yields
n→∞
p
EM < ∞ .
This makes {|Xn |p }n≥1 uniformly integrable. Also supn≥1 E|Xn |p < ∞ and
p > 1 ⇒ supn E|Xn | < ∞ and hence, limn→∞ Xn = X∞ exists w.p. 1 as a
13.3 Martingale convergence theorems
423
finite limit. The uniform integrability of {|Xn |p }n≥p implies Lp convergence
(cf. Problem 2.36). The closability also follows as in Remark 13.3.1.
!
Corollary 13.3.10: Let {Xn , Fn }n≥1 be a martingale. Let 1 < p < ∞
and supn≥1 E|Xn |p < ∞. Then the conclusions of Theorem 13.3.9 hold.
Proof: Since {Yn ≡ |Xn |, Fn }n≥1 is a nonnegative sub-martingale, Theorem 13.3.9 applies.
!
Reversed Martingales
Definition 13.3.2: Let {Xn , Fn }n≤−1 be an adapted family with
(Ω, F, P ) as the underlying probability space, i.e., for n < m, Fn ⊂ Fm ⊂ F
and Xn is Fn -measurable for each n ≤ −1. Such a sequence is called a reversed martingale if
(i) E|Xn | < ∞ for all n ≤ −1,
(ii) E(Xn+1 |Fn ) = Xn for all n ≤ −1.
The definitions of reversed sub- and super-martingales are similar.
Reversed martingales are well behaved since they are closed at right.
Theorem 13.3.11: Let {Xn , Fn }n≤−1 be a reversed martingale. Then
(a)
lim Xn = X−∞ exists w.p. 1 and in L1 ,
n→−∞
(b) X−∞ = E(X−1 |F−∞ ), where F−∞ ≡
,
n≤−1
Fn .
Proof: Fix a < b. For n ≤ −1, let Un be the number of (a, b) upcrossings of
{Xj : n ≤ j ≤ −1}. Then by Doob’s upcrossing lemma (Theorem 13.3.1),
EUn ≤
E(X1 − a)+
.
(b − a)
Let U = lim Un . Letting n → −∞, by the MCT, it follows that
n→−∞
EU < ∞.
Thus, U < ∞ w.p. 1. This being true for every a < b, one may conclude as in Theorem 13.3.2 that P (limn→−∞ Xn > limn→−∞ Xn ) = 0. So
limn→−∞ Xn = X−∞ exists w.p. 1. Also, by Jensen’s inequality, {Xn }n≤−1
is uniformly integrable. So Xn → X−∞ in L1 , proving (a).
To prove (b), note that for any A ∈ F−∞ , by uniform integrability,
%
%
X−∞ dP =
lim
X−n dP
n→−∞ A
A
%
=
X−1 dP, by the martingale property.
A
424
13. Discrete Parameter Martingales
!
Corollary 13.3.12: (The Strong Law of Large Numbers for iid random
variables). Let$
{ξn }n≥1 be a sequence of iid random variables with E|ξ1 | <
n
∞. Then, n−1 i=1 ξi → Eξ1 as n → ∞, w.p. 1.
Proof: For k ≥ 1, let Sk = ξ1 + · · · + ξk and
F−k = σ1{Sk , ξk+1 , ξk+2 , . . .}2.
Let Xn ≡ E(ξ1 |Fn )n≤−1 . By the independence of {ξi }i≥1 , for any n ≤ −1,
with k = −n,
Xn = E(ξ1 |σ1Sk 2).
Also, by symmetry, for any k ≥ 1,
E(ξ1 |σ1Sk 2) = E(ξj |σ1Sk 2) for 1 ≤ j ≤ k.
$
k
Thus, Xn = k1 j=1 E(ξj |σ1Sk 2) = Skk , for all k = −n ≥ 1. It is easy
to check that {Xn , Fn }n≤−1 is a reversed martingale and so by Theorem
13.3.11,
Sk
lim Xn = lim
exists w.p. 1 and in L1 .
n→−∞
k→∞ k
By Kolmogorov’s zero-one law, limk→∞ Skk is a tail random variable, and
!
so a constant, which by L1 convergence must equal Eξ1 .
13.4 Applications of martingale methods
13.4.1 Supercritical branching processes
Recall Example 13.1.5 on branching processes. Assume that it is supercritical, i.e., µ = Eξ11 > 1 and that σ 2 = Var(ξ11 ) < ∞.
Proposition 13.4.1: Let Xn = µ−n Zn be the martingale defined in (1.9).
Then, {Xn }n≥1 is an L2 -bounded martingale.
Proof: Let vn = Var(Xn ), n ≥ 1. Then
Var(E(Xn+1 |Fn )) + E(Var(Xn+1 |Fn ))
E(Zn σ 2 )
= Var(Xn ) + 2(n+1)
µ
= vn + σ 2 µ−2 µ−n , n ≥ 1.
$n
Thus, vn+1 = σ 2 µ−2 j=1 µ−j . Since µ > 1, {vn }n≥1 is bounded. Now
!
since EXn ≡ 1, {Xn }n≥1 is L2 -bounded.
vn+1
=
A direct consequence of Proposition 13.4.1 and Theorem 13.3.8 is that
limn→∞ Xn = X∞ exists w.p. 1 and in mean-square.
13.4 Applications of martingale methods
13.4.2
425
Investment sequences
Let Xn be the value of a portfolio at (the end of) the nth period. Suppose
the returns on the investment are random and satisfy
E(Xn+1 |X0 , X1 , . . . , Xn ) ≤ ρn+1 Xn , n ≥ 1
where ρn+1 is a strictly positive random variable that is Fn -measurable,
where Fn ≡ σ1X1 , X2 , . . . , Xn 2. Let ρ1 ≡ 1 and
Xn
Zn = <n
j=1
ρj
, n≥1.
Then, {Zn , Fn }n≥1 is a nonnegative super-martingale and hence, it converges w.p. 1 to a limit Z, with EZ ≤ EZ1<= EX1 . This implies that
n
{Xn }n≥1 converges w.p. 1 on the event A ≡ { j=1 ρj converges}.
13.4.3
A conditional Borel-Cantelli lemma
Let {An }n≥1 be a sequence of events in a probability space (Ω, F, P ) and
let {Fn }n≥1 be a filtration in F. Let An ∈ Fn for all n ≥ 1 and pn =
P (An |Fn−1 ), n $
≥ 1, where F0 is the trivial σ-algebra ≡ {Ω, ∅}. Let δn =
n
IAn , and Xn = j=1 (δj − pj ), n ≥ 1. Then
{Xn , Fn }n≥1
$n
is a martingale. Let Vj = Var(δj |Fj−1 ) = pj (1 − pj ), j ≥ 1, sn = j=1 Vj
and s̃n = max{s
$nn , 1}, n ≥ 1. Since Vn is Fn−1 -measurable, so are sn and
s̃n . Let Yn = j=1 (δj − pj )/s̃j , n ≥ 1. Then, {Yn , Fn }n≥1 is a martingale.
Clearly, EYn = 0 and by the martingale property
.
n
)
δ j − pj
2
Var
EYn = Var(Yn ) =
s̃j
j=1
- .
n
)
Vj
=
E 2
s̃j
j=1
-)
.
n
Vj
= E
.
s̃2
j=1 j
But V1 = s1 and Vj = sj − sj−1 for j ≥ 2 and so
∞
)
Vj
s̃2
j=1 j
≤
%
1
∞
Vj
s̃2j
1
dt = 1.
t2
≤
& sj
1
dt
sj−1 t2
and hence,
So supn≥1 EYn2 ≤ 1. Thus, {Yn }n≥1 converges to some Y w.p. 1 and in L2 .
426
13. Discrete Parameter Martingales
If sn → ∞, then by Kronecker’s lemma (cf. Lemma 8.4.2).
But
$n
j=1
n
1 )
(δj − pj ) → 0
sn j=1
- $n
. $n
j=1 δj
j=1 pj
⇒ $n
−1
→0.
sn
j=1 pj
pj ≥ sn and hence
$n
$nj=1
δj
j=1 pj
→1
w.p. 1 on the event B ≡ {sn → ∞} .
(4.1)
Next it is claimed that on B c ≡ {limn→∞ sn < ∞}, limn→∞ Xn = X
exists and is finite w.p. 1. To prove the claim, fix 0 < t < ∞. Let Nt =
inf{n : sn+1 > t}. Since sn+1 is Fn -measurable, Nt is a stopping time and
by the optional stopping theorem I (Theorem 13.2.3), {Zn ≡ XNt ∧n }n≥1
is a martingale. By Doob’s L2 -maximal inequality,
(
'
E sup Zj2 ≤ 4E(Zn2 ).
1≤j≤n
Also it is easy to verify that {Xn2 − s2n }n≥1 is a martingale and by the
optional sampling theorem (Theorem 13.2.4),
2
E(Xn∧N
− s2n∧Nt ) = 0.
t
Thus, EZn2 = Es2n∧Nt ≤ t and hence for each t, limn→∞ Zn exists w.p.
1 and in L2 . Thus, limn→∞ XNt ∧n exists w.p. 1 for each t. But, on B c ,
Nt = ∞ for all large t. So limn→∞ Xn = X exists w.p. 1 on B c . This
proves the claim.
$∞
It follows that on B c ∩ { j=1 pj = ∞},
$n
$nj=1
j=1
δj
pj
Xn
− 1 = $n
j=1
pj
→ 0.
$∞
Also, since B ≡ {sn → ∞} is a subset of { j=1 pj = ∞} and it has been
shown in (4.1) that
$n
δj
$nj=1
→ 1 w.p. 1 on B,
p
j=1 j
it follows that
$n
δj
$nj=1
p
j=1 j
→1
w.p. 1 on
/)
∞
j=1
0
pj = ∞ .
13.4 Applications of martingale methods
427
Summarizing the above, one gets the following result.
Theorem 13.4.2: (A conditional Borel-Cantelli lemma). Let {An }n≥1
be a sequence of events in a probability space (Ω, F, P ) and {Fn }n≥1 be
a filtration such that An ∈ Fn , for all n ≥ 1.$Let pn = P (An |Fn−1 ) for
∞
n ≥ 2, p1 = P (A1 ). Then on the event B0 ≡ { j=1 pj = ∞},
$n
$j=1
n
IAj
j=1
pj
→1
w.p. 1,
and in particular, infinitely many An ’s happen w.p. 1 on B0 .
13.4.4
Decomposition of probability measures
The almost sure convergence of a nonnegative martingale yields the following theorem on the Lebesgue decomposition of two probability measures
on a given measurable space.
Theorem 13.4.3: Let (Ω, F) be a measurable space and {Fn }n≥1 be a
filtration with Fn ⊂ F for all n ≥ 1. Let P and Q be two probability
measures on (Ω, F) such that for each n ≥ 1, Pn ≡ the restriction of P
to Fn is absolutely continuous w.r.t. Qn ≡ on the restriction
# of Q to Fn ,
dPn
.
Let
F
≡
σ1
with the Radon-Nikodym derivative Xn = dQ
∞
n≥1 Fn 2 and
n
X ≡ limn→∞ Xn . Then for any A ∈ F∞ ,
%
XdQ + P (A ∩ (X = ∞))
P (A) =
A
≡ Pa (A) + Ps (A), say,
(4.2)
and Pa ; Q and Ps ⊥ Q.
Proof: For 1 ≤ k ≤ n, let Mk,n = maxk≤j≤n Xj . Let Mk =
supn≥k Mk,n = limn→∞ Mk,n . Then X ≡ limn→∞ Xn = limk→∞ Mk . Fix
1 ≤ k0 , N < ∞ and A ∈ Fk0 . Then for n ≥ k ≥ k0 , Bk,n ≡ A ∩ {Mk,n ≤
N } ∈ Fn and hence
%
P (Bk,n ) = Xn IBk,n dQ.
(4.3)
As n → ∞, Mk,n ↑ Mk and so IBk,n ↓ IBk , where Bk ≡ A ∩ {Mk ≤ N } =
+
n≥k Bk,n . Also, since {Xn , Fn }n≥1 is a nonnegative martingale under the
probability measure Q, limn→∞ Xn exists w.p. 1 and hence coincides with
X. Thus, by the bounded convergence theorem applied to (4.3),
%
P (Bk ) = XIBk dQ.
428
13. Discrete Parameter Martingales
Now let N → ∞ and use the MCT to conclude that
%
P (A ∩ (Mk < ∞)) = XI{Mk <∞} dQ.
Since {Mk < ∞} ↑ {X < ∞}, another application of the MCT yields
%
P (A ∩ {X < ∞}) = XI{X<∞} dQ.
(4.4)
Since Q(X < ∞) = 1 and
P (A) = P (A ∩ (X < ∞)) + P (A ∩ (X = ∞)),
#
(4.2) is proved for all A ∈ Fk0 and hence, also for all A#∈ k≥1 Fk . Since
#
k≥1 Fk is an algebra, (4.2) holds for all A ∈ F∞ ≡ σ1 k≥1 Fk 2.
To prove the second part, simply note that Q(A) = 0 implies Pa (A) = 0
and Q(X = ∞) = 0.
!
Remark 13.4.1: The right side of (4.2) provides the Lebesgue decomposition of P w.r.t. Q.
Corollary 13.4.4:
(i) P ; Q iff EQ X = 1 iff P (X = ∞) = 0.
(ii) P ⊥ Q iff Q(X = 0) = 1 iff P (X = ∞) = 1.
Proof: Follows easily from (4.2).
!
For some applications of Corollary 13.4.4 to branching processes see
Athreya (2000).
Corollary 13.4.5: Let {Xn , Fn }n≥1 be a nonnegative martingale
# on a
probability space (Ω, F, Q), with EQ X1 = 1. Let P be defined on n≥1 Fn
by P (A) = EQ Xn IA if A ∈ Fn . Then P is well defined. Suppose further
that P admits
an extension to a probability measure on (Ω, F∞ ) where
#
F∞ ≡ σ( n≥1 Fn ). Then (4.2) holds.
Xm IA = EQ Xn IA
Proof: If A ∈ Fn , then A ∈ Fm for any m > n. But EQ#
by the martingale property, and so P is well defined on n≥1 Fn . The rest
of the corollary follows from the theorem.
!
Remark 13.4.2: It can be shown that if Fn ≡ σ1Xj : 1 ≤ j ≤ n2, then P
does admit an extension to F∞ (cf. see Athreya (2000)).
Corollary 13.4.6: Let (Ω, F) be a measurable space. Let for each n ≥ 1,
An ≡ {An1 , An2 , . . . , Ankn } ⊂ F be a partition of Ω. Let An ⊂ σ1An+1 2
for all n ≥ 1. Let P and Q be two probability measures on
# (Ω, F). Let
Q be such that Q(Ani ) > 0 for all n and i. Let F = σ1 n≥1 An 2. Let
13.4 Applications of martingale methods
429
$kn P (Ani )
Xn ≡ i=1
Q(Ani ) IAni . Then {Xn , Fn }n≥1 is a martingale on (Ω, F, Q)
and P satisfies the decomposition (4.2).
The proof of Corollary 13.4.6 is left as an exercise (Problem 13.22).
Remark 13.4.3: This yields the Lebesgue decomposition of P w.r.t. Q
when F is countably generated, i.e., when there exists a countable collection
k
A of subsets
! of Ω" such that F = σ1A2. In particular, this holds if Ω = R
and F ≡ B (Rk ) for k ∈ N.
13.4.5
Kakutani’s theorem
Theorem 13.4.7: !(Kakutani’s "theorem). Let P and Q be the probability distributions on R∞ , B(R∞ ) of the sequences of independent random
variables {Xj }j≥1 and {Yj }j≥1 , respectively. Let for each j ≥ 1, the distribution of Xj be dominated by that of Yj . Then
either
P ;Q
or
P ⊥ Q.
(4.5)
Proof: Let fj be the density of λj w.r.t. µj where λj (·) = P (Xj ∈ ·) and
µj (·) = Q(Yj ∈ ·). Let Ω = R∞ , F = (B(R))∞ . Then P = Πj≥1 λj , Q =
Πj≥1 µj . Let ξn (ω) ≡ ω(n), the nth co-ordinate of ω = (ω1 , ω2 , . . .) ∈ Ω,
and Fn ≡ σ1ξj : 1 ≤ j ≤ n2. Also let Pn be the restriction of P to Fn and
Qn be that of Q to Fn . Then Pn ; Qn with probability density
Ln =
n
K
dPn
=
fj (ξj ).
dQn
j=1
Since {limn→∞ Ln < ∞} is a tail event, by the independence of {ξj }j≥1
under P and the Kolmogorov’s zero-one law, P (limn→∞ Ln < ∞) = 0 or
1. Now, by Corollary 13.4.4, (4.5) follows.
!
Remark
L 13.4.4: It can be shown that P ; Q or P ⊥ Q according as
<
∞
E
fj (Yj ) > 0 or = 0. For a proof, see Durrett (2004).
j=1
Remark 13.4.5: If {Xj }j≥1 are iid and {Yj }j≥1 are√also iid, then P = Q
or P ⊥ Q. This is√because fj = f√1 for all j and EQ f1 ≤ (EQ f1 )1/2 ≤ 1
< 1 or EQ f1√= 1. In the latter case f1 ≡ 1, since
and √
so either EQ f1 √
EQ ( f1 )2 = 1 = EQ ( f1 ) ⇒ VarQ ( f1 ) = 0.
Remark 13.4.6: The above result can be extended to Markov chains.
Let P and Q be two irreducible stochastic matrices on a countable set
and let Q be positive recurrent. Also, let Px0 denote the distribution of a
Markov chain {Xn }n≥1 starting at x0 and with transition probability P ,
and similarly, let Qy0 denote the distribution of a Markov chain {Yn }n≥1
430
13. Discrete Parameter Martingales
starting at y0 and with transition probability Q. Then
either
Px0 ⊥ Qy0
or P = Q.
(4.6)
The proof of this is left as an exercise (Problem 13.23).
13.4.6
de Finetti’s theorem
Let {Xn }n≥1 be a sequence of exchangeable random variables on a
probability space (Ω, F, P ), i.e., for each n ≥ 1, the distribution of
(Xσ(1) , Xσ(2) , . . . , Xσ(n) ) is the same as that of (X1 , X2 , . . . , Xn ) where
(σ(1), σ(2), . . . , σ(n)) is a permutation of (1, 2, . . . , n). Then there is a σalgebra G ⊂ F such that for each n ≥ 1,
P (Xi ∈ Bi , i = 1, 2, . . . , n | G) =
n
K
i=1
P (Xi ∈ Bi | G)
(4.7)
for all B1 , . . . , Bn ∈ B(R).
This is known as de Finetti’s theorem. For a proof, see Durrett (2004)
and Chow and Teicher (1997). This theorem says that conditioned on G
the {Xi }i≥1 are iid random variables with distribution P (X1 ∈ · | G). The
converse to this result, i.e., if for some σ-algebra G ⊂ F (4.7) holds, then
the sequence {Xi }i≥1 is exchangeable is not difficult to verify (Problem
13.26).
13.5 Problems
13.1 Let Ω be a nonempty set and let {Aj }j≥1 be a countable partition of
Ω. For n ≥ 1, let Fn = σ-algebra generated by {Aj }nj=1 .
(a) Show that {Fn }n≥1 is a filtration.
#
(b) Find F∞ = σ1 n≥1 Fn 2.
13.2 Let Ω be a nonempty set. For each n ≥ 1, let πn ≡ {Anj : j =
1, 2, . . . , kn } be a partition of Ω. Suppose that each n and j, Anj is a
union of sets of πn+1 . Let Fn ≡ σ1πn 2 for n ≥ 1.
(a) Show that {Fn }n≥1 is a filtration.
B
"
j
(b) Suppose ∆ = [0, 1)#and πn ≡ { j−1
: j = 1, 2, . . . , 2n }.
2n , 2n
Show that F∞ = σ1 n≥1 Fn 2 is the Borel σ-algebra B([0, 1)).
13.3 Let {(Yn , Fn ) : n ≥ 1} and {(Ỹn , Fn ) : n ≥ 1} be as in Example 13.1.2. Verify that {(Yn , Fn ) : n ≥ 1) is a sub-martingale and
{(Ỹn , F̃n ) : n ≥ 1} is a martingale.
13.5 Problems
431
13.4 Give an example of a random variable T and two filtrations {Fn }n≥1
and {Gn }n≥1 such that T is a stopping time w.r.t. the filtration
{Fn }n≥1 but not w.r.t. {Gn }n≥1 .
13.5 Let T1 and T2 be stopping times w.r.t. a filtration {Fn }n≥1 . Verify
2
that min(T1 , T2 ), max(T1 , T2 ), T1 + T2 , and T1√
are stopping times
w.r.t. {Fn }n≥1 . Give an example to show that T1 and T1 − 1 need
not be stopping times w.r.t. {Fn }n≥1 .
13.6 Let T be a random variable taking values in {1, 2, 3, . . .}. Show that
there is a filtration {Fn }n≥1 w.r.t. which T is a stopping time.
13.7 Let {Fn }n≥1 be a filtration.
(a) Show that T is a stopping time w.r.t. {Fn }n≥1 iff
{T ≤ n} ∈ Fn
for all n ≥ 1 .
(b) Show by an example that if a random variable T satisfies {T ≥
n} ∈ Fn for all n ≥ 1, it need not be a stopping time w.r.t.
{Fn }n≥1 .
(Hint: Consider a T of the form
T = inf{k : k ≥ 1, Xk+1 ∈ A}
and Fn = σ1Xj : j ≤ n2.)
13.8 Show that FT defined in (2.6) is a σ-algebra.
13.9 Let {Xn }n≥1 be a sequence of random variables. Let Gn = σ1{Xj :
1 ≤ j ≤ n}2. Let {Fn }n≥1 be a filtration such that Gn ⊂ Fn for each
n ≥ 1.
(a) Show that if {Xn , Fn }n≥1 is a martingale, then so is
{Xn , Gn }n≥1 .
(b) Show by an example that the converse need not be true.
(c) Let {Xn , Fn }n≥1 be a martingale. Let 1 ≤ k1 < k2 < k3 · · · be
a sequence of integers. Let Yn ≡ Xkn , Hn ≡ Fkn , n ≥ 1. Show
that {Yn , Hn }n≥1 is also a martingale.
13.10 A branching random walk is a branching process and a random walk
associated with it. Individuals reproduce according to a branching
process and the offspring move away from the parent a random distance. If Xn ≡ {xn1 , xn2 , . . . xnZn } denotes the position vector of the
Zn individuals in the nth generation and the individual at location
xni produces ρni offspring, then each of them chooses a new position
by moving a random distance from xni and these are assumed to be
iid. Let ηnij be the random distance moved by the jth offspring of the
432
13. Discrete Parameter Martingales
individual at xni . Then the position vector of the (n + 1)st generation
is given by
3
4
ni
Xn+1 =
{xni + ηnij }ρj=1
, i = 1, 2, . . . , n
;
:
≡
xn+1,k : k = 1, 2, . . . Zn+1 , say,
where
Zn+1
=
=
population size of the (n+1)st generation
Zn
)
ρni .
i=1
Let the offspring distribution be {pk }k≥0 and the jump size distribution be denoted by F (·). Assume that the η’s are real valued and
also that the collection {ρni }i≥1,n≥0 , {ηnij }i≥1,j≥1,n≥0 are all independent with the ρ’s being iid with distribution {pk }k≥0 and the η’s
iid with distribution F . Fix θ ∈ R. For n ≥ 0, let
Zn (θ) ≡
-)
Zn
eθxni
i=1
.
$∞
!
"!
"−n
and Yn (θ) = Zn (θ) ρφ(θ)
where ρ = k=0 kpk , φ(θ) = E(eθη111 ) =
φ(θ) < ∞, 0 < ρ < ∞.
&
eθx dF (x). Assume 0 <
(a) Verify that {Yn (θ)}n≥0 is a martingale w.r.t. an appropriate
filtration {Fn }n≥0 .
(b) Show that
"
!
Var Zn+1 (θ)
=
!
"
Var Zn (θ)ρφ(θ)
"
"!
!
+ EZn (2θ) ρψ(θ) + (φ(θ))2 σ 2 ,
!
"2
where ψ(θ) = φ(2θ) − φ(θ) and σ 2 is the variance of the
distribution {pk }k≥0 .
(c) State a sufficient condition on ρ, σ 2 , ψ(·) and φ(·) and θ for
{Yn (θ)} to be L2 -bounded.
13.11 Let {ηj }j≥1 be adapted to a filtration {F$
j }j≥1 . Let E(ηj |Fj−1 ) = 0
n
and Vj = E(ηj2 |Fj−1 ) for j ≥ 2. Let s2n = j=1 Vj , n ≥ 2.
$n η
(a) Verify that {Yn ≡ j=2 s̃jj , Fn }n≥1 is a martingale, where s̃j =
max(sj , 1).
(
'$
Vj
n
(b) Show that Var(Yn ) = E
2
j=2 s̃j .
&
$∞ V
∞
(c) Show that j=2 s̃2j ≤ 1 t12 dt + 1.
j
13.5 Problems
(d) Conclude that Yn converges w.p. 1 and in L2 .
(e) Now suppose that sn → ∞ w.p. 1. Show that
w.p. 1.
1
sn
(Hint: Use Kronecker’s lemma (cf. Chapter 8).)
$n
j=1
433
ηj → 0
13.12 Let {ξi }i≥1 be iid random variable $
with distribution P (ξ1 = 1) =
n
1
=
P
(ξ
=
−1).
Let
S
=
0,
S
=
1
0
n
i=1 ξi , n ≥ 1. Let −a < 0 < b
2
be integers and T = T−a,b = inf{n : n ≥ 1, Sn = −a or b}. Show,
using Wald’s lemmas (Theorem 13.2.14), that
(a) P (ST ≡ −a) =
b
b+a .
(b) ET = 4ab.
(c) Extend the above arguments to find Var(T ).
(Hint: Consider T ∧ n first and then let n ↑ ∞.)
13.13 Use Problem 13.12 to conclude that for any positive integer b
P (Tb < ∞) = 1,
but ETb = ∞,
where for any integer i,
Ti = inf{n : n ≥ 1, Sn = i}.
(Hint: Use the relation Tb = limi→∞ T−i,b .)
13.14 Let {ξi }i≥1 be iid random variables with distribution
1
<1.
2
' (x
$n
Let S0 = 0, Sn = i=1 ξi , n ≥ 1. Let ψ(x) = pq , x ∈ R where
q = 1 − p.
P (ξi = 1) = p = 1 − P (ξi = −1), 0 < p )=
(a) Show that Xn = ψ(Sn ), n ≥ 0 is a martingale w.r.t. the filtration
Fn = σ1ξ1 , . . . , ξn 2, n ≥ 1, and F0 = {Ω, ∅}.
(b) Let Ta,b = inf{n : n ≥ 1, Sn = −a or b}, for positive integers a
and b. Show that P (T−a,b < ∞) = 1.
(Hint: Use the strong law of large numbers.)
(c) Use (a) to show that for positive integers a, b,
θ ≡ P (T−a < Tb ) =
ψ(b) − 1
,
ψ(b) − ψ(−a)
where for any integer i, Ti = inf{n : n ≥ 1, Sn = i}.
434
13. Discrete Parameter Martingales
(d) Show that ET−a,b =
b−θ(b−a)
(p−q)
(e) Show that if p > q then ET−a = ∞ and ETb =
b
(p−q) .
13.15 Let {Xn }n≥0 be a Markov chain with state space S = {1, 2, 3, . . . , }
and transition probability matrix P = ((pij )). That is, for each n ≥ 1,
P (X0 = i0 , X1 = i, . . . , Xn = in )
= P (X0 = i0 )pi0 i1 . . . pin−1 in
for all i0 , i1 , i2 , . . . , in ∈ S. Let h : S → R and ρ ∈ R be such that
∞
)
j=1
and
∞
)
|h(j)|pij < ∞
h(j)pij = ρh(i)
for all i
for all i.
j=1
(a) Verify that {Xn }n≥1 has the Markov property, namely, for all
n ≥ 0,
P (Xn+1 = in+1 | Xn = in , Xn−1 = in−1 , X0 = i0 )
= P (Xn+1 = in+1 | Xn = in ) = pin in+1 .
(b) Verify that {Yn ≡ h(Xn )ρ−n }n≥0 is a martingale w.r.t. the filtration Fn ≡ σ1X0 , X1 , . . . , Xn 2.
(c) Suppose ρ = 1 and h is bounded below and attains its lower
bound. Suppose also that {Xn }n≥0 is irreducible and recurrent.
That is, P (Xn = j for some n ≥ 1 | X0 = i) = 1 for all i, j.
Show that h is a constant function.
(Hint: Use the optional stopping theorem II.)
13.16 Let {Yj }j≥1 be a sequence of random variables such that P (|Yj | ≤
1) = 1 for all j ≥ 1 and E(Y
$nj |Fj−1 ) = 0 for j ≥ 2 where Fj =
σ1Y1 , Y2 , . . . , Yj 2. Let Xn = j=1 Yj , n ≥ 1 and let τ be a stopping
time w.r.t. {Fn }n≥1 and Eτ < ∞. Show that E|Xτ | < ∞ and EXτ =
EX1 .
13.17 Let θ, {ξj }j≥1 be a sequence of random variables such that E|θ| < ∞
and {ξj }j≥1 are iid with mean zero. For j ≥ 1 let Xj = θ + ξj and
Fj = σ1X1 , X2 , . . . , Xj 2. Show that Yj ≡ E(θ|Fj ) → θ w.p. 1.
(Hint: Use the convergence
$n result for a Doob martingale and the
SLLN to X̄n = θ + n−1 j=1 ξj to show that θ is F∞ -measurable.)
13.5 Problems
435
13.18 Let {Xn , Fn }n≥1 be a martingale sequence such that EXn2 < ∞ for
all n ≥ 1. Let Y1 = X1 , Yj = Xj − Xj−1 , j ≥ 2. Let Vj = E(Yj2 |Fj−1 )
$n
for j ≥ 2 and An = j=2 Vj , n ≥ 2. Verify that
(a) An is Fn−1 measurable and nondecreasing in n.
(b) {Xn2 − An , Fn }n≥2 is a martingale.
(c) Verify that Xn2 = Xn2 − An + An is the Doob decomposition of
the sub-martingale {Xn2 , Fn } (Proposition 13.1.2).
13.19 Show that the random variable X∞ defined in Example 13.3.2 is zero
w.p. 1.
(Hint: Show that E log ξ1 < 0 and use the SLLN.)
13.20 Show that the Doob martingale in Definition 13.3.1 is uniformly integrable.
(Hint: Show that for any λ > 0, λ0 > 0
!
"
!
E |Xn |I(|Xn | > λ) ≤ E |X|I|Xn | > λ
"
"
!
≤ E |X|I(|X| > λ0 ) + λ0 P (|Xn | > λ) . )
13.21 Consider the following urn scheme due to Polyā. Let an urn contain
w0 white and b0 black balls at time n = 0. A ball is drawn from the
urn at random. It is returned to the urn with one more ball of the
color drawn. Repeat this procedure for all n ≥ 1. Let Wn and Bn
denote the number of white and black balls in the urn after n draws.
n
, n ≥ 0. Let Fn = σ1Z0 , Z1 , . . . , Zn 2.
Let Zn = WnW+B
n
(a) Show that {(Zn , Fn )}n≥0 is a martingale.
(b) Conclude that Zn converges w.p. 1 and in L1 to a random variable Z.
(c) Show that for any k ∈ N, limn→∞ EZnk converges and evaluate
the limit. Deduce that Z has Beta (w0 , b0 ) distribution, i.e., its
0 +b0 −1)!
z w0 −1 (1 − z)b0 −1 I[0,1] (z).
pdf fZ (z) ≡ (w(w
0 −1)!(b0 −1)!
(d) Generalize (a) to the case when at the nth stage a random number αn of balls of the color drawn are added where {αn }n≥1 is
any sequence of nonnegative integer valued random variables.
13.22 Prove Corollary 13.4.6.
(Hint: Argue as in Example 13.1.7.)
13.23 Prove the last equation (4.6) of Section 13.4.
(Hint: Show using the strong law for the Q chain that under Q, the
martingale Xn converges
to 0 w.p. 1,
!
" where Xn !is the Radon-Nikodym
"
derivative of Px0 (X0 , . . . , Xn ) ∈ · w.r.t. Qx0 (X0 , . . . , Xn ) ∈ · .)
436
13. Discrete Parameter Martingales
13.24 Let {Fn }n≥0 be a filtration ⊂ F where (Ω, F, P ) is a probability
space. Let {Yn }n≥0 ⊂ L1 (Ω, F, P ). Suppose
Z ≡ sup |Yn | ∈ L1 (Ω, F, P )
and
n≥1
lim Yn ≡ Y
n→∞
exists w.p. 1.
Show that E(Yn |Fn ) → E(Y |F∞ ) w.p. 1.
(Hint: Fix m ≥ 1 and let Vm = supn≥m |Yn − Y |. Show that
limn E(|Yn − Y ||Fn ) ≤ lim E(Vm |Fn ) = E(Vm |F∞ ).
n→∞
Now show that E(Vm |F∞ ) → 0 as m → ∞.)
13.25 Let {Xt , Ft : t ∈ I ≡ Q ∩ (0, 1)} be a martingale, i.e., for all t1 < t2
in I,
E(Xt2 |Ft1 ) = Xt1 .
Show that for each t in I
lim Xs
s↑t,s∈I
and
lim Xs
s↓t,s∈I
both exist w.p. 1 and in L1 and equal Xt w.p. 1.
13.26 Let {Xi }i≥1 be random variables on a probability space (Ω, F, P ).
Suppose for some σ-algebra G ⊂ F (4.7) holds. Show that {Xi }i≥1
are exchangeable.
13.27 Let {Xn }n≥0 , {Yn }n≥0 be martingales in L2 (Ω, F, P ) w.r.t. the same
filtration {Fn }n≥1 . Let X0 = Y0 = 0. Show that
E(Xn Yn ) =
n
)
k=1
E(Xk − Xk−1 )(Yk − Yk−1 ), n ≥ 1
and, in particular,
E(Xn2 ) =
n
)
k=1
E(Xk − Xk−1 )2 .
13.28 Let {Xn , Fn }n≥1 be a martingale in L2 (Ω, F, P ). Suppose 0 ≤ bn ↑ ∞
$n E(Xj −Xj−1 )2
n
such that j=2
< ∞. Show that X
bn → 0 w.p. 1.
b2
j
$n (X −X )
(Hint: Consider the sequence Yn ≡ j=2 j bj j−1 , n ≥ 2. Verify
that {Yn , Fn }n≥2 is a L2 bounded martingale and use Kronecker’s
lemma (cf. Chapter 8).)
13.5 Problems
437
!
"
13.29 Let f ∈ L1 [0, 1], B([0, 1]), m where m(·) is Lebesgue measure on
[0, 1]. Let {Hk (·)}k≥1 be the Haar functions defined by
H1 (t) ≡ 1,
@
H2 (t) ≡
H2n +1 (t)
H2n +j (t)
Let ak ≡
&1
1
−1
0 ≤ t < 12
1
2 ≤ t < 1,
 n/2
0 ≤ t < 2−(n+1)

 2
=
−2n/2 2−(n+1) ≤ t < 2−n , n = 1, 2, . . .


0
otherwise,
'
j − 1(
= H2n +1 t − n , j = 1, 2, . . . , 2n .
2
f (t)Hk (t)dt, k = 1, 2, . . ..
$n
(a) Verify that {Xn (t) ≡ k=1 ak Hk (t)}n≥1 is a martingale w.r.t.
the natural filtration.
0
(b) Show that Xn converges w.p. 1 and in L1 to f .
13.30 Let {Xn}n≥1 be a sequence of nonnegative random variables on some probability space (Ω, F, P) such that E(Xn+1 | Fn) ≤ Xn + Yn, where Fn ≡ σ⟨X1, . . . , Xn⟩ and {Yn}n≥1 is a sequence of nonnegative constants such that Σ_{n=1}^∞ Yn < ∞. Show that {Xn}n≥1 converges w.p. 1.
13.31 Let {τj}j≥1 be independent exponential random variables with λj = Eτj, j ≥ 1, such that Σ_{j=1}^∞ λj² < ∞. Let T0 = 0, Tn = Σ_{j=1}^n τj, n ≥ 1, and sn = Σ_{j=1}^n λj. Show that {Xn ≡ Tn − sn}n≥1 converges w.p. 1 and in mean square.
(Hint: Show that {Xn}n≥1 is an L2-bounded martingale.)
14 Markov Chains and MCMC

14.1 Markov chains: Countable state space

14.1.1 Definition
Let S = {aj : j = 1, 2, . . . , K}, K ≤ ∞, be a finite or countable set. Let P = ((pij))K×K be a stochastic matrix, i.e., pij ≥ 0 for every i, j and Σ_{j=1}^K pij = 1 for every i, and let µ = {µj : 1 ≤ j ≤ K} be a probability distribution, i.e., µj ≥ 0 for all j and Σ_{j=1}^K µj = 1.

Definition 14.1.1: A sequence {Xn}_{n=0}^∞ of S-valued random variables on some probability space (Ω, F, P) is called a Markov chain with stationary transition probabilities P = ((pij)), initial distribution µ, and state space S if
(i) X0 ∼ µ, i.e., P(X0 = aj) = µj for all j, and
(ii) P(Xn+1 = aj | Xn = ai, Xn−1 = a_{i_{n−1}}, . . . , X0 = a_{i_0}) = pij for all ai, aj, a_{i_{n−1}}, . . . , a_{i_0} ∈ S and n = 0, 1, 2, . . .,
i.e., the sequence is memoryless: given Xn, Xn+1 is independent of {Xj : j ≤ n − 1}. More generally, given the present (Xn), the past ({Xj : j ≤ n − 1}) and the future ({Xj : j > n}) are stochastically independent (Problem 14.1).
A few questions arise:

Question 1: Does such a sequence {Xn}_{n=0}^∞ exist for every µ and P, and if so, how does one generate it?
The answer is yes. There are two approaches, namely, (i) using Kolmogorov's consistency theorem and (ii) an iid random iteration scheme.

Question 2: How does one describe the finite time behavior, i.e., the joint distribution of (X0, X1, . . . , Xn) for any n ∈ N?
One may use the Markov property repeatedly to obtain the joint distribution.

Question 3: What can one say about the long-term behavior? One can ask questions like:
(a) Does the trajectory n → Xn converge as n → ∞?
(b) Does the distribution of Xn converge as n → ∞?
(c) Do the laws of large numbers hold for a suitable class of functions f, i.e., do the limits lim_{n→∞} (1/n) Σ_{j=1}^n f(Xj) exist w.p. 1?
(d) Do stationary distributions exist? (A probability distribution π = {πi}i∈S is called a stationary distribution for a Markov chain {Xn}n≥0 if, whenever X0 has distribution π, then Xn also has distribution π for all n ≥ 1.)

The key to answering these questions are the concepts of communication, irreducibility, aperiodicity, and most importantly recurrence. The main tools are the laws of large numbers, renewal theory, and coupling.
14.1.2 Examples
Example 14.1.1: (IID sequence). Let {Xn}_{n=0}^∞ be a sequence of iid S-valued random variables with distribution µ = {µj}. Then {Xn}_{n=0}^∞ is a Markov chain with initial distribution µ and transition probabilities given by pij = µj for all i, i.e., all rows of P are identical. It is also easy to prove the converse, i.e., if all rows of P are identical, then {Xn}_{n=1}^∞ are iid and independent of X0.
To answer Question 3 in this case, note that P[Xn = j] = µj for all n and thus Xn converges in distribution. But the trajectories do not converge. However, the law of large numbers holds and µ is the unique stationary distribution.
Example 14.1.2: (Random walks). Let S = Z, the set of integers. Let {εn}n≥1 be iid with distribution {pj}j∈Z, i.e., P[ε1 = j] = pj for j ∈ Z. Let X0 be a Z-valued random variable independent of {εn}n≥1. Then, define for n ≥ 0,
Xn+1 = Xn + εn+1 = Xn−1 + εn + εn+1 = · · · = X0 + Σ_{j=1}^{n+1} εj.
In this case, with probability one, the trajectories of Xn go to +∞ (respectively, −∞) if E(ε1) > 0 (respectively, < 0). If E(ε1) = 0, then the trajectories fluctuate infinitely often, provided p0 ≠ 1.
Example 14.1.3: (Branching processes). Let S = Z+ = {0, 1, 2, . . .}. Let {pj}_{j=0}^∞ be a probability distribution. Let {ξni}_{i∈N, n∈Z+} be iid random variables with distribution {pj}_{j=0}^∞. Let Z0 be a Z+-valued random variable independent of {ξni}. Let
Zn+1 = Σ_{i=1}^{Zn} ξni for n ≥ 0.
If p0 = 0 and p1 < 1, then Zn → ∞ w.p. 1. If p0 > 0, then P[Zn → ∞] + P[Zn → 0] = 1. Also, P[Zn → 0 | Z0 = 1] = q is the smallest solution in [0,1] to the equation
q = f(q) = Σ_{j=0}^∞ pj q^j.
So q = 1 iff m ≡ Σ_{j=1}^∞ j pj ≤ 1 (see Chapter 18 also).
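The fixed point characterization of q lends itself to a simple numerical check. The following sketch (in Python; the particular offspring distribution and the function name are illustrative assumptions, not from the text) iterates q ← f(q) starting from q = 0, which converges monotonically to the smallest root of q = f(q) in [0, 1].

def extinction_probability(p, tol=1e-12, max_iter=100000):
    # f(q) = sum_j p_j q^j is the offspring generating function
    f = lambda q: sum(pj * q ** j for j, pj in enumerate(p))
    q = 0.0
    for _ in range(max_iter):
        q_new = f(q)
        if abs(q_new - q) < tol:
            break
        q = q_new
    return q

# Illustration: p0 = 1/4, p1 = 1/4, p2 = 1/2, so m = 1.25 > 1 and the smallest
# root of q = 1/4 + q/4 + q^2/2 in (0, 1) is q = 1/2.
print(extinction_probability([0.25, 0.25, 0.5]))   # approximately 0.5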
Example 14.1.4: (Birth and death chains). Again take S = Z+ . Define
P by
pi,i+1 = αi ,
p0,1 = α0 ,
pi,i−1 = βi = 1 − αi , for i ≥ 1,
p0,0 = β0 = 1 − α0 .
The population increases at rate αi and decreases at rate 1 − αi .
Example 14.1.5: (Iterated function systems). Let G := {hi : hi : S → S, i = 1, 2, . . . , L}, L ≤ ∞, be a family of maps of S into itself. Let µ = {pi}_{i=1}^L be a probability distribution. Let {fn}_{n=1}^∞ be iid such that P(fn = hi) = pi, 1 ≤ i ≤ L. Let X0 be an S-valued random variable independent of {fn}_{n=1}^∞. Then the iid random iteration scheme
X1 = f1(X0), X2 = f2(X1), . . . , Xn+1 = fn+1(Xn) = fn+1(fn(· · · (f1(X0)) · · ·))
is a Markov chain with transition probability matrix
pij = P(f1(i) = j) = Σ_{r=1}^L pr I(hr(i) = j).
Remark 14.1.1: Any discrete state space Markov chain can be generated
in this way (see II in 14.1.3 below).
14.1.3 Existence of a Markov chain
I. Kolmogorov's approach. Let Ω = S^{Z+} = {ω : ω ≡ {xn}_{n=0}^∞, xn ∈ S for all n} be the set of all sequences {xn}_{n=0}^∞ with values in S. Let F0 consist of all finite dimensional subsets of Ω of the form
A = {ω : ω = {xn}_{n=0}^∞, xj = aj, 0 ≤ j ≤ m},
where m < ∞ and aj ∈ S for all j = 0, 1, . . . , m. Let F be the σ-algebra generated by F0. Fix µ and P. For A as above let
λµ,P(A) := µ_{a0} p_{a0 a1} p_{a1 a2} · · · p_{a_{m−1} a_m}.
Then it can be shown, using the extension theorem from Chapter 2 or Kolmogorov's consistency theorem of Chapter 6, that λµ,P can be extended to a probability measure on F. Let Xn(ω) = xn, if ω = {xn}_{n=0}^∞, be the coordinate projection. Then {Xn}_{n=0}^∞ will be a sequence of S-valued random variables on (Ω, F, λµ,P) that is a Markov chain with initial distribution µ and transition probability P. A typical element ω = {xn}_{n=0}^∞ of Ω is called a sample path or a sample trajectory.
The following are examples of events (sets) in F that are not finite-dimensional:
A1 = {ω : lim_{n→∞} (1/n) Σ_{j=1}^n h(xj) exists}, for a given h : S → R,
A2 = {ω : the set of limit points of {xn}_{n=0}^∞ equals {a, b}}.
Thus, it is essential to go to (Ω, F, λµ,P) to discuss the events involving asymptotic (long term) behavior, i.e., as n → ∞.
II. IIDRM approach (iteration of iid random maps). Let P = ((pij))K×K be a stochastic matrix. Let f : S × [0, 1] → S be defined by
f(ai, u) = a1 if 0 ≤ u < pi1,
         = a2 if pi1 ≤ u < pi1 + pi2,
         . . .
         = aj if pi1 + pi2 + · · · + p_{i(j−1)} ≤ u < pi1 + pi2 + · · · + pij,
         . . .
         = aK if pi1 + pi2 + · · · + p_{i(K−1)} ≤ u < 1.        (1.1)
Let U1, U2, . . . be iid Uniform [0, 1] random variables. Let fn(·) := f(·, Un). Then for each n, fn maps S to S. Also {fn}_{n=1}^∞ are iid. Let X0 be independent of {Ui}_{i=1}^∞ with X0 ∼ µ. Then the sequence {Xn}_{n=0}^∞ defined by
Xn+1 = fn+1(Xn) = f(Xn, Un+1)
is a Markov chain with initial distribution µ and transition probability P. The underlying probability space on which X0 and {Ui}_{i=1}^∞ are defined can be taken to be the Lebesgue space ([0, 1], B([0, 1]), m), where m is the Lebesgue measure.
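The map f of (1.1) is easy to implement, which makes the IIDRM construction a practical recipe for simulating a chain with given P and µ. Below is a minimal sketch in Python for a finite state space; the function names and the particular matrix are illustrative assumptions, not part of the text.

import random

def make_f(P):
    # P[i][j] = p_ij; f(i, u) returns the smallest j with p_i1 + ... + p_ij > u,
    # i.e., the inverse-cdf map of row i evaluated at u, as in (1.1).
    def f(i, u):
        cum = 0.0
        for j, pij in enumerate(P[i]):
            cum += pij
            if u < cum:
                return j
        return len(P[i]) - 1
    return f

def simulate_chain(P, x0, n):
    f = make_f(P)
    path = [x0]
    for _ in range(n):
        path.append(f(path[-1], random.random()))   # X_{k+1} = f(X_k, U_{k+1})
    return path

# Illustration with a two-state chain (states labelled 0 and 1).
P = [[0.9, 0.1],
     [0.4, 0.6]]
print(simulate_chain(P, 0, 20))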
Finite Time Behavior of {Xn}

For each n ∈ N,
P(X0 = a0, X1 = a1, . . . , Xn = an)
  = [Π_{j=1}^n P(Xj = aj | Xj−1 = aj−1, . . . , X0 = a0)] P(X0 = a0)
  = [Π_{j=1}^n p_{a_{j−1} a_j}] µ_{a0}.        (1.2)
Thus, the joint distribution for any finite n is determined by µ and P. In particular,
P(Xn = an | X0 = a0) = Σ Π_{j=1}^n p_{a_{j−1} a_j} = (P^n)_{a0 an},        (1.3)
where the sum in the middle term runs over all a1, a2, . . . , a_{n−1} and P^n is the nth power of P. So the behavior of the distribution of Xn can be studied via that of P^n for large n. But this analytic approach is not as comprehensive as the probabilistic one, via the concept of recurrence, which will be described next.
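Equation (1.3) also gives a direct computational recipe: the distribution of Xn given X0 is read off from the nth power of P. A small sketch (Python, plain lists; the matrix is an illustrative assumption) follows.

def mat_mult(A, B):
    K = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(K)) for j in range(K)] for i in range(K)]

def n_step_matrix(P, n):
    K = len(P)
    Pn = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]  # P^0 = identity
    for _ in range(n):
        Pn = mat_mult(Pn, P)
    return Pn

P = [[0.9, 0.1],
     [0.4, 0.6]]
# Row a0 of P^n is the conditional distribution of X_n given X_0 = a0;
# for this chain both rows of P^50 are close to the stationary row (0.8, 0.2).
print(n_step_matrix(P, 50)[0])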
14.1.4 Limit theory

Let {Xn}_{n=0}^∞ be a Markov chain with state space S = {1, 2, . . . , K}, K ≤ ∞, and transition probability matrix P = ((pij))K×K.
Definition 14.1.2: (Hitting times). For any set A ⊂ S, the hitting time TA is defined as
TA = inf{n : Xn ∈ A, n ≥ 1},
i.e., it is the first time after 0 that the chain enters A. The random variable TA is also called the first passage time for A or the first entrance time for A. Note that TA is a stopping time (cf. Chapter 13) w.r.t. the filtration {Fn ≡ σ⟨{Xj : 0 ≤ j ≤ n}⟩}n≥0.
By convention, inf ∅ = ∞. If A = {i}, then write T{i} = Ti for notational
simplicity.
A concept of fundamental importance is recurrence of a state.

Definition 14.1.3: (Recurrence). A state i is recurrent or transient according as
Pi[Ti < ∞] = 1 or < 1,
where Pi denotes the probability distribution of {Xn}_{n=0}^∞ with X0 = i with probability 1. Thus i is recurrent iff
fii ≡ P(Xn = i for some 1 ≤ n < ∞ | X0 = i) = 1.        (1.4)
Definition 14.1.4: (Null and positive recurrence). A recurrent state i is
called null recurrent if Ei (Ti ) = ∞ and positive recurrent if Ei (Ti ) < ∞,
where Ei refers to expectation w.r.t. the probability distribution Pi .
Example 14.1.6: (Frog in the well). Let S = {1, 2, . . .} and P = ((pij)) be given by
pi,i+1 = αi and pi,1 = 1 − αi for all i ≥ 1, 0 < αi < 1.
Then
P1[T1 > r] = α1 α2 · · · αr.
So P1[T1 < ∞] = 1 iff Π_{i=1}^r αi → 0 as r → ∞ iff Σ_{i=1}^∞ (1 − αi) = ∞. Further, 1 is positive recurrent iff Σ_{r=1}^∞ Π_{i=1}^r αi < ∞. Thus, if αi = 1 − 1/(2i²), then 1 is transient; but if αi ≡ α, 0 < α < 1 for all i, then 1 is positive recurrent. If αi = 1 − 1/(ci), c > 1, then 1 is null recurrent (Problem 14.2).
Example 14.1.7: (Simple random walk). Let S = Z, the set of all integers, pi,i+1 = p, pi,i−1 = q, 0 < p = 1 − q < 1. Then it can be shown by using the SLLN that for p ≠ 1/2, 0 is transient. But it is harder to show that for p = 1/2, 0 is null recurrent (see Corollary 14.1.5 below).
The next result says that after each return to i, the Markov chain starts
afresh.
Proposition 14.1.1: (The strong Markov property). For any i ∈ S
and any initial distribution µ of X0 and any k < ∞, a1 , . . . , ak in S,
Pµ (XTi +j = aj , j = 1, 2, . . . , k, Ti < ∞) = Pµ (Ti < ∞)Pi (Xj = aj , j =
1, 2, . . . , k).
Proof: For any n ∈ N,
Pµ(X_{Ti+j} = aj, 1 ≤ j ≤ k, Ti = n)
  = Pµ(Xn+j = aj, 1 ≤ j ≤ k, Xn = i, Xr ≠ i, 1 ≤ r ≤ n − 1)
  = Pµ(Xn+j = aj, 1 ≤ j ≤ k | Xn = i, Xr ≠ i, 1 ≤ r ≤ n − 1) · Pµ(Xn = i, Xr ≠ i, 1 ≤ r ≤ n − 1)
  = Pi(Xj = aj, 1 ≤ j ≤ k) Pµ(Xn = i, Xr ≠ i, 1 ≤ r ≤ n − 1)
  = Pi(Xj = aj, 1 ≤ j ≤ k) Pµ(Ti = n).
Adding both sides over n yields the result.                                □
The strong Markov property leads to the important and useful technique of breaking up the time evolution of a Markov chain into iid cycles. This, combined with the law of large numbers, yields the basic convergence results.
Definition 14.1.5: (IID cycles). Let i be a state. Let T_i^{(0)} = 0 and
T_i^{(k+1)} = inf{n : n > T_i^{(k)}, Xn = i} if T_i^{(k)} < ∞, and = ∞ if T_i^{(k)} = ∞,        (1.5)
i.e., for k = 0, 1, 2, . . ., T_i^{(k)} is the kth successive return time to state i.

Proposition 14.1.2: Let i be a recurrent state. Then Pi(T_i^{(k)} < ∞) = 1 for all k ≥ 1.
Proof: By definition of recurrence, the claim is true for k = 1. If it is true for some k ≥ 1, then
Pi(T_i^{(k+1)} < ∞) = Σ_{j=k}^∞ Pi(T_i^{(k+1)} < ∞, T_i^{(k)} = j)
  = Σ_{j=k}^∞ Pi(T_i^{(1)} < ∞) Pi(T_i^{(k)} = j)   (by the Markov property)
  = Pi(T_i^{(1)} < ∞) Pi(T_i^{(k)} < ∞) = 1.                                □

Let ηr = {Xj, T_i^{(r)} ≤ j ≤ T_i^{(r+1)} − 1; T_i^{(r+1)} − T_i^{(r)}} for r = 0, 1, 2, . . ..
The ηr's are called cycles or excursions.
Theorem 14.1.3: Let i be a recurrent state. Under Pi, the sequence {ηr}_{r=0}^∞ are iid as random vectors with a random number of components. More precisely, for any k ∈ N,
Pi(ηr = (x_{r0}, x_{r1}, . . . , x_{r j_r}), T_i^{(r+1)} − T_i^{(r)} = jr, r = 0, 1, . . . , k)
  = Π_{r=0}^k Pi(η0 = (x_{r0}, x_{r1}, . . . , x_{r j_r}), T_i^{(1)} = jr)        (1.6)
for any {x_{r0}, x_{r1}, . . . , x_{r j_r}, jr}, r = 0, 1, . . . , k.
Proof: Use the strong Markov property repeatedly (Problem 14.7).                                □

Proposition 14.1.4: For any state i, let Ni ≡ Σ_{n=1}^∞ I_{{i}}(Xn) be the total number of visits to i. Then,
(i) i recurrent ⇒ P(Ni = ∞ | X0 = i) = 1.
(ii) i transient ⇒ P(Ni = j | X0 = i) = fii^j (1 − fii) for j = 0, 1, 2, . . ., where fii = P(Ti < ∞ | X0 = i) is the probability of returning to i starting from i, i.e., under Pi, Ni has a geometric distribution with parameter (1 − fii).
Proof: Follows from the strong Markov property (Proposition 14.1.1).
Corollary 14.1.5: A state i is recurrent iff
Ei Ni ≡ E(Ni | X0 = i) = Σ_{n=1}^∞ p_ii^{(n)} = ∞.        (1.7)

Proof: If i is recurrent, then Pi(Ni = ∞) = 1 and so Ei Ni = ∞.
If i is transient, then Ei Ni = Σ_{j=0}^∞ j fii^j (1 − fii) = fii/(1 − fii) < ∞. Also, by the monotone convergence theorem,
E(Ni | X0 = i) = Σ_{n=1}^∞ E(δ_{Xn i} | X0 = i) = Σ_{n=1}^∞ p_ii^{(n)}.                                □
An Application

For the simple symmetric random walk (SSRW) on Z, 0 is recurrent. Indeed,
p_00^{(n)} = 0 if n is odd, and p_00^{(n)} = C(2k, k) (1/2)^{2k} if n = 2k.        (1.8)
By Stirling's formula,
C(2k, k) = (2k)!/(k! k!) ∼ [e^{−2k} (2k)^{2k+1/2} √(2π)] / [e^{−k} k^{k+1/2} √(2π)]² = 4^k / √(kπ)
and hence
p_00^{(2k)} ∼ 1/√(πk).
Thus Σ_{k=1}^∞ p_00^{(2k)} = ∞, implying that 0 is recurrent.
By a similar argument, it can be shown that for the simple symmetric random walk on Z², the integer lattice in R², the origin is recurrent, but on Z^d for d ≥ 3, the origin is transient (Problem 14.3).
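The Stirling asymptotics above are easy to check numerically. The following sketch (Python; the recursion p_00^{(2k)} = p_00^{(2(k−1))} (2k − 1)/(2k), which follows from (1.8), avoids large factorials) compares the exact return probabilities with 1/√(πk) and shows the partial sums growing without bound.

import math

p = 1.0        # p_00^{(0)} = 1
partial = 0.0
for k in range(1, 1001):
    p *= (2 * k - 1) / (2 * k)      # exact: p_00^{(2k)} = C(2k, k) / 4^k
    partial += p
    if k in (1, 10, 100, 1000):
        # exact value, Stirling approximation 1/sqrt(pi*k), and partial sum so far
        print(k, p, 1.0 / math.sqrt(math.pi * k), partial)
# The partial sums grow like 2*sqrt(k/pi), consistent with the divergence of
# sum_k p_00^{(2k)} and hence with the recurrence of 0.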
Corollary 14.1.6: If the state space S is finite, then at least one state must be recurrent.

Proof: Let S ≡ {1, 2, . . . , K}, K < ∞. Since n = Σ_{i=1}^K Σ_{j=1}^n δ_{Xj i}, there exists an i0 such that as n → ∞, Σ_{j=1}^n δ_{Xj i0} → ∞ with positive probability. This implies that i0 must be recurrent.                                □
Definition 14.1.6: (Communication). A state i leads to j (write i → j) if for some n ≥ 1, p_ij^{(n)} > 0. A pair of states i, j are said to communicate if i → j and j → i, i.e., if there exist n ≥ 1, m ≥ 1 such that p_ij^{(n)} > 0 and p_ji^{(m)} > 0.
Definition 14.1.7: (Irreducibility). A Markov chain with state space
S ≡ {1, 2, . . . , K}, K ≤ ∞ and transition probability matrix P ≡ ((pij )) is
irreducible if for each i, j in S, i and j communicate.
Definition 14.1.8: A state i is absorbing if pii = 1.
It is easy to show that if i is absorbing and j → i, then j is transient
(Problem 14.4).
Proposition 14.1.7: (Solidarity property). Let i be recurrent and i → j. Then fji = 1 and j is recurrent, where fji ≡ P(Ti < ∞ | X0 = j).

Proof: By the (strong) Markov property,
1 − fii = P(Ti = ∞ | X0 = i) ≥ P(Tj < ∞, Ti = ∞ | X0 = i) = P(Ti = ∞ | X0 = j) P(Tj < Ti | X0 = i) = (1 − fji) f*_ij,
where f*_ij = P(visiting j before visiting i | X0 = i) (intuitively speaking, one possibility of not returning to i, starting from i, is to visit j first and then never return to i). Now i recurrent and i → j yield 1 − fii = 0 and f*_ij > 0 (Problem 14.4), and so 1 − fji = 0, i.e., fji = 1. Thus, starting from j, the chain visits i w.p. 1. From i, it keeps returning to i infinitely often. In each of these excursions, the probability f*_ij of visiting j is positive, and since there are infinitely many such excursions and they are iid, the chain does visit j in one of these excursions w.p. 1. That is, j is recurrent.                                □

An alternative proof using Corollary 14.1.5 is also possible (Problem 14.5).
Proposition 14.1.8: In a finite state space irreducible Markov chain all
states are recurrent.
Proof: By Corollary 14.1.6, there is at least one state i0 that is recurrent.
By irreducibility and solidarity, this implies all states are recurrent.
Remark 14.1.2: A stronger result holds, namely, that for a finite state
space irreducible Markov chain, all states are positive recurrent (Problem
14.6).
Theorem 14.1.9: (A law of large numbers). Let i be positive recurrent. Let
Nn(j) = #{k : 0 ≤ k ≤ n, Xk = j}, j ∈ S,        (1.9)
be the number of visits to j during {0, 1, . . . , n}. Let Ln(j) ≡ Nn(j)/(n + 1) be the empirical distribution at time n. Let X0 = i with probability 1. Then
Ln(j) → Vij / Ei(Ti) w.p. 1,        (1.10)
where Vij = Ei(Σ_{k=0}^{Ti−1} δ_{Xk, j}) is the mean number of visits to j during {0, 1, . . . , Ti − 1} starting from i. In particular, Ln(i) → 1/(Ei Ti) w.p. 1.
Proof: For each n, let k ≡ k(n) be such that T_i^{(k)} ≤ n < T_i^{(k+1)}. Then
N_{T_i^{(k)}}(j) ≤ Nn(j) ≤ N_{T_i^{(k+1)}}(j),
i.e.,
Σ_{r=0}^k ξr ≤ Nn(j) ≤ Σ_{r=0}^{k+1} ξr,
where ξr = #{l : T_i^{(r)} ≤ l < T_i^{(r+1)}, Xl = j} is the number of visits to j during the rth excursion. Since Vij ≡ Ei ξ1 ≤ Ei Ti < ∞, by the SLLN, with probability 1,
(1/k(n)) Σ_{r=0}^{k(n)} ξr → Ei(ξ1) = Vij
and (1/k) T_i^{(k)} → Ei(Ti). Note that since n lies between T_i^{(k(n))} and T_i^{(k(n)+1)}, also n/k(n) → Ei Ti. Since
(k/n) (1/k) Σ_{r=0}^k ξr ≤ Nn(j)/n ≤ ((k + 1)/n) (1/(k + 1)) Σ_{r=0}^{k+1} ξr,
it follows that Ln(j) = [n/(n + 1)] [Nn(j)/n] → Ei(ξ1)/Ei(Ti).                                □
Note that the above proof works for any initial distribution µ such that
Pµ (Ti < ∞) = 1 and further, the limit of Ln (j) is independent of any such
µ. Thus, if (S, P ) is irreducible and one state i is positive recurrent, then
for any initial distribution µ, Pµ (Ti < ∞) = 1.
Note finally that the proof in Theorem 14.1.9 can be adapted to yield a
criterion for transience, null recurrence, and positive recurrence of a state.
Thus, the following holds.
Corollary 14.1.10: Fix a state i. Then
(i) i is transient iff lim_{n→∞} Nn(i) exists and is finite w.p. 1 for any initial distribution iff
lim_{n→∞} Ei Nn(i) = Σ_{k=0}^∞ p_ii^{(k)} < ∞.
(ii) i is null recurrent iff Σ_{k=0}^∞ p_ii^{(k)} = ∞ and
lim_{n→∞} Ei Ln(i) = lim_{n→∞} (1/n) Σ_{k=0}^n p_ii^{(k)} = 0.
(iii) i is positive recurrent iff
lim_{n→∞} Ei Ln(i) = lim_{n→∞} (1/n) Σ_{k=0}^n p_ii^{(k)} > 0.
(iv) Let (S, P) be irreducible and let one state i be positive recurrent. Then, for any j and any initial distribution µ,
Ln(j) → 1/(Ej Tj) ∈ (0, ∞) w.p. 1.
Thus, for the simple symmetric random walk on the integers,
p_00^{(2k)} ∼ c/√k as k → ∞, 0 < c < ∞,
and hence
Σ_{n=0}^∞ p_00^{(n)} = ∞ and (1/n) Σ_{k=0}^n p_00^{(k)} → 0.
Thus 0 is null recurrent.
It is not difficult to show that if j leads to i, i.e., if fji ≡ Pj (Ti < ∞)
is positive then the number ξ1 of visits to j between consecutive visits to i
has all moments (Problem 14.9).
Using the SLLN, Theorem 14.1.9 can be extended as follows to cover
both null and positive recurrent cases.
Theorem 14.1.11: Let i be a recurrent state. Then, for any j and any initial distribution µ such that Pµ(Ti < ∞) = 1,
Ln(j) ≡ (1/(n+1)) Σ_{k=0}^n δ_{Xk j} → πij ≡ Vij/(Ei Ti) w.p. 1        (1.11)
as n → ∞, where Vij < ∞. (If Ei Ti = ∞, then πij = 0 for all j.)
Corollary 14.1.12: Let (S, P) be irreducible and let one state be recurrent. Then, for any j and any initial distribution,
Ln(j) → cj w.p. 1 as n → ∞,        (1.12)
where cj = 1/(Ej Tj) if Ej Tj < ∞ and cj = 0 otherwise.
The Basic Ergodic Theorem

Taking expectations in Theorem 14.1.11 and using the bounded convergence theorem leads to

Corollary 14.1.13: Let i be recurrent. Then, for any initial distribution µ with Pµ(Ti < ∞) = 1,
Eµ(Ln(j)) = (1/(n+1)) Σ_{k=0}^n Pµ(Xk = j) → Vij/Ei(Ti) := πij as n → ∞.        (1.13)
Theorem 14.1.14: Let i be positive recurrent. Let πj := πij for j ∈ S. Then {πj}j∈S is a stationary distribution for P, i.e., Σ_j πj = 1 and Σ_{l∈S} πl plj = πj for all j.
Proof:
(1/(n+1)) Σ_{k=0}^n p_ij^{(k)} = (1/(n+1)) δij + (1/(n+1)) Σ_{k=1}^n p_ij^{(k)}
  = (1/(n+1)) δij + (1/(n+1)) Σ_{k=1}^n Σ_l p_il^{(k−1)} plj
  = (1/(n+1)) δij + Σ_l [(1/(n+1)) Σ_{k=1}^n p_il^{(k−1)}] plj.
Taking the limit as n → ∞ and using Fatou's lemma yields
πj ≥ Σ_l πl plj.        (1.14)
If strict inequality were to hold for some j0, then adding over j would yield
Σ_j πj > Σ_j Σ_l πl plj = Σ_l πl (Σ_j plj) = Σ_l πl.
Since Σ_{j∈S} Vij = Ei Ti, Σ_j πj = 1, so there cannot be a strict inequality in (1.14) for any j.                                □
Therefore, the following has been established.
Theorem 14.1.15: Let i be a positive recurrent state. Let
πj := Ei(Σ_{k=0}^{Ti−1} δ_{Xk, j}) / Ei(Ti).
Then
(i) {πj}j∈S is a stationary distribution.
(ii) For any j and any initial distribution µ with Pµ(Ti < ∞) = 1,
(a) Ln(j) ≡ (1/(n+1)) #{k : Xk = j, 0 ≤ k ≤ n} → πj w.p. 1 (Pµ),
(b) (1/(n+1)) Σ_{k=0}^n Pµ(Xk = j) → πj.
In particular, if j = i, we have
(1/(n+1)) Σ_{k=0}^n p_ii^{(k)} → πi = 1/Ei(Ti).        (1.15)
Now let i be a positive recurrent state and let j be such that i → j. Then Vij > 0 and, by the solidarity property, fji = 1 and j is recurrent. Taking µ = δj in (ii) above and using Corollary 14.1.13 leads to the conclusions that
(1/(n+1)) Σ_{k=0}^n p_jj^{(k)} → πj > 0        (1.16)
and
πj = 1/(Ej Tj).        (1.17)
Thus, j is positive recurrent.
Now Theorem 14.1.15 leads to the basic ergodic theorem for Markov chains.

Theorem 14.1.16: Let (S, P) be irreducible and let one state be positive recurrent. Then
(i) all states are positive recurrent,
(ii) π ≡ {πj ≡ (Ej Tj)^{−1} : j ∈ S} is a stationary distribution for (S, P),
(iii) for any initial distribution µ and any j ∈ S,
(a) (1/(n+1)) Σ_{k=0}^n Pµ(Xk = j) → πj, i.e., π is the unique limiting distribution (in the average sense) and hence the unique stationary distribution,
(b) Ln(j) ≡ (1/(n+1)) Σ_{k=0}^n δ_{Xk j} → πj w.p. 1 (Pµ).
There is a converse to the above result. To develop this, note first that if j is a transient state, then the total number Nj of visits to j is finite w.p. 1 (for any initial distribution µ) and hence Ln(j) → 0 w.p. 1 and, taking expectations, for any i,
(1/(n+1)) Σ_{k=0}^n p_ij^{(k)} → 0 as n → ∞.
Now, suppose that π ≡ {πj : j ∈ S} is a stationary distribution for (S, P). Then, for all j, πj = Σ_{i∈S} πi pij and hence
πj = Σ_{i∈S, r∈S} πr pri pij = Σ_{r∈S} πr p_rj^{(2)}
and by induction,
πj = Σ_{i∈S} πi p_ij^{(k)} for all k ≥ 0,
implying
πj = Σ_{i∈S} πi [(1/(n+1)) Σ_{k=0}^n p_ij^{(k)}].
Now if j is transient, then for any i,
(1/(n+1)) Σ_{k=0}^n p_ij^{(k)} → 0 as n → ∞,
and so by the bounded convergence theorem
πj = lim_{n→∞} Σ_{i∈S} πi [(1/(n+1)) Σ_{k=0}^n p_ij^{(k)}] = Σ_{i∈S} πi · 0 = 0.
Thus, πj > 0 implies j is recurrent. For j recurrent, it follows from arguments similar to those used to establish Theorem 14.1.15 that
(1/n) Σ_{k=0}^n p_ij^{(k)} → fij · 1/(Ej Tj).
Thus, πj = (Σ_{i∈S} πi fij) · 1/(Ej Tj). But Σ_{i∈S} πi fij ≥ πj fjj = πj > 0. So Ej Tj < ∞, i.e., j is positive recurrent. Summarizing the above discussion leads to
Proposition 14.1.17: Let π ≡ {πj : j ∈ S} be a stationary distribution
for (S, P ). Then, πj > 0 implies that j is positive recurrent.
It is now possible to state a converse to Theorem 14.1.16.
Theorem 14.1.18: Let (S, P) be irreducible and admit a stationary distribution π ≡ {πj : j ∈ S}. Then,
(i) all states are positive recurrent,
(ii) π ≡ {πj : πj = 1/(Ej Tj), j ∈ S} is the unique stationary distribution,
(iii) for any initial distribution µ and for all j ∈ S,
(a) (1/(n+1)) Σ_{k=0}^n Pµ(Xk = j) → πj,
(b) (1/(n+1)) Σ_{k=0}^n δ_{Xk j} → πj w.p. 1 (Pµ).
In summary, for an irreducible Markov chain (S, P) with a countable state space, a stationary distribution π exists iff all states are positive recurrent, in which case π is unique and, for any initial distribution µ, the distribution at time n converges to π in the Cesàro sense (i.e., on average) and the law of large numbers (LLN) holds. For the finite state space case, irreducibility suffices (Problem 14.6).
If h : S → R is a function such that Σ_{j∈S} |h(j)| πj < ∞, then the LLN can be strengthened to
(1/(n+1)) Σ_{k=0}^n h(Xk) → Σ_{j∈S} h(j) πj w.p. 1 (Pµ)
for any µ (Problem 14.10). In particular, if A is any subset of S, then
Ln(A) ≡ (1/(n+1)) Σ_{k=0}^n IA(Xk) → π(A) ≡ Σ_{j∈A} πj
w.p. 1 (Pµ) for any µ.
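This form of the LLN is the basis of Monte Carlo estimation of Σ_j h(j)πj by time averages along a single trajectory. A small sketch (Python; the two-state chain, its stationary distribution, and the function h are illustrative assumptions, not from the text) follows.

import random

P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]               # solves pi P = pi for this P
h = lambda j: 3.0 * j + 1.0   # an arbitrary bounded function on S = {0, 1}

def step(i):
    return 0 if random.random() < P[i][0] else 1

x, total, n = 0, 0.0, 200000
for k in range(n + 1):
    total += h(x)             # accumulate h(X_k)
    x = step(x)

print(total / (n + 1))                      # ergodic average, approx 1.6
print(sum(h(j) * pi[j] for j in range(2)))  # sum_j h(j) pi_j = 1*0.8 + 4*0.2 = 1.6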
An important question that remains is whether the convergence of Pµ(Xn = j) to πj can be strengthened from the average sense to full convergence, i.e., from the convergence to πj of (1/(n+1)) Σ_{k=0}^n Pµ(Xk = j) to the convergence to πj of Pµ(Xn = j) as n → ∞. For this, the additional hypothesis needed is aperiodicity.

Definition 14.1.9: For any state i, the period di of the state i is
di = g.c.d.{n : n ≥ 1, p_ii^{(n)} > 0}.
Further, i is called aperiodic if di = 1.
Example 14.1.8: Let S = {0, 1, 2} and
P = ( 0    1    0
      1/2  0   1/2
      0    1    0 ).
Then di = 2 for all i.
Note that in this example, since (S, P) is finite and irreducible, it has a unique stationary distribution, given by π = (1/4, 1/2, 1/4), and
(1/(n+1)) Σ_{k=0}^n p_00^{(k)} → 1/4 as n → ∞,
but p_00^{(2n+1)} = 0 for each n while p_00^{(2n)} → 1/2 as n → ∞, so p_00^{(n)} itself does not converge to π0. This suggests that aperiodicity will be needed. It turns out that if (S, P) is irreducible, the period di is the same for all i (Problem 14.5).
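A quick computation (Python sketch; it merely forms the matrix powers of this 3 × 3 example) illustrates the contrast between the Cesàro limit 1/4 and the oscillating n-step probabilities.

def mat_mult(A, B):
    K = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(K)) for j in range(K)] for i in range(K)]

P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]

Pk = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # P^0
cesaro = 0.0
for k in range(0, 41):
    if k > 0:
        Pk = mat_mult(Pk, P)
    cesaro += Pk[0][0]
print(Pk[0][0])            # p_00^{(40)}: equals 1/2, not pi_0 = 1/4
print(cesaro / 41.0)       # (1/(n+1)) sum_{k=0}^{n} p_00^{(k)}: close to 1/4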
Theorem 14.1.19: Let (S, P ) be irreducible, positive recurrent and aperiodic (i.e., di = 1 for all i). Let {Xn }n≥0 be a (S, P ) Markov chain. Then,
for any initial distribution µ, limn→∞ Pµ (Xn = i) ≡ πi exists for all i,
where π ≡ {πj : j ∈ S} is the unique stationary distribution.
There are many proofs known for this, and two of them are outlined
below. The first uses the discrete renewal theorem and the second uses a
coupling argument.
Proof 1: (via the discrete renewal theorem). Fix a state i. Recall that p_ii^{(n)} = P(Xn = i | X0 = i), n ≥ 0, and f_ii^{(n)} = P(Ti = n | X0 = i), n ≥ 1. Using the Markov property, for n ≥ 1,
p_ii^{(n)} = P(Xn = i | X0 = i)
  = Σ_{k=1}^n P(Xn = i, Ti = k | X0 = i)
  = Σ_{k=1}^n P(Ti = k | X0 = i) P(Xn = i | Xk = i)
  = Σ_{k=1}^n f_ii^{(k)} p_ii^{(n−k)}.
Let an ≡ p_ii^{(n)}, n ≥ 0, and pn ≡ f_ii^{(n)}, n ≥ 1. Then {pn}n≥1 is a probability distribution and {an}n≥0 satisfies the discrete renewal equation
an = bn + Σ_{k=0}^n a_{n−k} pk, n ≥ 0,
where bn = δ_{n0} and p0 = 0. It can be shown that di = 1 iff
g.c.d.{k : k ≥ 1, pk > 0} = 1.
Further, Ei Ti = Σ_{k=1}^∞ k pk < ∞, by the assumption of positive recurrence. Now it follows from the discrete renewal theorem (see Section 8.5) that
lim_{n→∞} an exists and equals (Σ_{j=0}^∞ bj)/(Σ_{k=1}^∞ k pk) = 1/(Ei Ti) = πi.                                □
Proof 2: (Using coupling arguments). Let {Xn}n≥0 and {Yn}n≥0 be independent (S, P) Markov chains such that Y0 has distribution π and X0 has distribution µ. Then {Zn = (Xn, Yn)}n≥0 is a Markov chain with state space S × S and transition probability P × P ≡ ((p_{(i,j),(k,l)} = pik pjl)). Further, it can be shown that (see Hoel, Port and Stone (1972))
(a) {πi,j ≡ πi πj : (i, j) ∈ S × S} is a stationary distribution for {Zn},
(b) since (S, P) is irreducible and aperiodic, the pair (S × S, P × P) is irreducible.
Since (S × S, P × P) is irreducible and admits a stationary distribution, it is necessarily recurrent and so, from any initial distribution, the first passage time TD of the diagonal D ≡ {(i, i) : i ∈ S} is finite with probability one. Thus, Tc ≡ min{n : n ≥ 1, Xn = Yn} is finite w.p. 1. The random variable Tc is called the coupling time. Let
X̃n = Xn if n ≤ Tc, and X̃n = Yn if n > Tc.
Then it can be verified that {Xn}n≥0 and {X̃n}n≥0 are identically distributed Markov chains. Thus
P(Xn = j) = P(X̃n = j) = P(X̃n = j, Tc < n) + P(X̃n = j, Tc ≥ n)
and
P(Yn = j) = P(Yn = j, Tc < n) + P(Yn = j, Tc ≥ n),
implying that
|P(Xn = j) − P(Yn = j)| ≤ 2 P(Tc ≥ n).
Since P(Tc ≥ n) → 0 as n → ∞ and, by the stationarity of π, P(Yn = j) = πj for all n and j, it follows that for any j,
lim_{n→∞} P(Xn = j) exists and equals πj.                                □
In order to obtain results on rates of convergence for |P(Xn = j) − πj|, one needs more assumptions on the distribution of the return time Ti or the coupling time Tc. For results on this, the books of Hoel et al. (1972), Meyn and Tweedie (1993), and Lindvall (1992) are good sources. It can be shown in the irreducible case that if for some i, Pi(Ti > n) = O(λ1^n) for some 0 < λ1 < 1, then Σ_{j∈S} |Pi(Xn = j) − πj| = O(λ2^n) for some λ1 < λ2 < 1. In particular, this geometric convergence holds in the finite state space irreducible case.
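The coupling inequality |P(Xn = j) − πj| ≤ 2P(Tc ≥ n) also suggests a simple simulation-based check of convergence rates. The sketch below (Python; the chain and its stationary distribution are illustrative assumptions) estimates the tail of the coupling time for two independent copies, one started from an arbitrary state and one from the stationary distribution.

import random

P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]

def step(i):
    return 0 if random.random() < P[i][0] else 1

def coupling_time():
    x = 0                                        # arbitrary starting state
    y = 0 if random.random() < pi[0] else 1      # stationary start
    n = 0
    while True:
        x, y, n = step(x), step(y), n + 1
        if x == y:
            return n                             # T_c = min{n >= 1 : X_n = Y_n}

times = [coupling_time() for _ in range(10000)]
for n in (1, 2, 5, 10):
    # Monte Carlo estimate of P(T_c >= n), an upper bound (up to the factor 2)
    # on the total variation distance after n steps
    print(n, sum(t >= n for t in times) / len(times))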
The main results of this section are summarized below.

Theorem 14.1.20: Let {Xn}n≥0 be a Markov chain with a countable state space S = {0, 1, 2, . . . , K}, K ≤ ∞, and transition probability matrix P ≡ ((pij)). Let (S, P) be irreducible. Then,
(a) All states are recurrent iff for some i in S, Σ_{n=1}^∞ p_ii^{(n)} = ∞.
(b) All states are positive recurrent iff for some i in S, lim_{n→∞} (1/n) Σ_{k=0}^n p_ii^{(k)} exists and is strictly positive.
(c) There exists a stationary probability distribution π iff there exists a positive recurrent state.
(d) If there exists a stationary distribution π ≡ {πj : j ∈ S}, then
(i) it is unique, all states are positive recurrent, and for all j ∈ S, πj = (Ej Tj)^{−1},
(ii) for all i, j ∈ S, (1/(n+1)) Σ_{k=0}^n p_ij^{(k)} → πj as n → ∞,
(iii) for any initial distribution and any j ∈ S, (1/(n+1)) Σ_{k=0}^n δ_{Xk j} → πj w.p. 1,
(iv) if Σ_{j∈S} |h(j)| πj < ∞, then (1/(n+1)) Σ_{k=0}^n h(Xk) → Σ_{j∈S} h(j) πj w.p. 1,
(v) if, in addition, di = 1 for some i ∈ S, then dj = 1 for all j ∈ S and for all i, j, p_ij^{(n)} → πj as n → ∞.
14.2 Markov chains on a general state space

14.2.1 Basic definitions

Let {Xn}n≥0 be a sequence of random variables with values in some space S that is not necessarily finite or countable. The Markov property says that, conditioned on Xn, Xn−1, . . . , X0, the distribution of Xn+1 depends only on Xn and not on the past, i.e., on {Xj : j ≤ n − 1}. When S is not countable, to make this notion of Markov property precise, one needs the following set up.
Let (S, S) be a measurable space. Let (Ω, F, P) be a probability space and {Xn(ω)}n≥0 be a sequence of maps from Ω to S such that for each n, Xn is (F, S)-measurable. Let Fn ≡ σ⟨{Xj : 0 ≤ j ≤ n}⟩ be the sub-σ-algebra of F generated by {Xj : 0 ≤ j ≤ n}. In what follows, for any sub-σ-algebra Y of F, let P(· | Y) denote the conditional probability given Y, as defined in Chapter 12.
Definition 14.2.1: The sequence {Xn}n≥0 is a Markov chain if for all A ∈ S,
P(Xn+1 ∈ A | Fn) = P(Xn+1 ∈ A | σ⟨Xn⟩) w.p. 1, for all n ≥ 0,        (2.1)
for any initial distribution of X0, where σ⟨Xn⟩ is the sub-σ-algebra of F generated by Xn.
It is easy to verify that (2.1) holds for all A ∈ S iff for any bounded measurable h from (S, S) to (R, B(R)),
E(h(Xn+1) | Fn) = E(h(Xn+1) | σ⟨Xn⟩) w.p. 1 for all n ≥ 0        (2.2)
for any initial distribution of X0.
Another equivalent formulation, which makes the Markov property symmetric with respect to time, is the following; it says that given the present, the past and the future are independent.

Proposition 14.2.1: A sequence of random variables {Xn}n≥0 satisfies (2.1) iff for any {Aj}_{j=0}^{n+k} ⊂ S,
P(Xj ∈ Aj, j = 0, 1, 2, . . . , n − 1, n + 1, . . . , n + k | σ⟨Xn⟩)
  = P(Xj ∈ Aj, j = 0, 1, 2, . . . , n − 1 | σ⟨Xn⟩) × P(Xj ∈ Aj, j = n + 1, . . . , n + k | σ⟨Xn⟩)
w.p. 1.
The proof is somewhat involved but not difficult. The countable state
space case is easier (Problem 14.1).
An important tool for studying Markov chains is the notion of a transition
probability function.
Definition 14.2.2: A function P : S × S → [0, 1] is called a transition
probability function on S if
(i) for all x in S, P (x, ·) is a probability measure on (S, S),
(ii) for all A ∈ S, P (·, A) is an S-measurable function from (S, S) → [0, 1].
Under some general conditions guaranteeing the existence of regular conditional probabilities, the right side of (2.1) can be expressed as Pn (Xn , A),
where Pn (·, ·) is a transition probability function on S. In such a case, yet
another formulation of Markov property is in terms of the joint distributions of {X0 , X1 , . . . , Xn } for any finite n.
Proposition 14.2.2: A sequence {Xn}n≥0 satisfies (2.1) iff for any n ∈ N and A0, A1, . . . , An ∈ S,
P(Xj ∈ Aj, j = 0, 1, 2, . . . , n)
  = ∫_{A0} · · · ∫_{A_{n−2}} ∫_{A_{n−1}} P_{n−1}(x_{n−1}, An) P_{n−2}(x_{n−2}, dx_{n−1}) · · · P_1(x_0, dx_1) µ_0(dx_0),
where µ_0(A) = P(X0 ∈ A), A ∈ S.
The proof is by induction and left as an exercise (Problem 14.16). In
what follows, it will be assumed that such transition functions exist.
Definition 14.2.3: A sequence of S-valued random variables {Xn }n≥0 is
called a Markov chain with transition function P (·, ·) if (2.1) holds and the
right side equals P (Xn , A) for all n ∈ N.
14.2.2 Examples
Example 14.2.1: (IID sequence). Let {Xn }n≥0 be iid S-valued random
variables with distribution µ. Then {Xn }n≥0 is a Markov chain with transition function P (x, A) ≡ µ(A) and initial distribution µ.
Example 14.2.2: ((Additive) random walk in R^k). Let {ηj}j≥1 be iid R^k-valued random variables with distribution ν. Let X0 be an R^k-valued random variable independent of {ηj}j≥1 and with distribution µ. Let
Xn+1 = Xn + ηn+1 = X0 + Σ_{j=1}^{n+1} ηj, n ≥ 0.
Then {Xn}n≥0 is an R^k-valued Markov chain with transition function P(x, A) ≡ ν(A − x) and initial distribution µ.
Example 14.2.3: (Multiplicative random walk on R+). Let {ηn}n≥1 be iid nonnegative random variables with distribution ν and let X0 be a nonnegative random variable with distribution µ, independent of {ηn}n≥1. Let Xn+1 = Xn ηn+1, n ≥ 0. Then {Xn}n≥0 is a Markov chain with state space R+ and transition function
P(x, A) = ν(x^{−1}A) if x > 0, and P(x, A) = IA(0) if x = 0,
and initial distribution µ.
This is a model for the value of a stock portfolio subject to random growth rates. Clearly, the above iteration scheme leads to Xn = X0 · Π_{i=1}^n ηi, which, when normalized appropriately, leads to what is known as the geometric Brownian motion model in the financial mathematics literature.
Example 14.2.4: (AR(1) time series). Let ρ ∈ R and ν be a probability
measure on (R, B(R)). Let {ηj }j≥1 be iid with distribution ν and X0 be a
random variable independent of {ηj }j≥1 and with distribution µ. Let
Xn+1 = ρXn + ηn+1 , n ≥ 0.
Then {Xn }n≥0 is a R-valued Markov chain with transition function
P (x, A) ≡ ν(A − ρx) and initial distribution µ.
Example 14.2.5: (Random AR(1) vector time series). Let {(Ai , bi )}i≥1
be iid such that Ai is a k × k matrix and bi is a k × 1 vector. Let µ
be a probability measure on (Rk , B(Rk )). Let X0 be a Rk -valued random
variable independent of (Ai , bi )i≥1 and with distribution µ. Let
Xn+1 = An+1 Xn + bn+1 , n ≥ 0.
Then {Xn }n≥0 is a Rk -valued Markov chain with transition function
P (x, B) ≡ P (A1 x + b1 ∈ B) and initial distribution µ.
Example 14.2.6: (Waiting time chain). Let {ηi}i≥1 be iid real valued random variables with distribution ν and let X0 be independent of {ηi}i≥1 with distribution µ. Let
Xn+1 = max{Xn + ηn+1, 0}.
Then {Xn}n≥0 is a nonnegative valued Markov chain with transition function P(x, A) ≡ P(max{x + η1, 0} ∈ A) and initial distribution µ. In the queuing theory context, if ηn represents the difference between the nth interarrival time and service time, then Xn represents the waiting time at the nth arrival.
All the above are special cases of the following:
Example 14.2.7: (Iterated function system (IFS)). Let (S, S) be a measurable space. Let (Ω, F, P ) be a probability space. Let {fi (x, ω)}i≥1 be
such that for each i, fi : S × Ω → S is (S × F, S)-measurable and the
stochastic processes {fi (·, ω)}i≥1 are iid. Let X0 be a S-valued random
variable on (Ω, F, P ) with distribution µ and independent of {fi (·, ·)}i≥1 .
Let
Xn+1 (ω) = fn+1 (Xn (ω), ω), n ≥ 0.
Then {Xn }n≥0 is an S-valued Markov chain with transition function
P (x, A) ≡ P (f1 (x, ω) ∈ A) and initial distribution µ.
It turns out that when S is a Polish space with S as the Borel σ-algebra and P(·, ·) is a transition function on S, any S-valued Markov chain {Xn}n≥0 with transition function P(·, ·) can be generated by an IFS as in Example 14.2.7. For a proof, see Kifer (1988) and Athreya and Stenflo (2003). When {fi}i≥1 are iid such that f1 has only finitely many choices {hj}_{j=1}^k, where each hj is an affine contraction on R^p, then the Markov chain {Xn} converges in distribution to some π(·). Further, the limit point set of {Xn} coincides w.p. 1 with the support M of the limit distribution π(·). This has been exploited by Barnsley and others to solve the inverse problem: given a compact set M in R^p, find an IFS {hj}_{j=1}^k of affine contractions so that, by generating the Markov chain {Xn}, one can get an approximate picture of M. This is called data compression or image generation by Markov chain Monte Carlo. See Barnsley (1992) for details. More generally, when the {fi} are Lipschitz maps, the following holds.
Theorem 14.2.3: Let {fi(·, ·)}i≥1 be iid Lipschitz maps on S. Assume
(i) E|log s(f1)| < ∞ and E log s(f1) < 0, where
s(f1) = sup_{x≠y} d(f1(x, ω), f1(y, ω))/d(x, y)
and d(·, ·) is the metric on S, and
(ii) for some x0, E(log d(f1(x0, ω), x0))⁺ < ∞.
Then,
(i) X̂n(x, ω) ≡ f1(f2(. . . fn−1(fn(x, ω), ω) . . .)) converges w.p. 1 to a random variable X̂(ω) that is independent of x,
(ii) for all x, Xn(x, ω) ≡ fn(fn−1(. . . (f1(x, ω), ω) . . .)) converges in distribution to X̂(ω).
That (ii) is a consequence of (i) is clear, since for each n and x, Xn(x, ω) and X̂n(x, ω) are identically distributed.
The proof of (i) involves showing that {X̂n} is a Cauchy sequence in S w.p. 1 (Problem 14.17).
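Theorem 14.2.3 can be illustrated with the affine maps fi(x) = ρx + ηi of Example 14.2.4, for which s(fi) = |ρ| < 1. The sketch below (Python; the parameter values are illustrative assumptions) contrasts the backward iterates, which settle down along a fixed realization, with the forward iterates, which keep moving but have the same distribution for each n.

import random

rho = 0.5
etas = [random.gauss(0.0, 1.0) for _ in range(60)]     # one realization of eta_1, eta_2, ...

def backward(x, n):          # X^_n(x) = f_1(f_2(... f_n(x) ...)), with f_i(x) = rho*x + etas[i-1]
    for i in reversed(range(n)):
        x = rho * x + etas[i]
    return x

def forward(x, n):           # X_n(x) = f_n(f_{n-1}(... f_1(x) ...))
    for i in range(n):
        x = rho * x + etas[i]
    return x

for n in (10, 20, 40, 60):
    print(n, backward(0.0, n), forward(0.0, n))
# The backward column stabilizes (almost sure convergence along a fixed realization);
# the forward column keeps fluctuating, although its distribution converges.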
14.2.3 Chapman-Kolmogorov equations

Let P(·, ·) be a transition function on (S, S). Define a sequence of functions {P^{(n)}(·, ·)}n≥0 by the iteration scheme
P^{(n+1)}(x, A) = ∫_S P^{(n)}(y, A) P(x, dy), n ≥ 0,        (2.3)
where P^{(0)}(x, A) ≡ IA(x). It can be verified by induction that for each n, P^{(n)}(·, ·) is a transition probability function.

Definition 14.2.4: P^{(n)}(·, ·) defined by (2.3) is called the n-step transition function generated by P(·, ·).

It is easy to show by induction that if X0 = x w.p. 1, then
P(Xn ∈ A) = P^{(n)}(x, A) for all n ≥ 0        (2.4)
(Problem 14.18). This leads to the Chapman-Kolmogorov equations.
Proposition 14.2.4: Let P(·, ·) be a transition probability function on (S, S) and let P^{(n)}(·, ·) be defined by (2.3). Then for any n, m ≥ 0,
P^{(n+m)}(x, A) = ∫ P^{(n)}(y, A) P^{(m)}(x, dy).        (2.5)

Proof: The analytic verification is straightforward by induction. One can also verify this probabilistically using the Markov property. Indeed, from (2.4) the left side of (2.5) is
Px(Xn+m ∈ A) = Ex(P(Xn+m ∈ A | Fm)) = Ex(P^{(n)}(Xm, A)) (by the Markov property) = right-hand side of (2.5),
where Ex, Px denote expectation and probability distribution of {Xn}n≥0 when X0 = x w.p. 1.                                □

From the above, one sees that the study of the limit behavior of the distribution of Xn as n → ∞ can be reduced analytically to the study of the n-step transition probabilities. This in turn can be done in terms of the operator P on the Banach space B(S, R) of bounded measurable real valued functions from S to R (with the sup norm), defined by
(P h)(x) ≡ Ex h(X1) ≡ ∫ h(y) P(x, dy).        (2.6)
It is easy to verify that P is a positive bounded linear operator on B(S, R) of norm one. The Chapman-Kolmogorov equation (2.4) is equivalent to saying that Ex h(Xn) = (P^n h)(x). Thus, analytically, the study of the limit distribution of {Xn}n≥0 can be reduced to that of the sequence {P^n}n≥0 of powers of the operator P. However, probabilistic approaches, via the notions of Harris irreducibility and recurrence when applicable and via the notion of Feller continuity when S is a Polish space, are more fruitful and will be developed below.
14.2.4 Harris irreducibility, recurrence, and minorization

14.2.4.1 Definition of irreducibility
Recall that a Markov chain {Xn}n≥0 with a discrete state space S and transition probability matrix P ≡ ((pij)) is irreducible if for every i, j in S, i leads to j, i.e.,
P(Xn = j for some n ≥ 1 | X0 = i) ≡ Pi(Tj < ∞) ≡ fij > 0,
where Tj = min{n : n ≥ 1, Xn = j} is the time of the first visit to j.
To generalize this to the case of general state spaces, one starts with the notion of the first entrance time or hitting time (also called the first passage time).

Definition 14.2.5: For any A ∈ S, the first entrance time to A is defined as
TA ≡ min{n : n ≥ 1, Xn ∈ A}, with TA ≡ ∞ if Xn ∉ A for all n ≥ 1.
Since the event {TA = 1} = {X1 ∈ A} and, for k ≥ 2, {TA = k} = {X1 ∉ A, X2 ∉ A, . . . , Xk−1 ∉ A, Xk ∈ A} is an element of Fk = σ⟨{Xj : j ≤ k}⟩, TA is a stopping time w.r.t. the filtration {Fn}n≥1 (cf. Chapter 13).
Definition 14.2.6: Let φ be a nonzero σ-finite measure on (S, S). The Markov chain {Xn}n≥0 (or equivalently, its transition function P(·, ·)) is said to be φ-irreducible (or irreducible in the sense of Harris with reference measure φ) if for any A in S,
φ(A) > 0 ⇒ L(x, A) ≡ Px(TA < ∞) > 0        (2.7)
for all x in S. This says that if a set A in S is considered important by the measure φ (i.e., φ(A) > 0), then it is also reachable by the chain {Xn}n≥0 starting from any x in S. If G(x, A) ≡ Σ_{n=1}^∞ P^n(x, A) is the Green's function associated with P, then (2.7) is equivalent to
φ(A) > 0 ⇒ G(x, A) > 0 for all x ∈ S,        (2.8)
i.e., φ(·) is dominated by G(x, ·) for all x in S.
14.2.4.2 Examples

Example 14.2.7: If S is countable and φ is the counting measure on S, then the irreducibility of a Markov chain {Xn}n≥0 with state space S is the same as φ-irreducibility.

Example 14.2.8: If {Xn}n≥0 are iid with distribution ν, then the chain is ν-irreducible.
Example 14.2.9: It can be verified that the random walk with jump distribution ν (Example 14.2.2) is φ-irreducible with the Lebesgue measure as φ if ν has a nonzero absolutely continuous component with a positive density on some open interval (Problem 14.19 (a)).

Example 14.2.10: The AR(1) chain with η1 having a nontrivial absolutely continuous component can be shown to be φ-irreducible for some φ. On the other hand, the AR(1) chain
Xn+1 = Xn/2 + ηn+1/2, n ≥ 0,
where {ηn}n≥1 are iid Bernoulli (1/2) random variables, is not φ-irreducible for any φ. In general, if {Xn}n≥0 is a Markov chain that has a discrete distribution for each n and has a limit distribution that is nonatomic, then {Xn}n≥0 cannot be φ-irreducible for any φ (Problem 14.19 (b)).
The waiting time chain (Example 14.2.6) is irreducible w.r.t. φ ≡ δ0, the delta measure at 0, if P(η1 < 0) > 0 (Problem 14.20).
It can be shown that if {Xn }n≥0 is φ-irreducible for some σ-finite φ, then
there exists a probability measure ψ such that {Xn }n≥0 is ψ-irreducible and
it is maximal in the sense that if {Xn }n≥0 is φ̃-irreducible for some φ̃, then
φ̃ is dominated by ψ. See Nummelin (1984).
14.2.4.3 Harris recurrence

A Markov chain {Xn}n≥0 that is Harris irreducible with reference measure φ is said to be Harris recurrent if
A ∈ S, φ(A) > 0 ⇒ Px(TA < ∞) = 1 for all x in S.        (2.9)
Recall that irreducibility requires only that Px (TA < ∞) be > 0.
When S is countable and φ is the counting measure, this reduces to the
usual notion of irreducibility and recurrence. If S is not countable but has
a singleton ∆ such that Px (T∆ < ∞) = 1 for all x in S, then the chain
{Xn }n≥0 is Harris recurrent with respect to the measure φ(·) ≡ δ∆ (·), the
delta measure at ∆. The waiting time chain (Example 14.2.6) has such a ∆
in 0 if Eη1 < 0 (Problem 14.20). If such a recurrent singleton ∆ exists, then
the sample paths of {Xn }n≥0 can be broken into iid excursions by looking
at the chain between consecutive returns to ∆. This in turn will allow a
complete extension of the basic limit theory from the countable case to this
special case. In general, such a singleton may not exist. For example, for the
AR(1) sequence with {ηi }i≥1 having a continuous distribution, Px (Xn = x
for some n ≥ 1) = 0 for all x. However, it turns out that for Harris
recurrent chains, such a singleton can be constructed via the regeneration
theorem below established independently by Athreya and Ney (1978) and
Nummelin (1978).
14.2.5 The minorization theorem
A remarkable result of the subject is that when S is countably generated, a
Harris recurrent chain can be embedded in a chain that has a recurrent singleton. This is achieved via the minorization theorem and the fundamental
regeneration theorem below.
Theorem 14.2.5: (The minorization theorem). Let (S, S) be such that S is countably generated. Let {Xn}n≥0 be a Markov chain with state space (S, S) and transition function P(·, ·) that is Harris irreducible with reference measure φ(·). Then the following minorization hypothesis holds.
(i) (Hypothesis M). For every B0 ∈ S such that φ(B0) > 0, there exist a set A0 ⊂ B0, an integer n0 ≥ 1, a constant 0 < α < 1, and a probability measure ν on (S, S) such that φ(A0) > 0 and for all x in A0,
P^{n0}(x, A) ≥ α ν(A) for all A ∈ S.
(ii) (The C-set lemma). For any set B0 ∈ S such that φ(B0) > 0, there exist a set A0 ⊂ B0, an n0 ≥ 1, and a constant 0 < α′ < 1 such that for all x, y in A0,
p^{n0}(x, y) ≥ α′,
where p^{n0}(x, ·) is the Radon-Nikodym derivative of the absolutely continuous component of P^{n0}(x, ·) w.r.t. φ(·).

The proof of the C-set lemma is a nice application of the martingale convergence theorem (see Orey (1971)). The proof of Theorem 14.2.5 (i) using the C-set lemma is easy and is left as an exercise (Problem 14.21).
14.2.6 The fundamental regeneration theorem
Theorem 14.2.6: Let {Xn}n≥0 be a Markov chain with state space (S, S) and transition function P(·, ·). Suppose there exist a set A0 ∈ S, a constant 0 < α < 1, and a probability measure ν(·) on (S, S) such that for all x in A0,
P(x, A) ≥ α ν(A) for all A ∈ S.        (2.10)
Suppose, in addition, that for all x in S,
Px(TA0 < ∞) = 1.        (2.11)
Then, for any initial distribution µ, there exists a sequence of random times {Ti}i≥1 such that under Pµ, the excursions ηj ≡ {X_{Tj+r}, 0 ≤ r < Tj+1 − Tj; Tj+1 − Tj}, j = 1, 2, . . ., are iid with X_{Tj} ∼ ν(·).
Proof: For x in A0, let
Q(x, ·) = [P(x, ·) − α ν(·)] / (1 − α).        (2.12)
Then (2.10) implies that for x in A0, Q(x, ·) is a probability measure on (S, S). For each x in A0 and n ≥ 0, let ηn+1, δn+1, and Yn+1,x be independent random variables such that P(ηn+1 ∈ ·) = ν(·), δn+1 is Bernoulli (α), and P(Yn+1,x ∈ ·) = Q(x, ·). Then, given Xn = x in A0, Xn+1 can be chosen to be
Xn+1 = ηn+1 if δn+1 = 1, and Xn+1 = Yn+1,x if δn+1 = 0,        (2.13)
to ensure that Xn+1 has distribution P(x, ·). Indeed, for x in A0,
P(ηn+1 ∈ ·, δn+1 = 1) + P(Yn+1,x ∈ ·, δn+1 = 0) = ν(·)α + (1 − α)Q(x, ·) = P(x, ·).
Thus, each time the chain enters A0, there is a probability α that the position next time will have distribution ν(·), independent of x ∈ A0 as well as of all past history, i.e., that of starting afresh with distribution ν(·). Now if (2.11) also holds, then for any x in S, by the Markov property, the chain enters A0 infinitely often w.p. 1 (Px).
Let τ1 < τ2 < τ3 < · · · denote the times of successive visits to A0. Since Xτi ∈ A0, by the above construction (cf. (2.13)), there is a probability α > 0 that Xτi+1 will be distributed as ν(·), completely independent of Xj for j ≤ τi and of τi. By comparison with coin tossing, this implies that for any x, w.p. 1 (Px), there is a finite index i0 such that δ_{τ_{i0}+1} = 1 and hence X_{τ_{i0}+1} will be distributed as ν(·), independent of all history of the chain {Xn}n≥0, including that of the δ_{τi+1} and Y_{τi+1}, i ≤ i0, up to the time τ_{i0}. That is, at τ_{i0} + 1 the chain starts afresh with distribution ν(·). Thus, it follows that for any initial distribution µ, there is a random time T such that XT is distributed as ν(·) and is independent of all history up to T − 1. More precisely, for any µ, Pµ(T < ∞) = 1 and
Pµ(Xj ∈ Aj, j = 0, 1, 2, . . . , n + k, T = n + 1)
  = Pµ(Xj ∈ Aj, j = 0, 1, 2, . . . , n, T = n + 1) × Pν(X0 ∈ An+1, X1 ∈ An+2, . . . , Xk−1 ∈ An+k).
Since this is true for any µ, it is true for µ = ν, and hence the theorem follows.                                □
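The splitting device used in the proof is itself an algorithm: whenever the chain is in A0, flip a Bernoulli(α) coin, draw the next state from ν on success and from the residual kernel Q otherwise. The following sketch (Python; the finite chain, the choice A0 = S, and ν are illustrative assumptions, with ν obtained from the componentwise minimum of the rows of P) records the resulting regeneration times.

import random

P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]
A0 = {0, 1, 2}                    # here every row works, so A_0 = S
nu = [0.2, 0.3, 0.2]              # componentwise lower bound of the rows of P
alpha = sum(nu)                   # alpha = 0.7, so P(x, .) >= alpha * nu(.) on A_0
nu = [v / alpha for v in nu]      # normalize nu to a probability measure

def draw(weights):
    u, cum = random.random(), 0.0
    for j, w in enumerate(weights):
        cum += w
        if u < cum:
            return j
    return len(weights) - 1

def next_state(x):
    if x in A0:
        if random.random() < alpha:     # regeneration: X_{n+1} ~ nu, forget the past
            return draw(nu), True
        Q = [(P[x][j] - alpha * nu[j]) / (1 - alpha) for j in range(3)]   # residual kernel (2.12)
        return draw(Q), False
    return draw(P[x]), False

x, regen_times = 0, []
for n in range(1, 30):
    x, regenerated = next_state(x)
    if regenerated:
        regen_times.append(n)
print(regen_times)    # the excursions between successive regeneration times are iid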
A consequence of the above theorem is the following.
Theorem 14.2.7: Suppose that in Theorem 14.2.6, instead of (2.10) and (2.11), the following holds. There exists an n0 ≥ 1 such that for all x in A0 and A in S,
P^{n0}(x, A) ≥ α ν(A)        (2.14)
and for all x in A0,
Px(X_{nn0} ∈ A0 for some n ≥ 1) = 1        (2.15)
and
Px(TA0 < ∞) = 1 for all x in S.        (2.16)
Let Yn ≡ X_{nn0}, n ≥ 0 (where nn0 stands for the product of n and n0). Then, for any initial distribution µ, there exist random times {Ti}i≥1 such that under Pµ, the excursions ηj ≡ {Y_{Tj+r}, 0 ≤ r < Tj+1 − Tj; Tj+1 − Tj}, j = 1, 2, . . ., are iid with Y_{Tj} ∼ ν(·).

Proof: For any initial distribution µ such that µ(A0) = 1, the theorem follows from Theorem 14.2.6, since (2.14) and (2.15) are the same as (2.10) and (2.11) for the transition function P^{n0}(·, ·) and the chain {Yn}n≥0. By (2.16), Pµ(TA0 < ∞) = 1 for any other µ.                                □
Given a realization of the Markov chain {Yn ≡ Xnn0 }n≥0 , it is possible
to construct a realization of the full Markov chain {Xn }n≥0 by “filling the
gaps” {Xj : kn0 + 1 ≤ j ≤ (k + 1)n0 − 1} as follows: Given Xkn0 = x,
X(k+1)n0 = y, generate an observation from the conditional distribution of
(X1 , X2 , . . . , Xn0 −1 ), given X0 = x, Xn0 = y. This leads to the following.
Theorem 14.2.8: Under the set up of Theorem 14.2.7, the "excursions"
η̃j ≡ {X_{n0 Tj + k} : 0 ≤ k < n0(Tj+1 − Tj); Tj+1 − Tj}, j ≥ 1,
are identically distributed and are one dependent, i.e., for each r ≥ 1, the collections {η̃1, η̃2, . . . , η̃r} and {η̃_{r+2}, η̃_{r+3}, . . .} are independent.

Proof: Note that applying the regeneration method to the sequence {Yn}n≥0 and then "filling the gaps" leads to the common portion X_{(Tj−1)n0+r}, 0 ≤ r ≤ n0, being generated given the values X_{(Tj−1)n0} and X_{Tj n0}. This makes two successive excursions η̃_{j−1} and η̃j dependent. But the Markov property renders η̃j and η̃_{j+2} independent. This yields the one-dependence of {η̃j}j≥1.                                □
By the C-set lemma and the minorization Theorem 14.2.5, φ-recurrence
yields the hypothesis of Theorem 14.2.7.
Theorem 14.2.9: Let {Xn }n≥0 be a φ-recurrent Markov chain with state
space (S, S), where S is countably generated. Then there exist an A0 in S,
n0 ≥ 1, 0 < α < 1 and a probability measure ν such that (2.14), (2.15),
and (2.16) hold and hence, the conclusions of Theorem 14.2.8 hold.
Thus, φ-recurrence implies that the Markov chain {Xn }n≥0 is regenerative (defined fully below). This makes the law of large numbers for iid
random variables available to such chains. The limit theory of regenerative
sequences developed in Section 8.5 is reviewed below and by the above
results, such a theory will hold for φ-recurrent chains.
14.2.7 Limit theory for regenerative sequences

Definition 14.2.7: Let (Ω, F, P) be a probability space and (S, S) be a measurable space. A sequence of random variables {Xn}n≥0 defined on (Ω, F, P) with values in (S, S) is called regenerative if there exists a sequence of random times 0 < T1 < T2 < T3 < · · · such that the excursions ηj ≡ {Xn : Tj ≤ n < Tj+1; Tj+1 − Tj}, j ≥ 1, are iid, i.e.,
P(Tj+1 − Tj = kj, X_{Tj+ℓ} ∈ A_{ℓ,j}, 0 ≤ ℓ < kj, j = 1, 2, . . . , r)
  = Π_{j=1}^r P(T2 − T1 = kj, X_{T1+ℓ} ∈ A_{ℓ,j}, 0 ≤ ℓ < kj)        (2.17)
for all k1, k2, . . . , kr ∈ N and A_{ℓ,j} ∈ S, 0 ≤ ℓ < kj, j = 1, . . . , r.
Example 14.2.11: Any Markov chain {Xn }n≥0 with a countable state
space S that is irreducible and recurrent is regenerative with {Ti }i≥1 being
the times of successive returns to a given state ∆.
Example 14.2.12: Any Harris recurrent chain satisfying the minorization
condition (2.10) is regenerative by Theorem 14.2.6.
Example 14.2.13: The waiting time chain (Example 14.2.6) with Eη1 < 0
is regenerative with {Ti }i≥1 being the times of successive returns of
{Xn }n≥0 to zero.
Example 14.2.14: An example of a non-Markov sequence {Xn}n≥0 that is regenerative is a semi-Markov chain. Let {yn}n≥0 be a Harris recurrent Markov chain satisfying (2.10). Given {yn = an}n≥0, let {Ln}n≥0 be independent positive integer valued random variables. Set
Xj = y0 for 0 ≤ j < L0,
Xj = y1 for L0 ≤ j < L0 + L1,
Xj = y2 for L0 + L1 ≤ j < L0 + L1 + L2,
and so on. Then {Xn}n≥0 is called a semi-Markov chain with embedded Markov chain {yn}n≥0 and sojourn times {Ln}n≥0. It is regenerative if {Ti}i≥1 are defined by Ti = Σ_{j=0}^{Ni−1} Lj, where {Ni}i≥1 are the successive regeneration times for {yn} as in Theorem 14.2.7.
Theorem 14.2.10: Let {Xn}n≥0 be a regenerative sequence with regeneration times {Ti}i≥1. Let π̃(A) ≡ E(Σ_{j=T1}^{T2−1} IA(Xj)) for A ∈ S. Suppose π̃(S) ≡ E(T2 − T1) < ∞. Let π(·) = π̃(·)/π̃(S). Then
(i) (1/n) Σ_{j=0}^n f(Xj) → ∫ f dπ w.p. 1 for any f ∈ L1(S, S, π).
(ii) µn(·) ≡ (1/n) Σ_{j=0}^n P(Xj ∈ ·) → π(·) in total variation.
(iii) If the distribution of T2 − T1 is aperiodic, then P(Xn ∈ ·) → π(·) in total variation.
Proof: To prove (i) it suffices to consider nonnegative f. For each n, let Nn = k if Tk ≤ n < Tk+1. Let
Yi = Σ_{j=Ti}^{Ti+1−1} f(Xj), i ≥ 1, and Y0 = Σ_{j=0}^{T1−1} f(Xj).
Then
Y0 + Σ_{i=1}^{Nn−1} Yi ≤ Σ_{i=0}^n f(Xi) ≤ Y0 + Σ_{i=1}^{Nn} Yi.        (2.18)
By the SLLN, (1/Nn) Σ_{i=1}^{Nn−1} Yi and (1/Nn) Σ_{i=1}^{Nn} Yi converge to EY1 w.p. 1, and Nn/n → (E(T2 − T1))^{−1}. It follows from (2.18) that
lim_{n→∞} (1/n) Σ_{i=0}^n f(Xi) = EY1/E(T2 − T1) = ∫ f dπ̃ / π̃(S) = ∫ f dπ,
establishing (i).
By taking f = IA and using the BCT, one concludes from (i) that µn(A) → π(A) for every A in S. Since µn and π are probability measures, this implies that µn → π in total variation, proving (ii).
To prove (iii), note that for any bounded measurable f, an ≡ Ef(X_{T1+n}) satisfies
an = E(f(X_{T1+n}) I(T2 − T1 > n)) + Σ_{r=1}^n E(f(X_{T1+n}) I(T2 − T1 = r))
  = bn + Σ_{r=1}^n E(f(X_{T2+n−r})) P(T2 − T1 = r)
  = bn + Σ_{r=1}^n a_{n−r} pr,
where pr = P(T2 − T1 = r). Now by the discrete renewal theorems from Section 8.5 (which apply since E(T2 − T1) < ∞ and T2 − T1 has an aperiodic distribution), (iii) follows.                                □
Remark 14.2.1: Since the strong law is valid for any m-dependent (m <
∞) and stationary sequence of random variables, Theorem 14.2.10 is valid
even if the excursions {ηj }j≥1 are m-dependent.
14.2.8 Limit theory of Harris recurrent Markov chains
The minorization theorem, the fundamental regeneration theorem, and
the limit theorem for regenerative sequences, i.e., Theorems 14.2.5, 14.2.6,
14.2.7, and 14.2.10, are the essential components of a limit theory for Harris
recurrent Markov chains that parallels the limit theory for discrete state
space irreducible recurrent Markov chains.
Definition 14.2.8: A probability measure π on (S, S) is called stationary for a transition function P(·, ·) if
π(A) = ∫_S P(x, A) π(dx) for all A ∈ S.
Note that if X0 ∼ π, then Xn ∼ π for all n ≥ 1, justifying the term "stationary."
Theorem 14.2.11: Let {Xn }n≥0 be a Harris recurrent Markov chain
with state space (S, S) and transition function P (·, ·). Let S be countably
generated. Suppose there exists a stationary probability measure π. Then,
(i) π is unique.
(ii) (The law of large numbers). For all f ∈ L1(S, S, π) and all x ∈ S,
(1/n) Σ_{j=0}^{n−1} f(Xj) → ∫ f dπ w.p. 1 (Px).
(iii) (Convergence of n-step probabilities). For all x ∈ S,
µn,x(·) ≡ (1/n) Σ_{j=0}^{n−1} Px(Xj ∈ ·) → π(·) in total variation.
Proof: By Harris recurrence and the minorization Theorem 14.2.5, there exist a set A0 ∈ S, a constant 0 < α < 1, an integer n0 ≥ 1, and a probability measure ν such that
for all x ∈ A0, A ∈ S, P^{n0}(x, A) ≥ α ν(A)        (2.19)
and
for all x in S, Px(TA0 < ∞) = 1.        (2.20)
For simplicity of exposition, assume that n0 = 1. (The general case n0 > 1 can be reduced to this by considering the transition function P^{n0}.)
Let the sequence {ηn, δn, Yn,x}n≥1 and the regeneration times {Ti}i≥1 be as in Theorem 14.2.6. Recall that the first regeneration time T1 can be defined as
T1 = min{n : n > 0, Xn−1 ∈ A0, δn = 1}        (2.21)
and the succeeding ones by
Ti+1 = min{n : n > Ti, Xn−1 ∈ A0, δn = 1},        (2.22)
and that the X_{Ti} are distributed as ν, independent of the past. For n ≥ 1, let Nn = k if Tk ≤ n < Tk+1. By Harris recurrence, for all x in S, Nn → ∞ w.p. 1 (Px), and by the SLLN, for all x in S,
Nn/n → 1/(Eν T1) w.p. 1 (Px)
and hence, by the BCT,
Ex Nn / n → 1/(Eν T1).
On the other hand, for any k ≥ 1, x ∈ S,
Px(a regeneration occurs at k) = Px(Xk−1 ∈ A0, δk = 1) = Px(Xk−1 ∈ A0) α.
Thus
Ex Nn / n = (1/n) Σ_{k=1}^n Px(Xk−1 ∈ A0) α
and hence, for all x in S,
(1/n) Σ_{j=0}^{n−1} Px(Xj ∈ A0) → 1/(α Eν T1).
Now let π be a stationary measure for P(·, ·). Then
π(A0) = ∫ Px(Xj ∈ A0) π(dx) for all j = 0, 1, 2, . . .
and hence
n π(A0) = ∫ Σ_{j=0}^{n−1} Px(Xj ∈ A0) π(dx).        (2.23)
Since G(x, A0) ≡ Σ_{j=0}^∞ Px(Xj ∈ A0) > 0 for all x in S, by Harris recurrence (Harris irreducibility will do for this), it follows that π(A0) > 0. Dividing both sides of (2.23) by n and letting n → ∞ yields
π(A0) = 1/(α Eν T1)
and hence Eν T1 < ∞. Since Eν T1 ≡ E(T2 − T1) < ∞, by Theorem 14.2.10, for all x in S, A ∈ S,
(1/n) Σ_{j=0}^{n−1} Px(Xj ∈ A) → Eν(Σ_{j=0}^{T1−1} IA(Xj)) / Eν(T1).
Integrating the left side with respect to π yields that for any A ∈ S,
π(A) = Eν(Σ_{j=0}^{T1−1} IA(Xj)) / Eν(T1),
thus establishing the uniqueness of π, i.e., establishing (i) of Theorem 14.2.11. The other two parts follow from the regeneration Theorem 14.2.6 and the limit Theorem 14.2.10.                                □
Remark 14.2.2: Under the assumption n0 = 1 that was made at the beginning of the proof, it also follows that
Px(Xn ∈ ·) → π(·) in total variation. (2.24)
This also holds if the g.c.d. of the n0's for which there exist A0, α, ν satisfying (2.19) is one.
Remark 14.2.3: A necessary and sufficient condition for the existence of
a stationary distribution for a Harris recurrent chain is that there exists a
set {A0 , α, ν, n0 } satisfying (2.19) and (2.20) and
Eν TA0 < ∞.
A more general result than Theorem 14.2.11 is the following that was
motivated by applications to Markov chain Monte Carlo methods.
Theorem 14.2.12: Let {Xn}n≥0 be a Markov chain with state space (S, S) and transition function P(·, ·). Suppose (2.19) holds for some (A0, α, ν, n0). Suppose π is a stationary probability measure for P(·, ·) such that
π({x : Px(T_{A0} < ∞) > 0}) = 1. (2.25)
Then, for π-almost all x,
(i) Px(T_{A0} < ∞) = 1.
(ii) µ_{n,x}(·) = (1/n) Σ_{j=0}^{n−1} Px(Xj ∈ ·) → π(·) in total variation.
(iii) For any f ∈ L1(S, S, π),
(1/n) Σ_{j=0}^{n} f(Xj) → ∫ f dπ w.p. 1 (Px).
(iv) (1/n) Σ_{j=0}^{n} Ex f(Xj) → ∫ f dπ.
If, further, the g.c.d. of {m : there exists α_m > 0 such that P^m(x, ·) ≥ α_m ν(·) for all x in A0} is one, then (ii) can be strengthened to
Px(Xn ∈ ·) → π(·) in total variation.
The key difference between Theorems 14.2.11 and 14.2.12 is that the latter does not require Harris recurrence, which is often difficult to verify. On the other hand, the conclusions of Theorem 14.2.12 are valid only for π-almost all x, unlike for all x in Theorem 14.2.11. In MCMC applications, the existence of a stationary measure is given (as it is the 'target distribution'), and the minorization condition is easier to verify, as is the milder irreducibility condition (2.25). (Harris irreducibility would require Px(T_{A0} < ∞) > 0 for all x in S.) For a proof of Theorem 14.2.12 and applications to MCMC, see Athreya, Doss and Sethuraman (1996).
Example 14.2.15: (AR(1) time series) (Example 14.2.4). Suppose η1 has an absolutely continuous component and that |ρ| < 1. Then the chain is Harris recurrent and admits a stationary probability distribution π(·), namely the distribution of Σ_{j=0}^{∞} ρ^j η_j, and Px(Xn ∈ ·) → π(·) in total variation for any x.
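The following Python sketch (an editorial illustration, not part of the original text) simulates the AR(1) chain of Example 14.2.15 with ρ = 0.5 and N(0,1) innovations; under these assumptions the stationary law is N(0, 1/(1 − ρ²)), and the time averages behave as Theorem 14.2.11 predicts.

```python
# Simulation sketch for Example 14.2.15 (assumed parameters: rho = 0.5, N(0,1) innovations).
import math
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.5, 200_000
x = np.empty(n)
x[0] = 10.0                                   # start far from stationarity
for i in range(1, n):
    x[i] = rho * x[i - 1] + rng.standard_normal()

var_pi = 1.0 / (1.0 - rho ** 2)               # stationary variance of sum_j rho^j eta_j
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print("time average of X     :", x.mean(), "  (pi-mean 0)")
print("time average of X^2   :", (x ** 2).mean(), "  (pi-variance", var_pi, ")")
print("empirical P(X_n <= 1) :", (x[n // 2:] <= 1.0).mean())
print("stationary P(X <= 1)  :", Phi(1.0 / math.sqrt(var_pi)))
```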
Example 14.2.16: (Waiting time chain) (Example 14.2.6). If Eη1 < 0, the state 0 is recurrent and hence the Markov chain is Harris recurrent. Also, a stationary distribution π does exist. It is known that π is the same as the distribution of M∞ ≡ sup_{j≥0} Sj, where S0 = 0, Sj = Σ_{i=1}^{j} ηi, j ≥ 1, {ηi}i≥1 being iid.
14.2.9 Markov chains on metric spaces
14.2.9.1 Feller continuity
Let (S, d) be a metric space and S be the Borel σ-algebra in S. Let P (·, ·)
be a transition function. Let {Xn}n≥0 be a Markov chain with state space
(S, S) and transition function P (·, ·).
Definition 14.2.9: The transition function P(·, ·) is called Feller continuous (or simply Feller) if xn → x in S implies P(xn, ·) −→d P(x, ·), i.e.,
(Pf)(xn) ≡ ∫ f(y) P(xn, dy) → (Pf)(x) ≡ ∫ f(y) P(x, dy)
for all bounded continuous f : S → R. In terms of the Markov chain, this says
E( f(X1) | X0 = xn ) → E( f(X1) | X0 = x ) if xn → x.
Example 14.2.17: Let (Ω, F, P) be a probability space and h : S × Ω → S be jointly measurable with h(·, ω) continuous w.p. 1. Let P(x, A) ≡ P(h(x, ω) ∈ A) for x ∈ S, A ∈ S. Then P(·, ·) is a Feller continuous transition function. Indeed, for any bounded continuous f : S → R,
(Pf)(x) ≡ ∫ f(y) P(x, dy) = E f(h(x, ω)).
Now, xn → x
⇒ h(xn, ω) → h(x, ω) w.p. 1
⇒ f(h(xn, ω)) → f(h(x, ω)) w.p. 1 (by continuity of f)
⇒ E f(h(xn, ω)) → E f(h(x, ω)) (by the bounded convergence theorem).
The first five examples of Section 14.2.4 fall in this category. If h is discontinuous, then P(·, ·) need not be Feller (Problem 14.22). That P(·, ·) is a transition function requires only that h(·, ·) be jointly measurable (Problem 14.23).
14.2.9.2 Stationary measures
A general method of finding a stationary measure for a Feller transition function P(·, ·) is to consider weak or vague limits of the occupation measures µ_{n,λ}(A) ≡ (1/n) Σ_{j=0}^{n−1} Pλ(Xj ∈ A), where λ is the initial distribution.
Theorem 14.2.13: Fix an initial distribution λ. Suppose a probability
measure µ is a weak limit point of {µn,λ }n≥1 . That is, for some n1 < n2 <
n3 < · · ·, µnk ,λ −→d µ. Assume P (·, ·) is Feller. Then µ is a stationary
probability measure for P (·, ·).
Proof: Let f : S → R be continuous and bounded. Then
∫ f(y) µ_{nk,λ}(dy) → ∫ f(y) µ(dy).
But the left side equals
(1/nk) Σ_{j=0}^{nk−1} Eλ f(Xj)
  = (1/nk) Eλ f(X0) + (1/nk) Σ_{j=1}^{nk−1} Eλ f(Xj)
  = (1/nk) Eλ f(X0) + (1/nk) Σ_{j=1}^{nk−1} Eλ (Pf)(X_{j−1})
    (since by the Markov property, for j ≥ 1, Eλ f(Xj) = Eλ (Pf)(X_{j−1}))
  = (1/nk) Eλ f(X0) + (1/nk) Σ_{j=0}^{nk−1} Eλ (Pf)(Xj) − (1/nk) Eλ (Pf)(X_{nk−1}).
The first and third terms on the right side go to zero since f is bounded and nk → ∞. The second term goes to ∫ (Pf)(y) µ(dy) since, by the Feller property, Pf is a bounded continuous function. Thus,
∫_S f(y) µ(dy) = ∫_S (Pf)(y) µ(dy) = ∫_S ( ∫_S f(z) P(y, dz) ) µ(dy) = ∫_S f(z) (µP)(dz),
where
(µP)(A) ≡ ∫_S P(y, A) µ(dy).
This being true for all bounded continuous f , it follows that µ = µP, i.e., µ is stationary for P(·, ·). □
A more general result is the following.
Theorem 14.2.14: Let λ be an initial distribution. Let µ be a subprobability measure (i.e., µ(S) ≤ 1) such that for some n1 < n2 < n3 < · · ·, {µ_{nk,λ}} converges vaguely to µ, i.e., ∫ f dµ_{nk,λ} → ∫ f dµ for all continuous f : S → R with compact support. Suppose there exists an approximate identity {gn}n≥1 for S, i.e., for all n, gn is a continuous function from S to [0,1] with compact support and, for every x in S, gn(x) ↑ 1 as n → ∞. Then µ is stationary for P(·, ·), i.e.,
µ(A) = ∫_S P(x, A) µ(dx) for all A ∈ S.
For a proof, see Athreya (2004).
If S = R^k for some k < ∞, then S admits an approximate identity. Conditions ensuring that there is a vague limit point µ with µ(S) > 0 are provided by the following theorem.
Theorem 14.2.15: Suppose there exist a set A0 ∈ S, a function V : S → [0, ∞), and numbers 0 < α, M < ∞ such that
(PV)(x) ≡ Ex V(X1) ≤ V(x) − α for x ∉ A0 (2.26)
and
sup_{x ∈ A0} ( Ex V(X1) − V(x) ) ≡ M < ∞. (2.27)
Then, for any initial distribution λ,
lim inf_{n→∞} µ_{n,λ}(A0) ≥ α / (α + M). (2.28)
Proof: For j ≥ 1,
Eλ V(Xj) − Eλ V(X_{j−1}) = Eλ( (PV)(X_{j−1}) − V(X_{j−1}) )
  = Eλ[ ( (PV)(X_{j−1}) − V(X_{j−1}) ) I_{A0}(X_{j−1}) ] + Eλ[ ( (PV)(X_{j−1}) − V(X_{j−1}) ) I_{A0^c}(X_{j−1}) ]
  ≤ M Pλ(X_{j−1} ∈ A0) − α Pλ(X_{j−1} ∉ A0).
Adding over j = 1, 2, . . . , n yields
(1/n)( Eλ V(Xn) − Eλ V(X0) ) ≤ −α + (α + M) µ_{n,λ}(A0).
Since V(·) ≥ 0, letting n → ∞ yields (2.28). □
Definition 14.2.10: A metric space (S, d) has the vague compactness property if, given any collection {µ_α : α ∈ I} of subprobability measures, there is a sequence {α_j} ⊂ I such that µ_{α_j} converges vaguely to a subprobability measure µ. By Helly's selection theorem, all Euclidean spaces have this property. It is also known that any Polish space, i.e., a complete, separable metric space, has this property (see Billingsley (1968)).
Combining the above two results yields the following:
Theorem 14.2.16: Let P(·, ·) be a Feller transition function on a metric space (S, d) that admits an approximate identity and has the vague compactness property. Suppose there exist a closed set A0, a function V : S → [0, ∞), and numbers 0 < α, M < ∞ such that (2.26) and (2.27) hold. Then there exists a stationary probability measure for P(·, ·).
Proof: Fix an initial distribution λ. Then the family {µ_{n,λ}}n≥1 has a subsequence {µ_{nk,λ}}k≥1 and a subprobability measure µ such that µ_{nk,λ} → µ vaguely. Since A0 is closed, this implies
lim sup_{k→∞} µ_{nk,λ}(A0) ≤ µ(A0).
Thus µ(A0) ≥ lim sup_k µ_{nk,λ}(A0) ≥ lim inf_k µ_{nk,λ}(A0) ≥ α/(α + M), by Theorem 14.2.15. This yields µ(S) > 0. By Theorem 14.2.14, µ is stationary for P. So µ̃(·) ≡ µ(·)/µ(S) is a probability measure that is stationary for P. □
Example 14.2.18: Consider a Markov chain generated by the iteration of iid random logistic maps
X_{n+1} = C_{n+1} X_n (1 − X_n), n ≥ 0,
with S = [0, 1], {Cn}n≥1 iid with values in [0, 4], and X0 independent of {Cn}n≥1. Assume E log C1 > 0 and E| log(4 − C1) | < ∞. Then there exists a stationary probability measure π such that π((0, 1)) = 1. This follows from Theorem 14.2.16 by showing that if V(x) = | log x |, then there exist A0 = [a, b] ⊂ (0, 1) and constants 0 < α, M < ∞ such that (2.26) and (2.27) hold. For details, see Athreya (2004).
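The following Python sketch (an editorial illustration, not part of the original text) simulates the random logistic map chain of Example 14.2.18 with C1 uniform on [1, 4], a choice for which E log C1 > 0 and E| log(4 − C1) | < ∞ both hold; the path remains inside (0, 1) and the occupation averages settle down.

```python
# Simulation sketch for Example 14.2.18 (assumed: C_1 ~ Uniform[1, 4]).
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = np.empty(n)
x[0] = 0.3
for i in range(1, n):
    c = rng.uniform(1.0, 4.0)
    x[i] = c * x[i - 1] * (1.0 - x[i - 1])

burn = n // 10
print("min / max over the path :", x[burn:].min(), x[burn:].max())   # stays in (0, 1)
print("occupation mean of X    :", x[burn:].mean())
print("occupation P(X <= 0.5)  :", (x[burn:] <= 0.5).mean())
```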
14.2.9.3 Convergence questions
If {Xn }n≥0 is a Feller Markov chain (i.e., its transition function P (·, ·) is
Feller continuous), what can one say about the convergence of the distribution of Xn as n → ∞?
If P (·, ·) admits a unique stationary probability measure π and the family
{µn,λ : n ≥ 1} is tight for a given λ, then one can conclude from Theorem
14.2.13 that every weak limit point of this family has to be π and hence π
is the only weak limit point and that for this λ, µn,λ −→d π. To go from
this to the convergence of Pλ (Xn ∈ ·) to π(·), one needs extra conditions
to rule out periodic behavior.
Since the occupation measure µ_{n,λ}(A) is the mean of the empirical measure
L_n(A) ≡ (1/n) Σ_{j=0}^{n−1} I_A(Xj),
a natural question is what can one say about the convergence of the empirical measure? This is important for the statistical estimation of π.
When the chain is Harris recurrent, it was shown in the previous section
that for each x and for each A ∈ S
Ln (A) → π(A) w.p. 1 (Px ).
For a Feller chain admitting a unique stationary measure π, one can appeal
to the ergodic theorem to conclude that for each A in S, Ln (A) → π(A)
w.p. 1 (Px ) for π-almost all x in S. Further, if S is Polish, then one can
show that for π-almost all x, Ln (·) −→d π(·) w.p. 1 (Px ).
14.3 Markov chain Monte Carlo (MCMC)
14.3.1 Introduction
Let π be a probability measure on a measurable space (S, S). Let h(·) : S → R be S-measurable with ∫ |h| dπ < ∞, and let λ = ∫ h dπ. The effort in the computation of λ depends on the complexity of the function h(·) as well as that of the measure π. Clearly, a first approach is to go back to the definition of ∫ h dπ and use numerical approximation, such as approximating h(·) by a sequence of simple functions and evaluating π(·) on the sets involved in these simple functions. However, in many situations this may not be feasible, especially if the measure π(·) is specified only up to a constant that is not easy to evaluate. Such is often the case in Bayesian statistics, where π is the posterior distribution π_{θ|X} of the parameter θ given the data X, whose density is proportional to f(X|θ)ν(dθ), f(X|θ) being the density of X given θ and ν(dθ) the prior distribution of θ. In such situations, objects of interest are the posterior mean, variance, and other moments, as well as the posterior probability of θ being in some set of interest. In these problems, the main difficulty lies in the evaluation of the normalizing constant C(X) = ∫ f(X|θ)ν(dθ). However, it may be possible to generate a sequence of random variables {Xn}n≥1 such that the distribution of Xn gets close to π in a suitable sense and a law of large numbers asserting that
(1/n) Σ_{i=1}^{n} h(Xi) → λ = ∫ h dπ
holds for a large class of h such that ∫ |h| dπ < ∞.
A method that has become very useful in Bayesian statistics in the past
twenty years or so (with the advent of high speed computing) is that of
generating a Markov chain {Xn }n≥1 with stationary distribution π. This
method has its origins in the important paper of Metropolis, Rosenbluth,
Rosenbluth, Teller and Teller (1953). For the adaptation of this method
to image processing problems, see Geman and Geman (1984).
This method is now known as the Markov chain Monte Carlo, or MCMC
for short. For the basic limit theory of Markov chains, see Section 14.2.
For proofs of the claims in the rest of this section and further details on
MCMC, see the recent book of Robert and Casella (1999). In the rest of
this section two of the widely used MCMC algorithms are discussed. These
are the Metropolis-Hastings algorithm and the Gibbs sampler.
14.3.2 Metropolis-Hastings algorithm
Let π be a probability measure on a measurable space (S, S). Let π be dominated by a σ-finite measure µ with density f(·). For each x, let q(y|x) be a probability density in y w.r.t. µ; that is, q(y|x) is jointly measurable as a function from (S × S, S × S) → R+ and, for each x, ∫_S q(y|x) µ(dy) = 1. Such a q(·|·) is called the instrumental or proposal distribution. The Metropolis-Hastings algorithm generates a Markov chain {Xn} using the densities f(·) and q(·|·) in two steps as follows:
Step 1: Given Xn = x, first generate a random variable Yn with density q(·|x).
Step 2: Then set X_{n+1} = Yn with probability p(x, Yn), and X_{n+1} = Xn with probability 1 − p(x, Yn), where
p(x, y) ≡ min( (f(y) q(x|y)) / (f(x) q(y|x)), 1 ). (3.1)
Thus, the value Yn is "accepted" as X_{n+1} with probability p(x, Yn), and if it is rejected, the chain stays where it was, i.e., at Xn.
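The following Python sketch (an editorial illustration, not part of the original text) implements the two steps above for a target density known only up to a constant, here f(x) ∝ exp(−x⁴/4) on R, with a Gaussian random-walk proposal; since the proposal is symmetric, the q-ratio in (3.1) cancels.

```python
# Minimal Metropolis-Hastings sketch (assumed target f(x) ∝ exp(-x^4/4), N(x,1) proposal).
import numpy as np

rng = np.random.default_rng(2)

def log_f(x):                       # log target density, up to an additive constant
    return -x ** 4 / 4.0

def metropolis_hastings(n, x0=0.0, prop_sd=1.0):
    chain = np.empty(n)
    x = x0
    for i in range(n):
        y = x + prop_sd * rng.standard_normal()        # Step 1: propose Y_n ~ q(.|x)
        # Step 2: accept with probability p(x, y) = min(f(y)/f(x), 1) (symmetric q)
        if np.log(rng.uniform()) < log_f(y) - log_f(x):
            x = y
        chain[i] = x
    return chain

chain = metropolis_hastings(100_000)
print("estimate of E X   :", chain.mean())             # symmetry of f gives E X = 0
print("estimate of E X^2 :", (chain ** 2).mean())
```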
Implicit in the above definition is that the state space of the Markov chain {Xn} is simply the set A_f ≡ {x : f(x) > 0}. It is also assumed that q(y|x) > 0 for all x, y in A_f. The transition function P(x, A) of this Markov chain is given by
P(x, A) = I_A(x)(1 − r(x)) + ∫_A p(x, y) q(y|x) µ(dy), (3.2)
where
r(x) = ∫_S p(x, y) q(y|x) µ(dy).
It turns out that the measure π(·) is a stationary measure for this Markov chain {Xn}. Indeed, for any A ∈ S,
∫_S P(x, A) π(dx) = ∫_S P(x, A) f(x) µ(dx)
  = ∫_S I_A(x)(1 − r(x)) f(x) µ(dx) + ∫_S ∫_A p(x, y) q(y|x) f(x) µ(dy) µ(dx). (3.3)
By the definition of p(x, y), the identity
q(y|x) f(x) p(x, y) = p(y, x) q(x|y) f(y) (3.4)
holds for all x, y. Thus the second integral in (3.3) (using Tonelli's theorem) is
∫_A ( ∫_S p(y, x) q(x|y) µ(dx) ) f(y) µ(dy) = ∫_A r(y) f(y) µ(dy).
Thus, the right side of (3.3) is
∫_S I_A(x) f(x) µ(dx) ≡ π(A),
verifying stationarity.
From the results of Section 14.2, it follows that if the transition function P(·, ·) is Harris irreducible w.r.t. some reference measure ϕ, then (since it admits π as a stationary measure) the law of large numbers holds, i.e., for any h ∈ L1(π),
(1/n) Σ_{j=0}^{n−1} h(Xj) → ∫ h dπ as n → ∞
w.p. 1 for any initial distribution. Thus, a good MCMC approximation to λ = ∫ h dπ is λ̂_n ≡ (1/n) Σ_{j=0}^{n−1} h(Xj). A sufficient condition for irreducibility is that q(y|x) > 0 for all (x, y) in A_f × A_f.
Summarizing the above discussion leads to
Theorem 14.3.1: Let π be a probability measure on a measurable space (S, S) with probability density f(·) w.r.t. a σ-finite measure µ. Let A_f = {x : f(x) > 0}. Let q(y|x) be a measurable function from A_f × A_f → (0, ∞) such that ∫_S q(y|x) µ(dy) = 1 for all x in A_f. Let {Xn}n≥0 be a Markov chain generated by the Metropolis-Hastings algorithm (3.1). Then, for any h ∈ L1(π),
(1/n) Σ_{j=0}^{n−1} h(Xj) → ∫ h dπ as n → ∞ w.p. 1 (3.5)
for any (initial) distribution of X0.
The Metropolis-Hastings algorithm does not need full knowledge of the target density f(·) of π(·). The function f(·) enters the algorithm only through p(x, y), which involves f only through the ratio f(y)/f(x), together with q(·|·); hence this algorithm can be implemented even if f is known only up to a multiplicative constant. This is often the case in Bayesian statistics. Also, the choice of q(y|x) depends on f(·) only through the condition that q(y|x) > 0 on A_f × A_f. Thus, the Metropolis-Hastings algorithm has wide applicability. Two special cases of this algorithm are given below.
14.3.2.1 Independent Metropolis-Hastings
Let q(y|x) ≡ g(y) where g(·) is a probability density such that g(y) > 0 if
f (y) > 0.
Suppose sup{ f(y)/g(y) : f(y) > 0 } ≡ M < ∞. Then, in addition to the law of large numbers (3.5) of Theorem 14.3.1, it holds that for any initial value of X0,
‖P(Xn ∈ ·) − π(·)‖ ≤ 2 (1 − 1/M)^n,
where ‖ · ‖ is the total variation norm. Thus, the distribution of Xn converges in total variation at a geometric rate. For a proof, see Robert and Casella (1999).
14.3.2.2 Random-walk Metropolis-Hastings
Here the state space is the real line or a subset of some Euclidean space.
Let q(y|x) = g(y − x) where g(·) is a probability density such that g(y −
x) > 0 for all x, y such that f (x) > 0, f (y) > 0. This ensures irreducibility
and hence the law of large numbers (3.5) holds. A sufficient condition for
geometric convergence of the distribution of {Xn } in the real line case is
the following:
(a) The density f (·) is symmetric about 0 and is asymptotically log concave, i.e., it holds that for some α > 0 and x0 > 0,
log f (x) − log f (y) ≥ α|y − x|
for all y < x < −x0 or x0 < x < y.
(b) The density function g(·) is positive and symmetric.
For further special cases and more results, see Robert and Casella (1999).
14.3.3 The Gibbs sampler
Suppose π is the probability distribution of a bivariate random vector
(X, Y ). A Markov chain {Zn }n≥0 can be generated with π as its stationary
distribution using only the families of conditional distributions Q(·, y) of
X|Y = y and P (·, x) of Y |X = x for all x, y generated by the joint distribution π of (X, Y ). This Markov chain is known as the Gibbs sampler. The
algorithm is as follows:
Step 1: Start with some initial value X0 = x0 . Generate Y0 according to
the conditional distribution P (Y ∈ · | X0 = x0 ) = P (·, x0 ).
Step 2: Next, generate X1 according to the conditional distribution
P (X1 ∈ · | Y0 = y0 ) = Q(·, y0 ).
Step 3: Now generate Y1 as in Step 1 but with conditioning value X1 .
Step 4: Now generate X2 as in Step 2 but with conditioning value Y1 and
so on.
Thus, starting from X0 , one generates successively Y0 , X1 , Y1 , X2 , Y2 , . . ..
Clearly, the sequences {Xn }n≥0 , {Yn }n≥0 and {Zn ≡ (Xn , Yn )}n≥0 are all
Markov chains. It is also easy to verify that the marginal distribution πX
of X, the marginal distribution πY of Y , and the distribution π are, respectively, stationary for the {Xn }, {Yn }, and {Zn } chains. Indeed, if X0 ∼ πX ,
then Y0 ∼ πY and hence X1 ∼ πX . Similarly one can verify the other two
claims. Recall that a sufficient condition for the law of large numbers (3.5)
to hold is irreducibility. A sufficient condition for irreducibility in turn is
that the chain {Zn }n≥0 has a transition function R(z, ·) that, for each
z = (x, y), is absolutely continuous with respect to some fixed dominating
measure on R2 .
The above algorithm is easily generalized to cover the k-variate case
(k > 2). Let (X1 , X2 , . . . , Xk ) be a random vector with distribution π. For
any vector x = (x1 , x2 , . . . , xk ) let x(i) = (x1 , x2 , . . . , xi−1 , xi+1 , . . . , xk )
and Pi (· | x(i) ) be the conditional distribution of Xi given X(i) = x(i) . Now
generate a Markov chain Zn ≡ (Zn1 , Zn2 , . . . , Znk ), n ≥ 0 as follows:
Step 1: Start with some initial value Z0j = z0j , j = 1, 2, . . . , k − 1.
Generate Z0k from the conditional distribution Pk (· | Xj = z0j , j =
1, 2, . . . , k − 1).
Step 2: Next, generate Z11 from the conditional distribution P1 (· | Xj =
z0j , j = 2, . . . , k − 1, Xk = Z0k ).
Step 3: Next, generate Z12 from the conditional distribution P2 (· |
X1 = Z11 , Xj = z0j , j = 3, . . . , k − 1, Xk = Z0k ) and so on until
(Z11 , Z12 , . . . , Z1,k−1 ) is generated.
Now go back to Step 1 to generate Z1k and repeat Steps 2 and 3 and so
on. This sequence {Zn }n≥0 is called the Gibbs sampler Markov chain for
the distribution π.
A sufficient condition for irreducibility given earlier for the 2-variate case
carries over to the k-variate case. For more on the Gibbs sampler, see Robert
and Casella (1999).
14.4 Problems
14.1 (a) Show using Definition 14.1.1 that when the state space S is countable, for any n, conditioned on {Xn = an}, the events {X_{n+j} = a_{n+j}, 1 ≤ j ≤ k} and {Xj = aj : 0 ≤ j ≤ n − 1} are independent for all choices of k and {aj}_{j=0}^{n+k}. Thus, conditioned on the "present" {Xn = an}, the "past" {Xj : j ≤ n − 1} and the "future" {Xj : j ≥ n + 1} are two families of independent random variables with respect to the conditional probability measure P(· | Xn = an), provided P(Xn = an) > 0.
(b) Prove Proposition 14.2.2 using induction on n (cf. Chapter 6).
14.2 In Example 14.1.1 (Frog in the well), verify that
(a) if αi ≡ 1 − 1/(ci), c > 1, i ≥ 1, then 1 is null recurrent,
(b) if αi ≡ α, 0 < α < 1, then 1 is positive recurrent, and
(c) if αi ≡ 1 − 1/(2i²), then 1 is transient.
14.3 Consider SSRW in Z² where the transition probabilities are p_{(i,j),(i′,j′)} = 1/4 if (i′, j′) ∈ {(i+1, j), (i−1, j), (i, j+1), (i, j−1)} and zero otherwise. Verify that for n = 2k,
p^{(2k)}_{(0,0),(0,0)} = (1/4^{2k}) (2k choose k)² ∼ 1/(πk),
and conclude that (0,0) is null recurrent. Extend this calculation to SSRW in Z³ and conclude that (0,0,0) is transient.
14.4 Show that if i is absorbing and j → i, then j is transient, by showing that if j → i, then f*_{ji} = P(Ti < Tj | X0 = j) > 0 and 1 − f_{jj} ≥ f*_{ji}.
14.5 (a) Let i be recurrent and i → j. Show that j is recurrent using Corollary 14.1.5.
(Hint: Show that there exist n0 and m0 such that for all n, p^{(n0+n+m0)}_{jj} ≥ p^{(n0)}_{ji} p^{(n)}_{ii} p^{(m0)}_{ij}, with p^{(n0)}_{ji} > 0 and p^{(m0)}_{ij} > 0.)
(b) Let i and j communicate. Show that di = dj.
14.6 Show that in a finite state space irreducible Markov chain (S, P), all states are positive recurrent by showing
(a) that for any i, j in S, there exists r ≤ K such that p^{(r)}_{ij} > 0, where K is the number of states in S,
(b) that for any i in S, there exist 0 < α < 1 and c < ∞ such that Pi(Ti > n) ≤ c α^n.
Give an alternate proof by showing that if S is finite, then for any initial distribution µ, the occupation measures
µn(·) ≡ (1/(n + 1)) Σ_{j=0}^{n} Pµ(Xj ∈ ·)
have a subsequence that converges to a probability distribution π that is stationary for (S, P).
14.7 Prove Theorem 14.1.3 using the Markov property and induction.
14.8 Adapt the proof of Theorem 14.1.9 to show that for any i, j,
(1/n) Σ_{k=1}^{n} p^{(k)}_{ij} → f_{ij} / (Ej Tj)
if j is positive recurrent, and → 0 otherwise. Conclude that in the finite state space case, there must be at least one state that is positive recurrent.
14.9 If j → i, then ζ1 ≡ Σ_{r=0}^{Ti − 1} δ_{Xr, j}, the number of visits to j before visiting i, satisfies Pi(ζ1 > n) ≤ c α^n for some 0 < c < ∞, 0 < α < 1, and all n ≥ 1.
14.10 Adapt the proof of Theorem 14.1.9 to establish the following laws of large numbers. Let (S, P) be irreducible and positive recurrent with stationary distribution π.
(a) Let h : S → R be such that Σ_{j∈S} |h(j)| πj < ∞. Then, for any initial distribution µ,
(1/(n + 1)) Σ_{j=0}^{n} h(Xj) → Σ_{j∈S} h(j) πj w.p. 1,
by first verifying that
Ei | Σ_{j=0}^{Ti − 1} h(Xj) | < ∞.
(b) Let g : S × S → R be such that Σ_{i,j∈S} |g(i, j)| πi p_{ij} < ∞. Then, for any initial distribution µ,
(1/(n + 1)) Σ_{j=0}^{n} g(Xj, X_{j+1}) → Σ_{i,j∈S} g(i, j) πi p_{ij} w.p. 1.
(c) Fix two disjoint subsets A and B of S. Evaluate the long run proportion of transitions from A to B.
(d) Extend (b) to conclude that the tail sequence Zn ≡ {X_{n+j} : j ≥ 0} of the Markov chain {Xn}n≥0 converges as n → ∞, in the sense of finite dimensional distributions, to the strictly stationary sequence given by the Markov chain (S, P) with initial distribution π.
14.11 Let {Xn }n≥0 be a Markov chain that is irreducible and has at least
two states. Show that w.p. 1 the trajectories {Xn } do not converge,
i.e., w.p. 1, limn→∞ Xn does not exist.
(Hint: Show that w.p. 1, the set of limit points of the set {Xn : n ≥
0} coincides with S.)
14.12 Let {Xn}n≥0 be a Markov chain with state space S and tr. pr. P ≡ (p_{ij}). A probability distribution π ≡ {πj : j ∈ S} is said to satisfy the condition of detailed balance or time reversibility with respect to (S, P) if for all i, j, πi p_{ij} = πj p_{ji}.
(a) Show that such a π is necessarily a stationary distribution.
(b) For the birth and death chain (Example 14.1.4), find a condition
in terms of the birth and death rates {αi , βi }i≥0 for the existence
of a probability distribution π that satisfies the condition of
detailed balance.
14.13 (Absorption probabilities and times). Let 0 be an absorbing state. For any i ≠ 0, let θi = f_{i0} ≡ Pi(T0 < ∞) and ηi = Ei T0. Show using the Markov property that for every i ≠ 0,
θi = p_{i0} + Σ_{j≠0} θj p_{ij},
ηi = 1 + Σ_{j≠0} ηj p_{ij}.
Apply this to the Gambler's ruin problem with S = {0, 1, 2, . . . , N}, N < ∞, and p_{00} = 1, p_{NN} = 1, p_{i,i+1} = p, p_{i,i−1} = 1 − p, 0 < p < 1, 1 ≤ i ≤ N − 1, and find the probability and expected waiting time for ruin (absorption at 0) starting from an initial fortune of i, 1 ≤ i ≤ N − 1.
14.14 (Renewal theory via Markov chains). Let {Xj}j≥1 be iid positive integer valued random variables. Let S0 = 0, Sn = Σ_{j=1}^{n} Xj, n ≥ 1, let N(n) = k if Sk ≤ n < S_{k+1}, k = 0, 1, 2, . . ., be the number of renewals up to time n, and let An = n − S_{N(n)} be the age of the current unit at time n.
(a) Show that {An}n≥0 is a Markov chain and find its state space S and transition probabilities.
(b) Assuming that EX1 < ∞, verify that
πj = P(X1 > j) / EX1, j = 0, 1, 2, . . .,
is the unique stationary distribution.
(c) Assuming that X1 has an aperiodic distribution and Theorem 14.1.18 holds, show that the discrete renewal theorem holds.
14.15 Prove Proposition 14.2.1 for the countable state space case.
14.16 Prove Proposition 14.2.2.
14.17 Establish assertion (i) of Theorem 14.2.3.
(Hint: Show that d( X̂n(x, ω), X̂_{n+1}(x, ω) ) ≤ ( Π_{i=1}^{n} s(f_i) ) d( x, f_{n+1}(x, ω) ), and use Borel-Cantelli to show that the right side is O(λ^n) w.p. 1 for some 0 < λ < 1; show similarly that d( X̂n(x, ω), X̂n(y, ω) ) = O(λ^n) w.p. 1 for any x, y.)
14.18 Show that if P(·, ·) is the transition function of a Markov chain {Xn}n≥0, then for any n ≥ 0, Px(Xn ∈ A) = P^{(n)}(x, A), where P^{(n)}(·, ·) is defined by the iteration
P^{(n+1)}(x, A) = ∫_S P^{(n)}(y, A) P(x, dy),
with P^{(0)}(x, A) = I_A(x).
(Hint: Use induction and the Markov property.)
14.19 (a) Let {Xn }n≥0 be a random walk defined by the iteration scheme
Xn+1 = Xn +ηn+1 where {ηn }n≥1 are iid random variables independent of X0 . Assume that ν(·) = P (η1 ∈ ·) has an absolutely
continuous component with a density that is strictly positive
a.e. on an open interval around 0. Show that {Xn }n≥0 is Harris
irreducible w.r.t. the Lebesgue measure on R. Show that if in
addition Eη1 = 0, then {Xn } is Harris recurrent as well.
(b) Use Theorem 14.2.11 to establish the second claim in Example
14.2.10.
14.20 Show that the waiting time chain (Example 14.2.6) defined by
Xn+1 = max{Xn + ηn+1 , 0}, where {ηn }n≥1 are iid is irreducible
with reference measure φ(·) ≡ δ0 (·), the delta measure at 0, provided
P (η1 < 0) > 0. Show further that it is φ recurrent if Eη1 < 0.
14.21 Prove Theorem 14.2.5 (i) using the C-set lemma.
14.22 Find an h : [0, 1] × [0, 1] → [0, 1] such that h(x, y) is discontinuous in x for almost all y in [0, 1], and conclude that the function P(x, A) = P( h(x, Y) ∈ A ), where Y is a uniform [0,1] random variable, need not be Feller.
14.23 Let (Ω, F, P ) be a probability space and (S, S) a measurable space.
Let h : S × Ω → S be jointly measurable. Show that P (x, A) ≡
P (h(x, ω) ∈ A) is a transition function.
14.24 (a) Let {Xn }n≥0 be an irreducible Markov chain with state space
S ≡ {0, 1, 2, . . .}. Suppose V : S → [0, ∞) is such that for
some K < ∞, Ex V (X1 ) ≤ V (x) for all x > K and that
limx→∞ V (x) = ∞. Show that {Xn }n≥0 is recurrent.
(Hint: Let {X̃n }n≥0 be a Markov chain with state space
S ≡ {0, 1, 2, . . .} and transition probabilities same as that of
{Xn }n≥0 except that the states {0, 1, 2, . . . , K} are absorbing.
Verify that {V (X̃n )}n≥0 is a nonnegative super-martingale and
hence that {X̃n }n≥0 is bounded w.p. 1. Now conclude that
there must exist a state x that gets visited infinitely often by
{Xn }n≥0 .)
(b) Consider the reflecting nonhomogeneous random walk on S ≡ {0, 1, 2, . . .} with
p_{ij} = p_i if j = i + 1, and p_{ij} = 1 − p_i if j = i − 1,
where p_0 = 1, 0 < p_i < 1 for all i ≥ 1, and p_i ≤ q_i ≡ 1 − p_i for all i ≥ k_0 for some 1 ≤ k_0 < ∞. Show that {Xn}n≥0 is irreducible and recurrent.
14.25 Let {Xn }n≥0 be an irreducible and recurrent Markov chain with a
countable state space S. Let V : S → R+ be such that Ex V (X1 ) ≤
V (x) for all x in S. Show that V (·) is constant on S.
14.26 Let {Cn }n≥1 be iid random variables with values in [0, 4]. Let
{Xn }n≥0 be a Markov chain with values in [0, 1] defined by the random iteration scheme
Xn+1 = Cn+1 Xn (1 − Xn ), n ≥ 0.
(a) Show that if E log C1 < 0 then Xn = O(λn ) w.p. 1 for some
0 < λ < 1.
(b) Show also that if E log C1 < 0 and 0 < Var(log C1) < ∞, then there exist sequences {an}n≥1 and {bn}n≥1 such that
(log Xn − an) / bn −→d N(0, 1).
15
Stochastic Processes
This chapter gives a brief discussion of two special classes of real valued stochastic processes {X(t) : t ≥ 0} in continuous time [0, ∞). These are continuous time Markov chains with a discrete state space (including Poisson processes) and Brownian motion. These are very useful in many areas of application, such as queuing theory and mathematical finance.
15.1 Continuous time Markov chains
15.1.1 Definition
Consider a physical system that can be in one of a finite or countable
number of states {0, 1, 2, . . . , K}, K ≤ ∞. Assume that the system evolves
in continuous time in the following manner. In each state the system stays
a random length of time that is exponentially distributed and then jumps
to a new state with a probability distribution that depends only on the
current state and not on the past history. Thus, if the state of the system
at the time of the nth transition is denoted by yn , n = 0, 1, 2, . . ., then
{yn}n≥0 is a Markov chain with state space S ≡ {0, 1, 2, . . . , K}, K ≤ ∞, and some transition probability matrix P ≡ (p_{ij}). If yn = in, then the system stays in in for a random length of time Ln, called the sojourn time, such that conditional on {yn = in}n≥0, {Ln}n≥0 are independent exponential random variables with Ln having mean λ_{in}^{−1}. Now set the state of the system X(t) at time t ≥ 0 by
X(t) = y0 for 0 ≤ t < L0,
     = y1 for L0 ≤ t < L0 + L1,
     . . .
     = yn for L0 + L1 + · · · + L_{n−1} ≤ t < L0 + L1 + · · · + Ln, . . . (1.1)
Then {X(t) : t ≥ 0} is called a continuous time Markov chain with state space S, jump probabilities P ≡ (p_{ij}), waiting time parameters {λi : i ∈ S}, and embedded Markov chain {yn}n≥0. To make sure that there are only a finite number of transitions in finite time, i.e.,
Σ_{i=0}^{∞} Li = ∞ w.p. 1,
one needs to impose the nonexplosion condition
Σ_{i=0}^{∞} 1/λ_{yi} = ∞ w.p. 1. (1.2)
(Problem 15.1)
Clearly, a sufficient condition for this is that λi < ∞ for all i ∈ S and {yn}n≥0 is an irreducible recurrent Markov chain.
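The following Python sketch (an editorial illustration, not part of the original text) simulates the construction above: run the embedded chain {yn} with a made-up jump matrix P and, in state i, hold for an Exponential(λi) sojourn time.

```python
# CTMC simulation sketch (jump matrix P and rates lambda_i below are made-up examples).
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.4, 0.6, 0.0]])          # jump probabilities p_ij of the embedded chain
lam = np.array([1.0, 2.0, 0.5])          # waiting time parameters lambda_i

def simulate_ctmc(i0, t_max):
    """Return the jump times and states of {X(t)} up to time t_max."""
    t, i = 0.0, i0
    times, states = [0.0], [i0]
    while True:
        t += rng.exponential(1.0 / lam[i])   # sojourn time L_n ~ Exp(lambda_i)
        if t > t_max:
            return times, states
        i = rng.choice(3, p=P[i])            # next state of the embedded chain
        times.append(t)
        states.append(i)

times, states = simulate_ctmc(i0=0, t_max=10.0)
print(list(zip(np.round(times, 3), states)))
```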
It can be verified using the "memorylessness" property of the exponential distribution (Problem 15.2) that {X(t) : t ≥ 0} has the Markov property, i.e., for any 0 ≤ t1 ≤ t2 ≤ t3 ≤ · · · ≤ tr < ∞,
P( X(tr) = ir | X(tj) = ij, 0 ≤ j ≤ r − 1 ) = P( X(tr) = ir | X(t_{r−1}) = i_{r−1} ). (1.3)
15.1.2 Kolmogorov's differential equations
The functions
p_{ij}(t) ≡ P( X(t) = j | X(0) = i ) (1.4)
are called transition functions. To determine these functions from the jump probabilities {p_{ij}} and the waiting time parameters {λi}, one uses the Chapman-Kolmogorov equations
p_{ij}(t + s) = Σ_{k∈S} p_{ik}(t) p_{kj}(s), t, s ≥ 0, (1.5)
which are an immediate consequence of the Markov property (1.3) and the definition (1.4). In addition to (1.5), one has the continuity condition
lim_{t↓0} p_{ij}(t) = δ_{ij}. (1.6)
Under the nonexplosion hypothesis (1.2), it can be shown (Chung (1967), Feller (1966), Karlin and Taylor (1975)) that the p_{ij}(t) are differentiable as functions of t and satisfy Kolmogorov's forward and backward differential equations
p′_{ij}(t) = Σ_k p_{ik}(t) p′_{kj}(0) (forward) (1.7a)
p′_{ij}(t) = Σ_k p′_{ik}(0) p_{kj}(t) (backward) (1.7b)
Further, a_{kj} ≡ p′_{kj}(0) can be shown to equal λk p_{kj} for k ≠ j and −λk for k = j. The matrix A ≡ (a_{ij}) is called the infinitesimal matrix or generator of the process {X(t) : t ≥ 0}. If the state space S is finite, i.e., K < ∞, then P(t) ≡ (p_{ij}(t)) can be shown to be
P(t) = exp(At) ≡ Σ_{n=0}^{∞} (A^n / n!) t^n. (1.8)
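The following Python sketch (an editorial illustration, not part of the original text; scipy is assumed to be available) checks (1.8) numerically for a two-state chain with birth rate α (0 → 1) and death rate β (1 → 0), whose generator is A = [[−α, α], [β, −β]] and whose p₀₀(t) has the familiar closed form used for comparison.

```python
# Matrix-exponential check of (1.8) for a made-up two-state generator.
import numpy as np
from scipy.linalg import expm

alpha, beta, t = 2.0, 3.0, 0.7
A = np.array([[-alpha, alpha],
              [beta, -beta]])
P_t = expm(A * t)                                   # P(t) = exp(At), equation (1.8)

s = alpha + beta
p00 = beta / s + (alpha / s) * np.exp(-s * t)       # closed form for p00(t)
print(P_t)
print("p00(t) from exp(At):", P_t[0, 0], "  closed form:", p00)
```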
15.1.3 Examples
Example 15.1.1: (Birth and death process). Here
p_{i,i+1} = αi / (αi + βi), i ≥ 0,
p_{i,i−1} = βi / (αi + βi), i ≥ 1,
λi = αi + βi, i ≥ 0,
where {αi, βi}i≥0 are nonnegative numbers, αi being the birth rate and βi the death rate. This has the meaning that given X(t) = i, for small h > 0, X(t + h) goes up to (i + 1) with probability αi h + o(h), goes down to (i − 1) with probability βi h + o(h), or stays at i with probability 1 − (αi + βi)h + o(h). In this case the forward and backward equations become
p′_{ij}(t) = α_{j−1} p_{i,j−1}(t) + β_{j+1} p_{i,j+1}(t) − (αj + βj) p_{ij}(t),
p′_{ij}(t) = αi p_{i+1,j}(t) + βi p_{i−1,j}(t) − (αi + βi) p_{ij}(t),
with initial condition p_{ij}(0) = δ_{ij}.
(a) Pure birth process. A special case of the above is when βi ≡ 0 for all i. Here p_{ij}(t) = 0 if j < i, and X(t) is a nondecreasing function of t with jumps of size one.
A further special case of this is when αi ≡ α for all i. In this case, the process waits in each state a random length of time with mean α^{−1} and then jumps one step higher. It can be verified that in this case, the solution of Kolmogorov's differential equations (1.7a) and (1.7b) is given by
p_{ij}(t) = e^{−αt} (αt)^{j−i} / (j − i)!, j ≥ i. (1.9)
From this it is easy to conclude that {X(t) : t ≥ 0} is a Levy process, i.e., it has stationary and independent increments: for 0 = t0 ≤ t1 ≤ t2 ≤ · · · ≤ tr < ∞, the variables Yj = X(tj) − X(t_{j−1}), j = 1, 2, . . . , r, are independent and the distribution of Yj depends only on (tj − t_{j−1}). Further, in this case, X(t) − X(0) has a Poisson distribution with mean αt. This {X(t) : t ≥ 0} is called a Poisson process with intensity parameter α.
Another special case is the linear birth and death process, where αi = iα, βi = iβ for i = 0, 1, 2, . . .. The pure death process has αi ≡ 0 for i ≥ 0. A number of queuing processes can be modeled as birth and death processes and, more generally, as continuous time Markov chains. For example, an M/M/s queuing system is one in which customers arrive at a service facility at the jump times of a Poisson process (with parameter α) and there are s servers, the service time at each server being exponential with the same mean (= β^{−1}). The number X(t) of customers in the system at time t evolves as a birth and death process with parameters αi ≡ α for i ≥ 0 and βi = iβ for 0 ≤ i ≤ s, = sβ for i > s.
Example 15.1.2: (Markov branching processes). Here X(t) is the population size in a process where each particle lives a random length of time with exponential distribution with mean α^{−1} and on death creates a random number of new particles with offspring distribution {pj}j≥0, all particles evolving independently of each other. This implies that λi = iα, i ≥ 0, p_{ij} = p_{j−i+1} for j ≥ i − 1, i ≥ 1, p_{ij} = 0 for j < i − 1, and p_{00} = 1. Thus 0 is an absorbing state. The random variable T ≡ inf{t : t > 0, X(t) = 0} is called the extinction time. It can be shown that for i ≥ 0,
Σ_{j=0}^{∞} p_{ij}(t) s^j = ( Σ_{j=0}^{∞} p_{1j}(t) s^j )^i, 0 ≤ s ≤ 1, (1.10)
and also that
F(s, t) ≡ Σ_{j=0}^{∞} p_{1j}(t) s^j (1.11)
satisfies the differential equations
∂F/∂t (s, t) = u(s) ∂F/∂s (s, t) (forward equation)
∂F/∂t (s, t) = u( F(s, t) ) (backward equation) (1.12)
with F(s, 0) ≡ s, where
u(s) ≡ α ( Σ_{j=0}^{∞} pj s^j − s ). (1.13)
Further, if q ≡ P(T < ∞ | X(0) = 1) is the extinction probability, then q is the smallest solution in [0,1] of the equation q = Σ_{j=0}^{∞} pj q^j (cf. Chapter 18). (See Athreya and Ney (2004), Chapter III, p. 106.)
Example 15.1.3: (Compound Poisson processes). Let {Li}i≥0 and {ξi}i≥1 be two independent sequences of random variables such that {Li}i≥0 are iid exponential with mean α^{−1} and {ξi}i≥1 are iid integer valued random variables with distribution {pj}. Let
X(t) = 0 for 0 ≤ t < L0,
     = 1 for L0 ≤ t < L0 + L1,
     . . .
     = k for L0 + · · · + L_{k−1} ≤ t < L0 + · · · + Lk,
     . . .
and let
Y(t) = Σ_{i=1}^{X(t)} ξi, t ≥ 0. (1.14)
Then {Y(t) : t ≥ 0} is a continuous time Markov chain with state space S ≡ {0, ±1, ±2, . . .} and jump probabilities p_{ij} = P(ξ1 = j − i) = p_{j−i}. It is also a Levy process. It is called a compound Poisson process with jump rate α and jump distribution {pj}. If p1 ≡ 1, this reduces to the Poisson process case.
15.1.4 Limit theorems
To investigate what happens to pij (t) ≡ P (X(t) = j | X(0) = i) as t → ∞,
one needs to assume that the embedded chain {yn }n≥0 is irreducible and
recurrent. This implies that for any i0 the random variable
T = min{t : t > L0 , X(t) = i0 }
is finite w.p. 1. Further, the process, starting from i0 , returns to i0 infinitely
often and hence by the Markov property is regenerative in the sense that
the excursions between consecutive returns to i0 are iid. One can use this,
laws of large numbers and renewal theory (cf. Section 8.5) to arrive at the
following:
Theorem 15.1.1: Let P = (p_{ij}) be irreducible and recurrent and 0 < λi < ∞ for all i in S. Let there exist a probability distribution {πi} such that
Σ_{j∈S} πj a_{ji} = 0 for all i ∈ S, (1.15)
where
a_{ij} = λi p_{ij} for i ≠ j, and a_{ii} = −λi.
Then
(i) {πj} is stationary for {p_{ij}(t)}, i.e., Σ_{i∈S} πi p_{ij}(t) = πj for all j, t ≥ 0,
(ii) for all i, j,
lim_{t→∞} p_{ij}(t) = πj, (1.16)
and hence {πj} is the unique stationary distribution,
(iii) for any function h : S → R such that Σ_{j∈S} |h(j)| πj < ∞,
lim_{t→∞} (1/t) ∫_0^t h( X(u) ) du = Σ_{j∈S} h(j) πj w.p. 1 (1.17)
for any initial distribution of X(0).
Note that (1.16) holds without any assumption of aperiodicity on P ≡ (p_{ij}).
A sufficient condition for a probability distribution {πj} to be a stationary distribution is the so-called detailed balance condition
πk a_{kj} = πj a_{jk}. (1.18)
One can use this for birth and death chains on a finite state space S ≡ {0, 1, 2, . . . , N}, N < ∞, to conclude that the stationary distribution is given by
πn = (α_{n−1} α_{n−2} · · · α_0) / (β_n β_{n−1} · · · β_1) π_0, (1.19)
provided αi > 0 for all 0 ≤ i ≤ N − 1, βi > 0 for all 1 ≤ i ≤ N, and αN = 0, β0 = 0. A necessary and sufficient condition for equilibrium, i.e., the existence of a stationary distribution when N = ∞, is
Σ_{n=1}^{∞} (α_{n−1} · · · α_0) / (β_n · · · β_1) < ∞. (1.20)
This yields, in the M/M/s case with arrival rate α and service rate β (i.e., αi ≡ α for i ≥ 0, βi = iβ for 1 ≤ i ≤ s, = sβ for i > s), the necessary and sufficient condition for equilibrium that the traffic intensity
ρ ≡ α / (sβ) < 1, (1.21)
i.e., that the mean number of arrivals per unit time be less than the mean number of persons served per unit time. For further discussion and results, see the books of Karlin and Taylor (1975) and Durrett (2001).
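The following Python sketch (an editorial illustration, not part of the original text) computes the stationary distribution of a finite birth and death chain from (1.19) and verifies πA = 0 numerically; the rates are those of an M/M/s queue truncated at a made-up level N, with traffic intensity ρ < 1.

```python
# Birth-death stationary distribution via (1.19) (assumed example: truncated M/M/3 queue).
import numpy as np

s_servers, alpha, beta, N = 3, 2.0, 1.0, 40          # rho = alpha/(s*beta) = 2/3 < 1
birth = np.array([alpha] * N + [0.0])                # alpha_i, with alpha_N = 0
death = np.array([0.0] + [min(i, s_servers) * beta for i in range(1, N + 1)])

pi = np.ones(N + 1)                                  # pi_n from equation (1.19)
for n in range(1, N + 1):
    pi[n] = pi[n - 1] * birth[n - 1] / death[n]
pi /= pi.sum()

A = np.zeros((N + 1, N + 1))                         # generator: a_ij = lambda_i p_ij
for i in range(N + 1):
    if i < N:
        A[i, i + 1] = birth[i]
    if i > 0:
        A[i, i - 1] = death[i]
    A[i, i] = -(birth[i] + death[i])

print("max |pi A| =", np.max(np.abs(pi @ A)))        # ~0, confirming stationarity
print("mean queue length under pi:", (np.arange(N + 1) * pi).sum())
```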
15.2 Brownian motion
Definition 15.2.1: A real valued stochastic process {B(t) : t ≥ 0} is
called standard Brownian motion (SBM) if it satisfies
(i) B(0) = 0,
(ii) B(t) has N (0, t) distribution, for each t ≥ 0,
(iii) it is a Levy process, i.e., it has stationary independent increments.
It follows that {B(t) : t ≥ 0} is a Gaussian process (i.e., the finite
dimensional distributions are Gaussian) with mean function m(t) ≡ 0 and
covariance function c(s, t) = min(s, t). It can be shown that the trajectories
are continuous w.p. 1. Thus, Brownian motion is a Gaussian process, has
continuous trajectories and has stationary independent increments (and
hence is Markovian). These features make it a very useful process as a
building block for many real world phenomena such as the movement of
pollen (which was studied by the English botanist Robert Brown, hence the name Brownian motion), the movement of a tagged particle in a liquid subject to the bombardment of the molecules of the liquid (studied by Einstein and Smoluchowski), and the fluctuations in stock market prices (studied by the French economist Bachelier).
15.2.1 Construction of SBM
Let {ηi}i≥1 be iid N(0, 1) random variables on some probability space (Ω, F, P). Let {φi(·)}i≥1 be the sequence of Haar functions on [0, 1] defined by the doubly indexed collection
H_{00}(t) ≡ 1,
H_{11}(t) = 1 on [0, 1/2), −1 on [1/2, 1],
and for n ≥ 1 and j = 1, 3, . . . , 2^n − 1,
H_{n,j}(t) = 2^{(n−1)/2} for t in [ (j−1)/2^n, j/2^n ),
          = −2^{(n−1)/2} for t in [ j/2^n, (j+1)/2^n ),
          = 0 otherwise.
It is known that this family is a complete orthonormal basis for L2([0, 1]). Let
B_N(t, ω) ≡ Σ_{i=1}^{N} ηi(ω) ∫_0^t φi(u) du. (2.1)
Then, for each N, {B_N(t, ω) : 0 ≤ t ≤ 1} is a Gaussian process on (Ω, F, P) with mean function m_N(t) ≡ 0 and covariance function c_N(s, t) = Σ_{i=1}^{N} ( ∫_0^t φi(u) du )( ∫_0^s φi(u) du ), and the trajectories t → B_N(t, ω) are continuous in t for each ω in Ω.
It can be shown (Problem 15.11) that w.p. 1 the sequence {B_N(·, ω)}N≥1 is a Cauchy sequence in the Banach space C[0, 1] of continuous real valued functions on [0, 1] with the supremum metric. Hence, {B_N(·, ω)}N≥1 converges w.p. 1 to a limit element B(·, ω), which is a Gaussian process with continuous trajectories, mean function m(t) ≡ 0, and covariance function
c(s, t) = Σ_{i=1}^{∞} ( ∫_0^t φi(u) du )( ∫_0^s φi(u) du ) = ∫_0^1 I_{[0,t]}(u) I_{[0,s]}(u) du = min(s, t).
(See Section 2.3 of Karatzas and Shreve (1991).) Thus,
B(t, ω) ≡ Σ_{i=1}^{∞} ηi(ω) ∫_0^t φi(u) du (2.2)
is a well-defined stochastic process for 0 ≤ t ≤ 1 that has all the properties claimed above; it is called SBM on [0, 1]. Let {B^{(j)}(t, ω) : 0 ≤ t ≤ 1}j≥1 be iid copies of {B(t, ω) : 0 ≤ t ≤ 1} as defined above. Now set
B(t, ω) ≡ B^{(1)}(t, ω) for 0 ≤ t ≤ 1,
       ≡ B^{(1)}(1, ω) + B^{(2)}(t − 1, ω) for 1 ≤ t ≤ 2,
       . . .
       ≡ B(n, ω) + B^{(n+1)}(t − n, ω) for n ≤ t ≤ n + 1, n = 1, 2, . . . (2.3)
Then {B(t, ω) : t ≥ 0} satisfies
(i) B(0, ω) = 0,
(ii) t → B(t, ω) is continuous in t for all ω,
(iii) it is Gaussian with mean function m(t) ≡ 0 and covariance function c(s, t) ≡ min(s, t),
i.e., it is SBM on [0, ∞). From now on the symbol ω may be suppressed.
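The following Python sketch (an editorial illustration, not part of the original text) builds the truncated series B_N(t) of (2.1) on a grid, using the Haar functions H_{00} and H_{n,j}; the sum of the squared Schauder terms at a point t should be close to t, in line with the covariance computation above.

```python
# Haar/Schauder construction sketch of (2.1)-(2.2) on a grid (truncation level assumed).
import numpy as np

rng = np.random.default_rng(2)
m = 2048                                      # grid points on [0, 1]
t = (np.arange(m) + 1) / m
dt = 1.0 / m
levels = 8                                    # use Haar functions up to level n = 8

haar = [np.ones(m)]                           # H_00 == 1
for n in range(1, levels + 1):
    for j in range(1, 2 ** n, 2):
        h = np.zeros(m)
        left, mid, right = (j - 1) / 2 ** n, j / 2 ** n, (j + 1) / 2 ** n
        h[(t > left) & (t <= mid)] = 2 ** ((n - 1) / 2)
        h[(t > mid) & (t <= right)] = -(2 ** ((n - 1) / 2))
        haar.append(h)
haar = np.array(haar)

schauder = np.cumsum(haar, axis=1) * dt       # int_0^t phi_i(u) du on the grid
eta = rng.standard_normal(len(haar))          # iid N(0, 1) coefficients
B = eta @ schauder                            # one approximate SBM path B_N(.)

k = m // 2                                    # check Var(B_N(0.5)) ~ 0.5
print("sum of squared Schauder terms at t = 0.5:", np.sum(schauder[:, k] ** 2))
print("B_N(1) =", B[-1], " (roughly N(0, 1))")
```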
15.2.2 Basic properties of SBM
(i) Scaling properties
Fix c > 0 and set
B_c(t) ≡ (1/√c) B(ct), t ≥ 0. (2.4)
Then {B_c(t)}t≥0 is also an SBM. This is easily verified by noting that B_c(0) = 0, B_c(t) ∼ N(0, t), Cov(B_c(t), B_c(s)) = (1/c) min{ct, cs} = min(t, s), and that {B_c(·)} is a Levy process and the trajectories are continuous w.p. 1.
(ii) Reflection
If {B(·)} is SBM, then so is {−B(·)}. This follows from the symmetry
of the mean zero Gaussian distribution.
(iii) Time inversion
Let
B̃(t) = t B(1/t) for t > 0, and B̃(0) = 0. (2.5)
Then {B̃(t) : t ≥ 0} is also an SBM. The facts that {B̃(t) : t > 0}
is a Gaussian process with mean and covariance function same
as SBM and the trajectories are continuous in the open interval
(0, ∞) are straightforward to verify. It only remains to verify that
limt→0 B̃(t) = 0 w.p. 1. Fix 0 < t1 < t2 . Then {B̃(t) : t1 ≤ t ≤ t2 }
is a Gaussian process with mean function 0 and covariance function
min(s, t) and has continuous trajectories, i.e., it has the same distribution as {B(t) : t1 ≤ t ≤ t2 }. Thus X̃1 ≡ sup{|B̃(t)| : t1 ≤ t ≤ t2 }
has the same distribution as X1 ≡ sup{|B(t)| : t1 ≤ t ≤ t2 }. Since
both converge as t1 ↓ 0 to X̃2 (t2 ) ≡ sup{B̃(t) : 0 < t ≤ t2 } and
X2 (t2 ) ≡ sup{B(t) : 0 < t ≤ t2 }, respectively, these two have the
same distribution. Again, since X̃2 (t2 ) and X2 (t2 ) both converge as
t2 ↓ 0 to X̃2 ≡ limt↓0 |B̃(t)| and X2 ≡ limt↓0 |B(t)|, respectively, X̃2
and X2 have the same distribution. But X2 = 0 w.p. 1 since B(t) is
continuous in [0, ∞). Thus X̃2 = 0 w.p. 1, i.e., limt→0 B̃(t) = 0 w.p.
1.
(iv) Translation invariance (after a fixed time t0 )
Fix t0 > 0 and set
Bt0 (t) = B(t + t0 ) − B(t0 ), t ≥ 0.
(2.6)
Then {Bt0 (t)}t≥0 is also an SBM. This follows from the stationary
independent increments property.
(v) Translation invariance (after a stopping time T0 )
A random variable T (ω) with values in [0, ∞) is called a stopping
time w.r.t. the SBM {B(t) : t ≥ 0} if for each t in [0, ∞) the event {T ≤ t} is in the σ-algebra Ft ≡ σ(B(s) : s ≤ t) generated by the trajectory B(s) for 0 ≤ s ≤ t. Examples of stopping times are
Ta = min{t : t ≥ 0, B(t) ≥ a} for 0 < a < ∞, (2.7)
T_{a,b} = min{t : t > 0, B(t) ∉ (a, b)} where a < 0 < b. (2.8)
Let T0 be a stopping time w.r.t. SBM {B(t) : t ≥ 0}. Let
BT0 (t) ≡ {B(T0 + t) − B(T0 ) : t ≥ 0}.
(2.9)
Then {BT0 (t)}t≥0 is again an SBM.
Here is an outline of the proof.
(a) T0 deterministic is covered by (iv) above.
(b) If T0 takes only countably many values, say {aj }j≥1 , then it
is not difficult to show that conditioned on the event T0 = aj ,
the process BT0 (t) ≡ {B(T0 + t) − B(T0 )} is SBM. Thus the
unconditional distribution of {BT0 (t) : t ≥ 0} is again an SBM.
(c) Next given a general stopping time T0 , one can approximate
it by a sequence Tn of stopping times where for each n, Tn is
discrete. By continuity of trajectories, {BT0 (t) : t ≥ 0} has the
same distribution as the limit of {BTn (t) : t ≥ 0} as n → ∞.
A consequence of the above two properties is that SBM has the
Markov and the strong Markov properties. That is, for each fixed
t0 , the distribution of B(t), t ≥ t0 given B(s) : s ≤ t0 depends only
on B(t0 ) (Markov property) and for each stopping time T0 , the distribution of B(t) : t ≥ T0 given B(s) : s ≤ T0 depends only on B(T0 )
(strong Markov property).
(vi) The reflection principle
Fix a > 0 and let Ta = inf{t : B(t) ≥ a}, where {B(t) : t ≥ 0} is SBM. For any t > 0, a > 0,
P(Ta ≤ t) = P(Ta ≤ t, B(t) > a) + P(Ta ≤ t, B(t) < a).
Now, by continuity of the trajectory, B(Ta) = a on {Ta ≤ t}. Thus
P(Ta ≤ t, B(t) < a) = P(Ta ≤ t, B(t) < B(Ta))
  = P(Ta ≤ t, B(t) − B(Ta) < 0)
  = P(Ta ≤ t, B(t) − B(Ta) > 0)
  = P(Ta ≤ t, B(t) > a).
To see this, note that by (v), {B(Ta + h) − B(Ta) : h ≥ 0} is independent of Ta and has the same distribution as an SBM, and hence {−(B(Ta + h) − B(Ta)) : h ≥ 0} is also independent of Ta and has the same distribution as an SBM. Thus,
P(Ta ≤ t) = 2 P(Ta ≤ t, B(t) > a) = 2 P(B(t) > a) = 2 ( 1 − Φ(a/√t) ), (2.10)
where Φ(·) is the standard N(0, 1) cdf. The above argument is known as the reflection principle, as it asserts that the path
B̃(t) ≡ B(t) for t ≤ Ta, and B̃(t) ≡ B(Ta) − (B(t) − B(Ta)) for t > Ta, (2.11)
obtained by reflecting the original path about the line y = a from the point (Ta, a) onwards, has the same distribution as the original path. Thus the probability density function of Ta is
f_{Ta}(t) = 2 φ(a/√t) (1/2) (a / t^{3/2}) = (a / √(2π)) e^{−a²/(2t)} t^{−3/2}, (2.12)
implying that E Ta^p < ∞ for p < 1/2 and = ∞ for p ≥ 1/2. Also, by the strong Markov property, the process {Ta : a ≥ 0} is a process with stationary independent increments, i.e., a Levy process. It is also a stable process of order 1/2.
One can use this calculation of P(Ta ≤ t) to show that the probability that the SBM does not cross zero in the interval (t1, t2) is (2/π) arcsin √(t1/t2) (Problem 15.12).
If M(t) ≡ sup{B(s) : 0 ≤ s ≤ t}, then for a > 0,
P(M(t) > a) = P(Ta ≤ t) = 2 P(B(t) > a) = P(|B(t)| > a), (2.13)
so M(t) has the same distribution as |B(t)| and hence has finite moments of all orders. In fact,
E( e^{θM(t)} ) < ∞
for all θ > 0.
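The following Python sketch (an editorial illustration, not part of the original text) checks (2.10) and (2.13) by Monte Carlo: the discretized running maximum M(1) of simulated paths should satisfy P(M(1) > a) ≈ 2(1 − Φ(a)), with a small downward bias coming from the discretization.

```python
# Monte Carlo check of the reflection principle (path resolution is an assumption).
import math
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps, a = 20_000, 2_000, 1.0
dt = 1.0 / n_steps

increments = rng.standard_normal((n_paths, n_steps)) * math.sqrt(dt)
paths = np.cumsum(increments, axis=1)
M1 = paths.max(axis=1)                                 # discretized running maximum on [0, 1]

Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print("Monte Carlo  P(M(1) > a):", (M1 > a).mean())
print("Reflection   2(1-Phi(a)):", 2.0 * (1.0 - Phi(a)))
```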
15.2.3 Some related processes
(i) Let {B(t) : t ≥ 0} be a SBM. For µ in (−∞, ∞) and σ > 0, the
process Bµ,σ (t) ≡ µt + σB(t), t ≥ 0 is called Brownian motion with
constant drift µ and constant diffusion σ.
(ii) Let B0 (t) = B(t) − tB(1), 0 ≤ t ≤ 1. The process {B0 (t) : 0 ≤
t ≤ 1} is called the Brownian bridge. It is a Gaussian process with
mean function 0 and covariance min(s, t) − st and has continuous
trajectories that vanish both at 0 and 1.
(iii) Let Y(t) = e^{−t} B(e^{2t}), −∞ < t < ∞. Then {Y(t) : −∞ < t < ∞} is a Gaussian process with mean function 0 and covariance function c(s, t) = e^{−(s+t)} e^{2s} = e^{s−t} for s < t, i.e., c(s, t) = e^{−|t−s|}. This process is called the Ornstein-Uhlenbeck process. It is to be noted that for each t, Y(t) ∼ N(0, 1), and in fact {Y(t) : −∞ < t < ∞} is a strictly stationary process and is a Markov process as well.
15.2.4 Some limit theorems
Let {ξi}i≥1 be iid random variables with Eξ1 = 0, Eξ1² = 1. Let S0 = 0, Sn = Σ_{i=1}^{n} ξi, n ≥ 1. Let Bn(j/n) = (1/√n) Sj, j = 0, 1, 2, . . . , n, and let {Bn(t) : 0 ≤ t ≤ 1} be obtained by linear interpolation from the values at j/n for j = 0, 1, 2, . . . , n. Then for each n, {Bn(t) : 0 ≤ t ≤ 1} is a random continuous trajectory and hence is a random element of the metric space of continuous real valued functions on [0,1] that are zero at zero, with the metric
ρ(f, g) ≡ sup{ |f(t) − g(t)| : 0 ≤ t ≤ 1 }. (2.14)
Let µn ≡ P Bn^{−1} be the induced probability measure on C[0, 1]. The following is a generalization of the central limit theorem, as noted in Chapter 11.
Theorem 15.2.1: (Donsker). In the space (C[0, 1], ρ) the sequence of probability measures {µn ≡ P Bn^{−1}}n≥1 converges weakly to µ, the probability distribution of SBM. That is, for any bounded continuous function h from C[0, 1] to R, ∫ h dµn → ∫ h dµ.
For a proof, see Billingsley (1968).
Corollary 15.2.2: For any continuous functional T from (C[0, 1], ρ) to R^k, k < ∞, the distribution of T(Bn) converges to that of T(B). In particular, the joint distribution of
( max_{0≤j≤n} Sj/√n, max_{0≤j≤n} |Sj|/√n )
converges weakly to that of
( max_{0≤u≤1} B(u), max_{0≤u≤1} |B(u)| ).
There are similar limit theorems asserting the convergence of the empirical processes to the Brownian bridge with applications to the limit
distribution of the Kolmogorov-Smirnov statistics (see Billingsley (1968)).
Theorem 15.2.3: (Law of large numbers).
lim_{t→∞} B(t)/t = 0 w.p. 1. (2.15)
Proof: By the time inversion property (2.5),
lim_{t→0} B̃(t) = 0 w.p. 1.
But lim_{t→0} B̃(t) = lim_{t→0} t B(1/t) = lim_{τ→∞} B(τ)/τ. □
Theorem 15.2.4: (Kallianpur-Robbins). Let f : R → R be integrable with respect to Lebesgue measure. Then
(1/√t) ∫_0^t f( B(u) ) du −→d ( ∫_{−∞}^{∞} f(u) du ) Z (2.16)
where Z is a random variable with density 1/(π √(z(1−z))) on [0, 1].
This is a special case of the Darling-Kac formula for Markov processes and can be established here using the regenerative property of SBM: starting from 0, SBM will hit level 1 at some time T1 and from there hit level 0 at a later time τ1, and this can be repeated to produce a sequence of times 0, τ1, τ2, . . . such that the excursions {B(t) : τi ≤ t < τ_{i+1}}i≥1 are iid. The sequence {τi}i≥1 is a renewal sequence whose lifetime distribution (that of τ1) has a regularly varying tail of order 1/2 and hence infinite mean. One can now appeal to results from renewal theory to complete the proof (see Feller (1966) and Athreya (1986)).
15.2.5 Some sample path properties of SBM
The sample paths t → B(t, ω) of the SBM are continuous w.p. 1. It turns out that they are not any smoother than this. For example, they are neither differentiable nor of bounded variation on finite intervals. It will be shown now that w.p. 1 Brownian sample paths are nowhere differentiable, and that the quadratic variation over any finite interval is finite and nonrandom. (See also Karatzas and Shreve (1991).)
(i) Nondifferentiability of B(·, ω) in [0,1]
Let
A_{n,k} = { ω : sup_{|t−s|≤3/n} |B(t, ω) − B(s, ω)| / |t − s| ≤ k for some 0 ≤ s ≤ 1 }.
Let Z_{r,n} = | B((r+1)/n) − B(r/n) |, r = 0, 1, 2, . . . , n − 1, and let
B_{n,k} = { ω : max( Z_{r,n}, Z_{r+1,n}, Z_{r+2,n} ) ≤ 6k/n for some r }.
It can be verified that A_{n,k} ⊂ B_{n,k}. Now
P(B_{n,k}) ≤ Σ_{r=0}^{n−1} P( max( Z_{r,n}, Z_{r+1,n}, Z_{r+2,n} ) ≤ 6k/n )
          ≤ n ( P( |Z_{0,n}| ≤ 6k/n ) )³
          = n ( P( |Z_{0,n}| / (1/√n) ≤ 6k/√n ) )³
          ≤ n ( Const/√n )³,
since Z_{0,n} / (1/√n) ∼ |N(0, 1)|. Thus for each k < ∞, P(A_{n,k}) ≤ Const/√n. This implies
Σ_{n=1}^{∞} P(A_{n³,k}) < ∞.
So by the Borel-Cantelli lemma, only finitely many of the events A_{n³,k} can happen, w.p. 1. The event A ≡ { ω : B(t, ω) is differentiable for at least one t in [0, 1] } is contained in C ≡ ∪_{k≥1} { ω : ω ∈ A_{n³,k} for infinitely many n }, and so P(A) ≤ P(C) = 0.
(ii) Finite quadratic variation of SBM
Let
η_{n,j} = B(j 2^{−n}) − B((j − 1) 2^{−n}), j = 1, 2, . . . , 2^n,
and
Δn ≡ Σ_{j=1}^{2^n} η_{n,j}². (2.17)
Then
E Δn = Σ_{j=1}^{2^n} 2^{−n} = 1.
Also, by independence and stationarity of the increments,
Var(Δn) ≤ Σ_{j=1}^{2^n} 3 · 2^{−2n} = 3 · 2^{−n}.
Thus P(|Δn − 1| > ε) ≤ Var(Δn)/ε² for any ε > 0. This implies, by Borel-Cantelli, that Δn → 1 w.p. 1. By definition the quadratic variation is
Δ ≡ sup{ Σ_{j=1}^{n} | B(tj, ω) − B(t_{j−1}, ω) |² : all finite partitions (t0, t1, . . . , tn) of [0, 1] }. (2.18)
It is easy to verify that Δ = lim_n Δn. Thus Δ = 1 w.p. 1. It follows that w.p. 1 the Brownian motion paths are not of bounded variation. By the scaling property of SBM, it follows that the quadratic variation of SBM over [0, t] is t w.p. 1 for any t > 0.
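The following Python sketch (an editorial illustration, not part of the original text) generates one Brownian path at a fine dyadic resolution and computes Δn along coarser dyadic partitions; the values approach 1 as n grows, as in (2.17).

```python
# Dyadic quadratic variation sketch (finest resolution level is an assumption).
import numpy as np

rng = np.random.default_rng(4)
N = 16                                             # finest level: 2^16 increments on [0, 1]
fine = rng.standard_normal(2 ** N) * np.sqrt(1.0 / 2 ** N)
path = np.concatenate(([0.0], np.cumsum(fine)))    # B(j / 2^N), j = 0, ..., 2^N

for n in (4, 8, 12, 16):
    pts = path[:: 2 ** (N - n)]                    # B(j / 2^n) along the coarser grid
    delta_n = np.sum(np.diff(pts) ** 2)
    print(f"n = {n:2d}   Delta_n = {delta_n:.4f}")
```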
15.2.6 Brownian motion and martingales
There are three natural martingales associated with Brownian motion.
Theorem 15.2.5: Let {B(t) : t ≥ 0} be SBM. Then
(i) (Linear martingale) {B(t) : t ≥ 0} is a martingale.
(ii) (Quadratic martingale) {B²(t) − t : t ≥ 0} is a martingale.
(iii) (Exponential martingale) For any real θ, {e^{θB(t) − θ²t/2} : t ≥ 0} is a martingale.
Proof: (i) and (ii). Since B(t) ∼ N(0, t), E|B(t)| < ∞ and E|B(t)|² < ∞. By the stationary independent increments property, for any t ≥ 0, s ≥ 0,
E( B(t + s) | B(u) : u ≤ t ) = E( B(t + s) − B(t) | B(u) : u ≤ t ) + B(t) = 0 + B(t) = B(t),
establishing (i). Next,
E( B²(t + s) | B(u) : u ≤ t )
  = E( (B(t + s) − B(t))² | B(u) : u ≤ t ) + B²(t) + 2 E( (B(t + s) − B(t)) B(t) | B(u) : u ≤ t )
  = s + B²(t) + 0,
and hence
E( B²(t + s) − (t + s) | B(u) : u ≤ t ) = B²(t) − t, establishing (ii).
(iii)
E( e^{θB(t+s) − θ²(t+s)/2} | B(u) : u ≤ t ) = E( e^{θ(B(t+s) − B(t)) − θ²s/2} | B(u) : u ≤ t ) e^{θB(t) − θ²t/2}.
Again using the fact that B(t + s) − B(t), given (B(u) : u ≤ t), is N(0, s), the first term on the right side equals 1, proving (iii). □
15.2.7 Some applications
The martingales in Theorem 15.2.5 combined with the optional stopping
theorems of Chapter 13 yield the following applications.
(i) Exit probabilities
Let B(·) be SBM. Fix a < 0 < b. Let T_{a,b} = min{t : t > 0, B(t) = a or b}. From (i) and the optional sampling theorem, for any t > 0,
E B(T_{a,b} ∧ t) = E B(0) = 0. (2.19)
Also, by continuity, B(T_{a,b} ∧ t) → B(T_{a,b}). By the bounded convergence theorem, this implies
E B(T_{a,b}) = 0, (2.20)
i.e., ap + b(1 − p) = 0, where p = P(Ta < Tb) = P(B(·) reaches a before b). Thus, p = b/(b − a).
(ii) Mean exit time
From (ii) and the optional sampling theorem,
E( B²(T_{a,b} ∧ t) − (T_{a,b} ∧ t) ) = 0,
i.e.,
E B²(T_{a,b} ∧ t) = E(T_{a,b} ∧ t). (2.21)
By using the bounded convergence theorem on the left and the monotone convergence theorem on the right, one may conclude
E B²(T_{a,b}) = E T_{a,b},
i.e.,
E T_{a,b} = p a² + (1 − p) b² = a² b/(b − a) + b² (−a)/(b − a) = −ab. (2.22)
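The following Python sketch (an editorial illustration, not part of the original text) checks (2.20) and (2.22) by simulating discretized Brownian paths until they exit (a, b); the Euler discretization introduces a small bias that shrinks with the step size dt.

```python
# Monte Carlo check of exit probability b/(b-a) and mean exit time -ab (dt is an assumption).
import numpy as np

rng = np.random.default_rng(5)
a, b, dt, n_paths = -1.0, 2.0, 1e-3, 2_000
hits_a, exit_times = 0, []

for _ in range(n_paths):
    x, t = 0.0, 0.0
    while a < x < b:
        x += np.sqrt(dt) * rng.standard_normal()
        t += dt
    hits_a += x <= a
    exit_times.append(t)

print("P(hit a first):", hits_a / n_paths, "  theory b/(b-a):", b / (b - a))
print("E T_{a,b}     :", np.mean(exit_times), "  theory -ab   :", -a * b)
```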
(iii) The distribution of T_{a,b}
From (iii) and the optional sampling theorem,
E( e^{θB(T_{a,b} ∧ t) − (θ²/2)(T_{a,b} ∧ t)} ) = 1.
By the bounded convergence theorem, this implies
E( e^{θB(T_{a,b}) − (θ²/2) T_{a,b}} ) = 1. (2.23)
In particular, if b = −a this reduces to
1 = E( e^{θa − (θ²/2) T_{a,−a}} ; Ta < T_{−a} ) + E( e^{−θa − (θ²/2) T_{a,−a}} ; Ta > T_{−a} )
  = ( e^{θa} + e^{−θa} ) (1/2) E( e^{−(θ²/2) T_{a,−a}} ),
since by symmetry
E( e^{−(θ²/2) T_{a,−a}} ; Ta < T_{−a} ) = E( e^{−(θ²/2) T_{a,−a}} ; Ta > T_{−a} ).
Thus, for λ ≥ 0,
E( e^{−λ T_{a,−a}} ) = 2 ( e^{√(2λ) a} + e^{−√(2λ) a} )^{−1}. (2.24)
Similarly, it can be shown that for λ ≥ 0, a > 0,
E( e^{−λ Ta} ) = e^{−√(2λ) a}. (2.25)
15.2.8 The Black-Scholes formula for stock price option
Let X(t) denote the price of one unit of a stock S at time t. Due to fluctuations in the market place, it is natural to postulate that {X(t) : t ≥ 0} is a stochastic process. To build an appropriate model, consider the discrete time case first. If Xn denotes the unit price at time n, it is natural to postulate that X_{n+1} = Xn y_{n+1}, where y_{n+1} represents the effect of the market fluctuations in the time interval [n, n + 1). This leads to the formula Xn = X0 y1 y2 · · · yn. If one assumes that {yn}n≥1 are sufficiently independent, then
Xn = X0 e^{nµ + Σ_{i=1}^{n} (log yi − µ)}
is, by the central limit theorem, approximately lognormal, leading one to consider a model of the form
X(t) = X(0) e^{µt + σB(t)}, (2.26)
where {B(t) : t ≥ 0} is SBM. Thus, {log X(t) − log X(0) : t ≥ 0} is postulated to be a Brownian motion with drift µ and diffusion σ. In the language of finance, µ is called the growth rate and σ the volatility rate.
The so-called European option allows one to buy the stock at a future time t0 for a unit price of K dollars fixed at time 0. If X(t0) < K, then one has the option of not buying, whereas if X(t0) ≥ K, then one can buy it at K dollars and sell it immediately at the market price X(t0), realizing a profit of X(t0) − K. Thus the net revenue from this option is
X̃(t0) = 0 if X(t0) ≤ K, and X̃(t0) = X(t0) − K if X(t0) > K. (2.27)
Since the value of money depreciates over time, say at rate r, the net revenue's value at time 0 is X̃(t0) e^{−t0 r}. So a fair price for this European option is
p0 = E X̃(t0) e^{−t0 r} = E( X(t0) − K )^+ e^{−t0 r}. (2.28)
Here the constants µ, σ, K, t0, r are assumed known. The goal is to compute p0. This becomes feasible if one makes the natural assumption of no arbitrage, that is, that the discounted value of the stock, X(t) e^{−rt}, evolves as a martingale. This is a reasonable assumption, as otherwise, if holding the stock were systematically advantageous, everybody would want to take advantage of it and start buying the stock, thereby driving the current price up and eliminating the advantage. Thus, in effect, this assumption says that
X(t) e^{−rt} ≡ X(0) e^{µt + σB(t) − rt}
evolves as a martingale. But recall that if B(·) is an SBM, then for any real θ, e^{θB(t) − θ²t/2} evolves as a martingale. Thus, µ, σ, r should satisfy the condition −σ²/2 = µ − r. With this assumption, the fair price for this European option with µ, σ, r, K, t0 given is
p0 = E( e^{−t0 r} ( X0 e^{σB(t0) + µt0} − K )^+ )
   = e^{−t0 r} (1/√(2πt0)) ∫_{X0 e^{σy + µt0} > K} ( X0 e^{σy + µt0} − K ) e^{−y²/(2t0)} dy. (2.29)
This is known as the Black-Scholes formula.
For more detailed discussions on Brownian motion including the development of Ito stochastic integration and diffusion processes via a martingale
formulation, the books of Stroock and Varadhan (1979) and Karlin and
Taylor (1975) should be consulted. See also Karatzas and Shreve (1991).
15.3 Problems
15.1 Let {Lj }j≥0 be as in Section 15.1.1. Show that for any θ ≥ 0
.
- n
'
(
"
< λyj
−θ n
Lj
j=0
(a) E e
=E
θ+λy
j=0
j
∞
'
(
"∞
)
1
(b) E e−θ j=0 Lj = 0 for all θ > 0 iff
= ∞ w.p. 1 assumλ
j=0 yj
ing 0 < λi < ∞ for all i.
15.3 Problems
505
15.2 Let L be an exponential random variable. Verify that for any x > 0,
u>0
P (L > x + u | L > x) = P (L > u).
(This is referred to as the “lack of memory” property.)
15.3 Solve the Kolmogorov’s forward and backward equations for the following special cases of birth and death processes:
(a) Poisson process: αi ≡ α, βi ≡ 0,
(b) Yule process: αi ≡ iα, βi ≡ 0,
(c) On-off process: α0 = α, αi = 0, i ≥ 1,
β1 = β, βi = 0, i = 0, 2, . . . ,
(d) M/M/1 queue: αi = α, i ≥ 0, βi = β, i ≥ 1, β0 = 0,
(e) M/M/s queue: αi = α, i ≥ 0, βi = iβ, 1 ≤ i ≤ s and = sβ,
i > s,
β0 = 0,
(f) Pure death process: βi ≡ β, i ≥ 1, β0 = 0, αi = 0, i ≥ 0.
15.4 Find the stationary distributions when they exist for the processes in
Problem 15.3.
15.5 Consider 2 independent M/M/1 queues with arrival rate λ, service
rate µ (Case I), and one M/M/1 queue with arrival rate 2λ and service
rate 2µ (Case II). Assume λ < µ. Show that in the stationary state
the mean number in the system Case I is larger than in Case 2 and
their ratio approaches 2 as ρ = µλ ↑ 1.
15.6 Show that for any finite state space irreducible CTMC {X(t) : t ≥ 0}
with all λi ∈ (0, ∞), there is a unique stationary distribution.
15.7 (M/M/∞ queue). This is a birth and death process such that αn ≡
α, βn = nβ, n ≥ 0, 0 < α, β < ∞. Show that this process has a
stationary distribution that is Poisson with mean ρ = µλ .
15.8 (a) Let {X(t)}t≥0 be a Poisson process with rate λ. Let L be an
exponential random variable with mean µ−1 and independent
of {X(t)}t≥0 . Let N (t) = X(t + L) − X(t). Find the distribution
of N (t).
(b) Let {Y (t)}t≥0 be also a Poisson process with rate µ and independent of {X(t)}t≥0 in (a). Let T and T $ be two successive ‘event
epochs’ for the {Y (t)}t≥0 process. Let N = X(T $ ) − X(T ). Find
the distribution of N .
(c) Let {X(t)}t≥0 be as in (a). Let τ0 = 0 < τ1 < τ2 < · · · be the
successive event epochs of {X(t)}t≥0 . Find the joint distribution
of (τ1 , τ2 , . . . , τn ) conditioned on the event {N (t) = 1} for some
0 < t < ∞.
506
15. Stochastic Processes
15.9 Let {X(t) : t ≥ 0} be a Poisson process with rate λ. Suppose at
each event epoch of the Poisson process an experiment is performed
that results in one of k possible outcomes {ai : 1 ≤ i ≤ k} with
probability distribution {pi : 1 ≤ i ≤ k}. Let Xi (t) = outcomes ai in
[0, t]. Assume the experiments are iid. Show that {Xi (t) : t ≥ 0} are
independent Poisson processes with rate λpi for 1 ≤ i ≤ k.
15.10 Let {X(t) : t ≥ 0} be a Poisson process with rate λ, 0 < λ < ∞.
Let {ξi }i≥1 be a sequence of iid random variables independent of
{X(t) : t ≥ 0} with values in a measurable space (S, S). For each
A ∈ S define
N (t)
)
I(ξj ∈ A), t ≥ 0.
N (A, t) ≡
j=1
(a) Verify that for each A ∈ S, {N (A, t)}t≥0 is a Poisson process
and find its rate.
(b) Show that if A1 , A2 ∈ S, A1 ∩ A2 = S, then the two Poisson
processes {N (Ai , t)}t≥0 , i = 1, 2 are independent.
(c) Show that for each t > 0, {N (A, t) : A ∈ S} is a Poisson
random field on S, i.e., for each A, N (A, t) is Poisson and for
A1 , A2 , . . . , Ak pairwise disjoint elements of S, {N (Ai , t)}k1 are
independent.
(d) Show that {N (·, t)}t≥0 is a process with stationary independent
increments that is Poisson random measure valued.
15.11 Let BN (·, ω) be as in (2.1). Show that {BN (·, ω)}N ≥1 is Cauchy in
the Banach space C[0, 1] with sup norm by completing the following
steps:
(a) If ξnj (t, ω) = Znj (ω)Snj (t) then
(i) 7ξnj (·, ω)7 ≡ sup{|ξnj (t, ω)| : 0 ≤ t ≤ 1} = |Znj (ω)|2−
(ii)
sup
n
/ 2)
−1
j=1
|ξnj (t, ω) : 0 ≤ t ≤ 1
(n+1)
2
0
= (max{|Znj (ω)| : 1 ≤ j ≤ 2n − 1})2−
(n+1)
2
,
(b) for any sequence {ηi }i≥1 of random variables with supi E(eηi ) <
∞, w.p. 1, ηi ≤ 2 log i for all large i,
(c) w.p. 1 there is a C < ∞ such that
max{|Znj (ω)| : 1 ≤ j ≤ 2n − 1} ≤ Cn.
15.3 Problems
507
(d)
∞ 2)
−1
)
n
n=1 j=1
7ξnj (·, ω)7 < ∞ w.p. 1.
15.12 Show that if B(·) is SBM
T
(
'
2
t2
.
P B(t) = 0 for some t in (t1 , t2 ) = arcsin
π
t1
(Hint: Conditioned on B(t1 ) = x )= 0, the required probability equals
!
"
P (T|x| ≤ t2 − t1 ) = 2 1 − Φ( √t|x|
) and hence the unconditional
2 −t1
!
|B(t1 )| "
) .)
probability is E2 1 − Φ( √
t2 −t1
15.13 Use the reflection principle to find P (M (t) ≥ x, B(t) ≤ y) for x > y
where M (t) = max{B(u) : 0 ≤ u ≤ t} and B(·) is SBM.
15.14 For a < 0 < b < c find P (Tb < Ta < Tc ) where Tx = min{t : t >
0, B(t) = x} where B(·) is SBM.
15.15 Let B0 (t) ≡ B(t)−tB(1), 0 ≤ t ≤ 1 (where {B(t) : t ≥ 0} is SBM)
! t be
"
,
the Brownian bridge. Find the distribution of X(t) ≡ (1 + t)B0 1+t
t ≥ 0.
(Hint: X is a Gaussian process. Find its mean and covariance functions.)
15.16 Let B(·) be SBM. Let Mn = sup{|B(t) − B(n)| : n − 1 ≤ t ≤ n},
n = 1, 2, . . ..
(a) Show that
Mn
n
→ 0 w.p. 1 as n → ∞.
(Hint: Show {Mn }n≥1 are iid and EM1 < ∞.)
(b) Using this show that B(t)
t → 0 w.p. 1 as t → ∞ and give another
proof of the time inversion result 15.2.3.
15.17 Use the exponential martingale to find E(e−λT ) where T = inf{t :
t ≥ 0, B(t) ≥ α + βt}, λ > 0, α > 0, β > 0 and B(·) SBM.
15.18 Let {Y (t) : −∞ < t < ∞} be the Ornstein-Uhlenbeck process as
defined in 15.2.4. Let f : R → R be Borel measurable and E|f (Z)| <
"
&t !
∞ where Z ∼ N (0, 1). Evaluate lim 1t 0 f Y (u) du.
t→∞
(Hint: Show that Y (·) is a regenerative stochastic process.)
16
Limit Theorems for Dependent
Processes
16.1 A central limit theorem for martingales
Let {Xn }n≥1 be a sequence of random variables on (Ω, F, P ), and let
{Fn }n≥1 be a filtration, i.e., a sequence of σ-algebras on Ω such that
Fn ⊂ Fn+1 ⊂ F for all n ≥ 1. From Chapter 13, recall that {Xn , Fn }n≥1
is called a martingale if Xn is Fn -measurable for each n ≥ 1 and E(Xn+1 |
Fn ) = Xn for each n ≥ 1.
Given a martingale {Xn , Fn }n≥1 , define
Y1
Yn
= X1 − EX1 ,
= Xn − Xn−1 , n ≥ 1.
Note that each Yn is Fn -measurable and
E(Yn | Fn−1 ) = 0
for all n ≥ 1,
(1.1)
where F0 = {Ω, ∅}.
Definition 16.1.1: Let {Yn }n≥1 be a collection of random variables
on a probability space (Ω, F, P ) and let {Fn }n≥1 be a filtration. Then,
{Yn , Fn }n≥1 is called a martingale difference array (mda) if Yn is Fn measurable for each n ≥ 1 and (1.1) holds.
For example, if {Yn }n≥1 is a sequence of zero mean independent random
variables, then {Yn , Fn }n≥1 is a mda w.r.t. the natural filtration Fn =
σ1Y1 , . . . , Yn 2, n ≥ 1. Other examples of mda’s can be constructed from the
510
16. Limit Theorems for Dependent Processes
examples given in Chapter 13. The main result of this section shows that
for square-integrable mda’s satisfying a Lindeberg-type condition, the CLT
holds. For more on limit theorems for mdas, see Hall and Heyde (1980).
Theorem 16.1.1: For each n ≥ 1, let {Yni , Fni }i≥1 be a mda on (Ω, F, P )
2
< ∞ for all i ≥ 1 and let τn be a finite stopping time w.r.t.
with EYni
{Fni }i≥1 . Suppose that for some constant σ 2 ∈ (0, ∞),
τn
'
(
)
J
2 J
Fn,i−1 −→p σ 2
E Yni
as
i=1
(1.2)
n→∞
and that for each # > 0,
∆n (#) ≡
Then,
τn
(
'
)
J
2
E Yni
I(|Yni | > #) J Fn,i−1 −→p 0
as
i=1
τn
)
i=1
n → ∞.
Yni −→d N (0, σ 2 ).
(1.3)
(1.4)
Proof: First the theorem will be proved under the additional condition
that
τn
=
mn for all n ≥ 1 for some nonrandom sequence of
positive integers {mn }n≥1
(1.5)
and that for some c ∈ (0, ∞),
mn
(
'
)
J
2 J
Fn,i−1 ≤ c w.p. 1.
E Yni
(1.6)
i=1
"
! 2
2
Let σni
= E Yni
| Fn,i−1 , i ≥ 1, n ≥ 1. Also, write m for mn to ease the
2
notation. Since σni
is Fn,i−1 -measurable, for any t ∈ R,
J
J
.
- )
m
J
"J
!
2 2
J
JE exp ιt
Y
t
/2
−
exp
−
σ
nj
J
J
j=1
J
J
- )
.
- m−1
.
m
)
J
"J
!
2
≤ JJE exp ιt
Ynj − E exp ιt
Ynj exp − t2 σnm
/2 JJ
j=1
j=1
J
- )
.
m
J
!
"
2
+ · · · + JJE exp ιtYn1 exp −
t2 σnj
/2
1
j=2
− exp
-
−
m
)
j=1
2
t2 σnj
/2
.2J
J
J
J
16.1 A central limit theorem for martingales
511
J
J
- )
.
m
J
J
2 2
2 2
J
+ JE exp −
t σnj /2 − exp(−t σ /2)JJ
j=1
m
J 3
J
4
)
J
J
J
2
E JE exp(ιtYnk ) J Fn,k−1 − exp(−t2 σnk
/2)J
≤
k=1
J
J
- )
.
m
J
J
2 2
2 2
J
+ JE exp −
t σnj /2 − exp(−t σ /2)JJ
j=1
≡ I1n + I2n ,
say.
(1.7)
By (1.2), (1.5), and the BCT,
I2n → 0
as
(1.8)
n → ∞.
To estimate I1n , note that for any 1 ≤ k ≤ n,
4
3
J
E exp(ιtYnk ) J Fn,k−1
=
=
" (ιt)2 ! 2
"
!
E Ynk | Fn,k−1 + θnk (t)
1 + ιtE Ynk | Fn,k−1 +
2
t2 2
1 − σnk
+ θnk (t)
(1.9)
2
and
"
!
t2 2
2
/2 = 1 − σnk
+ γnk , say.
exp − t2 σnk
2
It is easy to verify that
2
0
1
/
|tYnk |3 JJ
|θnk | ≤ E min (tYnk )2 ,
J Fn,k−1
6
and
(1.10)
! 2 "2
! 2
"
exp t2 σnk
/2 /8.
|γnk | ≤ t2 σnk
Hence, by (1.3), (1.6), (1.9), and (1.10), for any # in (0,1),
I1n
≤
m
3
4
)
E |θnk | + |γnk |
k=1
m
)
2
≤ t
k=1
J 3
4J
J
J
J
2
E JE Ynk
I(|Ynk | > #) J Fn,k−1 J
+ |t|3 # ·
m
m
J !
3
4
)
"JJ )
J
2
4
E JE Ynk
| Fn,k−1 J +
E t4 σnk
exp(t2 c/2)
k=1
≤ t2 E ∆n (#) + |t|3 · # · E
4
2
+ t exp(t c/2) E
1)
m
2
σnk
k=1
1- )
m
k=1
2
σnk
2
.3
k=1
max
l≤k≤m
2
σnk
42
.
512
16. Limit Theorems for Dependent Processes
Note that for any # > 0,
2
E max σnk
1≤k≤m
≤ #2 + E
2
8
3
49
!
"J
2
max E Ynk
I |Ynk | > # J Fn,k−1
1≤k≤m
≤ # + E∆n (#).
Hence, by (1.3), (1.6), and the BCT, for any # ∈ (0, 1),
lim sup I1n
n→∞
8
≤ lim sup t2 E∆n (#) + |t|3 # · c
n→∞
49
3
2
+ t4 cet c/2 #2 + E∆n (#)
≤ c1 (t) #
for some c1 (t) ∈ (0, ∞), not depending on #. Thus implies that
lim I1n = 0.
n→∞
(1.11)
Clearly (1.7), (1.8), and (1.11) yield (1.4), whenever (1.5) and (1.6) are
true.
Next, suppose that condition (1.6) is not assumed a priori but (1.5)
: $k
;
2
holds true. Fix c > σ 2 and define the sets Bnk =
i=1 σni ≤ c , and the
variables Y̌nk = Ynk IBnk , k ≥ 1, n ≥ 1. Note that Bnk ∈ Fn,k−1 and hence,
!
"
"
!
E Y̌nk | Fn,k−1 = IBnk E Ynk | Fn,k−1 = 0,
and
"
! 2
2
2
σ̌nk
≡ E Y̌nk
| Fn,k−1 = IBnk σnk
,
(1.12)
;
:
for all k ≥ 1. In particular, Y̌nk , Fn,k is a mda. Since Bn,k−1 ⊃ Bnk for
all k, by the definitions of the sets Bnk ’s, it follows that
m
)
2
σ̌nk
=
k=1
m
)
2
σnk
IBnk
k=1
=
m
)
2
σnk
IBnm +
k=1
+ ··· +
2
σn1
!
m−1
)
k=1
!
"
2
IBn,m−1 − IBnm
σnk
IBn1 − IBn2
"
"
!
"
!
≤ cIBnm + c IBn,m−1 − IBnm + · · · + cIBn1 − IBn2
≤ c.
(1.13)
:
;
Thus, the mda Y̌nk , Fnk k≥1 satisfies (1.6). Next note that by (1.2) and
(1.5),
! c "
P Bnm
→ 0 as n → ∞.
(1.14)
16.2 Mixing sequences
Also, by (1.12),
$m
k=1
2
σ̌nk
=
$m
k=1
m
)
k=1
:
;
513
2
σnk
on Bnm . Hence, it follows that
2
σ̌nk
−→p σ 2 ,
i.e., the mda Y̌nk , Fnk satisfies (1.2). Further, the inequality “|Y̌nk | ≤
|Ynk |” and the fact that the function h(x) :
= x2 I(|x| ;> #), x > 0 is nondecreasing jointly imply that (1.3) holds for Y̌nk , Fnk . Hence, by the case
already proved,
m
)
(1.15)
Y̌nk −→d N (0, σ 2 ).
But
$m
k=1
Y̌nk =
$m
k=1
k=1
Ynk on Bnm . Hence, by (1.14),
m
)
k=1
Ynk −→d N (0, σ 2 ),
(1.16)
and therefore, the CLT holds without the restriction in (1.6).
Next consider relaxing the restrictions in (1.5) (and (1.6)). Since P (τn <
∞) = 1, there exist positive integers mn such (Problem 16.2) that
!
"
P τn > mn → 0 as n → ∞.
(1.17)
Next define
Ỹnk = Ynk I(τn ≥ k), k ≥ 1, n ≥ 1.
(1.18)
It is easy to check (Problem 16.3) that {Ỹnk , Fnk } is a mda, and that
{Ỹnk , Fnk } satisfies (1.2) and (1.3) with τn replaced by mn (Problem 16.4).
Hence, by the previous case already proved,
mn
)
k=1
Ỹnk −→d N (0, σ 2 ).
Next note that (cf. (4.1), Proclem 16.4),
τn
)
k=1
Ynk −
mn
)
k=1
Ỹnk −→p 0
as
n → ∞.
Hence, (1.4) holds and the proof of the theorem is complete.
(1.19)
!
16.2 Mixing sequences
This section deals with a class of dependent processes, called the mixing
processes, where the degree of dependence decreases as the distance (in
514
16. Limit Theorems for Dependent Processes
time) between two given sets of random variables goes to infinity. The
‘degree of dependence’ is measured by various mixing coefficients, which
are defined in Section 16.2.1 below. Some basic properties of the mixing
coefficients are presented in Section 16.2.2. Limit theorems for sums of
mixing random variables are given in Section 16.2.3.
16.2.1
Mixing coefficients
Definition 16.2.1: Let (Ω, F, P ) be a probability space and G1 , G2 be
sub-σ-algebras of F.
(a) The α-mixing or strong mixing coefficient between G1 and G2 is defined as
J
;
:J
α(G1 , G2 ) ≡ sup JP (A ∩ B) − P (A)P (B)J : A ∈ G1 , B ∈ G2 . (2.1)
(b) The β-mixing coefficient or the coefficient of absolute regularity between G1 and G2 is defined as
k
β(G1 , G2 ) ≡
,
))J
J
1
JP (Ai ∩ Bj ) − P (Ai )P (Bj )J,
sup
2
i=1 j=1
(2.2)
where the supremum is taken over all finite partitions {A1 , . . . , Ak }
and {B1 , . . . , B, } of Ω by sets Ai ∈ G1 and Bj ∈ G2 , 1 ≤ i ≤ k,
1 ≤ j ≤ 2, 2, k ∈ N.
(c) The ρ-mixing coefficient or the coefficient of maximal correlation between G1 and G2 is defined as
:
(2.3)
ρ(G1 , G2 ) ≡ sup ρX1 ,X2 : Xi ∈ L2 (Ω, Gi , P ), i = 1, 2}
is the correlation coefficient of X1
where ρX1 ,X2 ≡ √ Cov(X1 ,X2 )
Var(X1 )Var(X2 )
and X2 .
It is easy to check (Problem 16.5 (a) and (d)) that all three mixing
coeffi:
cients take values in the interval [0, 1] and that ;
ρ(G1 , G2 ) = sup |EX1 X2 | :
Xi ∈ L2 (Ω, Gi , P )EXi = 0, EX12 = 1, i = 1, 2 . When the σ-algebras G1
and G2 are independent, these coefficients equal zero, and vice versa. Thus,
nonzero values of the mixing coefficients give various measures of the degree
of dependence between G1 and G2 . It is easy to check (Problem 16.5 (c))
that
(2.4)
α(G1 , G2 ) ≤ β(G1 , G2 ) and α(G1 , G2 ) ≤ ρ(G1 , G2 ).
However, no ordering between the β(G1 , G2 ) and ρ(G1 , G2 ) exists, in general
(Problem 16.6). There are two other mixing coefficients that are also often
used in the literature. These are given by the φ-mixing coefficient
J
;
:J
φ(G1 , G2 ) ≡ sup JP (A) − P (A | B)J : B ∈ G1 , P (B) > 0, A ∈ G2 , (2.5)
16.2 Mixing sequences
515
and the Ψ-mixing coefficient
Ψ(G1 , G2 ) ≡
sup
A∈G1∗ ,B∈G2∗
|P (A ∩ B) − P (A)P (B)|
,
P (A)P (B)
(2.6)
where P (A | B) = P (A ∩ B)/P (B) for P (B) > 0, and where Gi∗ = {A : A ∈
Gi , P (A) > 0}, i = 1, 2. It is easy to check that Ψ(G1 , G2 ) ≥ φ(G1 , G2 ) ≥
β(G1 , G2 ).
Definition 16.2.2: Let {Xi }i∈Z be a (doubly-infinite) sequence of random
variables on a probability space (Ω, F, P ). Then, the strong- or α-mixing
coefficient of {Xi }i∈Z , denoted by αX (·), is defined by
! :
;
:
;"
αX (n) ≡ sup α σ1 Xj : j ≤ i, j ∈ Z 2, σ1 Xj : j ≥ i+n, j ∈ Z 2 , n ≥ 1,
i∈Z
(2.7)
where the α(·, ·) on the right side of (2.7) is as defined in (2.1). The process
{Xi }i∈Z is called strongly mixing or α-mixing if
lim αX (n) = 0.
n→∞
(2.8)
The other mixing coefficients of {Xi }i∈Z (e.g., βX (·), ρX (·), etc.) are defined
similarly.
For a one-sided sequence {Xi }i≥1 , the α-mixing coefficient {Xi }i≥1 is
defined by replacing Z on the right side of (2.7) by N on all three occurrences. A similar modification is needed for the other mixing coefficients.
When there is no chance of confusion, the coefficients αX (·), βX (·), . . . , etc.,
will be written as α(·), β(·), . . . , etc., to ease the notation.
Another important notion of ‘weak’ dependence is given by the following:
Definition 16.2.3: Let m ∈ Z+ be an integer and {Xi }i∈Z be a collection
of random variables on (Ω, F, P ). Then, {Xi }i∈Z is called m-dependent if
for every k ∈ Z, {Xi : i ≤ k, i ∈ Z} and {Xi : i > k + m, i ∈ Z} are
independent.
Example 16.2.1: If {#i }i∈Z is a collection of independent random variables and Xi = (#i + #i+1 ), i ∈ Z, then {#i }i∈Z is 0-dependent and {Xi }i∈Z
is 1-dependent. It is easy to see that if {Xi }i∈Z is m-dependent for some
∗
∗
(n) = 0 for all n > m, where αX
∈ {αX , βX , ρX , φX , ΨX }.
m ∈ Z+ , then αX
Therefore, m-dependence of {Xi }i∈Z implies that the process {Xi }i∈Z is
∗
αX
-mixing. In this sense, the condition of m-dependence is the strongest
and the condition of α-mixing is the weakest among all weak dependence
conditions introduced here.
Example 16.2.2: Let {#i }i∈Z be a collection of iid random variables with
E#1 = 0, E#21 < ∞ and let
)
ai #n−i , n ∈ Z
(2.9)
Xn =
i∈Z
516
16. Limit Theorems for Dependent Processes
!
"
where ai ∈ R and ai = 0 exp(−ci ) as i → ∞, c ∈ (0, ∞). If #1 has an
integrable characteristic function, then {Xi }i∈Z is strongly mixing (Chanda
(1974), Gorodetskii (1977), Withers (1981), Athreya and Pantula (1986)).
Example 16.2.3: Let {Xi }i∈Z be a zero mean stationary Gaussian process. Suppose that {Xi }i∈Z has spectral density f : (−π, π) → [0, ∞), i.e.,
EX0 Xk =
%
π
−π
eιkx f (x)dx, k ∈ Z.
(2.10)
Then, αX (n) ≤ ρX (n) ≤ 2πα(n), n ≥ 1 and, therefore, {Xi }i∈Z is α-mixing
iff it is ρ-mixing (Ibragimov and Rozanov (1978), Chapter 4). Further,
{Xi }i∈Z is α-mixing iff the spectral density f admits the representation
J2
J
!
"
f (t) = Jp(eιt )J exp u(eιt ) + ṽ(eιt ) ,
(2.11)
where p(·) is a polynomial, u and v are continuous real-valued functions on
the unit circle in the complex plane, and ṽ is the conjugate function of u.
It is also known that if the Gaussian process {Xi }i∈Z is φ-mixing, then it is
necessarily m-dependent for some m ∈ Z+ . Thus, for Gaussian processes,
the condition of α-mixing is as strong as ρ-mixing and the conditions of φmixing and Ψ-mixing are equivalent to m-dependence. See Ibragimov and
Rozanov (1978) for more details.
16.2.2
Coupling and covariance inequalities
The mixing coefficients can be seen as measures of deviations from independence. The idea of coupling is to construct independent copies of a given
pair of random vectors on a suitable probability space such that the Euclidean distance between these copies admits a bound in terms of the mixing coefficient between the (σ-algebras generated by the) random vectors.
Thus, coupling gives a geometric interpretation of the mixing coefficients.
The first result is for β-mixing random vectors.
Theorem 16.2.1: (Berbee’s theorem). Let (X, Y ) be a random vector on
a probability space (Ω0 , F0 , P0 ) such that X takes values in Rd and Y in
Rs , d, s ∈ N. Then, there exist an enlarged probability space (Ω, F, P ) and
a s-dimensional random vector Y ∗ such that
(i) (X, Y , Y ∗ ) are defined on (Ω, F, P ),
(ii) Y ∗ is independent of X under P and (X, Y ) have the same distribution under P and P0 ,
(iii) P (Y )= Y ∗ ) = β(σ1X2, σ1Y 2).
16.2 Mixing sequences
517
Proof: See Corollary 4.2.5 of Berbee (1979).
A weaker version of the above result is available for α-mixing random
variables, where the difference between Y and its independent copy admits
a bound in terms of the α-mixing coefficient.
Theorem 16.2.2: (Bradley’s theorem). In Theorem 16.2.1, assume s =
1 and 0 < E|Y |γ < ∞ for some 0 < γ < ∞. Then, for all 0 < y ≤
(E|Y |γ )1/γ ,
8 !
!
"
"92γ/1+2γ
P |Y − Y ∗ | ≥ y ≤ 18 α σ1X2, σ1Y 2
1/1+2γ
(E|Y |γ )
y −γ/(1+2γ) .
(2.12)
Proof: See Theorem 3 of Bradley (1983).
Next, some bounds on the covariance between mixing random variables
are established. These will be useful for deriving limit theorems for sums
of mixing random variables. For a random variable X, define the function
:
;
QX (u) = inf t : P (|X| > t) ≤ u , u ∈ (0, 1).
(2.13)
Thus, QX (u) is the quantile function of |X| at (1 − u).
Theorem 16.2.3: (Rio’s inequality). Let X and Y be two random vari&1
ables with 0 QX (u)QY (u)du < ∞. Then,
% 2α
J
J
JCov(X, Y )J ≤ 2
QX (u)QY (u)du
(2.14)
0
!
"
where α = α σ1X2, σ1Y 2 .
Proof: By Tonelli’s theorem,
1' % X + (' % Y + (2
+ +
EX Y
= E
du
du
0
- % 0∞ % ∞
.
+
+
= E
I(X > u)I(Y > v)dudv
% ∞ %0 ∞ 0
=
P (X > u, Y > v)dudv
0
+
&∞
0
and similarly, EX = 0 P (X > u)du. Hence, by (2.1), it follows that
J
J
JCov(X + , Y + )J
J% ∞% ∞
J
J
J
B
C
J
= J
P (X > u, Y > v) − P (X > u)P (Y > v) dudv JJ
0
0
% ∞
% ∞
:
;
≤
min α, P (X > u), P (Y > v) dudv.
(2.15)
0
0
518
16. Limit Theorems for Dependent Processes
Next note that for any real numbers a, b, c, d
(α ∧ a ∧ c) + (α ∧ a ∧ d) + (α ∧ b ∧ c) + (α ∧ b ∧ d)
≤ [2(α ∧ a)] ∧ (c + d) + [2(α ∧ b)] ∧ (c + d)
≤ 2[2α ∧ (a + b) ∧ (c + d)].
(2.16)
Now using (2.15), (2.16), and the identity Cov(X, Y ) = Cov(X + , Y + ) +
Cov(X − , Y − ) − Cov(X + , Y − ) − Cov(X − , Y + ), one gets
J
J
JCov(X, Y )J
% ∞% ∞
:
;
≤ 2
min 2α, P (|X| > u), P (|Y | > v) dudv. (2.17)
0
0
Hence, it is enough to show that the right sides of (2.14) and (2.17) agree. To
that end, let U be !a Uniform (0,1) "random variable and define (W1 , W2 ) =
(0, 0)I(U ≥ 2α) + QX (U ), QY (U ) I(U < 2α). Then
EW1 W2 =
%
0
2α
QX (u)QY (u)du.
(2.18)
On the other hand, noting that QX (a) > t iff P (|X| > t) > a, one has
EW1 W2
=
=
=
%
∞
%
∞
%0 ∞ %0 ∞
%0 ∞ %0 ∞
0
0
!
"
P W1 > u, W2 > v dudv
!
"
P U < 2α, QX (U ) > u, QY (U ) > v dudv
:
;
min 2α, P (|X| > u), P (|Y | > v) dudv.
Hence, the theorem follows from (2.17), (2.18), and the above identity. !
Corollary 16.2.4:
Let X and Y be two random variables with
α(σ1X2, σ1Y 2) = α ∈ [0, 1].
(i) (Davydov’s inequality). Suppose that E|X|p < ∞, E|Y |q < ∞ for
some p, q ∈ (1, ∞) with p1 + 1q < 1. Then, E|XY | < ∞ and
where
1
r
J
J
!
" !
"
JCov(X, Y )J ≤ 2r(2α)1/r E|X|p 1/p E|Y |q 1/q ,
=1−
!1
p
+
1
q
"
(2.19)
.
!
(ii) If P |X| ≤ c1 ) = 1 = P (|Y | ≤ c2 ) for some constants c1 , c2 ∈ (0, ∞),
then
J
J
JCov(X, Y )J ≤ 4c1 c2 α.
(2.20)
16.3 Central limit theorems for mixing sequences
519
Proof: Let a = (E|X|p )1/p and b = (E|Y |q )1/q . W.l.o.g., suppose that
a, b ∈ (0, ∞). Then, by Markov’s inequality, for any 0 < u < 1,
!
"
R
P |X| > au−1/p ≤ E|X|p (au−1/p )p = u
and hence, QX (u) ≤ au−1/p . Similarly, QY (u) ≤ bu−1/q , 0 < u < 1. Hence,
by Theorem 16.2.3,
J
J
JCov(X, Y )J ≤
=
2
%
2α
0
ab u−1/p−1/q du
2ab(2α)1−1/p−1/q
S'
1−
1 1(
−
.
p q
which is equivalent to (2.19).
The proof of (2.20) is a direct consequence of Rio’s inequality and the
!
bounds QX (u) ≤ c1 and QY (u) ≤ c2 for all 0 < u < 1.
16.3 Central limit theorems for mixing sequences
In this section, CLTs for sequences of random variables satisfying different
mixing conditions are proved.
Proposition 16.3.1: Let {Xi }i∈Z be a collection of random variables with
strong mixing coefficient α(·).
$∞
(i) Suppose that
n=1 α(n) < ∞ and for some c ∈ (0, ∞), P (|Xi | ≤
c) = 1 for all i. Then,
∞
)
Cov(X1 , Xn+1 ) converges absolutely.
(3.1)
n=1
$∞
δ/2+δ
(ii) Suppose that
< ∞ and supi∈Z E|Xi |2+δ < ∞ for
n=1 α(n)
some δ ∈ (0, ∞). Then, (3.1) holds.
Proof: A direct consequence of Corollary 16.2.4.
!
Next suppose that the$
collection
of random variables
{Xi }i∈Z is stationJ
∞ J
ary and that Var(X1 ) + n=1 JCov(X1 , X1+n )J < ∞. Then by the DCT,
nVar(X̄n )
=
-)
.
n
n−1 Var
Xi
= n
−1
1)
n
i=1
i=1
Var(Xi ) + 2
)
1≤i<j≤n
Cov(Xi , Xj )
2
520
16. Limit Theorems for Dependent Processes
=
1
2
n−1
n−i
))
n−1 n Var(X1 ) + 2
Cov(Xi , Xi+k )
i=1 k=1
1
= n−1 n Var(X1 ) + 2
2
−→ σ∞
≡
Var(X1 ) + 2
n−1
)
k=1
∞
)
k=1
(n − k)Cov(X1 , X1+k )
Cov(X1 , X1+k )
as
2
n → ∞. (3.2)
In particular, under the conditions of part (i) or part (ii) of Proposition
16.3.1,
√
2
lim Var( n X̄n ) exists and equals σ∞
.
n→∞
2
> 0. Indeed, it is not difficult
In general, it is not guaranteed that σ∞
to construct an example of a stationary strong mixing sequence {Xn }n≥1
2
such that σ∞
= 0 (Problem 16.8). However, in addition to the conditions
of
√
2
Proposition 16.3.1, if it is assumed that σ∞
> 0, then a CLT for n(X̄n −
EX1 ) holds in the stationary case; see Corollary 16.3.3 and 16.3.6 below.
A classical method of proving the CLT (and other limit theorems) for
mixing random variables is based on the idea of blocking, introduced by
S. N. Bernstein. Intuitively, the ‘blocking’ approach $
can be described as
n
follows: Suppose, µ = EX1 = 0. First, write the sum i=1 Xi in terms of
alternating sums of ‘big blocks’ Bi ’s (of length ‘p’ say) and ‘little blocks’
Li ’s (of length ‘q’ say) as
n
)
i=1
Xi
=
" !
"
!
X1 + · · · + Xp + Xp+1 + · · · + Xp+q
"
!
+ Xp+q+1 + · · · + X2p+q + · · ·
= B1 + L1 + B2 + L2 + · · · + (BK + LK ) + Rn ,
where the last term Rn is the excess (if any) over the last complete pair of
big- and little-blocks (BK , LK ). Next, group together the Bi ’s and Li ’s to
write
n
K
K
√
1 )
1 )
1 )
√
Xi = √
Bj + √
Lj + Rn / n.
(3.3)
n i=1
n j=1
n j=1
$K
If q ; p ; n, then, the number of Xi ’s in j=1 Lj and in Rn are of smaller
order than n, the total number of Xi ’s. Using this, one can show that the
contribution of the last two terms in (3.3) to the limit is negligible, i.e.,
.
- K
1 )
√
Lj + Rn −→p 0.
n j=1
$K
To handle the first term, √1n j=1 Bi , note that the Bj ’s are functions of
disjoint collections of Xj ’s that are separated by a distance of q or more.
16.3 Central limit theorems for mixing sequences
521
By letting q → ∞ suitably and using the mixing condition, one can replace
the Bj ’s by their independent copies, and appeal to the Lindeberg CLT for
sums of independent random variables to conclude that
K
!
"
1 )
2
√
Bj −→d N 0, σ∞
.
n j=1
Although the blocking approach is described here for stationary random
variables, with minor modifications, it is applicable to certain nonstationary
sequences as shown below.
Theorem 16.3.2: Let {Xn }n≥1 be a sequence of random variables (not
necessarily stationary) with strong mixing coefficient α(·). Suppose that
there exist constants σ02 , c ∈ (0, ∞) such that
P (|Xi | ≤ c) = 1
for all
J
J
- j+n−1
.
)
J −1
J
2J
J
γn ≡ sup Jn Var
Xi − σ 0 J → 0
j≥1
as
i=j
and that
∞
)
n=1
Then,
(3.4)
i ∈ N,
n → ∞,
(3.5)
(3.6)
α(n) < ∞.
"
√ !
n X̄n − µ̄n −→d N (0, σ02 ) as n → ∞
$n
where µ̄n = E X̄n , and X̄n = n−1 i=1 Xi , n ≥ 1.
(3.7)
An important special case of Theorem 16.3.2 is the following:
of stationary bounded random
Corollary 16.3.3:
$∞ If {Xn }n≥1 is a sequence
2
of (3.2) is positive, then, with
variables with n=1 α(n) < ∞, and if σ∞
µ = EX1 ,
"
√ !
2
n X̄n − µ −→d N (0, σ∞
) as n → ∞.
(3.8)
2
(cf.
Proof: For stationary random variables, (3.5) holds with σ02 = σ∞
(3.2)). Hence, the Corollary follows from Theorem 16.3.2.
!
For proving the theorem, the following auxiliary result will be used.
Lemma 16.3.4:
Then,
sup E
m≥1
Suppose that the conditions of Theorem 16.3.2 hold.
1 m+n−1
)
i=m
(Xi − EXi )
24
= o(n3 )
as
n → ∞.
522
16. Limit Theorems for Dependent Processes
Proof: W.l.o.g., let EXi = 0 for all i. Note that for any m ∈ N,
.4
- n+m−1
)
Xj
E
j=m
)
=
EXj4 + 6
j
EXi2 Xj2 + 4
i<j
)
+6
)
EXi2 Xj Xk
+
i.=j.=k
)
EXi3 Xj
i.=j
EXi Xj Xk X,
i.=j.=k.=,
I1n + · · · + I5n , say,
≡
)
(3.9)
where the indices i, j, k, 2 in the above sums lie between m and m + n − 1.
By (3.4),
J J J J J J
JI1n J + JI2n J + JI3n J ≤ n · c4 + 7n(n − 1)c4 ≤ 7n2 c4 .
(3.10)
By Corollary 16.2.4 (ii), noting that EXi = 0 for all i,
) 3J
J J
J J
J J
J4
JI4n J ≤ 12
JEXi2 Xj Xk J + JEXi Xj2 Xk J + JEXi Xj Xk2 J
i<j<k
≤ 12
≤
) :
i<j<k
4 2
144c n
1 n−1
)
2
≤
i<j<k<,
≤ 24
=
)
(4!)
)
i<j<k<,
96c4
J
J
JEXi Xj Xk X, J
B
C
4c4 α(j − i) ∧ α(2 − k)
n−3
) n−2
)
n−1
) n−k
)
i=1 j=i+1 k=j+1 r=1
=
96c4
n−3
) n−2−i
)
≤ 96c4 n2
=
4 2
96c n
n−1
)
B
C
α(j − i) ∧ α(r)
n−k
)
s=1 k=i+s+1 r=1
n
n
))
i=1
s=1 r=1
1)
n
≤ 192c4 n2
s=1
n
)
r=1
;
(3.11)
α(r) .
r=1
Similarly,
J J
JI5n J
4c4 α(j − k) + 4c4 α(j − k) + 4c4 α(j − i)
B
C
α(s) ∧ α(r)
B
C
α(s) ∧ α(r)
α(s) + 2
n−1
)
n
)
s=1 r=s+1
rα(r)
α(r)
2
16.3 Central limit theorems for mixing sequences
1
2
∞
)
)
≤ 192c4 n2 n1/2
α(r) + n
α(r) .
(3.12)
√
r≥ n
r=1
523
$
Since r≥√n α(r) = o(1) and the bounds in (3.9)–(3.12) do not depend on
m, Lemma 16.3.4 follows.
!
Proof of Theorem 16.3.2: W.l.o.g., let EXj = 0 for all j. Let p ≡ pn ,
q ≡ qn = <n1/2 =, n ≥ 1 be integers satisfying
q/p + p/n = o(1)
as n → ∞,
(3.13)
when a choice of {pn } will be specified later. Let r = p + q and K = <n/r=.
As outlined earlier, for i = 1, . . . , K, let
(i−1)r+p
)
=
Bi
Xj ,
j=(i−1)r+1
ir
)
=
Li
Xj ,
j=(i−1)r+p+1
and let
Rn =
n
)
j=1
Xj −
K
)
(Bi + Li )
i=1
respectively denote the sums over the ‘big blocks,’ the ‘little blocks,’ and
the ‘excess’ (cf. (3.3)). Thus
√
K
K
1 )
1 )
1
nX̄n = √
Bi + √
Li + √ Rn .
n i=1
n i=1
n
(3.14)
Since Rn is a sum over (n − Kr) ≤ r consecutive Xj ’s, by Corollary
16.2.4 (ii),
!
√ "2
E Rn / n
≤ n
−1
1 )
n
EXj2
2
)J
J
J
J
EXi Xj
+
i.=j
j=Kr+1
1
≤ 8c2 n−1 (n − Kr) + (n − Kr)
=
O
'r(
n
→0
as
n → ∞.
∞
)
k=1
2
α(k)
(3.15)
Next note that the variables Li and LB i+k depend on two
C disjoint sets of
Xj ’s that are separated by a distance of (i+k−1)r+p−ir = (k−1)r+p >
524
16. Limit Theorems for Dependent Processes
kp. Hence, by (3.5), Corollary 16.2.4 (ii) and the monotonicity of α(·),
.
-)
K
√ 2
Li / n
E
i=1
−1
≤ n
≤ n
≤ n
−1
−1
1)
K
1
EL2i
+2
i=1
Kq
Kq
-
1
!
!
K−1
) K−i
)
i=1 k=1
σ02
σ02
"
+ γq + 2K
K−1
)
k=1
"
+ γq + 8qc
2
= O
as
n
Kp
p
+
→ 1 as n → ∞.
' q (2 (
p
∞
)
j=0
→0
as
2
2
4q c α(kp)
2 2
p
K−1
) /)
k=1
= O n−1 Kq + n−1 Kq 2 p−1
'q
J
J
JELi Li+k J
j=1
.
α(j)
0S 2
α(kp − j)
p
(3.16)
n → ∞,
!
$K
Next consider the term √1n i=1 Bi . Note that α σ1Bj 2, σ1{Bi : i ≥
"
j + 1}2 ≤ α(q) for all 1 ≤ j < K. By applying Corollary 16.2.4 (ii) to the
real and imaginary parts, one gets for any t ∈ R,
J
J
- )
. K
K
K
J
!
√
√ "J
JE exp ιt
Bj / n −
E exp ιtBj / n JJ
J
j=1
j=1
J
- )
.
- )
.J
K
K
J
!
√
√ "
√ J
≤ JJE exp ιt
Bj / n − E exp ιtB1 / n E exp ιt
Bj / n JJ
j=1
j=2
J K−2
J K
!
!
√ "3
√ "4
+ · · · + JJ
E exp ιtBj / n E exp ιt(BK−1 + BK )/ n
j=1
−
≤ 16 α(q) · K
'
n(
= O qα(q)
→0
pq
as
n → ∞.
K
K
j=1
J
!
√ "J
E exp ιtBj / n JJ
(3.17)
$∞
The last step follows by noting that n=1 α(n) < ∞ ⇒ nα(n) → 0 as n →
∞. Let B̃1 , . . . , B̃K be independent random variables such that B̃i =d Bi ,
1 ≤ i ≤ K. Note that by (3.5),
J
J )
J
J1 K
2
2J
J
E B̃i − σ0 J
Jn
i=1
16.3 Central limit theorems for mixing sequences
525
K
≤
≤
J J
J
1 ) JJ
EBi2 − pσ02 J + Jn−1 Kp − 1Jσ02
n i=1
Kp
γp + n−1 |Kp − n|σ02 → 0
n
as
n → ∞.
(3.18)
: ! $m+k−1 "4 −3
;
Xi k
: m ≥ 1 and
Next, for k ∈ N, let Γ1 (k) = sup E
i=m
Γ∗1 (k) = sup{Γ1 (j) : j ≥ k}. By Lemma 16.3.4, Γ∗1 (k) ↓ 0 as k → ∞. Now
:!
"−1/3
;V
U
∧ log n . Then, it is easy to check that (3.13)
set p = n1/2 Γ∗1 (q)
!
"2
holds and that n−1 p2 Γ∗1 (p) ≤ n−1 n1/2 Γ∗1 (q)−1/3 Γ∗1 (q) ≤ Γ∗1 (q)1/3 → 0
and n → ∞. Hence,
K
)
!
√ "4
E B̃i / n
= n−2
i=1
K
)
EBi4
i=1
≤ n−2 Kp3 Γ∗1 (p) → 0
as
n → ∞.
(3.19)
Using (3.18) and (3.19), it is easy
the; triangular array
: to√ check that √
of independent random variables B̃1 / n, . . . , B̃K / n n≥1 satisfies the
Lyapounov’s condition and hence,
K
1 )
√
B̃i −→d N (0, σ02 )
n i=1
as
n → ∞.
By (3.17), this implies that
K
1 )
√
Bi −→d N (0, σ02 ).
n i=1
Theorem 16.3.2 now follows from (3.14), (3.15), (3.16), and (3.20).
(3.20)
!
For unbounded strongly mixing random variables, a CLT can be proved
under moment and mixing conditions similar to Proposition 16.3.1 (ii).
The key idea here is to use Theorem 16.3.2 for the sum of a truncated part
of the unbounded random variables and then show that the sum over the
remaining part is negligible in the limit.
Theorem 16.3.5: Let {Xn }n≥1 be a sequence of random variables
(not necessarily stationary) such that for some δ ∈ (0, ∞), ζδ ≡
!
"1/2+δ
< ∞ and
supn≥1 E|Xn |2+δ
∞
)
n=1
α(n)δ/2+δ < ∞.
(3.21)
526
16. Limit Theorems for Dependent Processes
Suppose that there exist M0 ∈ (0, ∞) and a function τ (·) : (M0 , ∞) →
(0, ∞) such that for all M > M0 ,
J
J
- j+n−1
.
)
J −1
J
J
γn (M ) ≡ sup Jn Var
Xi I(|Xi | < M ) −τ (M )JJ → 0 as n → ∞.
j≥1
i=j
If τ (M ) → σ02 as M → ∞ and σ02 ∈ (0, ∞), then (3.7) holds.
(3.22)
Proof of Theorem 16.3.5: W.l.o.g., let EXi = 0 for all i ∈ N. Let
M ∈ (1, ∞). For i ≥ 1, define
Ỹi,M = Xi I(|Xi | ≤ M )
Then,
and Z̃i,M = Xi I(|Xi | > M ),
Yi,M = Ỹi,M − E Ỹi,M , Zi,M = Z̃i,M − E Z̃i,M .
√
n
n(X̄n − µ)
n
1 )
1 )
√
Yi,M + √
Zi,M ,
n i=1
n i=1
=
≡ Sn,M + Tn,M ,
say.
(3.23)
Note that, with a = <M = and Jn = {1, . . . , n}, by Cauchy-Schwarz
inequality and Corollary 16.2.4 (i),
δ/4
2
ETn,M
= n−1
)
|i−j|≤a,i,j∈Jn
:
J
J
JEZi,M Zj,M J + n−1
2
≤ (2a + 1) sup EZi,M
|i−j|>a,i,j∈Jn
J
J
JEZi,M Zj,M J
∞
)
4 + 2δ
α(k)δ/2+δ ζδ2
:i≥1 +2
δ
≤ 4(2a + 1)M −δ ζδ2+δ +
Hence,
)
;
8 + 4δ 2
ζδ
δ
k=a
∞
)
α(k)δ/2+δ .
k=a
2
lim sup ETn,M
= 0.
M ↑∞ n≥1
(3.24)
Since τ (M ) → σ02 ∈ (0, ∞), there exists M1 ∈ (M0 , ∞) such that for all
M > M1 , τ (M ) ∈ (0, ∞). Hence, by Theorem 16.3.2 and (3.22), for all
M > M1 ,
!
"
(3.25)
Sn,M −→d N 0, τ (M ) as n → ∞.
Next note that for any # ∈ (0, ∞) and x ∈ R,
'√
(
P
n X̄n ≤ x
'
(
'J
(
J
J
J
≤ P Sn,M + Tn,M ≤ x, JTn,M J < # + P JTn,M J > #
'
(
'J
(
J
≤ P Sn,M ≤ x + # + P JTn,M J > #
(3.26)
16.3 Central limit theorems for mixing sequences
and similarly,
(
(
'
(
'J
'√
J
n X̄n ≤ x ≥ P Sn,M ≤ x − # − P JTn,M J > # .
P
527
(3.27)
Hence, for any # > 0, M > M1 , and x ∈ R, letting n → ∞ in (3.26) and
(3.27), by (3.25) one gets
' x−# (
(
'J
J
JTn,M J > #
−
lim
sup
Φ
P
τ (M )1/2
n→∞
'√
(
'√
(
n X̄n ≤ x ≤ lim sup P
n X̄n ≤ x
≤ lim inf P
n→∞
n→∞
' x+# (
(
'J
J
JTn,M J > # .
≤ Φ
P
+
lim
sup
τ (M )1/2
n→∞
Since τ (M ) → σ02 as M → ∞, letting M ↑ ∞ first and then # ↓ 0, and
using (3.24), one gets
'√
(
lim P
n X̄n ≤ x = Φ(x/σ0 ) for all x ∈ R.
n→∞
This completes the proof of Theorem 16.3.5.
!
A direct consequence of Theorem 16.3.5 is the following result:
random variCorollary 16.3.6: Let {Xn }n≥1 be a sequence of
$stationary
∞
ables with EX1 = µ ∈ R, E|X1 |2+δ < ∞ and n=1 α(n)δ/2+δ < ∞ for
2
of (3.2) is positive, then with µ = EX1 ,
some δ ∈ (0, ∞). If σ∞
"
√ !
2
n X̄n − µ −→d N (0, σ∞
).
Proof of Corollary 16.3.6: W.l.o.g., let µ = 0. Clearly, under the stationarity of {Xn }n≥1 , ζδ2+δ = E|X1 |2+δ < ∞. Let a = <M δ/4 =, Yi,M , and
Ỹi,M
$∞be as in the proof of Theorem 16.3.5. Also, let τ (M ) = Var(Yi,M ) +
2 k=1 Cov(Y1,M , Y1+k,M ), M ≥ 1. Since EX12 I(|X1 | > M ) ≤ M −δ ζδ , one
gets
J
J
2 J
Jτ (M ) − σ∞
≤
a−1
)J
J
J
J
2
JEY1,M
JEY1,M Y1+k,M − EX1 X1+k J
− EX12 J + 2
k=1
∞ 3
)
J
J J
J4
JEY1,M Y1+k,M J + JEX1 X1+k J
+2
k=a
≤
2
a−1
) 83
k=0
+
41/2
"2 41/2 3
!
2
E Y1,M − X1
EY1+k,M
3!
J2 "1/2 49
"1/2 ! J
E JY1+k,M − X1+k J
EX12
528
16. Limit Theorems for Dependent Processes
+4
∞ 3
)
4 + 2δ
k=a
≤
δ
α(k)δ/2+δ ζδ2
4
1 ∞
2
3
!
"41/2
4 + 2δ 2 )
4a ζδ EX12 I |X1 | > M
ζδ
+4
α(k)δ/2+δ
δ
k=a
→0
as
M → ∞.
(3.28)
Further, by Corollary 16.3.1 (ii), the stationarity of {Xn }n≥1 , and the arguments in the derivation of (3.2), (3.22) follows. Corollary 16.3.6 now follows
from Theorem 16.3.5.
!
In the case that the sequence {Xn }n≥1 is stationary, further refinements
of Corollaries 16.3.3 and 16.3.6 are possible. The following result, due to
Doukhan, Massart and Rio (1994) uses a different method of proof to
establish the CLT under stationarity. To describe it, for any nonincreasing
sequence of positive real numbers {an }n≥1 , define the function a(t) and its
inverse a−1 (t), respectively, by
a(t) =
∞
)
n=1
and
an I(n − 1 < t ≤ n), t > 0,
a−1 (u) = inf{t : a(t) ≤ u}, u ∈ (0, ∞).
Theorem 16.3.7: Let {Xn }n≥1 be a sequence of stationary random variables with EX1 = µ ∈ R, EX12 < ∞ and strong mixing coefficient α(·).
Suppose that
% 1
α−1 (u)QX1 (u)2 du < ∞,
(3.29)
0
where QX1 (u) = inf{t : P (|X1 | > t) ≤ u}, 0 < u < 1 is as in (2.13). Then,
2
2
< ∞. If σ∞
> 0, then
0 ≤ σ∞
√
2
n(X̄n − µ) −→d N (0, σ∞
).
Proof: See Doukhan et al. (1994).
!
Note that if P (|X1 | ≤ c) = 1 for some c ∈ (0, ∞), then QX1 (u) ≤ c for all
&1
$∞
u ∈ (0, 1) and (3.29) holds iff 0 α−1 (u)du < ∞ iff n=1 α(n) < ∞. Thus,
Theorem 16.3.7 yields Corollary 16.3.3
$∞ as a special case. Similarly, it can be
shown that if E|X1 |2+δ < ∞ and n=1 α(n)δ/2+δ < ∞, then (3.29) holds
and, therefore, Theorem 16.3.7 also yields !Corollary 16.3.6
as a special case.
"
For another example, suppose α(n) = O exp(−cn) as n → ∞ for some
c ∈ (0, ∞). Then, (3.29) holds if
!
"
EX12 log 1 + |X1 | < ∞.
(3.30)
16.4 Problems
529
In this case, the logarithmic factor in (3.30) cannot be dropped. More
precisely, it is known that finiteness of the second moment alone (with arbitrarily fast rate of decay of the strong mixing coefficient) is not enough to
guarantee the CLT (Herrndorf (1983)) for strong mixing random variables.
For the ρ-mixing random variables, the following result is known. A process {Xn }n≥1 is called second order stationary if
EXn2 < ∞, EXn = EX1 and EXn Xn+k = EX1 X1+k
(3.31)
for all n ≥ 1, k ≥ 0.
Theorem 16.3.8: Let {Xn }n≥1 be a sequence of ρ-mixing, second order
stationary random variables with EX12 < ∞ and EX1 = µ. Suppose that
∞
)
n=1
2
≡ Var(X1 ) + 2
Then, σ∞
2
If σ∞
ρ(2n ) < ∞.
(3.32)
$∞
Cov(X1 , X1+k ) converges and
!√
"
2
Var n X̄n → σ∞
.
!
"
√
2
∈ (0, ∞), then n X̄n − µ −→d N (0, σ∞
).
k=1
Proof: See Peligrad (1982).
!
Thus, unlike the strong mixing case, in the ρ-mixing case the CLT holds
2
> 0.
under finiteness of the second moment, provided (3.32) holds and σ∞
16.4 Problems
16.1 Deduce (1.16) from (1.15).
16.2 Let {Xn }n≥1 be a sequence of random variables. Show that there
exists a sequence of integers mn ∈ (0, ∞) such that
"
!
(a) P |Xn | > mn → 0 as n → ∞.
!
(b) P |Xn | > mn i.o.) = 0.
16.3 Show that {Ỹnk , Fnk } of (1.18) is a mda.
(Hint: {τn ≥ k} = {τn ≤ k − 1}c ∈ Fn,k−1 .)
16.4 Show that {Ỹnk , Fnk } of (1.18) satisfies (1.2) and (1.3) with τn replaced by mn .
(Hint: Verify that for any Borel measurable g : R → R and # > 0,
τn
mn
J
(
'J )
)
J
J
g(Ynk ) −
g(Ỹnk )J > # ≤ P (τn > mn ).)
P J
k=1
k=1
(4.1)
530
16. Limit Theorems for Dependent Processes
16.5 Let α∗ (G1 , G2 ) denote one of the coefficients α(G1 , G2 ), β(G1 , G2 ), and
ρ(G1 , G2 ).
(a) Show that α∗ (G1 , G2 ) ∈ [0, 1].
(b) Show that α∗ (G1 , G2 ) = 0 iff G1 and G2 are independent.
:
;
(c) Show that α(G1 , G2 ) ≤ min β(G1 , G2 ), ρ(G1 , G2 ) .
:
sup |EX1 X2 |, Xi ∈ L2 (Ω, Gi , P ), EXi =
(d) Show that ρ(G1 , G2 ) =
;
0, EX12 = 1, i = 1, 2 .
16.6 Find examples of σ-algebras G1 and G2 such that
(a) β(G1 , G2 ) < ρ(G1 , G2 );
(b) ρ(G1 , G2 ) < β(G1 , G2 ).
16.7 If Gi , Fi , i = 1, 2 are σ-algebras and (G1 ∨ F1 ) is independent of
(G2 ∨ F2 ), then show that
α(G1 ∨ G2 , F1 ∨ F2 ) ≤ α(G1 , F1 ) + α(G2 , F2 ).
(4.2)
16.8 Let {#i }i∈Z be a sequence of iid random variables with #1 ∼
Uniform(−1, 1). Let Xi = (#i − #i+1 )/2, i ∈ Z. Show that
√
(4.3)
lim Var( n X̄n ) = 0.
n→∞
$n
Also, find {an } ⊂ R such that of an i=1 Xi −→d T for some nondegenerate random variable T . Find the distribution of T .
16.9 Let {Xi }i∈Z be a sequence of iid random variables and {Yi }i∈Z be
a sequence of stationary random variables with strong mixing coefficient α(·) such that {Xi }i∈Z and {Yi }i∈Z are independent. Let
Zi = h(Xi , Yi ), i ∈ Z, where$h : R2 → R is Borel measurable. Find
n
the limit distribution of √1n i=1 (Zi − EZi ) when
(i) h(x, y) = x + y, (x, y) ∈ R2 , EX12 < ∞ and
E|Y1 |2+δ +
∞
)
n=1
α(n)δ/2+δ < ∞
for some
δ ∈ (0, ∞); (4.4)
(ii) h(x, y) = log(x2 + y 2 ), (x, y) ∈ R2 , and E(|X1 |δ + |Y1 |δ ) < ∞
and α(n) = O(n−δ ) for some δ ∈ (0, ∞);
(iii) h(x, y) =
y
1+x2 ,
(x, y) ∈ R, and (4.4) holds.
16.10 Let {Xn }n≥1 be a sequence of stationary random variables with
strong mixing
coefficient
α(·). Let E|X1 |2 log(1 + |X1 |) < ∞ and
!
"
α(n) = O exp(−cn) as n → ∞, for some c ∈ (0, ∞). Show that
(3.29) holds.
16.4 Problems
531
16.11 Let Yi = xi β + #i , i ≥ 1 be a simple linear regression model where
β ∈ R is the regression parameter, xi ∈ R is nonrandom and {#i }i≥1
is a sequence of stationary random variables with strong
$∞ mixing coefficient α(·) such that E#1 = 0, E|#1 |2+δ < ∞ and n=1 α(n)δ/2+δ <
∞, for some δ ∈ (0, ∞). Suppose that for all h ∈ Z+ ,
n−h
)
xi xi+h
i=1
n
S)
i=1
x2i → γ(h)
(4.5)
$n
for some γ(h) ∈ R and n−1 i=1 |xi |2+δ = O(1) as n → ∞. Let
R $n
$n
2
β̂n =
i=1 xi yi
i=1 xi denote the least squares estimator of β.
Show that
.1/2
-)
n
2
x2i
(β̂n − β) −→d N (0, σ∞
)
i=1
2
σ∞
2
∈ [0, ∞). Find σ∞
. (Note that by definition, Z ∼ N (0, 0)
for some
if P (Z = 0) = 1.)
16.12 Let Yi = xi β $ + #i , i ≥ 1, where xi , β ∈ Rp , (p ∈ N) and {#i }i≥1
satisfies the conditions of Problem 16.11. Suppose that for all h ∈ Z+ ,
n−1
n−h
)
i=1
x$i xi+h → Γ(h)
for some p × p matrix Γ(h), with Γ(h) nonsingular, and that
max{7xi 7 : 1 ≤ i ≤ n} = O(1) and n → ∞. Find the limit distribu(−1 $
'$
√
n
n
$
x
x
tion of n(β̂n − β), where β̂n =
i=1 i i
i=1 xi Yi , n ≥ 1.
16.13 Let {Xi }i∈Z be a first order stationary autoregressive process
Xi = ρXi−1 + #i , i ∈ Z,
(4.6)
where |ρ| < 1 and {#i }i∈Z is a sequence of iid random variables with
E#1 = 0, E#21 < ∞.
$∞
(a) Show that Xi = j=0 ρj #i−j , i ∈ Z.
(b) Find the limit distribution of
:√
;
n(ρ̂n − ρ) n≥1 ,
where ρ̂n =
$n−1
i=1
Xi Xi+1
R $n
(Hint: Use Theorem 16.1.1.)
i=1
Xi2 , n ≥ 1.
17
The Bootstrap
17.1 The bootstrap method for independent
variables
17.1.1
A description of the bootstrap method
The bootstrap method, introduced in statistics by Efron (1979), is a powerful tool for solving many statistical inference problems. Let X1 , . . . , Xn
be iid random variables with common cdf F . Let θ = θ(F ) be a parameter and θ̂n be an estimator of θ based on observations X1 , . . . , Xn , i.e.,
θ̂n = tn (X1 , . . . , Xn ) for some measurable function tn (·) of the random
variables X1 , . . . , Xn . The parameter θ is called a ‘level 1’ parameter and
the parameters related to the distribution of θ̂n are called ‘level 2’ parameters (cf. Lahiri (2003)). For example, Var(θ̂n ) or a median of (the distribution of) θ̂n are ‘level 2’ parameters. The bootstrap is a general method
for estimating such ‘level 2’ parameters.
To describe the bootstrap method, let
Rn = rn (X1 , . . . , Xn ; F )
(1.1a)
be a random variable that is a known function of X1 , . . . , Xn and F . An
example of Rn is given by
√
(1.1b)
R̃n = n(X̄n − µ)/σ,
$
n
the normalized sample mean, where X̄n = n−1 i=1 Xi is the sample mean,
2
µ = EX1 and σ = Var(X1 ). The objective here is to approximate (esti-
534
17. The Bootstrap
mate) the unknown distribution of Rn and its functionals. Let F̂n be an
estimator of F based on X1 , . . . , Xn . For example, one may take F̂n to be
the empirical distribution function (edf) of X1 , . . . , Xn , given by
Fn (x) = n−1
n
)
i=1
I(Xi ≤ x), x ∈ R.
(1.1c)
Next, given X1 , . . . , Xn , generate conditionally iid random variables
∗
with common distribution F̂n , where m denotes the bootstrap
X1∗ , . . . , Xm
∗
of Rn by replacing the Xi ’s
sample size. Define the bootstrap version Rm,n
∗
with Xi ’s, F with F̂n , and n with m in the definition of Rn . Then,
!
"
∗
∗
= rm X1∗ , . . . , Xm
; F̂n .
Rm,n
(1.2)
Let Gn denote the distribution of Rn . The bootstrap estimator of Gn is
∗
given by the conditional distribution Ĝm,n (say) of Rm,n
given X1 , . . . , Xn .
Further, the bootstrap estimator of a functional φn = φ(Gn ) is given by
φ̂m,n ≡ φ(Ĝm,n ).
For example, the bootstrap estimator of the kth moment E(Rn )k of Rn is
given by E∗ (Rn∗ )k , k ∈ N, where E∗ denotes the conditional expectation
from the above ‘plug-in’
method applied
given {Xn , n ≥ 1}. This follows
&
&
to the functional φ(G) ≡ xk dG(x), as E(Rn )k = xk dGn (x) = φ(Gn )
&
∗
)k = xk dĜm,n (x) = φ(Ĝm,n ). Similarly, the bootstrap esand E∗ (Rm,n
timator of the α-quantile (0 < α < 1) of Rn is given by the α-quantile
∗
. In general, closed form expresof (the conditional distribution of) Rm,n
sions for bootstrap estimators are not available. One may use Monte Carlo
simulation to evaluate them numerically. See Hall (1992) for more details.
Note that Ĝm,n and hence, the bootstrap estimators φ̂m,n = φ(Ĝm,n )
depend on the choice of the initial estimator F̂n of F . The most common
choice of F̂n is given by Fn of (1.1), in which case the bootstrap variables
∗
can be equivalently generated by simple random sampling with
X1∗ , . . . , Xm
replacement from {X1 , . . . , Xn }. The following example illustrates the construction of the bootstrap version for Rn = R̃n as in (1.1b).
√
Example 17.1.1: For the normalized sample mean R̃n = n(X̄n − µ)/σ,
its bootstrap version based on a resample of size m from F̂n is given by
"
√ ! ∗
∗
R̃m,n
= n X̄m
− µ̂n /σ̂n ,
(1.3)
$m
∗
∗
where X̄m
= X̄m,n
= m−1 i=1 Xi∗ is the bootstrap sample mean,
&
µ̂n = xdF̂n (x) is the (conditional) mean of X1∗ under F̂n and σ̂n2 is the
conditional$variance of X1∗ under F̂n . For
$nF̂n = Fn , it is easy to check that
n
µ̂n = n−1 i=1 Xi = X̄n and σ̂ 2 = n1 i=1 (Xi − X̄n )2 ≡ s2n , the sample
17.1 The bootstrap method for independent variables
535
variance of X1 , . . . , Xn . Thus, the ‘ordinary’ bootstrap version of R̃n (with
F̂n = Fn ) is given by
√ ! ∗
∗
= n X̄m
− X̄n )/sn .
(1.4)
R̃m,n
Some questions that arise naturally in this context are: Does the bootstrap
∗
give a valid approximation to the distribution of R̃n ?
distribution of R̃m,n
How good is the approximation generated by the bootstrap? How does the
quality of approximation depend on the resample size or on the underlying
cdf F ? Some of these issues are addressed next. Write P∗ to denote the
conditional probability given X1 , X2 , . . . and supx to denote supremum
over x ∈ R.
17.1.2
Validity of the bootstrap: Sample mean
Theorem 17.1.1: Let {Xn }n≥1 be a sequence of iid random variables with
√
EX1 = µ and Var(X1 ) = σ 2 ∈ (0, ∞). Let R̃n = n(X̄n − µ)/σ and let
∗
R̃n,n
be its ‘ordinary’ bootstrap version given by (1.4) with m = n. Then
J
J
∗
≤ x)J → 0 as n → ∞, a.s. (1.5)
∆n ≡ sup JP (R̃n ≤ x) − P∗ (R̃n,n
x
Proof: W.l.o.g., suppose that µ = 0 and σ = 1. By the CLT, R̃n ≡
√
nX̄n −→d N (0, 1). Hence, it is enough to show that as n → ∞,
J
J !√
"
∆1n ≡ sup JP∗ n(X̄n∗ − X̄n ) ≤ sn x − Φ(x)J = o(1) a.s.,
x
where Φ(·) denotes the cdf of the N (0, 1) distribution. By the Berry-Esseen
theorem,
∆1n ≤ (2.75)
E∗ |X1∗ − X̄n |3
23 · (2.75) E∗ |X1∗ |3
√ 3
√
≤
.
s3n
nsn
n
By the SLLN, s2n → σ 2 ∈ (0, ∞) and by the Marcinkiewz-Zygmund SLLN
n
1 )
1
√ E∗ |X1∗ |3 = 3/2
|Xi |3 → 0
n
n
i=1
as
n→∞
a.s.
Hence, ∆1n = o(1) as n → ∞ a.s. and the theorem is proved.
!
One can also give a more direct proof of Theorem 17.1.1 using the
Lindeberg-Feller CLT (Problem 17.1).
Theorem 17.1.1 shows that the bootstrap approximation to the distribution of the normalized sample mean is valid under the same conditions that
guarantee the CLT. Under additional moment conditions, a more precise
536
17. The Bootstrap
3
bound
$non the order of ∆n can be given. If E|X1 | < ∞, then by the SLLN,
n−1 j=1 |Xj |3 → E|X1 |3 as n → ∞, a.s. and hence, the last step in the
proof of Theorem 17.1.1, it follows that ∆n = O(n−1/2 ) a.s. Thus, in this
case, the rate of bootstrap approximation to the distribution of R̃n is as
good as that of the normal approximation to the distribution of R̃n . An
important result of Singh (1981) shows that the error rate of bootstrap approximation is indeed smaller, provided the distribution of X1 is nonlattice
(cf. Chapter 11), i.e.,
|E exp(ιtX1 )| )= 1
for all t ∈ R \ {0}.
A precise statement of this result is given in Theorem 17.1.2 below.
17.1.3
Second order correctness of the bootstrap
Theorem 17.1.2: Let {Xn }n≥1 be a sequence of iid nonlattice random
∗
variables with E|X1 |3 < ∞. Also, let R̃n , R̃n,n
, and ∆n be as in Theorem
17.1.1. Then,
∆n = o(n−1/2 ) as n → ∞, a.s.
(1.6)
Proof: Only an outline of the proof will be given here. By Theorem 11.4.4
on Edgeworth expansions, it follows that
J
8
9J
µ3
J
J
sup JP (R̃n ≤ x) − Φ(x) − √ 3 (x2 − 1)φ(x) J = o(n−1/2 ) as n → ∞,
6
nσ
x∈R
(1.7)
where µ3 = E(X1 − µ)3 and φ(x) = (2π)−1/2 exp(−x2 /2), x ∈ R. It can
∗
be shown that the conditional distribution of R̃n,n
given X1 , . . . , Xn , also
admits a similar expansion that is valid almost surely:
J
8
9J
µ̂3,n
J
J
sup JP∗ (R̃n ≤ x) − Φ(x) − √ 3 (x2 − 1)φ(x) J
6
nσ̂
x∈R
n
= o(n−1/2 )
as
n → ∞,
a.s.,
(1.8)
where µ̂3,n = E∗ (X1∗ − X̄n )3 . By the SLLN, as n → ∞, µ̂3,n → µ3 a.s. and
σ̂n2 → σ 2 a.s. Hence,
J µ̂
4 1
µ3 JJ 3
J 3,n
∆n ≤ J 3 − 3 J sup |x2 − 1|φ(x) √ + o(n−1/2 ) a.s.
σ̂n
σ
6 n
x∈R
= o(n−1/2 )
a.s.
!
Under the conditions of Theorem 17.1.2, the bootstrap approximation
to the distribution of R̃n outperforms the classical normal approximation, since the latter has an error of order O(n−1/2 ). In the literature,
17.1 The bootstrap method for independent variables
537
this property is referred to as the second order correctness (s.o.c.) of the
bootstrap.
that under additional conditions, the order of
! It√can be shown
"
∆n is O n−1 log log n a.s., and therefore the bootstrap gives an accurate
approximation even for small sample sizes where the asymptotic normal
approximation is inadequate. For iid random variables with a smooth cdf
F , s.o.c. of the bootstrap has been established in more complex
problems,
√
such as in the case of the studentized sample mean Tn ≡ n(X̄n − µ)/sn .
For a detailed description of the second and higher order properties of the
bootstrap for independent random variables, see Hall (1992).
17.1.4
Bootstrap for lattice distributions
Next consider the case where the underlying cdf F does not satisfy the
nonlatticeness condition of Theorem 17.1.2. Then,
J
J
JE exp(ιtX1 )J = 1 for some t )= 0
and it can be shown (cf. Chapter 10) that X1 takes all its values in a lattice
of the form {a + jh : j ∈ Z}, a ∈ R, h > 0. The smallest h > 0 satisfying
"
!
P X1 ∈ {a + jh : j ∈ Z} = 1
is called the (maximal) span of F . For the normalized sample mean of lattice random variables, s.o.c. of the bootstrap fails under standard metrics,
such as the sup-norm metric and the Levy metric. Recall that for any two
distribution functions F and G on R,
dL (F, G) = inf{# > 0 : F (x − #) − # < G(x) < F (x + #) + # for all x ∈ R}
(1.9)
defines the Levy metric on the set of probability distribution on R.
Theorem 17.1.3: Let {Xn }n≥1 be a sequence of iid lattice random vari∗
ables with span h ∈ (0, ∞) and let R̃n , R̃n,n
, and ∆n be as in Theorem
4
17.1.1. If E(X1 ) < ∞, then
√
h
lim sup n ∆n = √
n→∞
2πσ 2
and
√
lim sup n dL (Gn , Ĝn ) =
n→∞
a.s.
h
L
σ(1 + 2π)
(1.10)
a.s.
∗
≤ x), x ∈ R.
where Gn (x) = P (R̃n ≤ x) and Ĝn (x) = P∗ (R̃n,n
(1.11)
Thus, the above theorem shows that for lattice random variables, the
bootstrap approximation to the distribution of Rn may not be better than
the normal approximation in the supremum and the Levy metric. In Theorem 17.1.3, relation (1.10) is due to Singh (1981) and (1.11) is due to Lahiri
(1994).
538
17. The Bootstrap
Proof: Here again, an outline of the proof is given. By Theorem 11.4.5,
J
J
sup JP (Rn ≤ x) − ξn (x)J = o(n−1/2 )
(1.12)
x
√
h
Q([ nxσ − nx0 ]/h)φ(x),
where ξn (x) = Φ(x) + 6σµ3 3√n (1 − x2 )φ(x) + σ√
n
x ∈ R, Q(x) = 12 − x + <x=, and x0 ∈ R is such that P (X1 = x0 + µ) > 0.
Also, by Theorem 1 of Singh (1981),
J
J
∗
≤ x) − ξˆn (x)J = o(n−1/2 ) as n → ∞, a.s.,
(1.13)
sup JP∗ (Rn,n
x
where ξˆn (x) = Φ(x) +
E∗ (X12
3
µ̂3,n
√ (1
6s3n n
− x2 )φ(x) +
√
h
√
Q( n xsn /h),
σ n
x ∈ R and
µ̂3,n =
− X̄n ) . Hence, by (1.12) and (1.13),
√
J √
J
√
σ n
J
J
· ∆n = sup JQ([ n xσ − nx0 ]/h) − Q( n xsn /h)Jφ(x) + o(1) a.s.
h
x∈R
Since Q(x) ∈ [− 12 , 12 ] for all x and φ(0) =
√1 ,
2π
√
1
σ n
∆n ≤ √
lim sup
h
n→∞
2π
it is clear that
a.s.
(1.14)
To get the lower bound, suppose that for almost all realizations of
X1 , X2 , . . . , and for any given #, δ ∈ (0, 1), there is a sequence {xn }n≥1 ∈
(0, #) such that
√
√
n xn sn /h ∈ Z and 1[ n xn σ − nx0 ]/h2 ∈ (1 − δ, 1) i.o.,
(1.15)
where 1y2 denotes the fractional part of y, i.e., 1y2 = y − <y=, y ∈ R. Then,
for each n satisfying (1.15),
√
J √
1 JJ
σ n
J
∆n ≥ JQ([ n xn σ − nx0 ]/h) − Jφ(xn ) + o(1)
h
2
J
1 JJ
J
(1.16)
≥
inf JQ(y) − J · inf φ(x) + o(1).
2 x∈(0,$)
y∈(1−δ,1)
Since limy→1− Q(y) = − 12 , (1.14) and (1.16) together yield (1.10). The
existence of the sequence {xn }n≥1 satisfying (1.15) can be established by
using the LIL. For an outline of the main arguments, see Problem 17.4.
Next consider (1.11). Let cn = 7Gn − Ĝn 7∞ +n−1/2 . Since dL (Gn , Ĝn ) ≤
7Gn − Ĝn 7, dL (Gn , Ĝn ) = inf{0 < # < cn : Gn (x − #) − # < Ĝn (x) <
Ĝn (x + #) + # for all x ∈ R}. Using (1.12) and (1.13), it can be shown that
for 0 < # < cn ,
Ĝn (x) < Gn (x + #) + # for all x ∈ R
"
!
√
√
(1.17)
⇔ Q(x) − Q(xτn + ( n #σ − nx0 )/h) φ(hx/sn n)
√
√
< h−1 #σ n(1 + φ(hx/sn n)) + o(1) for all x ∈ R,
17.1 The bootstrap method for independent variables
539
where τn = σ/sn and the o(1) term is uniform over x ∈ R. Similarly, for
0 < # < cn ,
Gn (x − #) − # < Ĝn (x) for all x ∈ R
"
!
√
√
(1.18)
⇔ Q(x) − Q(xτn − ( n #σ + nx0 )/h) φ(hx/sn n)
!
"
√
√
> h−1 #σ n 1 + φ(hx/sn n) + o(1) for all x ∈ R.
:
It is easy to check that (1.17) and (1.18) are satisfied if # = h (1 +
√
√ ;−1
2π)σ n
+ o(n−1/2 ). Hence,
√ "−1
!
√
a.s.
lim sup n dL (Gn , Ĝn ) ≤ h σ(1 + 2π)
n→∞
To prove the reverse inequality, note that if there exists a η > 0 such
that, with #n = dL (Gn , Gn ) + n−1 ,
√
!
"
√
√
Q(x̃) − Q(τn x̃ + ( n #n σ − nx0 )/h) φ(hx̃/sn n) ≥ (1 − η) 2π
for some x̃#R, then
√
√ "
√ !
h−1 #n σ n 1 + φ(hx̃/sn n) > (1 − η)/ 2π + o(1)
√
"
√ !
⇒ h−1 σ#n n 1 + φ(0) > (1 − η) 2π + o(1)
√
√
⇒ #n > (1 − η)h(1 + 2π)−1 (σ n)−1 + o(n−1/2 )
√
√
⇒ dL (Gn , Ĝn ) > (1 − η)h(1 + 2π)−1 (σ n)−1 + o(n−1/2 ).
Thus, it is enough to show that for almost all sample sequences, there exists
a sequence {xn }n≥1 ∈ R such that with #n = d(Ĝn , Gn ) + n−1 ,
!
"
√
√
lim Q(xn ) − Q(τn xn + ( n #n σ − nx0 )/h) φ(xn h/sn n)
n→∞
1
=√ .
(1.19)
2π
√
Since limn→∞ n(τn − 1) = ∞ a.s., for almost all sample sequences, there
√
exists a subsequence {nm } such that lim supm→∞ nm (τ
nm − 1) = ∞, "and
!√
n #n σ − nx0 /h.
τnm > 1 for all m. Let an denote the fractional part of
Then 0 ≤ an < 1. Define a sequence of integers {xn }n≥1 by
xn = <(1 − an )/(τn − 1)= − 1,
Then, it is easy to see that
and
n ≥ 1.
(1 − anm ) − 2(τnm − 1) ≤ xnm (τnm − 1) ≤ (1 − anm ) − (τnm − 1),
!
"
lim Q(xnm τnm + anm ) = lim Q xnm (τnm − 1) − 1 − anm ) = −1/2.
m→∞
m→∞
Now, one can use (1.20) to prove (1.19). This completes the proof.
(1.20)
!
540
17.1.5
17. The Bootstrap
Bootstrap for heavy tailed random variables
Next consider the case where the Xi ’s are heavy tailed, i.e., the Xi ’s have
infinite variance. More specifically, let {Xn }n≥1 be iid with common cdf F
such that as x → ∞,
1 − F (x) ∼ x−2 L(x)
F (−x) ∼ cx−2 L(x)
(1.21)
for some α ∈ (1, 2), c ∈ (0, ∞) and for some function L(·) : (0, ∞) → (0, ∞)
that is slowly varying at ∞, i.e., L(·) is bounded on every bounded interval
of (0, ∞) and
lim L(ty)/L(y) = 1
y→∞
for all t ∈ (0, ∞).
(1.22)
Since α ∈ (1, 2), E|X1| < ∞, but EX1² = ∞. Thus, the Xi's have infinite variance but finite mean. In this case, it is known (cf. Chapter 11) that for some sequence {an}n≥1 of normalizing constants,
Tn ≡ n(X̄n − µ)/an −→d Wα,    (1.23)
where X̄n = n^{-1} Σ_{i=1}^n Xi, µ = EX1, and Wα has a stable distribution of order α. The characteristic function of Wα is given by
φα(t) = exp( ∫ h(t, x) dλα(x) ),   t ∈ R,    (1.24)
where h(t, x) = (e^{ιtx} − 1 − ιtx), x, t ∈ R, and where λα(·) is a measure on (R, B(R)) satisfying
λα([x, ∞)) = x^{-α} and λα((−∞, −x]) = c x^{-α},   x > 0.    (1.25)
The normalizing constants {an}n≥1 ⊂ (0, ∞) in (1.23) are determined by the relation
nP(X1 > an) ≡ n a_n^{-α} L(an) → 1 as n → ∞.    (1.26)
To apply the bootstrap, let X1∗, . . . , Xm∗ denote a random sample of size m, drawn with replacement from {X1, . . . , Xn}. Then, the bootstrap version of Tn is given by
T∗m,n = m(X̄∗m − X̄n)/am,    (1.27)
where X̄∗m = m^{-1} Σ_{i=1}^m Xi∗ is the bootstrap sample mean. The main result of this section shows that L(T∗m,n | X1, . . . , Xn), the conditional distribution of T∗m,n given X1, . . . , Xn, has a random limit for the usual choice of the resample size, m = n, and hence is an "inconsistent estimator" of L(Tn), the distribution of Tn. In contrast, consistency of the bootstrap approximation holds if m = o(n), i.e., the resample size is of smaller order than the sample size. To describe the random limit in the m = n case, the notion of a random Poisson measure is needed, which is introduced in the following definition.
Definition 17.1.1: M(·) is called a random Poisson measure on (R, B(R)) with mean measure λα(·) of (1.25), if {M(A) : A ∈ B(R)} is a collection of random variables defined on some probability space (Ω, F, P) such that for each ω ∈ Ω, M(·)(ω) is a measure on (R, B(R)), and for any finite collection of disjoint sets A1, . . . , Am ∈ B(R), m ∈ N, {M(A1), . . . , M(Am)} are independent Poisson random variables with M(Ai) ∼ Poisson(λα(Ai)), i = 1, . . . , m.
Theorem 17.1.4: Let {Xn}n≥1 be a sequence of iid random variables satisfying (1.21) for some α ∈ (1, 2) and let Tn, T∗m,n be as in (1.23) and (1.27), respectively.
(i) If m = o(n) as n → ∞, then
sup_x |P∗(T∗m,n ≤ x) − P(Tn ≤ x)| −→p 0 as n → ∞.    (1.28)
(ii) Suppose that m = n for all n ≥ 1. Then, for any t ∈ R,
E∗ exp(ιt T∗n,n) −→d exp( ∫ h(t, x) dM(x) ),    (1.29)
where M(·) is the random Poisson measure defined above and h(t, x) = (e^{ιtx} − 1 − ιtx).
Remark 17.1.1: Part (ii) of Theorem 17.1.4 shows that the bootstrap fails to provide a valid approximation to the distribution of the normalized sample mean of heavy tailed random variables when m = n. Failure of the bootstrap in the heavy tail case was first proved by Athreya (1987a), who established weak convergence of the finite dimensional vectors (P∗(T̃∗n,n ≤ x1), . . . , P∗(T̃∗n,n ≤ xm)) based on a slightly different bootstrap version T̃∗n,n of Tn, where T̃∗n,n = n(X̄∗n − X̄n)/X(n) and X(n) = max{X1, . . . , Xn}. Extensions of these results are given in Athreya (1987b) and Arcones and Giné (1989, 1991). A necessary and sufficient condition for the validity of the bootstrap for the sample mean with resample size m = n is that X1 belongs to the domain of attraction of the normal distribution (Giné and Zinn (1989)).
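The contrast between parts (i) and (ii) of Theorem 17.1.4 can be seen in a small simulation. The following sketch is illustrative only and is not from the text; it assumes a Pareto-type distribution with tail index α = 1.5 (so (1.21) holds with L ≡ 1), the normalization a_k = k^{1/α}, and an m = o(n) choice m = ⌊√n⌋. The m = n bootstrap quantile fluctuates substantially from one data set to the next, reflecting its random limit, while the m = o(n) version is far more stable.

```python
import numpy as np

# Illustrative sketch (not from the text): bootstrap of the centered sample
# mean for heavy-tailed data with tail index alpha in (1, 2).
rng = np.random.default_rng(1)
alpha = 1.5                         # tail index; 1 - F(x) ~ x^{-alpha}
a = lambda k: k ** (1.0 / alpha)    # normalizing constants a_k = k^{1/alpha} (L == 1)

def pareto_sample(n):
    # Pareto(alpha) on (1, inf); mean is alpha/(alpha - 1)
    return (1.0 - rng.random(n)) ** (-1.0 / alpha)

def boot_quantile(x, m, level=0.95, B=2000):
    # Conditional distribution of T*_{m,n} = m (Xbar*_m - Xbar_n)/a_m, cf. (1.27)
    xbar = x.mean()
    stars = rng.choice(x, size=(B, m), replace=True).mean(axis=1)
    t_star = m * (stars - xbar) / a(m)
    return np.quantile(t_star, level)

n = 2000
for rep in range(3):                                # three independent data sets
    x = pareto_sample(n)
    q_full = boot_quantile(x, m=n)                  # m = n: random limit (part (ii))
    q_small = boot_quantile(x, m=int(np.sqrt(n)))   # m = o(n): consistent (part (i))
    print(f"sample {rep}: 95% bootstrap quantile, m=n: {q_full:7.2f}   m=sqrt(n): {q_small:7.2f}")
```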
Proof of Theorem 17.1.4: W.l.o.g., let µ = 0. Let
φ̂m,n(t) = E∗ exp(ιt(X1∗ − X̄n)/am) = n^{-1} Σ_{j=1}^n exp(ιt[Xj − X̄n]/am),   t ∈ R.
Then,
E∗ exp(ιt T∗m,n) = (φ̂m,n(t))^m = { 1 − [m(1 − φ̂m,n(t))]/m }^m.    (1.30)
First consider part (ii). Note that for any sequence of complex numbers {zn}n≥1 satisfying zn → z ∈ C,
(1 + zn/n)^n / e^{zn} → 1 as n → ∞.    (1.31)
Hence, by (1.30) (with m = n) and (1.31), (ii) would follow if
n(φ̂n,n(t) − 1) −→d ∫ h(x, t) dMα(x).    (1.32)
Since E∗(X1∗ − X̄n) = 0, it follows that
n(φ̂n,n(t) − 1) = n[ E∗ exp(ιt[X1∗ − X̄n]/an) − 1 − ιt E∗[X1∗ − X̄n]/an ] = ∫ h(x, t) dMn(x),    (1.33)
where Mn(A) = Σ_{j=1}^n I([Xj − X̄n]/an ∈ A), A ∈ B(R). Note that for a given t ∈ R, the function h(·, t) is continuous on R, |h(x, t)| = O(x²) as x → 0 and |h(x, t)| = O(|x|) as |x| → ∞. Hence, to prove (1.32), it is enough to show that for any ε > 0,
lim_{η↓0} lim sup_{n→∞} P( ∫_{|x|≤η} x² dMn(x) > ε ) = 0,    (1.34)
lim_{η↓0} lim sup_{n→∞} P( ∫_{η|x|>1} |x| dMn(x) > ε ) = 0,    (1.35)
and for any disjoint intervals I1, . . . , Ik whose closures Ī1, . . . , Īk are contained in R \ {0},
(Mn(I1), . . . , Mn(Ik)) −→d (Mα(I1), . . . , Mα(Ik)).    (1.36)
Let An = {|X̄n| ≤ n^{-3/4} an}. Since nX̄n/an −→d Wα,
P(Aᶜn) → 0 as n → ∞.    (1.37)
Consider (1.34) and fix η ∈ (0, ∞). Then, on the set An, for n^{1/2} > η,
∫_{|x|≤η} x² dMn(x) = a_n^{-2} Σ_{j=1}^n (Xj − X̄n)² I(|Xj − X̄n| ≤ an η)
  ≤ 2 a_n^{-2} Σ_{j=1}^n Xj² I(|Xj| ≤ |X̄n| + an η) + 2 a_n^{-2} n X̄n²
  ≤ 2 a_n^{-2} Σ_{j=1}^n Xj² I(|Xj| ≤ 2η an) + 2 n^{-1/2}.
Hence,
lim sup_{n→∞} P( ∫_{|x|≤η} x² dMn(x) > ε )
  ≤ lim sup_{n→∞} [ P( { ∫_{|x|≤η} x² dMn(x) > ε } ∩ An ) + P(Aᶜn) ]
  ≤ lim sup_{n→∞} P( 2 a_n^{-2} Σ_{j=1}^n Xj² I(|Xj| ≤ 2η an) > ε/2 )
  ≤ lim sup_{n→∞} (4/ε) n a_n^{-2} E X1² I(|X1| ≤ 2η an).    (1.38)
By (1.21) and (1.26), n a_n^{-2} E X1² I(|X1| ≤ 2η an) is asymptotically equivalent to C1 · n a_n^{-2} (η an)^{2−α} L(2η an) for some C1 ∈ (0, ∞) (not depending on η). Hence, by (1.21) and (1.22), the right side of (1.38) is bounded above by (4/ε) · C1 · η^{2−α}, which → 0 as η ↓ 0. Hence, (1.34) follows.
By similar arguments,
lim sup_{n→∞} P( ∫_{η|x|>1} |x| dMn(x) > ε ) ≤ lim sup_{n→∞} (2/ε) n a_n^{-1} E|X1| I(|X1| > η^{-1} an/2) ≤ const. η^{α−1},
which → 0 as η ↓ 0. Hence, (1.35) follows.
$n
Next consider (1.36). Let Mn† (A) ≡
j=1 I(Xj /an ∈ A), A ∈ B(R).
Then, for any a < b, on the set An ,
!
!
"
!
"
"
Mn† [a + #n , b − #n ] ≤ Mn [a, b] ≤ Mn† [a − #n , b + #n ]
for #n = n−3/4 . Further, by (1.21), (1.22), for any x )= 0,
"
!
EMn† [x − #n , x + #n ]
!
"
= nP X1 ∈ an [x − #n , x + #n ] → 0 as n → ∞.
Hence, it is enough to establish (1.36) with Mn(·) replaced by Mn†(·). Since (Mn†(I1), . . . , Mn†(Ik), n − Σ_{j=1}^k Mn†(Ij)) has a multinomial distribution with parameters (n, p1n, . . . , pkn, 1 − Σ_{j=1}^k pjn), where pjn = P(X1/an ∈ Ij), and since, by (1.21), n pjn → λα(Ij) for Īj ⊂ R \ {0}, j = 1, 2, . . . , k, the last assertion follows. This completes the proof of part (ii).
Next consider part (i). Note that Wα has an absolutely continuous distribution w.r.t. the Lebesgue measure (as the characteristic function of Wα is integrable on R). Since Tn −→d Wα, it is enough to show that
sup_x |P∗(T∗m,n ≤ x) − P(Wα ≤ x)| −→p 0 as n → ∞.    (1.39)
Using the fact that "a sequence of random variables Yn converges to 0 in probability if and only if given any subsequence {ni}, there exists a further subsequence {mi} ⊂ {ni} such that Ymi → 0 a.s.," one can show that (1.39) holds iff
E∗ exp(ιt T∗m,n) −→p E exp(ιt Wα) as n → ∞    (1.40)
for each t ∈ R (Problem 17.8 (c)).
By (1.24), (1.30), and (1.31), it is enough to show that for each t ∈ R,
m(φ̂m,n(t) − 1) −→p ∫ h(t, x) dλα(x) as n → ∞.    (1.41)
The left side of (1.41) can be written as
(m/n) Σ_{i=1}^n h((Xi − X̄n)/am, t) ≡ ∫ h(x, t) dM̃n(x), say,
where M̃n(A) = (m/n) Σ_{i=1}^n I([Xi − X̄n]/am ∈ A), A ∈ B(R). The proof of (1.41) now proceeds along the lines of the proof of (1.32), where −→d in (1.36) is replaced by −→p. Note that for m = o(n), by (1.21) and (1.22),
(an/am) · (m/n) ∼ (n/m)^{1/α − 1} · L(n)/L(m) → 0 as n → ∞.    (1.42)
By arguments similar to the proof of (1.34), on An, for n large,
∫_{|x|≤η} x² dM̃n(x) = (m/n) a_m^{-2} Σ_{j=1}^n (Xj − X̄n)² I(|Xj − X̄n| ≤ am η)
  ≤ 2 (m/n) a_m^{-2} Σ_{j=1}^n Xj² I(|Xj| ≤ 2 am η) + 2 m (a_n²/a_m²) n^{-3/2}.
Also, the expected value of the first term on the right side above equals 2 m a_m^{-2} E X1² I(|X1| ≤ 2 am η), which is asymptotically equivalent to const. η^{2−α}. Hence, it follows that (1.34) holds with Mn replaced by M̃n. Similarly, one can establish (1.35) for M̃n. Hence, to prove (1.41), it remains to show that (cf. (1.36)) for any interval I1 whose closure is contained in R \ {0},
M̃n(I1) −→p λα(I1).    (1.43)
By (1.23) and (1.42), X̄n/am = (nX̄n/an) · (an/am) · n^{-1} −→p 0 as n → ∞. Hence, by arguments as in the proof of (1.36), it is enough to show that
M̃n†(I1) −→p λα(I1),    (1.44)
where M̃n†(A) = (m/n) Σ_{i=1}^n I(Xi/am ∈ A), A ∈ B(R). In view of (1.21) and (1.26), this can be easily verified by showing
E M̃n†(I1) → λα(I1) and Var(M̃n†(I1)) = O(m/n) as n → ∞,    (1.45)
and hence → 0 as n → ∞. Hence, part (i) of Theorem 17.1.4 follows.    □
17.2 Inadequacy of resampling single values under dependence
The resampling scheme, introduced by Efron (1979) for iid random variables, may not produce a reasonable approximation under dependence. An
example to this effect was given by Singh (1981), which is described next.
Let {Xn }n≥1 be a sequence of stationary m-dependent random variables
for some m ∈ N, with EX1 = µ and EX12 < ∞.
Recall that {Xn}n≥1 is called m-dependent for some integer m ≥ 0 if {X1, . . . , Xk} and {Xk+m+1, . . .} are independent for all k ≥ 1. Thus, any sequence of independent random variables {εn}n≥1 is 0-dependent, and if Xn ≡ εn + 0.5 εn+1, n ≥ 1, then {Xn}n≥1 is 1-dependent. For an m-dependent sequence {Xn}n≥1 with finite variances, it can be checked that
σ∞² ≡ lim_{n→∞} n Var(X̄n) = Var(X1) + 2 Σ_{i=1}^m Cov(X1, X1+i),
where X̄n = n^{-1} Σ_{i=1}^n Xi. If σ∞² ∈ (0, ∞), then by the CLT for m-dependent variables (cf. Chapter 16),
√n (X̄n − µ) −→d N(0, σ∞²).    (2.1)
Now consider the bootstrap approximation to the distribution of the random variable Tn = √n(X̄n − µ) under Efron (1979)'s resampling scheme.
For simplicity, assume that the resample size equals the sample size, i.e., from (X1, . . . , Xn), an equal number of bootstrap variables X1∗, . . . , Xn∗ are generated by resampling a single observation at a time. Then, the bootstrap version T∗n,n of Tn is given by
T∗n,n = √n (X̄n∗ − X̄n),
where X̄n∗ = n^{-1} Σ_{i=1}^n Xi∗. The conditional distribution of T∗n,n still converges to a normal distribution, but with a "wrong" variance, as shown below.
Theorem 17.2.1: Let {Xn}n≥1 be a sequence of stationary m-dependent random variables with EX1 = µ and σ² = Var(X1) ∈ (0, ∞), where m ∈ Z+. Then
sup_x |P∗(T∗n,n ≤ x) − Φ(x/σ)| = o(1) as n → ∞, a.s.    (2.2)
Note that, if m ≥ 1, then σ² need not equal σ∞².
For proving the theorem, the following result will be needed.
Lemma 17.2.2: Let {Xn}n≥1 be a sequence of stationary m-dependent random variables. Let f : R → R be a Borel measurable function with E|f(X1)|^p < ∞ for some p ∈ (0, 2), such that Ef(X1) = 0 if p ≥ 1. Then,
n^{-1/p} Σ_{i=1}^n f(Xi) → 0 as n → ∞, a.s.
Proof: This is most easily proved by splitting the given m-dependent sequence {Xn}n≥1 into m + 1 iid subsequences {Yji}i≥1, j = 1, . . . , m + 1, defined by Yji = Xj+(i−1)(m+1), and then applying the standard results for iid random variables to the {Yji}i≥1's (cf. Liu and Singh (1992)). For 1 ≤ j ≤ m + 1, let Ajn = {1 ≤ i ≤ n : j + (i − 1)(m + 1) ≤ n} and let Njn denote the size of the set Ajn. Note that Njn/n → (m + 1)^{-1} as n → ∞ for all 1 ≤ j ≤ m + 1. Then, by the Marcinkiewicz-Zygmund SLLN (cf. Chapter 8) applied to each of the sequences of iid random variables {Yji}i≥1, j = 1, . . . , m + 1, one gets
n^{-1/p} Σ_{i=1}^n f(Xi) = Σ_{j=1}^{m+1} [ Njn^{-1/p} Σ_{i∈Ajn} f(Yji) ] · (Njn/n)^{1/p} → 0 as n → ∞, a.s.
This completes the proof of Lemma 17.2.2.    □
Proof of Theorem 17.2.1: Note that conditional on X1, . . . , Xn, X1∗, . . . , Xn∗ are iid random variables with E∗X1∗ = X̄n, Var∗(X1∗) = s_n², and E∗|X1∗|³ = n^{-1} Σ_{i=1}^n |Xi|³ < ∞. Hence, by the Berry-Esseen theorem,
sup_x |P∗(T∗n,n ≤ x) − Φ(x/sn)| ≤ (2.75) E∗|X1∗ − X̄n|³ / (√n s_n³).    (2.3)
By Lemma 17.2.2, w.p. 1,
X̄n → µ,  n^{-1} Σ_{i=1}^n Xi² → EX1²,  and  n^{-3/2} Σ_{i=1}^n |Xi|³ → 0    (2.4)
as n → ∞. Hence, the theorem follows from (2.3) and (2.4).    □
The following result is an immediate consequence of Theorem 17.2.1 and (2.1).
Corollary 17.2.3: Under the conditions of Theorem 17.2.1, if σ∞² ≠ 0 and Σ_{i=1}^m Cov(X1, X1+i) ≠ 0, then for any x ≠ 0,
lim_{n→∞} |P∗(T∗n,n ≤ x) − P(Tn ≤ x)| = |Φ(x/σ) − Φ(x/σ∞)| ≠ 0 a.s.
Thus, for all x ≠ 0, the bootstrap estimator P∗(T∗n,n ≤ x) of P(Tn ≤ x) based on Efron (1979)'s resampling scheme has an error that tends to a nonzero number in the limit. As a result, the bootstrap estimator of P(Tn ≤ x) is not consistent. By resampling individual Xi's, Efron (1979)'s resampling scheme ignores the dependence structure of the sequence {Xn}n≥1 completely, and thus fails to account for the lag-covariance terms (viz., Cov(X1, X1+i), 1 ≤ i ≤ m) in the asymptotic variance.
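This failure is easy to see numerically. The following sketch is illustrative and not from the text; it uses the 1-dependent example Xn = εn + 0.5 εn+1 introduced above, for which σ² = Var(X1) = 1.25 and σ∞² = 2.25, and checks that the Efron-bootstrap variance of T∗n,n tracks σ² rather than σ∞².

```python
import numpy as np

# Illustrative sketch (not from the text): Efron's iid bootstrap applied to
# the 1-dependent sequence X_n = eps_n + 0.5*eps_{n+1} of Section 17.2.
rng = np.random.default_rng(0)
n, B = 2000, 1000
eps = rng.normal(size=n + 1)
x = eps[:-1] + 0.5 * eps[1:]            # stationary, 1-dependent

sigma2 = 1.0 + 0.25                      # Var(X_1)
sigma2_inf = sigma2 + 2 * 0.5            # sigma^2_infty = Var(X_1) + 2 Cov(X_1, X_2)

# Efron bootstrap of T*_{n,n} = sqrt(n) (Xbar* - Xbar_n): resample single values
xbar = x.mean()
t_star = np.sqrt(n) * (rng.choice(x, size=(B, n), replace=True).mean(axis=1) - xbar)

print("Var(X_1)                  :", sigma2)
print("sigma^2_infty (target)    :", sigma2_inf)
print("bootstrap Var_*(T*_{n,n}) :", t_star.var())  # close to Var(X_1), not sigma^2_infty
```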
17.3 Block bootstrap
A new type of resampling scheme that is applicable to a wide class of dependent random variables is given by the block bootstrap methods. Although
the idea of using blocks in statistical inference problems for time series is
very common (cf. Brillinger (1975)), the development of a suitable version
of resampling based on blocks has been slow. In a significant breakthrough,
Künsch (1989) and Liu and Singh (1992) independently formulated a block
resampling scheme, called the moving block bootstrap (MBB). In contrast
with resampling a single observation at a time, the MBB resamples blocks
of consecutive observations at a time, thereby preserving the dependence
structure of the original observations within each block. Further, by allowing the block size to grow to infinity, the MBB is able to reproduce the
dependence structure of the underlying process asymptotically. Essentially
the same idea of block resampling was implicit in the works of Hall (1985)
and Carlstein (1986). A description of the MBB is given next.
Let {Xi}i≥1 be a sequence of stationary random variables and let {X1, . . . , Xn} ≡ Xn denote the observations. Let ℓ be an integer satisfying 1 ≤ ℓ < n. Define the blocks B1 = (X1, . . . , Xℓ), B2 = (X2, . . . , Xℓ+1), . . . , BN = (XN, . . . , Xn), where N = n − ℓ + 1. For simplicity, suppose that ℓ divides n. Let b = n/ℓ. To generate the MBB samples, one selects b blocks at random with replacement from the collection {B1, . . . , BN}. Since each resampled block has ℓ elements, concatenating the elements of the b resampled blocks serially yields b · ℓ = n bootstrap observations X1∗, . . . , Xn∗. Note that if one sets ℓ = 1, then the MBB reduces to the bootstrap method of Efron (1979) for iid data. However, for a valid approximation in the dependent case, it is typically required that both ℓ and b go to ∞ as n → ∞, i.e.,
ℓ^{-1} + n^{-1} ℓ = o(1) as n → ∞.    (3.1)
Next suppose that the random variable of interest is of the form Tn = tn(Xn; θ(Pn)), where Pn denotes the joint distribution of X1, . . . , Xn and θ(Pn) is a functional of Pn. The MBB version of Tn based on blocks of size ℓ is defined as
Tn∗ = tn(X1∗, . . . , Xn∗; θ(P̂n)),
where P̂n = L(X1∗, . . . , Xn∗ | Xn), the conditional joint distribution of X1∗, . . . , Xn∗ given Xn, and where the dependence on ℓ is suppressed to ease the notation.
To illustrate the construction of Tn∗ in a specific example, suppose that Tn is the centered and scaled sample mean Tn = n^{1/2}(X̄n − µ). Then the MBB version of Tn is given by
Tn∗ = n^{1/2}(X̄n∗ − µ̃n),    (3.2)
where X̄n∗ is the sample mean of the bootstrap observations and where
µ̃n = E∗(X̄n∗) = N^{-1} Σ_{i=1}^N (Xi + · · · + Xi+ℓ−1)/ℓ = N^{-1} [ Σ_{i=ℓ}^{N} Xi + Σ_{i=1}^{ℓ−1} (i/ℓ)(Xi + Xn−i+1) ],    (3.3)
which is different from X̄n when ℓ > 1.
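The block-resampling step described above is straightforward to code. The sketch below is illustrative and not from the text; it forms the N = n − ℓ + 1 overlapping blocks, draws b = n/ℓ of them with replacement, and returns the MBB version of the centered sample mean, with µ̃n of (3.3) computed as the average of the block means. The AR(1)-type series used in the example is an arbitrary choice for demonstration.

```python
import numpy as np

# Illustrative sketch (not from the text): moving block bootstrap (MBB)
# for the centered sample mean, block length ell assumed to divide n.
def mbb_sample(x, ell, rng):
    n = len(x)
    N = n - ell + 1                       # number of overlapping blocks B_1,...,B_N
    b = n // ell                          # number of blocks to resample
    starts = rng.integers(0, N, size=b)   # b block indices drawn with replacement
    return np.concatenate([x[s:s + ell] for s in starts])

def mbb_mean_distribution(x, ell, B=999, seed=0):
    # Conditional distribution of T_n* = sqrt(n) (Xbar_n* - mu_tilde_n), cf. (3.2)-(3.3)
    rng = np.random.default_rng(seed)
    n = len(x)
    N = n - ell + 1
    block_means = np.array([x[i:i + ell].mean() for i in range(N)])
    mu_tilde = block_means.mean()         # E_*(Xbar_n*) as in (3.3)
    return np.array([np.sqrt(n) * (mbb_sample(x, ell, rng).mean() - mu_tilde)
                     for _ in range(B)])

# Example: a dependent series, block length ell of order n^{1/3}
rng = np.random.default_rng(1)
n = 1000
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + e[t]          # AR(1) with coefficient 0.5
t_star = mbb_mean_distribution(x, ell=10)
print("MBB estimate of Var(sqrt(n) Xbar_n):", t_star.var())
```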
17.4 Properties of the MBB
Let {Xi }i∈Z be a sequence of stationary random variables with EX1 = µ,
EX1² < ∞ and strong mixing coefficient α(·). In this section, consistency of
the MBB estimators of the variance and of the distribution of the sample
mean will be proved.
17.4.1 Consistency of MBB variance estimators
For Tn = √n(X̄n − µ), the MBB variance estimator Var∗(Tn∗) has the desirable property that it can be expressed by simple, closed-form formulas involving the observations. This is possible because of the linearity of the bootstrap sample mean in the resampled observations. Let Ui = (Xi + · · · + Xi+ℓ−1)/ℓ denote the average of the block (Xi, . . . , Xi+ℓ−1), i ≥ 1. Then, using the independence of the resampled blocks, one gets
Var∗(Tn∗) = ℓ [ N^{-1} Σ_{i=1}^N Ui² − µ̃n² ],    (4.1)
where N = n − ℓ + 1 and µ̃n = N^{-1} Σ_{i=1}^N Ui is as in (3.3). Under the conditions of Corollary 16.3.6, the asymptotic variance of Tn is given by the infinite series
σ∞² ≡ lim_{n→∞} Var(Tn) = Σ_{i=−∞}^{∞} E Z1 Z1+i,    (4.2)
where Zi = Xi − µ, i ∈ Z. The following result proves consistency of the MBB estimator of the 'level 2' parameter Var(Tn) or, equivalently, of σ∞².
Theorem 17.4.1: Suppose that there exists a δ > 0 such that E|X1|^{2+δ} < ∞ and that Σ_{n=1}^∞ α(n)^{δ/(2+δ)} < ∞. If, in addition, ℓ^{-1} + n^{-1} ℓ = o(1) as n → ∞, then
Var∗(Tn∗) −→p σ∞² as n → ∞.    (4.3)
Theorem 17.4.1 shows that under mild moment and strong mixing conditions on the process {Xi}i∈Z, the bootstrap variance estimators Var∗(Tn∗) are consistent for a wide range of bootstrap block sizes ℓ, so long as ℓ tends to infinity with n but at a rate slower than n. Thus, block sizes given by ℓ = log log n or ℓ = n^{1−ε}, 0 < ε < 1, are all admissible block lengths for the consistency of Var∗(Tn∗).
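Formula (4.1) makes Var∗(Tn∗) computable in closed form, without any Monte Carlo resampling. The sketch below is illustrative and not from the text; the 1-dependent example series is an arbitrary choice (for it, Var(X1) = 1.25 and σ∞² = 2.25), and the output shows the estimate moving from the iid-bootstrap value at ℓ = 1 toward σ∞² as ℓ grows, as Theorem 17.4.1 predicts.

```python
import numpy as np

# Illustrative sketch (not from the text): the closed-form MBB variance
# estimator of (4.1), Var_*(T_n*) = ell * ( N^{-1} sum_i U_i^2 - mu_tilde^2 ).
def mbb_variance(x, ell):
    n = len(x)
    N = n - ell + 1
    U = np.array([x[i:i + ell].mean() for i in range(N)])   # block means U_i
    mu_tilde = U.mean()                                      # mu_tilde_n of (3.3)
    return ell * (np.mean(U ** 2) - mu_tilde ** 2)

# Example series: X_t = e_t + 0.5 e_{t+1}, so sigma^2_infty = 2.25
rng = np.random.default_rng(0)
n = 2000
e = rng.normal(size=n + 1)
x = e[:-1] + 0.5 * e[1:]
for ell in (1, 5, 20, 50):
    print(f"ell = {ell:3d}:  Var_*(T_n*) = {mbb_variance(x, ell):.3f}")
# ell = 1 reproduces the inconsistent iid-bootstrap value near Var(X_1) = 1.25;
# larger ell moves the estimate toward sigma^2_infty = 2.25.
```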
For proving the theorem, the following lemma from Lahiri (2003) is
needed.
Lemma 17.4.2: Let f : R → R be a Borel measurable function and let {Xi}i∈Z be a (possibly nonstationary) sequence of random vectors with strong mixing coefficient α(·). Define ‖f‖∞ = sup{|f(x)| : x ∈ R} and ζ_{2+δ,n} = max{ (E|f(U1i)|^{2+δ})^{1/(2+δ)} : 1 ≤ i ≤ N }, δ > 0, where U1i ≡ √ℓ Ui = (Xi + · · · + Xi+ℓ−1)/√ℓ, i ≥ 1. Let {ain : i ≥ 1, n ≥ 1} ⊂ [−1, 1] be a collection of real numbers. Then, there exist constants C1 and C2(δ) (not depending on f(·), ℓ, n, and the ain's) such that for any 1 < ℓ < n/2 and any n > 2,
Var( Σ_{i=1}^N ain f(U1i) ) ≤ min{ C1 ‖f‖∞² nℓ [1 + Σ_{1≤k≤n/ℓ} α(kℓ)],  C2(δ) ζ²_{2+δ,n} nℓ [1 + Σ_{1≤k≤n/ℓ} α(kℓ)^{δ/(2+δ)}] }.    (4.4)
Proof: Let
S(j) = Σ_{i=2(j−1)ℓ+1}^{2jℓ} ain f(U1i),   1 ≤ j ≤ J,    (4.5)
where J = ⌊N/(2ℓ)⌋, and let S(J + 1) = Σ_{i=1}^N ain f(U1i) − Σ_{j=1}^J S(j). Also, let Σ^{(1)} and Σ^{(2)} respectively denote summation over odd and even j ∈ {1, . . . , J + 1}. Note that for any 1 ≤ j, j + k ≤ J + 1, k ≥ 2, the random variables S(j) and S(j + k) depend on disjoint sets of Xi's that are separated by (k − 1)2ℓ − ℓ observations in between. Hence, noting that |S(j)| ≤ 2ℓ‖f‖∞ for all 1 ≤ j ≤ J + 1, by Corollary 16.2.4, one gets |Cov(S(j), S(j + k))| ≤ 4 α((k − 1)2ℓ − ℓ)(4ℓ‖f‖∞)² for all k ≥ 2, j ≥ 1. Hence,
Var( Σ_{i=1}^n ain f(U1i) ) = Var( Σ^{(1)} S(j) + Σ^{(2)} S(j) )
  ≤ 2 [ Var( Σ^{(1)} S(j) ) + Var( Σ^{(2)} S(j) ) ]
  ≤ 2 [ Σ_{j=1}^{J+1} ES(j)² + (J + 1) · Σ_{1≤k≤J/2} α((2k − 1)2ℓ − ℓ) · 4(4ℓ‖f‖∞)² ]
  ≤ 2 [ (n/(2ℓ) + 1) 4ℓ²‖f‖∞² + 64(nℓ)‖f‖∞² Σ_{k=1}^J α(kℓ) ]
  ≤ C1 ‖f‖∞² [ nℓ + (nℓ) Σ_{k=1}^J α(kℓ) ].    (4.6)
This yields the first term in the upper bound. The second term can be derived similarly by using the inequalities |Cov(S(j), S(j + k))| ≤ C(δ) (ES(j)^{2+δ})^{1/(2+δ)} (ES(j + k)^{2+δ})^{1/(2+δ)} α((k − 1)2ℓ − ℓ)^{δ/(2+δ)} and (ES(j)^{2+δ})^{1/(2+δ)} ≤ 2ℓ ζ_{2+δ,n}, and is left as an exercise.    □
Proof of Theorem 17.4.1: W.l.o.g., let µ = 0. Then, E X̄n² = O(n^{-1}) as n → ∞. Hence, by (3.3) and Lemma 17.4.2,
n E|µ̃n − X̄n|² ≤ n E[ |1 − n/N| |X̄n| + (Nℓ)^{-1} |Σ_{i=1}^{ℓ−1} (ℓ − i)(Xi + Xn−i+1)| ]²
  ≤ 2 [ (ℓ/N)² n E X̄n² + 2nN^{-2} { E|Σ_{i=1}^{ℓ−1} (i/ℓ) Xi|² + E|Σ_{i=1}^{ℓ−1} (i/ℓ) Xℓ−i|² } ]
  = O([ℓ/n]²) + O(ℓ/n) = O(ℓ/n) as n → ∞.    (4.7)
This implies
E{ℓ µ̃n²} ≤ ℓ [ 2E|X̄n|² + 2E|X̄n − µ̃n|² ] = O(ℓ/n).    (4.8)
Hence, it remains to show that ℓ N^{-1} Σ_{i=1}^N Ui² −→p σ∞² as n → ∞. Let Vin = U1i² I(|U1i| ≤ (n/ℓ)^{1/8}) and Win = U1i² − Vin, 1 ≤ i ≤ N. Then, by Lemma 17.4.2,
E| N^{-1} Σ_{i=1}^N (Vin − EVin) |² ≤ const. (n/ℓ)^{1/2} [ nℓ + nℓ Σ_{1≤k<n/ℓ} α(kℓ) ] / N²
  ≤ const. (n/ℓ)^{-1/2} [ 1 + Σ_{k≥1} α(k) ] = o(1) as n → ∞.    (4.9)
Next, note that by definition, U11 = √ℓ X̄ℓ. Further, under the conditions of Theorem 17.4.1, √n X̄n −→d N(0, σ∞²) by Corollary 16.3.6. Hence, by the EDCT,
lim_{n→∞} E W11 = lim_{n→∞} E|U11|² I(|U11| > (n/ℓ)^{1/8}) = 0.    (4.10)
Therefore, |EV1n − σ∞²| ≤ E|U11|² I(|U11|⁸ > n/ℓ) + |E U11² − σ∞²| = o(1).
Hence, for any ε > 0, by (4.9), (4.10), and Markov's inequality,
lim_{n→∞} P( | ℓ N^{-1} Σ_{i=1}^N Ui² − σ∞² | > 3ε )
  ≤ lim_{n→∞} P( |N^{-1} Σ_{i=1}^N (Vin − EVin)| + |EV1n − σ∞²| + |N^{-1} Σ_{i=1}^N Win| > 3ε )
  ≤ lim_{n→∞} P( |N^{-1} Σ_{i=1}^N (Vin − EVin)| > ε ) + lim_{n→∞} P( |N^{-1} Σ_{i=1}^N Win| > ε )
  ≤ lim_{n→∞} ε^{-2} E| N^{-1} Σ_{i=1}^N (Vin − EVin) |² + lim_{n→∞} ε^{-1} E|W11|
  = 0.
This proves Theorem 17.4.1.    □

17.4.2 Consistency of MBB cdf estimators
The main result of this section is the following:
Theorem 17.4.3: Let {Xi}i∈Z be a sequence of stationary random variables. Suppose that there exists a δ ∈ (0, ∞) such that E|X1|^{2+δ} < ∞ and Σ_{n=1}^∞ α(n)^{δ/(2+δ)} < ∞. Also, suppose that σ∞² = Σ_{i∈Z} Cov(X1, X1+i) ∈ (0, ∞) and that ℓ^{-1} + n^{-1} ℓ = o(1) as n → ∞. Then,
sup_{x∈R} |P∗(Tn∗ ≤ x) − P(Tn ≤ x)| −→p 0 as n → ∞,    (4.11)
where Tn = √n(X̄n − µ) and Tn∗ = √n(X̄n∗ − µ̃n) is the MBB version of Tn based on blocks of size ℓ.
Theorem 17.4.3 shows that, like the MBB variance estimators, the MBB distribution function estimator is consistent for a wide range of values of the block length parameter ℓ. Indeed, the conditions on ℓ presented in both Theorems 17.4.1 and 17.4.3 are also necessary for consistency of these bootstrap estimators. If ℓ remains bounded, then the block bootstrap methods fail to capture the dependence structure of the original data sequence and converge to a wrong normal limit, as was noted in Corollary 17.2.3. On the other hand, if ℓ goes to infinity at a rate comparable with the sample size n (violating the condition n^{-1} ℓ = o(1) as n → ∞), then it can be shown (cf. Lahiri (2001)) that the MBB estimators converge to certain random probability measures.
Proof: Since Tn converges in distribution to N(0, σ∞²), it is enough to show that
sup_x |P∗(Tn∗ ≤ x) − Φ(x/σ∞)| −→p 0 as n → ∞,    (4.12)
where Φ(·) denotes the cdf of the standard normal distribution. Let Ui∗ = (X∗_{(i−1)ℓ+1} + · · · + X∗_{iℓ})/ℓ, 1 ≤ i ≤ b, denote the average of the ith resampled MBB block. Also, let an = (n/ℓ)^{1/4}, n ≥ 1, and let
∆̂n(a) = ℓ b^{-1} Σ_{i=1}^b E∗ |Ui∗ − µ̃n|² I( √ℓ |Ui∗ − µ̃n| > 2a ),   a > 0.
Note that conditional on X1, . . . , Xn, U1∗, . . . , Ub∗ are iid random vectors with
P∗(U1∗ = Uj) = 1/N   for j = 1, . . . , N,    (4.13)
where N = n − ℓ + 1 and Uj = (Xj + · · · + Xj+ℓ−1)/ℓ, 1 ≤ j ≤ N. Hence, by (4.8) and the EDCT, for any ε > 0,
P( ∆̂n(an) > ε ) ≤ ε^{-1} E ∆̂n(an)
  = ε^{-1} E[ ℓ E∗ |U1∗ − µ̃n|² I( √ℓ |U1∗ − µ̃n| > 2an ) ]
  = ε^{-1} E[ |U11 − √ℓ µ̃n|² I( |U11 − √ℓ µ̃n| > 2an ) ]
  ≤ 4 ε^{-1} [ E|U11|² I(|U11| > an) + ℓ E|µ̃n|² ] → 0 as n → ∞.
Thus,
∆̂n(an) −→p 0 as n → ∞.    (4.14)
To prove (4.12), it is enough to show that for any subsequence {nk}k≥1, there is a further subsequence {nki}i≥1 ⊂ {nk}k≥1 (for notational simplicity, nki is written as nki) such that
lim_{i→∞} sup_x |P∗(T∗nki ≤ x) − Φ(x/σ∞)| = 0 a.s.    (4.15)
Fix a subsequence {nk}k≥1. Then, by (4.14) and Theorem 17.4.1, there exists a subsequence {nki}i≥1 of {nk}k≥1 such that as i → ∞,
Var∗(T∗nki) → σ∞² a.s.  and  ∆̂nki(anki) → 0 a.s.    (4.16)
Note that Tn∗ = Σ_{i=1}^b (Ui∗ − µ̃n) √(ℓ/b) is a sum of conditionally iid random vectors (U1∗ − µ̃n)√(ℓ/b), . . . , (Ub∗ − µ̃n)√(ℓ/b), which, by (4.16), satisfy Lindeberg's condition along the subsequence nki, a.s. Hence, by the CLT for independent random vectors, the conditional distribution of T∗nki converges to N(0, σ∞²) as i → ∞, a.s. Hence, by Polyā's theorem (cf. Chapter 8), (4.15) follows.    □
17.4.3 Second order properties of the MBB
Under appropriate regularity conditions, second order correctness (s.o.c.)
(cf. Section 17.1.3) of the MBB is known for the normalized and studentized
sample mean. As in the independent case, the proof is based on Edgeworth
expansions for the given pivotal quantities and their block bootstrap versions. Derivation of the Edgeworth expansion under dependence is rather
complicated. In a seminal work, Götze and Hipp (1983) developed some
conditioning argument and established an asymptotic expansion for the
normalized sum of weakly dependent random vectors. S.o.c. of the MBB
can be established under a similar set of regularity conditions, stated next.
Let {Xi }i∈Z be a sequence of stationary random variables on a probability space (Ω, F, P ) and let {Dj : j ∈ Z} be a collection of sub-σ-algebras
of F. For −∞ ≤ a ≤ b ≤ ∞, let D_a^b = σ⟨∪{Dj : j ∈ Z, a ≤ j ≤ b}⟩. The
following conditions will be used:
(C.1) σ∞² ≡ lim_{n→∞} n^{-1} Var( Σ_{i=1}^n Xi ) ∈ (0, ∞).
(C.2) There exists δ ∈ (0, 1) such that for all n, m = 1, 2, . . . with m > δ^{-1}, there exists a D_{n−m}^{n+m}-measurable random vector X‡n,m satisfying
E|Xn − X‡n,m| ≤ δ^{-1} exp(−δm).
(C.3) There exists δ ∈ (0, 1) such that for all i ∈ Z, m ∈ N, A ∈ D_{−∞}^{i}, and B ∈ D_{i+m}^{∞},
|P(A ∩ B) − P(A)P(B)| ≤ δ^{-1} exp(−δm).
(C.4) There exists δ ∈ (0, 1) such that for all m, n, k = 1, 2, . . ., and A ∈ D_{n−k}^{n+k},
E|P(A | Dj : j ≠ n) − P(A | Dj : 0 < |j − n| ≤ m + k)| ≤ δ^{-1} exp(−δm).
(C.5) There exists δ ∈ (0, 1) such that for all m, n = 1, 2, . . . with δ^{-1} < m < n and for all t ∈ R with |t| ≥ δ,
E|E( exp(ιt · [Xn−m + · · · + Xn+m]) | Dj : j ≠ n )| ≤ exp(−δ).
Condition (C.3) is a strong-mixing condition on the underlying auxiliary sequence of σ-algebras Dj's; it requires the σ-algebras Dj's to be strongly mixing at an exponential rate. For Edgeworth expansions for the normalized sample mean under polynomial mixing rates, see Lahiri (1996). Condition (C.2) connects the strong mixing condition on the σ-fields Dj's to the weak-dependence structure of the random variables Xj's. If, for all j ∈ Z, one sets Dj = σ⟨Xj⟩, the σ-field generated by Xj, then Condition (C.2) is trivially satisfied with X‡n,m = Xn for all m. However, this choice of Dj is not always the most useful one for the verification of the rest of the conditions.
Condition (C.4) is an approximate Markov-type property, which trivially holds if Xj is Dj-measurable and {Xi}i∈Z is itself a Markov chain of a finite order. Finally, (C.5) is a version of the Cramer condition in the weakly dependent case. Note that if the Xj's are iid and the σ-algebras Dj's are chosen as Dj = σ⟨Xj⟩, j ∈ Z, then Condition (C.5) is equivalent to requiring that for some δ ∈ (0, 1),
1 > e^{−δ} ≥ E|E{ exp(ιt · Xn) | Xj : j ≠ n }| = |E exp(ιt · X1)| for all |t| ≥ δ,    (4.17)
which, in turn, is equivalent to the standard Cramer condition
lim sup_{|t|→∞} |E exp(ιt · X1)| < 1.    (4.18)
However, for weakly dependent stationary Xj's, the standard Cramer condition on the marginal distribution of X1 is not enough to ensure a smooth Edgeworth expansion for the normalized sample mean (cf. Götze and Hipp (1983)).
Conditions (C.2)–(C.5) have been verified for different classes of dependent processes, such as (i) linear processes with iid innovations, (ii) smooth functions of Gaussian processes, and (iii) Markov processes, etc. See Götze and Hipp (1983) and Lahiri (2003), Chapter 6, for more details.
The next theorem shows that the MBB is s.o.c. for a range of block sizes under some moment condition and under the regularity conditions listed above.
Theorem 17.4.4: Let {Xi}i∈Z be a collection of stationary random variables satisfying Conditions (C.1)–(C.5). Let R̃n = √n(X̄n − µ)/σ∞ and let R̃n∗ be the MBB version of R̃n based on blocks of length ℓ, where µ = EX1 and σ∞² is as in (C.1). Suppose that for some δ ∈ (0, 1/3), E|X1|^{35+δ} < ∞ and
δ n^δ ≤ ℓ ≤ δ^{-1} n^{1/3} for all n ≥ δ^{-1}.    (4.19)
Then
sup_x |P∗(R̃n∗ ≤ x) − P(R̃n ≤ x)| = Op( n^{-1} ℓ + n^{-1/2} ℓ^{-1} ).    (4.20)
Proof: See Theorem 6.7 (b) of Lahiri (2003).
The moment condition can be reduced considerably if the error bound on the right side of (4.20) is replaced by o(n^{-1/2}) only, and if the range of ℓ values in (4.19) is restricted further (cf. Lahiri (1991)).
Remark 17.4.1: It is worth mentioning that the error bound on the right side of (4.20) is optimal. Indeed, for mean-squared-error (MSE) optimal estimation of the probability P(R̃n ≤ x), x ≠ 0, the optimal block size is of the form ℓ0 ∼ C1 n^{1/4} for some constant C1 depending on the joint distribution of the Xi's (cf. Hall, Horowitz and Jing (1995)). This rate can also be deduced by minimizing the expression on the right side of (4.20) as a function of ℓ. On the other hand, the MSE-optimal block size for estimating variance-type functionals is of the form C1 n^{1/3}. See Chapter 7, Lahiri (2003), for more details.
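The n^{1/4} rate quoted in the remark can be checked directly by minimizing the bound in (4.20) over ℓ. The sketch below is illustrative only (not from the text): it minimizes f(ℓ) = ℓ/n + 1/(√n ℓ) over integer block lengths and confirms that the minimizer tracks n^{1/4}.

```python
import numpy as np

# Illustrative sketch (not from the text): minimize the error bound of (4.20),
# f(ell) = ell/n + 1/(sqrt(n)*ell), over integer block lengths ell.
for n in (10**3, 10**4, 10**5, 10**6):
    ell = np.arange(1, int(n ** 0.5))
    bound = ell / n + 1.0 / (np.sqrt(n) * ell)
    ell_opt = ell[np.argmin(bound)]
    print(f"n = {n:>7d}:  argmin ell = {ell_opt:4d},   n^(1/4) = {n ** 0.25:6.1f}")
# The minimizer grows like n^{1/4}, matching the MSE-optimal rate cited above.
```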
17.5 Problems
17.1 (a) Apply the Lindeberg-Feller CLT to the triangular array
(X1∗ , X2∗ , . . . , Xn∗ )n≥1 given a realization of {Xj }j≥1 to give a
direct proof of Theorem 17.1.1.
(b) Show that under the condition of Theorem 17.1.1, (1.5) holds iff
for each x ∈ R, P (R̃n ≤ x) − P∗ (R̃n∗ ≤ x) → 0 as n → ∞, a.s.
(Hint: Use Polyā’s theorem (cf. Chapter 8).)
17.2 Suppose that {Xn}n≥1 is a sequence of iid random variables with EX1² < ∞. Let R̃n = √n(X̄n − µ)/σ and R̃∗m,n = √m(X̄∗m − X̄n)/sn be as in (1.4). Show that for any sequence mn → ∞,
sup_x |P(R̃n ≤ x) − P∗(R̃∗mn,n ≤ x)| → 0 as n → ∞, a.s.
17.3 Suppose that {Xn}n≥1 is a sequence of iid random variables with µ = EX1, σ² = Var(X1) ∈ (0, ∞). Let F̂n be an estimator of F such that µ̂n = ∫ x dF̂n(x) ∈ R and σ̂n² = ∫ x² dF̂n(x) − (µ̂n)² ∈ (0, ∞). Let Rn = √n(X̄n − µ)/σ and Tn = √n(X̄n − µ), and let R∗n,n and Tn∗ = √n(X̄∗n − µ̂n) be the bootstrap versions of Rn and Tn, respectively, based on a resample of size m = n from F̂n.
(a) Suppose that for some δ ∈ (0, 1/2),
E∗|X1∗ − µ̂n|² I(|X1∗ − µ̂n| > n^δ σ̂n)/σ̂n² = o(1) a.s.    (5.1)
Show that
sup_x |P(Rn ≤ x) − P∗(Rn∗ ≤ x)| → 0 as n → ∞, a.s.
(b) Show that if F̃n(·) ≡ F̂n(· − µ̂n) −→d F̃ = F(· − µ) a.s. and σ̂n² → σ² a.s., then
sup_x |P(Tn ≤ x) − P∗(Tn∗ ≤ x)| → 0 as n → ∞, a.s.
(Hint: First verify (5.1).)
17.4 Show that under the conditions of Theorem 17.1.3, (1.15) holds.
(Hint: Set xn = jn h/(sn√n) and ηn = ⟨nx0/h⟩, where jn ∈ Z is to be chosen. Check that ⟨√n σ xn/h − ηn⟩ = ⟨jn(σ/sn − 1) − ηn⟩. By the LIL, (σ/sn − 1) > an i.o., a.s., where an = n^{-1/2}(log log n)^{1/4}. Now choose any jn ∈ (σ/sn − 1)^{-1}(1 − δ + ηn, 1 + ηn) ∩ Z.)
17.5 Let {Xn}n≥1 be a sequence of iid random variables with X1 ∼ N(θ, 1), θ ∈ R. For n ≥ 1, let θ̂n = F_n^{-1}(1/2), the sample median of X1, . . . , Xn, X̄n = n^{-1} Σ_{i=1}^n Xi, the sample mean of X1, . . . , Xn, and Yn² = max{1, n^{-1} Σ_{i=1}^n (Xi − X̄n)²}, where Fn is the empirical cdf of X1, . . . , Xn. Suppose that X1∗, . . . , Xn∗ are (conditionally) iid with X1∗ ∼ N(θ̂n, Yn²) and let X̄n∗ = n^{-1} Σ_{i=1}^n Xi∗ be the (parametric) bootstrap sample mean. Define
Tn = √n(X̄n − θ),  Tn∗ = √n(X̄n∗ − θ̂n)  and  Tn∗∗ = √n(X̄n∗ − X̄n).
(a) Show that
lim_{n→∞} sup_x |P(Tn ≤ x) − P∗(Tn∗ ≤ x)| = 0 a.s.
(b) Show that
lim_{n→∞} sup_x |P∗(Tn∗∗ ≤ x) − Φ(Yn^{-1}[x − Wn])| = 0 a.s.
for some random variables Wn such that Wn −→d W.
(c) Find the distribution of W in part (b).
(Hint: Use the Bahadur representation (cf. Bahadur (1966)) for sample quantiles.)
(d) Show that Yn² −→p 1 as n → ∞.
(e) Conclude that there exists ε > 0 such that
lim_{n→∞} P( sup_{x∈R} |P(Tn ≤ x) − P∗(Tn∗∗ ≤ x)| > ε ) > ε.
(Thus, Tn∗∗ is an incorrect parametric bootstrap version of Tn.)
17.6 Let {Xn }n≥1 be a sequence of iid random variables with E(X1 ) = µ,
Var(X1 ) = σ 2 ∈ (0, ∞) and E|X1 |4 < ∞. Let X1∗ , . . . , Xn∗ denote the
nonparametric bootstrap sample drawn randomly with replacement from X1, . . . , Xn and X̄n∗ = n^{-1} Σ_{i=1}^n Xi∗. Let
Tn = √n(X̄n − µ) and Tn∗ = √n(X̄n∗ − X̄n).
Show that there exists a constant K ∈ (0, ∞) such that
lim sup_{n→∞} √(n/ log log n) · sup_{x∈R} |P(Tn ≤ x) − P∗(Tn∗ ≤ x)| = K a.s.
a.s.
Find K.
(This shows that the bootstrap approximation for Tn is less accurate
than that for the normalized sample mean.)
17.7 (a) Let F and G be two cdfs on R and let F be continuous. Show
that
J
J
J
J
sup JF (x) − G(x)J = sup JF (x−) − G(x−)J.
x
x
(b) Let {Xn }n≥1 be a sequence of iid random variables with EX1 =
√
µ, Var(X1 ) = σ 2 ∈ (0, ∞) and E|X1 |3 < ∞. Let R̃n = n(X̄n −
µ)/σ, and Rn∗ be its bootstrap version based on a resample of
size n from the edf Fn .
(i) Show that
J
J
sup JP (R̃n < x) − Φ(x)J = o(n−1/2 )
as
x
n→∞
where Φ(·) denotes the cdf of the N (0, 1) distribution.
(ii) Show that if the distribution of X1 is nonlattice, then
J
J
sup JP (R̃n < x) − P∗ (Rn∗ < x)J = o(n−1/2 ) a.s.
x
17.8 Let R∗m,n be the bootstrap version of a random variable Rn, n ≥ 1, and let Rn −→d R∞, where R∞ has a continuous distribution on R. Let φ̂m,n(t) = E∗ exp(ιt R∗m,n) and φn(t) = E exp(ιt Rn), 1 ≤ n ≤ ∞.
(a) Show that if
sup_x |P∗(R∗m,n ≤ x) − P(Rn ≤ x)| −→p 0 as n → ∞,
then for every t ∈ R, φ̂m,n(t) −→p φ∞(t) as n → ∞.
(b) Suppose that there exists a sequence {hn}n≥1 such that
sup_{t∈R} |φ̂m,n(t) − φ̂m,n(t + hn)| −→p 0 as n → ∞.    (5.2)
Then, the converse to (a) holds.
(c) Let Wα and T∗m,n be as in (1.23) and (1.27), respectively. Show that (1.40) implies (1.39).
(Hint: (b) Let Yn(t) ≡ |φ̂m,n(t) − φ∞(t)|, t ∈ R, and let Q = {q1, q2, . . .} be an enumeration of the rationals in R. Given a subsequence {n0j}j≥1, extract a subsequence {n1j}j≥1 ⊂ {n0j}j≥1 such that Yn1j(q1) → 0 as j → ∞, a.s. Next extract {n2j} ⊂ {n1j} such that Yn2j(q2) → 0 as j → ∞, a.s., and continue this for each qk ∈ Q. Let nj ≡ njj, j ≥ 1. Then, there exists a set A with P(A) = 1, and on A,
Ynj(qk) → 0 as j → ∞ for all qk ∈ Q.
Now use (5.2) to show that w.p. 1, Yn′j(t) → 0 as j → ∞ for all t ∈ R, for some {n′j} ⊂ {nj}.
(c) Use the inequality
sup_t |E exp(ιtX) − E exp(ι(t + h)X)| ≤ h E|X|
and the fact that under (1.21), |X(n)| + |X(1)| = O(n^β) as n → ∞, a.s., for any β > 1/α.)
17.9 Verify (1.45) in the proof of Theorem 17.1.4.
17.10 (Athreya (1987a)). Let {Xn}n≥1 be a sequence of iid random variables satisfying (1.21) and Tn be as in (1.23). Let X1∗, . . . , Xn∗ be conditionally iid random variables with cdf Fn of (1.1). Define an alternative bootstrap version of Tn as
T̃∗n,n = n(X̄n∗ − X̄n)/X(n),
where X̄n∗ = n^{-1} Σ_{i=1}^n Xi∗, X̄n = n^{-1} Σ_{i=1}^n Xi and X(n) = max{X1, . . . , Xn}. (Here the scaling constants an are replaced by the bootstrap analog of (1.26), i.e., by n(1 − Fn(ân)) → 1, which yields ân = X(n).) Let
φ̃n(t) = E∗ exp(ιt T̃∗n,n), t ∈ R.
Show that φ̃n(t) converges in distribution to a random limit as n → ∞. Identify the limit.
17.11 Suppose that {Xn}n≥1 satisfies the conditions of Theorem 17.1.4 and that T∗m,n is as in (1.27). Show that for any t1, . . . , tk ∈ R, k ∈ N,
(φ̂n(t1), . . . , φ̂n(tk)) −→d ( exp( ∫ h(t1, x) dM(x) ), . . . , exp( ∫ h(tk, x) dM(x) ) ),
where φ̂n(t) ≡ E∗ exp(ιt T∗n,n), t ∈ R.
17.12 Show that (4.17) and (4.18) are equivalent.
17.13 Let {Xi}i∈Z be a sequence of stationary random variables. Let Tn = √n(X̄n − µ) and Tn∗ = √n(X̄n∗ − µ̃n), where X̄n∗ is the MBB sample mean based on blocks of length ℓ, where ℓ^{-1} + n^{-1} ℓ = o(1) as n → ∞. Suppose that X1 has finite moments of all order and that {Xi} is m-dependent for some m ∈ Z+. Assume that the Berry-Esseen theorem holds for X̄n.
(a) Show that
∆n(ℓ) ≡ sup_x |P(Tn ≤ x) − P∗(Tn∗ ≤ x)| = Op( (n^{-1} ℓ)^{1/2} + ℓ^{-1} ).
(b) Find the limit distribution of n^{1/3} ∆n(ℓ) for ℓ = ⌊n^{1/3}⌋, where ∆n(ℓ) is as in part (a).
17.14 Suppose that {Xi}i∈Z, X̄n∗, ℓ, etc. are as in Problem 17.13. Let Rn = √n(X̄n − µ)/σn and Rn∗∗ = √n(X̄n∗ − X̄n)/σ̂n(ℓ), where σn² = n Var(X̄n) and σ̂n²(ℓ) = n Var∗(X̄n∗).
(a) Show that
∆̃n(ℓ) ≡ sup_x |P(Rn ≤ x) − P∗(Rn∗∗ ≤ x)| = Op( n^{-1/2} ℓ^{1/2} ).
(b) Find the limit distribution of n^{1/2} ℓ^{-1/2} ∆̃n(ℓ).
17.15 Let {Xi}i∈Z, X̄n∗, ℓ be as in Problem 17.13.
(a) Find the leading terms in the expansion of MSE(σ̂n²(ℓ)) ≡ E(σ̂n²(ℓ) − σn²)², where σ̂n²(ℓ) = n Var∗(X̄n∗) and σn² = n Var(X̄n).
(b) Find the MSE-optimal block size for estimating σn².
17.16 Let {Xi}i∈Z, Tn, Tn∗, ℓ be as in Problem 17.13. For α ∈ (0, 1), let tα,n = α-quantile of Tn = inf{x : x ∈ R, P(Tn ≤ x) ≥ α} and let t̂α,n(ℓ) be the α-quantile of Tn∗ = inf{x : x ∈ R, P∗(Tn∗ ≤ x) ≥ α}.
(a) Show that
[t̂α,n(ℓ) − tα,n] −→p 0 as n → ∞.
(b) Suppose that ℓ = ⌊n^{1/3}⌋ and write t̃α,n = t̂α,n(⌊n^{1/3}⌋). Find a sequence {an}n≥1 such that
an( t̃α,n − tα,n ) −→d Z
for some nondegenerate random variable Z. Identify the distribution of Z.
18
Branching Processes
The study of the growth and development of many species of animals,
plants, and other organisms over time may be approached in the following
manner. At some point in time, a set of individuals called ancestors (or
the zeroth generation) is identified. These produce offspring, and the collection of offspring of all the ancestors constitutes the first generation. The
offspring of these first generation individuals constitute the second generation and so on. If one specifies the rules by which the offspring production
takes place, then one could study the behavior of the long-time evolution
of such a process, called a branching process. Questions of interest are the
long-term survival or extinction of such a process, the growth rate of such
a population, fluctuation of population sizes, effects of control mechanisms,
etc. In this chapter, several simple mathematical models and some results
for these will be discussed. Many of the assumptions made, especially the
one about independence of lines of descent, are somewhat idealistic and
unrealistic. Nevertheless, the models do prove useful in answering some
general questions, since many results about long-term behavior stay valid
even when the model deviates somewhat from the basic assumptions.
It is worth noting that the models described below are relevant and applicable not only to population dynamics as mentioned above but also to
any evolution that has a tree-like structure, such as the process of electron
multiplication, gamma ray radiation, growth of sentences in context-free
grammars, algorithmic steps, etc. Thus, the theory of branching processes
has found applications in cosmic ray showers, data structures, combinatorics, and molecular biology, especially DNA sequencing. Some references
to these applications may be found in Athreya and Jagers (1997).
18.1 Bienaymé-Galton-Watson branching process
One of the earliest and simplest models is the Bienaymé-Galton-Watson (BGW) branching process. In this model, it is assumed that the offspring of
each individual is a random variable with a probability distribution {pj}j≥0 and that the offspring of different individuals are stochastically independent. Thus, if Z0 is the size of the zeroth generation, then Z1 = Σ_{i=1}^{Z0} ζ0,i, where {ζ0,i : i = 1, 2, . . .} are independent random variables with the same distribution {pj}j≥0 and also independent of Z0. Here ζ0,i is to be thought of as the number of offspring of the ith individual in the zeroth generation. Similarly, if Zn denotes the nth generation population, then Zn+1 = Σ_{i=1}^{Zn} ζni, where {ζni : i = 1, 2, . . .} are iid random variables with distribution {pj}j≥0 and also independent of Z0, Z1, . . . , Zn. This implies
that the lines of descent initiated by different individuals of a given generation evolve independently of each other (Problem 18.3). This may not be very realistic when there is competition for limited resources such as space and food in the habitat. The long-term behavior of the sequence {Zn}_{0}^{∞} is crucially dependent on the parameter m ≡ Σ_{j=0}^∞ j pj, the mean offspring size. The BGW branching process {Zn}_{0}^{∞} with offspring distribution {pj}j≥0 is called supercritical, critical, and subcritical according as m > 1, = 1, or < 1, respectively. In what follows, the case when pi = 1 for some i is excluded. The main results are the following; see Athreya and Ney (2004) for details and full proofs.
Theorem 18.1.1: (Extinction probability).
(a) m ≤ 1 ⇒ Zn = 0 for all large n with probability one (w.p. 1).
(b) m > 1 ⇒ P(Zn → ∞ as n → ∞ | Z0 = 1) = 1 − q > 0, where q ≡ P(Zn = 0 for all large n | Z0 = 1) is the smallest root in [0,1] of
s = f(s) ≡ Σ_{j=0}^∞ pj s^j.    (1.1)
Theorem 18.1.1 says that if m ≤ 1, then the process will be extinct in finite time w.p. 1, whereas if m > 1, then the extinction probability q (with Z0 = 1) is strictly less than one, and when the process does not become extinct, it grows to infinity.
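As an illustration of the m ≤ 1 versus m > 1 dichotomy, the following sketch (not from the text) simulates a BGW process with a Poisson(m) offspring law, an arbitrary choice made only for the example, and estimates the extinction probability empirically. For m ≤ 1 the estimate tends to 1 (slowly in the critical case), and for m > 1 it approaches the root q of (1.1).

```python
import numpy as np

# Illustrative sketch (not from the text): simulate a BGW process with a
# Poisson(m) offspring distribution and estimate P(extinction | Z_0 = 1).
rng = np.random.default_rng(0)

def simulate_extinct(m, generations=200, z0=1, cap=10**6):
    z = z0
    for _ in range(generations):
        if z == 0:
            return True            # extinct
        if z > cap:
            return False           # effectively exploded (supercritical path)
        z = rng.poisson(m, size=z).sum()   # next generation size
    return z == 0

for m in (0.8, 1.0, 1.5):
    ext = np.mean([simulate_extinct(m) for _ in range(2000)])
    print(f"m = {m}:  estimated extinction probability = {ext:.3f}")
```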
Proof: Since {Zn}n≥0 is a Markov chain with state space Z+ ≡ {0, 1, 2, 3, . . .} and 0 is an absorbing barrier, all nonzero states are transient. Thus P(Zn = 0 for some n ≥ 1) + P(Zn → ∞ as n → ∞) = 1. Also,
q ≡ P(Zn = 0 for some n ≥ 1 | Z0 = 1) = lim_{n→∞} P(Zn = 0 | Z0 = 1) = lim_{n→∞} fn(0),
where fn(s) ≡ E(s^{Zn} | Z0 = 1), 0 ≤ s ≤ 1.
But by the definition of {Zn},
fn+1(s) = E(s^{Zn+1} | Z0 = 1) = E( E(s^{Zn+1} | Zn) | Z0 = 1 ) = E( (f(s))^{Zn} | Z0 = 1 ) = fn(f(s)).
Iterating this yields
fn(s) = f^{(n)}(s), n ≥ 0,
where f^{(n)}(s) is the nth iterate of f(s).
By continuity of f(s) in [0,1],
q = lim_{n→∞} fn(0) = lim_{n→∞} f(fn(0)) = f(q).
If q′ is any other solution of (1.1) in [0,1], then q′ = f^{(n)}(q′) ≥ f^{(n)}(0) for each n and hence q′ ≥ lim_{n→∞} f^{(n)}(0) = q.
This establishes (1.1). It is not difficult to show that by the convexity of the function f(s) on [0,1], q = 1 if m ≡ f′(1) ≤ 1 and p0 ≤ q < 1 if m > 1.    □
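The identity q = lim_{n→∞} f^{(n)}(0) used in the proof also gives a practical way to compute q numerically. A short sketch (not from the text), again assuming a Poisson(m) offspring law purely as an example:

```python
import math

# Illustrative sketch (not from the text): compute the extinction probability
# q = lim_n f^(n)(0) by iterating the offspring pgf, here f(s) = exp(m(s - 1))
# for a Poisson(m) offspring distribution.
def extinction_probability(f, tol=1e-12, max_iter=10**6):
    s = 0.0
    for _ in range(max_iter):
        s_new = f(s)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s

for m in (0.8, 1.0, 1.5, 2.0):
    f = lambda s, m=m: math.exp(m * (s - 1.0))   # Poisson(m) pgf
    print(f"m = {m}:  q = {extinction_probability(f):.6f}")
# q = 1 for m <= 1 (convergence is slow in the critical case m = 1);
# for m > 1, q < 1 is the smallest root of s = f(s) in [0, 1].
```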
The following results are refinements of Theorem 18.1.1.
Theorem 18.1.2: (Supercritical case). Let m > 1, Z0 = z0, 1 ≤ z0 < ∞, and Wn ≡ Zn/m^n. Then
(a) {Wn}n≥0 is a nonnegative martingale and hence converges w.p. 1 to a limit W.
(b) Σ_{j=1}^∞ (j log j) pj < ∞ ⇒ P(W = 0) = q^{z0}, EW = z0, and W has a strictly positive density on (0, ∞).
(c) Σ_{j=1}^∞ (j log j) pj = ∞ ⇒ P(W = 0) = 1.
(d) There always exists a sequence {Cn}_{0}^{∞} such that lim_{n→∞} Zn/Cn ≡ W̃ exists and is finite w.p. 1, P(W̃ = 0) = q and Cn+1/Cn → m.
This theorem says that in a supercritical BGW process, on the event of nonextinction, the process grows exponentially fast, confirming the assertion of the economist and demographer Malthus of the 19th century.
Proof: Since {Zn}n≥0 is a Markov chain,
E{Zn+1 | Z0, Z1, . . . , Zn} = E{Zn+1 | Zn} = Zn m,
implying E(Wn+1 | W0, W1, . . . , Wn) = Wn, proving (a).    □
Parts (b) and (c) are known as the Kesten-Stigum theorem; for a full proof, see Athreya and Ney (2004), where a proof of part (d) is also given. For a weaker version of (b), see Problem 18.2.
Theorem 18.1.3: (Critical case). Let m = 1 and Z0 = z0, 1 ≤ z0 < ∞. Let 0 < σ² = Σ_{j=1}^∞ j² pj − 1 < ∞. Then
lim_{n→∞} P( Zn/n ≤ x | Zn ≠ 0 ) = 1 − exp(−2x/σ²).    (1.2)
Thus, in the critical case, conditioned on nonextinction by time n, the population in the nth generation is of order n, which when divided by n is distributed approximately as an exponential with mean σ²/2.
Theorem 18.1.4: (Subcritical case). Let m < 1 and Z0 = z0, 1 ≤ z0 < ∞. Then
lim_{n→∞} P(Zn = j | Zn > 0) = πj    (1.3)
exists for all j ≥ 1 and Σ_{j=1}^∞ πj = 1. Furthermore, Σ_{j=1}^∞ j πj < ∞ if and only if Σ_{j=1}^∞ (j log j) pj < ∞.
For proof of Theorems 18.1.3 and 18.1.4, see Athreya and Ney (2004).
In the supercritical case with p0 = 0, it is possible to estimate consistently
the mean m and the variance σ 2 of the offspring distribution from observing
the population size sequence {Zn }n≥0 , but the whole distribution {pj }j≥0
is not identifiable. However, if the entire tree is available, then {pj }j≥0 is
identifiable.
18.2 BGW process: Multitype case
The model discussed in the previous section has a natural generalization.
Consider a population with k types of individuals, 1 < k < ∞. Assume
that a type i individual can produce offspring of all types with a probability
distribution that may depend on i but is independent of past history as
well as the other individuals in the same generation. This ensures that the
lines of descent initiated by different individuals in a given generation are
independent, and those initiated by the individuals of the same type are iid
as well. The dichotomy of extinction or infinite growth continues to hold.
The analog of the key parameter m of !the single
" type case is the maximal
eigenvalue ρ of the mean matrix M ≡ (mij ) k×k , where mij is the mean
number of type j offspring of an individual of type i.
18.2 BGW process: Multitype case
565
!
"
Theorem 18.2.1: (Extinction probability). Let M ≡ (mij ) , the mean
matrix, be such that for some n0 > 0, M n0 >> 0 i.e., all the entries of
M n0 are strictly positive. Let ρ be the maximal eigenvalue of M . Then
(a) ρ ≤ 1 ⇒ P (the population is extinct in finite time) = 1, for any
initial condition.
(b) ρ > 1 ⇒ P (the population is extinct in finite time) = 1 − P (the
population grows to infinity) < 1 for all initial conditions other than
0 ancestors.
Furthermore, if qi = P (extinction starting with one ancestor of type i),
then the vector q = (q1 , q2 , . . . , qk ) is the smallest root in [0, 1]k of the
equation s = f (s), where s = (s1 , s2 , . . . , sk ) and f (s) = [f1 (s), . . . , fk (s)],
with fk (s) being the generating function of the offspring distribution of a
type i individual.
The following refinements of the above Theorem 18.2.1 are the analogs of
Theorems 18.1.2, 18.1.3, and 18.1.4. It is known that for the maximal eigenvalue ρ of M , there exist strictly positive eigenvectors u = (u1 , u2 , . . . , uk )
and v = (v1 , v2 , . . . , vk ) such that uM = ρu and M v T = ρv T (where su$k
perscript T stands for transpose) normalized to satisfy i=1 ui = 1 and
$k
i=1 ui vi = 1. Let Zn = (Zn1 , Zn2 , . . . , Znk ) denote the vector of population sizes of individuals of the k types in the nth generation.
Theorem 18.2.2: (Supercritical case). Let 1 < ρ < ∞. Let
Wn = (v · Zn)/ρ^n = ( Σ_{i=1}^k vi Zni ) / ρ^n.    (2.1)
Then {Wn} is a nonnegative martingale and hence converges w.p. 1 to a limit W, and Zn/(v · Zn) → u on the event of nonextinction.
This theorem says that when the process does not die out it does go to ∞, and the sizes of individuals of different types grow at the same rate and are aligned in the direction of the vector u. There are also appropriate analogs of Theorem 18.1.2 (b), (c), and (d).
If ζ is a k-vector such that ζ · v ≠ 0, then the sequence Yn ≡ (ζ · Zn)/ρ^n converges w.p. 1 to (ζ · u)W. If ζ · v = 0, then ρ^n is not the right normalization for ζ · Zn. The problem of the correct normalization for such vectors as well as the proofs of Theorems 18.2.2–18.2.4 are discussed in Chapter 5 of Athreya and Ney (2004).
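The quantities ρ, u, and v can be computed numerically from the mean matrix. The sketch below is illustrative and not from the text; it uses an arbitrary hypothetical 2 × 2 mean matrix M and normalizes the eigenvectors as above (Σ ui = 1, Σ ui vi = 1).

```python
import numpy as np

# Illustrative sketch (not from the text): maximal eigenvalue rho and the
# normalized eigenvectors u, v of a hypothetical mean matrix M, with
# u M = rho u, M v^T = rho v^T, sum(u) = 1 and sum(u*v) = 1.
M = np.array([[1.1, 0.4],
              [0.3, 0.9]])                 # example mean matrix, entries > 0

eigvals, right = np.linalg.eig(M)          # columns of `right` are right eigenvectors
i = int(np.argmax(eigvals.real))
rho = eigvals.real[i]
v = right[:, i].real                       # right eigenvector: M v = rho v

eigvals_l, left = np.linalg.eig(M.T)       # eigenvectors of M^T give left eigenvectors of M
j = int(np.argmax(eigvals_l.real))
u = left[:, j].real                        # left eigenvector: u M = rho u

u = u / u.sum()                            # normalize sum_i u_i = 1
v = v / (u @ v)                            # normalize sum_i u_i v_i = 1

print("rho =", rho)
print("u   =", u, " (u M = rho u:", np.allclose(u @ M, rho * u), ")")
print("v   =", v, " (M v = rho v:", np.allclose(M @ v, rho * v), ")")
```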
Theorem 18.2.3: (Critical case). Let Z0 = z0 ≠ 0 and ρ = 1. Let all the offspring distributions possess finite second moments. Then
lim_{n→∞} P( (v · Zn)/n ≤ x | Zn ≠ 0 ) = 1 − e^{−λx}    (2.2)
for some 0 < λ < ∞ and
lim_{n→∞} P( ‖ Zn/(v · Zn) − u ‖ > ε | Zn ≠ 0 ) = 0    (2.3)
for each ε > 0, where ‖ · ‖ is the Euclidean distance.
Theorem 18.2.4: (Subcritical case). Let ρ < 1 and Z0 = z0 ≠ 0. Then
lim_{n→∞} P(Zn = j | Zn ≠ 0) = πj    (2.4)
exists for all vectors j = (j1, j2, . . . , jk) ≠ 0 and Σ_j πj = 1.
18.3 Continuous time branching processes
The models discussed in the past two sections deal with branching processes
in which individuals live for exactly one unit of time and are replaced by a
random number of offspring. A model in which individuals live a random
length of time and then produce a random number of offspring is discussed
below.
Assume that each individual lives a random length of time with distribution function G(·) and then produces a random number of offspring with
distribution {pj }j≥0 and these two random quantities are independent and
further independently distributed over all individuals in the population.
Let Z(t) denote the population size and Y(t) be the set of ages of all individuals alive at time t. Assume G(0) = 0 and m ≡ Σ_{j=1}^∞ j pj < ∞. The
offspring mean m continues to be the key parameter.
When G(·) is exponential, the process {Z(t) : t ≥ 0} is a continuous time
Markov chain. In the general case, the vector process {(Z(t), Y (t)) : t ≥ 0}
is a Markov process.
Theorem 18.3.1: (Extinction probability). Let q = P [Z(t) = 0 for some
t > 0] when Z(0) = 1 and Y (0) = {0}. Then m ≤ 1 ⇒ q = 1 and
m > 1 ⇒ q < 1 and P [Z(t) → ∞] = 1 − P [Z(t) = 0 for some t > 0].
The following refinements of the above Theorem 18.3.1 are analogs of
Theorems 18.1.2–18.1.4.
When m > 1, the effect of random lifetimes is expressed through the Malthusian parameter α defined by
m ∫_{(0,∞)} e^{−αt} dG(t) = 1.    (3.1)
The reproductive age value V(x) of an individual of age x is
V(x) ≡ m ( ∫_{[x,∞)} e^{−αt} dG(t) ) [1 − G(x)]^{-1}.    (3.2)
If m(t) ≡ EZ(t) when one starts with one individual of age 0, then m(·) satisfies the integral equation
m(t) = 1 − G(t) + m ∫_{(0,t)} m(t − u) dG(u).    (3.3)
It can be shown using renewal theory (cf. Section 8.5) that m(t)e^{−αt} converges to a finite positive constant (Problem 18.6).
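Equation (3.1) usually has to be solved numerically for α. The sketch below is illustrative and not from the text; it assumes, purely as an example, exponential(1) lifetimes, for which m ∫ e^{−αt} dG(t) = m/(1 + α) and the exact root α = m − 1 is available for comparison, and finds the root by bisection.

```python
# Illustrative sketch (not from the text): solve the Malthusian equation
# m * integral exp(-alpha*t) dG(t) = 1 by bisection, here for G = Exp(1),
# whose Laplace transform is 1/(1 + alpha); the exact root is alpha = m - 1.
def laplace_exp1(alpha):
    return 1.0 / (1.0 + alpha)

def malthusian(m, laplace, lo=0.0, hi=100.0, tol=1e-12):
    # phi(alpha) = m * laplace(alpha) - 1 is decreasing in alpha; bisection root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if m * laplace(mid) - 1.0 > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

for m in (1.2, 2.0, 3.5):
    alpha = malthusian(m, laplace_exp1)
    print(f"m = {m}:  alpha = {alpha:.6f}   (exact m - 1 = {m - 1.0:.6f})")
```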
Theorem 18.3.2: (Supercritical case). Let m > 1. Then
(a) W(t) ≡ e^{−αt} Σ_{i=1}^{Z(t)} V(xi), where {x1, . . . , x_{Z(t)}} are the ages of the individuals alive at t, is a nonnegative martingale and converges w.p. 1 to a limit W, where V(·) is as in (3.2),
(b) Σ_{j=1}^∞ (j log j) pj = ∞ ⇒ W = 0 w.p. 1,
(c) Σ_{j=1}^∞ (j log j) pj < ∞ ⇒ P(W = 0 | Y0 = {0}) = q and E(W | Y0 = {x}) = V(x),
(d) on the event of nonextinction, w.p. 1 the empirical age distribution at time t, A(x, t) ≡ (number of individuals alive at t with age ≤ x)/(number of individuals alive at t), converges in distribution as t → ∞ to the steady-state age distribution
A(x) ≡ ( ∫_0^x e^{−αy}[1 − G(y)] dy ) / ( ∫_0^∞ e^{−αy}[1 − G(y)] dy ),    (3.4)
(e) Σ_{j=1}^∞ (j log j) pj < ∞ ⇒ Z(t)e^{−αt} converges w.p. 1 to W [ ∫_0^∞ V(x) dA(x) ]^{-1}, where W is as in (a).
Theorem 18.3.3: (Critical case). Let m = 1 and σ² = Σ_j (j − 1)² pj < ∞. Assume lim_{t→∞} t²(1 − G(t)) = 0 and ∫_0^∞ t dG(t) = µ. Then, for any initial Z0 ≠ 0,
(a) lim_t P[Z(t)/t ≤ x | Z(t) > 0] = 1 − exp[(−2µ/σ²)x], 0 < x < ∞.
(b) lim_t P[sup_x |A(x, t) − A(x)| > ε | Z(t) > 0] = 0 for any ε > 0, where A(·, t) and A(x) are as in Theorem 18.3.2 (d) with α = 0.
Theorem 18.3.4: (Subcritical case). Let m < 1. Then for any initial Z0 ≠ 0 and G(·) nonlattice (cf. Chapter 10),
lim_{t→∞} P[Z(t) = j | Z(t) > 0] = πj    (3.5)
exists for all j ≥ 1 and Σ_{j=1}^∞ πj = 1.
18.4 Embedding of Urn schemes in continuous time branching processes
It turns out that many urn schemes can be embedded in continuous time
branching processes. The case of Polyā’s urn is discussed below.
Recall that Polyā’s urn scheme is the following. Let an urn have an initial
composition of R0 red and B0 black balls. A draw consists of taking a ball
at random from the urn, noting its color, and returning it to the urn with
one more ball of the color drawn. Let (Rn , Bn ) denote the composition after
n draws. Clearly, Rn + Bn = R0 + B0 + n for all n ≥ 0 and {Rn , Bn }n≥0
is a Markov chain.
Let {Zi (t) : t ≥ 0}, i = 1, 2 be two independent continuous time
branching processes with unit exponential life times and offspring distribution of binary splitting, i.e., p2 = 1 and Z1 (0) = R0 , Z2 (0) = B0 .
Let τ0 = 0 < τ1 < τ2 < . . . < τn < . . . denote the successive times of death of an individual in the combined population. Then the sequence (Z1(τn), Z2(τn))n≥0 has the same distribution as (Rn, Bn)n≥0.
To establish this claim, by the Markov property of (Z1(t), Z2(t))t≥0, it suffices to show that (Z1(τ1), Z2(τ1)) has the same distribution as (R1, B1). It is easy to show that if ηi, i = 1, 2, . . . , n, are independent exponential random variables with parameters λi, i = 1, 2, . . . , n, then η ≡ min{ηi : 1 ≤ i ≤ n} is an Exponential(Σ_{i=1}^n λi) random variable and P(η = ηi) = λi/(Σ_{j=1}^n λj) (Problem 18.9). This, in turn, leads to the fact that at time τ1, the probability that a split takes place in {Z1(t) : t ≥ 0} is Z1(0)/(Z1(0) + Z2(0)). At this split, the parent is lost but is replaced by two new individuals, resulting in a net addition of one more individual, establishing the claim.
The same reasoning yields the embedding of the following general urn
scheme. Let Xn = (Xn1 , . . . , Xnk ) be the vector of the composition of an
urn at time n where Xni is the number of balls of color i. Assume that
given (X0 , X1 , . . . , Xn ), Xn+1 is generated as follows.
Pick a ball at random from the urn. If it happens to be of color i, then
return it to the urn along with a random number ζij of balls of color
j = 1, 2, . . . , k where the joint distribution of ζi ≡ (ζi1 , ζi2 , . . . , ζik ) depends
on i, i = 1, 2, . . . , k. Now set, Xn+1 = Xn + ζi .
The embedding is done as follows. Consider a continuous time multitype
branching process {Z(t) : t ≥ 0} with Exponential (1) lifetimes and the
offspring distribution of the ith type is the same as that of ζ̃i ≡ ζi + δi, where ζi is as above and δi is the ith unit vector. For i = 1, 2, . . . , k, let {Zi(t) : t ≥ 0} be a branching process that evolves as {Z(t) : t ≥ 0} above but has initial size Zi(0) ≡ (0, 0, . . . , X0i, 0, . . . , 0). Let 0 = τ0 < τ1 < τ2 < . . . denote the times at which deaths occur in the process obtained by pooling all the k processes. Then (Zi(τn) : i = 1, 2, . . . , k), n = 0, 1, 2, . . ., has the same distribution as (Xni, i = 1, 2, . . . , k), n ≥ 0.
This embedding has been used to prove limit theorems for urn models.
See Athreya and Ney (2004), Chapter 5, for details. For applications to
clinical trials, see Rosenberger (2002).
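The embedding's prediction for Polyā's urn can also be checked by direct simulation. The sketch below is illustrative and not from the text; it runs the discrete urn and compares the long-run fraction of red balls, across replicates, with the Beta(R0, B0) limit described in Problem 18.9.

```python
import numpy as np

# Illustrative sketch (not from the text): the proportion R_n/(R_n + B_n) in
# Polya's urn converges a.s.; across replicates its limit is Beta(R0, B0).
rng = np.random.default_rng(0)
R0, B0, draws, reps = 2, 3, 2000, 1000

limits = np.empty(reps)
for r in range(reps):
    red, black = R0, B0
    for _ in range(draws):
        if rng.random() < red / (red + black):   # draw a red ball
            red += 1
        else:                                     # draw a black ball
            black += 1
    limits[r] = red / (red + black)

# Compare moments with Beta(R0, B0)
mean_beta = R0 / (R0 + B0)
var_beta = R0 * B0 / ((R0 + B0) ** 2 * (R0 + B0 + 1))
print("simulated mean/var :", limits.mean(), limits.var())
print("Beta(R0,B0) mean/var:", mean_beta, var_beta)
```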
18.5 Problems
18.1 Show that for any probability distribution {pj}j≥0, f(s) = Σ_{j=0}^∞ pj s^j is convex in [0,1]. Show also that there exists a q ∈ [0, 1) such that f(q) = q iff m = f′(1−) > 1.
$∞
18.2 Assume j=1 j 2 pj < ∞.
(a) Let vn = V (Zn |Z0 = 1). Show that vn+1 = V (Zn m|Z0 = 1) +
E(Zn σ 2 |Z0 = 1) where m = E(Z1 |Z0 = 1) and σ 2 = V (Z1 |Z0 =
1) and hence vn+1 = m2 vn + σ 2 mn .
(b) Conclude from (a) that supn EWn2 < ∞, where Wn = Zn /mn .
$ 2
(c) Using the fact {Wn }n≥0 is a martingale, show that if
j pj <
∞ then {Wn } converges w.p. 1 and in L2 to a random variable
W such that E(W |Z0 = 1) = 1.
18.3 By definition, the sequence {Zn}n≥0 of population sizes satisfies the random iteration scheme
Zn+1 = Σ_{i=1}^{Zn} ζni,
where {ζni, i = 1, 2, . . . , n = 1, 2, . . .} is a doubly infinite sequence of iid random variables with distribution {pj}.
(a) (Independence of lines of descent). Establish the property that for any k ≥ 0, if Z0 = k then {Zn}n≥0 has the same distribution as {Σ_{j=1}^k Zn^{(j)}}n≥0, where {Zn^{(j)}}n≥0, j ≥ 1, are iid copies of {Zn}n≥0 with Z0 = 1.
(b) In the context of Theorem 18.1.2, show that if Z0 = 1 then W ≡ lim Wn can be represented as
W = (1/m) Σ_{j=1}^{Z1} W^{(j)},
where Z1, W^{(j)}, j = 1, 2, . . ., are all independent with Z1 having distribution {pj}j≥0 and {W^{(j)}}j≥1 are iid with distribution same as W.
(c) Let α ≡ Σ_{aj∈D} P(W = aj), where D ≡ {aj} is the set of values such that P(W = aj) > 0. Show using (b) that α = f(α) and conclude that if α < 1, then α = q and hence that if α < 1, then the distribution of W conditional on W > 0 must be continuous.
(d) Let β be the singular component of the distribution of W in its Lebesgue decomposition. Show using (b) that β satisfies β ≤ f(β) and hence that if β < 1, then β = P(W = 0) and the distribution of W conditional on W > 0 must be absolutely continuous.
(e) Let p0 = 0. Show that the distribution of W is of the pure type, i.e., it is either purely discrete, purely singular continuous, or purely absolutely continuous.
18.4 (a) Show using Problem 18.3 (b) that if W has a lattice distribution with span d, then d must satisfy d = md and hence d = ∞. Conclude that if P(W = 0) < 1, then the distribution of W on {W > 0} must be nonlattice.
(b) Let p0 = 0 and P(W = 0) = 0. Use (a) to conclude that the characteristic function φ(t) ≡ E(e^{ιtW}) of W satisfies sup_{1≤|t|≤m} |φ(t)| < 1.
(c) Let p0 = 0. Show that for any 0 ≤ s0 < 1, ε > 0, f^{(n)}(s0) = O(ε^n).
(Hint: By the mean value theorem, f^{(n)}(s) = ∏_{j=0}^{n−1} f′(fj(s)). Now use f′(0) = p0, fj(s) → 0 as j → ∞.)
(d) Let p0 = 0, P(W = 0) = 0. Show that for any n ≥ 1, φ(m^n t) = f^{(n)}(φ(t)) and hence ∫_{−∞}^{∞} |φ(u)| du < ∞. Conclude that the distribution of W is absolutely continuous.
18.5 In the multitype case, for the martingale defined in (2.1), show that {Wn}n≥0 is L² bounded if Ei Z²_{1j} < ∞ for all i, j, where Ei denotes expectation when one starts with an individual of type i.
18.6 Let m(·) satisfy the integral equation (3.3).
(a) Show that mα(t) ≡ m(t)e^{−αt} satisfies the renewal equation
mα(t) = (1 − G(t)) e^{−αt} + ∫_{(0,t]} mα(t − u) dGα(u),
where Gα(t) ≡ m ∫_0^t e^{−αu} dG(u), t ≥ 0.
(b) Use the key renewal theorem of Section 8.5 to conclude that lim_{t→∞} mα(t) exists and identify the limit.
(c) Assuming Σ_{j=1}^∞ j² pj < ∞, show using the key renewal theorem of Section 8.5 that {W(t) : t ≥ 0} of Theorem 18.3.2 is L² bounded.
18.7 Consider an M/G/1 queue with Poisson arrivals and general service
time. Let Z1 be the number of customers that arrive during the service
time of the first customer. Call these first generation customers. Let
Z2 be the number of customers that arrive during the time it takes to
serve all the Z1 customers. Call these second generation customers.
For n ≥ 1, let Zn+1 denote the number of customers that arrive during
the time it takes to serve all Zn of the nth generation customers.
(a) Show that {Zn }n≥0 is a BGW branching process as in Section
18.1.
(b) Find the offspring distribution {p_j}_{j=0}^{∞} and its mean m in terms
of the rate parameter λ of the Poisson arrival process and the
service time distribution G(·).
(c) Show that the queue size goes to ∞ with positive probability iff
m > 1.
(d) Set up a functional equation for the moment generating function
of the busy period U , i.e., the time interval between when the
first service starts and when the server is idle for the first time.
18.8 Let {η_i : i = 1, 2, . . . , n} be independent exponential random variables with Eη_i = λ_i^{−1}, i = 1, 2, . . . , n. Let η ≡ min{η_i : 1 ≤ i ≤ n}. Show that η has an exponential distribution with Eη = (∑_{i=1}^{n} λ_i)^{−1} and that P(η = η_j) = λ_j / (∑_{i=1}^{n} λ_i).
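Problem 18.8 can be checked numerically before it is proved; the sketch below (with illustrative rates, not from the text) estimates Eη and P(η = η_j) by Monte Carlo and compares them with the claimed formulas.

import random

def check_min_of_exponentials(rates, n_trials=200_000):
    # eta_i ~ Exponential(rate lambda_i); eta = min_i eta_i.
    total_rate = sum(rates)
    mean_eta = 0.0
    argmin_counts = [0] * len(rates)
    for _ in range(n_trials):
        draws = [random.expovariate(lam) for lam in rates]
        eta = min(draws)
        mean_eta += eta
        argmin_counts[draws.index(eta)] += 1
    print("E(eta): simulated %.4f, predicted %.4f" % (mean_eta / n_trials, 1.0 / total_rate))
    for j, lam in enumerate(rates):
        print("P(eta = eta_%d): simulated %.4f, predicted %.4f"
              % (j + 1, argmin_counts[j] / n_trials, lam / total_rate))

if __name__ == "__main__":
    random.seed(1)
    check_min_of_exponentials([0.5, 1.0, 2.5])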
18.9 Using the embedding outlined in Section 18.4 for the Pólya urn scheme, show that Y_n ≡ R_n/(R_n + B_n) → Y w.p. 1 and that Y can be represented as
     Y = ∑_{i=1}^{R_0} X_i / ∑_{j=1}^{R_0+B_0} X_j,
where {X_i}_{i≥1} are iid Exponential(1) random variables. Conclude that Y has a Beta(R_0, B_0) distribution.
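A direct simulation of the Pólya urn of Problem 18.9 is a useful companion to the embedding argument. The following sketch is only an illustration (the initial counts R_0 = 2, B_0 = 3 and the run lengths are arbitrary choices); it approximates the a.s. limit Y on each run and compares the empirical mean with that of the Beta(R_0, B_0) distribution.

import random

def polya_urn_limit(r0, b0, n_draws=2000):
    # One run of the red/black Polya urn: draw a ball, return it with one more of the same color.
    r, b = r0, b0
    for _ in range(n_draws):
        if random.random() < r / (r + b):
            r += 1
        else:
            b += 1
    return r / (r + b)  # approximates the a.s. limit Y of Y_n = R_n / (R_n + B_n)

if __name__ == "__main__":
    random.seed(2)
    r0, b0 = 2, 3
    samples = [polya_urn_limit(r0, b0) for _ in range(5000)]
    print("empirical mean of Y:", round(sum(samples) / len(samples), 4))
    print("Beta(R0, B0) mean  :", round(r0 / (r0 + b0), 4))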
Appendix A
Advanced Calculus: A Review
This Appendix is a brief review of elementary set theory, real numbers,
limits, sequences and series, continuity, differentiability, Riemann integration, complex numbers, exponential and trigonometric functions, and metric spaces. For proofs and further details, see Rudin (1976) and Royden
(1988).
A.1 Elementary set theory
This section reviews the following: sets, set operations, product sets (finite and infinite), equivalence relation, axiom of choice, countability, and
uncountability.
Definition A.1.1: A set is a collection of objects.
It is typically defined as a collection of objects with a common defining
property. For example, the collection of even integers can be written as
E ≡ {n : n is an even integer}. In general, a set Ω with defining property p
is written as
Ω = {ω : ω has property p}.
The individual elements are denoted by the small letters ω, a, x, s, t, etc.,
and the sets by capital letters Ω, A, X, S, T , etc.
Example A.1.1: The closed interval [0, 1] ≡ {x : x a real number, 0 ≤
x ≤ 1}.
Example A.1.2: The set of positive rationals ≡ {x : x = m/n, m and n positive integers}.

Example A.1.3: The set of polynomials in x of degree 10 ≡ {P(x) : P(x) = ∑_{j=0}^{10} a_j x^j, a_j real, j = 0, . . . , 10, a_10 ≠ 0}.

Example A.1.4: The set of all polynomials in x ≡ {P(x) : P(x) = ∑_{j=0}^{n} a_j x^j, n a nonnegative integer, a_j real, j = 0, 1, 2, . . . , n}.
A.1.1 Set operations
Definition A.1.2: Let A be a set. A set B is called a subset of A and
written as B ⊂ A if every element of B is also an element of A. Two sets A
and B are the same and written as A = B if each is a subset of the other.
A subset A ⊂ Ω is called empty and denoted by ∅ if there exists no ω in Ω
such that ω ∈ A.
Using the mathematical notation ∈ and ⇒, one writes
B ⊂ A if x ∈ B ⇒ x ∈ A.
Here ‘∈’ means “belongs to” and ⇒ means “implies.”
Example A.1.5: Let N be the set of natural numbers, i.e., N =
{1, 2, 3, . . .}. Let E be the set of even natural numbers, i.e., E = {n : n ∈ N,
n = 2k for some k ∈ N}. Then E ⊂ N.
Example A.1.6: Let A = [0, 1] and B be the set of x in A such that x^2 < 1/4. Then B = {x : 0 ≤ x < 1/2} ⊂ A.
Definition A.1.3: (Intersection and union). Let A1 , A2 be subsets of a
set Ω. Then A_1 union A_2, written as A_1 ∪ A_2, is the set defined by
A1 ∪ A2 = {ω : ω ∈ A1 or ω ∈ A2 or both}.
Similarly, A1 intersection A2 , written as A1 ∩ A2 , is the set defined by
A1 ∩ A2 = {ω : ω ∈ A1 and ω ∈ A2 }.
Example A.1.7: Let Ω ≡ N ≡ {1, 2, 3, . . .}, and let
     A_1 = {ω : ω = 3k for some k ∈ N}  and  A_2 = {ω : ω = 5k for some k ∈ N}.
Then A_1 ∪ A_2 = {ω : ω is divisible by at least one of the two integers 3 and 5}, A_1 ∩ A_2 = {ω : ω is divisible by both 3 and 5}.
Definition A.1.4: Let Ω and I be nonempty sets. Let {Aα : α ∈ I} be a
collection of subsets of Ω. Then I is called the index set.
The union of {A_α : α ∈ I} is defined as
     ∪_{α∈I} A_α ≡ {ω : ω ∈ A_α for some α ∈ I}.
The intersection of {A_α : α ∈ I} is defined as
     ∩_{α∈I} A_α ≡ {ω : ω ∈ A_α for every α ∈ I}.

Definition A.1.5: (Complement of a set). Let A ⊂ Ω. Then the complement of the set A, written as A^c (or Ã), is defined by A^c ≡ {ω : ω ∉ A}.
Example A.1.8: If Ω = N and A is the set of all integers that are divisible
by 2, then Ac is the set of all odd integers, i.e., Ac = {1, 3, 5, 7, . . .}.
Proposition A.1.1: (DeMorgan's law). For any collection {A_α : α ∈ I} of subsets of Ω, (∪_{α∈I} A_α)^c = ∩_{α∈I} A_α^c and (∩_{α∈I} A_α)^c = ∪_{α∈I} A_α^c.

Proof: To show that two sets A and B are the same, it suffices to show that
     ω ∈ A ⇒ ω ∈ B and ω ∈ B ⇒ ω ∈ A.
Let ω ∈ (∪_{α∈I} A_α)^c. Then ω ∉ ∪_{α∈I} A_α
     ⇒ ω ∉ A_α for any α ∈ I
     ⇒ ω ∈ A_α^c for each α ∈ I
     ⇒ ω ∈ ∩_{α∈I} A_α^c.
Thus (∪_{α∈I} A_α)^c ⊂ ∩_{α∈I} A_α^c. The opposite inclusion and the second identity are similarly proved. !
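For a finite Ω and finitely many sets, DeMorgan's law can be checked mechanically. The small Python sketch below is an illustration only; the sets chosen are arbitrary.

# Finite check of DeMorgan's law: (union of A_alpha)^c = intersection of A_alpha^c.
omega = set(range(1, 31))
families = {
    "multiples of 3": {w for w in omega if w % 3 == 0},
    "multiples of 5": {w for w in omega if w % 5 == 0},
    "squares": {w for w in omega if int(w ** 0.5) ** 2 == w},
}

union_then_complement = omega - set.union(*families.values())
complements_then_intersection = set.intersection(*(omega - a for a in families.values()))
print(union_then_complement == complements_then_intersection)  # True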
Definition A.1.6: (Product sets). Let Ω1 and Ω2 be two nonempty sets.
Then the product set of Ω1 and Ω2 , denoted by Ω ≡ Ω1 × Ω2 , consists of
all ordered pairs (ω1 , ω2 ) such that ω1 ∈ Ω1 , ω2 ∈ Ω2 .
Note that if Ω_1 = Ω_2 and ω_1 ≠ ω_2, then the pair (ω_1, ω_2) is not the same as (ω_2, ω_1), i.e., the order is important.
Example A.1.9: Ω1 = [0, 1], Ω2 = [2, 3]. Then Ω1 × Ω2 = {(x, y) : 0 ≤
x ≤ 1, 2 ≤ y ≤ 3}.
Definition A.1.7: (Finite products). If Ωi , i = 1, 2 . . . , k are nonempty
sets, then
Ω = Ω1 × Ω 2 × . . . × Ω k
is the set of all ordered k-vectors (ω_1, ω_2, . . . , ω_k) where ω_i ∈ Ω_i. If Ω_i = Ω_1 for all 1 ≤ i ≤ k, then Ω_1 × Ω_2 × . . . × Ω_k is written as Ω_1^{(k)} or Ω_1^k.
Definition A.1.8: (Infinite products). Let {Ωα : α ∈ I} be an infinite
collection of nonempty sets. Then ×α∈I Ωα , the product set, is defined as
{f : f is a function defined on I such that for each α, f (α) ∈ Ωα }. If
Ωα = Ω for all α ∈ I, then ×α∈I Ωα is also written as ΩI .
It is a basic axiom of set theory, known as the axiom of choice (A.C.), that
this space is nonempty. That is, given an arbitrary collection of nonempty
sets, it is possible to form a parliament with one representative from each
set.
For a long time it was thought this should follow from the other axioms
of set theory. But it is shown in Cohen (1966) that it is an independent
axiom. That is, both the A.C. and its negation are consistent with the rest
of the axioms of set theory.
There are several equivalent versions of A.C. These are Zorn’s lemma,
Hausdorff’s maximality principle, the ‘Principle of Well Ordering,’ and
Tukey’s lemma. For a proof of these equivalences, see Hewitt and Stromberg
(1965).
Definition A.1.9: (Functions, countability and uncountability). A function f is a correspondence between the elements of a set X and another
set Y and is written as f : X → Y . It satisfies the condition that for each
x, there is a unique y in Y that corresponds to it and is denoted as
y = f (x).
The set X is called the domain of f and the set f (X), defined as, f (X) ≡
{y : there exists x in X such that f (x) = y} is called the range of f . It is
possible that many x’s may correspond to the same y and also there may
exist y in Y for which there is no x such that f (x) = y. If f (X) is all of Y ,
then the map is called onto. If for each y in f (X), there is a unique x in X
such that f (x) = y, then f is called (1–1) or one-to-one. If f is one-to-one
and onto, then X and Y are said to have the same cardinality.
Definition A.1.10: Let f : X → Y be (1–1) and onto. Then, for each y in
Y , there is a unique element x in X such that f (x) = y. This x is denoted
as f −1 (y). Note that in this case, g(y) ≡ f −1 (y) is a (1–1) onto map from
Y to X and is called the inverse of f .
Example A.1.10: Let X = N ≡ {1, 2, 3, . . .}. Let Y = {n : n = 2k, k ∈ N}
be the set of even integers. Then the map f (x) = 2x is a (1–1) onto map
from X to Y .
Example A.1.11: Let X be N and let P be the set of all prime numbers.
Then X and P have the same cardinality.
Definition A.1.11: A set X is finite if there exists n ∈ N such that X
and Y ≡ {1, 2, . . . , n} have the same cardinality, i.e., there exists a (1–1)
onto map from Y to X. A set X is countable if X and N have the same
cardinality, i.e., there exists a (1–1) onto map from N to X. A set X is
uncountable if it is not finite or countable.
Example A.1.12: The set {0, 1, 2, . . . , 9} is finite, the set N^k (k ∈ N) is countable, and N^N is uncountable (Problem A.6).
Definition A.1.12: Let Ω be a nonempty set. Then the power set of Ω,
denoted by P(Ω), is the collection of all subsets of Ω, i.e.,
P(Ω) ≡ {A : A ⊂ Ω}.
Remark A.1.1: P(N) is an uncountable set (Problem A.5).
A.1.2 The principle of induction
The set N of natural numbers has the well ordering property that every
nonempty subset A of N has a smallest element s such that (i) s ∈ A
and (ii) a ∈ A ⇒ a ≥ s. This property is one of the basic postulates in
the definition of N. The principle of induction is a consequence of the well
ordering property. It says the following:
Let {P (n) : n ∈ N} be a collection of propositions (or statements). Suppose that
(i) P (1) is true.
(ii) For each n ∈ N, P (n) true ⇒ P (n + 1) true.
Then, P (n) is true for all n ∈ N.
See Problem A.9 for some examples.
A.1.3 Equivalence relations
Definition A.1.13:
(a) Let Ω be a nonempty set. Let G be a nonempty subset of Ω × Ω.
Write x ∼ y if (x, y) ∈ G and call it a relation defined by G.
(b) A relation defined by G is an equivalence relation if
(i) (reflexive) for all x in Ω, x ∼ x, i.e., (x, x) ∈ G;
(ii) (symmetric) x ∼ y ⇒ y ∼ x, i.e., (x, y) ∈ G ⇒ (y, x) ∈ G;
(iii) (transitive) x ∼ y, y ∼ z ⇒ x ∼ z, i.e., (x, y) ∈ G, (y, z) ∈ G ⇒
(x, z) ∈ G.
Example A.1.13: Let Ω = Z, the set of all integers, G = {(m, n) : m − n
is divisible by 3}. Thus, m ∼ n if (m − n) is a multiple of 3. It is easy to
verify that this is an equivalence relation.
Definition A.1.14: (Equivalence classes). Let Ω be a nonempty set. Let
G define an equivalence relation on Ω. For each x in Ω, the set [x] ≡ {y :
x ∼ y} is called the equivalence class generated by x.
Proposition A.1.2: Let C be the set of all equivalence classes in Ω generated by an equivalence relation defined by G. Then
(i) C1 , C2 ∈ C ⇒ C1 = C2 or C1 ∩ C2 = ∅.
(ii) ∪_{C∈C} C = Ω.
Proof:
(i) Suppose C_1 ∩ C_2 ≠ ∅. Then there exist x_1, x_2, y such that C_1 = [x_1],
C2 = [x2 ] and y ∈ C1 ∩ C2 . This implies x1 ∼ y, x2 ∼ y. But by
symmetry y ∼ x2 and this implies by transitivity that x1 ∼ x2 , i.e.,
x2 ∈ C1 implying C2 ⊂ C1 . Similarly, C1 ⊂ C2 , i.e., C1 = C2 .
(ii) For each x in Ω, (x, x) ∈ G and so [x] is not empty and x ∈ [x].
!
The above proposition says that every equivalence relation on Ω leads to
a decomposition of Ω into equivalence classes that are disjoint and whose
union is all of Ω. In the example given above, the set Z of all integers can be decomposed into the three equivalence classes C_j ≡ {n : n = 3m + j for some m ∈ Z}, j = 0, 1, 2.
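The decomposition of Z into C_0, C_1, C_2 can be displayed for a finite window of integers with a few lines of Python (an illustration only, not part of the text).

# Group a finite window of integers into the classes of m ~ n iff 3 | (m - n).
window = range(-9, 10)
classes = {j: [n for n in window if n % 3 == j] for j in (0, 1, 2)}
for j, members in classes.items():
    print("C_%d:" % j, members)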
A.2 Real numbers, continuity, differentiability, and integration

A.2.1 Real numbers
This section reviews the following: integers, rationals, real numbers; algebraic, order, and completeness axioms; Archimedean property, denseness
of rationals.
There are at least two approaches to defining the real number system.
Approach 1. Start with the natural numbers N, construct the set Z of all
integers (N ∪ {0} ∪ (−N)), and next, the set Q of rationals and then the set
R of real numbers either as the set of all Cauchy sequences of rationals or
as Dedekind cuts. The step going from Q to R via Cauchy sequences is also
available for completing any incomplete metric space (see Section A.4).
Approach 2. Define the set of real numbers R as a set that satisfies three
sets of axioms. The first set is algebraic involving addition and multiplication. The second set is on ordering that, with the first, makes R an ordered
field (see Royden (1988) for a definition). The third set is a single axiom
known as the completeness axiom. Thus R is defined as a complete ordered
field.
The algebraic axioms say that there are two binary operations known
as addition (+) and multiplication (·) that render R a field. See Royden
(1988) for the nine axioms for this set.
The order axiom says that there is a set P ⊂ R, to be called positive
numbers such that
(i) x, y ∈ P ⇒ x · y ∈ P, x + y ∈ P
(ii) x ∈ P ⇒ −x ∉ P
(iii) x ∈ R ⇒ x = 0 or x ∈ P or −x ∈ P.
The set Q of rational numbers is an ordered field (i.e., it satisfies the
algebraic and order axioms). But Q does not satisfy the completeness axiom
(see below).
Given P, one can define an order on R by defining x < y (read x less
than y) to mean y − x ∈ P. Since for all x, y in R, (x − y) is either 0 or
(x − y) ∈ P or (y − x) ∈ P, it follows that for all x, y in R, either x = y or
x < y or x > y. This is called total or linear order.
Definition A.2.1: (Upper and lower bounds).
(a) Let A ⊂ R. A real number M is an upper bound for A if a ∈ A ⇒
a ≤ M and m is a lower bound for A if a ∈ A ⇒ a ≥ m.
(b) The supremum of a set A, denoted by sup A or the least upper bound
(l.u.b.) of A, is defined by the following conditions:
(i) x ∈ A ⇒ x ≤ sup A,
(ii) K < sup A ⇒ there exists x ∈ A such that K < x.
The completeness axiom says that if A ⊂ R has an upper bound M in
R, then there exists a M̃ in R such that M̃ = sup A.
That is, every set A that is bounded above in R has a l.u.b. in R. The
ordered field of rationals Q does not possess this property. One well-known
example is the set
A = {r : r ∈ Q, r2 < 2}.
Then A is bounded above in Q but has no l.u.b. in Q (Problem A.11).
Next some consequences of the completeness axiom are discussed.
Proposition A.2.1: (Axiom of Eudoxus and Archimedes (AOE)). For all
x in R, there exists a natural number n such that n > x.
Proof: If x ≤ 1, take n = 2. If x > 1, let S_x ≡ {k : k ∈ N, k ≤ x}. Then S_x is not empty and is bounded above. By the completeness axiom, there is a real number y that is the l.u.b. of S_x. Thus y − 1/2 is not an upper bound for S_x and so there exists k_0 ∈ S_x such that y − 1/2 < k_0. This implies that (k_0 + 1) > y − 1/2 + 1 = y + 1/2 > y and so (k_0 + 1) ∉ S_x. By the linear order in R, (k_0 + 1) > x and so (k_0 + 1) is the desired integer. !
Corollary A.2.2: For any x, y ∈ R with x < y, there is an r in Q such that x < r < y.

Proof: Let z = (y − x)^{−1}. Then there is an integer k such that 0 < z < k (by AOE). Again by AOE, there is a positive integer n such that n > yk. Let S = {n : n ∈ N, n > yk}. Since S ≠ ∅, it has a smallest element (by the well ordering property of N), say p. Then p − 1 < yk < p, i.e., (p − 1)/k < y < p/k. Since 1/k < 1/z = (y − x) and p/k > y, it follows that (p − 1)/k > x. Now take r = (p − 1)/k. !
Remark A.2.1: This property is often stated as: The set Q of rationals
is dense in the set R of real numbers.
Definition A.2.2: The set R̄ of extended real numbers is the set consisting
of R and two elements identified as +∞ (plus infinity) and −∞ (negative
infinity) with the following definition of addition (+) and multiplication
(·). For any x in R, x + ∞ = ∞, x − ∞ = −∞, x · ∞ = ∞ if x > 0,
x · ∞ = −∞ if x < 0, 0 · ∞ = 0, ∞ + ∞ = ∞, −∞ − ∞ = −∞,
∞ · (±∞) = ±∞, (−∞) · (±∞) = ∓∞. But ∞ − ∞ is not defined. The
order property on R̄ is defined by extending that on R with the additional
condition x ∈ R ⇒ −∞ < x < +∞. Finally, if A ⊂ R does not have an
upper bound in R, then sup A is defined as +∞ and if A ⊂ R does not
have a lower bound in R, then inf A is defined as −∞.
A.2.2 Sequences, series, limits, limsup, liminf
Definition A.2.3: Let {x_n}_{n≥1} be a sequence of real numbers.
(i) For a real number a, lim_{n→∞} x_n = a if for every ε > 0, there exists a positive integer N_ε such that n ≥ N_ε ⇒ |x_n − a| < ε.
(ii) lim_{n→∞} x_n = ∞ if for any K in R, there exists an integer N_K such that n ≥ N_K ⇒ x_n > K.
(iii) lim_{n→∞} x_n = −∞ if lim_{n→∞} (−x_n) = ∞.
(iv) lim sup_{n→∞} x_n ≡ \overline{lim}_{n→∞} x_n = inf_{n≥1} (sup_{j≥n} x_j).
(v) lim inf_{n→∞} x_n ≡ \underline{lim}_{n→∞} x_n = sup_{n≥1} (inf_{j≥n} x_j).
Definition A.2.4: (Cauchy sequences). A sequence {xn }n≥1 ⊂ R is called
a Cauchy sequence if for every ε > 0, there is Nε such that n, m ≥ Nε ⇒
|xm − xn | < ε.
Proposition A.2.3:
If {xn }n≥1 ⊂ R is convergent in R (i.e.,
limn→∞ xn = a exists in R), then {xn }n≥1 is Cauchy. Conversely, if
{xn }n≥1 ⊂ R is Cauchy, then there exists an a ∈ R such that limn→∞ xn =
a.
The proof is based on the use of the l.u.b. axiom (Problem A.14).
Definition A.2.5: Let {x_n}_{n≥1} be a sequence of real numbers. For n ≥ 1, s_n ≡ ∑_{j=1}^{n} x_j is called the nth partial sum of the series ∑_{j=1}^{∞} x_j. The series ∑_{j=1}^{∞} x_j is said to converge to s in R if lim_{n→∞} s_n = s. If lim_{n→∞} s_n = ±∞, then the series ∑_{j=1}^{∞} x_j is said to diverge to ±∞.
Note that if x_j ≥ 0 for all j, then either lim_{n→∞} s_n = s ∈ R, or lim_{n→∞} s_n = ∞.
Example A.2.1: (Geometric series). Fix 0 < r < 1. Let x_n = r^n, n ≥ 0. Then s_n = 1 + r + . . . + r^n = (1 − r^{n+1})/(1 − r), and ∑_{n≥0} r^n converges to s = 1/(1 − r).

Example A.2.2: Consider the series ∑_{j=1}^{∞} 1/j^p, 0 < p < ∞. It can be shown that this converges for p > 1 and diverges to ∞ for 0 < p ≤ 1.
Definition A.2.6: The series ∑_{j=1}^{∞} x_j converges absolutely if the series ∑_{j=1}^{∞} |x_j| converges in R.
There exist series ∑_{j=1}^{∞} x_j that converge but not absolutely, for example, ∑_{j=1}^{∞} (−1)^j/j. For further material on convergence properties of series, such as tests for convergence, rates of convergence, etc., see Rudin (1976).
Definition A.2.7: (Power series). Let {a_n}_{n≥0} be a sequence of real numbers. For x ∈ R, the series ∑_{n=0}^{∞} a_n x^n is called a power series. If the series ∑_{n=0}^{∞} a_n x^n converges for all x in B ⊂ R, the power series ∑_{n=0}^{∞} a_n x^n is said to be convergent on B.

Proposition A.2.4: Let {a_n}_{n≥0} be a sequence of real numbers. Let ρ = (lim sup_{n→∞} |a_n|^{1/n})^{−1}. Then
(i) |x| < ρ ⇒ ∑_{n=0}^{∞} |a_n x^n| converges.
(ii) |x| > ρ ⇒ ∑_{n=0}^{∞} |a_n x^n| diverges to +∞.
Proof of this is left as an exercise (Problem A.15).

Definition A.2.8: ρ ≡ (lim sup_{n→∞} |a_n|^{1/n})^{−1} is called the radius of convergence of the power series ∑_{n=0}^{∞} a_n x^n.
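The formula for ρ can be explored numerically. The sketch below is only a heuristic illustration: it replaces the lim sup by the value of |a_n|^{1/n} at a single large n, which is adequate for the smooth coefficient sequences chosen here but is not a general procedure.

import math

def rho_estimate(coeff, n=300):
    # Heuristic: use |a_n|^(1/n) at one large n in place of lim sup |a_n|^(1/n).
    a_n = abs(coeff(n))
    if a_n == 0.0:
        return float("inf")
    root = a_n ** (1.0 / n)
    return float("inf") if root == 0.0 else 1.0 / root

print(rho_estimate(lambda n: 1.0))                              # a_n = 1    -> rho = 1
print(rho_estimate(lambda n: 2.0 ** n))                         # a_n = 2^n  -> rho = 1/2
print(rho_estimate(lambda n: 1.0 / math.factorial(n), n=150))   # a_n = 1/n! -> true rho = +infinity; the finite-n estimate is merely large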
A.2.3 Continuity and differentiability
Definition A.2.9: Let f : A → R, A ⊂ R. Then
(a) f is continuous at x_0 in A if for every ε > 0, there exists a δ > 0 such that x ∈ A, |x − x_0| < δ, implies |f(x) − f(x_0)| < ε. (Here, δ may depend on ε and x_0.)
(b) f is continuous on B ⊂ A if it is continuous at every x_0 in B.
(c) f is uniformly continuous on B ⊂ A if for every ε > 0, there exists a δ_ε > 0 such that sup{|f(x) − f(y)| : x, y ∈ B, |x − y| < δ_ε} < ε.
Some properties of continuous functions are listed below.
Proposition A.2.5:
(i) (Sums, products, and ratios of continuous functions). Let f , g : A →
R, A ⊂ R. Let f and g be continuous on B ⊂ A. Then
(a) f + g, f − g, α · f for any α ∈ R are all continuous on B.
(b) f(x)/g(x) is continuous at x_0 in B, provided g(x_0) ≠ 0.
(ii) (Continuous functions on a closed bounded interval). Let f be continuous on a closed and bounded interval [a, b]. Then
(a) f is bounded, i.e., sup{|f (x)| : a ≤ x ≤ b} < ∞,
(b) it achieves its maximum and minimum, i.e., there exist x0 , y0
in [a, b] such that f (x0 ) ≥ f (x) ≥ f (y0 ) for all x in [a, b] and f
attains all values in [f(y_0), f(x_0)], i.e., for all ℓ ∈ [f(y_0), f(x_0)], there exists z ∈ [a, b] such that f(z) = ℓ. Thus, f maps bounded
closed intervals onto bounded closed intervals.
(c) f is uniformly continuous on [a, b].
(iii) (Composition of functions). Let f : A → R, g : B → R be continuous
on A and B, respectively. Let f (A) ⊂ B, i.e., for any x in A, f (x) ∈
B. Let h(x) = g(f (x)) for x in A. Then h : A → R is continuous.
(iv) (Uniform limits of continuous functions). Let {fn }n≥1 , be a sequence
of functions continuous on A to R, A ⊂ R. If sup{|fn (x) − f (x)| :
x ∈ A} → 0 as n → ∞ for some f : A → R, i.e., fn converges to f
uniformly on A, then f is continuous on A.
Remark A.2.2: The function f (x) ≡ x is clearly continuous on R. Now by
Proposition A.2.5 (i) and (iv), it follows that all polynomials are continuous
on R, and hence, so are their uniform limits. Weierstrass’ approximation
theorem is a sort of converse to this. That is, every continuous function on
a closed and bounded interval is the uniform limit of polynomials. More
precisely, one has the following:
Theorem A.2.6: Let f : [a, b] → R be continuous. Then for any ε > 0 there is a polynomial p(x) = ∑_{j=0}^{n} a_j x^j, a_j ∈ R, j = 0, 1, 2, . . . , n, such that sup{|f(x) − p(x)| : x ∈ [a, b]} < ε.
It should be noted that a power series A(x) ≡ ∑_{n=0}^{∞} a_n x^n is the uniform limit of polynomials on [−λ, λ] for any 0 < λ < ρ ≡ (lim sup_{n→∞} |a_n|^{1/n})^{−1}, and hence is continuous on (−ρ, ρ).
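A classical constructive route to Theorem A.2.6 on [0, 1], not necessarily the proof intended in the text, is via the Bernstein polynomials B_n(f)(x) = ∑_{k=0}^{n} f(k/n) \binom{n}{k} x^k (1 − x)^{n−k}. The Python sketch below (with an arbitrary illustrative test function) computes the sup-distance between f and B_n(f) for increasing n.

import math

def bernstein(f, n, x):
    # B_n(f)(x) = sum_{k=0}^{n} f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1))

def sup_distance(f, n, grid_size=1001):
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in grid)

if __name__ == "__main__":
    f = lambda x: abs(x - 0.5)            # continuous on [0, 1] but not differentiable at 1/2
    for n in (5, 20, 80, 320):
        print(n, round(sup_distance(f, n), 5))   # sup-norm error decreases as n grows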
Definition A.2.10: Let f : (a, b) → R, (a, b) ⊂ R. The function f is said to be differentiable at x_0 ∈ (a, b) if
     lim_{h→0} (f(x_0 + h) − f(x_0))/h ≡ f ′(x_0)
exists in R. A function is differentiable in (a, b) if it is differentiable at each x in (a, b).
Some important consequences of differentiability are listed below.
Proposition A.2.7: Let f , g : (a, b) → R, (a, b) ⊂ R. Then
(i) f differentiable at x0 in (a, b) implies f is continuous at x0 .
(ii) (Mean value theorem). f differentiable on (a, b), f continuous on [a, b] implies that for some a < c < b, f(b) − f(a) = (b − a) f ′(c).
(iii) (Maxima and minima). f differentiable at x_0 and, for some δ > 0, f(x) ≤ f(x_0) for all x ∈ (x_0 − δ, x_0 + δ) implies that f ′(x_0) = 0.
(iv) (Sums, products and ratios). f, g differentiable at x_0 implies that for any α, β in R, (αf + βg) and f g are differentiable at x_0 with
     (αf + βg)′(x_0) = α f ′(x_0) + β g′(x_0),
     (f g)′(x_0) = f ′(x_0) g(x_0) + f(x_0) g′(x_0),
and if g(x_0) ≠ 0, then f/g is differentiable at x_0 with
     (f/g)′(x_0) = (f ′(x_0) g(x_0) − f(x_0) g′(x_0)) / (g(x_0))^2.
(v) (Chain rule). If f is differentiable at x_0 and g is differentiable at f(x_0), then h(x) ≡ g(f(x)) is differentiable at x_0 with h′(x_0) = g′(f(x_0)) f ′(x_0).
(vi) (Differentiability of power series). Let A(x) ≡ ∑_{n=0}^{∞} a_n x^n be a power series with radius of convergence ρ ≡ (lim sup_{n→∞} |a_n|^{1/n})^{−1} > 0. Then A(·) is differentiable infinitely many times on (−ρ, ρ) and, for x in (−ρ, ρ),
     d^k A(x)/dx^k = ∑_{n=k}^{∞} n(n − 1) · · · (n − k + 1) a_n x^{n−k},  k ≥ 1.
Remark A.2.3: It should be noted that the converse to (i) in the above proposition does not hold. For example, the function f(x) = |x| is continuous at x_0 = 0 but is not differentiable at x_0. Indeed, Weierstrass showed
that there exists a function f : [0, 1] → R such that it is continuous on [0, 1]
but is not differentiable at any x in (0, 1).
Also note that the mean value theorem implies that if f $ (·) ≥ 0 on (a, b),
then f is nondecreasing on (a, b).
Definition A.2.11: (Taylor series). Let f be a map from I ≡ (a − η, a + η) to R for some a ∈ R, η > 0. Suppose f is n times differentiable in I, for each n ≥ 1. Let a_n = f^{(n)}(a)/n!. Then the power series ∑_{n=0}^{∞} a_n (x − a)^n = ∑_{n=0}^{∞} (f^{(n)}(a)/n!) (x − a)^n is called the Taylor series of f at a.
Remark A.2.4: Let f be as in Definition A.2.11. Taylor's remainder theorem says that for any x in I and any n ≥ 1, if f is (n + 1) times differentiable in I, then
     |f(x) − ∑_{j=0}^{n} a_j (x − a)^j| ≤ (|f^{(n+1)}(y_n)| / (n + 1)!) |x − a|^{n+1}
for some y_n in I. Thus, if for some ε > 0, sup_{|y−a|<ε} |f^{(k)}(y)| ≡ λ_k satisfies λ_k/k! → 0 as k → ∞, then the Taylor series satisfies
     sup_{|x−a|<ε} |∑_{j=0}^{n} (f^{(j)}(a)/j!) (x − a)^j − f(x)| → 0  as n → ∞.

A.2.4 Riemann integration
Let f : [a, b] → R, [a, b] ⊂ R. Let sup{|f(x)| : a ≤ x ≤ b} < ∞. For any partition P ≡ {a = x_0 < x_1 < x_2 < · · · < x_k = b}, let
     U(P, f) ≡ ∑_{i=1}^{k} M_i(P) ∆_i(P)  and  L(P, f) ≡ ∑_{i=1}^{k} m_i(P) ∆_i(P),
where
     M_i(P) = sup{f(x) : x_{i−1} ≤ x ≤ x_i},
     m_i(P) = inf{f(x) : x_{i−1} ≤ x ≤ x_i},
     ∆_i(P) = (x_i − x_{i−1})
for i = 1, 2, . . . , k.
Definition A.2.12: The upper and lower Riemann integrals of f over [a, b], denoted by \overline{∫}_a^b f(x)dx and \underline{∫}_a^b f(x)dx respectively, are defined as
     \overline{∫}_a^b f(x)dx ≡ inf{U(P, f) : P a partition of [a, b]},
     \underline{∫}_a^b f(x)dx ≡ sup{L(P, f) : P a partition of [a, b]}.

Definition A.2.13: Let f : [a, b] → R, [a, b] ⊂ R, and let sup{|f(x)| : a ≤ x ≤ b} < ∞. The function f is Riemann integrable on [a, b] if
     \overline{∫}_a^b f(x)dx = \underline{∫}_a^b f(x)dx
and the common value is denoted as ∫_a^b f(x)dx.
The following are some important results on Riemann integration.
Proposition A.2.8:
(i) Let f : [a, b] → R, [a, b] ⊂ R, be continuous. Then f is Riemann integrable and ∫_a^b f(x)dx = lim_{n→∞} ∑_{i=1}^{k_n} f(x_{ni}) ∆_{ni}, where {P_n ≡ {x_{ni} : 1 ≤ i ≤ k_n}} is a sequence of partitions of [a, b] such that ∆_n ≡ max{(x_{ni} − x_{n,i−1}) : 1 ≤ i ≤ k_n} → 0 as n → ∞ and ∆_{ni} ≡ (x_{ni} − x_{n,i−1}).
(ii) If f, g are Riemann integrable on [a, b], then αf + βg, α, β ∈ R, and f g are Riemann integrable on [a, b] and
     ∫_a^b (αf + βg)(x)dx = α ∫_a^b f(x)dx + β ∫_a^b g(x)dx.
(iii) Let f be Riemann integrable on [a, b] and [b, c], −∞ < a < b < c < ∞. Then f is Riemann integrable on [a, c] and
     ∫_a^c f(x)dx = ∫_a^b f(x)dx + ∫_b^c f(x)dx.
(iv) (Fundamental theorem of Riemann integration: Part I). Let f be Riemann integrable on [a, b]. Then it is Riemann integrable on [a, c] for all a ≤ c ≤ b. Let F(x) ≡ ∫_a^x f(u)du, a ≤ x ≤ b. Then
     (a) F(·) is continuous on [a, b].
     (b) If f(·) is continuous at x_0 ∈ (a, b), then F(·) is differentiable at x_0 and F′(x_0) = f(x_0).
(v) (Fundamental theorem: Part II). If F : [a, b] → R is differentiable on (a, b), f : [a, b] → R is continuous on [a, b] and F′(x) = f(x) for all a < x < b, then F(c) = F(a) + ∫_a^c f(x)dx, a ≤ c ≤ b.
(vi) (Integration by parts). If f and g are continuous and differentiable on (a, b) with f′ and g′ continuous on (a, b), then
     ∫_c^d f(x) g′(x)dx + ∫_c^d f′(x) g(x)dx = f(d)g(d) − f(c)g(c)
for all a < c < d < b.
(vii) (Interchange of limits and integration). Let f_n, f be continuous on [a, b] to R. Let f_n converge to f uniformly on [a, b]. Then
     F_n(c) ≡ ∫_a^c f_n(x)dx → F(c) ≡ ∫_a^c f(x)dx
uniformly on [a, b].
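Part (i) of Proposition A.2.8 invites a direct numerical check: for a continuous f, Riemann sums over finer uniform partitions approach the integral. A minimal Python sketch (the integrand x^2 on [0, 1] is an arbitrary illustration) follows.

def riemann_sum(f, a, b, k):
    # Sum f(x_{i-1}) * (x_i - x_{i-1}) over the uniform partition a = x_0 < ... < x_k = b.
    h = (b - a) / k
    return sum(f(a + i * h) * h for i in range(k))

if __name__ == "__main__":
    f = lambda x: x * x
    exact = 1.0 / 3.0                     # integral of x^2 over [0, 1]
    for k in (10, 100, 1000, 10000):
        print(k, round(riemann_sum(f, 0.0, 1.0, k), 6), round(exact, 6))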
A.3 Complex numbers, exponential and trigonometric functions
Definition A.3.1: The set C of complex numbers is defined as the set
R × R of all ordered pairs of real numbers endowed with addition and
multiplication as follows:
     (a_1, b_1) + (a_2, b_2) = (a_1 + a_2, b_1 + b_2),
     (a_1, b_1) · (a_2, b_2) = (a_1 a_2 − b_1 b_2, a_1 b_2 + a_2 b_1).   (3.1)
It can be verified that C satisfies the field axioms (see Royden (1988)). By
defining ι to be the element (0,1), one can write the elements of C in the
form (a, b) = a + ιb and do addition and multiplication with the rule of
replacing ι2 by −1. Clearly, the set R of real numbers can be identified with
the set {(a, 0) : a ∈ R}. The set {(0, b) : b ∈ R} is called the set of purely
imaginary numbers.
Definition A.3.2: For any complex number z = (a, b) in C,
     Re(z), called the real part of z, is a,
     Im(z), called the imaginary part of z, is b,
     z̄, called the complex conjugate of z, is z̄ = a − ιb,
     |z|, called the absolute value of z, is |z| = √(a^2 + b^2).   (3.2)
Clearly, z z̄ = z̄ z = |z|^2 and any z ≠ 0 can be written as z = |z| ω where |ω| = 1. A set A ⊂ C is bounded if sup{|z| : z ∈ A} < ∞. If
d(z1 , z2 ) ≡ |z1 − z2 |, then (C, d) is a complete separable metric space (see
Section A.4). Clearly, zn = (an , bn ) converges to z = (a, b) in this metric
iff Re(zn ) = an → Re(z) = a and Im(zn ) = bn → Im(z) = b.
Definition A.3.3: The exponential function is a map from C to C defined by
     exp(z) ≡ ∑_{n=0}^{∞} z^n/n!   (3.3)
where the right side is defined as the limit of the partial sum sequence {∑_{n=0}^{m} z^n/n!}_{m≥0}, which exists since ∑_{n=0}^{∞} |z|^n/n! < ∞ for each z ∈ C. In fact, the convergence of the partial sum sequence is uniform on every bounded subset A of C. This in turn implies that the exponential function is continuous on C.
Notation: exp(z) will also be written as e^z.

Definition A.3.4: The number e^1 ≡ ∑_{n=0}^{∞} 1/n! will be called e. It is not difficult to show that e = lim_{n→∞} (1 + 1/n)^n. By definition of e^z, e^0 = 1.
Definition A.3.5: The cosine and sine functions from R → R are defined by
     cos t ≡ Re(e^{ιt}) = ∑_{k=0}^{∞} (−1)^k t^{2k}/(2k)!,
     sin t ≡ Im(e^{ιt}) = ∑_{k=0}^{∞} (−1)^k t^{2k+1}/(2k + 1)!.
Thus, one gets Euler's formula e^{ιt} = cos t + ι sin t, for all t ∈ R.
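Both Euler's formula and the additivity e^{z_1}e^{z_2} = e^{z_1+z_2} stated in Theorem A.3.1 below can be checked numerically with Python's complex arithmetic; the snippet is a sanity check, not a proof.

import cmath
import math
import random

random.seed(3)
z1 = complex(random.uniform(-2, 2), random.uniform(-2, 2))
z2 = complex(random.uniform(-2, 2), random.uniform(-2, 2))

# Additivity of the exponential: e^{z1} e^{z2} = e^{z1 + z2}.
print(abs(cmath.exp(z1) * cmath.exp(z2) - cmath.exp(z1 + z2)) < 1e-12)

# Euler's formula: e^{i t} = cos t + i sin t.
t = 0.7
print(abs(cmath.exp(1j * t) - complex(math.cos(t), math.sin(t))) < 1e-12)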
The following are some important and useful properties of the exponential and the cosine and sine (also called trigonometric) functions.
Theorem A.3.1:
(i) For all z_1, z_2 in C, e^{z_1} e^{z_2} = e^{z_1+z_2} and e^z ≠ 0 for any z.
(ii) e^z is differentiable for all z (i.e., (e^z)′ ≡ lim_{h→0} (e^{z+h} − e^z)/h exists) and (e^z)′ = e^z.
(iii) The function e^x : R → R_+ is strictly increasing with e^x ↑ ∞ as x ↑ ∞ and e^x ↓ 0 as x ↓ −∞.
(iv) For t in R, (cos t)^2 + (sin t)^2 = 1, (cos t)′ = − sin t, (sin t)′ = cos t.
(v) There is a smallest positive number, called π, such that e^{ιπ/2} = ι.
(vi) e^z is a periodic function such that e^z = e^{z+ι2π}.
Proof:
(i) Since ∑_{n=0}^{∞} |z|^n/n! converges for all z ∈ C,
     e^{z_1} · e^{z_2} = lim_{m→∞} (∑_{n=0}^{m} z_1^n/n!)(∑_{n=0}^{m} z_2^n/n!)
                     = lim_{m→∞} ∑_{0≤r,s≤m} z_1^r z_2^s/(r! s!)
                     = lim_{m→∞} ∑_{k=0}^{2m} (1/k!) ∑_{r=0}^{k} (k!/(r!(k − r)!)) z_1^r z_2^{k−r}
                     = lim_{m→∞} ∑_{k=0}^{2m} (z_1 + z_2)^k/k! = e^{z_1+z_2}.
Since e^z · e^{−z} = e^0 = 1, e^z ≠ 0 for any z ∈ C.
(ii) Fix z ∈ C. For any h ∈ C, h ≠ 0, by (i),
     (e^{z+h} − e^z)/h = e^z (e^h − 1)/h.
But
     |(e^h − 1)/h − 1| ≤ (1/|h|) ∑_{k=2}^{∞} |h|^k/k! ≤ |h| ∑_{k=0}^{∞} 1/k!  if |h| ≤ 1.
Thus
     lim_{h→0} ((e^h − 1)/h − 1) = 0
and
     lim_{h→0} (e^{z+h} − e^z)/h = e^z,
i.e., (ii) holds.
(iii) That the map t → e^t is strictly increasing on [0, ∞) and that e^t ↑ ∞ as t ↑ ∞ is clear from the definition of e^z. Since e^{−t} e^t = e^0 = 1, e^{−t} = 1/e^t for all t ∈ R, so that e^t is strictly increasing on all of R and e^t ↓ 0 as t ↓ −∞.
(iv) From the definition of e^z, it follows that \overline{e^z} = e^{z̄} and, in particular, for t real \overline{e^{ιt}} = e^{−ιt}, and hence
     |e^{ιt}|^2 = e^{ιt} \overline{e^{ιt}} = e^{−ιt+ιt} = e^0 = 1.
Thus, for all t real, |e^{ιt}| = 1 and since e^{ιt} = cos t + ι sin t, it follows that
     |e^{ιt}|^2 = (cos t)^2 + (sin t)^2 = 1.
Also from (ii), for t ∈ R,
     (e^{ιt})′ = (cos t)′ + ι (sin t)′ = ι e^{ιt} = − sin t + ι cos t,
yielding (cos t)′ = − sin t, (sin t)′ = cos t, proving (iv).
(v) By definition,
     cos 2 = ∑_{k=0}^{∞} (−1)^k 2^{2k}/(2k)!.
Now a_k ≡ 2^{2k}/(2k)! satisfies a_{k+1} < a_k for k = 1, 2, . . . and hence ∑_{k≥3} (−1)^k a_k = −{(a_3 − a_4) + (a_5 − a_6) + · · ·} < 0. Thus,
     cos 2 < 1 − 2^2/2! + 2^4/4! = −1/3 < 0.
Since cos t is a continuous function on R with cos 0 = 1 and cos 2 < 0, there exists a smallest t_0 > 0 such that cos t_0 = 0, defined by t_0 = inf{t : t > 0, cos t = 0}. Set π = 2t_0. Since cos(π/2) = 0, (iv) implies sin(π/2) = 1 and hence that e^{ιπ/2} = ι.
(vi) Clearly, e^{ιπ/2} = ι implies that e^{ιπ} = −1, e^{ι2π} = 1, and e^{ι2πk} = 1 for all integers k. Since e^{ι2π} = 1, it follows that e^z = e^{z+ι2π} for all z ∈ C. !
It is now possible to prove, from the above definition, various results involving π that one learns in calculus. For example, the arc length of the unit circle {z : |z| = 1} is 2π, and ∫_{−∞}^{∞} 1/(1 + x^2) dx = π, etc. (Problems A.19 and A.20).
The following assertions about ez can be proved with some more effort.
Theorem A.3.2:
(i) ez = 1 iff z = 2πιk for some integer k.
(ii) The map t → eιt from R is onto the unit circle.
(iii) For any ω ∈ C, ω ≠ 0, there is a z ∈ C such that ω = e^z.
For a proof of this theorem as well as more details on Theorem A.3.1,
see Rudin (1987).
Theorem A.3.3: (Orthogonality of {e^{ι2πnt}}_{n∈Z}). For any n ∈ Z, ∫_0^1 e^{ι2πnt} dt = 0 if n ≠ 0 and = 1 if n = 0.

Proof: Since (e^{ιt})′ = ι e^{ιt},
     (e^{ι2πnt})′ = ι2πn e^{ι2πnt}, n ∈ Z,
and so for n ≠ 0,
     ∫_0^1 e^{ι2πnt} dt = (1/(ι2πn)) ∫_0^1 (e^{ι2πnt})′ dt = (1/(ι2πn)) (e^{ι2πn} − 1) = 0.

Corollary A.3.4: The family {cos 2πnt : n = 0, 1, 2, . . .} ∪ {sin 2πnt : n = 1, 2, . . .} is orthogonal in L^2[0, 1] (Problem A.22), i.e., for any two f, g in this family, ∫_0^1 f(x)g(x)dx = 0 for f ≠ g.
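The orthogonality relations above can be verified approximately by a Riemann sum over [0, 1]; the sketch below is a numerical illustration only (the frequencies n = 2 and n = 3 are arbitrary).

import math

def inner_product(f, g, n=10000):
    # Approximate the integral over [0, 1] of f(t) g(t) dt by a left-endpoint Riemann sum.
    h = 1.0 / n
    return sum(f(i * h) * g(i * h) for i in range(n)) * h

cos2 = lambda t: math.cos(2 * math.pi * 2 * t)   # cos(2 pi n t) with n = 2
sin3 = lambda t: math.sin(2 * math.pi * 3 * t)   # sin(2 pi n t) with n = 3

print(round(inner_product(cos2, sin3), 6))   # ~ 0 (orthogonal)
print(round(inner_product(cos2, cos2), 6))   # ~ 0.5 (the family is orthogonal, not orthonormal)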
A.4 Metric spaces
A.4.1 Basic definitions
This section reviews the following: metric spaces, Cauchy sequences, completeness, functions, continuity, compactness, convergence of sequences of functions, and uniform convergence.
Definition A.4.1: Let S be a nonempty set. Let d : S × S → R+ = [0, ∞)
be such that
(i) d(x, y) = d(y, x) for any x, y in S.
(ii) d(x, z) ≤ d(x, y) + d(y, z) for any x, y, z in S.
(iii) d(x, y) = 0 iff x = y.
Such a d is called a metric on S and the pair (S, d) a metric space. Property
(ii) is called the triangle inequality.
Example A.4.1: Let R^k ≡ {(x_1, . . . , x_k) : x_i ∈ R, 1 ≤ i ≤ k} be the k-dimensional Euclidean space. For 1 ≤ p < ∞ and x = (x_1, . . . , x_k), y = (y_1, y_2, . . . , y_k) ∈ R^k, let
     d_p(x, y) = (∑_{i=1}^{k} |x_i − y_i|^p)^{1/p},
and d_∞(x, y) = max{|x_i − y_i| : 1 ≤ i ≤ k}. It can be shown that d_p(·, ·) is a metric on R^k for all 1 ≤ p ≤ ∞ (Problem A.24).
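The metrics d_p of Example A.4.1 are straightforward to compute. The following sketch (with arbitrary illustrative points in R^3) evaluates d_1, d_2, and d_∞ for one pair x, y.

def d_p(x, y, p):
    # d_p(x, y) = (sum |x_i - y_i|^p)^(1/p) for 1 <= p < infinity.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def d_inf(x, y):
    # d_infinity(x, y) = max |x_i - y_i|.
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1.0, -2.0, 0.5), (0.0, 1.0, 2.5)
print(d_p(x, y, 1), d_p(x, y, 2), d_inf(x, y))  # here d_1 >= d_2 >= d_infinity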
Definition A.4.2: A sequence {xn }n≥1 in a metric space (S, d) converges
to an x in S if for every ε > 0, there is a Nε such that n ≥ Nε ⇒ d(xn , x) < ε
and is written as limn→∞ xn = x.
Definition A.4.3: A sequence {xn }n≥1 in a metric space (S, d) is Cauchy
if for all ε > 0, there exists Nε such that n, m ≥ Nε ⇒ d(xn , xm ) < ε.
Definition A.4.4: A metric space (S, d) is complete if every Cauchy sequence {xn }n≥1 converges to some x in S.
Example A.4.2:
(a) Let S = Q, the set of rationals and d(x, y) ≡ |x − y|. Then (Q, d) is a
metric space that is not complete.
(b) Let S = R and d(x, y) = |x − y|. Then (R, d) is complete (cf. Proposition A.2.3).
(c) Let S = Rk . Then (Rk , dp ) is complete for every 1 ≤ p ≤ ∞, where
dp is as in Example A.4.1.
Remark A.4.1: (Completion of an incomplete metric space). Let (S, d)
be a metric space. Let S̃ be the set of all Cauchy sequences in S. Identify
each x in S with the Cauchy sequence {xn = x}n≥1 . Define a function from
S̃ × S̃ to R+ by
     d̃({x_n}_{n≥1}, {y_n}_{n≥1}) = lim sup_{n→∞} d(x_n, y_n).
It is easy to verify that d̃ is symmetric and satisfies the triangle inequality. Define s_1 = {x_n}_{n≥1} and s_2 = {y_n}_{n≥1} to be equivalent (write {x_n} ∼ {y_n}) if d̃(s_1, s_2) = 0. Let S̄ be the set of all equivalence classes in S̃ and define d̄(c_1, c_2) ≡ d̃(s_1, s_2), where c_1, c_2 are equivalence classes and s_1, s_2 are arbitrary elements of c_1 and c_2, respectively.
It can now be verified that (S̄, d̄) is a complete metric space and (S, d) is embedded in (S̄, d̄) by identifying each x in S with the equivalence class containing the sequence {x_n = x}_{n≥1}.
Definition A.4.5: A metric space (S, d) is separable if there exists a subset
D ⊂ S that is countable and dense in S, i.e., for each x in S and ε > 0,
there is a y in D such that d(x, y) < ε.
Example A.4.3: By the Archimedean property, Q is dense in R. Similarly
Qk , the set of all k vectors with components from Q, is dense in Rk .
Definition A.4.6: A metric space (S, d) is called Polish if it is complete
and separable.
Example A.4.4: (Rk , dp ) in Example A.4.2 is Polish.
A.4.2 Continuous functions
Let (S, d) and (T, ρ) be two metric spaces. Let f : S → T be a map from S
to T.
Definition A.4.7:
(a) f is continuous at p in S if for each ε > 0, there exists δ > 0 such
that d(x, p) < δ ⇒ ρ(f (x), f (p)) < ε. (Here the δ may depend on ε
and p.)
(b) f is continuous on a set B ⊂ S if it is continuous at every p ∈ B.
(c) f is uniformly continuous on B if for each ε > 0, there exists δ > 0
such that for each pair x, y in S, d(x, y) < δ ⇒ ρ(f (x), f (y)) < ε.
Definition A.4.8: Let (S, d) be a metric space.
(a) A set O ⊂ (S, d) is open if x ∈ O ⇒ there exists δ > 0 such that
d(x, y) < δ ⇒ y ∈ O. That is, at every point x in O, an open ball
Bx (δ) ≡ {y : d(x, y) < δ} of positive radius δ is a subset of O.
(b) A set C ⊂ (S, d) is closed if C c is open.
Theorem A.4.1: Let (S, d) and (T, ρ) be metric spaces. A map f : S → T is continuous on S iff for each O open in T, f^{−1}(O) is open in S.
Proof is left as an exercise (Problem A.28).
A.4.3 Compactness
Definition A.4.9: A collection of open sets {Oα : α ∈ I} is an open cover
for a set B ⊂ (S, d) if for each x ∈ B, there exists α ∈ I such that x ∈ Oα .
Example A.4.5: Let B = (0, 1). Then the collection {(α − α/2, α + (1 − α)/2) : α ∈ Q ∩ (0, 1)} is an open cover for B.
Definition A.4.10: Let (S, d) be a metric space. A set K ⊂ S is called
compact if given any open cover {Oα : α ∈ I} for K, there exists a finite
subcollection {Oαi : αi ∈ I, i = 1, 2, . . . , n, n < ∞} that is an open cover
for K.
Example A.4.6: The set B = (0, 1) is not compact, as the open cover in Example A.4.5 above does not admit a finite subcover.
The next result is the well-known Heine-Borel theorem.
Theorem A.4.2:
(i) For any −∞ < a < b < ∞, the closed interval [a, b] is compact in R.
(ii) Any K ⊂ R is compact iff it is bounded and closed.
For a proof, see Rudin (1976). From Theorem A.4.1, it is seen that
the inverse image of an open set under a continuous function is open but
the forward image may not have this property. But the following is true.
Theorem A.4.3: Let (S, d) and (T, ρ) be two metric spaces and let
f : (S, d) → (T, ρ) be continuous. Let K ⊂ S be compact. Then f (K)
is compact.
The proof is left as an exercise (Problem A.35).
A.4.4 Sequences of functions and uniform convergence
Definition A.4.11: Let (S, d) and (T, ρ) be two metric spaces and let
{fn }n≥1 be a sequence of functions from (S, d) to (T, ρ). The sequence
{fn }n≥1 is said to:
(a) converge pointwise to f on a set A ⊂ S if lim_{n→∞} f_n(x) = f(x) for each x in A;
(b) converge uniformly to f on a set A ⊂ S if for each ε > 0, there exists N_ε > 0 (depending on ε and A) such that
     n ≥ N_ε ⇒ ρ(f_n(x), f(x)) < ε for all x in A.
A consequence of uniform convergence is the preservation of the continuity property.
Theorem A.4.4: Let (S, d) and (T, ρ) be two metric spaces and let
{fn }n≥1 be a sequence of functions from (S, d) to (T, ρ). Let for each n ≥ 1,
fn be continuous on A ⊂ S. Let {fn }n≥1 converge to f uniformly on A.
Then f is continuous on A.
Proof: The proof is based on the "break up into three parts" idea. By the triangle inequality,
     ρ(f(x), f(y)) ≤ ρ(f(x), f_n(x)) + ρ(f_n(x), f_n(y)) + ρ(f_n(y), f(y)).
Fix x in A. By the uniform convergence on A, sup{ρ(f_n(u), f(u)) : u ∈ A} → 0 as n → ∞. So for each ε > 0, there exists N_ε < ∞ such that n ≥ N_ε ⇒ ρ(f_n(u), f(u)) < ε/3 for all u in A. Now since f_{N_ε}(·) is continuous on A, there exists a δ > 0 (depending on N_ε and x) such that d(x, y) < δ, y ∈ A ⇒ ρ(f_{N_ε}(y), f_{N_ε}(x)) < ε/3. Thus, y ∈ A, d(x, y) < δ ⇒ ρ(f(x), f(y)) < 2ε/3 + ε/3 = ε. !
A.5 Problems
A.1 Express the following sets in the form {x : x has property p}.
(a) The set A of all integers which when divided by 7 leave a remainder ≤ 3.
(b) The set B of all functions from [0, 1] to R with at most two discontinuity points.
(c) The set C of all students at a given university who are graduate
students with at least one course in mathematics at the graduate
level.
(d) The set D of all algebraic numbers. (A number x is called an
algebraic number, if it is the root of a polynomial with rational
coefficients.)
(e) The set E of all possible sequences whose elements are either 0
or 1.
A.2 Give an example of sets A_1, A_2 such that A_1 ∩ A_2 ≠ A_1 ∪ A_2.
A.3 Let I = [0, 1], Ω = R and for α ∈ R, Aα = (α − 1, α + 1), the open
interval {x : α − 1 < x < α + 1}.
(a) Show that ∪α∈I Aα = (−1, 2) and ∩α∈I Aα = (0, 1).
(b) Suppose J = {x : x ∈ I, x is rational}. Find ∪x∈J Ax and
∩x∈J Ax .
A.4 With Ω ≡ N ≡ {1, 2, 3, . . .}, find Ac in the following cases:
(a) A = {ω : ω is divisible by 2 or 3 or both}. If ω ∈ Ac , what can
be said about its prime factors?
(b) A = {ω : ω is divisible by 15 and 16}.
(c) A = {ω : ω is a perfect square}.
A.5 Show that X ≡ {0, 1}N , the set of all sequences {ωi }i∈N where each
ωi ∈ {0, 1}, is uncountable. Conclude that P(N) is uncountable.
A.6 Show that if Ωi is countable for each i ∈ N, then for each k ∈ N,
×ki=1 Ωi is countable and ∪i∈N Ωi is also countable but ×i∈N Ωi is not
countable.
A.7 Show that the set of all polynomials in x with integer coefficients is
countable.
A.8 Show that the well ordering property implies the principle of induction.
A.9 Apply the principle of induction to establish the following:
(a) For each n ∈ N, ∑_{j=1}^{n} j^2 = n(n + 1)(2n + 1)/6.
(b) For each n ∈ N, x_1, x_2, . . . , x_k ∈ R,
(i) (The binomial formula). (x_1 + x_2)^n = ∑_{r=0}^{n} \binom{n}{r} x_1^r x_2^{n−r}.
(ii) (The multinomial formula).
     (x_1 + x_2 + . . . + x_k)^n = ∑ (n!/(r_1! r_2! . . . r_k!)) x_1^{r_1} x_2^{r_2} . . . x_k^{r_k},
where the summation extends over all (r_1, r_2, . . . , r_k) such that r_i ∈ N, 0 ≤ r_i ≤ n, ∑_{i=1}^{k} r_i = n.
A.10 Verify that on R, the relation x ∼ y if x−y is rational is an equivalence
relation but the relation x ∼ y if x − y is irrational is not.
A.11 Show that the set A = {r : r ∈ Q, r2 < 2} is bounded above in Q but
has no l.u.b. in Q.
A.12 Show that for any two sequences {x_n}_{n≥1}, {y_n}_{n≥1} ⊂ R,
     lim inf_{n→∞} x_n + lim inf_{n→∞} y_n ≤ lim inf_{n→∞} (x_n + y_n) ≤ lim sup_{n→∞} (x_n + y_n) ≤ lim sup_{n→∞} x_n + lim sup_{n→∞} y_n.

A.13 Verify that lim_{n→∞} x_n = a ∈ R iff lim inf_{n→∞} x_n = lim sup_{n→∞} x_n = a.
A.14 Establish Proposition A.2.3.
(Hint: First show that a Cauchy sequence is bounded and then show that lim inf_{n→∞} x_n = lim sup_{n→∞} x_n.)
A.15 (a) Prove Proposition A.2.4 by comparison with the geometric series.
(b) Show that for integer k ≥ 1, the power series ∑_{n=k}^{∞} n(n − 1) · · · (n − k + 1) a_n x^{n−k} has the same radius of convergence as ∑_{n=0}^{∞} a_n x^n.

A.16 Show that the series ∑_{j=2}^{∞} 1/(j (log j)^p) converges for p > 1 and diverges for p ≤ 1.
A.17 Find the radius of convergence, ρ, for the power series A(x) ≡ ∑_{n=0}^{∞} a_n x^n where
(a) a_n = n/(n + 1), n ≥ 0.
(b) a_n = n^p, n ≥ 0, p ∈ R.
(c) a_n = 1/n!, n ≥ 0 (where 0! = 1).
A.18 (a) Find the Taylor series at a = 0 for the function f(x) = 1/(1 − x) in I ≡ (−1, +1) and show that it converges to f(x) on I.
(b) Find the Taylor series of 1 + x + x^2 in I = (1, 3), centered at 2.
(c) Let
     f(x) = e^{−1/x^2} if |x| < 1, x ≠ 0,  and  f(x) = 0 if x = 0.
(i) Show that f is infinitely differentiable at 0 and compute f^{(j)}(0) for all j ≥ 1.
(ii) Show that the Taylor series at a = 0 converges but not to f on (−1, 1).
A.19 Let S = {z : z ∈ C, |z| = 1} be the unit circle. Using the parameterization t → e^{ιt} = (cos t + ι sin t) from [0, 2π] to S, show that the arc length of S (i.e., the circumference of the unit circle) is 2π.
A.20 Set φ(t) = sin t / cos t for −π/2 < t < π/2. Verify that φ′ = 1 + φ^2 and that φ : (−π/2, π/2) → (−∞, ∞) is strictly monotone increasing and onto. Conclude that
     ∫_{−∞}^{∞} 1/(1 + x^2) dx = ∫_{−π/2}^{π/2} φ′(t)/(1 + (φ(t))^2) dt = π.
A.21 Using the property that e^{ιπ/2} = ι, verify that for all t in R
     cos(π/2 − t) = sin t,  sin(π/2 − t) = cos t,
     cos(π + t) = − cos t,  sin(π + t) = − sin t,
     cos(2π + t) = cos t,  sin(2π + t) = sin t.
Also show that cos t is a strictly decreasing map from [0, π] onto [−1, 1] and that sin t is a strictly increasing map from [−π/2, π/2] onto [−1, 1].
A.22 Using (i) of Theorem A.3.1, express cos(t_1 + t_2), sin(t_1 + t_2) in terms of cos t_i, sin t_i, i = 1, 2, and in turn use this to prove Corollary A.3.4 from Theorem A.3.3.

A.23 Verify that p_n(z) ≡ (1 + z/n)^n converges to e^z uniformly on bounded sets in C.
A.24 (a) Verify that for p = 1, p = 2 and p = ∞, dp is a metric on Rk .
(b) Show that for fixed x and y, ϕ(p) ≡ dp (x, y) is continuous in p
on [1, ∞].
(c) Draw the open unit ball Bp ≡ {x : x ∈ R2 , dp (x, 0) < 1} in R2
for p = 1, 2 and ∞.
A.25 Let S = C[0, 1] be the set of all real valued continuous functions on [0, 1]. Now let
     d_1(f, g) = ∫_0^1 |f(x) − g(x)| dx  (area metric),
     d_2(f, g) = (∫_0^1 |f(x) − g(x)|^2 dx)^{1/2}  (least square metric),
     d_∞(f, g) = sup{|f(x) − g(x)| : 0 ≤ x ≤ 1}  (sup metric).
Show that all these are metrics on S.
A.26 Let S = R^∞ ≡ {{x_n}_{n≥1} : x_n ∈ R for all n ≥ 1} be the space of all sequences of real numbers. Let d({x_n}_{n≥1}, {y_n}_{n≥1}) = ∑_{j=1}^{∞} (|x_j − y_j|/(1 + |x_j − y_j|)) (1/2^j). Show that (S, d) is a Polish space.

A.27 If s_k = {x_{kn}}_{n≥1} and s = {x_n}_{n≥1} are elements of S = R^∞ as in Problem A.26, verify that as k → ∞, s_k → s iff x_{kn} → x_n for all n ≥ 1.
A.28 Establish Theorem A.4.1.
A.29 Let S = C[0, 1] and d_p(f, g) ≡ (∫_0^1 |f(t) − g(t)|^p dt)^{1/p} for 1 ≤ p < ∞ and d_∞(f, g) = sup{|f(t) − g(t)| : t ∈ [0, 1]}.
(a) Let f(x) ≡ 1. Let f_n(t) ≡ 1 for 0 ≤ t ≤ 1 − 1/n, and f_n(t) = n(1 − t) for 1 − 1/n ≤ t ≤ 1. Show that d_p(f_n, f) → 0 for 1 ≤ p < ∞ but d_∞(f_n, f) ↛ 0.
(b) Fix f ∈ C[0, 1]. Let g_n(t) = f(t), 0 ≤ t ≤ 1 − 1/n, and g_n(t) = f(1 − 1/n) + (f(1) − f(1 − 1/n)) n(t + 1/n − 1), 1 − 1/n ≤ t ≤ 1. Show that d_p(g_n, f) → 0 for all 1 ≤ p ≤ ∞.
A.30 Show that if {xn }n≥1 is a convergent sequence in a metric space (S, d),
then it is Cauchy.
A.31 Verify (b) of Example A.4.2 from the axioms of real numbers (cf.
Proposition A.2.3). Verify (c) of the same example from (b).
A.32 Let S = C[0, 1] and d be the supremum metric, i.e.,
d(f, g) = sup{|f (x) − g(x)| : 0 ≤ x ≤ 1}.
By approximating any continuous function with piecewise linear functions with rational end points and rational values, show that (S, d) is
Polish, i.e., it is complete and separable.
A.33 Show that the function f (x) = x2 is continuous on R, uniformly so
on any bounded set B ⊂ R but not uniformly on R.
A.34 Show that unions of open sets are open and intersection of any two
open sets is open. Give an example to show that the intersection of
an infinite number of open sets need not be open.
A.35 Prove Theorem A.4.3.
A.36 Let fn (x) = xn and g(x) ≡ 0 on R. Then {fn }n≥1 converges pointwise
to g on (−1, 1), uniformly on [a, b] for −1 < a < b < 1, but not
uniformly on (0, 1).
A.37 Let {f_n}_{n≥1}, f ∈ C[0, 1]. Let {f_n}_{n≥1} converge to f uniformly on [0, 1]. Show that lim_{n→∞} ∫_0^1 |f_n(x) − f(x)| dx = 0 and lim_{n→∞} ∫_0^1 f_n(x) dx = ∫_0^1 f(x) dx.

A.38 Give a proof of Proposition A.2.7 (vi) (term by term differentiability of a power series) using Proposition A.2.8 (iv) (the fundamental theorem of Riemann integration).
Appendix B
List of Abbreviations and Symbols
B.1 Abbreviations
a.c.       absolutely continuous (functions)
a.e.       almost everywhere
AR(1)      autoregressive process of order one
a.s.       almost sure(ly)
BCT        bounded convergence theorem
BGW        Bienaymé-Galton-Watson
cdf        cumulative distribution function
CE         conditional expectation
CLT        central limit theorem
CTMC       continuous time Markov chain
DCT        dominated convergence theorem
EDCT       extended dominated convergence theorem
fdds       finite dimensional distributions
iff        if and only if
IFS        iterated function system
iid        independent and identically distributed
IIIDRM     iterations of iid random maps
i.o.       infinitely often
LIL        law of the iterated logarithm
LLN        laws of large numbers
MBB        moving block bootstrap
m.c.f.a.   monotone continuity from above
m.c.f.b.   monotone continuity from below
MCMC       Markov chain Monte Carlo
MCT        monotone convergence theorem
o.n.b.     orthonormal basis
r.c.p.     regular conditional probability
SBM        standard Brownian motion
SLLN       strong law of large numbers
s.o.c.     second order correctness
SSRW       simple symmetric random walk
UI         uniform integrability
WLLN       weak law of large numbers
w.p. 1     with probability one
w.r.t.     with respect to
w.l.o.g.   without loss of generality
B.2 Symbols
≪                      µ ≪ ν: absolute continuity of a measure
−→d                    convergence in distribution
−→p                    convergence in probability
(·) ∗ (·)              convolution of measures, functions, etc.
(·)∗                   extension of a measure
a ∼ b                  a and b are equivalent (under an equivalence relation)
a_n ∼ b_n              a_n/b_n → 1 as n → ∞
⌊a⌋                    the integer part of a, i.e., ⌊a⌋ = k if k ≤ a < k + 1, k ∈ Z, a ∈ R
⌈a⌉                    the smallest integer not less than a, i.e., ⌈a⌉ = k + 1 if k < a ≤ k + 1, k ∈ Z, a ∈ R
Ā                      closure of A
A^c                    complement of a set A
∂A                     boundary of A
A + B                  symmetric difference of two sets A and B, i.e., A + B = (A ∩ B^c) ∪ (A^c ∩ B)
B(S)                   Borel σ-algebra on a metric space S such as S = R, R^k, R^∞
B(S, R)                ≡ {f | f : S → R, F-measurable, sup{|f(s)| : s ∈ S} ≤ 1}
B(x, ε), B_x(ε)        open ball of radius ε with center at x in a metric space (S, d), i.e., {y : d(x, y) < ε}
C                      the set of all complex numbers
C[a, b]                = {f | f : [a, b] → R, f continuous}
C_B(R)                 ≡ {f | f : R → R, f bounded and continuous}
C_c(R)                 ≡ {f | f : R → R, continuous and f ≡ 0 outside a bounded interval}
C_0(R)                 ≡ {f | f : R → R, continuous and lim_{|x|→∞} f(x) = 0}
C_0(S)                 = {f | f : S → R, f continuous and for every ε > 0, there exists a compact set K_ε such that |f(x)| < ε for x ∉ K_ε}
C(F), C_F              the set of all continuity points of a cdf F
δ_ij                   Kronecker delta, i.e., δ_ij = 1 if i = j and = 0 if i ≠ j
δ_x                    the probability distribution putting mass one at x
dµ/dν                  Radon-Nikodym derivative of µ w.r.t. ν
E(Y |G)                conditional expectation of Y given G
H^⊥                    orthogonal complement of a subspace H of a Hilbert space
ι                      √−1
I_A(·)                 the indicator function of a set A
I_k                    the identity matrix of order k
λ⟨A⟩                   λ-class generated by a class of sets A
L^p(Ω, F, µ)           = {f | f : Ω → F, F-measurable, ∫ |f|^p dµ < ∞}, with F = R or C (F = C in Sections 5.6, 5.7 only)
L^p(R)                 = L^p(R, B(R), m)
m                      the Lebesgue measure
µ_F                    Lebesgue-Stieltjes measure corresponding to F
µ ⊥ ν                  singularity of measures µ and ν
N                      the set of natural numbers
∅                      the null set
Φ(·)                   standard normal cdf, i.e., Φ(x) ≡ (1/√(2π)) ∫_{−∞}^{x} e^{−u^2/2} du, −∞ < x < ∞
P(A|G)                 probability of A given G
P_λ(·)                 probability distribution of a Markov chain with initial distribution λ
P(Ω)                   the power set of Ω = {A : A ⊂ Ω}
P_x(·)                 same as P_λ with λ = δ_x
(Ω, F, P)              generic probability space
(Ω, F, µ)              generic measure space
Q                      the set of all rationals
R                      the set of real numbers, (−∞, ∞)
R_+                    the set of nonnegative real numbers, [0, ∞)
R̄                      the set of all extended real numbers, [−∞, ∞]
R̄_+                    = [0, ∞]
(S, d)                 a metric space S with a metric d
σ⟨A⟩                   σ-algebra generated by a class of sets A
σ⟨{f_a : a ∈ A}⟩       σ-algebra generated by a collection of mappings {f_a : a ∈ A}
T                      tail σ-algebra
|z|                    = √(a^2 + b^2), the absolute value of a complex number z = a + ιb, a, b ∈ R
Re(z)                  = a, the real part of a complex number z = a + ιb, a, b ∈ R
Im(z)                  = b, the imaginary part of a complex number z = a + ιb, a, b ∈ R
Z                      the set of all integers = {0, ±1, ±2, . . .}
Z_+                    the set of all nonnegative integers = {0, 1, 2, . . .}
References
Arcones, M. A. and Giné, E. (1989), ‘The bootstrap of the mean with arbitrary bootstrap sample size’, Ann. Inst. H. Poincaré Probab. Statist.
25(4), 457–481.
Arcones, M. A. and Giné, E. (1991), ‘Additions and correction to: “The
bootstrap of the mean with arbitrary bootstrap sample size” [Ann.
Inst. H. Poincaré Probab. Statist. 25(4) (1989), 457–481]’, Ann. Inst.
H. Poincaré Probab. Statist. 27(4), 583–595.
Athreya, K. B. (1986), ‘Darling and Kac revisited’, Sankhyā A 48(3), 255–
266.
Athreya, K. B. (1987a), ‘Bootstrap of the mean in the infinite variance
case’, Ann. Statist. 15(2), 724–731.
Athreya, K. B. (1987b), Bootstrap of the mean in the infinite variance case,
in ‘Proceedings of the 1st World Congress of the Bernoulli Society’,
Vol. 2, VNU Sci. Press, Utrecht, pp. 95–98.
Athreya, K. B. (2000), ‘Change of measures for Markov chains and the
l log l theorem for branching processes’, Bernoulli 6, 323–338.
Athreya, K. B. (2004), ‘Stationary measures for some Markov chain models
in ecology and economics’, Econom. Theory 23(1), 107–122.
Athreya, K. B., Doss, H. and Sethuraman, J. (1996), ‘On the convergence
of the Markov chain simulation method’, Ann. Statist. 24(1), 69–100.
Athreya, K. B. and Jagers, P., eds (1997), Classical and Modern Branching Processes, Vol. 84 of The IMA Volumes in Mathematics and its
Applications, Springer-Verlag, New York.
Athreya, K. B. and Ney, P. (1978), ‘A new approach to the limit theory of
recurrent Markov chains’, Trans. Amer. Math. Soc. 245, 493–501.
Athreya, K. B. and Ney, P. E. (2004), Branching Processes, Dover Publications, Inc, Mineola, NY. (Reprint of Band 196, Grundlehren der
Mathematischen Wissenschaften, Springer-Verlag, Berlin).
Athreya, K. B. and Pantula, S. G. (1986), ‘Mixing properties of Harris
chains and autoregressive processes’, J. Appl. Probab. 23(4), 880–892.
Athreya, K. B. and Stenflo, O. (2003), ‘Perfect sampling for Doeblin chains’,
Sankhyā A 65(4), 763–777.
Bahadur, R. R. (1966), ‘A note on quantiles in large samples’, Ann. Math.
Statist. 37, 577–580.
Barnsley, M. F. (1992), Fractals Everywhere, 2nd edn, Academic Press,
New York.
Berbee, H. C. P. (1979), Random Walks with Stationary Increments and
Renewal Theory, Mathematical Centre, Amsterdam.
Berry, A. C. (1941), ‘The accuracy of the Gaussian approximation to the
sum of independent variates’, Trans. Amer. Math. Soc. 48, 122–136.
Bhatia, R. (2003), Fourier Series, 2nd edn, Hindustan Book Agency, New
Delhi, India.
Bhattacharya, R. N. and Rao, R. R. (1986), Normal Approximation and
Asymptotic Expansions, Robert E. Krieger, Melbourne, FL.
Billingsley, P. (1968), Convergence of Probability Measures, John Wiley,
New York.
Billingsley, P. (1995), Probability and Measure, 3rd edn, John Wiley, New
York.
Bradley, R. C. (1983), ‘Approximation theorems for strongly mixing random variables’, Michigan Math. J. 30(1), 69–81.
Brillinger, D. R. (1975), Time Series. Data Analysis and Theory, Holt,
Rinehart and Winston, Inc, New York.
Carlstein, E. (1986), ‘The use of subseries values for estimating the variance of a general statistic from a stationary sequence’, Ann. Statist.
14(3), 1171–1179.
Chanda, K. C. (1974), ‘Strong mixing properties of linear stochastic processes’, J. Appl. Probab. 11, 401–408.
Chow, Y.-S. and Teicher, H. (1997), Probability Theory: Independence, Interchangeability, Martingales, Springer-Verlag, New York.
Chung, K. L. (1967), Markov Chains with Stationary Transition Probabilities, 2nd edn, Springer-Verlag, New York.
Chung, K. L. (1974), A Course in Probability Theory, 2nd edn, Academic
Press, New York.
Cohen, P. (1966), Set Theory and the Continuum Hypothesis, Benjamin,
New York.
Doob, J. L. (1953), Stochastic Processes, John Wiley, New York.
Doukhan, P., Massart, P. and Rio, E. (1994), ‘The functional central limit
theorem for strongly mixing processes’, Ann. Inst. H. Poincaré Probab.
Statist. 30, 63–82.
Durrett, R. (2001), Essentials of Stochastic Processes, Springer-Verlag, New
York.
Durrett, R. (2004), Probability: Theory and Examples, 3rd edn, Duxbury
Press, San Jose, CA.
Efron, B. (1979), ‘Bootstrap methods: Another look at the jackknife’, Ann.
Statist. 7(1), 1–26.
Esseen, C.-G. (1942), ‘Rate of convergence in the central limit theorem’,
Ark. Mat. Astr. Fys. 28A(9).
Esseen, C.-G. (1945), ‘Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law’, Acta Math. 77, 1–125.
Etemadi, N. (1981), ‘An elementary proof of the strong law of large numbers’, Z. Wahrsch. Verw. Gebiete 55(1), 119–122.
Feller, W. (1966), An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley, New York.
Feller, W. (1968), An Introduction to Probability Theory and Its Applications, Vol. I, 3rd edn, John Wiley, New York.
Geman, S. and Geman, D. (1984), ‘Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images’, IEEE Trans. Pattern
Analysis Mach. Intell. 6, 721–741.
Giné, E. and Zinn, J. (1989), ‘Necessary conditions for the bootstrap of the
mean’, Ann. Statist. 17(2), 684–691.
Gnedenko, B. V. and Kolmogorov, A. N. (1968), Limit Distributions
for Sums of Independent Random Variables, Revised edn, Addison-Wesley, Reading, MA.
Gorodetskii, V. V. (1977), ‘On the strong mixing property for linear sequences’, Theory Probab. 22, 411–413.
Götze, F. and Hipp, C. (1978), ‘Asymptotic expansions in the central
limit theorem under moment conditions’, Z. Wahrsch. Verw. Gebiete
42, 67–87.
Götze, F. and Hipp, C. (1983), ‘Asymptotic expansions for sums of weakly
dependent random vectors’, Z. Wahrsch. Verw. Gebiete 64, 211–239.
Hall, P. (1985), ‘Resampling a coverage pattern’, Stochastic Process. Appl.
20, 231–246.
Hall, P. (1992), The Bootstrap and Edgeworth Expansion, Springer-Verlag,
New York.
Hall, P. G. and Heyde, C. C. (1980), Martingale Limit Theory and Its
Applications, Academic Press, New York.
Hall, P., Horowitz, J. L. and Jing, B.-Y. (1995), ‘On blocking rules for the
bootstrap with dependent data’, Biometrika 82, 561–574.
Herrndorf, N. (1983), ‘Stationary strongly mixing sequences not satisfying
the central limit theorem’, Ann. Probab. 11, 809–813.
Hewitt, E. and Stromberg, K. (1965), Real and Abstract Analysis, Springer-Verlag, New York.
Hoel, P. G., Port, S. C. and Stone, C. J. (1972), Introduction to Stochastic
Processes, Houghton-Mifflin, Boston, MA.
Ibragimov, I. A. and Rozanov, Y. A. (1978), Gaussian Random Processes,
Springer-Verlag, Berlin.
Karatzas, I. and Shreve, S. E. (1991), Brownian Motion and Stochastic
Calculus, 2nd edn, Springer-Verlag, New York.
Karlin, S. and Taylor, H. M. (1975), A First Course in Stochastic Processes,
Academic Press, New York.
Kifer, Y. (1988), Random Perturbations of Dynamical Systems, Birkhäuser,
Boston, MA.
Kolmogorov, A. N. (1956), Foundations of the Theory of Probability, 2nd
edn, Chelsea, New York.
Körner, T. W. (1989), Fourier Analysis, Cambridge University Press, New
York.
Künsch, H. R. (1989), ‘The jackknife and the bootstrap for general stationary observations’, Ann. Statist. 17, 1217–1261.
Lahiri, S. N. (1991), ‘Second order optimality of stationary bootstrap’,
Statist. Probab. Lett. 11, 335–341.
Lahiri, S. N. (1992), ‘Edgeworth expansions for m-estimators of a regression
parameter’, J. Multivariate Analysis 43, 125–132.
Lahiri, S. N. (1994), ‘Rates of bootstrap approximation for the mean of
lattice variables’, Sankhyā A 56, 77–89.
Lahiri, S. N. (1996), ‘Asymptotic expansions for sums of random vectors
under polynomial mixing rates’, Sankhyā A 58, 206–225.
Lahiri, S. N. (2001), ‘Effects of block lengths on the validity of block resampling methods’, Probab. Theory Related Fields 121, 73–97.
Lahiri, S. N. (2003), Resampling Methods for Dependent Data, Springer-Verlag, New York.
Lehmann, E. L. and Casella, G. (1998), Theory of Point Estimation,
Springer-Verlag, New York.
Lindvall, T. (1992), Lectures on the Coupling Method, John Wiley, New York.
Liu, R. Y. and Singh, K. (1992), Moving blocks jackknife and bootstrap
capture weak dependence, in R. Lepage and L. Billard, eds, ‘Exploring
the Limits of the Bootstrap’, John Wiley, New York, pp. 225–248.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and
Teller, E. (1953), ‘Equations of state calculations by fast computing
machines’, J. Chem. Physics 21, 1087–1092.
Meyn, S. P. and Tweedie, R. L. (1993), Markov Chains and Stochastic
Stability, Springer-Verlag, New York.
Munkres, J. R. (1975), Topology, A First Course, Prentice Hall, Englewood
Cliffs, NJ.
Nummelin, E. (1978), ‘A splitting technique for Harris recurrent Markov
chains’, Z. Wahrsch. Verw. Gebiete 43(4), 309–318.
Nummelin, E. (1984), General Irreducible Markov Chains and Nonnegative
Operators, Cambridge University Press, Cambridge.
Orey, S. (1971), Limit Theorems for Markov Chain Transition Probabilities,
Van Nostrand Reinhold, London.
Parthasarathy, K. R. (1967), Probability Measures on Metric Spaces, Academic Press, San Diego, CA.
Parthasarathy, K. R. (2005), Introduction to Probability and Measure,
Vol. 33 of Texts and Readings in Mathematics, Hindustan Book
Agency, New Delhi, India.
Peligrad, M. (1982), ‘Invariance principles for mixing sequences of random
variables’, Ann. Probab. 10(4), 968–981.
Petrov, V. V. (1975), Sums of Independent Random Variables, Springer-Verlag, New York.
Reiss, R.-D. (1974), ‘On the accuracy of the normal approximation for
quantiles’, Ann. Probab. 2, 741–744.
Robert, C. P. and Casella, G. (1999), Monte Carlo Statistical Methods,
Springer-Verlag, New York.
Rosenberger, W. F. (2002), ‘Urn models and sequential design’, Sequential
Anal. 21(1–2), 1–41.
Royden, H. L. (1988), Real Analysis, 3rd edn, Macmillan Publishing Co.,
New York.
Rudin, W. (1976), Principles of Mathematical Analysis, International Series
in Pure and Applied Mathematics, 3rd edn, McGraw-Hill Book Co.,
New York.
Rudin, W. (1987), Real and Complex Analysis, 3rd edn, McGraw-Hill Book
Co., New York.
Shohat, J. A. and Tamarkin, J. D. (1943), The problem of moments,
in ‘American Mathematical Society Mathematical Surveys’, Vol. II,
American Mathematical Society, New York.
Singh, K. (1981), ‘On the asymptotic accuracy of Efron’s bootstrap’, Ann.
Statist. 9, 1187–1195.
Strassen, V. (1964), ‘An invariance principle for the law of the iterated
logarithm’, Z. Wahrsch. Verw. Gebiete 3, 211–226.
Stroock, D. W. and Varadhan, S. R. S. (1979), Multidimensional Diffusion Processes, Vol. 233 of Grundlehren der Mathematischen Wissenschaften,
Springer-Verlag, Berlin.
Szego, G. (1939), Orthogonal Polynomials, Vol. 23 of American Mathematical Society Colloquium Publications, American Mathematical Society,
Providence, RI.
Withers, C. S. (1981), ‘Conditions for linear processes to be strong-mixing’,
Z. Wahrsch. Verw. Gebiete 57, 477–480.
Woodroofe, M. (1982), Nonlinear Renewal Theory in Sequential Analysis,
SIAM, Philadelphia, PA.
Author Index
Arcones, M. A., 541
Athreya, K. B., 428, 460, 464, 472, 475, 499, 516, 541, 559, 561, 564, 565, 569
Bahadur, R. R., 557
Barnsley, M. F., 460
Berbee, H.C.P., 517
Berry, A. C., 361
Bhatia, R. P., 167
Bhattacharya, R. N., 365, 368
Billingsley, P., 14, 211, 254, 301, 306, 373, 375, 475, 498
Bradley, R. C., 517
Brillinger, D. R., 547
Carlstein, E., 548
Casella, G., 391, 477, 480
Chanda, K. C., 516
Chow, Y. S., 364, 417, 430
Chung, K. L., 14, 250, 323, 359, 489
Cohen, P., 576
Doob, J. L., 211, 399
Doss, H., 472
Doukhan, P., 528
Durrett, R., 274, 278, 372, 393, 429, 493
Efron, B., 533, 545
Esseen, C. G., 361, 368
Etemadi, N., 244
Feller, W., 166, 265, 308, 313, 323, 354, 357, 359, 362, 489, 499
Geman, D., 477
Geman, S., 477
Giné, E., 541
Gnedenko, B. V., 355, 359
Gorodetskii, V. V., 516
Götze, F., 554
Hall, P. G., 365, 510, 534, 548, 556
Herrndorf, N., 529
Hewitt, E., 30, 576
Heyde, C. C., 510
Hipp, C., 554
Hoel, P. G., 455
Horowitz, J. L., 556
Ibragimov, I. A., 516
Jagers, P., 561
Jing, B. Y., 556
Karatzas, I., 494, 499, 504
Karlin, S., 489, 493, 504
Kifer, Y., 460
Kolmogorov, A. N., 170, 200, 225, 244, 355, 359
Körner, T. W., 167, 170
Künsch, H. R., 547
Lahiri, S. N., 533, 537, 549, 552, 556
Lehmann, E. L., 391
Lindvall, T., 265, 456
Liu, R. Y., 547
Massart, P., 528
Metropolis, N., 477
Meyn, S. P., 456
Munkres, J. R., 71
Ney, P., 464, 564, 565, 569
Nummelin, E., 463, 464
Orey, S., 464
Pantula, S. G., 516
Parthasarathy, K. R., 393
Peligrad, M., 529
Petrov, V. V., 365
Port, S. C., 455
Rao, R. Ranga, 365, 368
Reiss, R. D., 381
Rio, E., 528
Robert, C. P., 477, 480
Rosenberger, W. F., 569
Rosenbluth, A. W., 477
Rosenbluth, M. N., 477
Royden, H. L., 27, 62, 94, 97, 118, 128, 130, 156, 573, 579
Rozanov, Y. A., 516
Rudin, W., 27, 94, 97, 132, 181, 195, 581, 590, 593
Sethuraman, J., 472
Shohat, J. A., 308
Shreve, S. E., 494, 499, 504
Singh, K., 536, 537, 545, 547
Stenflo, O., 460
Stone, C. J., 455
Strassen, V., 279, 576
Stromberg, K., 30
Stroock, D. W., 504
Szego, G., 107
Tamarkin, J. D., 308
Taylor, H. M., 489, 493, 504
Teicher, H., 364, 417, 430
Teller, A. H., 477
Teller, E., 477
Tweedie, R. L., 456
Varadhan, S. R. S., 504
Withers, C. S., 516
Woodroofe, M., 407
Zinn, J., 541
Subject Index
λ-system, 13
π-λ theorem, 13
π-system, 13, 220
σ-algebra, 10, 44
Borel, 12, 299
product, 147, 157, 203
tail, 225
trace, 33
trivial, 10
Abel’s summation formula, 254
absolute continuity, 53, 113, 128,
319
absorption probability, 484
algebra, 9
analysis of variance formula, 391
arithmetic distribution (See distribution)
BCT (See theorem)
Bahadur representation, 557
Banach space, 96
Bernoulli shift, 272
Bernstein polynomial, 239
betting sequence, 408
binary expansion, 135
Black-Scholes formula, 504
block bootstrap method, 547
bootstrap method, 533
Borel-Cantelli lemma, 245
conditional, 427
first, 223
second, 223, 232
branching process, 402, 431, 561
Bienaymé-Galton-Watson, 562
critical, 562, 564, 565
subcritical, 562, 564, 566
supercritical, 562, 565
Brownian bridge, 375, 498
Brownian motion, 373
laws of large numbers, 499
nondifferentiability, 499
reflection, 495
reflection principle, 496
scaling properties, 495
standard, 493, 507
time inversion, 495
translation invariance, 495
cardinality, 576
Carleman’s condition, 308
Cauchy sequence, 91, 581
central limit theorem, 343, 510,
519
α-mixing, 521, 525, 527, 528
ρ-mixing, 529
functional, 373
iid, 345
Lindeberg, 345, 521, 535
Lyapounov’s, 348
martingale, 510
multivariate, 352
random sums, 378
sample quantiles, 379
change of variables, 81, 132, 141,
193
Chapman-Kolmogorov equation,
461, 488
characteristic function (See function)
complete
measure space, 160
metric space, 91, 96, 124, 237
orthogonal basis, 171
orthogonal set, 100
orthogonal system, 169
complex
conjugate, 322, 586
logarithm, 362
numbers, 586
conditional
expectation, 386
independence, 396
probability, 392
variance, 391
consistency, 281
convergence
almost everywhere, 61
almost surely, 238
in distribution, 287, 288, 299
in measure, 62
pointwise, 61
in probability, 237, 289
of moments, 306
radius of, 164, 581
of types, 355
weak, 288, 299
with probability one, 238
vague, 291, 299
convex
function, 84
convolution of
cdfs, 184
functions, 163
sequence, 162
signed measures, 161
correlation coefficient, 248
Cramer-Wold device, 336, 352
coupling, 455, 516
Cramer’s condition, 365
cumulative distribution function,
45, 46, 133, 191
joint, 221
marginal, 221
singular, 133
cycles, 445
DCT (See theorem)
de Morgan’s laws, 72, 575
Delta method, 310
detailed balance condition, 492
differentiation, 118, 130, 583
chain rule of, 583
directly Riemann integrable, 268
distribution
arithmetic, 264, 319
Cauchy, 353
compound Poisson, 359, 360
initial, 439, 456
nonarithmetic, 265
discrete univariate, 144
finite dimensional, 200
lattice, 264, 319, 364
nonlattice, 265, 364, 536
Pareto, 358
stationary, 440, 451, 469, 492
domain of attraction, 358, 541
Doob’s decomposition, 404
dual space, 94
EDCT (See theorem)
Edgeworth expansions, 364, 365,
536, 538, 554
empirical distribution function,
241, 375
equivalence relation, 577
ergodic, 273
Birkhoff’s (See theorem)
Kingman’s (See theorem)
(maximal) inequality, 274
sequence, 273
essential supremum, 89
exchangeable, 397
exit probability, 502
extinction probability, 491, 562,
565
Feller continuity, 473
filtration, 399
first passage time, 443, 462
Fourier
coefficients, 99, 166, 170
series, 187
transform (See transforms)
function
absolutely continuous, 129
Cantor, 114
Cantor ternary, 136
characteristic, 317, 332
continuous, 41, 299, 582, 592
differentiable, 320, 583
exponential, 587
generating, 164
Green's, 463
Haar, 109, 493
integrable, 54
regularly varying, 354
simple, 49
slowly varying, 354
transition, 458, 488
trigonometric, 587
Gaussian process, 209
Gibbs sampler, 480
growth rate, 503
Harris irreducible (See
irreducible)
Harris recurrence (See
recurrence)
heavy tails, 358, 540
Hilbert space, 98, 388
hitting time, 443, 462
independence, 208, 211, 219, 336
pairwise, 219, 224, 244, 247,
257
inequalities
Bessel’s, 99
Burkholder’s, 416
Cauchy-Schwarz, 88, 98, 198
Chebychev’s, 83, 196
Cramer’s, 84
Davydov’s, 518
Doob’s Lp -maximal, 413
Doob’s L log L maximal, 414
Doob’s maximal, 412
Doob’s upcrossing, 418
Hölder’s, 87, 88, 198
Jensen’s, 86, 197, 390
Kolmogorov’s first, 249
Kolmogorov’s second, 249
Levy’s, 250
Markov’s, 83, 196
Minkowski’s, 88, 198
Rio’s, 517
smoothing, 362
infinitely divisible distributions,
358, 360
inner-product, 98
integration
by parts formula, 155, 586
Lebesgue, 51, 54, 61
Riemann, 51, 59, 60, 61, 585
inversion formula, 175, 324, 325,
326, 338
irreducible, 273, 447
Harris, 462
isometry, 94
isomorphism, 100
iterated function system, 441, 460
Jordan decomposition, 123
Kolmogorov-Smirnov statistic,
375, 499
Lp -
norm, 89
space, 89
large deviation, 368
law of large numbers, 237, 448,
470
strong (See SLLN)
weak (See WLLN)
law of the iterated logarithm,
278, 538
lemma
C-set, 464
Fatou’s, 7, 54, 389
Kronecker’s, 255, 433
Riemann-Lebesgue, 171, 320
Wald’s, 415
liminf, 223, 580, 595
limits, 580
limsup, 222, 580, 595
Lindeberg condition, 344
linear operator, 96
Lyapounov’s condition, 347
MBB, 547
MCMC
MCT (See theorem)
m-dependence, 515, 545
Malthusian parameter, 566
map
co-ordinate, 203
projection, 203
Markov chain, 208, 439, 457
absorbing, 447
aperiodic, 454
communicating, 447
embedded, 488
periodic, 453
semi-, 468
solidarity, 447
Markov process
branching, 490
generator, 489
Markov property, 397, 488
strong, 444, 497
martingale, 399, 501, 509, 563
difference array, 509
Doob