
If (B.4) does not hold, evaluate
\[
I_1 = \int_{g(x)>0} g(x) f(x) \, dx, \qquad
I_2 = -\int_{g(x)<0} g(x) f(x) \, dx .
\]
If $I_1 = \infty$ and $I_2 < \infty$ then we define $Eg(X) = \infty$. If $I_1 < \infty$ and $I_2 = \infty$ then we define $Eg(X) = -\infty$. If both $I_1 = \infty$ and $I_2 = \infty$ then $Eg(X)$ is undefined.
Since $E(a + bX) = a + bEX$, we say that expectation is a linear operator.
For $m > 0$, we define the $m$th moment of $X$ as $EX^m$ and the $m$th central moment as $E(X - EX)^m$.
Two special moments are the mean $\mu = EX$ and variance $\sigma^2 = E(X - \mu)^2 = EX^2 - \mu^2$. We call $\sigma = \sqrt{\sigma^2}$ the standard deviation of $X$. We can also write $\sigma^2 = \operatorname{var}(X)$. For example, this allows the convenient expression $\operatorname{var}(a + bX) = b^2 \operatorname{var}(X)$.
The moment generating function (MGF) of $X$ is
\[
M(\lambda) = E \exp(\lambda X).
\]
The MGF does not necessarily exist. However, when it does and $E|X|^m < \infty$, then
\[
\left. \frac{d^m}{d\lambda^m} M(\lambda) \right|_{\lambda = 0} = E(X^m),
\]
which is why it is called the moment generating function.
More generally, the characteristic function (CF) of $X$ is
\[
C(\lambda) = E \exp(i \lambda X)
\]
where $i = \sqrt{-1}$ is the imaginary unit. The CF always exists, and when $E|X|^m < \infty$,
\[
\left. \frac{d^m}{d\lambda^m} C(\lambda) \right|_{\lambda = 0} = i^m E(X^m).
\]
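As an illustrative check of the moment-generating property, take $X$ to be a unit exponential (an arbitrary choice, so that $M(\lambda) = 1/(1-\lambda)$ and $EX^m = m!$); symbolic differentiation of the MGF at zero then recovers the moments:

```python
# A minimal sketch: derivatives of the MGF at zero reproduce moments.
# Assumes X ~ Exponential(1), for which M(lam) = 1/(1 - lam) and E(X^m) = m!.
import sympy as sp

lam = sp.symbols('lam')
M = 1 / (1 - lam)                                # MGF of a unit exponential, valid for lam < 1

for m in range(1, 5):
    moment = sp.diff(M, lam, m).subs(lam, 0)     # d^m/dlam^m M(lam) evaluated at lam = 0
    print(m, moment, sp.factorial(m))            # the two columns agree: E(X^m) = m!
```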
The $L^p$ norm, $p \geq 1$, of the random variable $X$ is
\[
\|X\|_p = \left( E|X|^p \right)^{1/p}.
\]
B.4 Gamma Function
The gamma function is defined for $\alpha > 0$ as
\[
\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} \exp(-x) \, dx.
\]
It satisfies the property
\[
\Gamma(1 + \alpha) = \alpha \Gamma(\alpha)
\]
so for positive integers $n$,
\[
\Gamma(n) = (n-1)!
\]
Special values include
\[
\Gamma(1) = 1
\]
and
\[
\Gamma\!\left( \tfrac{1}{2} \right) = \pi^{1/2}.
\]
Stirling's formula is an expansion for its logarithm,
\[
\log \Gamma(z) = \frac{1}{2} \log(2\pi) + \left( z - \frac{1}{2} \right) \log z - z
+ \frac{1}{12 z} - \frac{1}{360 z^3} + \frac{1}{1260 z^5} + \cdots
\]
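Both the factorial property and the truncated Stirling series above can be checked against the gamma functions in Python's standard library; the values of $z$ below are arbitrary illustrative choices:

```python
# A small numerical sketch: Gamma(n) = (n-1)! and the accuracy of the truncated
# Stirling series for log Gamma(z).
import math

# Gamma(n) = (n-1)! for positive integers
for n in range(1, 6):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

def stirling_log_gamma(z):
    # truncated series: (1/2)log(2*pi) + (z - 1/2)log z - z + 1/(12z) - 1/(360z^3) + 1/(1260z^5)
    return (0.5 * math.log(2 * math.pi) + (z - 0.5) * math.log(z) - z
            + 1 / (12 * z) - 1 / (360 * z**3) + 1 / (1260 * z**5))

for z in (2.0, 5.0, 10.0):
    print(z, math.lgamma(z), stirling_log_gamma(z))   # the columns agree closely, better for larger z
```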




For any measurable function $g(x, y)$,
\[
E g(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y) \, dx \, dy.
\]
The marginal distribution of $X$ is
\[
F_X(x) = \Pr(X \leq x) = \lim_{y \to \infty} F(x, y)
= \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(x, y) \, dy \, dx
\]
so the marginal density of $X$ is
\[
f_X(x) = \frac{d}{dx} F_X(x) = \int_{-\infty}^{\infty} f(x, y) \, dy.
\]
Similarly, the marginal density of $Y$ is
\[
f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx.
\]
The random variables $X$ and $Y$ are defined to be independent if $f(x, y) = f_X(x) f_Y(y)$. Furthermore, $X$ and $Y$ are independent if and only if there exist functions $g(x)$ and $h(y)$ such that $f(x, y) = g(x) h(y)$.
If $X$ and $Y$ are independent, then
\[
\begin{aligned}
E(g(X) h(Y)) &= \int\!\!\int g(x) h(y) f(y, x) \, dy \, dx \\
&= \int\!\!\int g(x) h(y) f_Y(y) f_X(x) \, dy \, dx \\
&= \int g(x) f_X(x) \, dx \int h(y) f_Y(y) \, dy \\
&= Eg(X) \, Eh(Y) \qquad \text{(B.5)}
\end{aligned}
\]
if the expectations exist. For example, if $X$ and $Y$ are independent then
\[
E(XY) = EX \, EY.
\]
Another implication of (B.5) is that if $X$ and $Y$ are independent and $Z = X + Y$, then
\[
\begin{aligned}
M_Z(\lambda) &= E \exp(\lambda (X + Y)) \\
&= E\left( \exp(\lambda X) \exp(\lambda Y) \right) \\
&= E \exp(\lambda X) \, E \exp(\lambda Y) \\
&= M_X(\lambda) M_Y(\lambda). \qquad \text{(B.6)}
\end{aligned}
\]
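These factorizations are easy to verify by simulation; the sketch below uses an arbitrary illustrative pair of independent normal and exponential variates and an arbitrary argument $\lambda = 0.3$:

```python
# A Monte Carlo sketch of (B.5) and (B.6): for independent X and Y the expectation of a
# product factors, and the MGF of X + Y is the product of the individual MGFs.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(size=n)          # illustrative choice: X standard normal
Y = rng.exponential(size=n)     # illustrative choice: Y unit exponential, independent of X

print(np.mean(X * Y), np.mean(X) * np.mean(Y))        # both approximately 0 = EX * EY

lam = 0.3
print(np.mean(np.exp(lam * (X + Y))),                         # M_{X+Y}(lam)
      np.mean(np.exp(lam * X)) * np.mean(np.exp(lam * Y)))    # M_X(lam) * M_Y(lam)
```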
The covariance between $X$ and $Y$ is
\[
\operatorname{cov}(X, Y) = \sigma_{XY} = E\left( (X - EX)(Y - EY) \right) = EXY - EX \, EY.
\]
The correlation between $X$ and $Y$ is
\[
\operatorname{corr}(X, Y) = \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.
\]

The Cauchy-Schwarz Inequality implies that
\[
|\rho_{XY}| \leq 1. \qquad \text{(B.7)}
\]
The correlation is a measure of linear dependence, free of units of measurement.
If $X$ and $Y$ are independent, then $\sigma_{XY} = 0$ and $\rho_{XY} = 0$. The reverse, however, is not true. For example, if $EX = 0$ and $EX^3 = 0$, then $\operatorname{cov}(X, X^2) = 0$.
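This example can be confirmed by simulation with $X$ standard normal (so $EX = 0$ and $EX^3 = 0$):

```python
# A simulation sketch of the zero-correlation example: with X standard normal and Y = X^2,
# the correlation is approximately zero even though Y is a deterministic function of X,
# so X and Y are far from independent.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=1_000_000)
Y = X**2

print(np.cov(X, Y)[0, 1])        # sample cov(X, X^2), approximately 0
print(np.corrcoef(X, Y)[0, 1])   # sample correlation, approximately 0
```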
A useful fact is that
\[
\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2 \operatorname{cov}(X, Y).
\]
An implication is that if $X$ and $Y$ are independent, then
\[
\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y);
\]
the variance of the sum is the sum of the variances.
A $k \times 1$ random vector $\boldsymbol{X} = (X_1, \ldots, X_k)'$ is a function from $S$ to $\mathbb{R}^k$. Let $\boldsymbol{x} = (x_1, \ldots, x_k)'$ denote a vector in $\mathbb{R}^k$. (In this Appendix, we use bold to denote vectors. Bold capitals $\boldsymbol{X}$ are random vectors and bold lower case $\boldsymbol{x}$ are nonrandom vectors. Again, this is in distinction to the notation used in the bulk of the text.) The vector $\boldsymbol{X}$ has the distribution and density functions
\[
F(\boldsymbol{x}) = \Pr(\boldsymbol{X} \leq \boldsymbol{x})
\]
\[
f(\boldsymbol{x}) = \frac{\partial^k}{\partial x_1 \cdots \partial x_k} F(\boldsymbol{x}).
\]
For a measurable function $g : \mathbb{R}^k \to \mathbb{R}^s$, we define the expectation
\[
E g(\boldsymbol{X}) = \int_{\mathbb{R}^k} g(\boldsymbol{x}) f(\boldsymbol{x}) \, d\boldsymbol{x}
\]
where the symbol $d\boldsymbol{x}$ denotes $dx_1 \cdots dx_k$. In particular, we have the $k \times 1$ multivariate mean
\[
\boldsymbol{\mu} = E \boldsymbol{X}
\]
and $k \times k$ covariance matrix
\[
\boldsymbol{\Sigma} = E\left( (\boldsymbol{X} - \boldsymbol{\mu})(\boldsymbol{X} - \boldsymbol{\mu})' \right)
= E \boldsymbol{X} \boldsymbol{X}' - \boldsymbol{\mu} \boldsymbol{\mu}'.
\]
If the elements of $\boldsymbol{X}$ are mutually independent, then $\boldsymbol{\Sigma}$ is a diagonal matrix and
\[
\operatorname{var}\left( \sum_{i=1}^{k} X_i \right) = \sum_{i=1}^{k} \operatorname{var}(X_i).
\]
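A numerical sketch of the identity $\boldsymbol{\Sigma} = E\boldsymbol{X}\boldsymbol{X}' - \boldsymbol{\mu}\boldsymbol{\mu}'$ and of the diagonal structure under mutual independence, using three arbitrarily chosen independent components:

```python
# Checks that Sigma = E(XX') - mu mu' is (approximately) diagonal when the components of X
# are independent, and that the variance of the sum equals the sum of the variances.
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
X = np.column_stack([rng.normal(size=n),        # three mutually independent components,
                     rng.exponential(size=n),   # chosen only for illustration
                     rng.uniform(size=n)])

mu = X.mean(axis=0)
Sigma = X.T @ X / n - np.outer(mu, mu)          # sample analogue of E(XX') - mu mu'
print(np.round(Sigma, 3))                       # off-diagonal entries are close to zero
print(np.trace(Sigma), np.var(X.sum(axis=1)))   # var of the sum is close to the sum of the variances
```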
B.7 Conditional Distributions and Expectation
The conditional density of $Y$ given $X = x$ is defined as
\[
f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)}
\]

if $f_X(x) > 0$. One way to derive this expression from the definition of conditional probability is
\[
\begin{aligned}
f_{Y|X}(y \mid x)
&= \frac{\partial}{\partial y} \lim_{\varepsilon \to 0}
\Pr\left( Y \leq y \mid x \leq X \leq x + \varepsilon \right) \\
&= \frac{\partial}{\partial y} \lim_{\varepsilon \to 0}
\frac{\Pr\left( \{Y \leq y\} \cap \{x \leq X \leq x + \varepsilon\} \right)}{\Pr(x \leq X \leq x + \varepsilon)} \\
&= \frac{\partial}{\partial y} \lim_{\varepsilon \to 0}
\frac{F(x + \varepsilon, y) - F(x, y)}{F_X(x + \varepsilon) - F_X(x)} \\
&= \frac{\partial}{\partial y} \lim_{\varepsilon \to 0}
\frac{\frac{\partial}{\partial x} F(x + \varepsilon, y)}{f_X(x + \varepsilon)} \\
&= \frac{\frac{\partial^2}{\partial x \partial y} F(x, y)}{f_X(x)} \\
&= \frac{f(x, y)}{f_X(x)}.
\end{aligned}
\]
The conditional mean or conditional expectation is the function
\[
m(x) = E(Y \mid X = x) = \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x) \, dy.
\]
The conditional mean $m(x)$ is a function, meaning that when $X$ equals $x$, then the expected value of $Y$ is $m(x)$.
Similarly, we define the conditional variance of $Y$ given $X = x$ as
\[
\begin{aligned}
\sigma^2(x) = \operatorname{var}(Y \mid X = x)
&= E\left( (Y - m(x))^2 \mid X = x \right) \\
&= E\left( Y^2 \mid X = x \right) - m(x)^2.
\end{aligned}
\]
Evaluated at $x = X$, the conditional mean $m(X)$ and conditional variance $\sigma^2(X)$ are random variables, functions of $X$. We write this as $E(Y \mid X) = m(X)$ and $\operatorname{var}(Y \mid X) = \sigma^2(X)$. For example, if $E(Y \mid X = x) = \alpha + \beta' x$, then $E(Y \mid X) = \alpha + \beta' X$, a transformation of $X$.
The following are important facts about conditional expectations.
Simple Law of Iterated Expectations:
\[
E\left( E(Y \mid X) \right) = E(Y) \qquad \text{(B.8)}
\]
Proof:
\[
\begin{aligned}
E\left( E(Y \mid X) \right) &= E(m(X)) \\
&= \int_{-\infty}^{\infty} m(x) f_X(x) \, dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x) f_X(x) \, dy \, dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f(y, x) \, dy \, dx \\
&= E(Y).
\end{aligned}
\]
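Under an assumed illustrative model $Y = X^2 + e$ with $X$ and $e$ independent standard normal (so that $m(X) = E(Y \mid X) = X^2$), the simple law of iterated expectations can be checked by simulation:

```python
# A simulation sketch of (B.8): E(E(Y | X)) = E(Y), under the illustrative model
# Y = X^2 + e with X and e independent standard normal, so m(X) = X^2.
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
X = rng.normal(size=n)
e = rng.normal(size=n)
Y = X**2 + e

m_X = X**2                        # the known conditional mean evaluated at X
print(np.mean(m_X), np.mean(Y))   # E(m(X)) and E(Y) are both approximately 1
```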
Law of Iterated Expectations:
\[
E\left( E(Y \mid X, Z) \mid X \right) = E(Y \mid X) \qquad \text{(B.9)}
\]
Conditioning Theorem. For any function $g(x)$,
\[
E\left( g(X) Y \mid X \right) = g(X) E(Y \mid X) \qquad \text{(B.10)}
\]
Proof: Let
\[
\begin{aligned}
h(x) &= E\left( g(X) Y \mid X = x \right) \\
&= \int_{-\infty}^{\infty} g(x) y f_{Y|X}(y \mid x) \, dy \\
&= g(x) \int_{-\infty}^{\infty} y f_{Y|X}(y \mid x) \, dy \\
&= g(x) m(x)
\end{aligned}
\]
where $m(x) = E(Y \mid X = x)$. Thus $h(X) = g(X) m(X)$, which is the same as $E(g(X) Y \mid X) = g(X) E(Y \mid X)$.
B.8 Transformations
Suppose that $X \in \mathbb{R}^k$ with continuous distribution function $F_X(x)$ and density $f_X(x)$. Let $Y = g(X)$ where $g(x) : \mathbb{R}^k \to \mathbb{R}^k$ is one-to-one, differentiable, and invertible. Let $h(y)$ denote the inverse of $g(x)$. The Jacobian is
\[
J(y) = \det\left( \frac{\partial}{\partial y'} h(y) \right).
\]
Consider the univariate case $k = 1$. If $g(x)$ is an increasing function, then $g(X) \leq y$ if and only if $X \leq h(y)$, so the distribution function of $Y$ is
\[
\begin{aligned}
F_Y(y) &= \Pr(g(X) \leq y) \\
&= \Pr(X \leq h(y)) \\
&= F_X(h(y)).
\end{aligned}
\]
Taking the derivative, the density of $Y$ is
\[
f_Y(y) = \frac{d}{dy} F_Y(y) = f_X(h(y)) \frac{d}{dy} h(y).
\]
If $g(x)$ is a decreasing function, then $g(X) \leq y$ if and only if $X \geq h(y)$, so
\[
\begin{aligned}
F_Y(y) &= \Pr(g(X) \leq y) \\
&= 1 - \Pr(X \leq h(y)) \\
&= 1 - F_X(h(y))
\end{aligned}
\]
and the density of $Y$ is
\[
f_Y(y) = -f_X(h(y)) \frac{d}{dy} h(y).
\]
We can write these two cases jointly as
\[
f_Y(y) = f_X(h(y)) \, |J(y)|. \qquad \text{(B.11)}
\]
This is known as the change-of-variables formula. This same formula (B.11) holds for $k > 1$, but its justification requires deeper results from analysis.
As one example, take the case $X \sim U[0,1]$ and $Y = -\log(X)$. Here, $g(x) = -\log(x)$ and $h(y) = \exp(-y)$, so the Jacobian is $J(y) = -\exp(-y)$. As the range of $X$ is $[0,1]$, that for $Y$ is $[0, \infty)$. Since $f_X(x) = 1$ for $0 \leq x \leq 1$, (B.11) shows that
\[
f_Y(y) = \exp(-y), \qquad 0 \leq y < \infty,
\]
an exponential density.
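This example is easy to confirm by simulation; the sketch below draws uniforms, applies the transformation, and compares the result with the unit exponential distribution:

```python
# A simulation sketch of the change-of-variables example: with X ~ U[0,1] and Y = -log(X),
# Y should follow the unit exponential distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
X = rng.uniform(size=200_000)
Y = -np.log(X)

print(Y.mean(), Y.var())          # both approximately 1, as for a unit exponential
print(stats.kstest(Y, 'expon'))   # Kolmogorov-Smirnov comparison with the unit exponential
```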


Proof of Theorem B.9.1. By the change-of-variables formula, the density of $Y = a + BX$ is
\[
f(y) = \frac{1}{(2\pi)^{k/2} \det(\Sigma_Y)^{1/2}}
\exp\left( -\frac{(y - \mu_Y)' \Sigma_Y^{-1} (y - \mu_Y)}{2} \right), \qquad y \in \mathbb{R}^k,
\]
where $\mu_Y = a + B\mu$ and $\Sigma_Y = B \Sigma B'$, where we used the fact that $\det(B \Sigma B')^{1/2} = \det(\Sigma)^{1/2} \det(B)$.
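The mean and covariance claims behind this proof can be checked numerically; the particular $a$, $B$, $\mu$, and $\Sigma$ below are arbitrary illustrative choices:

```python
# A numerical sketch: if Y = a + BX with X ~ N(mu, Sigma), then Y has mean a + B mu and
# covariance B Sigma B'.  The specific a, B, mu, Sigma are chosen only for illustration.
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
a = np.array([0.5, 3.0])
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Y = a + X @ B.T                            # each row is a + B x_i

print(Y.mean(axis=0), a + B @ mu)          # sample mean vs a + B mu
print(np.cov(Y.T), B @ Sigma @ B.T)        # sample covariance vs B Sigma B'
```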
Proof of Theorem B.9.2. First, suppose a random variable Q is distributed chi-square with r degrees of freedom. It has the MGF
\[
E \exp(tQ) = \int_0^\infty \frac{1}{\Gamma\!\left( \frac{r}{2} \right) 2^{r/2}}
x^{r/2 - 1} \exp(tx) \exp(-x/2) \, dx = (1 - 2t)^{-r/2},
\]
where the second equality uses the fact that $\int_0^\infty y^{a-1} \exp(-by) \, dy = b^{-a} \Gamma(a)$, which can be found by applying change-of-variables to the gamma function. Our goal is to calculate the MGF of $Q = X'X$ and show that it equals $(1 - 2t)^{-r/2}$, which will establish that $Q \sim \chi^2_r$.

Note that we can write $Q = X'X = \sum_{j=1}^{r} Z_j^2$, where the $Z_j$ are independent $\mathrm{N}(0,1)$. The distribution of each of the $Z_j^2$ is
\[
\begin{aligned}
\Pr\left( Z_j^2 \leq y \right) &= 2 \Pr\left( 0 \leq Z_j \leq \sqrt{y} \right) \\
&= 2 \int_0^{\sqrt{y}} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) dx \\
&= \int_0^{y} \frac{1}{\Gamma\!\left( \frac{1}{2} \right) 2^{1/2}} s^{-1/2} \exp\left( -\frac{s}{2} \right) ds
\end{aligned}
\]
using the change-of-variables $s = x^2$ and the fact $\Gamma\!\left( \frac{1}{2} \right) = \sqrt{\pi}$. Thus the density of $Z_j^2$ is
\[
f_1(x) = \frac{1}{\Gamma\!\left( \frac{1}{2} \right) 2^{1/2}} x^{-1/2} \exp\left( -\frac{x}{2} \right),
\]
which is the $\chi^2_1$ density, and by our above calculation it has the MGF $E \exp\left( t Z_j^2 \right) = (1 - 2t)^{-1/2}$.

Since the $Z_j^2$ are mutually independent, (B.6) implies that the MGF of $Q = \sum_{j=1}^{r} Z_j^2$ is $\left[ (1 - 2t)^{-1/2} \right]^r = (1 - 2t)^{-r/2}$, which is the MGF of the $\chi^2_r$ density, as desired.
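A Monte Carlo sketch of the identity $E \exp(tQ) = (1 - 2t)^{-r/2}$ for $Q$ a sum of $r$ squared independent standard normals; $r = 4$ and the grid of $t$ values are arbitrary illustrative choices (the identity requires $t < 1/2$, and a finite simulation variance requires $t < 1/4$):

```python
# Compares the simulated MGF of Q = Z_1^2 + ... + Z_r^2 with (1 - 2t)^(-r/2).
import numpy as np

rng = np.random.default_rng(6)
r, n = 4, 1_000_000
Q = (rng.normal(size=(n, r))**2).sum(axis=1)   # chi-square(r) draws built from normals

for t in (0.05, 0.1, 0.2):
    print(t, np.mean(np.exp(t * Q)), (1 - 2 * t)**(-r / 2))   # the last two columns agree
```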
Proof of Theorem B.9.3. The fact that $A > 0$ means that we can write $A = CC'$ where $C$ is non-singular. Then $A^{-1} = C^{-1\prime} C^{-1}$ and
\[
C^{-1} Z \sim \mathrm{N}\left( 0, C^{-1} A C^{-1\prime} \right)
= \mathrm{N}\left( 0, C^{-1} C C' C^{-1\prime} \right) = \mathrm{N}(0, I_q).
\]
Thus
\[
Z' A^{-1} Z = Z' C^{-1\prime} C^{-1} Z = \left( C^{-1} Z \right)' \left( C^{-1} Z \right) \sim \chi^2_q.
\]
Proof of Theorem B.9.4. Using the simple law of iterated expectations, $T_r$ has the distribution