Notice that generally this is very different from the Efron interval $C_1$! They coincide in the special case that $G_n^*(u)$ is symmetric about $\hat\theta$, but otherwise they differ.
Computationally, this interval can be estimated from a bootstrap simulation by sorting the bootstrap statistics $T_n^* = \hat\theta^* - \hat\theta$, which are centered at the sample estimate $\hat\theta$. These are sorted to yield the quantile estimates $\hat q_n^*(.025)$ and $\hat q_n^*(.975)$. The 95% confidence interval is then $\left[\hat\theta - \hat q_n^*(.975),\; \hat\theta - \hat q_n^*(.025)\right]$.
This confidence interval is discussed in most theoretical treatments of the bootstrap, but is not widely used in practice.
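As a computational illustration, here is a minimal Python sketch of this procedure for a generic scalar estimator; the function name `percentile_interval`, the `estimator` argument, and the choice of numpy are illustrative conventions, not from the text:

```python
# Sketch of the interval above for a generic scalar estimator.
# `estimator` is a hypothetical stand-in for any statistic of the data.
import numpy as np

def percentile_interval(data, estimator, B=1999, rng=None):
    rng = np.random.default_rng(rng)
    n = len(data)
    theta_hat = estimator(data)
    t_star = np.empty(B)
    for b in range(B):
        sample = rng.choice(data, size=n, replace=True)
        # centered bootstrap statistic T_n* = theta_hat* - theta_hat
        t_star[b] = estimator(sample) - theta_hat
    q025, q975 = np.quantile(t_star, [0.025, 0.975])
    # interval [theta_hat - q*(.975), theta_hat - q*(.025)]
    return theta_hat - q975, theta_hat - q025
```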
10.6 Percentile-t Equal-Tailed Interval
Suppose we want to test $H_0: \theta = \theta_0$ against $H_1: \theta < \theta_0$ at size $\alpha$. We would set $T_n(\theta) = \left(\hat\theta - \theta\right)/s(\hat\theta)$ and reject $H_0$ in favor of $H_1$ if $T_n(\theta_0) < c$, where $c$ would be selected so that
\[
\Pr\left(T_n(\theta_0) < c\right) = \alpha.
\]
Thus $c = q_n(\alpha)$. Since this is unknown, a bootstrap test replaces $q_n(\alpha)$ with the bootstrap estimate $q_n^*(\alpha)$, and the test rejects if $T_n(\theta_0) < q_n^*(\alpha)$.
Similarly, if the alternative is $H_1: \theta > \theta_0$, the bootstrap test rejects if $T_n(\theta_0) > q_n^*(1-\alpha)$.

Computationally, these critical values can be estimated from a bootstrap simulation by sorting the bootstrap t-statistics $T_n^* = \left(\hat\theta^* - \hat\theta\right)/s(\hat\theta^*)$. Note, and this is important, that the bootstrap test statistic is centered at the estimate $\hat\theta$, and the standard error $s(\hat\theta^*)$ is calculated on the bootstrap sample. These t-statistics are sorted to find the estimated quantiles $\hat q_n^*(\alpha)$ and/or $\hat q_n^*(1-\alpha)$.
Let $T_n(\theta) = \left(\hat\theta - \theta\right)/s(\hat\theta)$. Then taking the intersection of two one-sided intervals,
\begin{align*}
1 - \alpha &= \Pr\left(q_n(\alpha/2) \le T_n(\theta_0) \le q_n(1-\alpha/2)\right) \\
&= \Pr\left(q_n(\alpha/2) \le \left(\hat\theta - \theta_0\right)/s(\hat\theta) \le q_n(1-\alpha/2)\right) \\
&= \Pr\left(\hat\theta - s(\hat\theta)\,q_n(1-\alpha/2) \le \theta_0 \le \hat\theta - s(\hat\theta)\,q_n(\alpha/2)\right),
\end{align*}
so an exact $(1-\alpha)\%$ confidence interval for $\theta_0$ would be
\[
C_3^0 = \left[\hat\theta - s(\hat\theta)\,q_n(1-\alpha/2),\;\; \hat\theta - s(\hat\theta)\,q_n(\alpha/2)\right].
\]
This motivates a bootstrap analog
\[
C_3 = \left[\hat\theta - s(\hat\theta)\,q_n^*(1-\alpha/2),\;\; \hat\theta - s(\hat\theta)\,q_n^*(\alpha/2)\right].
\]
This is often called a percentile-t confidence interval. It is equal-tailed or central since the probability that $\theta_0$ is below the left endpoint approximately equals the probability that $\theta_0$ is above the right endpoint, each $\alpha/2$.

Computationally, this interval is based on the critical values from the one-sided hypothesis tests discussed above.
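A minimal Python sketch of $C_3$, specialized to the sample mean so that the standard error has a closed form (the helper name `percentile_t_interval` is illustrative, not from the text):

```python
# Sketch of the equal-tailed percentile-t interval C3 for a sample mean.
import numpy as np

def percentile_t_interval(data, B=1999, alpha=0.05, rng=None):
    rng = np.random.default_rng(rng)
    n = len(data)
    theta_hat = data.mean()
    s_hat = data.std(ddof=1) / np.sqrt(n)
    t_star = np.empty(B)
    for b in range(B):
        s = rng.choice(data, size=n, replace=True)
        # t-statistic centered at theta_hat, with the standard error
        # recomputed on the bootstrap sample
        t_star[b] = (s.mean() - theta_hat) / (s.std(ddof=1) / np.sqrt(n))
    q_lo, q_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    # C3 = [theta - s*q*(1-a/2), theta - s*q*(a/2)]
    return theta_hat - s_hat * q_hi, theta_hat - s_hat * q_lo
```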
10.7 Symmetric Percentile-t Intervals
Suppose we want to test $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$ at size $\alpha$. We would set $T_n(\theta) = \left(\hat\theta - \theta\right)/s(\hat\theta)$ and reject $H_0$ in favor of $H_1$ if $|T_n(\theta_0)| > c$, where $c$ would be selected so that
\[
\Pr\left(|T_n(\theta_0)| > c\right) = \alpha.
\]
Note that
\[
\Pr\left(|T_n(\theta_0)| < c\right) = \Pr\left(-c < T_n(\theta_0) < c\right) = G_n(c) - G_n(-c) \equiv \overline{G}_n(c),
\]
which is a symmetric distribution function. The ideal critical value $c = q_n(\alpha)$ solves the equation
\[
\overline{G}_n(q_n(\alpha)) = 1 - \alpha.
\]
Equivalently, $q_n(\alpha)$ is the $1-\alpha$ quantile of the distribution of $|T_n(\theta_0)|$.

The bootstrap estimate is $q_n^*(\alpha)$, the $1-\alpha$ quantile of the distribution of $|T_n^*|$, or the number which solves the equation
\[
\overline{G}_n^*(q_n^*(\alpha)) = G_n^*(q_n^*(\alpha)) - G_n^*(-q_n^*(\alpha)) = 1 - \alpha.
\]
Computationally, $q_n^*(\alpha)$ is estimated from a bootstrap simulation by sorting the bootstrap t-statistics $|T_n^*| = \left|\hat\theta^* - \hat\theta\right|/s(\hat\theta^*)$, and taking the upper $\alpha\%$ quantile. The bootstrap test rejects if $|T_n(\theta_0)| > q_n^*(\alpha)$.
Let
\[
C_4 = \left[\hat\theta - s(\hat\theta)\,q_n^*(\alpha),\;\; \hat\theta + s(\hat\theta)\,q_n^*(\alpha)\right],
\]
where $q_n^*(\alpha)$ is the bootstrap critical value for a two-sided hypothesis test. $C_4$ is called the symmetric percentile-t interval. It is designed to work well since
\begin{align*}
\Pr\left(\theta_0 \in C_4\right) &= \Pr\left(\hat\theta - s(\hat\theta)\,q_n^*(\alpha) \le \theta_0 \le \hat\theta + s(\hat\theta)\,q_n^*(\alpha)\right) \\
&= \Pr\left(|T_n(\theta_0)| < q_n^*(\alpha)\right) \simeq \Pr\left(|T_n(\theta_0)| < q_n(\alpha)\right) = 1 - \alpha.
\end{align*}
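A Python sketch of $C_4$ for the sample mean (again an illustrative helper, not from the text); the only change from the equal-tailed computation is sorting $|T_n^*|$ and taking a single upper quantile:

```python
# Sketch of the symmetric percentile-t interval C4 for a sample mean.
import numpy as np

def symmetric_percentile_t(data, B=1999, alpha=0.05, rng=None):
    rng = np.random.default_rng(rng)
    n = len(data)
    theta_hat = data.mean()
    s_hat = data.std(ddof=1) / np.sqrt(n)
    abs_t = np.empty(B)
    for b in range(B):
        s = rng.choice(data, size=n, replace=True)
        abs_t[b] = abs(s.mean() - theta_hat) / (s.std(ddof=1) / np.sqrt(n))
    q = np.quantile(abs_t, 1 - alpha)  # bootstrap critical value q_n*(alpha)
    return theta_hat - s_hat * q, theta_hat + s_hat * q
```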
If $\theta$ is a vector, then to test $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$ at size $\alpha$, we would use a Wald statistic
\[
W_n(\theta) = n\left(\hat\theta - \theta\right)' \hat{V}_\theta^{-1} \left(\hat\theta - \theta\right)
\]
or some other asymptotically chi-square statistic. Thus here $T_n(\theta) = W_n(\theta)$. The ideal test rejects if $W_n \ge q_n(\alpha)$, where $q_n(\alpha)$ is the $(1-\alpha)\%$ quantile of the distribution of $W_n$. The bootstrap test rejects if $W_n \ge q_n^*(\alpha)$, where $q_n^*(\alpha)$ is the $(1-\alpha)\%$ quantile of the distribution of
\[
W_n^* = n\left(\hat\theta^* - \hat\theta\right)' \hat{V}_\theta^{*-1} \left(\hat\theta^* - \hat\theta\right).
\]
Computationally, the critical value $q_n^*(\alpha)$ is found as the quantile from simulated values of $W_n^*$.
Note in the simulation that the Wald statistic is a quadratic form in $\left(\hat\theta^* - \hat\theta\right)$, not $\left(\hat\theta^* - \theta_0\right)$. [This is a typical mistake made by practitioners.]
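To make the centering explicit, here is a hedged Python sketch of the bootstrap Wald critical value for a vector mean (the function name and the use of numpy are illustrative; $\hat V^*$ is taken here to be the sample covariance of the bootstrap sample):

```python
# Sketch: bootstrap critical value for the Wald statistic of a vector mean.
# W_n* is a quadratic form in (theta* - theta_hat), NOT (theta* - theta_0).
import numpy as np

def bootstrap_wald_critical_value(X, B=1999, alpha=0.05, rng=None):
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    theta_hat = X.mean(axis=0)
    w_star = np.empty(B)
    for b in range(B):
        Xb = X[rng.integers(n, size=n)]
        d = Xb.mean(axis=0) - theta_hat               # centered at theta_hat
        Vb = np.atleast_2d(np.cov(Xb, rowvar=False))  # covariance on bootstrap sample
        w_star[b] = n * d @ np.linalg.solve(Vb, d)
    return np.quantile(w_star, 1 - alpha)
```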
10.8 Asymptotic Expansions
Let $T_n \in \mathbb{R}$ be a statistic such that
\[
T_n \xrightarrow{d} N(0, \sigma^2). \tag{10.3}
\]
In some cases, such as when $T_n$ is a t-ratio, then $\sigma^2 = 1$. In other cases $\sigma^2$ is unknown. Equivalently, writing $T_n \sim G_n(u, F)$, then for each $u$ and $F$
\[
\lim_{n \to \infty} G_n(u, F) = \Phi\left(\frac{u}{\sigma}\right),
\]
or
\[
G_n(u, F) = \Phi\left(\frac{u}{\sigma}\right) + o(1). \tag{10.4}
\]
While (10.4) says that $G_n$ converges to $\Phi\left(\frac{u}{\sigma}\right)$ as $n \to \infty$, it says nothing about the rate of convergence, nor about the size of the divergence for any particular sample size $n$. A better asymptotic approximation may be obtained through an asymptotic expansion.
The following notation will be helpful. Let an be a sequence.
Definition 10.8.1 $a_n = o(1)$ if $a_n \to 0$ as $n \to \infty$.

Definition 10.8.2 $a_n = O(1)$ if $|a_n|$ is uniformly bounded.

Definition 10.8.3 $a_n = o(n^{-r})$ if $n^r |a_n| \to 0$ as $n \to \infty$.
Basically, $a_n = O(n^{-r})$ if it declines to zero like $n^{-r}$.
We say that a function $g(u)$ is even if $g(-u) = g(u)$, and a function $h(u)$ is odd if $h(-u) = -h(u)$. The derivative of an even function is odd, and vice-versa.
Theorem 10.8.1 Under regularity conditions and (10.3),
\[
G_n(u, F) = \Phi\left(\frac{u}{\sigma}\right) + \frac{1}{n^{1/2}}\, g_1(u, F) + \frac{1}{n}\, g_2(u, F) + O(n^{-3/2})
\]
uniformly over $u$, where $g_1$ is an even function of $u$, and $g_2$ is an odd function of $u$. Moreover, $g_1$ and $g_2$ are differentiable functions of $u$ and continuous in $F$ relative to the supremum norm on the space of distribution functions.
The expansion in Theorem 10.8.1 is often called an Edgeworth expansion.
We can interpret Theorem 10.8.1 as follows. First, $G_n(u, F)$ converges to the normal limit at rate $n^{1/2}$. To a second order of approximation,
\[
G_n(u, F) \approx \Phi\left(\frac{u}{\sigma}\right) + n^{-1/2}\, g_1(u, F).
\]
Since the derivative of $g_1$ is odd, the density function is skewed. To a third order of approximation,
\[
G_n(u, F) \approx \Phi\left(\frac{u}{\sigma}\right) + n^{-1/2}\, g_1(u, F) + n^{-1}\, g_2(u, F),
\]
which adds a symmetric non-normal component to the approximate density (for example, adding leptokurtosis).
[Side Note: When $T_n = \sqrt{n}\left(\overline{X}_n - \mu\right)/\sigma$, a standardized sample mean, then
\begin{align*}
g_1(u) &= -\frac{1}{6}\,\kappa_3\left(u^2 - 1\right)\phi(u) \\
g_2(u) &= -\left(\frac{1}{24}\,\kappa_4\left(u^3 - 3u\right) + \frac{1}{72}\,\kappa_3^2\left(u^5 - 10u^3 + 15u\right)\right)\phi(u),
\end{align*}
where $\phi(u)$ is the standard normal pdf, and
\begin{align*}
\kappa_3 &= E(X - \mu)^3/\sigma^3 \\
\kappa_4 &= E(X - \mu)^4/\sigma^4 - 3
\end{align*}
are the standardized skewness and excess kurtosis of the distribution of $X$. Note that when $\kappa_3 = 0$ and $\kappa_4 = 0$, then $g_1 = 0$ and $g_2 = 0$, so the second-order Edgeworth expansion corresponds to the normal distribution.]
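For readers who want to see the expansion numerically, the following Python sketch evaluates the second-order Edgeworth approximation from the side note (the helper name `edgeworth_cdf` is our own, and scipy is assumed to be available):

```python
# Numeric sketch of the Edgeworth approximation for a standardized sample
# mean, using the g1 and g2 formulas from the side note (sigma = 1 here).
import numpy as np
from scipy.stats import norm

def edgeworth_cdf(u, n, kappa3, kappa4):
    phi = norm.pdf(u)
    g1 = -kappa3 * (u**2 - 1) * phi / 6
    g2 = -(kappa4 * (u**3 - 3*u) / 24
           + kappa3**2 * (u**5 - 10*u**3 + 15*u) / 72) * phi
    return norm.cdf(u) + g1 / np.sqrt(n) + g2 / n

# Example: chi-square(1) data have kappa3 = 2*sqrt(2) and kappa4 = 12, so
# at n = 50 the approximation visibly departs from norm.cdf at moderate u.
print(edgeworth_cdf(1.645, n=50, kappa3=2*np.sqrt(2), kappa4=12.0))
```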
Francis Edgeworth
Francis Ysidro Edgeworth (1845-1926) of Ireland, founding editor of the Economic Journal, was a profound economic and statistical theorist, developing the theories of indifference curves and asymptotic expansions. He also could be viewed as the first econometrician due to his early use of mathematical statistics in the study of economic data.
10.9 One-Sided Tests
Using the expansion of Theorem 10.8.1, we can assess the accuracy of one-sided hypothesis tests and confidence regions based on an asymptotically normal t-ratio $T_n$. An asymptotic test is based on $\Phi(u)$.
To the second order, the exact distribution is
\[
\Pr\left(T_n < u\right) = G_n(u, F_0) = \Phi(u) + \frac{1}{n^{1/2}}\, g_1(u, F_0) + O(n^{-1})
\]
since $\sigma = 1$. The difference is
\begin{align*}
\Phi(u) - G_n(u, F_0) &= -\frac{1}{n^{1/2}}\, g_1(u, F_0) + O(n^{-1}) \\
&= O(n^{-1/2}),
\end{align*}
so the order of the error is $O(n^{-1/2})$.
A bootstrap test is based on $G_n^*(u)$, which from Theorem 10.8.1 has the expansion
\[
G_n^*(u) = G_n(u, F_n) = \Phi(u) + \frac{1}{n^{1/2}}\, g_1(u, F_n) + O(n^{-1}).
\]
Because $\Phi(u)$ appears in both expansions, the difference between the bootstrap distribution and the true distribution is
\[
G_n^*(u) - G_n(u, F_0) = \frac{1}{n^{1/2}}\left(g_1(u, F_n) - g_1(u, F_0)\right) + O(n^{-1}).
\]
Since $F_n$ converges to $F$ at rate $\sqrt{n}$, and $g_1$ is continuous with respect to $F$, the difference $\left(g_1(u, F_n) - g_1(u, F_0)\right)$ converges to 0 at rate $\sqrt{n}$. Heuristically,
\begin{align*}
g_1(u, F_n) - g_1(u, F_0) &\approx \frac{\partial}{\partial F}\, g_1(u, F_0)\left(F_n - F_0\right) \\
&= O(n^{-1/2}).
\end{align*}
The “derivative” $\frac{\partial}{\partial F}\, g_1(u, F)$ is only heuristic, as $F$ is a function. We conclude that
\[
G_n^*(u) - G_n(u, F_0) = O(n^{-1}),
\]
or
\[
\Pr\left(T_n^* \le u\right) = \Pr\left(T_n \le u\right) + O(n^{-1}),
\]
which is an improved rate of convergence over the asymptotic test (which converged at rate $O(n^{-1/2})$). This rate can be used to show that one-tailed bootstrap inference based on the t-ratio achieves a so-called asymptotic refinement: the Type I error of the test converges at a faster rate than that of an analogous asymptotic test.
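The refinement can be checked by simulation. The following Python sketch (an illustrative Monte Carlo design, not from the text) compares the empirical Type I error of the asymptotic and bootstrap one-sided t-tests for a mean, using a skewed exponential population where the difference is visible:

```python
# Monte Carlo sketch: size of asymptotic vs. bootstrap one-sided t-tests.
import numpy as np
from scipy.stats import norm

def one_sided_size(n=20, B=399, reps=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    rej_asym = rej_boot = 0
    for _ in range(reps):
        x = rng.exponential(size=n)              # true mean is 1 (H0 holds)
        t = (x.mean() - 1.0) / (x.std(ddof=1) / np.sqrt(n))
        t_star = np.empty(B)
        for b in range(B):
            xs = rng.choice(x, size=n, replace=True)
            t_star[b] = (xs.mean() - x.mean()) / (xs.std(ddof=1) / np.sqrt(n))
        rej_asym += t < norm.ppf(alpha)             # asymptotic critical value
        rej_boot += t < np.quantile(t_star, alpha)  # bootstrap critical value
    return rej_asym / reps, rej_boot / reps         # both should be near alpha
```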
10.10 Symmetric Two-Sided Tests
If a random variable $y$ has distribution function $H(u) = \Pr(y \le u)$, then the random variable $|y|$ has distribution function
\[
\overline{H}(u) = H(u) - H(-u)
\]
since
\begin{align*}
\Pr\left(|y| \le u\right) &= \Pr\left(-u \le y \le u\right) \\
&= \Pr\left(y \le u\right) - \Pr\left(y \le -u\right) \\
&= H(u) - H(-u).
\end{align*}
For example, if $Z \sim N(0,1)$, then $|Z|$ has distribution function
\[
\overline{\Phi}(u) = \Phi(u) - \Phi(-u) = 2\Phi(u) - 1.
\]
Similarly, if $T_n$ has exact distribution $G_n(u, F)$, then $|T_n|$ has the distribution function
\[
\overline{G}_n(u, F) = G_n(u, F) - G_n(-u, F).
\]
A two-sided hypothesis test rejects $H_0$ for large values of $|T_n|$. Since $T_n \xrightarrow{d} Z$, then $|T_n| \xrightarrow{d} |Z| \sim \overline{\Phi}$. Thus asymptotic critical values are taken from the $\overline{\Phi}$ distribution, and exact critical values are taken from the $\overline{G}_n(u, F_0)$ distribution. From Theorem 10.8.1, we can calculate that
\begin{align}
\overline{G}_n(u, F) &= G_n(u, F) - G_n(-u, F) \nonumber \\
&= \left(\Phi(u) + \frac{1}{n^{1/2}}\, g_1(u, F) + \frac{1}{n}\, g_2(u, F)\right) \nonumber \\
&\qquad - \left(\Phi(-u) + \frac{1}{n^{1/2}}\, g_1(-u, F) + \frac{1}{n}\, g_2(-u, F)\right) + O(n^{-3/2}) \nonumber \\
&= \overline{\Phi}(u) + \frac{2}{n}\, g_2(u, F) + O(n^{-3/2}), \tag{10.5}
\end{align}
where the simplifications are because $g_1$ is even and $g_2$ is odd. Hence the difference between the asymptotic distribution and the exact distribution is
\[
\overline{\Phi}(u) - \overline{G}_n(u, F_0) = -\frac{2}{n}\, g_2(u, F_0) + O(n^{-3/2}) = O(n^{-1}).
\]
The order of the error is $O(n^{-1})$.

Interestingly, the asymptotic two-sided test has a better coverage rate than the asymptotic one-sided test. This is because the first term in the asymptotic expansion, $g_1$, is an even function, meaning that the errors in the two directions exactly cancel out.
Applying (10.5) to the bootstrap distribution, we find
\[
\overline{G}_n^*(u) = \overline{G}_n(u, F_n) = \overline{\Phi}(u) + \frac{2}{n}\, g_2(u, F_n) + O(n^{-3/2}).
\]
Thus the difference between the bootstrap and exact distributions is
\begin{align*}
\overline{G}_n^*(u) - \overline{G}_n(u, F_0) &= \frac{2}{n}\left(g_2(u, F_n) - g_2(u, F_0)\right) + O(n^{-3/2}) \\
&= O(n^{-3/2}),
\end{align*}
the last equality because $F_n$ converges to $F_0$ at rate $\sqrt{n}$, and $g_2$ is continuous in $F$. Another way of writing this is
\[
\Pr\left(|T_n^*| < u\right) = \Pr\left(|T_n| < u\right) + O(n^{-3/2}),
\]
so the error from using the bootstrap distribution (relative to the true unknown distribution) is $O(n^{-3/2})$. This is in contrast to the use of the asymptotic distribution, whose error is $O(n^{-1})$. Thus a two-sided bootstrap test also achieves an asymptotic refinement, similar to a one-sided test.
A reader might get confused between the two simultaneous effects. Two-sided tests have better rates of convergence than one-sided tests, and bootstrap tests have better rates of convergence than asymptotic tests.

The analysis shows that there may be a trade-off between one-sided and two-sided tests. Two-sided tests will have more accurate size (reported Type I error), but one-sided tests might have more power against alternatives of interest. Confidence intervals based on the bootstrap can be asymmetric if based on one-sided tests (equal-tailed intervals) and can therefore be more informative and have smaller length than symmetric intervals. Therefore, the choice between symmetric and equal-tailed confidence intervals is unclear, and needs to be determined on a case-by-case basis.
10.11 Percentile Confidence Intervals
To evaluate the coverage rate of the percentile interval, set $T_n = \sqrt{n}\left(\hat\theta - \theta_0\right)$. We know that $T_n \xrightarrow{d} N(0, V)$, which is not pivotal, as it depends on the unknown $V$. Theorem 10.8.1 shows that a first-order approximation is
\[
G_n(u, F) = \Phi\left(\frac{u}{\sigma}\right) + O(n^{-1/2}),
\]
where $\sigma = \sqrt{V}$, and for the bootstrap
\[
G_n^*(u) = G_n(u, F_n) = \Phi\left(\frac{u}{\hat\sigma}\right) + O(n^{-1/2}),
\]
where $\hat\sigma = V(F_n)^{1/2}$ is the bootstrap estimate of $\sigma$. The difference is
\begin{align*}
G_n^*(u) - G_n(u, F_0) &= \Phi\left(\frac{u}{\hat\sigma}\right) - \Phi\left(\frac{u}{\sigma}\right) + O(n^{-1/2}) \\
&= -\phi\left(\frac{u}{\sigma}\right) \frac{u}{\sigma^2}\left(\hat\sigma - \sigma\right) + O(n^{-1/2}) \\
&= O(n^{-1/2}).
\end{align*}
Hence the order of the error is $O(n^{-1/2})$.

The good news is that the percentile-type methods (if appropriately used) can yield $\sqrt{n}$-convergent asymptotic inference. Yet these methods do not require the calculation of standard errors! This means that in contexts where standard errors are not available or are difficult to calculate, the percentile bootstrap methods provide an attractive inference method.
The bad news is that the rate of convergence is disappointing. It is no better than the rate obtained from an asymptotic one-sided confidence region. Therefore if standard errors are available, it is unclear if there are any benefits from using the percentile bootstrap over simple asymptotic methods.
Based on these arguments, the theoretical literature (e.g. Hall, 1992, Horowitz, 2001) tends to advocate the use of the percentile-t bootstrap methods rather than percentile methods.
10.12 Bootstrap Methods for Regression Models
The bootstrap methods we have discussed have set $G_n^*(u) = G_n(u, F_n)$, where $F_n$ is the EDF. Any other consistent estimate of $F$ may be used to define a feasible bootstrap estimator. The advantage of the EDF is that it is fully nonparametric, imposes no conditions, and works in nearly any context. But since it is fully nonparametric, it may be inefficient in contexts where more is known about $F$. We discuss bootstrap methods appropriate for the linear regression model
\begin{align*}
y_i &= x_i'\beta + e_i \\
E\left(e_i \mid x_i\right) &= 0.
\end{align*}
The non-parametric bootstrap resamples the observations $(y_i^*, x_i^*)$ from the EDF, which implies
\begin{align*}
y_i^* &= x_i^{*\prime}\hat\beta + e_i^* \\
E\left(x_i^* e_i^*\right) &= 0,
\end{align*}
but generally
\[
E\left(e_i^* \mid x_i^*\right) \ne 0.
\]
The bootstrap distribution does not impose the regression assumption, and is thus an inefficient estimator of the true distribution (when in fact the regression assumption is true).

One approach to this problem is to impose the very strong assumption that the error $e_i$ is independent of the regressor $x_i$. The advantage is that in this case it is straightforward to construct bootstrap distributions. The disadvantage is that the bootstrap distribution may be a poor approximation when the error is not independent of the regressors.
To impose independence, it is sufficient to sample the $x_i^*$ and $e_i^*$ independently, and then create $y_i^* = x_i^{*\prime}\hat\beta + e_i^*$. There are different ways to impose independence. A non-parametric method is to sample the bootstrap errors $e_i^*$ randomly from the OLS residuals $\{\hat e_1, \ldots, \hat e_n\}$. A parametric method is to generate the bootstrap errors $e_i^*$ from a parametric distribution, such as the normal $e_i^* \sim N(0, \hat\sigma^2)$.
For the regressors $x_i^*$, a nonparametric method is to sample the $x_i^*$ randomly from the EDF or sample values $\{x_1, \ldots, x_n\}$. A parametric method is to sample $x_i^*$ from an estimated parametric distribution. A third approach sets $x_i^* = x_i$. This is equivalent to treating the regressors as fixed in repeated samples. If this is done, then all inferential statements are made conditionally on the observed values of the regressors, which is a valid statistical approach. It does not really matter, however, whether or not the $x_i$ are really “fixed” or random.
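A minimal Python sketch of the fixed-design residual bootstrap just described (the helper name `residual_bootstrap` and the use of numpy are illustrative):

```python
# Sketch: fixed-design residual bootstrap for y = X @ beta + e.
# Regressors are held fixed; errors are resampled from the OLS residuals.
import numpy as np

def residual_bootstrap(y, X, B=1999, rng=None):
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e_hat = y - X @ beta_hat
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        e_star = rng.choice(e_hat, size=n, replace=True)  # nonparametric errors
        y_star = X @ beta_hat + e_star                    # x_i* = x_i
        betas[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return betas  # bootstrap draws of the OLS estimator
```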
The methods discussed above are unattractive for most applications in econometrics because they impose the stringent assumption that $x_i$ and $e_i$ are independent. Typically what is desirable is to impose only the regression condition $E\left(e_i \mid x_i\right) = 0$. Unfortunately this is a harder problem.
One proposal which imposes the regression condition without independence is the Wild Bootstrap. The idea is to construct a conditional distribution for $e_i^*$ so that
\begin{align*}
E\left(e_i^* \mid x_i\right) &= 0 \\
E\left(e_i^{*2} \mid x_i\right) &= \hat e_i^2 \\
E\left(e_i^{*3} \mid x_i\right) &= \hat e_i^3.
\end{align*}
A conditional distribution with these features will preserve the main important features of the data. This can be achieved using a two-point distribution of the form
\begin{align*}
\Pr\left(e_i^* = \left(\frac{1+\sqrt{5}}{2}\right)\hat e_i\right) &= \frac{\sqrt{5}-1}{2\sqrt{5}} \\
\Pr\left(e_i^* = \left(\frac{1-\sqrt{5}}{2}\right)\hat e_i\right) &= \frac{\sqrt{5}+1}{2\sqrt{5}}.
\end{align*}
For each $x_i$, you sample $e_i^*$ using this two-point distribution.
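A Python sketch of the wild bootstrap for OLS using this two-point distribution (the helper name `wild_bootstrap` is illustrative); the regressors stay fixed, and each residual is rescaled by a random multiplier with mean zero, variance one, and third moment one:

```python
# Sketch: wild bootstrap for y = X @ beta + e with the two-point
# distribution above: E(e*|x)=0, E(e*^2|x)=e_hat^2, E(e*^3|x)=e_hat^3.
import numpy as np

def wild_bootstrap(y, X, B=1999, rng=None):
    rng = np.random.default_rng(rng)
    n = len(y)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e_hat = y - X @ beta_hat
    s5 = np.sqrt(5.0)
    hi, lo = (1 + s5) / 2, (1 - s5) / 2  # support points (~1.618, ~-0.618)
    p_hi = (s5 - 1) / (2 * s5)           # Pr(multiplier = hi) ~ 0.276
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        w = np.where(rng.random(n) < p_hi, hi, lo)
        y_star = X @ beta_hat + w * e_hat   # x_i fixed, residuals rescaled
        betas[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return betas
```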