
CHAPTER 10. THE BOOTSTRAP


The interval $C_1$ is a popular bootstrap confidence interval often used in empirical practice. This is because it is easy to compute, simple to motivate, was popularized by Efron early in the history of the bootstrap, and also has the feature that it is translation invariant. That is, if the parameter of interest is defined as $f(\theta)$ for a monotonically increasing function $f$, then the percentile method applied to this problem will produce the confidence interval $[f(q_n^*(\alpha/2)),\ f(q_n^*(1-\alpha/2))]$, which is a naturally good property.

However, as we show now, $C_1$ is in a deep sense very poorly motivated.

It will be useful if we introduce an alternative definition of $C_1$. Let $T_n(\theta) = \hat\theta - \theta$ and let $q_n(\alpha)$ be the quantile function of its distribution. (These are the original quantiles, with $\theta$ subtracted.) Then $C_1$ can alternatively be written as
\[ C_1 = \left[ \hat\theta + q_n^*(\alpha/2),\ \hat\theta + q_n^*(1-\alpha/2) \right]. \]

This is a bootstrap estimate of the “ideal” confidence interval
\[ C_1^0 = \left[ \hat\theta + q_n(\alpha/2),\ \hat\theta + q_n(1-\alpha/2) \right]. \]

The latter has coverage probability
\begin{align*}
\Pr\left( \theta_0 \in C_1^0 \right) &= \Pr\left( \hat\theta + q_n(\alpha/2) \le \theta_0 \le \hat\theta + q_n(1-\alpha/2) \right) \\
&= \Pr\left( -q_n(1-\alpha/2) \le \hat\theta - \theta_0 \le -q_n(\alpha/2) \right) \\
&= G_n\left( -q_n(\alpha/2), F_0 \right) - G_n\left( -q_n(1-\alpha/2), F_0 \right),
\end{align*}
which generally is not $1-\alpha$! There is one important exception. If $\hat\theta - \theta_0$ has a symmetric distribution, then $G_n(-u, F_0) = 1 - G_n(u, F_0)$, so

\begin{align*}
\Pr\left( \theta_0 \in C_1^0 \right) &= G_n\left( -q_n(\alpha/2), F_0 \right) - G_n\left( -q_n(1-\alpha/2), F_0 \right) \\
&= \left( 1 - G_n\left( q_n(\alpha/2), F_0 \right) \right) - \left( 1 - G_n\left( q_n(1-\alpha/2), F_0 \right) \right) \\
&= \left( 1 - \frac{\alpha}{2} \right) - \left( 1 - \left( 1 - \frac{\alpha}{2} \right) \right) \\
&= 1 - \alpha
\end{align*}

and this idealized confidence interval is accurate. Therefore, $C_1^0$ and $C_1$ are designed for the case that $\hat\theta$ has a symmetric distribution about $\theta_0$.

When $\hat\theta$ does not have a symmetric distribution, $C_1$ may perform quite poorly.

However, by the translation invariance argument presented above, it also follows that if there exists some monotonically increasing transformation $f(\cdot)$ such that $f(\hat\theta)$ is symmetrically distributed about $f(\theta_0)$, then the idealized percentile bootstrap method will be accurate.

Based on these arguments, many argue that the percentile interval should not be used unless the sampling distribution is close to unbiased and symmetric.

The problems with the percentile method can be circumvented, at least in principle, by an alternative method.

Let $T_n(\theta) = \hat\theta - \theta$. Then
\begin{align*}
1 - \alpha &= \Pr\left( q_n(\alpha/2) \le T_n(\theta_0) \le q_n(1-\alpha/2) \right) \\
&= \Pr\left( \hat\theta - q_n(1-\alpha/2) \le \theta_0 \le \hat\theta - q_n(\alpha/2) \right),
\end{align*}

so an exact $(1-\alpha)\%$ confidence interval for $\theta_0$ would be
\[ C_2^0 = \left[ \hat\theta - q_n(1-\alpha/2),\ \hat\theta - q_n(\alpha/2) \right]. \]

This motivates a bootstrap analog
\[ C_2 = \left[ \hat\theta - q_n^*(1-\alpha/2),\ \hat\theta - q_n^*(\alpha/2) \right]. \]


Notice that generally this is very different from the Efron interval $C_1$! They coincide in the special case that $G_n^*(u)$ is symmetric about $\hat\theta$, but otherwise they differ.

Computationally, this interval can be estimated from a bootstrap simulation by sorting the bootstrap statistics $T_n^* = \hat\theta^* - \hat\theta$, which are centered at the sample estimate $\hat\theta$. These are sorted to yield the quantile estimates $\hat q_n^*(.025)$ and $\hat q_n^*(.975)$. The 95% confidence interval is then $\left[ \hat\theta - \hat q_n^*(.975),\ \hat\theta - \hat q_n^*(.025) \right]$.

This confidence interval is discussed in most theoretical treatments of the bootstrap, but is not widely used in practice.
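As a concrete illustration, here is a minimal Python sketch (not from the text; the exponential data, the sample mean as $\hat\theta$, and the replication count are illustrative assumptions) computing both the Efron percentile interval $C_1$ and the alternative interval $C_2$ from the same bootstrap draws.

```python
# Minimal sketch: Efron percentile interval C1 versus the alternative
# percentile interval C2, for the sample mean as an illustrative estimator.
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(size=50)          # illustrative skewed sample
theta_hat = y.mean()                  # the point estimate

B = 9999
theta_star = np.empty(B)
for b in range(B):
    theta_star[b] = rng.choice(y, size=y.size, replace=True).mean()

# Efron percentile interval C1: quantiles of theta* taken directly
c1 = np.quantile(theta_star, [0.025, 0.975])

# Alternative percentile interval C2: quantiles of T* = theta* - theta_hat,
# reflected around theta_hat
q = np.quantile(theta_star - theta_hat, [0.025, 0.975])
c2 = np.array([theta_hat - q[1], theta_hat - q[0]])

print("C1:", c1, "C2:", c2)   # the two differ when theta* is asymmetric
```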

10.6 Percentile-t Equal-Tailed Interval

Suppose we want to test $H_0: \theta = \theta_0$ against $H_1: \theta < \theta_0$ at size $\alpha$. We would set $T_n(\theta) = \left( \hat\theta - \theta \right)/s(\hat\theta)$ and reject $H_0$ in favor of $H_1$ if $T_n(\theta_0) < c$, where $c$ would be selected so that
\[ \Pr\left( T_n(\theta_0) < c \right) = \alpha. \]
Thus $c = q_n(\alpha)$. Since this is unknown, a bootstrap test replaces $q_n(\alpha)$ with the bootstrap estimate $q_n^*(\alpha)$, and the test rejects if $T_n(\theta_0) < q_n^*(\alpha)$.

Similarly, if the alternative is $H_1: \theta > \theta_0$, the bootstrap test rejects if $T_n(\theta_0) > q_n^*(1-\alpha)$.

Computationally, these critical values can be estimated from a bootstrap simulation by sorting the bootstrap t-statistics $T_n^* = \left( \hat\theta^* - \hat\theta \right)/s(\hat\theta^*)$. Note, and this is important, that the bootstrap test statistic is centered at the estimate $\hat\theta$, and the standard error $s(\hat\theta^*)$ is calculated on the bootstrap sample. These t-statistics are sorted to find the estimated quantiles $\hat q_n^*(\alpha)$ and/or $\hat q_n^*(1-\alpha)$.

Let $T_n(\theta) = \left( \hat\theta - \theta \right)/s(\hat\theta)$. Then taking the intersection of two one-sided intervals,
\begin{align*}
1 - \alpha &= \Pr\left( q_n(\alpha/2) \le T_n(\theta_0) \le q_n(1-\alpha/2) \right) \\
&= \Pr\left( q_n(\alpha/2) \le \left( \hat\theta - \theta_0 \right)/s(\hat\theta) \le q_n(1-\alpha/2) \right) \\
&= \Pr\left( \hat\theta - s(\hat\theta)\, q_n(1-\alpha/2) \le \theta_0 \le \hat\theta - s(\hat\theta)\, q_n(\alpha/2) \right),
\end{align*}

so an exact $(1-\alpha)\%$ confidence interval for $\theta_0$ would be
\[ C_3^0 = \left[ \hat\theta - s(\hat\theta)\, q_n(1-\alpha/2),\ \hat\theta - s(\hat\theta)\, q_n(\alpha/2) \right]. \]

This motivates a bootstrap analog
\[ C_3 = \left[ \hat\theta - s(\hat\theta)\, q_n^*(1-\alpha/2),\ \hat\theta - s(\hat\theta)\, q_n^*(\alpha/2) \right]. \]

This is often called a percentile-t confidence interval. It is equal-tailed or central since the probability that $\theta_0$ is below the left endpoint approximately equals the probability that $\theta_0$ is above the right endpoint, each $\alpha/2$.

Computationally, this is based on the critical values from the one-sided hypothesis tests, discussed above.
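A minimal sketch of $C_3$ for the same illustrative sample-mean setup (the data, $B$, and the standard error formula $s(\hat\theta) = \hat\sigma/\sqrt{n}$ are assumptions, not from the text); the quantiles are exactly the one-sided critical values described above.

```python
# Minimal sketch of the equal-tailed percentile-t interval C3 for a sample
# mean.  Each bootstrap t-statistic is centered at theta_hat and studentized
# by the standard error computed on the bootstrap sample itself.
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(size=50)
n = y.size
theta_hat = y.mean()
se_hat = y.std(ddof=1) / np.sqrt(n)

B = 9999
t_star = np.empty(B)
for b in range(B):
    yb = rng.choice(y, size=n, replace=True)
    se_b = yb.std(ddof=1) / np.sqrt(n)     # standard error on the bootstrap sample
    t_star[b] = (yb.mean() - theta_hat) / se_b

q_lo, q_hi = np.quantile(t_star, [0.025, 0.975])
c3 = (theta_hat - se_hat * q_hi, theta_hat - se_hat * q_lo)
print("C3:", c3)
```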

10.7 Symmetric Percentile-t Intervals

Suppose we want to test $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$ at size $\alpha$. We would set $T_n(\theta) = \left( \hat\theta - \theta \right)/s(\hat\theta)$ and reject $H_0$ in favor of $H_1$ if $|T_n(\theta_0)| > c$, where $c$ would be selected so that
\[ \Pr\left( |T_n(\theta_0)| > c \right) = \alpha. \]


Note that
\[ \Pr\left( |T_n(\theta_0)| < c \right) = \Pr\left( -c < T_n(\theta_0) < c \right) = G_n(c) - G_n(-c) \equiv \overline{G}_n(c), \]
which is a symmetric distribution function. The ideal critical value $c = q_n(\alpha)$ solves the equation
\[ \overline{G}_n(q_n(\alpha)) = 1 - \alpha. \]
Equivalently, $q_n(\alpha)$ is the $1-\alpha$ quantile of the distribution of $|T_n(\theta_0)|$.

The bootstrap estimate is $q_n^*(\alpha)$, the $1-\alpha$ quantile of the distribution of $|T_n^*|$, or the number which solves the equation
\[ \overline{G}_n^*(q_n^*(\alpha)) = G_n^*(q_n^*(\alpha)) - G_n^*(-q_n^*(\alpha)) = 1 - \alpha. \]

Computationally, $q_n^*(\alpha)$ is estimated from a bootstrap simulation by sorting the bootstrap t-statistics $|T_n^*| = \left| \hat\theta^* - \hat\theta \right| / s(\hat\theta^*)$, and taking the upper $\alpha\%$ quantile. The bootstrap test rejects if $|T_n(\theta_0)| > q_n^*(\alpha)$.

Let
\[ C_4 = \left[ \hat\theta - s(\hat\theta)\, q_n^*(\alpha),\ \hat\theta + s(\hat\theta)\, q_n^*(\alpha) \right], \]
where $q_n^*(\alpha)$ is the bootstrap critical value for a two-sided hypothesis test. $C_4$ is called the symmetric percentile-t interval. It is designed to work well since

\begin{align*}
\Pr\left( \theta_0 \in C_4 \right) &= \Pr\left( \hat\theta - s(\hat\theta)\, q_n^*(\alpha) \le \theta_0 \le \hat\theta + s(\hat\theta)\, q_n^*(\alpha) \right) \\
&= \Pr\left( |T_n(\theta_0)| < q_n^*(\alpha) \right) \simeq \Pr\left( |T_n(\theta_0)| < q_n(\alpha) \right) \\
&= 1 - \alpha.
\end{align*}
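Continuing the illustrative sample-mean example (the data and $B$ remain assumptions), a sketch of $C_4$: the only change from the equal-tailed construction is that absolute t-statistics are sorted and a single upper quantile is used.

```python
# Minimal sketch of the symmetric percentile-t interval C4: q*(alpha) is
# the 1 - alpha quantile of |T*|.
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(size=50)
n = y.size
theta_hat = y.mean()
se_hat = y.std(ddof=1) / np.sqrt(n)

B = 9999
abs_t_star = np.empty(B)
for b in range(B):
    yb = rng.choice(y, size=n, replace=True)
    se_b = yb.std(ddof=1) / np.sqrt(n)
    abs_t_star[b] = abs(yb.mean() - theta_hat) / se_b

q_sym = np.quantile(abs_t_star, 0.95)    # bootstrap critical value q*(0.05)
c4 = (theta_hat - se_hat * q_sym, theta_hat + se_hat * q_sym)
print("C4:", c4)
```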

If $\theta$ is a vector, then to test $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$ at size $\alpha$, we would use a Wald statistic
\[ W_n(\theta) = n\left( \hat\theta - \theta \right)' \hat V^{-1} \left( \hat\theta - \theta \right) \]
or some other asymptotically chi-square statistic. Thus here $T_n(\theta) = W_n(\theta)$. The ideal test rejects if $W_n \ge q_n(\alpha)$, where $q_n(\alpha)$ is the $(1-\alpha)\%$ quantile of the distribution of $W_n$. The bootstrap test rejects if $W_n \ge q_n^*(\alpha)$, where $q_n^*(\alpha)$ is the $(1-\alpha)\%$ quantile of the distribution of
\[ W_n^* = n\left( \hat\theta^* - \hat\theta \right)' \hat V^{*-1} \left( \hat\theta^* - \hat\theta \right). \]

Computationally, the critical value $q_n^*(\alpha)$ is found as the quantile from simulated values of $W_n^*$. Note in the simulation that the Wald statistic is a quadratic form in $\left( \hat\theta^* - \hat\theta \right)$, not $\left( \hat\theta^* - \theta_0 \right)$. [This is a typical mistake made by practitioners.]
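A hedged sketch of the vector case, using a bivariate mean as the estimator and the sample covariance matrix as $\hat V^*$ (both illustrative assumptions); the comment marks the centering that the bracketed warning refers to.

```python
# Minimal sketch of the bootstrap Wald critical value for a vector mean.
# The key line: W* is a quadratic form in (theta* - theta_hat), NOT in
# (theta* - theta_0) -- centering at theta_0 is the practitioners' mistake
# flagged in the text.
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(size=(100, 2))       # illustrative data, theta = E(x)
n = X.shape[0]
theta_hat = X.mean(axis=0)

B = 999
w_star = np.empty(B)
for b in range(B):
    Xb = X[rng.integers(n, size=n)]
    d = Xb.mean(axis=0) - theta_hat      # centered at theta_hat
    Vb = np.cov(Xb, rowvar=False)        # bootstrap estimate of V
    w_star[b] = n * d @ np.linalg.solve(Vb, d)

crit = np.quantile(w_star, 0.95)         # bootstrap critical value q*(0.05)
print("reject H0 if W_n >=", crit)
```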

10.8 Asymptotic Expansions

Let $T_n \in \mathbb{R}$ be a statistic such that
\[ T_n \xrightarrow{d} N(0, \sigma^2). \tag{10.3} \]


In some cases, such as when $T_n$ is a t-ratio, then $\sigma^2 = 1$. In other cases $\sigma^2$ is unknown. Equivalently, writing $T_n \sim G_n(u, F)$, then for each $u$ and $F$
\[ \lim_{n \to \infty} G_n(u, F) = \Phi\left( \frac{u}{\sigma} \right), \]
or
\[ G_n(u, F) = \Phi\left( \frac{u}{\sigma} \right) + o(1). \tag{10.4} \]

While (10.4) says that $G_n$ converges to $\Phi\left( \frac{u}{\sigma} \right)$ as $n \to \infty$, it says nothing, however, about the rate of convergence, or the size of the divergence for any particular sample size $n$. A better asymptotic approximation may be obtained through an asymptotic expansion.

The following notation will be helpful. Let $a_n$ be a sequence.

Definition 10.8.1 $a_n = o(1)$ if $a_n \to 0$ as $n \to \infty$.

Definition 10.8.2 $a_n = O(1)$ if $|a_n|$ is uniformly bounded.

Definition 10.8.3 $a_n = o(n^{-r})$ if $n^r |a_n| \to 0$ as $n \to \infty$.

Basically, $a_n = O(n^{-r})$ if it declines to zero like $n^{-r}$.

We say that a function $g(u)$ is even if $g(-u) = g(u)$, and a function $h(u)$ is odd if $h(-u) = -h(u)$. The derivative of an even function is odd, and vice-versa.

Theorem 10.8.1 Under regularity conditions and (10.3),
\[ G_n(u, F) = \Phi\left( \frac{u}{\sigma} \right) + \frac{1}{n^{1/2}}\, g_1(u, F) + \frac{1}{n}\, g_2(u, F) + O(n^{-3/2}) \]
uniformly over $u$, where $g_1$ is an even function of $u$, and $g_2$ is an odd function of $u$. Moreover, $g_1$ and $g_2$ are differentiable functions of $u$ and continuous in $F$ relative to the supremum norm on the space of distribution functions.

The expansion in Theorem 10.8.1 is often called an Edgeworth expansion.

We can interpret Theorem 10.8.1 as follows. First, $G_n(u, F)$ converges to the normal limit at rate $n^{1/2}$. To a second order of approximation,
\[ G_n(u, F) \approx \Phi\left( \frac{u}{\sigma} \right) + n^{-1/2}\, g_1(u, F). \]
Since the derivative of $g_1$ is odd, the density function is skewed. To a third order of approximation,
\[ G_n(u, F) \approx \Phi\left( \frac{u}{\sigma} \right) + n^{-1/2}\, g_1(u, F) + n^{-1}\, g_2(u, F), \]
which adds a symmetric non-normal component to the approximate density (for example, adding leptokurtosis).


[Side Note: When $T_n = \sqrt{n}\left( \overline{X}_n - \mu \right)/\sigma$, a standardized sample mean, then
\begin{align*}
g_1(u) &= -\frac{1}{6} \kappa_3 \left( u^2 - 1 \right) \phi(u) \\
g_2(u) &= -\left( \frac{1}{24} \kappa_4 \left( u^3 - 3u \right) + \frac{1}{72} \kappa_3^2 \left( u^5 - 10u^3 + 15u \right) \right) \phi(u)
\end{align*}
where $\phi(u)$ is the standard normal pdf, and
\begin{align*}
\kappa_3 &= E(X - \mu)^3 / \sigma^3 \\
\kappa_4 &= E(X - \mu)^4 / \sigma^4 - 3
\end{align*}
the standardized skewness and excess kurtosis of the distribution of $X$. Note that when $\kappa_3 = 0$ and $\kappa_4 = 0$, then $g_1 = 0$ and $g_2 = 0$, so the second-order Edgeworth expansion corresponds to the normal distribution.]
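To make the expansion tangible, here is an illustrative numeric check (not from the text) of the second-order Edgeworth approximation for exponential data, where $\kappa_3 = 2$ and $\kappa_4 = 6$ are the known skewness and excess kurtosis of the Exponential(1) distribution.

```python
# Minimal sketch comparing the Edgeworth approximation above with the
# simulated distribution of a standardized sample mean of exponential draws.
import numpy as np
from scipy.stats import norm

n = 20                      # sample size
k3, k4 = 2.0, 6.0           # skewness and excess kurtosis of Exponential(1)

def g1(u):
    return -k3 * (u**2 - 1) * norm.pdf(u) / 6

def g2(u):
    return -(k4 * (u**3 - 3*u) / 24
             + k3**2 * (u**5 - 10*u**3 + 15*u) / 72) * norm.pdf(u)

u = 1.5
edgeworth = norm.cdf(u) + g1(u) / np.sqrt(n) + g2(u) / n

# Monte Carlo check of Pr(T_n <= u); Exponential(1) has mean 1 and sd 1
rng = np.random.default_rng(0)
X = rng.exponential(size=(100_000, n))
T = np.sqrt(n) * (X.mean(axis=1) - 1.0)
print(edgeworth, (T <= u).mean())    # Edgeworth vs. simulation
```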

Francis Edgeworth

Francis Ysidro Edgeworth (1845-1926) of Ireland, founding editor of the Economic Journal, was a profound economic and statistical theorist, developing the theories of indifference curves and asymptotic expansions. He also could be viewed as the first econometrician due to his early use of mathematical statistics in the study of economic data.

10.9 One-Sided Tests

Using the expansion of Theorem 10.8.1, we can assess the accuracy of one-sided hypothesis tests and confidence regions based on an asymptotically normal t-ratio $T_n$. An asymptotic test is based on $\Phi(u)$.

To the second order, the exact distribution is
\[ \Pr(T_n < u) = G_n(u, F_0) = \Phi(u) + \frac{1}{n^{1/2}}\, g_1(u, F_0) + O(n^{-1}) \]
since $\sigma = 1$. The difference is
\begin{align*}
\Phi(u) - G_n(u, F_0) &= -\frac{1}{n^{1/2}}\, g_1(u, F_0) + O(n^{-1}) \\
&= O(n^{-1/2}),
\end{align*}
so the order of the error is $O(n^{-1/2})$.

A bootstrap test is based on $G_n^*(u)$, which from Theorem 10.8.1 has the expansion
\[ G_n^*(u) = G_n(u, F_n) = \Phi(u) + \frac{1}{n^{1/2}}\, g_1(u, F_n) + O(n^{-1}). \]

Because $\Phi(u)$ appears in both expansions, the difference between the bootstrap distribution and the true distribution is

\[ G_n^*(u) - G_n(u, F_0) = \frac{1}{n^{1/2}} \left( g_1(u, F_n) - g_1(u, F_0) \right) + O(n^{-1}). \]


Since $F_n$ converges to $F$ at rate $\sqrt{n}$, and $g_1$ is continuous with respect to $F$, the difference $\left( g_1(u, F_n) - g_1(u, F_0) \right)$ converges to 0 at rate $\sqrt{n}$. Heuristically,
\begin{align*}
g_1(u, F_n) - g_1(u, F_0) &\approx \frac{\partial}{\partial F}\, g_1(u, F_0)\, (F_n - F_0) \\
&= O(n^{-1/2}).
\end{align*}
The “derivative” $\frac{\partial}{\partial F}\, g_1(u, F)$ is only heuristic, as $F$ is a function. We conclude that
\[ G_n^*(u) - G_n(u, F_0) = O(n^{-1}), \]
or
\[ \Pr(T_n^* \le u) = \Pr(T_n \le u) + O(n^{-1}), \]

which is an improved rate of convergence over the asymptotic test (which converged at rate $O(n^{-1/2})$). This rate can be used to show that one-tailed bootstrap inference based on the t-ratio achieves a so-called asymptotic refinement: the Type I error of the test converges at a faster rate than an analogous asymptotic test.
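The refinement can be seen in a small Monte Carlo. The sketch below is illustrative only (Exponential(1) data and modest replication counts are assumptions, not from the text); it compares the one-sided Type I error of the asymptotic t-test with its bootstrap counterpart.

```python
# Illustrative Monte Carlo: one-sided Type I error of the asymptotic t-test
# versus the bootstrap t-test, for the mean of Exponential(1) data (true
# mean 1).  Rep counts are kept small; increase them for a serious study.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, B, reps, alpha = 20, 399, 1000, 0.05
z_crit = norm.ppf(1 - alpha)

rej_asym = rej_boot = 0
for _ in range(reps):
    y = rng.exponential(size=n)
    t = np.sqrt(n) * (y.mean() - 1.0) / y.std(ddof=1)
    rej_asym += t > z_crit
    # bootstrap critical value: 1 - alpha quantile of recentered t*
    t_star = np.empty(B)
    for b in range(B):
        yb = rng.choice(y, size=n, replace=True)
        t_star[b] = np.sqrt(n) * (yb.mean() - y.mean()) / yb.std(ddof=1)
    rej_boot += t > np.quantile(t_star, 1 - alpha)

print("asymptotic size:", rej_asym / reps, " bootstrap size:", rej_boot / reps)
```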

10.10 Symmetric Two-Sided Tests

If a random variable $y$ has distribution function $H(u) = \Pr(y \le u)$, then the random variable $|y|$ has distribution function
\[ \overline{H}(u) = H(u) - H(-u) \]
since
\begin{align*}
\Pr(|y| \le u) &= \Pr(-u \le y \le u) \\
&= \Pr(y \le u) - \Pr(y \le -u) \\
&= H(u) - H(-u).
\end{align*}

For example, if $Z \sim N(0, 1)$, then $|Z|$ has distribution function
\[ \overline{\Phi}(u) = \Phi(u) - \Phi(-u) = 2\Phi(u) - 1. \]

Similarly, if $T_n$ has exact distribution $G_n(u, F)$, then $|T_n|$ has the distribution function
\[ \overline{G}_n(u, F) = G_n(u, F) - G_n(-u, F). \]

A two-sided hypothesis test rejects $H_0$ for large values of $|T_n|$. Since $T_n \xrightarrow{d} Z$, then $|T_n| \xrightarrow{d} |Z|$. Thus asymptotic critical values are taken from the $\overline{\Phi}$ distribution, and exact critical values are taken from the $\overline{G}_n(u, F_0)$ distribution. From Theorem 10.8.1, we can calculate that

\begin{align*}
\overline{G}_n(u, F) &= G_n(u, F) - G_n(-u, F) \\
&= \left( \Phi(u) + \frac{1}{n^{1/2}}\, g_1(u, F) + \frac{1}{n}\, g_2(u, F) \right) \\
&\qquad - \left( \Phi(-u) + \frac{1}{n^{1/2}}\, g_1(-u, F) + \frac{1}{n}\, g_2(-u, F) \right) + O(n^{-3/2}) \\
&= \overline{\Phi}(u) + \frac{2}{n}\, g_2(u, F) + O(n^{-3/2}), \tag{10.5}
\end{align*}
where the simplifications are because $g_1$ is even and $g_2$ is odd. Hence the difference between the asymptotic distribution and the exact distribution is
\[ \overline{\Phi}(u) - \overline{G}_n(u, F_0) = -\frac{2}{n}\, g_2(u, F_0) + O(n^{-3/2}) = O(n^{-1}). \]


The order of the error is $O(n^{-1})$.

Interestingly, the asymptotic two-sided test has a better coverage rate than the asymptotic one-sided test. This is because the first term in the asymptotic expansion, $g_1$, is an even function, meaning that the errors in the two directions exactly cancel out.

Applying (10.5) to the bootstrap distribution, we find
\[ \overline{G}_n^*(u) = \overline{G}_n(u, F_n) = \overline{\Phi}(u) + \frac{2}{n}\, g_2(u, F_n) + O(n^{-3/2}). \]

Thus the difference between the bootstrap and exact distributions is
\begin{align*}
\overline{G}_n^*(u) - \overline{G}_n(u, F_0) &= \frac{2}{n} \left( g_2(u, F_n) - g_2(u, F_0) \right) + O(n^{-3/2}) \\
&= O(n^{-3/2}),
\end{align*}
the last equality because $F_n$ converges to $F_0$ at rate $\sqrt{n}$, and $g_2$ is continuous in $F$. Another way of writing this is
\[ \Pr(|T_n^*| < u) = \Pr(|T_n| < u) + O(n^{-3/2}), \]
so the error from using the bootstrap distribution (relative to the true unknown distribution) is $O(n^{-3/2})$. This is in contrast to the use of the asymptotic distribution, whose error is $O(n^{-1})$. Thus a two-sided bootstrap test also achieves an asymptotic refinement, similar to a one-sided test.

A reader might get confused between the two simultaneous effects. Two-sided tests have better rates of convergence than one-sided tests, and bootstrap tests have better rates of convergence than asymptotic tests.

The analysis shows that there may be a trade-off between one-sided and two-sided tests. Two-sided tests will have more accurate size (reported Type I error), but one-sided tests might have more power against alternatives of interest. Confidence intervals based on the bootstrap can be asymmetric if based on one-sided tests (equal-tailed intervals) and can therefore be more informative and have smaller length than symmetric intervals. Therefore, the choice between symmetric and equal-tailed confidence intervals is unclear, and needs to be determined on a case-by-case basis.

10.11 Percentile Confidence Intervals

To evaluate the coverage rate of the percentile interval, set $T_n = \sqrt{n}\left( \hat\theta - \theta_0 \right)$. We know that $T_n \xrightarrow{d} N(0, V)$, which is not pivotal, as it depends on the unknown $V$. Theorem 10.8.1 shows that a first-order approximation is
\[ G_n(u, F) = \Phi\left( \frac{u}{\sigma} \right) + O(n^{-1/2}), \]
where $\sigma = \sqrt{V}$, and for the bootstrap
\[ G_n^*(u) = G_n(u, F_n) = \Phi\left( \frac{u}{\hat\sigma} \right) + O(n^{-1/2}), \]
where $\hat\sigma = \sqrt{V(F_n)}$ is the bootstrap estimate of $\sigma$. The difference is
\begin{align*}
G_n^*(u) - G_n(u, F_0) &= \Phi\left( \frac{u}{\hat\sigma} \right) - \Phi\left( \frac{u}{\sigma} \right) + O(n^{-1/2}) \\
&= -\phi\left( \frac{u}{\sigma} \right) \frac{u}{\sigma^2} \left( \hat\sigma - \sigma \right) + O(n^{-1/2}) \\
&= O(n^{-1/2}).
\end{align*}
Hence the order of the error is $O(n^{-1/2})$.

The good news is that the percentile-type methods (if appropriately used) can yield $\sqrt{n}$-convergent asymptotic inference. Yet these methods do not require the calculation of standard errors! This means that in contexts where standard errors are not available or are difficult to calculate, the percentile bootstrap methods provide an attractive inference method.

The bad news is that the rate of convergence is disappointing. It is no better than the rate obtained from an asymptotic one-sided confidence region. Therefore, if standard errors are available, it is unclear if there are any benefits from using the percentile bootstrap over simple asymptotic methods.

Based on these arguments, the theoretical literature (e.g., Hall, 1992; Horowitz, 2001) tends to advocate the use of the percentile-t bootstrap methods rather than percentile methods.

10.12 Bootstrap Methods for Regression Models

The bootstrap methods we have discussed have set $G_n^*(u) = G_n(u, F_n)$, where $F_n$ is the EDF. Any other consistent estimate of $F$ may be used to define a feasible bootstrap estimator. The advantage of the EDF is that it is fully nonparametric, imposes no conditions, and works in nearly any context. But since it is fully nonparametric, it may be inefficient in contexts where more is known about $F$. We discuss bootstrap methods appropriate for the linear regression model

\begin{align*}
y_i &= x_i'\beta + e_i \\
E(e_i \mid x_i) &= 0.
\end{align*}

The non-parametric bootstrap resamples the observations $(y_i^*, x_i^*)$ from the EDF, which implies
\begin{align*}
y_i^* &= x_i^{*\prime} \hat\beta + e_i^* \\
E(x_i^* e_i^*) &= 0
\end{align*}
but generally
\[ E(e_i^* \mid x_i^*) \ne 0. \]

The bootstrap distribution does not impose the regression assumption, and is thus an inefficient estimator of the true distribution (when in fact the regression assumption is true).

One approach to this problem is to impose the very strong assumption that the error $e_i$ is independent of the regressor $x_i$. The advantage is that in this case it is straightforward to construct bootstrap distributions. The disadvantage is that the bootstrap distribution may be a poor approximation when the error is not independent of the regressors.

To impose independence, it is sufficient to sample the $x_i^*$ and $e_i^*$ independently, and then create $y_i^* = x_i^{*\prime}\hat\beta + e_i^*$. There are different ways to impose independence. A non-parametric method is to sample the bootstrap errors $e_i^*$ randomly from the OLS residuals $\{\hat e_1, \ldots, \hat e_n\}$. A parametric method is to generate the bootstrap errors $e_i^*$ from a parametric distribution, such as the normal $e_i^* \sim N(0, \hat\sigma^2)$.

For the regressors $x_i^*$, a nonparametric method is to sample the $x_i^*$ randomly from the EDF or sample values $\{x_1, \ldots, x_n\}$. A parametric method is to sample $x_i^*$ from an estimated parametric distribution. A third approach sets $x_i^* = x_i$. This is equivalent to treating the regressors as fixed in repeated samples. If this is done, then all inferential statements are made conditionally on the observed values of the regressors, which is a valid statistical approach. It does not really matter, however, whether or not the $x_i$ are really “fixed” or random.
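A minimal sketch of this third approach (fixed regressors with residual resampling); the design matrix, coefficients, and $B$ are illustrative assumptions, not from the text.

```python
# Minimal sketch of the fixed-design residual bootstrap: x_i* = x_i, and
# e_i* is drawn from the OLS residuals, which imposes independence of the
# bootstrap errors and regressors.
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # illustrative design
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e_hat = y - X @ beta_hat

B = 999
beta_star = np.empty((B, 2))
for b in range(B):
    e_star = rng.choice(e_hat, size=n, replace=True)    # resample residuals
    y_star = X @ beta_hat + e_star                      # regressors held fixed
    beta_star[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]

print("bootstrap s.e.:", beta_star.std(axis=0, ddof=1))
```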

The methods discussed above are unattractive for most applications in econometrics because they impose the stringent assumption that $x_i$ and $e_i$ are independent. Typically what is desirable is to impose only the regression condition $E(e_i \mid x_i) = 0$. Unfortunately this is a harder problem.

One proposal which imposes the regression condition without independence is the Wild Bootstrap. The idea is to construct a conditional distribution for $e_i^*$ so that
\begin{align*}
E(e_i^* \mid x_i) &= 0 \\
E(e_i^{*2} \mid x_i) &= \hat e_i^2 \\
E(e_i^{*3} \mid x_i) &= \hat e_i^3.
\end{align*}


A conditional distribution with these features will preserve the main important features of the data. This can be achieved using a two-point distribution of the form
\begin{align*}
\Pr\left( e_i^* = \left( \frac{1+\sqrt{5}}{2} \right) \hat e_i \right) &= \frac{\sqrt{5}-1}{2\sqrt{5}} \\
\Pr\left( e_i^* = \left( \frac{1-\sqrt{5}}{2} \right) \hat e_i \right) &= \frac{\sqrt{5}+1}{2\sqrt{5}}.
\end{align*}
For each $x_i$, you sample $e_i^*$ using this two-point distribution.
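A minimal sketch of drawing wild bootstrap errors from this two-point distribution; the placeholder residuals are an illustrative stand-in for the OLS residuals $\hat e_i$ of an actual fit.

```python
# Minimal sketch of the wild bootstrap with the two-point distribution
# above: each e_i* equals a_i * e_hat_i, with the golden-ratio values and
# probabilities given in the text.
import numpy as np

rng = np.random.default_rng(0)
s5 = np.sqrt(5.0)
vals = np.array([(1 + s5) / 2, (1 - s5) / 2])
probs = np.array([(s5 - 1) / (2 * s5), (s5 + 1) / (2 * s5)])

# e_hat: OLS residuals from the original fit (placeholder for illustration)
e_hat = rng.normal(size=100)

a = rng.choice(vals, size=e_hat.size, p=probs)
e_star = a * e_hat         # E(e*|x)=0, E(e*^2|x)=e_hat^2, E(e*^3|x)=e_hat^3

# sanity check of the first three moments of the weights: 0, 1, 1
print(probs @ vals, probs @ vals**2, probs @ vals**3)
```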


Exercises

Exercise 10.1 Let $F_n(x)$ denote the EDF of a random sample. Show that
\[ \sqrt{n} \left( F_n(x) - F_0(x) \right) \xrightarrow{d} N\left( 0,\ F_0(x) \left( 1 - F_0(x) \right) \right). \]

Exercise 10.2 Take a random sample $\{y_1, \ldots, y_n\}$ with $\mu = E y_i$ and $\sigma^2 = \mathrm{var}(y_i)$. Let the statistic of interest be the sample mean $T_n = \overline{y}_n$. Find the population moments $E T_n$ and $\mathrm{var}(T_n)$. Let $\{y_1^*, \ldots, y_n^*\}$ be a random sample from the empirical distribution function and let $T_n^* = \overline{y}_n^*$ be its sample mean. Find the bootstrap moments $E T_n^*$ and $\mathrm{var}(T_n^*)$.

Exercise 10.3 Consider the following bootstrap procedure for a regression of $y_i$ on $x_i$. Let $\hat\beta$ denote the OLS estimator from the regression of $y$ on $X$, and $\hat e = y - X\hat\beta$ the OLS residuals.

(a) Draw a random vector $(x^*, e^*)$ from the pair $\{(x_i, \hat e_i) : i = 1, \ldots, n\}$. That is, draw a random integer $i'$ from $[1, 2, \ldots, n]$, and set $x^* = x_{i'}$ and $e^* = \hat e_{i'}$. Set $y^* = x^{*\prime}\hat\beta + e^*$. Draw (with replacement) $n$ such vectors, creating a random bootstrap data set $(y^*, X^*)$.

(b) Regress $y^*$ on $X^*$, yielding OLS estimates $\hat\beta^*$ and any other statistic of interest.

Show that this bootstrap procedure is (numerically) identical to the non-parametric bootstrap.

Exercise 10.4 Consider the following bootstrap procedure. Using the non-parametric bootstrap, generate bootstrap samples, calculate the estimate $\hat\theta^*$ on these samples and then calculate
\[ T_n^* = \left( \hat\theta^* - \hat\theta \right)/s(\hat\theta), \]
where $s(\hat\theta)$ is the standard error in the original data. Let $q_n^*(.05)$ and $q_n^*(.95)$ denote the 5% and 95% quantiles of $T_n^*$, and define the bootstrap confidence interval
\[ C = \left[ \hat\theta - s(\hat\theta)\, q_n^*(.95),\ \hat\theta - s(\hat\theta)\, q_n^*(.05) \right]. \]
Show that $C$ exactly equals the Alternative percentile interval (not the percentile-t interval).

Exercise 10.5 You want to test $H_0: \theta = 0$ against $H_1: \theta > 0$. The test for $H_0$ is to reject if $T_n = \hat\theta/s(\hat\theta) > c$, where $c$ is picked so that the Type I error is $\alpha$. You do this as follows. Using the non-parametric bootstrap, you generate bootstrap samples, calculate the estimates $\hat\theta^*$ on these samples and then calculate
\[ T_n^* = \hat\theta^*/s(\hat\theta^*). \]
Let $q_n^*(.95)$ denote the 95% quantile of $T_n^*$. You replace $c$ with $q_n^*(.95)$, and thus reject $H_0$ if $T_n = \hat\theta/s(\hat\theta) > q_n^*(.95)$. What is wrong with this procedure?

Exercise 10.6 Suppose that in an application, $\hat\theta = 1.2$ and $s(\hat\theta) = .2$. Using the non-parametric bootstrap, 1000 samples are generated from the bootstrap distribution, and $\hat\theta^*$ is calculated on each sample. The $\hat\theta^*$ are sorted, and the 2.5% and 97.5% quantiles of the $\hat\theta^*$ are .75 and 1.3, respectively.

(a) Report the 95% Efron Percentile interval for $\theta$.

(b) Report the 95% Alternative Percentile interval for $\theta$.

(c) With the given information, can you report the 95% Percentile-t interval for $\theta$?

Exercise 10.7 The datafile hprice1.dat contains data on house prices (sales), with variables listed in the file hprice1.pdf. Estimate a linear regression of price on the number of bedrooms, lot size, size of house, and the colonial dummy. Calculate 95% confidence intervals for the regression coefficients using both the asymptotic normal approximation and the percentile-t bootstrap.
