SPECIFICATION TESTING WHEN THE NULL IS NONPARAMETRIC OR SEMIPARAMETRIC

This paper discusses the problem of testing for misspecification in semiparametric regression models, for a large family of econometric models and under rather general conditions. We focus on two main issues that typically arise in econometrics. First, many econometric models are estimated by maximum likelihood or pseudo-ML methods, for example limited dependent variable or gravity models. Second, one often does not want to fully specify the null hypothesis; instead, one would rather impose only some structure such as separability or monotonicity. To address these points we introduce an adaptive omnibus test. Special emphasis is given to practical issues such as adaptive bandwidth choice, general but simple requirements on the estimates, and finite sample performance, including the resampling approximations.


Introduction
When estimating structural econometric models, it is quite difficult to find a situation where economic theory or other information from outside the data enables us to fully specify a model without running the risk of serious misspecification. On the other hand, to a certain extent (parametric) modeling is wanted by the researcher, or required by the nature of the data, in order to overcome problems of identification, estimation, interpretation, numerical performance, etc. Semiparametric models are made for managing this trade-off: they enable us to include available information and necessary restrictions while keeping the rest unspecified. The structure imposed is not limited to parametric functional forms; it can be, for instance, separability of inputs, monotonicity, or the conditional distribution. The latter, for example, might be exploited to estimate the mean function of limited dependent responses where alternative estimators are either difficult to obtain or hard to calculate (cf. Lewbel and Linton, 2002).
Although under such semiparametric approaches the risk of misspecification is considerably reduced, the problem is not negligible. For example, within the framework of conditionally parametric models the root-n consistency of the parameters of interest is not obtained for free; it is obtained under stronger conditions including the correctness of all prior information. Consider the so-called Tobit model where a linear structure in the index function might be recommended by economic theory. However, it does not guide us when choosing the conditional distribution of the latent variable.
The censored Gaussian density is typically chosen just for convenience. Similarly, the pseudo-ML fails if either the moments of interest (typically the mean) are not correctly specified or the score functions do not constitute a valid set of moment conditions. For example, the Poisson pseudo-ML for gravity models is inconsistent even if there is just a small zero-inflation, i.e. a hurdle for trade.
As a conclusion of the above discussion, we concentrate on problems where the main interest lies in the conditional mean. Regression tests are mainly available for purely parametric null hypotheses. However, economic theory does not fully specify the index plus the conditional distribution. Instead, it might introduce assumptions such as additivity, homothetic transformations (Lewbel and Linton, 2007), or weak separability. In those cases the null hypothesis to be tested is semiparametric, like for example that of insignificant interactions (Sperlich, Tjøstheim, and Yang, 2002). We are therefore testing semiparametric null hypotheses related to the mean function, which is often estimated by (pseudo-)ML methods. So our null hypothesis concerns the mean, not the conditional distribution, though it may be affected by the particular choice of the latter.
For testing a parametric regression model against a broad set of alternatives, many specification tests are available. Following Hart (1997), mainly two approaches are considered: statistics that compare parametric and nonparametric fits, and statistics formed as a weighted average of the residuals. The literature on specification testing keeps growing with almost every new volume of an econometrics or statistics journal; see González-Manteiga and Crujeiras (2013) for a review. Nonetheless, nonparametric testing is still less common for semiparametric latent variable models (see Pardo-Fernández, Van Keilegom, and González-Manteiga, 2007). Here we introduce a feasible omnibus test whose null hypothesis can be any parametric or semiparametric regression model. The testing target and the test statistic are both original.
Moreover, the test is data adaptive, with special emphasis on calibration (see Sperlich, 2013). In order to provide a broadly applicable procedure, we require the estimates of the null model to fulfill only rather general conditions. Many semiparametric econometric models are based on kernel smoothing, especially when thinking of profiled-likelihood-based estimates. We have therefore opted for a test statistic that evaluates kernel-convoluted differences. Using kernels or other smoothing methods, a bandwidth needs to be chosen; it regularizes the smoothness of the potential alternatives.
Although the problem of bandwidth selection for a test is related to the question of calibration and power, it is often not treated very carefully. The so-called adaptive tests try to balance the need for calibration on the one hand and power maximization on the other; see for example Kallenberg and Ledwina (1995), Spokoiny (2001), Härdle, Sperlich, and Spokoiny (2001), or Guerre and Lavergne (2005). We adapt the ideas of the latter three articles to our problem. There is, however, an important difference between these articles and ours: our null hypothesis is semiparametric whereas theirs are fully parametric. This leads to some conditions on how the smoothing of the null model must relate to that of the alternative in order to obtain a consistent test. As in the above-mentioned papers, and since the asymptotic properties are of little help in finite samples, the critical values are approximated via resampling methods. This reminds us that for a nonparametric test with a semiparametric null hypothesis one has to choose at least three regularization parameters: one for estimating the null, one for the test (referring to the smoothness of the alternative), and one for estimating the critical value. The latter is either the bandwidth for generating bootstrap samples under the null or the size of the subsamples when subsampling is applied instead. Note that this triple choice problem is not specific to our setting but typical for all smoothing-based tests with a semiparametric null.
While the bandwidth for the null model should be chosen along standard criteria for regression estimation (see Köhler, Schindler and Sperlich, 2013), the testing bandwidth should ideally maximize power. This is why we propose an adaptive test which automatically provides such a testing bandwidth. We further discuss methods to select the regularization parameter of the resampling procedure to reach calibration of the test.
As we give special emphasis to the practicability of our procedures, we study in detail the resulting performance of our proposals.
Summarizing, we present a practical though general approach that allows for testing semiparametric hypotheses that are quite frequently used in the specification of econometric models such as qualitative response models, truncated and Tobit type specifications, duration models, etc. (One may even need more regularization parameters if, for example, ideas of Zheng (1996) are used to construct the test.) A main theoretical contribution is the study of a nonparametric test of a semiparametric null hypothesis; it highlights the conditions on the smoothing that has to be performed in the different steps.

Proposal of the Test Statistic
Suppose that we have a sample of n independent replicates {(Y_i, X_i)}_{i=1,...,n} of the pair of random variables Y ∈ IR and X ∈ X, with X ⊂ IR^d a compact set, such that the conditional distribution of Y given X has density f_{Y|X}(y, x). As we already pointed out in the introduction, our interest is focused on testing the correct specification of the regression function m(x) = E(Y | X = x). There exists a plethora of consistent regression specification tests that do not depend on a pre-specified choice of f_{Y|X}(y, x). However, in some situations of interest for economists, such as multi-index models (Tobit type models, Amemiya, 1985), practitioners prefer to work under standard assumptions such as Gaussian errors of the underlying latent variable model.
If this is the case, the null hypothesis takes the form of a family of conditional distributions {f⁰_{Y|X}(y, x; θ, η_1, ..., η_p) : θ ∈ Θ, η_1(x_1) ∈ H_1, ..., η_p(x_p) ∈ H_p}, where Θ is a compact subset of IR^k, H_1, ..., H_p are compact subsets of IR, and V is a compact subset of IR^ν. The vectors x_j ∈ X_{d_j}, j = 1, ..., p, are mutually exclusive subsets of x such that X = X_{d_1} × ... × X_{d_p}. Further, the η_j are assumed to be unknown smooth functions η_j : X_{d_j} → H_j. To motivate our problem of interest, consider the latent regression model Y*_i = X'_{1i}γ + η(X_{2i}) + u_i, where the u_i are random drawings from N(0, σ²), γ is an unknown parameter vector that needs to be estimated, and η(·) is an unknown nonparametric relationship that also needs to be estimated. Assume the censoring mechanism Y_i = max(0, Y*_i); then the regression function takes the form m(x) = Φ(v(x)/σ) v(x) + σ φ(v(x)/σ) with index v(x) = x'_1 γ + η(x_2), where φ(·) and Φ(·) stand for the standard Gaussian density and distribution function, respectively. Note that for the null hypothesis to be true, we do not only need a correct specification of (4); we also need correct specification of the censoring mechanism, the conditional distribution, and the homoskedasticity assumption. Of course, there are situations, such as non-existing moments, quantiles, or other robust quantities, where focusing on the likelihood would be more natural. In many cases the estimator of m(x) under the null can be calculated directly from some consistent estimators θ̂ and η̂_1, ..., η̂_p, and can generally be written as m̂_S(x) = ∫ y f⁰_{Y|X}(y, x; θ̂, η̂_1, ..., η̂_p) dy.
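As a quick numerical sanity check of the censored-Gaussian regression function, the following sketch evaluates Φ(v/σ)v + σφ(v/σ) for a given index value v (the function name and implementation are illustrative, not from the paper):

```python
import math

def tobit_mean(index, sigma):
    """E[Y | X = x] for Y = max(0, Y*) with Y* | X = x ~ N(index, sigma^2):
    Phi(index/sigma) * index + sigma * phi(index/sigma)."""
    z = index / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # Gaussian cdf
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # Gaussian pdf
    return Phi * index + sigma * phi
```

For a strongly positive index the censoring is rarely binding and the conditional mean is close to the index itself; for index zero it reduces to σ/√(2π).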
We will specify the above-mentioned consistent estimators in assumption (C.1). Let ω(u) ≥ 0 be a bounded weight function. Then the testing problem can be written as H_0: m(·) = m_S(·) against the alternative that they differ on a set of positive ω-weighted measure. Several alternative testing approaches are conceivable. Comparison studies (Dette, von Lieres und Wilkau and Sperlich, 200; Roca-Pardiñas and Sperlich, 2007) show that a quite successful one is to construct a statistic from convoluted differences in order to handle the bias problem. For a kernel K : IR^d → IR and bandwidth h, define the statistic I_h as in (5). The statistical properties of this statistic depend both on the choice of the bandwidth h and on the asymptotic behavior of θ̂ and η̂_1, ..., η̂_p. On these grounds, it is important to discuss how the smoothing parameters and the estimators are chosen. With respect to the estimators, for the parametric part θ we require a root-n consistent estimator.
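To fix ideas, here is a minimal sketch of such a convoluted-differences statistic: a degenerate U-statistic of kernel-weighted products of residuals. The quartic product kernel and all names are illustrative choices and need not coincide with the exact form of (5):

```python
import numpy as np

def I_h(X, resid, h, weight=None):
    """I_h = 1/(n(n-1)) * sum_{i != j} K_h(X_i - X_j) e_i e_j w_i w_j,
    with K_h a product quartic kernel scaled by bandwidth h."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, d = X.shape
    w = np.ones(n) if weight is None else np.asarray(weight, dtype=float)
    U = (X[:, None, :] - X[None, :, :]) / h
    # quartic (biweight) kernel 15/16 * (1 - u^2)^2 on [-1, 1], product over coordinates
    K = np.prod(np.where(np.abs(U) <= 1.0, 0.9375 * (1.0 - U**2)**2, 0.0), axis=2) / h**d
    np.fill_diagonal(K, 0.0)            # exclude the i == j terms
    ew = np.asarray(resid, dtype=float) * w
    return float(ew @ K @ ew) / (n * (n - 1))
```

Under the null the residuals Y_i − m̂_S(X_i) are centered, so products across nearby observations average out; a systematic deviation from the null model inflates I_h.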
In this way, the parametric estimator does not affect the asymptotic distribution of the test statistic; this could happen with estimators that exhibit a slower rate of convergence. For the nonparametric estimators, only a minimal requirement of uniform convergence at a certain nonparametric rate, r, is assumed.
This requirement is rather general since it is fulfilled by most nonparametric estimators that appear in the econometric literature. Furthermore, by imposing this condition we ensure that the statistical properties of the test depend neither on the form of the estimators of the η's nor on their bandwidth parameters. As established in Stone (1982), the optimal rate of convergence r_j depends on the smoothness class in which η_j is assumed to lie, the dimension of x_j, and the type of deviation function chosen. More precisely, the uniform maximal deviation converges to zero more slowly by a factor log n (see Mack and Silverman, 1982, Section 3). Note that r_j is strictly smaller than 1/2, as expected in the nonparametric literature. For example, if the second derivative of η_j is Hölder continuous and the dimension of x_j is d_j, the optimal rate takes the value r_j = 2/(4 + d_j). We also impose r_j ≥ 1/4; that is, the rate of convergence must not be too slow. The same condition is imposed in Severini and Wong (1992) for the nonparametric estimator that is included in the conditionally parametric model. For an example of an estimator of η_j that fulfills these conditions, see Rodriguez-Poo, Sperlich and Vieu (2013), Section 5.
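The rate arithmetic above is easy to tabulate; a small sketch (names are ours, for illustration):

```python
def optimal_rate(d_j):
    """Stone's optimal uniform rate r_j = 2/(4 + d_j) for eta_j with
    Hoelder-continuous second derivative and d_j-dimensional argument."""
    return 2.0 / (4.0 + d_j)

# the requirement r_j >= 1/4 caps the dimension at d_j <= 4 in this smoothness class
admissible_dims = [d for d in range(1, 10) if optimal_rate(d) >= 0.25]
```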
The choice of bandwidth h in (5) should ideally maximize the power of the test. As the distribution of I_h varies with h, it is natural to consider the standardized version (6), where B̂ is an estimator of the expectation of n h^{d/2} I_h under H_0, and V̂ an estimator of the variance of n h^{d/2} I_h, i.e.
Here, K^(2) denotes the convolution of the kernel K with itself, p is the marginal density of X, and σ²(x) is the conditional variance of Y given X. The standardization (6) creates a family of test statistics {T_h, h ∈ H_n}, where the choice of h marks the difference between the null and the global alternative. Following the idea of adaptive testing (Spokoiny, 2001), we propose to use the test statistic T* = max_{h ∈ H_n} T_h, where h is taken from a set of bandwidths H_n with cardinality J_n, namely of the geometric form given in (10). Another relevant issue in the computation of the test statistic is the estimation of σ²(x).
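The adaptive statistic amounts to a maximum over a finite, geometrically spaced bandwidth set; a sketch follows (the grid constants a = 1.05 and J_n = 15 are taken from the simulation section, everything else is illustrative naming):

```python
import numpy as np

def bandwidth_grid(h_max, a=1.05, J=15):
    """Geometric bandwidth set H_n = {h_max * a^{-k} : k = 0, ..., J-1}, cf. (10)."""
    return h_max * a ** (-np.arange(J))

def adaptive_statistic(standardized_T):
    """T* = max_{h in H_n} T_h over the standardized statistics."""
    return max(standardized_T)
```

Whether the grid descends from h_max or ascends from a lower bound is immaterial for the maximum; what matters is that the cardinality grows only like log n.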
Note that this variance expression appears in (7) and (8). Horowitz and Spokoiny (2001) propose an estimator for σ 2 (x) that needs to be consistent under the alternative.
Often such an estimator is not easily available, or only at the cost of further restrictions. The difference estimator they propose works reasonably well only in one-dimensional regression problems. Here we do not need such a strong condition; instead we just ask for an estimator of σ²(x) fulfilling the following assumption. Guerre and Lavergne (2005) discuss some drawbacks of the Horowitz-Spokoiny approach and propose an alternative choice of h that will be considered in Section 4 together with its practical implementation.
An analytical expression for the critical values of T* requires extreme value theory. As this is cumbersome and gives little helpful approximations in practice, we propose resampling strategies. No matter whether (wild) bootstrap or subsampling is applied, for the resampling estimates, say m̃_S(x) and σ̃²(x), we require conditions similar to those in Horowitz and Spokoiny (2001), that is assumption (C.3). Note that assumption (C.3) is rather general. In order to be more precise about the consistency of the resampling estimates, we would have to be more specific about the model and the estimators considered, thereby losing the intended generality. For the consistency of related bootstrap problems see for example Giné and Zinn (1990) or Härdle, Huet, Mammen, and Sperlich (2004). The alternative subsampling procedure (Politis, Romano, and Wolf, 1999) works similarly: one draws subsamples of size n_s < n from the original sample and takes b = h (n/n_s)^r as the bandwidth for the subsampling estimator, where r is the chosen bandwidth rate, to calculate the subsample analogues T̃*.
Finally, compute the empirical 1 − α quantile as above. In cases where H_0 is violated, as n tends to infinity, n h^{d/2} I_h grows faster than n_s b^{d/2} I*_b, and therefore T* will become larger than T̃*, resulting in a rejection of H_0. General results on asymptotic theory can be found in Politis, Romano, and Wolf (2001). To our knowledge, the choice of the optimal subsample size n_s for non- and semiparametric specification testing has only been treated by Delgado, Rodriguez-Poó, and Wolf (2001), and Neumeyer and Sperlich (2006).
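The subsampling scheme just described can be coded generically; `stat_fn` below is a placeholder for whatever computes the (standardized, maximized) statistic from a sample and a bandwidth, and all names are hypothetical:

```python
import numpy as np

def subsample_critical_value(data, n_s, h, r, stat_fn, alpha=0.05, B=250, seed=None):
    """Draw B subsamples of size n_s < n without replacement, rescale the
    bandwidth to b = h * (n / n_s)**r, and return the empirical
    (1 - alpha) quantile of the subsample statistics."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    b = h * (n / n_s) ** r          # bandwidth rescaled to the subsample size
    stats = [stat_fn(data[rng.choice(n, size=n_s, replace=False)], b)
             for _ in range(B)]
    return float(np.quantile(stats, 1.0 - alpha))
```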
One may also estimate B̂ and V̂, and thereby implicitly σ²(x), by resampling: with Y*_i drawn from the estimated null distribution, calculate T̃* like T* before. Repeating this many times, we can compute the empirical 1 − α quantile of T̃*, t̂_α, and use it as the critical value for T*.
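A parametric-bootstrap version of this critical value, drawing Y*_i from the normal censored at zero with fitted first moment and estimated variance, can be sketched as follows (`stat_fn` again stands in for the full test-statistic computation; all names are illustrative):

```python
import numpy as np

def bootstrap_critical_value(X, index_hat, sigma_hat, stat_fn, alpha=0.05, B=250, seed=None):
    """Generate B bootstrap samples Y*_i = max(0, index_hat_i + sigma_hat * N(0,1)),
    recompute the test statistic on each, and return the empirical
    (1 - alpha) quantile as the critical value."""
    rng = np.random.default_rng(seed)
    index_hat = np.asarray(index_hat, dtype=float)
    n = index_hat.size
    stats = []
    for _ in range(B):
        # draw from the estimated null: normal censored at zero
        y_star = np.maximum(0.0, index_hat + sigma_hat * rng.standard_normal(n))
        stats.append(stat_fn(X, y_star))
    return float(np.quantile(stats, 1.0 - alpha))
```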

Asymptotic Behavior
For ease of notation, but without loss of generality, we set p = 2 for the rest of the paper. Recall that level and power of our test depend on the theoretical properties of η̂_1 and η̂_2, which enter the test through m̂_S(x). Therefore we first analyze the properties of this regression function estimator, and afterwards the test itself. We use the following assumption (B.1) on Y. The first result is a crucial tool when studying our test statistic. Note that by considering a separable model we avoid the curse of dimensionality under the null.
Lemma 1. Under conditions (C.1) and (B.1), if the null hypothesis H_0 is true, for n tending to infinity, where b_n is an increasing sequence for which n b_n^{−s} converges.
To study the properties of statistic T * we need the following additional conditions: (S.2) Densities p and p j of X j , j = 1, 2, are β-times continuously differentiable on X j .
(S.3) The function α^{2+µ}(x) is bounded above and below, for µ = 0 and for some µ > 0, everywhere on the support of X.
(K.1) K is a compactly supported and continuously differentiable kernel of order β satisfying K(u)du = 1.
These are standard conditions on kernel and smoothness of densities and conditional expectations. The next set of conditions defines the set of bandwidths for the test statistic and relates it to the smoothing applied when estimating the null model m S .
(H.1) The set H n of bandwidths has the structure (10) for some finite constant C H > 0, and r = min{r 1 , r 2 }. Furthermore, J n ∼ log n as n tends to infinity.
Assumption (H.1) is similar to Horowitz and Spokoiny (2001), with one important difference: having a non- or semiparametric null hypothesis aggravates the bias problem. This problem has often been solved by smoothing the (parametric) estimator under the null hypothesis, too. In our case, we smooth the function estimate under the null, which already exhibits nonparametric rates of convergence; recall Section 4. To control for this, the upper bound of the set H_n depends on the rate of convergence of m̂_S(x). In the case r = 1/2, i.e. when the null hypothesis is fully parametric, the upper bound coincides with that assumed by Horowitz and Spokoiny (2001). If, however, as in our case, the estimator under the null exhibits a nonparametric rate, then h_max must tend to zero at a rate that adapts to the convergence rate of the estimator. As usual, conditions (S.1), (S.2), (K.1) take care of the reduction of the higher-order bias terms. The asymptotic behavior of the test under H_0 is then given by the following theorem. Theorem 1. Assume that the conditions stated above hold. If the null hypothesis H_0 is true, then for n tending to infinity, P(T* > t̂_α) → α. In order to determine the power, we define a sequence of local alternatives.
where the function ψ(·), not depending on n, describes the deviation from the conditional density under the null (respectively from the mean function), and the sequence γ_n is supposed to satisfy conditions (A.1) and (A.2). Clearly, assumptions (A.1) and (A.2) are on the one hand related to the bias problem mentioned above; on the other hand they also guarantee the detectability of the alternative. Theorem 2. If the alternative H_a is true with (A.1) and (A.2), then we have for n → ∞ that P(T* > t̂_α) → 1. Theorem 2 says that our test has nontrivial power only against sequences of local alternatives for which γ_n tends to zero more slowly than n^{−1/2}. It is known that tests based on weighted parametric residuals have nontrivial power against local alternatives approaching the null at rate exactly n^{−1/2}; thus, at least in terms of asymptotic local power, those tests appear to dominate tests that require slower rates. However, as discussed in Guerre and Lavergne (2002) or Horowitz and Spokoiny (2001), at the exact rate n^{−1/2} no omnibus test can have nontrivial power uniformly over non-artificial classes of functions ψ(·), respectively ϑ(x) in (11).

Finite Sample Performance and an Alternative Bandwidth Choice
To illustrate the performance of our testing procedure, we present simulation results for different models with censored responses: Y_i = max(0, Y*_i) with Y*_i = X'_{1i}γ + η(X_{2i}) + u_i, where γ, η(·), and the error variance σ² are arbitrary. Then we have m(x) = Φ(v(x)/σ) v(x) + σ φ(v(x)/σ) with index v(x) = x'_1 γ + η(x_2), where φ and Φ denote the standard normal density and its cumulative distribution function.
The different data generating processes (DGPs) are given in (15) to (20), all with an additional N(0, 0.5²) error term u. Further, we generate data as in (15) but with the error u replaced by the alternative errors u* and u**, keeping the error variance at 0.5². Finally, we also consider the model variants (19) and (20). Note that among (15) to (20), models (15) and (19) are the only ones that belong to the null hypothesis H_0; compare (13). For the other models, the additional terms (deviating from the null model) are centered to zero such that they do not affect the censoring threshold in (12). We also simulated models without such centering (not shown here).
In those cases the power of the test is always overwhelming. We concentrate here on presenting the cases where the detection of an alternative is a hard problem. Notice that the variation due to the deviation from H_0 (i.e. E[κ_0(X) − κ_j(X)]², j = 1, 2) is 28/9 in (16), whereas it is only 16/45 in (17). Recall that we only worry about the effect of a misspecification on the regression estimation.
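A miniature version of such a censored-response DGP can be sketched as follows; the particular η below is a made-up stand-in for the paper's (15)-(20), while the uniform covariates, the N(0, 0.5²) error, and the censoring at zero follow the description above:

```python
import numpy as np

def simulate_censored_dgp(n, gamma=1.0, sigma=0.5, seed=None):
    """Uniform covariates, linear index plus a smooth nonparametric term,
    N(0, sigma^2) errors, and censoring at zero."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(-1.0, 1.0, n)
    x2 = rng.uniform(-1.0, 1.0, n)
    eta = np.sin(np.pi * x2)                       # hypothetical smooth component
    y_latent = gamma * x1 + eta + rng.normal(0.0, sigma, n)
    y = np.maximum(0.0, y_latent)                  # censoring at zero
    return np.column_stack([x1, x2]), y
```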
For the simulations we use only 250 bootstrap samples (respectively subsamples); in empirical research one should certainly take more. Having consistent (under H_0) estimates of κ_0 and σ² (even root-n consistent for the latter), the bootstrap sample can be generated directly. This is actually a bootstrap from a known parametric distribution, i.e. the normal censored at zero, with semiparametric first and parametric second moment. The estimation of the model under the null hypothesis is done according to Rodríguez-Póo, Sperlich and Vieu (2003). The percentages of rejections under the null are calculated from 500 simulation runs, whereas only 250 are used to approximate the rejection levels under H_1. As we use uniform distributions for all covariates in this simulation, we set ω(x) = 1 for the entire simulation study.
Another obvious choice is either a trimming weight or the density of X, such that the integral can be replaced by the sample average.
As discussed earlier in this paper, there exist some alternative approaches to make a nonparametric test adaptive in the sense of choosing the test bandwidth h adaptively, among them the approach of Guerre and Lavergne (2005), where Var_0 refers to the variance under H_0. In our context h_max is the largest bandwidth in H_n, cf. Guerre and Lavergne (2005). This is what we have done in our simulation study. In our context this approach gives basically the same results for all simulated models and rejection levels. We conclude therefore that both approaches seem to be reasonable alternatives for the choice of the testing bandwidth h in (5), albeit Guerre and Lavergne (2005) showed some theoretical advantages of their approach. In practice this might vary with the testing problem, model, and implementation.
The bandwidths h_j from (C.1) used to estimate the nonparametric part η(·) of the null model (13) can be chosen by jackknife or generalized cross-validation, depending on the particular estimation problem. This gives bandwidths that produce slightly undersmoothed estimates. Trying out various bandwidths for each h_j over a reasonable range, we always obtained rather similar results. This might be due to the adaptive selection of h, determined by a naive grid search with J_n = 15 and a = 1.05, cf. (10). Let us call h*_j the bandwidths used to (pre-)estimate the null model from which the bootstrap samples will be drawn. The h*_j should be chosen depending on h_j so as to fulfill assumption (C.3), but in practice they are often simply set equal to h_j.
Tables 1 and 2 give the rejection levels and bootstrap p-values for models (15) to (18) for sample size n = 200, applying the bootstrap with h_j = 1.75 σ_x / n^{1/5}, and h*_j = h_j respectively h*_j = h_j n^{1/5−1/6}. Here σ_x denotes the vector of sample standard deviations of X_2. We used second-order quartic kernels and set h_max = 3 σ_x / n^{1/7} in (10). Power is different for model (17): one reason could be that model (17) is smoother and thus the alternative gets estimated more easily; another reason could be that the bootstrap samples were generated with the residuals from H_0. When the functional form is specified correctly and only the error distribution deviates from H_0, the test does not reject as long as this misspecification has no effect on the regression estimation. In Table 3 we see that the error u**_i in (18) has no effect on the estimation of the parametric part. Nevertheless, we find nontrivial power for the case of the asymmetric error distribution u*, even though it affects the regression estimation only very mildly. Actually, when we increase n to 400, at the nominal 5% significance level we reject in about 7.5% of all cases, and for n = 500 in about 10%. Further results were obtained with h*_j = h_j n^{1/9−1/13} and a 4th-order optimal product kernel. Note that we tried many other bandwidth combinations; however, for moderate sample sizes we cannot get rid of the size problem for any reasonable bandwidth. A detailed discussion of the calibration problem in nonparametric testing is given in Sperlich (2013). The power of the test now seems rather strong, but clearly this may be misleading due to the size problem. Subsampling exists as an alternative resampling method; at least in our simulations it has turned out to be rather reliable concerning the size of the test, even for small samples, though certainly with some loss of power. As the problem of finding the optimal subsample size in nonparametric testing is well known (see e.g. Neumeyer and Sperlich, 2006), we concentrate here on size and power of the test and take the subsample size as given.

Table 4 gives the numbers of rejections for n = 300, subsample size n_s = 250, bandwidths h_1 = h_2 = 2 σ_x / n^{1/9} with 4th-order optimal product kernels, and h_max = 3 · sd(X) · n^{−1/8} in (10). Again we tried several bandwidths, and again the results do not vary much as long as h_1, h_2 provide a reasonable amount of smoothing, i.e. do not strongly over- or undersmooth. Taking a cross-validation bandwidth is again a good choice. Table 4: Subsampling p-values and percentages of rejections at various significance levels for the different models, with sample size n = 300 and subsample size n_s = 250.

Proof of Lemma 1
For the proof of Lemma 1 we first need to establish the following proposition. Proposition 1. Assume that condition (C.1) holds. Under H_0 we have = O_p(n^{−r_1}(log n)^{r_1} + n^{−r_2}(log n)^{r_2}) + O_p(n^{−1/2}) as n tends to infinity.
Proof of Proposition 1. Since by assumption ( This is uniform because X 1 , X 2 and Θ are compact sets. So condition (C.1) applies.
Proof of Lemma 1. The result is a direct consequence of Proposition 1 since m(x) = ∫ y f_{Y|X}(y, x_1, x_2) dy.
Thus we can write (22). Furthermore, under the null hypothesis we have (23). Thus, the left-hand side of (22) is equal to (24). Note that (24) is bounded by ∫ y dy times the supremum distance of the estimated to the true quantities. Noting that since P(|y_n| > b_n) ≤ b_n^{−s} E|y|^s, it follows with probability one that |y_n| ≤ b_n for all sufficiently large n, and hence, since b_n is increasing, the bound applies. Now, by a standard argument, (25) follows.

Proof of Theorem 1
Before proving Theorem 1 we first need to state a few technical results; this is done in Lemmas 2-7 below. In the following, as above, a double hat refers to estimators from the resamples, e.g. for η_1.
Then, for some µ > 0, under the null hypothesis, H 0 , as n tends to infinity. .

Applying (S.2) and a strong law of large numbers we obtain
and by Lemma 1, where we recall that r = min{r_1, r_2}. Furthermore, since h ∈ H_n, the claim follows by condition (H.1), which closes the proof. Now, for the following we define ε_j = Y_j − m(X_j).
Then, for some µ > 0 under the null hypothesis H 0 one has as n tends to infinity.
Proof of Lemma 3. We have Integrating by substitution and using a standard inequality for expectations gives Using Lemma 1 and assumptions (S.3) and (H.1) and recalling that h ∈ H n we obtain .
This closes the proof.
Then, under the null hypothesis, H 0 , as n tends to infinity.
Because Card(I_{m,s}) = O(n^s), we have this directly. Let us now come back to the term E[W_n^{2+µ}], using the last equality for any m.
Then, because of Lemma 1, given that h ∈ H_n and assumption (H.1), we have that the conditions hold. Then, for all z, we have under H_0 and as n tends to infinity: Proof of Lemma 5. By assumptions (S.2) and (S.3) and using definitions (5) and (26), a Taylor expansion around V gives, as n tends to infinity, the expansion below. Then, using the triangle inequality, we obtain a decomposition whose terms we treat separately. For the first term on the right-hand side in (28) the following inequality holds for some µ > 0. Let us now decompose I_h(η̂_1, η̂_2, θ̂) and I⁰_h(η_1, η_2, θ) as stated, where U_n, V_n, and W_n are defined in the previous lemmas. Then the following inequality holds for some constants C_1, C_2, C_3 > 0. Now, applying Lemmas 2, 3 and 4, and assumption (H.1), the first term on the r.h.s. in (28) tends to zero as n tends to infinity.
For the second term the following bound holds; substituting B̂ and B by their definitions in (7), we obtain:

Now using assumption (C.2) and (H.1) we get
This closes the proof. Lemma 6. Assume the conditions of Theorem 1 hold. Then, as n tends to infinity, max_{h∈H_n} T_h0 and max_{h∈H_n} T̃_h0 have identical asymptotic distributions, where Ỹ_j = m(X_j) + σ(X_j) ε*_j with ε*_j being resampled errors (having mean zero and variance one).
We first show (31). To prove (31), it suffices to show that the corresponding term vanishes for n → ∞. Note that, taking iterated expectations, using the i.i.d. structure of our observations, and integrating by substitution, it is straightforward to show that (32) is of order O(n^{−2} Σ_{h∈H_n} h^{−2d}). Now, using (10) and because h ∈ H_n, this is indeed of the required order. Recall that m̃_S(x) and σ̃²(x) denote resampling estimators.
Proof of Lemma 7. Using (29) and (33), under conditions (S.2) and (S.3) a Taylor expansion around V gives the stated expansion as n tends to infinity. Now, by the triangle inequality, we treat each of the resulting terms separately. By a standard inequality, for some µ > 0, the following bound holds for some constants C_1, C_2, C_3, C_4, C_5 > 0.
Furthermore, integrating by substitution, there exists a constant C > 0 such that the stated bound holds. Applying (C.2) and the bound achieved in (38), we get the claim. Following the same lines as above, it is straightforward to show that the second term is also of order o(1/(n^{2+µ} h^{d(2+µ)/2} log n)) as n tends to infinity.
Proof of Theorem 1. The proof is complete if we can show that T̃* and T* have identical asymptotic distributions. This follows directly from the previous lemmas: by Lemma 5, max_{h∈H_n} T_h = max_{h∈H_n} T_h0 + o_p(1), while Lemma 6 ensures that max_{h∈H_n} T_h0 and max_{h∈H_n} T̃_h0 have identical asymptotic distributions. So, because Lemma 7 ensures that max_{h∈H_n} T̃_h0 = max_{h∈H_n} T̃_h + o_p(1), we get that max_{h∈H_n} T̃_h and max_{h∈H_n} T_h have the same asymptotic distribution.

Proof of Theorem 2
The proof follows the lines of the proof of Theorem 1, but replaces Lemmas 2-7 by similar results stated under the alternative hypothesis. We start the proof by stating the following lemmas, keeping the notation used above.
Lemma 8. Under the assumptions of Theorem 2 we have, for any h ∈ H_n and as n tends to infinity: U_n = C γ_n² + o_p(n^{−1} h^{−d/2}).
Proof of Lemma 8. The proof follows the lines of that of Lemma 2, but now assuming H_1 and, consequently, that expression (39) holds. Then, applying (S.2) and a strong law of large numbers, we obtain U_1n = C γ_n² + o_p(1).
Lemma 9. Under the assumptions of Theorem 2 we have, for any h ∈ H_n and n tending to infinity: V_n = o_p(U_n). Proof of Lemma 9. Integrating by substitution, and using assumptions (C.1) and (H.1) while recalling that h ∈ H_n, gives the bound for the mean. For the variance expression we obtain, again integrating by substitution and using conditions (C.1) and (H.1) with h ∈ H_n, |Var(V_2n)| = o_p(n^{−2} h^{−d}).
Apply conditions (A.1) and (H.1) and for all h ∈ H n the proof is done.
Lemma 10. Under the assumptions of Theorem 2 we have for any h ∈ H n and as n tends to infinity: W n = o p (U n ).
Proof of Lemma 10. The proof is based on that of Lemma 4. The bias term is E(W_n) = 0. For the variance term, proceed as in the proof of Lemma 7 but under H_1, therefore using equality (39). We obtain E(W_n²) ≤ C γ_n²/n. This closes the proof of the lemma.
Lemma 11. Under the assumptions required either for Theorem 1 or Theorem 2, we have |t α | ≤ M < ∞.
Proof of Lemma 11. Because of Lemmas 7 and 8, under the conditions of either Theorem 1 or 2 we have max_h T̃_h = max_h T_h0 + o_p(1). So it suffices to show (48) to complete the proof. Note that (48) is obtained as long as we have the corresponding moment bound, which is easy to obtain by working out the terms.
Lemma 12. Under the assumptions of Theorem 2, and n → ∞, we have for any z P max h∈Hn |T h − T h0 | > z → 1.
Proof of Lemma 12. Using the same kind of decomposition as in Lemma 5 the claimed result follows directly from Lemmas 9, 8, 10 and 11.
Proof of Theorem 2. Note that for the proof of Lemma 6 we did not use the null hypothesis. So it suffices to combine this with Lemma 12 to close the proof of Theorem 2.