Efficient nonparametric three-stage estimation of fixed effects varying coefficient panel data models

This paper is concerned with the estimation of a fixed effects panel data model that adopts a partially linear form, in which the coefficients of some variables are restricted to be constant but the coefficients of other variables are assumed to be varying, depending on some exogenous continuous variables. Moreover, we allow for the existence of endogeneity in the structural equation. Conditional moment restrictions on first differences are imposed to identify the structural equation. Based on these restrictions we propose a three stage estimation procedure. The asymptotic properties of these proposed estimators are established. Moreover, as a result of the first differences transformation, to estimate the unknown varying coefficient functions, two alternative backfitting estimators are obtained. As a novelty, we propose a minimum distance estimator that, combining both estimators, is more efficient and achieves the optimal rate of convergence. The feasibility and possible gains of this new procedure are shown by estimating a Life-cycle hypothesis panel data model and a Monte Carlo study is implemented.


Introduction
Two of the most important issues that econometricians must face when modeling individual choice in demand systems or market equilibrium are the presence of endogenous variables and individual heterogeneity (see Heckman (2008)). Traditionally, instrumental variable (IV) models have been proposed as the solution to account for endogeneity, whereas heterogeneity has been handled through the use of panel data techniques (see, among others, Arellano (2003)). Trying to cope with both issues at the same time, instrumental variable models exhibit a long tradition in the panel data analysis literature (see, for example, Hsiao (2003), Chapter 5). In many situations, economic theory does not imply tight functional form specifications for instrumental variable models, so it is useful to consider nonparametric and semiparametric extensions. Unfortunately, the introduction of these flexible specifications has a cost in terms of the curse of dimensionality (see Härdle (1990) for details). One solution to this problem is the class of so-called varying coefficient models, which exhibit a clear motivation from economic theory (see Chamberlain (1992)), encompass many alternative models (i.e., fully nonparametric models and partially linear models) and avoid the so-called ill-posed inverse problem in general nonparametric instrumental variable models (see Newey and Powell (2003)).

This paper is then concerned with the estimation of a fixed effects panel data model that adopts a partially linear form, in which the coefficients of some variables are restricted to be constant but the coefficients of other variables are assumed to be varying, depending on some exogenous continuous variables. Moreover, we allow for the existence of endogeneity in the structural equation. This structure lends itself naturally to a semiparametric three stage estimation procedure that is based on a transformed (first order differenced) structural model. In the first stage, endogenous variables are projected on a set of instrumental variables; in the second stage, constant coefficients are estimated through a profile least squares approach; and finally, in the third stage, nonparametric techniques are used to estimate the varying coefficients. Unfortunately, the estimators obtained in this last stage achieve a rather slow rate of convergence. In order to improve this rate, and following the ideas in Fan and Zhang (1999), a one-step backfitting procedure is developed. It turns out that the resulting estimator is oracle efficient and exhibits an optimal rate of convergence. However, as a result of the first differences transformation, two alternative backfitting estimators for the same unknown function of the varying parameters are obtained. With the aim of improving efficiency, we combine both estimators through a minimum distance estimation technique, and the resulting estimator is more efficient. As far as we know, this approach is completely new and the minimum distance estimation technique has never been applied to this problem before in the literature.
To avoid the ill-posed inverse problem (see Newey and Powell (2003) for details) but, at the same time, to keep some flexibility in the model specification, conditional moment restrictions on first differences are imposed to identify the structural equation (see Ai and Chen (2003), Hall and Horowitz (2005) and Newey (2013), among others, for a similar approach). Other procedures, such as the so-called control function approach proposed in Heckman and Robb (1985), Blundell et al. (2013), Darolles et al. (2004), Gao and Phillips (2013), and Su and Ullah (2008), among others, are available at the price of assuming other types of identification assumptions. To the best of our knowledge, Cai et al. (2006), Cai and Xiong (2012) and Cai et al. (2017) are the most relevant references for varying coefficient models with endogenous covariates, but they completely ignore the panel data case. For their part, several papers (see Rodriguez-Poo and Soberon (2017) for a survey) analyze panel data varying coefficient models, but the resulting estimators are not robust to the presence of endogeneity. Recently, Fève and Florens (2014) consider the estimation of nonparametric panel data models using an instrumental variable condition. However, their results do not apply straightforwardly to the varying coefficient model. Finally, some IV methods have been proposed in the context of varying coefficient panel data models with random effects. In Cai and Li (2008) it is proposed to estimate the unknown functions of interest by using the so-called nonparametric generalized method of moments. However, this method does not control for heterogeneity when it is correlated with some explanatory variables, and hence it leads to asymptotically biased estimators when fixed effects are present.
Since the semiparametric partially linear varying coefficient model encompasses several alternative specifications that can be of great interest for econometricians (e.g., the partially linear model and the fully linear parametric model), based on Cai et al. (2017) and references therein, we propose a Wald-type test statistic. Furthermore, we provide a technique to compute confidence bands for the varying coefficients. To show the feasibility and possible gains of this new procedure, it is applied to extend a Life Cycle Hypothesis (LCH) model such as the one proposed in Chou et al. (2004) to a panel data model.
The structure of this paper is as follows. In Section 2 we set up the econometric model and describe the three-step estimation procedure. In Section 3 we show the asymptotic properties of the resulting estimators. In Section 4 more efficient estimators, namely the one-step backfitting and minimum distance estimators, are provided. Section 5 develops a Wald-type test for the constant coefficients and pointwise confidence bands for the functional coefficients. In Section 6 a Monte Carlo study is presented to investigate the finite sample performance of the proposed estimators and test statistic. Section 7 applies our methods to the estimation of an LCH model. Finally, Section 8 concludes the paper. Assumptions and proofs of the main results are relegated to the Supplementary Material.

Model and estimation procedures
A partially varying coefficient panel data model assumes the following form:

$$Y_{it} = X_{1it}^{\top} m_1(Z_{it}) + X_{2it}^{\top} m_2(Z_{it}) + U_{1it}^{\top}\beta_1 + U_{2it}^{\top}\beta_2 + \mu_i + \epsilon_{it}, \quad i = 1, \ldots, N;\ t = 1, \ldots, T, \qquad (2.1)$$

where $Y_{it}$ is an observed scalar random variable, $X_{1it}$ and $U_{1it}$ are $(d_1 \times 1)$ and $(k_1 \times 1)$ vectors of endogenous random variables, respectively, $X_{2it}$ and $U_{2it}$ are vectors of exogenous random variables of dimension $(d_2 \times 1)$ and $(k_2 \times 1)$, respectively, $\epsilon_{it}$ is the random error, and $\mu_i$ denotes the unobserved individual heterogeneity. Also, the structural equation (2.1) includes some unknown functions (i.e., $m_1(\cdot)$ and $m_2(\cdot)$) of a $(q \times 1)$ vector of exogenous continuous random variables, $Z_{it}$, and some constant coefficients (i.e., $\beta_1$ and $\beta_2$) that need to be estimated. Furthermore, let us denote by $L_{it}$ an $(\ell \times 1)$ vector which contains all exogenous variables (i.e., $Z$, $X_2$, and $U_2$) and an $M$-dimensional vector of other instrumental variables, where $\ell = q + d_2 + k_2 + M$ and $\ell \geq d_1 + k_1$, which is the identification condition that the number of instruments is at least as large as the number of endogenous variables. A similar definition is given for $L_{i(t-1)}$. We assume a conditional moment restriction, condition (2.2), on the first-differenced error given $L_{it}$ and $L_{i(t-1)}$.
The above model is general enough to include relevant empirical examples in the economics literature. For example, based on the Life-cycle hypothesis (LCH) theory, Gourinchas and Parker (2002) and Kuan and Chen (2013) show that the elasticity of preventive savings to changes in net wealth and/or medical expenses varies according to certain household features such as the age of the household head.
Statistica Sinica: Newly accepted Paper (accepted author-version subject to English editing)
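Further, the model above allows for two different sources of endogeneity. First, there exists a subset of endogenous explanatory variables (i.e., $X_1$ and $U_1$). Second, the heterogeneity term, $\mu_i$, can also be arbitrarily correlated with $Z$, $X$ and/or $U$ (i.e., fixed effects). It is well known that, if we ignore both sources of endogeneity, direct estimation of the functions of interest leads to asymptotically biased estimators. The second source of endogeneity can be handled by taking first differences,

$$\Delta Y_{it} = X_{it}^{\top} m(Z_{it}) - X_{i(t-1)}^{\top} m(Z_{i(t-1)}) + \Delta U_{it}^{\top}\beta + \Delta\epsilon_{it}, \qquad (2.3)$$

where $X_{it} = (X_{1it}^{\top}, X_{2it}^{\top})^{\top}$, $U_{it} = (U_{1it}^{\top}, U_{2it}^{\top})^{\top}$, $m(\cdot) = (m_1(\cdot)^{\top}, m_2(\cdot)^{\top})^{\top}$, $\beta = (\beta_1^{\top}, \beta_2^{\top})^{\top}$, and $\Delta$ denotes the first difference operator, which removes $\mu_i$.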
For any given $\beta$, and $\Delta Y^{*}_{it} = \Delta Y_{it} - \Delta U_{it}^{\top}\beta$, Rodriguez-Poo and Soberon (2014) propose to estimate the quantities of interest, $m(\cdot)$, at a given point $z \in A$, where $A$ is a compact subset of $\mathbb{R}^q$ with nonempty interior, by minimizing a kernel-weighted criterion function, (2.4), with respect to $\gamma$, where $\gamma = m(z)$. In that criterion, $H$ is a $q \times q$ symmetric positive definite bandwidth matrix and $K$ is a $q$-variate kernel. Note that the kernel weights in (2.4) are related to both $Z_{it}$ and $Z_{i(t-1)}$. This is a significant issue because it enables us to overcome the non-negligible asymptotic bias characteristic of differencing nonparametric estimators.
If we consider kernels only around $Z_{it}$, the remainder term in the Taylor approximation will not be negligible, since the distance between $Z_{is}$ ($s \neq t$) and $z$ does not vanish asymptotically. This phenomenon was already pointed out in Mundra (2005) and Lee and Mukherjee (2014), but it was solved in Rodriguez-Poo and Soberon (2014, 2015) for a local linear regression.
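The role of the double kernel weighting can be illustrated with a minimal sketch, assuming a scalar smoothing variable (q = 1), a Gaussian kernel, exogenous regressors and a local constant fit. The function names `gaussian_kernel` and `diff_local_constant` and their arguments are hypothetical and are not taken from the paper; the sketch only shows that each differenced observation is weighted by the product of a kernel in $Z_{it}$ and a kernel in $Z_{i(t-1)}$.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, applied elementwise."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def diff_local_constant(z, Z, X, Y_star, h):
    """Local constant estimate of m(z) from first-differenced data.

    Z, Y_star : arrays of shape (N, T); X : array of shape (N, T, d).
    Y_star is the response net of the linear part (assumed given here).
    """
    dX = X[:, 1:, :] - X[:, :-1, :]        # first differences of regressors
    dY = Y_star[:, 1:] - Y_star[:, :-1]    # first differences of response
    # Product of kernels centred at z for both Z_it and Z_i(t-1).
    w = gaussian_kernel((Z[:, 1:] - z) / h) * gaussian_kernel((Z[:, :-1] - z) / h)
    dX2 = dX.reshape(-1, dX.shape[-1])
    w2 = w.reshape(-1)
    dY2 = dY.reshape(-1)
    # Kernel-weighted least squares of dY on dX.
    A = dX2.T @ (w2[:, None] * dX2)
    b = dX2.T @ (w2 * dY2)
    return np.linalg.solve(A, b)
```

When the true coefficient functions are constant and there is no noise, the differenced regression is exact and the sketch returns those constants whatever the fixed effects are, which is a convenient sanity check.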
Unfortunately, although the resulting estimator for $\gamma$ using the latter proposal is robust to fixed effects, it is still subject to the first endogeneity problem (i.e., the endogeneity of $X_1$ and $U_1$). Taking expectations on both sides of the structural equation (2.3), conditioning on both $L_{it}$ and $L_{i(t-1)}$, and using condition (2.2), one obtains equation (2.5), in which the endogenous variables enter through their conditional expectations given $L_{it}$ and $L_{i(t-1)}$, such as $E(\Delta X_{1it} \mid L_{it}, L_{i(t-1)})$. Then, taking into account (2.5) and proceeding as above, the coefficient functions $m(\cdot)$ can be estimated by minimizing a criterion function with respect to $\gamma$, where $\gamma = m(z)$, in which the regressors are replaced by the vectors $\Delta W_{Xit}$ and $\Delta W_{Uit}$ that contain those conditional expectations. Assuming that the matrix to be inverted is positive definite, this yields an estimator denoted by $\hat m_{\beta}(z; H_2)$.
Unfortunately, $\hat m_{\beta}(z; H_2)$ is an infeasible estimator because the vector of parameters, $\beta$, and the nonparametric functions that define $\Delta W_{Xit}$ and $\Delta W_{Uit}$ are unknown and need to be estimated. The first stage in our estimation procedure is the estimation of these nonparametric functions. Their estimators are computed with bandwidth $H_1$; they can be, for example, local linear or local constant estimators.
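The first stage can be sketched as a local constant (Nadaraya-Watson) projection of a differenced endogenous variable on the conditioning variables. This is a minimal sketch under assumptions that are not in the paper: a product Gaussian kernel, a single scalar bandwidth `h` shared by all conditioning variables, and the hypothetical function name `nw_project`. The rows of `L` are meant to stack $L_{it}$ and $L_{i(t-1)}$.

```python
import numpy as np

def nw_project(target, L, h):
    """Nadaraya-Watson fit of `target` on the conditioning variables `L`.

    target : array of shape (n,) or (n, p), e.g. a differenced endogenous variable.
    L      : array of shape (n, ell), e.g. (L_it, L_i(t-1)) stacked by row.
    h      : scalar bandwidth (assumption: the same for every column of L).
    Returns the fitted conditional means evaluated at the sample points.
    """
    target = np.asarray(target, dtype=float)
    L = np.asarray(L, dtype=float)
    diffs = (L[:, None, :] - L[None, :, :]) / h          # pairwise scaled distances
    weights = np.exp(-0.5 * np.sum(diffs**2, axis=2))    # product Gaussian kernel
    weights /= weights.sum(axis=1, keepdims=True)        # normalise each row
    return weights @ target
```

The kernel normalising constant cancels in the row normalisation, so it is omitted. The fitted values would then take the place of the endogenous differences in the second and third stages.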
The second stage is to estimate $\beta$. We propose a conventional profile least squares estimator (see Fan and Huang (2005)), $\hat\beta$, given in (2.8). In its expression, $\Delta\hat W_{Uit}$ is the vector $\Delta W_{Uit}$ where the unknown functions have been replaced by consistent estimators, and $S$ is a smoothing matrix. Likewise, $\Delta\hat W_{Xit}$ and $\hat W_{Xit}$ are the vectors $\Delta W_{Xit}$ and $W_{Xit}$, respectively, where the unknown functions have been replaced by consistent estimators.
Note that $\hat\beta$ can be considerably affected by the residuals from the first stage. To overcome this, following Cai et al. (2017), we propose a modified estimator, $\tilde\beta$, given in (2.9), which is also built on a smoothing matrix.
Finally, once the estimator for $\beta$ has been obtained, replacing the unknown quantities by their estimates gives the resulting three stage estimator for $m(\cdot)$ at any given value of $z$, defined in (2.10).
Note that the criterion function (2.7) stands for the local constant approximation to $m(\cdot)$. A straightforward extension would be to extend our results to the local linear case. In Section 1 of the Supplementary Material we provide all the expressions of the three stage estimators for this case.
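The second stage can be illustrated with the generic profile least squares form of Fan and Huang (2005): the nonparametric part is partialled out with a smoothing matrix and $\beta$ is obtained by least squares on the residuals. The sketch below is an assumption-laden illustration, not the estimator in (2.8) or (2.9): the smoothing matrix `S` is taken as given, and the function name `profile_least_squares` is hypothetical. In the setting of this paper the regressor argument would hold the first-stage fitted values rather than the raw differences.

```python
import numpy as np

def profile_least_squares(dY, dU, S):
    """Generic profile least squares estimate of the constant coefficients.

    dY : (n,) differenced response.
    dU : (n, k) differenced (or first-stage fitted) linear regressors.
    S  : (n, n) smoothing matrix of the nonparametric fit (assumed given).
    """
    n = dY.shape[0]
    M = np.eye(n) - S          # residual maker of the smoother
    U_res = M @ dU             # regressors net of the smooth part
    Y_res = M @ dY             # response net of the smooth part
    return np.linalg.solve(U_res.T @ U_res, U_res.T @ Y_res)
```

With `S` equal to the zero matrix the function reduces to ordinary least squares, and with a smoother that reproduces constants any constant shift in the response is removed before $\beta$ is estimated.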

Statistical properties
In this section, we investigate some asymptotic properties of the estimators proposed in the previous section. Under some technical assumptions provided in Section 2 of the Supplementary Material, we present their asymptotic behavior. The detailed proofs of the following results are also given in Sections 3 to 5 of the Supplementary Material. From now on, let us denote $n = N(T - 1)$.
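Theorem 3.1. Suppose that Assumptions S2.1-S2.10 hold. When $N\,\mathrm{tr}(H_2)^2 \to 0$, as $N$ tends to infinity and $T$ is fixed, the estimator of $\beta$ is $\sqrt{n}$-consistent and asymptotically normal.
In addition, the asymptotic normality of the three stage estimator $\hat m_{\hat\beta}(z; H_2)$ can be established as follows.
Theorem 3.2. Suppose that Assumptions S2.1-S2.10 hold. As $N$ tends to infinity and $T$ is fixed, $\hat m_{\hat\beta}(z; H_2)$ is asymptotically normal. In the expressions of its asymptotic bias and variance, $D_{m_\kappa}(z)$ is the first order derivative vector of the $\kappa$th component of $m(\cdot)$, $H_{m_\kappa}(z)$ its Hessian matrix, and $D_f(z)$ the first order derivative vector of the density function, for $\kappa = 1, \ldots, d$. Also, $\mathrm{diag}_d(\mathrm{tr}(H_{m_\kappa}(z)H_2))$ stands for a diagonal matrix with elements $\mathrm{tr}(H_{m_\kappa}(z)H_2)$, for $\kappa = 1, \ldots, d$.
Under the previous assumptions, the asymptotic normality of the local linear version of the three stage estimators is collected in Corollaries S1.1 and S1.2 that appear in Section 1 of the Supplementary Material. Furthermore, note that the results from Theorem 3.2 show a bias term that asymptotically depends only on the smoothness of $m(\cdot)$ and $E(\Delta X_{1it} \mid L_{it}, L_{i(t-1)})$. The dependence on $\beta$ and on the first-stage estimator does not appear because, under the assumptions established in Section 2 of the Supplementary Material, $\hat E(\Delta X_{1it} \mid L_{it}, L_{i(t-1)}; H_1)$ converges uniformly to $E(\Delta X_{1it} \mid L_{it}, L_{i(t-1)})$. Finally, the dependence on $E(\Delta X_{1it} \mid L_{it}, L_{i(t-1)})$ vanishes because of the condition $\mathrm{tr}(H_1) = o_p(\mathrm{tr}(H_2))$. As can also be remarked, Theorem 3.2 shows a variance term whose rate of convergence is slower than the proper rate of convergence for this type of estimators, which is $n|H_2|^{1/2}$ (see Härdle (1990) for details). For this reason, in the next section, and following Rodriguez-Poo and Soberon (2014) and Rodriguez-Poo and Soberon (2015), we will propose a one-step backfitting algorithm that will make the rate of convergence of our estimators optimal.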

One-step backfitting and minimum distance estimators
In this section we first propose a one-step backfitting algorithm that will enable us to achieve optimal nonparametric rates of convergence of the estimators for m(•).In addition, as it will be detailed further in this section, because of the additive structure of the regression model, the backfitting procedure generates two alternative estimators for m(•).Nevertheless, by combining both estimators through a minimum distance estimation technique it is possible to obtain a more efficient estimator for m (•).
Applying the well-known one-step backfitting procedure, we propose the following three stage estimator. Assuming that the matrix to be inverted in its expression is positive definite, the first backfitting estimator, $\hat m^{(1)}_{\tilde\beta}(z; H_3)$, is given in (4.11), where $\hat m_{\hat\beta}(\cdot; H_2)$ is the estimator defined in (2.10). Note that in this case we use $\tilde\beta$ instead of $\hat\beta$; in terms of asymptotics the results are the same. The main idea of the application of the backfitting algorithm here is to add $X_{i(t-1)}^{\top}\hat m_{\hat\beta}(Z_{i(t-1)}; H_2)$ to both sides of the first differenced structural equation in (2.3). By doing so, the structural model is transformed into a very simple expression, (4.12), whose nonparametric part only involves $X_{it}^{\top} m(Z_{it})$. Then, the unknown function $m(\cdot)$ in (4.12) can be estimated following the same steps as in (2.6)-(2.9), obtaining now (4.11).
Given the additive structure of (2.3), a second estimator for $m(\cdot)$ can be obtained. Assuming that the matrix to be inverted is positive definite, an alternative backfitting estimator for $m(\cdot)$, $\hat m^{(2)}_{\tilde\beta}(z; H_3)$, is given in (4.13), where again $\hat m_{\hat\beta}(\cdot; H_2)$ is the estimator defined in (2.10). Subtracting $X_{it}^{\top}\hat m_{\hat\beta}(Z_{it}; H_2)$ from both sides of (2.3) and proceeding as above we obtain (4.13).
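The one-step backfitting idea can be sketched as follows, assuming a scalar smoothing variable, a Gaussian kernel and, for simplicity, exogenous regressors so that no first-stage projection is needed. The pilot estimates `m_lag` play the role of the three stage estimator evaluated at $Z_{i(t-1)}$; the function name `one_step_backfit` and all argument names are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density, applied elementwise."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def one_step_backfit(z, Z, X, dY_star, m_lag, h3):
    """First one-step backfitting estimate of m(z).

    Z       : (N, T) smoothing variable; X : (N, T, d) regressors.
    dY_star : (N, T-1) differenced response net of the linear part.
    m_lag   : (N, T-1, d) pilot estimates of m at Z_{i,t-1} (assumed given).
    """
    X_cur, X_lag = X[:, 1:, :], X[:, :-1, :]
    # Add X_{i,t-1}' m_pilot(Z_{i,t-1}) to the differenced response, so that
    # only X_it' m(Z_it) remains on the right-hand side.
    y_adj = dY_star + np.einsum("ntd,ntd->nt", X_lag, m_lag)
    # A single kernel centred at z in Z_it is now sufficient.
    w = gaussian_kernel((Z[:, 1:] - z) / h3).reshape(-1)
    Xf = X_cur.reshape(-1, X.shape[-1])
    yf = y_adj.reshape(-1)
    A = Xf.T @ (w[:, None] * Xf)
    b = Xf.T @ (w * yf)
    return np.linalg.solve(A, b)
```

The second backfitting estimator would be obtained in the same way by subtracting the pilot fit at $Z_{it}$ and smoothing in $Z_{i(t-1)}$ with the lagged regressors, taking care of the sign of the remaining term.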
Therefore, this technique provides two different estimators, $\hat m^{(1)}_{\tilde\beta}(z; H_3)$ and $\hat m^{(2)}_{\tilde\beta}(z; H_3)$, for the same $m(z)$. A natural idea to combine both in an efficient way is to minimize a quadratic distance criterion between the two estimators and a common value of $m(z)$. We propose to calculate the estimators $\hat m^{(1)}_{\tilde\beta}(z; H_3)$ and $\hat m^{(2)}_{\tilde\beta}(z; H_3)$ from two subsamples of sizes $N_1$ and $N_2$ such that, as $N \to \infty$, $N_j/N \to c_j$ for $c_j > 0$, $j = 1, 2$, and $c_1 + c_2 = 1$. These subsamples need to be chosen randomly across individuals (see Politis et al. (1999) for details). For a given value of $z$, the resulting estimator of $m(z)$, $\hat m^{(mde)}_{\tilde\beta}(z; H_3)$, is a weighted combination of both estimators in which the $(d \times d)$ matrix $W^{ij}_m(z)$ is the $(i, j)$-th component of the block partitioned matrix $W^{-1}_m(z)$, for $i, j = 1, 2$. This estimator belongs to the class of the so-called minimum distance estimators. In order to select the weighting matrix, $W_m(z)$, many alternatives are available. We choose the matrix, $W^{*}_m(z)$, that minimizes the asymptotic variance-covariance matrix of $\hat m^{(mde)}_{\tilde\beta}(z; H_3)$. Indeed, in Hansen (1982) it is shown that the optimal choice is based on $V^{(1+2)}(z)$, which stands for the asymptotic variance-covariance matrix of the stacked backfitting estimators $\hat m^{(1)}_{\tilde\beta}(z; H_3)$ and $\hat m^{(2)}_{\tilde\beta}(z; H_3)$.
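The combination step can be sketched with the textbook minimum distance solution: stack the two estimates, weight the distance to a common value by the inverse of a $(2d \times 2d)$ matrix, and solve the resulting quadratic problem in closed form. The function name `minimum_distance_combine` is hypothetical, and the quadratic criterion written in the comments is an assumption based on the block structure described above rather than a transcription of the paper's expression.

```python
import numpy as np

def minimum_distance_combine(m1, m2, W):
    """Combine two estimates of the same d-vector by minimum distance.

    Minimises (m_stack - A m)' W^{-1} (m_stack - A m) over m, where
    m_stack = (m1', m2')' and A = (I_d, I_d)' stacks two identity blocks.
    W is a (2d x 2d) positive definite weighting matrix (assumed given).
    """
    d = m1.shape[0]
    W_inv = np.linalg.inv(W)
    A = np.vstack([np.eye(d), np.eye(d)])
    m_stack = np.concatenate([m1, m2])
    # A' W^{-1} A is the sum of the four (d x d) blocks of W^{-1}.
    lhs = A.T @ W_inv @ A
    rhs = A.T @ W_inv @ m_stack
    return np.linalg.solve(lhs, rhs)
```

With an identity weighting matrix the result is the simple average of the two estimates; with a block diagonal weighting matrix the more precise estimate receives the larger weight.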
We now proceed to analyze the asymptotic properties of both the backfitting and the minimum distance estimator.

Asymptotic properties
The following theorems present the limiting distribution of the backfitting estimators. Note that they achieve the optimal rate of convergence for this smoothness class.
Theorem 4.1. Suppose that Assumptions S2.1-S2.12 hold. As $N \to \infty$ and $T$ is fixed, the backfitting estimators $\hat m^{(j)}_{\tilde\beta}(z; H_3)$ are asymptotically normal for $j = 1$ and $j = 2$. The proof of this result is postponed to the Supplementary Material.
We focus now on the asymptotic properties of the minimum distance estimator, $\hat m^{(mde)}_{\tilde\beta}(z; H_3)$, getting the following result.
Theorem 4.2. Suppose that Assumptions S2.1-S2.12 hold. As $N \to \infty$ and $T$ is fixed, $\hat m^{(mde)}_{\tilde\beta}(z; H_3)$ is asymptotically normal. The proof of this result is relegated to the Supplementary Material.
Focusing on Theorem 4.2, it can be highlighted that the asymptotic bias of the minimum distance estimator is the same as in Theorem 4.1. Moreover, the asymptotic variance exhibits the optimal rate of convergence for this type of problem. Finally, note that it is easy to show that the asymptotic variance of the minimum distance estimator is no larger than that of either backfitting estimator. Therefore, this technique enables us to obtain more efficient estimators for $m(\cdot)$ and, at the same time, to achieve optimality.

Inference
The statistical model that is of our concern, see equations (2.1) and (2.2), is considerably rich and it nests many models of interest in Econometrics and Statistics. For example, it is natural to investigate whether certain variables in the parametric component are statistically significant after fitting the model. More generally, one might consider a set of linear hypotheses on $\beta$, where $Q$ is the number of hypotheses under the null. Indeed, using Theorem 3.1, this testing problem can be handled by using a Wald-type test statistic. Following (2.9) and Lemma S3.2, it is easy to obtain consistent estimators of $\Sigma$ and $\Sigma^{*}$, and an estimator for $V$ is also proposed.

This is because of Assumption S2.2 and the first difference structure of the model. The level of the test is given by the following result, which is proved in the Supplementary Material.
Corollary 5.1. Suppose that Assumptions S2.1-S2.10 hold. When $N\,\mathrm{tr}(H_2)^2 \to 0$, as $N \to \infty$ and $T$ is fixed, under the null hypothesis the Wald-type test statistic converges in distribution to $\chi^2_Q$, where $\chi^2_Q$ denotes a chi-square distribution with $Q$ degrees of freedom.
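A generic Wald statistic for linear restrictions can be sketched as below. This is only the standard textbook form, written under the assumption that the restrictions are expressed as $R\beta = r$ and that `cov_beta` is an estimate of the covariance matrix of the estimator of $\beta$ already scaled by the sample size; the paper's own statistic and variance estimator are not reproduced here. The names `wald_statistic`, `R`, `r` and the small table `CHI2_CRIT_5PCT` are hypothetical helpers.

```python
import numpy as np

# 5% critical values of the chi-square distribution for Q = 1, 2, 3.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

def wald_statistic(beta_hat, cov_beta, R, r):
    """Wald statistic for H0: R beta = r.

    beta_hat : (k,) estimate; cov_beta : (k, k) estimated covariance of beta_hat.
    R : (Q, k) restriction matrix; r : (Q,) restriction values.
    """
    diff = R @ beta_hat - r
    middle = R @ cov_beta @ R.T          # covariance of the restrictions
    return float(diff @ np.linalg.solve(middle, diff))

def reject_at_5pct(stat, Q):
    """Compare the statistic with the chi-square(Q) critical value at 5%."""
    return stat > CHI2_CRIT_5PCT[Q]
```

For instance, testing $\beta_1 = -1$ and $\beta_2 = 1$ jointly corresponds to `R = np.eye(2)` and `r = np.array([-1.0, 1.0])`, with `Q = 2`.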
From a nonparametric point of view, we may be interested in the construction of pointwise confidence intervals for $m(\cdot)$ at each given point $z$. In Section 1 of the Supplementary Material we provide confidence bands for all three stage estimators based on the local linear version.

Monte Carlo experiment
To assess the finite sample properties of the different estimators and statistical tests proposed in this paper, some Monte Carlo simulations are performed. To this end, we consider a data generating process (DGP), labelled (6.20), for $i = 1, \ldots, N$; $t = 1, \ldots, T$, where the coefficients are $m_1(Z_{it}) = (1.6 + 0.6 Z_{it})\exp(-0.4 (Z_{it} - 3)^2)$, $\beta_1 = -1$, and $\beta_2 = 1$. The smoothing variable $Z_{it}$ follows a uniform $[2, 6]$ distribution and $U_{2it}$ is exogenous, following a $N(0, 1)$ distribution, whereas $X_{1it}$ and $U_{1it}$ are endogenous variables generated from reduced form equations in which $V_{1it}$ and $V_{2it}$ are instrumental variables independently generated from a uniform $[0, 4]$ distribution. In the distribution of the noises, $\rho$ controls the correlation between the residuals in the structural equation and in the reduced form equation, and $\sigma$ controls the variation of the residuals in the reduced form equation. Further, to allow for the presence of heterogeneity in the form of fixed effects, $\mu_i$ is generated from an expression that involves an i.i.d. $N(0, 1)$ random variable $\zeta_i$.
In order to check the performance of the proposed estimators we set $\rho = 0.7$, $\sigma^2 = 1$, and conduct simulations with the number of periods $T$ equal to 4 and the number of cross-sections $N$ equal to 100, 200, and 400.
To meet the requirement that $N\,\mathrm{tr}(H_2)^2 \to 0$, we assume $H_2 = h_2 I_q$ and fix the bandwidth for estimating $\beta$ at three values, $h_2 = 1.25N^{-1/3}$, $2.5N^{-1/3}$ and $5N^{-1/3}$. Because of the need for undersmoothing in the first stage for asymptotic reasons, required by Assumption S2.8, we set the first stage bandwidth $H_1$ to be 0.8 times the second stage one, i.e., $H_1 = 0.8H_2$.
For the sake of comparison we analyze the finite sample behavior of the following estimators: $\beta_E$ is the estimator proposed in (2.8) with $\Delta U$ instead of $\Delta\hat W_U$ (i.e., when the endogeneity problem has not been solved); $\beta_{NF}$ is the $\beta$ estimator with $\Delta W_U$ instead of $\Delta\hat W_U$ (i.e., the nonfeasible estimator); $\hat\beta_F$ is the estimator proposed in (2.8); and $\tilde\beta_F$ is the estimator proposed in (2.9).
In Table 1 we report the means, standard deviations, and root mean squared errors (RMSE) of the 1000 estimated values for $\beta$ under different settings. Analyzing these results, we find that the performance of all these estimators is not sensitive to the choice of bandwidth. All estimators give a similar, asymptotically unbiased estimation for $\beta_2$ because there is no endogeneity involved in this parameter. As the sample size increases, all of them converge. However, a totally different picture emerges for $\beta_1$ (i.e., the parameter related to the endogenous variable $X_1$). As expected, $\beta_{NF}$ presents the best results, but both $\hat\beta_F$ and $\tilde\beta_F$ perform quite well when they deal with the endogeneity problem. Although both methods have similar standard deviations, $\hat\beta_F$ exhibits a large bias and $\tilde\beta_F$ exhibits the lower RMSE.
In order to verify the asymptotic results analyzed in the previous sections for the functional coefficient, we now compare the finite sample behavior of different nonparametric estimators, where $\tilde\beta$ is used as a $\sqrt{n}$-consistent estimator of $\beta$ and the bandwidth in this stage, $H_3 = h_3 I_q$, is chosen by Silverman's rule-of-thumb, i.e., $h_3 = 1.06\,\hat\sigma_Z N^{-1/5}$, where $\hat\sigma_Z$ is the sample standard deviation of $Z_{it}$. In addition, to meet the requirement that $H_1$ and $H_2$ have to be chosen undersmoothed for asymptotic reasons, we set $h_2 = 1.75\,\hat\sigma_Z N^{-1/3}$ and $h_1 = 1.25\,\hat\sigma_Z N^{-1/3}$. As a measure of accuracy, we use the RMSE of the estimated functional coefficients, computed for each replication $\varphi = 1, \ldots, R$, where $R$ is the number of replications.
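The three bandwidth rules quoted above are simple enough to write down directly. The sketch below assumes a scalar smoothing variable stored as an $N \times T$ array and pools all observations to compute the sample standard deviation; the function name `rule_of_thumb_bandwidths` is hypothetical.

```python
import numpy as np

def rule_of_thumb_bandwidths(Z):
    """Return (h1, h2, h3) for an (N, T) array of the smoothing variable.

    h3 follows Silverman's rule-of-thumb with rate N^(-1/5); h1 and h2 use
    the undersmoothing rate N^(-1/3) with constants 1.25 and 1.75.
    """
    N = Z.shape[0]                       # number of cross-sectional units
    sigma_z = Z.std(ddof=1)              # pooled sample standard deviation
    h3 = 1.06 * sigma_z * N ** (-1.0 / 5.0)
    h2 = 1.75 * sigma_z * N ** (-1.0 / 3.0)
    h1 = 1.25 * sigma_z * N ** (-1.0 / 3.0)
    return h1, h2, h3
```

Because $h_1$ and $h_2$ shrink at the faster rate $N^{-1/3}$, the first two stages are undersmoothed relative to the final stage as $N$ grows.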
For the sake of comparison, Figure 1 depicts boxplots of the 1,000 RMSE values of the functional coefficient estimators using the different types of estimators proposed previously. To show the effect of the generated regressors in constructing the feasible estimators, Figure 1(a) collects the results for the Nadaraya-Watson (NW) estimator without the adjustment for endogeneity and Figure 1(b) collects the results for the nonfeasible NW estimator with $\Delta W_X$ instead of $\Delta\hat W_X$.
Regarding the proposed test, we consider the null hypothesis $H_0: \beta_1 = -1$, $\beta_2 = 1$ against an alternative hypothesis $H_1$ under which the power is indexed by $\phi_1$. To this end, we use the Wald test proposed previously for sample size $T = 4$ and $N = 200$ and we conduct 1000 Monte Carlo simulations. Also, the bandwidths used here were $h_2 = 2.5N^{-1/3}$ and $h_1 = 0.8\,h_2$. Figure 4 plots the power curves for three significance levels. Note: The dotted line is the power curve for the 1% significance level; the dashed line and the solid line are for the 5% and 10% significance levels, respectively.
From Figure 4, it can be pointed out that when $\phi_1 = 0$, the power collapses to the test size. More precisely, the simulated sizes of the proposed test are 2%, 3% and 9%, corresponding to the significance levels 1% (dotted line), 5% (dashed line), and 10% (solid line), respectively. Therefore, the simulated sizes are close to the nominal sizes, which suggests that our test delivers a correct test size. In contrast, when $\phi_1$ deviates from 0 our test is reasonably powerful, since the power curves tend to 1 quickly.

Empirical results
With the aim of displaying the usefulness of the proposed method, in this section we analyze the impact of unexpected health expenses on household savings. Along with liquidity constraints and habits in consumer preferences, uncertainty about possible economic hardships and household risk aversion are key determinants of households' consumption/savings decisions; see Friedman (1957). In this situation, precautionary savings appear as a protection tool for individuals against either potential income downturns or unforeseen out-of-pocket medical expenses in later stages of life; see Chou et al. (2004) for a further discussion.
To do so, we propose to extend the analysis in Chou et al. (2004) and estimate the regression model (7.21), where $i$ indexes the household, $t$ the time, $Z_{it}$ is the age of the household head, $X_{1it}$ the health-care expenditures (log), $Y_{it}$ are the savings, and $X_{2it}$ the permanent income (log). In this sense, household savings are characterized by the uncertainty about both future health-care expenses, $m_1(\cdot)$, and income downturns, $m_2(\cdot)$. Note that household permanent income is not directly observable. In order to approximate this variable, we follow the proposal in Chou et al. (2004). Thus, assuming that the interest rate equals the productivity growth rate and that 65 years is the maximum age at which people work, the permanent earnings at age $\tau_0$ are calculated from an expression in which $f(\tau)$ is the estimated quadratic function of age, $Y_{it}$ the household income and $X_3$ a vector of demographic characteristics.
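Choosing the bandwidths as in the simulation study, estimation results are shown in Figure 5. The estimated curves are plotted against the age variable jointly with 95% pointwise confidence intervals calculated by adapting the wild bootstrap technique of Härdle et al. (2004) to this context. Focusing on the results in Figure 5, it can be noted that when we control for uncertainty about health-care expenditures (Panels B) younger households (26-33) exhibit a declining savings rate, followed by a constant path until the age of 40, where the hump shape appears again. In addition, when these results are combined with the delay in the wealth accumulation process of Spanish households (note that in the U.S. it begins around age 40 whereas in Spain it begins at age 45), we realize the negative impact that public health programs have on precautionary savings, confirming the results in Chou et al. (2004). Comparing the behavior of the elasticities for the different savings results, we find that consumption of durable goods reacts more to unexpected changes in income, whereas consumption of non-durable goods is more sensitive to potential health-care payouts. This holds especially for households over 45 years old.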
Finally, in order to evaluate the empirical relevance of the endogeneity problem we compare the results of our technique (grey line) against those obtained without considering endogeneity (black line); see Panels 3 of Figure 5. These results show some significant differences. When we control for uncertainty about health-care expenditures, households accumulate assets in the middle of their life, whereas when endogeneity is not taken into account there is a more or less constant path over the life cycle.
Comparing the results of Theorem 3.2 and Corollary S1.2, it can be noted, as expected, that the local linear three stage estimator behaves better, in terms of bias, than the Nadaraya-Watson version. For other advantages see Fan and Gijbels (1995). Nevertheless, in this framework with endogenous regressors it is also necessary to take into account that the local linear estimator requires the use of three different nonparametric estimators as IV, with their corresponding bandwidths, whereas the Nadaraya-Watson estimator only needs one. Therefore, a better performance of the Nadaraya-Watson estimator in finite samples is expected, as will be shown later in the Monte Carlo experiments. That is the reason why we focus on the Nadaraya-Watson estimators throughout the paper, although all the proposed results could be extended to the local polynomial case.
Figure 1: Boxplots of the 1,000 RMSE values of the functional coefficient estimators. Panel (a): Nadaraya-Watson (NW) estimator without the adjustment for endogeneity. Panel (b): nonfeasible NW estimator.

Figure 3: The boxplots of the minimum distance estimator in 1000 independent

Figure 4: The power curves for sample size T = 4 and N = 200.
Figure 5 is divided into two panels, B and C, each of which is split into three graphics. Panels B exhibit the corresponding elasticity to changes in health-care expenditures, i.e., $m_1(\cdot)$, whereas Panels C show the precautionary savings elasticity to changes in household income, i.e., $m_2(\cdot)$. In addition, Panel B-1 shows the estimated curves when durable goods are not taken into account, Panel B-2 focuses on the second definition of savings, and Panel B-3 compares the estimated curves when endogeneity is not considered. This structure is maintained for Panels C.

Figure 5: Household savings over the life-cycle

This paper is concerned with the nonparametric estimation of a structural panel data varying coefficient model, where the individual heterogeneity is allowed to be correlated with some explanatory variables. This specification is rather frequent nowadays in many standard econometric applications, such as the study of household consumption behavior or labor supply analysis. Therefore, it is of interest to have available estimators that, at the same time, keep a reasonable degree of flexibility and are robust to both endogeneity and fixed effects. Trying to satisfy these requirements, in this paper a nonparametric three-stage procedure is developed where IV techniques are used to deal with endogeneity, and differencing techniques are used to cope with fixed effects. Furthermore, to achieve efficiency, a minimum distance estimator is proposed. The feasibility and possible gains of this new procedure are shown by estimating an LCH panel data model, and simulation results support the empirical findings.

Table 1: Mean, standard deviations and RMSEs of the different estimators for $\beta_1$ and $\beta_2$.