Differencing techniques in semi-parametric panel data varying coefficient models with fixed effects: a Monte Carlo study

Several techniques have recently been proposed for estimating semi-parametric fixed effects varying coefficient panel data models. These techniques fall within the class of so-called differencing estimators. In particular, we consider first-differences and within local linear regression estimators. An analysis of their asymptotic properties shows that, while their bias terms are of the same order of magnitude, these estimators exhibit different asymptotic bounds for the variance. In both cases, the consequence is a suboptimal non-parametric rate of convergence. To solve this problem, a one-step backfitting algorithm that exploits the additive structure of the model is proposed. Under fairly general conditions, the resulting estimators achieve optimal rates of convergence and exhibit the oracle efficiency property. Since both estimators are asymptotically equivalent, it is of interest to analyze their behavior in small samples. In a fully parametric context, it is well known that, under strict exogeneity assumptions, the performance of both first-differences and within estimators depends on the stochastic structure of the idiosyncratic random errors. In the non-parametric setting, however, other factors such as dimensionality and sample size are also of great interest. In particular, we are interested in their relative average mean square error under different scenarios. The simulation results largely confirm the theoretical findings for both local linear regression and one-step backfitting estimators. However, we find that within estimators are rather sensitive to the number of time observations.


Introduction
Over the last ten years, semi-parametric panel data varying coefficient models with fixed effects have become a very useful tool for handling many statistical problems in empirical studies (see for example Card 2001; Kottaridi and Stengos 2010; Kuan and Chen 2013). If the individual effects are assumed to be uncorrelated with the explanatory variables (random effects), the smooth functions can be estimated by any standard nonparametric technique for varying coefficient models. See for example the local least squares method with kernel weights proposed in Li et al. (2002) or the other references that can be found in Su and Ullah (2011). Nevertheless, when the cross-sectional heterogeneity is correlated with some covariates, as it is in our case, the problem is more complex: the correlation between the heterogeneity term and the explanatory variables causes a non-negligible asymptotic bias, so direct estimation with the aforementioned techniques yields asymptotically biased estimators.
Recently, in order to cope with this problem, some new procedures have been developed within the framework of differencing techniques. By taking differences one can remove the heterogeneity effect and then estimate the resulting model by standard nonparametric techniques. Unfortunately, this is not as easy as it looks at first glance, since the model in differences appears, for each individual, as an additive function with the same functional form at different times. That is the main reason why some proposals to estimate this transformed model are closely related to estimation techniques initially designed for additive models. Henderson et al. (2008) develop an iterative procedure based on a maximum likelihood approach, whereas Mammen et al. (2009) propose a smooth backfitting algorithm (see Buja et al. (1989) for the original idea of backfitting). More recently, Qian and Wang (2012) develop a two-step procedure in which the whole nonparametric term is first estimated through a multivariate non-parametric estimator and the function is then obtained, at each point, via marginal integration techniques. Su and Lu (2013) estimate the unknown function as the solution of a Fredholm integral equation of the second kind. However, despite their great contributions, these techniques are not very appealing in practice because they are computationally intensive.
In view of these results, a direct strategy to estimate the unknown varying coefficients is proposed in Soberon (2013, 2015); it can be applied in the context of either first differences or the within transformation. The basic idea of both estimators is to approximate the unknown additive functions through a local linear regression technique with higher-dimensional kernel weights. Although the proposed estimation strategies solve the non-negligible asymptotic bias problem that is usual in differencing estimators, they reflect the standard dilemma of nonparametric estimation: any attempt to hold back the bias is offset by an increase in the variance term, and therefore the resulting estimators achieve suboptimal rates of convergence. To solve this problem, a one-step backfitting algorithm that exploits the additive structure of the model is proposed. Under fairly general conditions, the resulting estimators achieve optimal rates of convergence and exhibit the oracle efficiency property. This is in line with a well-known result (see Fan and Zhang 1999): additional smoothing can reduce the variance without affecting the asymptotic order of the bias.
Since both estimators are asymptotically equivalent, it is of interest to analyze their behavior in small samples under a standard panel data setting, that is, a fixed number of time observations and an increasing number of individuals. In a fully parametric context, it is well known that, under strict exogeneity assumptions, the performance of both differencing estimators depends on the stochastic structure of the idiosyncratic random errors (see Wooldridge 2002). Following these ideas, in this paper we perform a Monte Carlo simulation experiment designed to compare the performance of the first-differences and the within estimators in finite samples under fairly standard conditions. Moreover, in the non-parametric setting, other factors such as the number of covariates, the number of time periods and, more importantly, the number of individuals are also of great interest. In particular, we are interested in learning which estimator is more efficient under different scenarios for the idiosyncratic error terms.
Finally, note that differencing techniques are not always suitable for removing individual heterogeneity effects. For example, in nonseparable panel data models other estimation strategies need to be undertaken. Altonji and Matzkin (2005), Bester and Hansen (2009), and Hoderlein and White (2012) focus on the estimation of either the structural function itself or the local average derivatives. They also analyze some identification conditions.
The rest of the paper is organized as follows. In Sect. 2, we present the local linear estimation procedures for both differencing estimators and study their asymptotic properties. As already pointed out, the estimators at this stage exhibit different and, in both cases, suboptimal rates of convergence. In Sect. 3, we apply a one-step backfitting algorithm to both estimators, allowing them to achieve asymptotically optimal rates. In Sect. 4, we compare the estimators considered via a Monte Carlo simulation. Finally, we conclude in Sect. 5. All assumptions and technicalities required to show the asymptotic behavior of the estimators are relegated to the "Appendix".

Local linear estimation procedure
To illustrate the estimation procedures compared in this paper, we consider a panel data varying coefficient model in which some regression coefficients are allowed to vary with some exogenous variables, of the form
$$Y_{it} = X_{it}^{\top} m(Z_{it}) + \mu_i + v_{it}, \qquad i = 1, \ldots, N;\; t = 1, \ldots, T, \tag{2.1}$$
where $X_{it}$ and $Z_{it}$ are $d \times 1$ and $q \times 1$ vectors of covariates, respectively, $m(Z)$ is a $d \times 1$ vector of smooth functions to be estimated, $\mu_i$ is the unobserved individual heterogeneity and $v_{it}$ is the random disturbance. We assume that $\mu_i$ is correlated with $X_{it}$ and/or $Z_{it}$ with an unknown correlation structure.
As already pointed out in the previous section, direct estimation of the unknown function $m(\cdot)$ through standard nonparametric techniques yields asymptotically biased estimators. One solution to this problem is the use of so-called differencing techniques. Among the most popular transformations we consider first differences and the within transformation. Other estimation strategies, such as profile likelihood (least squares) techniques, are available (see Sun et al. 2009). However, the conditions they require for consistency of the parameters (functions) of interest are rather strong, and this is why we focus on estimators based on differencing transformations.
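The first differences transformation implies subtracting from (2.1) at time $t$ the same equation at time $t - 1$, i.e.
$$\Delta Y_{it} = X_{it}^{\top} m(Z_{it}) - X_{i(t-1)}^{\top} m(Z_{i(t-1)}) + \Delta v_{it}, \qquad i = 1, \ldots, N;\; t = 2, \ldots, T, \tag{2.2}$$
whereas the within transformation implies subtracting the within-group mean, i.e.
$$\ddot{Y}_{it} = X_{it}^{\top} m(Z_{it}) - \frac{1}{T}\sum_{s=1}^{T} X_{is}^{\top} m(Z_{is}) + \ddot{v}_{it}, \qquad i = 1, \ldots, N;\; t = 1, \ldots, T, \tag{2.3}$$
where $\Delta Y_{it} = Y_{it} - Y_{i(t-1)}$ and $\ddot{Y}_{it} = Y_{it} - T^{-1}\sum_{s=1}^{T} Y_{is}$, with $\Delta v_{it}$ and $\ddot{v}_{it}$ defined analogously.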
However, as stated previously, direct non-parametric estimation of $m(\cdot)$ in both (2.2) and (2.3) has been considered a cumbersome task (see Su and Ullah 2011). The reason is that, for each individual, the right-hand side of both specifications is a linear combination of the terms $X_{it}^{\top} m(Z_{it})$ for different time periods $t$. Therefore, to estimate the unknown function it is necessary to treat the regression function as an additive function whose elements share the same functional form.
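As a minimal numerical illustration of why both transformations remove the heterogeneity term, the following Python sketch applies them to a toy panel stored as an N × T array. It is not the authors' code: the helper names `first_difference` and `within` and the simulated data are assumptions introduced here for illustration only.

```python
import numpy as np

def first_difference(a):
    """Map an (N, T) panel to the (N, T-1) array a[:, t] - a[:, t-1]."""
    return a[:, 1:] - a[:, :-1]

def within(a):
    """Subtract from each observation its individual time average."""
    return a - a.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, T = 4, 3
mu = rng.normal(size=(N, 1))       # individual effects, constant over time
signal = rng.normal(size=(N, T))   # hypothetical stand-in for X_it * m(Z_it) + v_it
y = signal + mu

# The transformed y coincides with the transformed signal: mu has dropped out.
fd_gap = np.abs(first_difference(y) - first_difference(signal)).max()
w_gap = np.abs(within(y) - within(signal)).max()
```

Both gaps are zero up to floating-point error, whatever the correlation between `mu` and the covariates, which is the property that (2.2) and (2.3) rely on.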
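To clarify our proposal for solving the previous problem, we first focus on the univariate regression model and later extend the results to the multivariate case. Consider the first differences transformation in (2.2) with $d = q = 1$. In this case, for any $z \in A$, where $A$ is a compact subset of $\mathbb{R}$ with nonempty interior, one has the following Taylor expansion:
$$X_{it} m(Z_{it}) - X_{i(t-1)} m(Z_{i(t-1)}) \approx \sum_{\lambda=0}^{p} \beta_\lambda \left[ X_{it}(Z_{it} - z)^{\lambda} - X_{i(t-1)}(Z_{i(t-1)} - z)^{\lambda} \right], \qquad \beta_\lambda = \frac{m^{(\lambda)}(z)}{\lambda!}. \tag{2.4}$$
Similarly, for the within regression in (2.3), one has
$$X_{it} m(Z_{it}) - \frac{1}{T}\sum_{s=1}^{T} X_{is} m(Z_{is}) \approx \sum_{\lambda=0}^{p} \beta_\lambda \left[ X_{it}(Z_{it} - z)^{\lambda} - \frac{1}{T}\sum_{s=1}^{T} X_{is}(Z_{is} - z)^{\lambda} \right]. \tag{2.5}$$
Both expressions (2.4) and (2.5) suggest that we can estimate $m(z), m'(z), \ldots, m^{(p)}(z)$ by regressing the transformed dependent variable on the bracketed terms, $\lambda = 0, \ldots, p$, with different kernel weights. The quantities of interest in both cases can be estimated using locally weighted linear regression (see Fan and Gijbels 1995b), that is, with $p = 1$. For (2.4) we minimize
$$\sum_{i=1}^{N}\sum_{t=2}^{T} \left( \Delta Y_{it} - \beta_0 \Delta X_{it} - \beta_1 \left[ X_{it}(Z_{it} - z) - X_{i(t-1)}(Z_{i(t-1)} - z) \right] \right)^{2} K_h(Z_{it} - z)\, K_h(Z_{i(t-1)} - z), \tag{2.6}$$
and for (2.5) we have
$$\sum_{i=1}^{N}\sum_{t=1}^{T} \left( \ddot{Y}_{it} - \beta_0 \ddot{X}_{it} - \beta_1 \left[ X_{it}(Z_{it} - z) - \frac{1}{T}\sum_{s=1}^{T} X_{is}(Z_{is} - z) \right] \right)^{2} \prod_{s=1}^{T} K_h(Z_{is} - z), \tag{2.7}$$
where $h$ is a bandwidth and $K$ is a univariate kernel that integrates to one, with $K_h(u) = h^{-1} K(u/h)$. Let us denote by $\hat\beta^{F}_0$ and $\hat\beta^{F}_1$ the minimizers of (2.6) and by $\hat\beta^{w}_0$ and $\hat\beta^{w}_1$ the minimizers of (2.7). The above exposition suggests as estimators of $m(\cdot)$ and $m'(\cdot)$ in the first differences case $\hat m_h(z) = \hat\beta^{F}_0$ and $\hat m'_h(z) = \hat\beta^{F}_1$, respectively. For the within regression the estimators are $\hat m_h(z) = \hat\beta^{w}_0$ and $\hat m'_h(z) = \hat\beta^{w}_1$, respectively.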
Note that in (2.6) we propose a bivariate kernel that also contains $Z_{i(t-1)}$ instead of considering only $Z_{it}$. The reason is that, if we used only a kernel around $Z_{it}$, the transformed regression equation (2.2) would be localized around $Z_{it}$ without taking the other values into account. Consequently, the distance between $Z_{is}$ (for $s \neq t$) and $z$ could not be controlled by the fixed bandwidth parameter, so the transformed remainder terms would not be negligible. The consequence would be a nondegenerate bias in this type of local linear estimator, which is removed by considering a local approximation around the pair $(Z_{it}, Z_{i(t-1)})$. The same can be said for (2.7), although there the nondegenerate bias must be removed by considering a local approximation around the $T \times 1$ vector $(Z_{i1}, \ldots, Z_{iT})$. This difference in the local approximation makes a substantial difference in terms of the asymptotic variance of the two estimators. In fact, Theorems 1 and 2 show that, under similar conditions, the order of the bias in the univariate case is the same, $O(h^{2})$, but the variance is rather different for $T > 2$. For the first-differences estimator the variance is of order $O(1/(NTh^{2}))$, whereas for the within estimator it is of order $O(1/(NTh^{T}))$.
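The univariate first-differences criterion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gauss_kernel` and `fd_local_linear`, the Gaussian kernel and the toy data are all choices made here. The sketch regresses $\Delta Y_{it}$ on the two regressors implied by the first-order Taylor expansion, using the product of two kernels so that both $Z_{it}$ and $Z_{i(t-1)}$ are localized around the evaluation point.

```python
import numpy as np

def gauss_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def fd_local_linear(y, x, z, z0, h):
    """Local linear first-differences fit for d = q = 1.

    y, x, z are (N, T) arrays. Returns (beta0, beta1), the estimates
    of m(z0) and m'(z0).
    """
    dy = (y[:, 1:] - y[:, :-1]).ravel()
    x_t, x_l = x[:, 1:].ravel(), x[:, :-1].ravel()
    z_t, z_l = z[:, 1:].ravel(), z[:, :-1].ravel()
    # Regressors implied by the first-order Taylor expansion around z0
    r0 = x_t - x_l
    r1 = x_t * (z_t - z0) - x_l * (z_l - z0)
    R = np.column_stack([r0, r1])
    # Bivariate product kernel: Z_it and Z_i(t-1) must both be close to z0
    w = gauss_kernel((z_t - z0) / h) * gauss_kernel((z_l - z0) / h) / h**2
    A = R.T @ (R * w[:, None])
    b = R.T @ (w * dy)
    beta0, beta1 = np.linalg.solve(A, b)
    return beta0, beta1

# Toy check: linear coefficient function m(z) = 1 + 2z, no noise,
# fixed effects correlated with z (all values are illustrative).
rng = np.random.default_rng(1)
N, T = 200, 3
x = rng.normal(size=(N, T))
z = rng.uniform(-1.0, 1.0, size=(N, T))
mu = z.mean(axis=1, keepdims=True) + rng.normal(size=(N, 1))
y = x * (1.0 + 2.0 * z) + mu
beta0, beta1 = fd_local_linear(y, x, z, z0=0.2, h=0.5)
```

Because the toy coefficient function is exactly linear and there is no noise, the local linear fit reproduces $m(0.2) = 1.4$ and $m'(0.2) = 2$ up to rounding, even though the fixed effects are correlated with the covariate. With a nonlinear $m(\cdot)$ and noisy data the usual bias and variance trade-off in $h$ applies.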
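For $d \neq 1$ and $q \neq 1$, the estimators have the following form. Denote by $\beta^{F} = (\beta_0^{F\top}\ \beta_1^{F\top})^{\top}$ a $d(1+q)$-vector that minimizes the expression (2.4) in the multivariate case, i.e.,
$$\sum_{i=1}^{N}\sum_{t=2}^{T} \left( \Delta Y_{it} - \tilde{Z}_{it}^{\top}\beta \right)^{2} K_H(Z_{it} - z)\, K_H(Z_{i(t-1)} - z), \tag{2.8}$$
where $\tilde{Z}_{it} = \left( \Delta X_{it}^{\top},\ \left[ (Z_{it} - z) \otimes X_{it} - (Z_{i(t-1)} - z) \otimes X_{i(t-1)} \right]^{\top} \right)^{\top}$, $K_H(u) = |H|^{-1/2} K(H^{-1/2}u)$, $H$ is a $q \times q$ symmetric positive definite bandwidth matrix and $K$ is a $q$-variate kernel.
Let $D(z) = \mathrm{vec}(D_m(z))$ be a $dq \times 1$ vector, where $D_m(z) = \partial m(z)/\partial z^{\top}$ is the $d \times q$ matrix of partial derivatives of the components of $m(z)$ with respect to the elements of the $q \times 1$ vector $z$. Denote by $H_{m_r}(z) = \partial^{2} m_r(z)/\partial z\, \partial z^{\top}$ the $q \times q$ Hessian matrix of the $r$th component of $m(z)$, $r = 1, \ldots, d$. We suggest as estimators of $m(z)$ and $D(z)$, $\hat m_F(z; H) = \hat\beta^{F}_0$ and $\mathrm{vec}(\hat D^{F}_m(z; H)) = \hat\beta^{F}_1$, respectively. Assuming $\tilde{Z}^{\top} W \tilde{Z}$ is nonsingular, the matrix form of the solution of the minimization problem (2.8) can be written as
$$\hat\beta^{F} = \left( \tilde{Z}^{\top} W \tilde{Z} \right)^{-1} \tilde{Z}^{\top} W \Delta Y, \tag{2.9}$$
where $\tilde{Z}$ stacks the rows $\tilde{Z}_{it}^{\top}$, $\Delta Y$ stacks the $\Delta Y_{it}$ and $W$ is the diagonal matrix of kernel weights $K_H(Z_{it} - z) K_H(Z_{i(t-1)} - z)$. Then, the local weighted linear least-squares estimator of $m(z)$ is defined as
$$\hat m_F(z; H) = e_1^{\top} \left( \tilde{Z}^{\top} W \tilde{Z} \right)^{-1} \tilde{Z}^{\top} W \Delta Y, \tag{2.10}$$
where $e_1 = (I_d \ \vdots\ 0_{d \times dq})^{\top}$ is a $d(1+q) \times d$ selection matrix.
We focus now on the within estimator. Let $\beta^{w} = (\beta_0^{w\top}\ \beta_1^{w\top})^{\top}$ be a $d(1+q)$-vector that minimizes the expression (2.5) in the multivariate case, i.e.,
$$\sum_{i=1}^{N}\sum_{t=1}^{T} \left( \ddot{Y}_{it} - \tilde{Z}_{w,it}^{\top}\beta \right)^{2} K_H(Z_{i1} - z, \ldots, Z_{iT} - z), \tag{2.11}$$
where $\tilde{Z}_{w,it} = \left( \ddot{X}_{it}^{\top},\ \left[ (Z_{it} - z) \otimes X_{it} - T^{-1}\sum_{s=1}^{T} (Z_{is} - z) \otimes X_{is} \right]^{\top} \right)^{\top}$ and where now $K$ is the product of kernels such that $K(u_1, u_2, \ldots, u_T) = \prod_{\ell=1}^{T} K(u_\ell)$ and $u_\ell$ is the $\ell$th component of $u$. We suggest as estimators of $m(z)$ and $D_m(z)$, $\hat m_w(z; H) = \hat\beta^{w}_0$ and $\mathrm{vec}(\hat D^{w}_m(z; H)) = \hat\beta^{w}_1$, respectively. Thus, assuming $Z_w^{\top} W_w Z_w$ is nonsingular, the matrix form of the solution of the minimization problem (2.11) can be written as
$$\hat\beta^{w} = \left( Z_w^{\top} W_w Z_w \right)^{-1} Z_w^{\top} W_w \ddot{Y}, \tag{2.12}$$
where $Z_w$ stacks the rows $\tilde{Z}_{w,it}^{\top}$, $\ddot{Y}$ stacks the $\ddot{Y}_{it}$ and $W_w$ is the diagonal matrix of kernel weights in (2.11). The local weighted linear least-squares estimator of $m(z)$ for the within regression is then defined as
$$\hat m_w(z; H) = e_1^{\top} \left( Z_w^{\top} W_w Z_w \right)^{-1} Z_w^{\top} W_w \ddot{Y}. \tag{2.13}$$
Note that, for the sake of simplicity, we use the same bandwidth matrix for these two estimators. As is well known in the non-parametric literature, the optimal bandwidth matrix $H$ should be obtained using a standard procedure such as, for example, the residual squares criterion proposed in Fan and Gijbels (1995a). In empirical applications we must therefore not forget that, although the resulting bandwidths are very close, they are different.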
Once the non-parametric estimators for both the first-differences and the within transformation have been obtained, the next step is to establish the behavior of the two estimators in large samples. Under some standard assumptions collected in the "Appendix", their asymptotic distributions are derived in the next theorems. The conditions required for the proofs are rather general. Assumption A.1 characterizes the data-generating process for a panel data model. Assumption A.2 is a standard strict exogeneity condition and A.3 imposes the so-called fixed effects. In addition, for conditional moments, densities and kernel functions we need some smoothness and boundedness conditions that are collected in assumptions A.4-A.9 and A.11-A.12. Finally, assumptions A.10 and A.13 are required to show that the Lyapunov conditions hold. Let $f_{Z_{it}, Z_{i(t-1)}}(z, z)$ be the probability density function of the random vector $(Z_{it}, Z_{i(t-1)})$ evaluated at the point $(z, z)$, and denote by $f_{Z_{i1}, \ldots, Z_{iT}}(z, \ldots, z)$ the probability density function of $(Z_{i1}, \ldots, Z_{iT})$ evaluated at the point $(z, \ldots, z)$.
In this context, the following result for the locally weighted least-squares first-differences estimator (2.9) is shown in Soberon (2013, 2015):
Theorem 1 Assume conditions A.1-A.10 hold. Then, as $N$ tends to infinity and $T$ is fixed, the conditional bias of $\hat m_F(z; H)$ is of order $O(\mathrm{tr}(H))$ and its conditional variance is of order $O(1/(NT|H|))$.
On the other hand, for the locally weighted least-squares within estimator (2.12), the following asymptotic properties are obtained in Soberon (2013, 2015):
Theorem 2 Assume conditions A.1-A.7 and A.11-A.13 hold. Then, as $N \to \infty$ and $T$ remains fixed, the conditional bias of $\hat m_w(z; H)$ is of order $O(\mathrm{tr}(H))$ and its conditional variance is of order $O(1/(NT|H|^{T/2}))$.
As Theorems 1 and 2 make clear, the use of a higher-dimensional kernel weight solves the problem of non-negligible asymptotic bias. It provides local linear estimators with a bias term of the same order as the standard results, $O(\mathrm{tr}(H))$. However, as is usual in non-parametric techniques, any attempt to reduce the bias is offset by an enlargement of the variance term. Thus, these two estimators are consistent but exhibit a suboptimal rate of convergence. Note that the standard rate for this type of problem is $NT|H|^{1/2}$. The first-differences estimator exhibits a rate of order $NT|H|$ and the within estimator shows a rate of order $NT|H|^{T/2}$.
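To make the loss of efficiency concrete, the following back-of-the-envelope calculation (an illustration added here, not a result taken from the theorems) takes $q = 1$ with a scalar bandwidth $h$, so that the squared bias is of order $h^{4}$ and the variance bounds are $O(1/(NTh^{2}))$ and $O(1/(NTh^{T}))$. The constants $C_1, C_2 > 0$ are generic placeholders. Balancing squared bias against variance gives

```latex
\begin{align*}
\text{first differences:}\quad & C_1 h^{4} + \frac{C_2}{NT h^{2}}
   \;\Longrightarrow\; h^{*} \propto (NT)^{-1/6}, \quad
   \mathrm{MSE}(h^{*}) = O\!\left((NT)^{-2/3}\right), \\
\text{within:}\quad & C_1 h^{4} + \frac{C_2}{NT h^{T}}
   \;\Longrightarrow\; h^{*} \propto (NT)^{-1/(T+4)}, \quad
   \mathrm{MSE}(h^{*}) = O\!\left((NT)^{-4/(T+4)}\right), \\
\text{standard local linear:}\quad & C_1 h^{4} + \frac{C_2}{NT h}
   \;\Longrightarrow\; h^{*} \propto (NT)^{-1/5}, \quad
   \mathrm{MSE}(h^{*}) = O\!\left((NT)^{-4/5}\right).
\end{align*}
```

Under these assumptions both differencing estimators converge more slowly than the standard rate, the two coincide for $T = 2$, and the within estimator deteriorates as $T$ grows (for example, the exponent is $-4/7$ for $T = 3$ and $-4/9$ for $T = 5$).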

One-step backfitting procedure
In this section we analyze alternative procedures to obtain non-parametric estimators that exhibit the optimal rate of convergence. Firstly, we focus on the first differences transformation. Later we present the corresponding within estimator. We conclude with a comparison between the asymptotic properties of the resulting estimators.
As noted in Fan and Zhang (1999), the variance can be reduced by further smoothing, but the bias cannot be reduced by any kind of smoothing. Thus, in order to achieve optimality, we propose to combine the previous estimators with a one-step backfitting algorithm. This estimation strategy allows us to exploit the additive structure of the model in order to cancel asymptotically the additional additive terms of the transformed model.
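Let $\hat m_F(z; H)$ be the first-step local weighted linear least-squares first-differences estimator (2.8) and define the variable $\Delta Y^{b}_{it}$ such that
$$\Delta Y^{b}_{it} = \Delta Y_{it} + X_{i(t-1)}^{\top} \hat m_F(Z_{i(t-1)}; H). \tag{3.1}$$
Replacing (2.2) in this equation we obtain
$$\Delta Y^{b}_{it} = X_{it}^{\top} m(Z_{it}) + \Delta v^{b}_{it}, \tag{3.2}$$
where the composed error term has the form $\Delta v^{b}_{it} = X_{i(t-1)}^{\top} \left[ \hat m_F(Z_{i(t-1)}; H) - m(Z_{i(t-1)}) \right] + \Delta v_{it}$. By the same reasoning as before, the quantities of interest in (3.2) can be estimated as the solution for $\gamma^{F}$ of the following locally weighted linear regression:
$$\sum_{i=1}^{N}\sum_{t=2}^{T} \left( \Delta Y^{b}_{it} - X_{it}^{\top}\gamma_0 - \left[ (Z_{it} - z) \otimes X_{it} \right]^{\top}\gamma_1 \right)^{2} K_{\tilde H}(Z_{it} - z), \tag{3.3}$$
where $K_{\tilde H}(u) = |\tilde H|^{-1/2} K(\tilde H^{-1/2} u)$ and $\tilde H$ is the $q \times q$ symmetric positive definite bandwidth matrix of this step. Denote by $\hat\gamma^{F} = (\hat\gamma_0^{F\top}\ \hat\gamma_1^{F\top})^{\top}$ the $d(1+q)$-vector that minimizes (3.3). We propose as estimators of $m(z)$ and $D_m(z)$
$$\tilde m_F(z; \tilde H) = \hat\gamma^{F}_0 \tag{3.4}$$
and $\mathrm{vec}(\tilde D^{F}_m(z; \tilde H)) = \hat\gamma^{F}_1$, respectively.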
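On the other hand, following a similar procedure, we propose a backfitting estimator for the within transformation (2.3). Let $\hat m_w(z; H)$ be the first-step within estimator proposed in (2.11) and define
$$\ddot{Y}^{b}_{it} = \ddot{Y}_{it} + \frac{1}{T}\sum_{s=1}^{T} X_{is}^{\top} \hat m_w(Z_{is}; H) = X_{it}^{\top} m(Z_{it}) + \ddot{v}^{b}_{it},$$
where the error term is $\ddot{v}^{b}_{it} = \frac{1}{T}\sum_{s=1}^{T} X_{is}^{\top} \left[ \hat m_w(Z_{is}; H) - m(Z_{is}) \right] + \ddot{v}_{it}$. Denote by $\hat\gamma^{w} = (\hat\gamma_0^{w\top}\ \hat\gamma_1^{w\top})^{\top}$ the $d(1+q)$-vector that minimizes the following problem:
$$\sum_{i=1}^{N}\sum_{t=1}^{T} \left( \ddot{Y}^{b}_{it} - X_{it}^{\top}\gamma_0 - \left[ (Z_{it} - z) \otimes X_{it} \right]^{\top}\gamma_1 \right)^{2} K_{\tilde H}(Z_{it} - z).$$
We propose as estimators of $m(z)$ and $D_m(z)$, $\tilde m_w(z; \tilde H) = \hat\gamma^{w}_0$ and $\mathrm{vec}(\tilde D^{w}_m(z; \tilde H)) = \hat\gamma^{w}_1$, respectively, which, as in the first step, have a closed-form weighted least-squares expression.
In order to show that these two backfitting estimators achieve optimal rates of convergence and, furthermore, that they are oracle efficient, we need the sampling scheme conditions established in Assumptions A.1-A.3 and the smoothness and boundedness conditions already considered in Assumptions A.4-A.7, A.8-A.9 and A.11-A.12. Furthermore, as the estimators are obtained via a one-step backfitting algorithm, we need to ensure that both the bias and the variance rates of the first-step estimates, $\hat m_F(z; H)$ and $\hat m_w(z; H)$, are uniform. Therefore, following Masry (1996), we impose some assumptions on the bandwidth $H$ and its relationship with $\tilde H$. These are collected in Assumptions A.14 and A.15.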
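The one-step backfitting idea for the first-differences case can be sketched as follows for $d = q = 1$. This is a minimal illustration under stated assumptions rather than the authors' implementation: the names `wls` and `fd_backfit`, the Gaussian kernel, the scalar bandwidths `h` and `h_tilde` and the toy data are introduced here. Kernel normalizing constants are omitted because they do not change the minimizer.

```python
import numpy as np

def gauss_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def wls(R, w, target):
    """Weighted least squares: argmin_b sum_j w_j (target_j - R_j b)^2."""
    A = R.T @ (R * w[:, None])
    return np.linalg.solve(A, R.T @ (w * target))

def fd_backfit(y, x, z, z0, h, h_tilde):
    """One-step backfitting estimate of m(z0) from (N, T) arrays y, x, z."""
    dy = (y[:, 1:] - y[:, :-1]).ravel()
    x_t, x_l = x[:, 1:].ravel(), x[:, :-1].ravel()
    z_t, z_l = z[:, 1:].ravel(), z[:, :-1].ravel()

    def first_step(point):
        # First-differences local linear fit with a bivariate product kernel
        R = np.column_stack([x_t - x_l,
                             x_t * (z_t - point) - x_l * (z_l - point)])
        w = gauss_kernel((z_t - point) / h) * gauss_kernel((z_l - point) / h)
        return wls(R, w, dy)[0]

    # Step 1: evaluate the first-step estimator at every lagged covariate value
    m_lag = np.array([first_step(p) for p in z_l])
    # Add back the estimated lagged component, in the spirit of (3.1)
    dy_b = dy + x_l * m_lag
    # Step 2: local linear regression with a kernel in Z_it only
    R = np.column_stack([x_t, x_t * (z_t - z0)])
    w = gauss_kernel((z_t - z0) / h_tilde)
    return wls(R, w, dy_b)[0]

# Toy check: m(z) = 1 + 2z, no noise, fixed effects correlated with z
rng = np.random.default_rng(2)
N, T = 100, 3
x = rng.normal(size=(N, T))
z = rng.uniform(-1.0, 1.0, size=(N, T))
mu = z.mean(axis=1, keepdims=True) + rng.normal(size=(N, 1))
y = x * (1.0 + 2.0 * z) + mu
m_tilde = fd_backfit(y, x, z, z0=0.2, h=0.5, h_tilde=0.3)
```

In this noiseless linear example the result equals $m(0.2) = 1.4$ up to rounding. The point of the second step is that its kernel involves $Z_{it}$ only, which is what lowers the order of the variance relative to the first-step estimator.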
Let $\mathrm{diag}_d(\mathrm{tr}(H_{m_r}(z)\tilde H))$ be the diagonal matrix with elements $\mathrm{tr}(H_{m_r}(z)\tilde H)$, $r = 1, \ldots, d$, and $\imath_d$ a $d \times 1$ vector of ones. The following asymptotic results for the backfitting first-differences estimator (3.4) are obtained in Rodriguez-Poo and Soberon (2013, 2015):
Theorem 3 Assume conditions A.1-A.9 and A.14-A.15 hold. Then, as $N$ tends to infinity and $T$ is fixed, the conditional bias of $\tilde m_F(z; \tilde H)$ is of order $O(\mathrm{tr}(\tilde H))$, with a leading term governed by $\mathrm{diag}_d(\mathrm{tr}(H_{m_r}(z)\tilde H))\,\imath_d$, and its conditional variance is of order $O(1/(NT|\tilde H|^{1/2}))$.
Under similar conditions, the following asymptotic results for the backfitting within estimator (3.4) are proved in Soberon (2013, 2015):
Theorem 4 Assume conditions A.1-A.7, A.11-A.12 and A.14-A.15 hold. Then, as $N \to \infty$ and $T$ remains fixed, the conditional bias of $\tilde m_w(z; \tilde H)$ is of order $O(\mathrm{tr}(\tilde H))$ and its conditional variance is of order $O(1/(NT|\tilde H|^{1/2}))$.
To itemize the asymptotic behavior of these two backfitting estimators, $\tilde m_F(z; \tilde H)$ and $\tilde m_w(z; \tilde H)$, we analyze in detail the bias and the variance-covariance matrix in Theorems 3 and 4. On the one hand, in both cases the conditional bias is very close to the standard one for local polynomial regression estimators. Thus, as each entry of $H_{m_r}(z)$ is a measure of the curvature of $m(\cdot)$ at $z$ in a particular direction, we can intuitively conclude that the conditional bias of these estimators is larger the higher the curvature of the unknown function and the larger the amount of smoothing. On the other hand, regarding the conditional variance, both estimators achieve the optimal rate of convergence, but they show different constants. Thus, while the first-differences estimator exhibits a variance-covariance matrix that increases when the amount of smoothing decreases or the data become sparse near $z$, the conditional variance of the within estimator is also influenced by the time-demeaned covariates through $B_{\ddot X \ddot X}(z)$.
In this way, it is shown that direct estimation techniques allow obtaining estimators with different rates of convergence that depend on the type of differencing transformation. Meanwhile, one-step backfitting procedures provide estimators that achieve the optimal rate of convergence for both transformations. In this situation, the rate of convergence should not be used as an efficiency criterion between both backfitting estimators and, in order to analyze efficiency, it is necessary to study their finite sample behavior.

Monte Carlo experiment
In this section, we conduct an extensive Monte Carlo simulation with the aim of comparing the small sample behavior of both first-differences and within non-parametric estimators introduced in Sects. 2 and 3.
In a fully parametric context, it is well known that, under strict exogeneity assumptions, the performance of both estimators depends on the stochastic structure of the random errors $v_{it}$. Furthermore, as shown in Theorems 1 and 2, the asymptotic bound for the variance term of the first-differences estimator is $O(1/(NT|H|))$, whereas the corresponding term for the within estimator is $O(1/(NT|H|^{T/2}))$. Given that both estimators exhibit the same expression for the bias, one might expect their average mean square error (AMSE) to behave differently for different values of $T$. By contrast, the performance of the one-step backfitting estimators proposed in Sect. 3 should be affected by $T$ in the same direction, since both bias and variance terms are now of the same order, i.e. $O(\mathrm{tr}(\tilde H))$ and $O(1/(NT|\tilde H|^{1/2}))$, respectively. Finally, the asymptotic bounds in the non-parametric setting reflect other factors that are also of interest when analyzing the AMSE behavior of both estimators, such as the dimension $q$ or the rate at which the AMSEs tend to zero as $N$ increases.
In this situation, the main goal of this simulation experiment is to establish whether the behavior of these estimators depends on the stochastic structure of the error term, and to determine whether the results hold when facing the curse of dimensionality and for different numbers of time observations. Finally, it is also of interest to check whether the performance of the one-step backfitting estimator is, as expected, better than that of the corresponding local linear regression estimator in small samples.
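We propose a Monte Carlo experiment in which observations are generated from the following varying coefficient panel data model:
$$Y_{it} = X_{it}^{\top} m(Z_{it}) + \mu_i + v_{it}, \qquad i = 1, \ldots, N;\; t = 1, \ldots, T,$$
where $X_{dit}$ and $Z_{qit}$ are random variables generated such that $X_{dit} = 0.5\zeta_{dit} + 0.5\xi_{dit}$ ($\zeta_{1it}$ and $\zeta_{2it}$ are i.i.d. $N(0, 1)$), $Z_{qit} = \omega_{qit} + \omega_{qi(t-1)}$ ($\omega_{1it}$ and $\omega_{2it}$ are i.i.d. $N(0, 1)$), and we consider three different cases of study:
(1) $Y_{it} = X_{1it}\, m_1(Z_{1it}) + \mu_{1i} + v_{it}$ ($d = 1$, $q = 1$);
(2) $Y_{it} = X_{1it}\, m_1(Z_{1it}, Z_{2it}) + \mu_{2i} + v_{it}$ ($d = 1$, $q = 2$);
(3) $Y_{it} = X_{1it}\, m_1(Z_{1it}) + X_{2it}\, m_2(Z_{1it}) + \mu_{1i} + v_{it}$ ($d = 2$, $q = 1$),
where the chosen functional forms are $m_1(Z_{1it}) = \sin(Z_{1it}\pi)$, $m_1(Z_{1it}, Z_{2it}) = \sin\left(\tfrac{1}{2}(Z_{1it} + Z_{2it})\pi\right)$ and $m_2(Z_{1it}) = \exp(-Z_{1it}^{2})$. Treating the cross-sectional heterogeneity as a fixed effect, we allow the individual effect to be correlated with one or more of the covariates. In particular, the dependence between $\mu_{qi}$ and $Z_{qit}$ is imposed by generating $\mu_{qi} = c_0 \bar Z_{i\cdot} + u_i$, where $\bar Z_{i\cdot}$ denotes the average over time of the $Z$ covariates of individual $i$, $u_i$ is an i.i.d. $N(0, 1)$ random variable and $i = 2, \ldots, N$. The correlation between the fixed effects and some of the explanatory variables of the model is controlled by $c_0 = 0.5$. Also, let $\epsilon_{it}$ be i.i.d. $N(0, 1)$ and $v_{it}$ a scalar random variable. For each model we work with the following three specifications of the error term: (a) $v_{it} = \epsilon_{it}$; (b) $v_{it}$ follows a random walk, such as $v_{it} = 1 + v_{i(t-1)} + \epsilon_{it}$; (c) $v_{it}$ is generated as a stationary AR(1) process of the form $v_{it} = \rho v_{i(t-1)} + \epsilon_{it}$.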
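A data-generating process of this kind can be sketched in Python for the simplest design ($d = q = 1$ with $m_1(z) = \sin(z\pi)$). The sketch below is an illustration under explicit assumptions and is not the authors' simulation code: the function name `simulate_panel`, the distribution of $\xi$ (taken to be i.i.d. $N(0, 1)$), the zero initial value of the error recursions, the AR(1) coefficient `rho = 0.5` and the use of the time average of $Z$ in the fixed effect are all choices made here where the text does not pin them down.

```python
import numpy as np

def simulate_panel(N, T, error="iid", c0=0.5, rho=0.5, drift=1.0, seed=0):
    """Simulate Y_it = X_it * sin(pi * Z_it) + mu_i + v_it on an (N, T) panel."""
    rng = np.random.default_rng(seed)
    zeta = rng.normal(size=(N, T))
    xi = rng.normal(size=(N, T))          # assumed i.i.d. N(0, 1)
    x = 0.5 * zeta + 0.5 * xi
    omega = rng.normal(size=(N, T + 1))
    z = omega[:, 1:] + omega[:, :-1]      # Z_it = omega_it + omega_i(t-1)
    # Fixed effect correlated with the individual time average of Z
    mu = c0 * z.mean(axis=1) + rng.normal(size=N)
    eps = rng.normal(size=(N, T))
    if error == "iid":
        v = eps
    elif error in ("rw", "ar1"):
        v = np.zeros((N, T))
        prev = np.zeros(N)                # assumed starting value v_i0 = 0
        for t in range(T):
            if error == "rw":
                prev = drift + prev + eps[:, t]   # random walk, as in (b)
            else:
                prev = rho * prev + eps[:, t]     # AR(1), rho is hypothetical
            v[:, t] = prev
    else:
        raise ValueError("error must be 'iid', 'rw' or 'ar1'")
    y = x * np.sin(np.pi * z) + mu[:, None] + v
    return y, x, z, mu, v

y, x, z, mu, v = simulate_panel(N=50, T=3, error="ar1")
```

Calling the function with `error="iid"`, `"rw"` or `"ar1"` switches between the three error specifications while keeping the covariates and fixed effects generated in the same way, which is what allows the error structure to be isolated in a comparison.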
In each experiment we use 1000 Monte Carlo replications ($M$). The number of periods ($T$) is set to 3 and 5, whereas the number of cross-sections ($N$) takes the values 50, 100 and 150. For the calculations we use a Gaussian kernel, and the bandwidth is chosen as $H = hI$ with $h = \hat\sigma_z (NT)^{-1/5}$, where $\hat\sigma_z$ is the sample standard deviation of $Z_{qit}$. In order to assess the performance of the first-differences and within estimators, we use the Mean Square Error (MSE) as a measure of their estimation accuracy. Denoting the $\varphi$th replication by the subscript $\varphi$, the MSE is the expected squared distance between the estimator and the true function $m(\cdot)$, which can be approximated by the Averaged Mean Squared Error (AMSE), that is, the average over the $M$ replications of the squared estimation errors.
Simulation results are summarized in Tables 1-9, which are relegated to the "Appendix". Specifically, Tables 1, 2 and 3 contain the AMSE obtained for each of the three varying coefficient specifications proposed in the simulation when the error term is i.i.d., Tables 4, 5 and 6 focus on the results for the random walk case, and Tables 7, 8 and 9 contain the simulation results when the idiosyncratic errors are generated according to a stationary AR(1) structure. In every table, we present results for the two differencing estimators, $\hat m_F(z; H)$ and $\hat m_w(z; H)$, and for both one-step backfitting estimators. We consider the cases $T = 3, 5$ and $N = 50, 100, 150$. Note that our asymptotic results hold as $N$ becomes larger while $T$ is kept fixed.
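The bandwidth rule and the accuracy measure just described can be sketched as follows. This is an illustrative sketch, not the authors' code: the helper names `rule_of_thumb_bandwidth`, `amse` and `relative_amse` are hypothetical, and the normalization of the AMSE (a plain average over replications and evaluation points) is an assumption made here.

```python
import numpy as np

def rule_of_thumb_bandwidth(z):
    """h = sigma_z * (NT)^(-1/5), with sigma_z the sample standard deviation of z."""
    z = np.asarray(z, dtype=float)
    return np.std(z, ddof=1) * z.size ** (-1.0 / 5.0)

def amse(estimates, truth):
    """Average squared error over replications.

    estimates: (M, G) array, one row per replication, one column per
               evaluation point; truth: (G,) array of true function values.
    """
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((estimates - truth) ** 2))

def relative_amse(estimates_fd, estimates_w, truth):
    """AMSE of the first-differences estimator divided by that of the within estimator."""
    return amse(estimates_fd, truth) / amse(estimates_w, truth)

# Tiny worked example with made-up numbers
est = np.array([[1.0, 2.0], [3.0, 2.0]])
value = amse(est, np.array([2.0, 2.0]))     # squared errors 1, 0, 1, 0
h = rule_of_thumb_bandwidth(np.array([[0.0, 2.0], [0.0, 2.0]]))
```

In the worked example the squared errors are 1, 0, 1 and 0, so `value` is 0.5. A relative AMSE above one indicates that the within estimator is the more accurate of the two in that design.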
Based on these simulation results, we first try to determine how the structure of the stochastic error term affects the AMSE behavior of the proposed estimators. Later, we focus on the impact that both curse of dimensionality and number of time observations have on the behavior of the AMSE. We conclude the analysis by checking whether, regardless of the structure of the error term, the one-step backfitting estimator exhibits a better performance than the local linear regression estimator, as we expected from their statistical properties.
According to our theoretical findings, and taking into account some standard results in fully parametric settings, in the i.i.d. case the within estimator should perform better, in terms of the AMSE, than the first-differences one, whereas the latter should be preferred when the disturbance follows a random walk. A quick look at Table 1 confirms in general our theoretical findings. As expected, as $N$ increases all AMSEs tend to zero, but the rates of convergence are not similar. In fact, if we look at the relative AMSE, defined as $\mathrm{AMSE}(\hat m_F(z; H))/\mathrm{AMSE}(\hat m_w(z; H))$, we find that the AMSE of the within estimator converges to zero faster than that of the first-differences estimator. Therefore, in this stochastic setting, the within estimator is preferred.
On the other hand, for the random walk case, some standard results for the fully parametric case indicate that the first-differences estimator should exhibit a better performance. This is confirmed if we compare the AMSE of the first-differences estimator in Tables 1, 2 and 3 against their counterparts in Tables 4, 5 and 6. In all cases, the AMSE is smaller when the idiosyncratic errors are generated as a random walk.
Finally, in Tables 7, 8 and 9 we find that, when the random errors follow an AR(1) autoregressive process, the within estimator performs better than the first-differences estimator in terms of the AMSE, although its performance is worse than in the other error settings. Moreover, the results of the first-differences estimator are better than those obtained in the random walk setting, which is somewhat unexpected.
Now, to establish the impact of the curse of dimensionality on the performance of the proposed estimators, we compare the AMSE of $\hat m_F(z; H)$ and $\hat m_w(z; H)$ when $q = 1$ (Tables 1, 4, 7) against their counterparts when $q = 2$ (Tables 2, 5, 8). As expected, when $q = 2$ the performance of the proposed estimators in terms of the AMSE is worse in all error settings, and the within estimator is the most affected by the curse of dimensionality. When the idiosyncratic error term is i.i.d., the relative AMSE for $N = 150$ and $T = 3$ goes from 4.463 ($q = 1$) to 1.688 ($q = 2$), whereas for the random walk case the relative AMSE for $N = 150$ and $T = 3$ goes from 3.228 ($q = 1$) to 1.190 ($q = 2$). The same can be said when the error term is generated according to a stationary AR(1) structure. Finally, the results shown in Tables 3, 6 and 9 indicate that the dimension $d$ of the vector of covariates $X$ does not affect the asymptotic behavior of the estimators.
By contrast, if we focus on the impact of the number of time observations, we find that the within estimator $\hat m_w(z; H)$ is much more sensitive to $T$ than the first-differences estimator. As can be seen in all tables, as the number of time observations ($T$) increases the relative performance of $\hat m_w(z; H)$ becomes worse. In Table 1, for example, if we set $N = 150$ the relative AMSE goes from 4.46 ($T = 3$) to 3.33 ($T = 5$). This effect can be explained in terms of the asymptotic bounds of both estimators, since only the variance bound of the within estimator, $O(1/(NT|H|^{T/2}))$, involves $T$ in the exponent of the bandwidth. In Tables 7, 8 and 9 we can see that the within estimator is also much more sensitive to the size of $T$ than the first-differences estimator when the random error term follows an autoregressive structure.
In summary, the results of $\hat m_F(z; H)$ are much more stable across different specifications of the error term. The same cannot be said of $\hat m_w(z; H)$. In fact, it performs quite well with $q = 1$ and under i.i.d. or stationary AR(1) errors, but it shows a much worse performance when the errors are generated following a random walk process. Therefore, we can conclude that the within estimator is preferred when the idiosyncratic error term is i.i.d. or exhibits an autoregressive structure, whereas the first-differences estimator performs better when $v_{it}$ follows a random walk.
Finally, in all error settings, the performance of the one-step backfitting estimator is better than that of its corresponding local linear regression estimator, and the improvement is larger for the within estimator. Also, as expected, the curse of dimensionality is overcome. This is of course the main reason why we have applied this second-stage estimation procedure. Furthermore, the rate at which the AMSE tends to zero seems to be faster for both estimators, in accordance with the predictions of the asymptotic results. However, although both estimators have the same rate of convergence, their constants are different, and this matters in small samples. This is examined in the simulations under different scenarios for the error terms. Under the i.i.d. and the AR(1) settings, the one-step backfitting within estimator performs better than the first-differences one, as can be seen from the relative AMSE. By contrast, under the random walk specification the ranking is reversed.
We finish this section by highlighting that as $N$ increases, that is, asymptotically, the AMSE tends to converge for each estimator under the different specifications of the error term. For different values of $T$, the AMSE of the first-differences estimator tends to dominate that of the within estimator. This can also be observed by looking at the relative AMSE values.

Conclusions
Several techniques have recently been proposed for estimating semi-parametric fixed effects varying coefficient panel data models. These techniques fall within the class of so-called differencing estimators. In particular, we consider first-differences and within local linear regression estimators. An analysis of their asymptotic properties shows that, while their bias terms are of the same order of magnitude, these estimators exhibit different asymptotic bounds for the variance. In both cases, the consequence is a suboptimal non-parametric rate of convergence. To solve this problem, a one-step backfitting algorithm that exploits the additive structure of the model is proposed. Under fairly general conditions, the resulting estimators achieve optimal rates of convergence and exhibit the oracle efficiency property. Since both estimators are asymptotically equivalent, it is of interest to analyze their behavior in small samples. In a fully parametric context, it is well known that, under strict exogeneity assumptions, the performance of both first-differences and within estimators depends on the stochastic structure of the idiosyncratic random errors. In the non-parametric setting, however, other factors such as dimensionality and sample size are also of great interest. In particular, we are interested in their relative average mean square error under different scenarios. The simulation results largely confirm the theoretical findings for both local linear regression and one-step backfitting estimators. However, we find that within estimators are rather sensitive to the number of time observations.