Nonparametric and Semiparametric Panel Data Models: Recent Developments

In this paper, we provide an intensive review of the recent developments for semiparametric and fully nonparametric panel data models that are linearly separable in the innovation and the individual-specific term. We analyze these developments under two alternative model specifications: fixed and random effects panel data models. More precisely, in the random effects setting, we focus on some efficiency issues related to the so-called working independence condition. This assumption is introduced when estimating the asymptotic variance-covariance matrix of nonparametric estimators. In the fixed effects setting, to cope with the so-called incidental parameters problem, we consider two different estimation approaches: profiling techniques and differencing methods. Furthermore, we are also interested in the endogeneity problem and how instrumental variables are used in this context. In addition, for practitioners, we also show different ways of avoiding the so-called curse of dimensionality in pure nonparametric models. In this way, semiparametric and additive models appear as a solution when the number of explanatory variables is large.


Introduction
In empirical research, the complexity of econometric models has been greatly enriched by the availability of panel data sets. These data are characterized by the observation of a group of individuals (households, consumers, countries, and so on) over time, so they allow us to extract some unknown information about the idiosyncratic characteristics of individuals. From a theoretical point of view, this double index enables us to specify econometric models that account both for the impact of unobserved actions of individuals and for observable individual characteristics (explanatory variables). Hence, through the use of panel data econometric models, under some standard assumptions on the data generating process, it is possible to draw inference on parameters of interest that would otherwise be impossible to obtain. As is often the case in applied econometrics, we are interested in the partial effects of the observable explanatory variables in the population regression (quantile) function but, following the approach in Chamberlain (1984), we allow for time-invariant and/or individual-invariant omitted latent variables.
In this context, the statistical properties of the estimators of the unknown parameters depend crucially on the set of assumptions that we are willing to impose on the relationship between the observable explanatory variables and the unobserved effects in the conditioning set. On the one hand, we might consider the unobserved individual heterogeneity as statistically independent of the observed explanatory variables (the so-called random effects case). Then, the individual heterogeneity is just another unobserved factor affecting the dependent variable that is not systematically related to the observed explanatory variables whose effects are of interest. On the other hand, in empirical applications this assumption is often too strong, and applied researchers therefore prefer to avoid it by allowing for some type of statistical dependence between the individual time-invariant heterogeneity and the explanatory variables. More precisely, it is commonly assumed, for example, that the expected value of the random heterogeneity term, conditional on the values of the explanatory variables, is constant over time and varies only across individuals. This is the so-called fixed effects model. Under this assumption, if the number of time observations (T) is fixed, the incidental parameters problem arises: when the sample size increases, that is, when the number of individuals (N) grows, the number of parameters to be estimated also increases (see Lancaster (2000) for a survey). The classical laws of large numbers and central limit theorems rely on the assumption that the number of unknown parameters to be estimated remains fixed as the sample size increases. Therefore, in this case, they do not apply straightforwardly (see Neyman and Scott, 1948). In this setting, it is clear that standard estimation techniques for random effects panel data models might yield misleading inferential results and hence more specific estimation techniques are needed.
If panel data models are linearly separable in the innovation and the individual-specific term, a simple linear transformation can eliminate the individual heterogeneity from the model (see Anderson and Hsiao, 1981). However, if this relationship is nonlinear, there is no general transformation that eliminates the incidental parameters. In this case, a specific structure for the nonlinear model needs to be specified in order to find an appropriate transformation. We refer to Arellano (2003), Baltagi (2013), or Hsiao (2014) for an intensive review of techniques for estimating panel data models, and to Maddala (1987) for a discussion of random versus fixed effects. Finally, although from the applied point of view fixed effects models seem to be more popular and useful, random effects models are still important in economics and statistics, for example, in situations where the researcher is interested in estimating the effects of time-invariant variables (without using instrumental variables [IVs]), which, in general, cannot be estimated with fixed effects models.
Nevertheless, the suitable treatment of these unobserved heterogeneity effects is not enough to guarantee proper statistical properties for the estimators of interest. In most cases, estimation of the parameters of interest also depends on some statistical restrictions imposed on the data generating process. However, sometimes these assumptions are too restrictive with respect to functional forms or densities and the risk of misspecification is high. If this is the case, the resulting estimators can lead to misleading inference. In this context, nonparametric panel data models are very appealing since they do not impose overly restrictive assumptions on the specification of the model and they allow the data to determine the shape of the regression function. However, in some situations this flexibility has drawbacks. First, these models cannot incorporate prior information, so the resulting estimator of the unknown function tends to have a higher variance. Second, they are subject to the so-called curse of dimensionality, which in practice rules out standard nonparametric methods when the number of explanatory variables is high. In order to overcome these shortcomings, semiparametric panel data models appear as a reasonable compromise between fully nonparametric and parametric models. In fact, they enable us to incorporate some prior information coming from economic theory or past experience while keeping more flexibility in the specification of the model. Furthermore, although there is a nonparametric part that shows a slower rate of convergence, the estimators of the parametric part exhibit the same statistical properties as if the whole model were fully parametric. This is the so-called √N-consistency property; see, among others, Robinson (1988) or Speckman (1988).
For early discussions on semiparametric panel data models see Ullah and Roy (1998), while we refer to Ai and Li (2008) for a review of partially linear and limited dependent variable nonparametric and semiparametric panel data models.
In this paper, we provide an intensive review of the recent developments for semiparametric and fully nonparametric panel data models that are linearly separable in the innovation and the individual-specific term. Furthermore, we analyze these developments under two alternative settings, the so-called fixed and random effects panel data models. Note that Su and Ullah (2011) focus on similar models, although here we include the most recent results and pay special attention to the so-called incidental parameters problem as well as to endogenous explanatory variables. Meanwhile, in Chen et al. (2013) these types of models are studied when deterministic trends and single-index specifications are present.
The rest of the paper is organized as follows. In Sections 2 and 3, we analyze the literature about nonparametric panel data models with random and fixed effects, respectively. In Section 4, we focus on semiparametric models with random effects. In Section 5, we study the corresponding models with fixed effects. Section 6 refers to nonparametric and semiparametric panel data models when the presence of endogenous explanatory variables is allowed. Finally, Section 7 concludes.

Nonparametric Panel Data Models with Random Effects
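The basic nonparametric unobserved effects model can be written, for a randomly drawn cross-section observation i, as
$$Y_{it} = m(Z_{it}) + \varepsilon_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T, \tag{1}$$
where $Z_{it}$ is a $q \times 1$ vector of observable explanatory variables, $m(\cdot)$ is an unknown function that needs to be estimated, and $\varepsilon_{it}$ is an unobservable error term. Typically, in panel data analysis, the error term of the model follows a one-way error component structure of the form
$$\varepsilon_{it} = \mu_i + v_{it}, \tag{2}$$
where $v_{it}$ is referred to as the idiosyncratic error term and $\mu_i$ is called the unobserved individual heterogeneity. Throughout the paper we will assume the following:
$$E(Y_{it} \mid Z_{i1}, \dots, Z_{iT}, \mu_i) = E(Y_{it} \mid Z_{it}, \mu_i) = m(Z_{it}) + \mu_i. \tag{3}$$
Note that the first equality restricts the relationship between the dependent variable Y and the values of Z in other time periods: once $Z_{it}$ and $\mu_i$ are controlled for, they have no partial effect on $Y_{it}$. Furthermore, the second equality constrains the regression function to be the sum of a nonparametric function, $m(Z_{it})$, plus an unobservable heterogeneity term, $\mu_i$. Using (1) and (2), note that assumption (3) can be stated in terms of the idiosyncratic errors as
$$E(v_{it} \mid Z_{i1}, \dots, Z_{iT}, \mu_i) = 0. \tag{4}$$
Let $v_i = (v_{i1}, \dots, v_{iT})^\top$ be a $T \times 1$ vector. The error vector $v_i$ and the heterogeneity term $\mu_i$ are such that
$$E(v_i v_i^\top \mid Z_{i1}, \dots, Z_{iT}, \mu_i) = \sigma_v^2 I_T, \tag{5}$$
where $I_T$ is a $T \times T$ identity matrix. Furthermore, let $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{iT})^\top$ be a $T \times 1$ vector and let $E(\varepsilon_i \varepsilon_i^\top)$ be a $T \times T$ matrix. Under the assumptions above, and writing $\sigma_\mu^2 = E(\mu_i^2)$, note that
$$\Omega \equiv E(\varepsilon_i \varepsilon_i^\top) = \sigma_v^2 I_T + \sigma_\mu^2 \imath_T \imath_T^\top, \qquad V = I_N \otimes \Omega, \tag{7}$$
where $\imath_T$ is a T-dimensional vector of ones and V is the $NT \times NT$ variance-covariance matrix of the stacked error vector. Finally, note that for a randomly drawn cross-section observation i, the vector of explanatory variables $Z_{i1}, \dots, Z_{iT}$ is strictly stationary, whereas for fixed t the vectors $Z_{1t}, \dots, Z_{Nt}$ are independent and identically distributed (i.i.d.) random variables. In general, the asymptotic behavior of the estimators that appear in the paper is analyzed in the standard panel data framework where N tends to infinity and T is fixed. In those particular cases where a different asymptotic framework is needed, it will be pointed out.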
All previous assumptions are common to both random and fixed effects nonparametric panel data models. Now, to characterize the random effects model, we further assume that
$$E(\mu_i \mid Z_{i1}, \dots, Z_{iT}) = E(\mu_i) = 0. \tag{8}$$
Note that using (1) and (2), applying assumptions (3), (4), (5), and (8), and by the law of iterated expectations, we obtain
$$E(Y_{it} \mid Z_{it} = z) = m(z).$$
Hence, the function $m(\cdot)$ and its first-order derivatives can be directly estimated through a pooled standard nonparametric technique. However, the resulting estimator is inefficient given that the composite error term is serially correlated because of the presence of $\mu_i$ in each time period. Hence, it should be possible to improve the efficiency of the estimator by taking into account the information contained in the variance-covariance matrix. Among others, in Ullah and Roy (1998), Lin and Carroll (2000), and Su and Ullah (2007) several nonparametric estimators of $m(\cdot)$ and its derivatives are considered. Furthermore, and with the aim of achieving efficiency, in Ruckstuhl et al. (2000), Wang (2003), and Henderson and Ullah (2005), different strategies are proposed to incorporate the information contained in the disturbances.

Local Linear Least-Squares (LLLS) versus Nadaraya-Watson Estimators
For any $z \in A$, where A is a compact subset of $\mathbb{R}^q$, the basic idea behind the standard nonparametric estimation of $m(z) = E(Y_{it} \mid Z_{it} = z)$ is to obtain a smoothed average of the $Y_{it}$ values by taking into account the values of $Z_{it}$ contained in a small interval around z. In order to understand further developments, it is useful to start with the analysis of the univariate case, where q = 1. Then, taking a Taylor expansion of the unknown smooth function $m(\cdot)$ around z, we obtain
$$m(Z_{it}) \approx m(z) + m'(z)(Z_{it} - z) + \cdots + \frac{m^{(p)}(z)}{p!}(Z_{it} - z)^p.$$
The above exposition suggests that we can estimate $m(z), m'(z), \dots, m^{(p)}(z)$ by regressing $Y_{it}$ on the terms $(Z_{it} - z)^\lambda$, for $\lambda = 0, 1, \dots, p$, with kernel weights. Thus, the quantities of interest can be estimated by minimizing the following criterion function,
$$\sum_{i=1}^{N} \sum_{t=1}^{T} \Big( Y_{it} - \sum_{\lambda=0}^{p} \gamma_\lambda (Z_{it} - z)^\lambda \Big)^2 K_h(Z_{it} - z), \tag{13}$$
with respect to the values $\gamma_0, \dots, \gamma_p$, where $\gamma_0 = m(z)$, $\gamma_1 = m'(z)$, and, in general, $\gamma_\lambda = m^{(\lambda)}(z)/\lambda!$. Let us denote by $\hat\gamma_0, \dots, \hat\gamma_p$ the solution to the minimization problem. Then, the above exposition suggests that $\hat m(z; h) = \hat\gamma_0$, $\hat m'(z; h) = \hat\gamma_1$, and $\hat m^{(p)}(z; h) = p!\,\hat\gamma_p$. Note that h is the bandwidth, which needs to be selected empirically, and $K_h(u) = \frac{1}{h} K(u/h)$ is the so-called kernel function, which must fulfill some standard conditions; typically K is a bounded, symmetric probability density, so that $\int K(u)\,du = 1$ and $\int u K(u)\,du = 0$. The kernel function is a weight function defined in such a way that, for fixed h, it takes values close to zero when $Z_{it}$ is far away from z. The solution to the problem above is the so-called local polynomial regression (see Ruppert and Wand (1994), Fan and Gijbels (1995), and Zhan-Qian (1996) for a detailed description of this technique). As pointed out in Ullah and Roy (1998), when p = 0 the value of $\gamma_0$ that minimizes (13) is
$$\hat m_{NW}(z; h) = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} K_h(Z_{it} - z)\, Y_{it}}{\sum_{i=1}^{N} \sum_{t=1}^{T} K_h(Z_{it} - z)}.$$
This is the so-called Nadaraya-Watson estimator, proposed independently in Nadaraya (1964) and Watson (1964). When p = 1 and q > 1, the previous Taylor expansion can be rewritten as
$$m(Z_{it}) \approx m(z) + (Z_{it} - z)^\top D_m(z),$$
where $D_m(z) = \mathrm{vec}(\partial m(z)/\partial z^\top)$ is a $q \times 1$ vector of partial derivatives of the function $m(z)$ with respect to the elements of the q-vector z.
Then, as suggested in Ullah and Roy (1998), Lin and Carroll (2000), Ruckstuhl et al. (2000), Henderson and Ullah (2005), and Su and Ullah (2007), among others, $m(z)$ and its first-order derivatives are estimated by minimizing the following criterion function:
$$\sum_{i=1}^{N} \sum_{t=1}^{T} \big( Y_{it} - \gamma_0 - (Z_{it} - z)^\top \gamma_1 \big)^2 K_h(Z_{it} - z), \tag{15}$$
where we denote by $\hat\gamma = (\hat\gamma_0, \hat\gamma_1^\top)^\top$ the $(1 + q)$-vector that minimizes (15). Thus, the above exposition suggests $\hat m(z; h) = \hat\gamma_0$ and $\hat D_m(z; h) = \hat\gamma_1$ as estimators for $m(z)$ and $D_m(z)$, respectively. Let $Y = (Y_{11}, \dots, Y_{NT})^\top$, let $Z_z$ be the $NT \times (1 + q)$ matrix with rows $[1, (Z_{it} - z)^\top]$, and let $K_z$ be the $NT \times NT$ diagonal matrix with elements $K_h(Z_{it} - z)$. Assuming $Z_z^\top K_z Z_z$ is nonsingular, the solution to (15) can be written in matrix form as
$$\hat m_{LLLS}(z; h) = e_1^\top (Z_z^\top K_z Z_z)^{-1} Z_z^\top K_z Y, \tag{16}$$
where $e_1$ is a $(1 + q)$ selection vector having 1 in the first entry and 0 in all other entries. Under the previous assumptions, imposing some smoothness conditions on both $m(\cdot)$ and the density $f(\cdot)$ of $Z_{it}$, and letting $h \to 0$ in such a way that $N h^q \to \infty$ as $N \to \infty$ with T fixed, the asymptotic normality of both estimators is established in Kneisner and Li (1996), among others. Let $H_m(z)$ be the Hessian matrix of $m(\cdot)$ and let $D_f(z)$ be the first-order derivative vector of the density function, of $q \times q$ and $1 \times q$ dimension, respectively. Writing $\mu_2(K) = \int u^2 K(u)\,du$, the leading conditional bias terms take the standard form
$$\mathrm{bias}\{\hat m_{LLLS}(z; h)\} \approx \frac{h^2}{2}\,\mu_2(K)\,\mathrm{tr}\{H_m(z)\}, \qquad \mathrm{bias}\{\hat m_{NW}(z; h)\} \approx \frac{h^2}{2}\,\mu_2(K)\,\big[\mathrm{tr}\{H_m(z)\} + 2\,D_f(z)\,D_m(z)\,f(z)^{-1}\big].$$
Based on these results, we can highlight that although the asymptotic variance of these two estimators is the same, the bias is not. More precisely, the bias of the LLLS estimator depends only on the curvature of $m(\cdot)$ at z, whereas the bias of the Nadaraya-Watson estimator comes from both the curvature of $m(\cdot)$ and the term $D_f(z) D_m(z) f(z)^{-1}$. Furthermore, it is well known that the local linear estimator usually performs better near the boundary of the support of the density function; see Fan (1993) for more details. Note that, under the conditions above, both bias and variance terms converge at the same rate, $\sqrt{N h^q}$. This makes the comparison of both estimators in terms of relative efficiency rather difficult because we would need to compare asymptotic mean squared errors. To avoid this, we choose the sequence of bandwidths, $h \equiv h(N)$, in such a way that $N T h^{4+q} \to 0$ as N tends to infinity.
By imposing this rate the variance term dominates the bias asymptotically, and therefore we can compare variance terms only.
Note that, by assuming just conditionally heteroskedastic errors $\varepsilon_{it}$, the variance term depends on the conditional variance function $\sigma^2(z)$, which can be replaced by its consistent estimator, $\hat\sigma^2 = (\imath_{NT}^\top K_z \imath_{NT})^{-1} \imath_{NT}^\top K_z \hat\varepsilon^2$, where $\hat\varepsilon^2$ is the vector of squared nonparametric residuals. If we additionally impose condition (5), the conditional variance-covariance matrix of the estimator can be written in terms of the variance-covariance matrix of the error term defined in (7). Finally, note that under the assumptions above $\hat m_{NW}(z; h)$ and $\hat m_{LLLS}(z; h)$ are equally efficient. As we will show in the following subsection, a relative efficiency improvement can be sought by defining an estimator that accounts for the composite error term assumed in (2).
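To make the comparison between the two pooled estimators concrete, the following is a minimal sketch for the univariate case (q = 1). The Gaussian kernel, the simulated design, the bandwidth, and all function names are illustrative assumptions of ours and are not taken from the papers cited above.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(z, Z, Y, h):
    """Local constant (p = 0) estimate of m(z) from pooled observations."""
    w = gaussian_kernel((Z - z) / h) / h          # K_h(Z_it - z)
    return np.sum(w * Y) / np.sum(w)

def local_linear(z, Z, Y, h):
    """Local linear (p = 1) estimates of m(z) and m'(z), pooled data, q = 1."""
    w = gaussian_kernel((Z - z) / h) / h
    Zz = np.column_stack([np.ones_like(Z), Z - z])  # rows [1, Z_it - z]
    A = Zz.T @ (w[:, None] * Zz)                    # Z_z' K_z Z_z
    b = Zz.T @ (w * Y)                              # Z_z' K_z Y
    gamma = np.linalg.solve(A, b)
    return gamma[0], gamma[1]

# Hypothetical random effects design: N individuals observed over T periods.
rng = np.random.default_rng(0)
N, T = 200, 4
Z = rng.uniform(0.0, 1.0, size=(N, T))
mu = rng.normal(0.0, 0.5, size=(N, 1))      # individual effect, independent of Z
v = rng.normal(0.0, 0.5, size=(N, T))       # idiosyncratic error
Y = np.sin(2.0 * np.pi * Z) + mu + v

# Pooling stacks all NT observations and ignores the within-individual correlation.
Zp, Yp = Z.ravel(), Y.ravel()
m_nw = nadaraya_watson(0.25, Zp, Yp, h=0.1)
m_ll, dm_ll = local_linear(0.25, Zp, Yp, h=0.1)
```

Both functions use the same kernel weights; the local linear version simply adds the regressor $(Z_{it} - z)$, which is what removes the design-dependent part of the bias.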

Local Linear Weighted Least-Squares (LLWLS) Estimator
With the aim of accounting for the variance-covariance structure assumed in (7), Henderson and Ullah (2005) propose, under different specifications of the weighting matrix, feasible nonparametric random effects versions of the two estimators developed in Lin and Carroll (2000) and an alternative version of the estimator introduced in Ullah and Roy (1998). Following the same lines as in the previous subsection, estimators for $m(z)$ and $D_m(z)$ are obtained by minimizing the following criterion function with respect to $\gamma$:
$$(Y - Z_z \gamma)^\top W_z (Y - Z_z \gamma), \tag{24}$$
where $\gamma$ and $Z_z$ are defined as in (16). Let $W_z$ be a weighting matrix based on the kernel function that contains the information on the error structure. Henderson and Ullah (2005) propose the following LLWLS estimator for $m(z)$:
$$\hat m_{LLWLS}(z; h) = e_1^\top (Z_z^\top W_z Z_z)^{-1} Z_z^\top W_z Y. \tag{25}$$
The first step of this procedure is to propose a specific form for $W_z$. Writing V for the $NT \times NT$ variance-covariance matrix of the stacked errors implied by (7), Lin and Carroll (2000) use two types of weighting matrices, $W_z = K_z^{1/2} V^{-1} K_z^{1/2}$ and $W_z = V^{-1} K_z$, whereas in Ullah and Roy (1998) an estimation procedure with $W_z = V^{-1/2} K_z V^{-1/2}$ is developed. Note that when V is a diagonal matrix, these alternative specifications for $W_z$ are the same.
Furthermore, note that (25) is an infeasible estimator of $m(\cdot)$ given that the error variance-covariance matrix depends on some unknown terms, namely $\sigma_v^2$ and $\sigma_\mu^2$. Therefore, in order to obtain a feasible solution to the minimization problem (24), an estimator of this covariance matrix is necessary. Following this idea and based on the spectral decomposition of the error variance-covariance matrix, Henderson and Ullah (2005) develop a local linear feasible weighted least-squares estimator in which the unknown covariance components are replaced by consistent estimators.
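Let $\hat\varepsilon_{it} = Y_{it} - \hat m_{LLLS}(Z_{it}; h)$ be the LLLS residual. Henderson and Ullah (2005) propose to estimate the unknown terms of the variance-covariance matrix (7) as
$$\hat\sigma_1^2 = \frac{T}{N} \sum_{i=1}^{N} \bar{\hat\varepsilon}_{i\cdot}^{\,2}, \qquad \hat\sigma_v^2 = \frac{1}{N(T - 1)} \sum_{i=1}^{N} \sum_{t=1}^{T} (\hat\varepsilon_{it} - \bar{\hat\varepsilon}_{i\cdot})^2, \tag{26}$$
where $\bar{\hat\varepsilon}_{i\cdot} = T^{-1} \sum_{t=1}^{T} \hat\varepsilon_{it}$ and $\sigma_1^2 = T \sigma_\mu^2 + \sigma_v^2$. By plugging these consistent estimators into (7), an estimated variance-covariance matrix that depends only on $\hat\sigma_1^2$ and $\hat\sigma_v^2$, as defined in (26), is obtained. Then, replacing the unknown variance-covariance matrix with this estimate in $W_z$ in (25), Henderson and Ullah (2005) propose the following feasible local linear weighted least-squares (FLLWLS) estimator:
$$\hat m_{FLLWLS}(z; h) = e_1^\top (Z_z^\top \hat W_z Z_z)^{-1} Z_z^\top \hat W_z Y,$$
where $\hat W_z$ is either $K_z^{1/2} \hat V^{-1} K_z^{1/2}$ or $\hat V^{-1} K_z$ and $\hat V$ is the result of plugging (26) into (7). In addition, they show that under some standard regularity conditions, for N large and T fixed, the asymptotic bias and variance of $\hat m_{LLWLS}(z; h)$ are of order $O_p(h^2)$ and $O_p((N h^q)^{-1})$, respectively, and that the same holds for $\hat m_{FLLWLS}(z; h)$. See Lin and Carroll (2000) for a detailed proof of these results.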
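As an illustration of the plug-in step, the sketch below computes variance components from a matrix of first-stage residuals and rebuilds the T × T error covariance matrix in (7). The moment estimators coded here are the usual one-way error component formulas and are an assumption on our part about the exact form used in the cited work; the function names and the simulated residuals are hypothetical.

```python
import numpy as np

def variance_components(resid):
    """resid: (N, T) array of first-stage residuals.

    Returns (sigma2_v, sigma2_1), where sigma2_1 = T * sigma2_mu + sigma2_v.
    Assumed estimators: between- and within-individual moments.
    """
    N, T = resid.shape
    rbar = resid.mean(axis=1, keepdims=True)            # individual means
    sigma2_1 = T * np.mean(rbar**2)
    sigma2_v = np.sum((resid - rbar) ** 2) / (N * (T - 1))
    return sigma2_v, sigma2_1

def omega_hat(sigma2_v, sigma2_1, T):
    """T x T matrix sigma2_v * I_T + sigma2_mu * iota iota'."""
    sigma2_mu = max((sigma2_1 - sigma2_v) / T, 0.0)      # truncate at zero
    return sigma2_v * np.eye(T) + sigma2_mu * np.ones((T, T))

# Hypothetical residuals with sigma_mu = 0.8 and sigma_v = 0.5.
rng = np.random.default_rng(0)
N, T = 2000, 4
eps = rng.normal(0.0, 0.8, size=(N, 1)) + rng.normal(0.0, 0.5, size=(N, T))
s2v_hat, s21_hat = variance_components(eps)
Omega_hat = omega_hat(s2v_hat, s21_hat, T)
```

The eigenvalues of the resulting matrix are $\sigma_1^2$ (once) and $\sigma_v^2$ (T − 1 times), which is the spectral decomposition that makes the inverse and inverse square root cheap to compute.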
Nevertheless, note that Lin and Carroll (2000) and Henderson et al. (2008) demonstrate that these methods of accounting for the correlation could lead to efficiency losses in comparison with the working independence method proposed in Lin and Carroll (2000). Specifically, these authors argue that higher efficiency is obtained by assuming independence rather than by using the correlation structure. The reason is that, since $h \to 0$ asymptotically, the chance of having more than one observation from the same subject in the local estimation procedure is small. Hence, the observations used locally will come from different subjects, which are assumed to be independent.

Local Linear Two-Stage Least-Squares Estimator
In order to develop a procedure that enables us to consider the information of the variance-covariance matrix of the error term for the estimators and, at the same time, to improve the efficiency with respect to the LLLS estimators, in Ruckstuhl et al. (2000) a two-step nonparametric procedure is proposed.
More precisely, these authors argue that the slower rate of convergence of the LLLS estimator is due to the off-diagonal elements of $V^{-1}$, the inverse of the variance-covariance matrix of the stacked errors. To solve this, they propose to estimate a nonparametric model that depends only on the unknown function $m(\cdot)$ and an i.i.d. error term. Thus, unlike what was proposed in Lin and Carroll (2000) and Ullah and Roy (1998), the intuitive idea of the approach developed in Ruckstuhl et al. (2000) is to multiply both sides of (1) by the square root of $V^{-1}$ and add and subtract $m(Z)$, obtaining
$$V^{-1/2} Y + (I_{NT} - V^{-1/2})\, m(Z) = m(Z) + V^{-1/2} \varepsilon, \tag{29}$$
where $Y = (Y_{11}, \dots, Y_{NT})^\top$, $m(Z) = (m(Z_{11}), \dots, m(Z_{NT}))^\top$, and $\varepsilon = (\varepsilon_{11}, \dots, \varepsilon_{NT})^\top$ are NT-dimensional vectors. Note that $V^{-1/2} \varepsilon$ satisfies the independence condition because it has an identity variance-covariance matrix.
In order to provide feasible estimators of (29), in the first step Ruckstuhl et al. (2000) propose to obtain the LLLS estimator of the unknown function in (1) and the corresponding residuals, which enable us to compute the estimated error variance-covariance matrix $\hat V$ as in the previous section. Then, in the second stage, they use this result to compute $\hat Y^* = \hat V^{-1/2} Y + (I_{NT} - \hat V^{-1/2})\, \hat m_{LLLS}(Z; h)$ and regress $\hat Y^*$ on Z through the local polynomial regression method. Thus, these authors provide the following local linear two-step least-squares (LL2SLS) estimator:
$$\hat m_{LL2SLS}(z; h) = e_1^\top (Z_z^\top K_z Z_z)^{-1} Z_z^\top K_z \hat Y^*,$$
where $Z_z$ and $K_z$ are defined as in (16). Later, Martins-Filho and Yao (2009) use this two-step procedure to propose a local linear estimator in a regression model where the error term has a nonspherical covariance structure and the regressors are dependent and heterogeneously distributed. Alternatively, in Wang (2003) it is shown that the efficiency result obtained in Lin and Carroll (2000) is a natural consequence of how standard kernel methods incorporate the within-subject correlation to control the bias, but at the price of ignoring some input from correlated elements within each individual.
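The whitening step behind this two-stage idea can be checked numerically for a single individual. The sketch below is only an illustration under assumed values of $\sigma_v^2$ and $\sigma_\mu^2$; the helper names are hypothetical, and the inverse square root is computed by eigendecomposition, which is one of several valid choices.

```python
import numpy as np

def inv_sqrt_psd(A):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs / np.sqrt(vals)) @ vecs.T     # V diag(vals^{-1/2}) V'

# Assumed error component variances for one individual with T periods.
T, sigma2_v, sigma2_mu = 3, 1.0, 0.5
Omega = sigma2_v * np.eye(T) + sigma2_mu * np.ones((T, T))
Omega_inv_half = inv_sqrt_psd(Omega)

def transform_response(Y_i, m_hat_i, Omega_inv_half):
    """Second-stage response: Omega^{-1/2} Y_i + (I - Omega^{-1/2}) m_hat(Z_i)."""
    I = np.eye(len(Y_i))
    return Omega_inv_half @ Y_i + (I - Omega_inv_half) @ m_hat_i

# With the true m plugged in, the transformed error is Omega^{-1/2} eps_i,
# whose covariance Omega^{-1/2} Omega Omega^{-1/2} is the identity.
whitened_cov = Omega_inv_half @ Omega @ Omega_inv_half
```

Because the full $NT \times NT$ covariance matrix is block diagonal across individuals, applying this transformation individual by individual is equivalent to applying it to the stacked vectors.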
In order to use this information and reduce the variance simultaneously, Wang (2003) proposes a two-step procedure that achieves asymptotic improvements over the working independence technique if the covariance is correctly specified.
In order to efficiently use all the correlated data within a subject, the basic idea proposed in Wang (2003) is as follows: once a data point from one subject is near the estimation point and contributes significantly to the local estimation, all data points from this subject are used. To avoid bias, all these data points, except the one near the local estimation point, contribute through their residuals. Then, for the nonparametric model (1) this two-step procedure can be described as follows:
Step 1. Compute an initial nonparametric estimator for $m(z)$, say $\tilde m(z; h)$, using, for example, the working independence method.
Step 2. Obtain the final estimator for $m(z)$, say $\hat m(z; h)$, by solving the following kernel weighted estimating equation:
$$\sum_{i=1}^{N} \sum_{t=1}^{T} K_h(Z_{it} - z)\, G_{it}^\top \Omega_i^{-1} \big[ Y_i - m_{it}^*(\gamma) \big] = 0,$$
where $\Omega_i$ is the variance-covariance matrix (7) for the i-th subject, $Y_i = (Y_{i1}, \dots, Y_{iT})^\top$, and $G_{it}$ is a $T \times (q + 1)$ matrix whose t-th row is $Z_{zit} = [1, (Z_{it} - z)^\top]$ and whose other rows are 0. Thus, the s-th element of $m_{it}^*(\gamma)$ is $Z_{zit} \gamma$ when $s = t$ and $\tilde m(Z_{is}; h)$ when $s \neq t$. Note that $\gamma$ is defined as in (16).

Nonparametric Panel Data Models with Fixed Effects
In this section, we maintain assumptions (3), (4), and (5) about the data generating process, but we replace (8) by an assumption, labeled (32), under which $E(\mu_i \mid Z_{i1}, \dots, Z_{iT})$ is no longer restricted to be zero. This new assumption introduces statistical dependence between the heterogeneity term, $\mu_i$, and the explanatory variables, $Z_1, \dots, Z_q$. Using now (1) and (2), applying assumptions (3) to (5) and (32), and by the law of iterated expectations, we obtain specification (33), in which the conditional mean of $Y_{it}$ given $Z_{i1}, \dots, Z_{iT}$ equals $m(Z_{it}) + E(\mu_i \mid Z_{i1}, \dots, Z_{iT})$. Given the specification in (33), it is clear that direct estimation of $m(\cdot)$ through standard nonparametric techniques, as in the previous section, would result in estimators with nonnegligible asymptotic bias. As in the fully parametric case (see Hsiao (2014), Wooldridge (2010), and Baltagi (2013), for example), several estimation methods have been developed to estimate nonparametric panel data models with fixed effects of the form (33). As we will see below, they can be classified into two broad approaches. On the one hand, there is a first type of nonparametric estimators that use differencing transformations to remove the unobserved individual heterogeneity from the structural model. Thus, the unknown function of the transformed model can be estimated consistently through a direct nonparametric approach. On the other hand, a second type of estimators, in the spirit of the least-squares dummy variable (LSDV) approach, has been proposed to estimate the function of interest, that is, $m(\cdot)$. In what follows, we review the latest nonparametric literature based on both approaches. Later, we focus on the resulting estimators for different specifications of these nonparametric models, that is, allowing for additive structures of the unknown smooth function or the presence of time-lagged endogenous explanatory variables.

Profile Least-Squares Estimators
When we want to estimate $m(\cdot)$ in (33) directly, we need an estimation procedure that takes into account the information contained in the unobserved individual heterogeneity. Following the idea of the LSDV approach, a profile least-squares method can be proposed. In this section, we first analyze this method under the different identification conditions considered in Sun et al. (2009), Su and Ullah (2011), Gao and Li (2013), and Lin et al. (2014), and show why it is so important to impose strong identification conditions in this setting. Later, we focus on alternative feasible forms of the local linear approach according to Li et al. (2013).
Let $Y = (Y_{11}, \dots, Y_{NT})^\top$, $m(Z) = (m(Z_{11}), \dots, m(Z_{NT}))^\top$, and $v = (v_{11}, \dots, v_{NT})^\top$ be vectors of $NT \times 1$ dimension, and denote by $\mu = (\mu_1, \dots, \mu_N)^\top$ an N-dimensional vector and by $D = (I_N \otimes \imath_T)$ an $NT \times N$ dummy matrix. Proceeding as in Section 2.1, we choose an estimator for $m(z)$ that minimizes
$$(Y - m(Z) - D\mu)^\top K_z (Y - m(Z) - D\mu), \tag{34}$$
where $K_z$ is an $NT \times NT$ diagonal kernel weighting matrix. Let z be an interior point of the neighborhood of Z. Replacing $m(Z)$ by $\imath_{NT} m(z)$, the first-order condition with respect to $m(\cdot)$ yields the following local constant kernel estimator for $m(\cdot)$:
$$\hat m(z; h) = (\imath_{NT}^\top K_z \imath_{NT})^{-1} \imath_{NT}^\top K_z (Y - D\mu).$$
However, note that $\mu$ is not directly observable, so this local constant estimator is infeasible. To solve this, we can minimize (34) with respect to $\mu$, obtaining
$$\hat\mu = (D^\top K_z D)^{-1} D^\top K_z (Y - m(Z)). \tag{36}$$
Then, if we substitute (36) into (34) and rearrange terms, we obtain the following (concentrated) weighted least-squares criterion function:
$$(Y - m(Z))^\top W_z (Y - m(Z)), \tag{37}$$
where $W_z = M(z)^\top K_z M(z)$ and $M(z) = I_{NT} - D (D^\top K_z D)^{-1} D^\top K_z$. Consider again an interior point, z, of the neighborhood of Z. Replacing $m(Z)$ by $\imath_{NT} m(z)$, the first-order condition with respect to $m(\cdot)$ in (37) yields the following local constant kernel estimator:
$$\hat m(z; h) = (\imath_{NT}^\top W_z \imath_{NT})^{-1} \imath_{NT}^\top W_z Y.$$
It is important to highlight that the weighting matrix $W_z$ has been designed to directly remove any time-invariant term in the structural model (33). To see this, note that $M(z) D = 0$. However, since $\imath_{NT} m(z)$ is time invariant and $W_z \imath_{NT} \equiv 0$, the scalar $\imath_{NT}^\top W_z \imath_{NT}$ is zero and hence noninvertible, so the resulting estimator is infeasible.
See Lin et al. (2014) for a detailed description of this problem.
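The invertibility problem can be verified numerically. The sketch below builds the dummy matrix D and a diagonal kernel matrix for a small hypothetical panel and assumes the concentrated weighting matrix has the form $W_z = M(z)^\top K_z M(z)$ with $M(z) = I - D(D^\top K_z D)^{-1} D^\top K_z$, which is what results from profiling $\mu$ out of a kernel-weighted least-squares criterion; the Gaussian weights, panel dimensions, and variable names are our own illustrative choices.

```python
import numpy as np

N, T = 5, 3
rng = np.random.default_rng(1)
Z = rng.uniform(size=N * T)                    # stacked regressor, q = 1
z, h = 0.5, 0.3

k = np.exp(-0.5 * ((Z - z) / h) ** 2)          # Gaussian kernel weights (unnormalized)
K = np.diag(k)                                 # NT x NT diagonal kernel matrix
D = np.kron(np.eye(N), np.ones((T, 1)))        # NT x N dummy matrix, I_N (x) iota_T
iota = np.ones(N * T)

# Projection that profiles out the individual effects, and the implied weights.
M = np.eye(N * T) - D @ np.linalg.solve(D.T @ K @ D, D.T @ K)
W = M.T @ K @ M

# iota lies in the column space of D, so it is annihilated as well.
denom = iota @ W @ iota                        # numerically zero
```

Since the vector of ones is the sum of the columns of D, any weighting matrix that removes all individual dummies exactly also removes the local constant term, which is why a modified dummy matrix or an additional identification condition is needed.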
To overcome this situation, it is necessary to use a weighting matrix that removes the unobserved cross-sectional heterogeneity either completely or asymptotically and, at the same time, enables us to select only those values of $Z_{it}$ close to z. In other words, we need a weighting matrix that makes $\imath_{NT}^\top W_z \imath_{NT}$ invertible and $(\imath_{NT}^\top W_z \imath_{NT})^{-1} \imath_{NT}^\top W_z D\mu$ asymptotically negligible. If we assume that $\mu_i$ is an i.i.d. random variable with zero mean and finite variance, Lin et al. (2014) suggest removing the individual effects asymptotically by means of a new weighting matrix $W_{z0}$ with these properties. Along these lines, Su and Ullah (2006b), Sun et al. (2009), and Lin et al. (2014) propose to replace D with a modified dummy matrix $D_0$, with associated vector of effects $\mu_0$, in (34). Then, considering a local linear instead of a local constant approximation, the LLWLS criterion becomes
$$(Y - Z_z \gamma - D_0 \mu_0)^\top K_z (Y - Z_z \gamma - D_0 \mu_0),$$
where $Z_z$ and $\gamma$ are defined as in (16). As with (34), after concentrating out $\mu_0$ the quantities of interest can be estimated by minimizing the following criterion function:
$$(Y - Z_z \gamma)^\top W_{z0} (Y - Z_z \gamma), \tag{40}$$
where now the weighting matrix is $W_{z0} = M_0(z)^\top K_z M_0(z)$, with $M_0(z) = I_{NT} - D_0 (D_0^\top K_z D_0)^{-1} D_0^\top K_z$, in such a way that $M_0(z) D_0 = 0$. Let $\hat\gamma = (\hat\gamma_0, \hat\gamma_1^\top)^\top$ be the $(1 + q)$ vector of minimizers of (40). Then, the profile LLWLS estimator is
$$\hat m(z; h) = e_1^\top (Z_z^\top W_{z0} Z_z)^{-1} Z_z^\top W_{z0} Y.$$
Note that although this new weighting matrix enables us to obtain feasible estimators, the resulting estimator has an extra component in the bias term that comes from the existence of unobserved cross-sectional heterogeneity. See Theorem 2.1 in Lin et al. (2014) for the profile local constant estimator. In this framework, a standard solution is to estimate the nonparametric regression under further strong identification conditions on the individual effects. Specifically, Mammen et al. (2009), Su and Ullah (2011), and Gao and Li (2013) develop profile least-squares methods under identification conditions of this type, such as $E(\mu_i) = 0$ in Gao and Li (2013). As proved in Sun et al. (2009) for partially linear models, this stronger identification condition allows us to obtain standard asymptotic properties in the nonparametric framework and, simultaneously, to remove the individual effects.
Alternatively, Li et al. (2013) propose a profile method in which it is not necessary to pay special attention to the invertibility problem noted in Lin et al. (2014) and Gao and Li (2013). Again assuming $\sum_{i=1}^{N} \mu_i = 0$, a profile LLLS method for the nonparametric components of the regression function is proposed. But, unlike the previous methods, Li et al. (2013) first treat $\mu_0$ as known, so that a least-squares procedure for the nonparametric components can be applied. In this way, the quantities of interest are estimated by minimizing the criterion function of a local linear fit with respect to $\gamma$, which yields the estimator (42) as a function of $\mu_0$. As before, this estimator is not feasible, but we can multiply the true model by $e_1^\top (Z_z^\top K_z Z_z)^{-1} Z_z^\top K_z$ and choose the $\mu_0$ that minimizes the resulting criterion function (43). Denoting the minimizer of (43) by $\hat\mu_{PLLLS}$ and replacing $\mu_0$ with $\hat\mu_{PLLLS}$ in (42), we obtain the profile local weighted linear least-squares estimator (45). Finally, analyzing the asymptotic behavior of (45), Li et al. (2013) show that it is asymptotically normal under standard smoothing conditions, as $N \to \infty$ and for q = 1. Note that, as we will detail in later sections, these asymptotic properties are similar to the result obtained in Su and Ullah (2006b) for partially linear panel data models with fixed effects. In fact, although this profiling technique makes it possible to obtain estimators with a negligible asymptotic bias, in most cases additional assumptions such as $\sum_{i=1}^{N} \mu_i = 0$ or $E(\mu_i) = 0$ are needed. Furthermore, for most of the estimators cited above the asymptotic analysis is performed when both N and T tend to infinity. Thus, with the purpose of using standard assumptions such as those used in the previous sections of this paper and standard asymptotic theory, that is, letting N tend to infinity with T fixed, differencing transformations are studied in the following subsection.

Differencing Estimators
As pointed out previously, differencing transformations have been proposed to overcome the main difficulties of the profile techniques and to remove the individual effects from the regression model. In this section, we first review the estimators of the first-derivative function proposed in Mundra (2005) and Lee and Mukherjee (2008). Later, the iterative nonparametric kernel estimator based on a profile likelihood approach in Henderson et al. (2008) is analyzed.
As in the fully parametric case, several transformations of the model of interest enable us to remove time-invariant heterogeneity of unknown form. Among the most popular, we consider first differences and differences from the mean. The first differences transformation subtracts from the equation at time t that of time t − 1, that is,
$$Y_{it} - Y_{i,t-1} = m(Z_{it}) - m(Z_{i,t-1}) + v_{it} - v_{i,t-1}, \tag{47}$$
or that of time 1, that is,
$$Y_{it} - Y_{i1} = m(Z_{it}) - m(Z_{i1}) + v_{it} - v_{i1}. \tag{48}$$
On the other hand, differences from the mean imply subtracting from the equation at time t the within-group mean, that is,
$$Y_{it} - \frac{1}{T} \sum_{s=1}^{T} Y_{is} = m(Z_{it}) - \frac{1}{T} \sum_{s=1}^{T} m(Z_{is}) + v_{it} - \frac{1}{T} \sum_{s=1}^{T} v_{is}. \tag{49}$$
As the reader can see, the right-hand sides of (47), (48), and (49) are linear combinations of $m(Z_{it})$ for different periods t. As noted in Su and Ullah (2011), this makes the estimation of $m(\cdot)$ rather cumbersome because $m(\cdot)$ takes the form of an additive function whose elements share the same functional form.
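The three transformations can be illustrated on simulated data. In the sketch below, the individual effect is deliberately constructed to be correlated with the regressor; the design, the function m, and the helper names are hypothetical and serve only to show that each transformation leaves a combination of $m(Z_{it})$ terms and idiosyncratic errors, with $\mu_i$ removed.

```python
import numpy as np

def first_difference(X):
    """X: (N, T) array. Returns X_it - X_{i,t-1}, shape (N, T - 1)."""
    return X[:, 1:] - X[:, :-1]

def difference_from_first(X):
    """Returns X_it - X_i1 for t = 2, ..., T, shape (N, T - 1)."""
    return X[:, 1:] - X[:, [0]]

def within(X):
    """Returns X_it minus the within-individual mean, shape (N, T)."""
    return X - X.mean(axis=1, keepdims=True)

def m(z):
    """Hypothetical regression function."""
    return np.sin(2.0 * np.pi * z)

rng = np.random.default_rng(2)
N, T = 4, 5
Z = rng.uniform(size=(N, T))
mu = 3.0 * Z.mean(axis=1, keepdims=True)     # fixed effect correlated with Z
v = rng.normal(scale=0.1, size=(N, T))
Y = m(Z) + mu + v

# After each transformation the individual effect no longer appears.
fd_lhs, fd_rhs = first_difference(Y), first_difference(m(Z)) + first_difference(v)
wi_lhs, wi_rhs = within(Y), within(m(Z)) + within(v)
```

Note that the transformed regression function involves m evaluated at several time periods at once, which is the additive structure that complicates direct kernel estimation.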
Assuming m(·) admits some number of derivatives and all assumptions introduced in Section 3 are fulfilled, in Ullah and Roy (1998) it is proposed to use either first differences or mean deviation transformations and then, take a linear approximation of the unknown function m(·) around z. By doing so, they expected that the resulting first-difference and fixed-effects estimators for the marginal effects of m(·) (i.e., the partial derivatives of m(z) with respect to z) satisfy the standard properties of the local linear regression approach. However, as it is proved in Lee and Mukherjee (2008), this statement is not true because this technique provides estimators with a nonnegligible asymptotic bias.
For the sake of simplicity, let us consider the univariate problem (q = 1) of the first differences regression model (47). By approximating m(·) via a Taylor expansion we obtain for some ξ ∈ IR between Z it and z.
Analogously, the mean-deviation (i.e., within-group) transformation gives the transformed regression (51), with its corresponding transformed error term. For the transformed regression models (50) and (51), Lee and Mukherjee (2008) propose local linear estimators for the first-order derivatives, given in (52) and (53); the within-transformed counterparts of $Z_{it}$ and $v_{it}$ are defined in the same way. For the sake of simplicity, when analyzing the asymptotic properties of these estimators, Lee and Mukherjee (2008) use the leave-one-out average in (51) instead of the within-group mean.
Let $Z = (Z_{11}, \dots, Z_{NT})$ be the vector of observed values of the explanatory variable. Under the conditions established at the beginning of Section 3, these authors obtain the conditional bias of each of these two local linear estimators. These results show that both local linear estimators exhibit a nonnegligible asymptotic bias. More precisely, as can be seen in (52) and (53), the nondegenerate bias arises because the transformed regression equations are localized around $Z_{it}$ only, without taking into account the values in all other periods. Consequently, the distance between $Z_{is}$ and $z$ cannot be controlled by the bandwidth parameter $h$, so the residual terms of the Taylor approximation do not vanish. Therefore, $v_{it}(z)$ and $v_{it}$ cannot be assumed to be close, and the local linear regression approach provides inconsistent estimators because of the correlation between the transformed error terms $v_{it}(z)$ and the transformed regressors. The same applies to the within-transformed error terms and regressors.
To the best of our knowledge, there are two different strategies to overcome this problem. On the one hand, Lee and Mukherjee (2008) propose the estimation of a local within transformation that uses a locally weighted average to remove the fixed effects. On the other hand, Mundra (2005) develops a direct procedure based on the use of higher dimensional kernel weights. In the following, we detail the main particularities of both techniques.
In order to remove the unobserved individual heterogeneity and, at the same time, obtain estimators that take into account all the values of the regressors involved in the estimation, Lee and Mukherjee (2008) propose a differencing strategy that uses the locally weighted average of $Z_{it}$, for a given $z$, to remove the fixed effects. Denote this locally weighted average by $Z_{i\cdot}(z)$, and define $Y_{i\cdot}(z)$ and $v_{i\cdot}(z)$ in a similar way as the locally weighted averages of $Y_{it}$ and $v_{it}(z)$, respectively.
With these definitions, the functions of interest can be estimated by minimizing the criterion function (58); let $\hat\gamma$ denote the value of $\gamma$ that minimizes (58). Proceeding as in previous local regression problems, the local weighted linear estimator $\hat m_{LWA}(z; h)$ is obtained from $\hat\gamma$. Under the same conditions as in (54) and (55), Lee and Mukherjee (2008) derive the conditional bias and variance of this estimator, whose expressions involve $\kappa_j = \int z^jK(z)\,dz$, for $j = 2, 4$, and $\varphi_2 = \int z^2K^2(z)\,dz$.
These results show that both the conditional bias and the conditional variance tend to zero as $h \to 0$ and $NTh^3 \to \infty$, when both $N$ and $T$ tend to infinity. Therefore, $\hat m_{LWA}(z; h)$ is consistent. However, the variance term is of order $O_p(1/(NTh^3))$, which makes the rate of convergence of this estimator rather slow relative to the standard rate for this family of estimators.
Another way to overcome this problem is to use a higher dimensional kernel weight. As suggested in Mundra (2005), the bias associated with (52) can be removed by considering a local approximation around the pair $(Z_{it}, Z_{i(t-1)})$, which yields a first-difference local linear estimator. These procedures are appealing because they provide estimators of local marginal effects in a framework of differencing models. However, they are unable to identify the function $m(\cdot)$.
To estimate $m(\cdot)$ in this context, Henderson et al. (2008) propose a nonparametric kernel estimator based on an iterative profile likelihood approach. They start from a differencing transformation of (33) and, following Wang (2003) and Lin and Carroll (2006), define a likelihood function $L_i(\cdot)$ for the $i$-th individual. Defining $L_{i,tm} = \partial L_i(\cdot)/\partial m_{it}$, where $m_{it} = m(Z_{it})$, the unknown function $m(z)$ is estimated by solving the corresponding first-order condition. Based on this structure, Henderson et al. (2008) develop an iterative procedure. Specifically, let $\hat m_{[\ell-1]}(z)$ be the current estimator at the $[\ell-1]$-th step; in the next step they estimate $\hat m_{[\ell]}(z) = \hat\alpha_0(z)$, where $(\hat\alpha_0, \hat\alpha_1)$ are the minimizers of the corresponding criterion function. The procedure iterates until convergence.
Note that, as pointed out in Henderson and Parmeter (2015), obtaining the actual derivative of $m(z)$ with respect to a particular explanatory variable requires dividing $\hat\alpha_1(z)$ by the bandwidth of that variable. Under the assumption that $h_r \sim N^{-1/(4+q)}$, for $r = 1, \dots, q$, and defining $\kappa = \int K^2(u)\,du$ and $\kappa_2 = \int u^2K(u)\,du$, they obtain the limiting distribution of the estimator as $N \to \infty$ in such a way that $h_r \to 0$ and $Nh_1\cdots h_q \to \infty$. This limiting distribution involves a bounded and continuous function defined as the solution to an equation in which $d_{ts} = 1$ if $t = 1$ or $s = 1$, and $d_{ts} = -1$ otherwise.
Like other nonparametric estimators developed for differencing models, this iterative estimator has the advantage of completely removing the unobserved individual heterogeneity. However, it does not achieve the optimal rate of convergence for this type of nonparametric estimator. Alternatively, other authors propose consistent estimators of $m(\cdot)$ in this context of differencing models. On the one hand, Baltagi and Li (2002) use a series approximation to estimate the nonparametric component. On the other hand, Qian and Wang (2012) propose a two-step procedure for a partially linear model with fixed effects. As we will detail in Section 5, in the first step the fully nonparametric component, that is, $m(Z_{it}, Z_{i(t-1)})$, is estimated using a multivariate nonparametric estimator. In the second step, they turn to the marginal integration technique originally proposed in Linton and Nielsen (1995). Note that the marginal integration method presents some awkward features, such as its high computational cost. Specifically, to obtain an estimator of this type we must compute O(N T 3 h 1/2 ) operations, that is, we have to run N T 2 regressions, each of which requires O(N T h 1/2 ) operations. Therefore, other estimation techniques may be preferred.

Nonparametric Additive Panel Data Models
As already pointed out, nonparametric smoothing regression techniques have been used intensively in the last few decades because they capture features of the data that a predetermined parametric model may miss. However, they exhibit an important drawback: the curse of dimensionality. That is, the rate of convergence of nonparametric estimators slows down as the number of explanatory variables increases. Nevertheless, there are situations where the researcher needs to handle a large number of these variables. In these cases, it is recommended to estimate $m(\cdot)$ in (1) by imposing the additivity restriction (68), under which $m(\cdot)$ is a sum of univariate functions $m_1(\cdot), \dots, m_q(\cdot)$, one for each explanatory variable. In the random effects setting (see assumptions (1), (2), and (3) to (8)), direct application of standard backfitting (see Hastie and Tibshirani, 1990) or marginal integration techniques (see Linton and Nielsen, 1995) provides consistent estimators for $m(\cdot)$ and the additive functions. However, in the fixed effects setting (i.e., replacing (8) by (32)), the estimators that result from applying these techniques do not exhibit the same desirable statistical properties as in the random effects case.
Mammen et al. (2009) consider the estimation of a nonparametric additive panel data model under different forms of the unobserved heterogeneity and for two asymptotic frameworks: $N \to \infty$ with $T$ fixed, and both $N, T \to \infty$. To do so, they rely heavily on the smoothed backfitting approach introduced in Mammen et al. (1999). The nonparametric model that they propose differs from the fixed effects panel data model introduced at the beginning of this section in three main respects. First, the additivity restriction (68) is introduced in the model. Second, temporal effects are also considered. Third, the explanatory variables may include time-lagged values of $Y_{it}$. In the resulting model, the temporal effects are denoted by $\eta_t$.
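To make the additivity idea concrete, the sketch below implements the classical backfitting algorithm (in the spirit of Hastie and Tibshirani, 1990) for a cross-sectional additive model without individual or time effects. It is not the smooth backfitting estimator of Mammen et al. (2009); the Nadaraya-Watson smoother, the Gaussian kernel, the bandwidth, and the simulated components are all illustrative assumptions.

```python
import numpy as np

def nw_smooth(x, y, h):
    """Nadaraya-Watson fit of y on x, evaluated at the sample points."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)              # Gaussian kernel weights
    return (w @ y) / w.sum(axis=1)

def backfit(Z, y, h, n_iter=20):
    """Classical backfitting for y = alpha + sum_j m_j(Z_j) + error."""
    n, q = Z.shape
    alpha = y.mean()
    comps = np.zeros((n, q))
    for _ in range(n_iter):
        for j in range(q):
            # Partial residual: remove the constant and all other components.
            partial = y - alpha - comps.sum(axis=1) + comps[:, j]
            fit = nw_smooth(Z[:, j], partial, h)
            comps[:, j] = fit - fit.mean()   # normalize each component to mean zero
    return alpha, comps

rng = np.random.default_rng(1)
n = 400
Z = rng.uniform(-1.0, 1.0, size=(n, 2))
m1 = np.sin(np.pi * Z[:, 0])             # hypothetical first component
m2 = Z[:, 1] ** 2 - 1.0 / 3.0            # hypothetical second component
y = 1.0 + m1 + m2 + rng.normal(scale=0.2, size=n)

alpha_hat, comps_hat = backfit(Z, y, h=0.15)
print(np.corrcoef(comps_hat[:, 0], m1)[0, 1], np.corrcoef(comps_hat[:, 1], m2)[0, 1])
```

Each update is a one-dimensional smoothing problem, so the rate of convergence does not deteriorate with the number of regressors in the way a full q-dimensional kernel regression does.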
To understand the estimation technique, we introduce some more details on the characterization of the explanatory variables $Z_{it}$. Mammen et al. (2009) assume that $Z_{it}$ has a density on $[a, b] = [a_1, b_1] \times \cdots \times [a_q, b_q]$. The conditional density of $Z_{it}$ given that $Z_{it}$ lies in $[a, b]$ is denoted by $f^t$. For simplicity of notation, $a_1 = \cdots = a_q = 0$ and $b_1 = \cdots = b_q = 1$, so $[a, b] = [0, 1]^q$, and estimation of $m_1(\cdot), \dots, m_q(\cdot)$ is considered on $[0, 1]$. We denote by $n$ the number of observations with $Z_{it} \in [0, 1]^q$ for $i = 1, \dots, N$, $t = 1, \dots, T$. The numbers $n_t$ and $n_i$ are, respectively, the number of $Z_{it} \in [0, 1]^q$ for fixed $t$ and the number of $Z_{it} \in [0, 1]^q$ for fixed $i$. The one- and two-dimensional marginals of $f^t$ are denoted by $f^t_j(Z_j)$ and $f^t_{j,k}(Z_j, Z_k)$, respectively, and the one- and two-dimensional marginals of $f^i$ are denoted analogously.
The local constant smooth backfitting estimators of the functions $m_1, \dots, m_q$ proposed in Mammen et al. (2009) are based on kernel smoothing with a modified convolution kernel. The estimators of $m_1, \dots, m_q$, $\mu_1, \dots, \mu_N$, and $\eta_1, \dots, \eta_T$ are defined as minimizers of the smoothed least-squares criterion (70), subject to a set of constraints that involve kernel density estimators $\hat f_j(\cdot)$ based on the explanatory variables. Note from (70) that the estimators only use the values of the $Y$ variable in the smoothing if the corresponding values of the explanatory variables lie in $[0, 1]^q$. Differentiating the criterion function (70) shows that the minimizers must satisfy the conditions (74), (75), and (76), which involve marginal estimators of $m_j$, $\eta_t$, and $\mu_i$ together with kernel density estimators $\hat f_{jk}$, $\hat f^t_j$, and $\hat f^i_j$. For the sake of clarity, we reproduce here the algorithm for calculating the local constant smoother given in Mammen et al. (2009, p. 448).
Our aim here is to estimate the function $m_j(\cdot)$ at some given points $Z_j^0$. Equations (74), (75), and (76) suggest an iterative calculation of the estimators. Applying (74) for $j = 1, \dots, q$ updates $\hat m_j$: in each application, one plugs the current values of $\hat m_k$, $k \neq j$, and of both $\hat\mu_i$ and $\hat\eta_t$ into the right-hand side of (74). Afterwards, one uses (75) and (76) to update the individual and time effects.
Step 1. Set a = 0 and calculate the smoothing weights around $Z_j^0$.
The local constant estimator of $m_j$ exhibits a complicated bias. To avoid this problem, Mammen et al. (2009) propose to use local linear estimators. The intercepts and slopes of the local linear fits for the $q$ additive components, together with the individual and time effects, $\mu_1, \dots, \mu_N$ and $\eta_1, \dots, \eta_T$, are then defined as minimizers of a local linear version of the smoothed least-squares criterion, subject to analogous constraints. Adapting the steps of the algorithm proposed above for the local constant estimator yields estimators of $m_j$. Under the conditions established in Section 3 and some additional conditions detailed in Mammen et al. (2009), the asymptotic behavior of these estimators is obtained as both $T$ and $N$ tend to infinity in such a way that $T^{3/2}N^{-1+\delta} \to 0$ for some $\delta > 0$.

Nonparametric Dynamic Panel Data Models with Fixed Effects
In the previous section, we analyzed a rather complex situation in which lagged endogenous explanatory variables and additive models are allowed for in a fixed effects context. However, allowing for dynamics in fixed effects panel data models is of great interest even when the curse of dimensionality is not present. More precisely, we consider the following fixed effects nonparametric panel data model:
$$Y_{it} = m(Y_{i(t-1)}, Z_{it}) + \mu_i + v_{it},$$
where $Z_{it}$ is a $q \times 1$ vector of explanatory variables, $Y_{i(t-1)}$ a scalar lagged dependent variable, $\mu_i$ the cross-sectional heterogeneity, and $v_{it}$ the error term. All assumptions introduced at the beginning of Section 3 still hold here, with the additional feature that lagged endogenous variables appear as explanatory variables. Using a first difference transformation to remove the fixed effect, we obtain
$$\Delta Y_{it} = m(Y_{i(t-1)}, Z_{it}) - m(Y_{i(t-2)}, Z_{i(t-1)}) + \Delta v_{it},$$
where $\Delta Y_{it} = Y_{it} - Y_{i(t-1)}$ and $\Delta v_{it} = v_{it} - v_{i(t-1)}$. Note that after this differencing transformation the error term $\Delta v_{it}$ has the form of a moving average process of order 1, MA(1), which, in general, is correlated with the explanatory variable $Y_{i(t-1)}$. Therefore, in this setting, conventional kernel estimation based on marginal integration or backfitting procedures does not provide consistent estimators of $m(\cdot)$.
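The endogeneity created by first differencing can be illustrated with a short simulation. This is only a sketch: the regression function m(y, z) = 0.5 tanh(y) + sin(z), the unit variances, and the sample sizes are assumptions chosen for illustration. Because Y_i(t-1) contains v_i(t-1), the covariance between the differenced error and Y_i(t-1) equals minus the variance of v in this design, while Y_i(t-2) is uncorrelated with the differenced error.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 5000, 6

mu = rng.normal(size=N)                  # individual effects
Z = rng.normal(size=(N, T))              # exogenous regressor
v = rng.normal(size=(N, T))              # idiosyncratic errors with unit variance

# Hypothetical dynamic model: Y_it = m(Y_i(t-1), Z_it) + mu_i + v_it.
Y = np.zeros((N, T))
Y[:, 0] = mu + v[:, 0]
for t in range(1, T):
    Y[:, t] = 0.5 * np.tanh(Y[:, t - 1]) + np.sin(Z[:, t]) + mu + v[:, t]

t = T - 1
dv = v[:, t] - v[:, t - 1]               # MA(1)-type differenced error

cov_lag1 = np.cov(dv, Y[:, t - 1])[0, 1]  # population value is -Var(v) = -1 here
cov_lag2 = np.cov(dv, Y[:, t - 2])[0, 1]  # population value is 0 here
print(cov_lag1, cov_lag2)
```

The second covariance is what motivates conditioning on variables dated t-2, as in the moment conditions used for this class of models.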
In this framework, Su and Lu (2013) develop an iterative estimator based on a local polynomial regression technique. Let $U_{i(t-2)} = (Y_{i(t-2)}, Z_{i(t-1)}^{\top})^{\top}$ and assume that $U_{i(t-2)}$ has a positive density $f_{t-2}(\cdot)$ on $\phi$, where $\phi$ denotes a compact set in $\mathbb{R}^{q+1}$. Then, since $\Delta v_{it}$ is (conditionally) mean-independent of $U_{i(t-2)}$, the law of iterated expectations gives the conditional moment condition
$$E\left[\Delta Y_{it} - m(U_{i(t-1)}) + m(U_{i(t-2)}) \mid U_{i(t-2)}\right] = 0,$$
which leads to (93). Rearranging terms in (93) yields (94), an expression that involves the conditional density function of $U_{i(t-1)}$ given $U_{i(t-2)}$.
For the sake of simplicity, let $\rho_{t-2} = P(U_{i(t-2)} \in \phi)$ and $\rho = \sum_{t=3}^{T}\rho_{t-2}$. Multiplying both sides of (94) by $\rho_{t-2}/\rho$ and using the fact that $\sum_{t=3}^{T}\rho_{t-2}/\rho = 1$, we get (95). Under certain regularity conditions, (95) can be rewritten as (96), where $A$ is a bounded linear operator defined as $Am(u) = \int m(\bar u)f(\bar u \mid u)\,d\bar u$. Therefore, from (96) we can intuitively conclude that $m(\cdot)$ can be defined as the solution to a Fredholm integral equation of the second kind in an infinite-dimensional Hilbert space. However, since neither $r$ nor $Am(u)$ is directly observable, an estimator based directly on (96) is infeasible and an iterative procedure is needed. In this situation, Su and Lu (2013) propose a plug-in estimator of $m(\cdot)$ defined as the solution to (97), in which $r$ and $A$ are replaced by nonparametric estimators $\hat r$ and $\hat A$ obtained from a local polynomial regression of $p$th order. In particular, $\hat r(u)$ is obtained from the minimizer of the criterion function (98), where $|j| \equiv \sum_{i=0}^{q} j_i$ and $\hat\gamma$ stacks the $\hat\gamma_j$'s ($0 \leq |j| \leq p$) that minimize (98) in lexicographic order (with $\hat\gamma_0$, indexed by $0 \equiv (0, \dots, 0)$, in the first position, the element with index $(0, 0, \dots, 1)$ next, etc.). Analogously, $\hat Am(u)$ is defined as the estimator that results when $-\Delta Y_{it}$ is replaced by $m(U_{i(t-1)})$ in the minimization problem (98). However, a feasible estimator of $A$ requires observing the function $m(\cdot)$. For this reason, Su and Lu (2013) propose to resort to the sieve method and, after obtaining a preliminary estimate of $m(u)$, to plug it into the final regression. See Chen (2007) for an extensive review of the sieve method.
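The structure of a Fredholm equation of the second kind, m = r + Am, and its solution by successive approximation can be sketched on a grid. The example below is a numerical toy, not the estimator of Su and Lu (2013): the operator A is replaced by a hypothetical discretized kernel scaled so that it is a contraction, and r is an arbitrary known function, whereas in the text both objects are estimated by local polynomial regression.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)

# Hypothetical discretized integral operator. Each row is a smoothing kernel
# rescaled to sum to 0.5, so A is a contraction in the sup norm (assumption).
kern = np.exp(-((grid[:, None] - grid[None, :]) ** 2) / 0.1)
A = 0.5 * kern / kern.sum(axis=1, keepdims=True)

r = np.sin(2.0 * np.pi * grid)           # hypothetical known function r(u)

# Successive approximation: m_new = r + A m, starting from m = r.
m_iter = r.copy()
n_steps = 0
for _ in range(500):
    m_next = r + A @ m_iter
    n_steps += 1
    converged = np.max(np.abs(m_next - m_iter)) < 1e-12
    m_iter = m_next
    if converged:
        break

# Direct solution of (I - A) m = r for comparison.
m_direct = np.linalg.solve(np.eye(grid.size) - A, r)
print(n_steps, np.max(np.abs(m_iter - m_direct)))
```

The direct solution makes explicit the role of the inverse operator (I - A)^{-1}, which is the object that appears in the asymptotic bias of the iterative estimator.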
Let $h! = \prod_{r=0}^{q}h_r$ and $\|h\|^2 = \sum_{r=0}^{q}h_r^2$. Under certain standard smoothing conditions, Su and Lu (2013) also establish the uniform consistency and the asymptotic normality of the plug-in estimator when $N \to \infty$, $T$ is fixed, $\|h\| \to 0$, $Nh!/\log N \to \infty$, and $N\|h\|^4h! \to c \in [0, \infty)$. Before analyzing the asymptotic behavior of this plug-in estimator, it is necessary to emphasize that it only uses those observations of $U_{i(t-2)}$ that lie in a compact set $\phi$ in $\mathbb{R}^{q+1}$. This allows $U_{i(t-2)}$ to have a noncompact support and facilitates the study of the asymptotic properties of the estimator. In this context, they obtain the asymptotic distribution of the local linear estimator, in which $A$ is a Hilbert-Schmidt operator. Note that the asymptotic variance of this iterative estimator has a structure similar to that of a conventional local polynomial estimator of $m(U_{i(t-1)})$ when $m(U_{i(t-2)})$ is observed. The asymptotic bias, in contrast, differs markedly from the standard results. Specifically, it contains an additional operator, $(I - A)^{-1}$, which reflects the cumulative bias of the iterative process.
Furthermore, Su and Lu (2013) note that, although consistency and asymptotic normality of the resulting estimator are established, it is also possible to propose an oracle efficient estimator, that is, an iterative estimator that exhibits the same asymptotic properties as if the iterations were evaluated at the true parameter (function) values. Note that this concept of oracle efficiency is different from the standard concept of efficiency. As long as $Y_{i(t-1)}$ and $Z_{it}$ are compactly supported and the density function is bounded away from zero on the union of their supports ($\phi$), they propose to use the undersmoothed kernel estimator obtained previously as an estimator of $m(\cdot)$ in this step. Finally, note that since the error process is not invertible, it is not possible to apply a procedure similar to that of Xiao et al. (2003) and Su and Ullah (2006a) to obtain more efficient estimators by exploiting the MA(1) structure of this model.

Semiparametric Panel Data Models with Random Effects
As we have just shown, nonparametric panel data models are very appealing because they do not impose overly restrictive assumptions on the model specification and they let the data determine the shape of the regression function. However, as pointed out in the "Introduction," in some situations this flexibility has shortcomings. To address them, semiparametric panel data models appear as a reasonable compromise between fully nonparametric and parametric models. In fact, they make it possible to incorporate prior information coming from economic theory or past experience while retaining more flexibility in the specification of the model. Among the most popular semiparametric panel data models, we consider here the so-called partially linear models.
Instead of (1), the basic partially linear unobserved effects model can be written, for a randomly drawn cross-section observation $i$, as
$$Y_{it} = X_{it}^{\top}\beta + m(Z_{it}) + \epsilon_{it}, \qquad (100)$$
where $X_{it}$ and $Z_{it}$ are vectors of explanatory variables of dimension $d \times 1$ and $q \times 1$, respectively, $\beta$ is a $d \times 1$ vector of unknown parameters, and $m(\cdot)$ is an unknown smooth function. Both objects need to be estimated, and $\epsilon_{it}$ is an unobservable error term. Typically, the error term of the model follows a one-way error component structure as in (2). Furthermore, instead of (3), we now assume (101). In (101), the first equality establishes the relationship between $Y$ and the past values of both $Z$ and $X$, whereas the second one establishes that the regression function is the sum of a parametric term, $X_{it}^{\top}\beta$, a nonparametric function, $m(Z_{it})$, and an unobservable heterogeneity term $\mu_i$. Using (100) and (2), the assumption in (101) can be stated in terms of the idiosyncratic errors as in (102). Let $v_i = (v_{i1}, \dots, v_{iT})^{\top}$ be a $T \times 1$ vector. The error vector $v_i$ and the heterogeneity term $\mu_i$ satisfy the second-moment conditions in (103); in particular, the conditional variance of $\mu_i$ given the explanatory variables is $\sigma_\mu^2$. As in the fully nonparametric case, under the assumptions above and since the observations are independent across individuals, the variance-covariance matrix of the composed error term has the standard form as in (7). Finally, to characterize the random effects model, instead of (8) we assume (104). Using (100) and (2), under assumptions (101), (102), (103), and (104), and by the law of iterated expectations, we obtain (105). Given this result, root-$N$ consistent estimation of the parameters of interest, $\beta$, is possible through the use of standard techniques for partially linear models (see Robinson (1988) and Speckman (1988), among others).
Then, following Li and Stengos (1996) and Li and Ullah (1998), the unknown function $m(\cdot)$ is removed from (100) by taking the conditional expectation given $Z_{it}$ in (100), which, under the maintained assumption, gives (106). Subtracting (106) from (100), we obtain (107). Once the unknown term $m(Z_{it})$ has been removed, we can estimate the parameters of interest, $\beta$, by standard OLS techniques:
$$\hat\beta_{OLS} = (\tilde X^{\top}\tilde X)^{-1}\tilde X^{\top}\tilde Y, \qquad (108)$$
where $\tilde X$ is an $NT \times d$ matrix whose typical row is $\tilde X_{it} = X_{it} - E(X_{it} \mid Z_{it})$, and $\tilde Y$ is an $NT \times 1$ vector whose typical element is $\tilde Y_{it} = Y_{it} - E(Y_{it} \mid Z_{it})$. Note that both $E(X_{it} \mid Z_{it})$ and $E(Y_{it} \mid Z_{it})$ are unknown, and therefore (108) is an infeasible estimator. To overcome this problem, we typically replace the unknown quantities by the kernel estimators in (109)-(111). For the sake of simplicity, let $\hat f_{it} = \hat f_h(Z_{it})$. Replacing the unknown conditional expectations in (108) by their respective nonparametric estimators in (109)-(111), we obtain a feasible version of (108). As in other applications of kernel regression, the estimators $\hat X_{it}$ and $\hat Y_{it}$ cause technical difficulties owing to the random denominator $\hat f_{it}$, which can be small. To avoid this, we trim out small values of $\hat f$, as is done, for example, in Powell et al. (1989). For a constant $b > 0$, define a trimming indicator $I_{it}$ based on the usual indicator function, which discards observations whose estimated density $\hat f_{it}$ falls below $b$; the feasible ordinary least-squares (FOLS) estimator of $\beta$ is then computed from the trimmed observations. Under standard regularity conditions that include those assumed at the beginning of Section 4, and with $h, b \to 0$ as $N$ tends to infinity, Li and Ullah (1998) establish the limiting distribution of the FOLS estimator and provide consistent estimators of the matrices that appear in its asymptotic variance. These expressions involve the $it$-th component of the matrix in (27).
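The double-residual idea with density trimming can be sketched as follows. This is a simplified illustration on pooled observations with a scalar X and a scalar Z, not the exact FOLS estimator analyzed in Li and Ullah (1998): the Gaussian kernel, the rule-of-thumb bandwidth, the trimming constant b, and the data generating process are all assumptions made for the example.

```python
import numpy as np

def nw(z_eval, z, y, h):
    """Nadaraya-Watson estimate of E(y | z) and kernel density estimate of z."""
    u = (z_eval[:, None] - z[None, :]) / h
    w = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    f_hat = w.mean(axis=1) / h                        # density estimate
    g_hat = (w @ y) / (len(z) * h)                    # estimate of E(y|z) f(z)
    return g_hat / f_hat, f_hat

rng = np.random.default_rng(3)
n, beta = 1000, 2.0                                   # pooled sample size, true beta
Z = rng.uniform(0.0, 1.0, size=n)
X = Z**2 + rng.normal(size=n)                         # X depends on Z
Y = beta * X + np.sin(2.0 * np.pi * Z) + rng.normal(scale=0.5, size=n)

h = Z.std() * n ** (-1.0 / 5.0)                       # rule-of-thumb bandwidth
EY_hat, f_hat = nw(Z, Z, Y, h)
EX_hat, _ = nw(Z, Z, X, h)

b = 0.1                                               # trimming constant (assumption)
keep = f_hat > b                                      # drop low-density observations

x_res = (X - EX_hat)[keep]
y_res = (Y - EY_hat)[keep]
beta_hat = (x_res @ y_res) / (x_res @ x_res)          # OLS on kernel residuals
print(beta_hat)
```

The nonparametric part never has to be specified to estimate beta; it is removed by subtracting the estimated conditional means.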
This estimation strategy can also be extended to other situations of interest, such as models in which endogenous or lagged dependent variables are allowed as explanatory variables. As we will state in Section 6, Li and Stengos (1996) propose an IV method to solve the endogeneity problem. Later, Kniesner and Li (2002) analyze a dynamic semiparametric panel data model and show that, under the assumption that the error term is serially uncorrelated, it is possible to obtain a $\sqrt{N}$-consistent estimator of $\beta$ by adapting the previous density-weighting approach to this dynamic case.
Unlike other results such as those in Li and Stengos (1996), Kniesner and Li (2002) propose a two-step local linear method to estimate the smooth function $m(\cdot)$. Subtracting the (estimated) parametric part from both sides of (100), we have
$$Y_{it} - X_{it}^{\top}\hat\beta_{FOLS} = m(Z_{it}) + X_{it}^{\top}(\beta - \hat\beta_{FOLS}) + \epsilon_{it}.$$
Given that $\hat\beta_{FOLS} = \beta + O_p(N^{-1/2})$, the second term on the right-hand side vanishes faster than nonparametric rates, so the previous equation can be treated as a standard nonparametric regression problem. Then $m(Z_{it})$ can be consistently estimated through standard nonparametric regression techniques. We refer to Kniesner and Li (2002) for the asymptotic properties of this type of estimator.
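The second step of such a two-step procedure can be sketched with a generic local linear regression of the partial residual on Z. In the example below the first-step estimate is not computed; `beta_hat` is a stand-in obtained by perturbing the true value by an amount of order N^{-1/2}, purely for illustration. The kernel, bandwidth, and simulated design are also assumptions.

```python
import numpy as np

def local_linear(z0, z, y, h):
    """Local linear fit at z0: returns the estimated level and slope."""
    d = z - z0
    w = np.exp(-0.5 * (d / h) ** 2)                  # Gaussian kernel weights
    D = np.column_stack([np.ones_like(d), d])        # local design matrix
    WD = D * w[:, None]
    coef = np.linalg.solve(D.T @ WD, WD.T @ y)       # weighted least squares
    return coef[0], coef[1]

rng = np.random.default_rng(4)
n, beta = 2000, 2.0
Z = rng.uniform(0.0, 1.0, size=n)
X = Z**2 + rng.normal(size=n)
Y = beta * X + np.sin(2.0 * np.pi * Z) + rng.normal(scale=0.5, size=n)

# Stand-in for a root-N consistent first-step estimate (assumption).
beta_hat = beta + 1.0 / np.sqrt(n)

# Partial residual: Y minus the estimated parametric part.
partial = Y - X * beta_hat

m_hat, slope_hat = local_linear(0.25, Z, partial, h=0.05)
print(m_hat, np.sin(2.0 * np.pi * 0.25))
```

Because the error in the first-step estimate is of smaller order than the nonparametric rate, it has little effect on the estimate of m at a given point.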
An alternative approach is introduced in Fan and Huang (2005). The main idea is to transform the semiparametric problem into a nonparametric one by subtracting the parametric component from both sides of (100). If $\beta$ were known, $m(\cdot)$ could be estimated by a standard local linear regression. Let $\hat\gamma_0$ and $\hat\gamma_1$ be the minimizers of the corresponding locally weighted least-squares criterion. We suggest as estimators of $m(z)$ and $D_m(z) = \mathrm{vec}(\partial m(z)/\partial z)$ the quantities $\hat m(z; h) = \hat\gamma_0$ and $\hat D_m(z; h) = \hat\gamma_1$, respectively, whose expressions involve $0_q$ and $\imath_q$, the $q$-vectors of zeros and ones. However, because $\beta$ is a vector of unknown parameters that needs to be estimated, we replace $m(Z_{it})$ with $\hat m(Z_{it}; h) = \hat\gamma_0$ in (117), which leads to the regression function (121). We denote by $\hat\beta_{FLSS}$ the feasible semiparametric least-squares estimator for (121), and the corresponding local linear estimator of $m(\cdot)$ follows.
The asymptotic results for these estimators are standard in semiparametric partially linear models (see Robinson, 1988; Speckman, 1988): the presence of nonparametric components, typically estimated at nonparametric rates, does not affect the rate of convergence of $\hat\beta_{OLS}$, $\hat\beta_{FOLS}$, and $\hat\beta_{FLSS}$, which is fully parametric ($\sqrt{N}$-consistency). It is also interesting to note that the OLS and FOLS estimators exhibit the same asymptotic variance; that is, the asymptotic variance is the same regardless of whether the nonparametric components are estimated or taken at their true values. This is the so-called oracle efficiency property. However, these estimators do not achieve the semiparametric efficiency bound (see Chamberlain, 1992) because they ignore the one-way error component structure. Li and Ullah (1998) suggest exploiting the structure of the variance-covariance matrix in order to achieve this efficiency bound. As in the parametric case, they propose a feasible generalized least-squares (FGLS) semiparametric estimator of $\beta$.
Li and Ullah (1998) first propose estimators of the unknown terms in (7). Replacing the unknown terms in (7) by these estimators yields the FGLS estimator of $\beta$, in which $(X - \hat X)I$ is a matrix of dimension $NT \times d$ with typical row $(X_{it} - \hat X_{it})I_{it}$ and $Y - \hat Y$ is a vector of dimension $NT \times 1$. Under some standard regularity conditions that include the assumptions established at the beginning of Section 4, Li and Ullah (1998) derive the limiting distribution of this estimator, whose asymptotic variance involves the $it$-th element of the matrix defined in (7).
As the reader can realize, this estimator achieves the semiparametric efficiency bound for this type of problem. The estimation strategy developed in Robinson (1988) can be easily extended to other contexts within the framework of partially linear panel data models. However, it is true that the presence of heteroskedastic errors in the model of interest complicates this procedure considerably. In this context, You et al. (2010) propose an alternative method to obtain consistent nonparametric estimators that take into account the one-way error component structure and allow for unequal error variances, that is, heteroskedastic errors.
More precisely, You et al. (2010) consider a one-way error component structure with heteroskedasticity in the idiosyncratic error, where $\sigma_v^2 = 1$ is assumed without loss of generality. With the resulting variance-covariance structure, we would need an estimator for each error variance, and therefore the previous procedures cannot be used directly. You et al. (2010) develop a semiparametric weighted least-squares estimator of $\beta$ that builds on the previous results. Specifically, they propose to estimate both the variance of the error term and the error structure and then to use this information to obtain an efficient semiparametric estimator. Assuming that $Z_{it}$ is a scalar, they work with estimated residuals. Because $E(\epsilon_{it}\epsilon_{is}) = \sigma_\mu^2$ when $t \neq s$, and $E(\epsilon_{it}^2 \mid Z_{it}) = \sigma_\mu^2 + \sigma^2(Z_{it})$, consistent estimators of $\sigma_\mu^2$ and $\sigma^2(\cdot)$ can be constructed from these residuals, using a weight function $\omega_{it}(z)$ of the local linear estimator. Replacing the unknown variance-covariance matrix with its estimate gives the feasible weighted semiparametric least-squares estimator (WSLSE). Under the conditions established in Section 2, they derive the limiting distribution of this estimator as $N \to \infty$.

Semiparametric Panel Data Models with Fixed Effects
In this section, we are interested in statistical techniques that provide $\sqrt{N}$-consistent estimators of the parameters $\beta$ in (100) when the relationship between the heterogeneity term $\mu_i$ and the explanatory variables $X_{i1}, \dots, X_{iT}, Z_{i1}, \dots, Z_{iT}$ is modeled as in (133). Then, using (100) and (2), under assumptions (101), (102), (103), and (133), and by the law of iterated expectations, we obtain (134). Comparing (105) and (134), we see that direct application of the estimation techniques used in the random effects case provides asymptotically biased estimators of the parameters of interest.
As an alternative, we use tools similar to those already used in Section 3. More precisely, we distinguish between the so-called profiling techniques and the differencing methods. As in the fully nonparametric fixed effects case, we also consider the problem that arises when the nonparametric object is of high dimension, so that some additivity restriction is needed to cope with the curse of dimensionality. Profiling techniques in this context were introduced in Su and Ullah (2006b) and Zhang et al. (2011). Differencing techniques were originally proposed in Baltagi and Li (2002) and Qian and Wang (2012); the former paper estimates $m(\cdot)$ using series estimators, whereas the latter uses marginal integration techniques. Finally, we analyze the proposal of Ai et al. (2014) to ameliorate the dimensionality problem related to the explanatory variables through the estimation of an additive version of the semiparametric regression model as in (68).
Su and Ullah (2006b) propose to profile both the heterogeneity term and the nonparametric part in order to estimate the parameter vector $\beta$ consistently. Let $Y = (Y_{11}, \dots, Y_{NT})^{\top}$ be an $NT \times 1$ vector and $X = (X_{11}, \dots, X_{NT})^{\top}$ a matrix of dimension $NT \times d$. Furthermore, let $\mu_0 = (\mu_2, \dots, \mu_N)^{\top}$ be an $(N-1)$-dimensional vector and $D_d = (I_N \otimes \imath_T)d$ an $NT \times (N-1)$ matrix, where $d = (-\imath_{N-1}, I_{N-1})^{\top}$ is an $N \times (N-1)$ matrix. The standard locally weighted linear least-squares regression used to estimate the quantities of interest in (100) can then be written in matrix form in terms of a kernel weight matrix $K_z$, built from a kernel function and the determinant $|H|$ of the bandwidth matrix $H$, and an $NT \times (1 + q)$ matrix $Z_z$. This suggests as estimators of $m(z)$ and $D_m(z) = \mathrm{vec}(\partial m(z)/\partial z)$ the quantities $\hat m(z; H) = \hat\gamma_0$ and $\hat D_m(z; H) = \hat\gamma_1$, respectively, where $s(z) = e_1^{\top}S(z)$ for $S(z) = (Z_z^{\top}K_zZ_z)^{-1}Z_z^{\top}K_z$, and $e_1 = (1, 0_q^{\top})^{\top}$ is a $(1 + q) \times 1$ selection vector.
Then, by replacing (136) in the optimization problem (137), the minimizers of (137) can be written in closed form; their expressions involve a $T \times T$ smoothing matrix. Note that, by the identification condition, $\mu_1 = -\sum_{i=2}^{N}\mu_i$. Under some standard conditions, Su and Ullah (2006b) obtain the asymptotic distribution of these estimators as $N \to \infty$ with $T$ fixed.
In Zhang et al. (2011), an empirical maximum likelihood estimator of $\beta$ is proposed. This estimator has the same form as the so-called feasible profile likelihood estimator in Su and Ullah (2006b). Based on the first-order conditions of (138), Zhang et al. (2011) introduce an auxiliary random vector $\eta_i(\beta)$ designed to satisfy $E(\eta_i(\beta)) = 0$. The empirical log-likelihood function (142) is defined in terms of weights $\rho_1, \dots, \rho_N$ that are subject to the constraints (143) to (145). To find the values $\rho_1, \dots, \rho_N$, we maximize the log-likelihood function (142) subject to these constraints. Using the Lagrange multiplier method we obtain (146), and substituting (146) into (142) we obtain (147), where the Lagrange multiplier $\lambda(\beta)$ is determined by an implicit equation. We define the value of $\beta$ that maximizes (147) as the maximum empirical likelihood estimator (MELE) of $\beta$. As pointed out by these authors, $\hat\beta_{MELE}$ is identical to the profile likelihood estimator $\hat\beta_{FPL}$ in Su and Ullah (2006b). In addition, under standard conditions and with definitions similar to those in Su and Ullah (2006b), Zhang et al. (2011) derive its limiting distribution as $N$ tends to infinity.
An alternative to the profiling methods is the use of differencing techniques. Using the first differences transformation in (100), we obtain
$$\Delta Y_{it} = \Delta X_{it}^{\top}\beta + m(Z_{it}, Z_{i(t-1)}) + \Delta v_{it}, \qquad (152)$$
where $m(Z_{it}, Z_{i(t-1)}) = m(Z_{it}) - m(Z_{i(t-1)})$.
Estimation of $\beta$, as suggested in Li and Stengos (1996), can be implemented by taking expectations in (152) conditional on $(Z_{it}, Z_{i(t-1)})$,

$$E(\Delta Y_{it} \mid Z_{it}, Z_{i(t-1)}) = E(\Delta X_{it} \mid Z_{it}, Z_{i(t-1)})'\beta + m(Z_{it}, Z_{i(t-1)}). \qquad (153)$$

Subtracting (153) from (152) we obtain

$$\Delta Y_{it} - E(\Delta Y_{it} \mid Z_{it}, Z_{i(t-1)}) = [\Delta X_{it} - E(\Delta X_{it} \mid Z_{it}, Z_{i(t-1)})]'\beta + \Delta v_{it},$$

where $\beta$ can be estimated using the standard FOLS technique, in which the conditional expectations are replaced by conventional nonparametric estimators (see (108)-(111) for details). However, as pointed out in Baltagi and Li (2002), this technique presents some weaknesses. On the one hand, taking expectations conditional on $(Z_{it}, Z_{i(t-1)})$ implies having to deal with the curse of dimensionality: the nonparametric regressions on $(Z_{it}, Z_{i(t-1)})$ must be estimated by the kernel method, so the estimator has to be defined on $\mathbb{R}^{2q}$ rather than $\mathbb{R}^{q}$. On the other hand, although these authors suggest how to estimate $m(Z_{it}, Z_{i(t-1)})$, they ignore the additive structure of (152) and do not provide a nonparametric estimator for $m(Z_{it})$. In this framework, Baltagi and Li (2002) develop an estimation method based on the series approach, which makes it possible to impose the additive structure characteristic of first-differenced regression models and to propose a nonparametric estimator for $m(Z_{it})$. Alternatively, Qian and Wang (2012) present a method based on marginal integration techniques that provides an estimator for this smooth function while allowing for the presence of some endogenous explanatory variables.
Assume that $m(\cdot)$ is twice differentiable in the interior of its support $A$, where $A$ is a compact subset of $\mathbb{R}^{q}$, and that $m(Z_{it}, Z_{i(t-1)})$ belongs to the class of additive functions $M$ ($m \in M$). Then, in order to take into account the restriction that both additive functions share the same functional form, Baltagi and Li (2002) propose to approximate $m(z)$ through the series $\rho^L(z)$ of dimension $L \times 1$, where $L = L(N)$.
Note that, as Baltagi and Li (2002) emphasize, the approximating function $\rho^L(z)$ must have the following features for the series method to work: (i) $\rho^L(z) \in M$, and (ii) as $L$ increases, there is a linear combination of $\rho^L(z)$ that can approximate any $m \in M$ arbitrarily well in mean squared error.
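In this way, $\rho^L(z)$ approximates $m(z)$ and $\rho^L(Z_{it}, Z_{i(t-1)}) = \rho^L(Z_{it}) - \rho^L(Z_{i(t-1)})$ approximates $m(Z_{it}, Z_{i(t-1)}) = m(Z_{it}) - m(Z_{i(t-1)})$. Let $P$ denote the matrix whose typical row is $\rho^L(Z_{it}, Z_{i(t-1)})'$. For any scalar or vector function $W(z)$, $E_M(W(z))$ denotes the element of $M$ that is closest to $W(z)$ among all the functions in $M$. For the sake of simplicity, let us define $\theta(z) = E(X \mid Z = z)$ and $m(z) = E_M(\theta(z))$, so that expression (152) can be written in matrix form as

$$\Delta Y = \Delta X\beta + M + \Delta v, \qquad (156)$$

where $\Delta Y$ and $M$ are $NT$-dimensional vectors with typical elements $\Delta Y_{it}$ and $m(Z_{it}, Z_{i(t-1)})$, respectively; $\Delta X$ and $\Delta v$ are defined analogously.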
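Multiplying both sides of (156) by $\bar P = P(P'P)^{-1}P'$ and subtracting the resulting expression from (156), we obtain

$$\Delta Y - \Delta\tilde Y = (\Delta X - \Delta\tilde X)\beta + (M - \tilde M) + \Delta v - \Delta\tilde v,$$

where $\Delta\tilde Y = \bar P\,\Delta Y = P\hat\gamma_Y$ and $\hat\gamma_Y = (P'P)^{-1}P'\Delta Y$; $\tilde M$, $\Delta\tilde X$, and $\Delta\tilde v$ are defined analogously. Thus, the least-squares estimator for $\beta$ is defined as

$$\hat\beta = [(\Delta X - \Delta\tilde X)'(\Delta X - \Delta\tilde X)]^{-1}(\Delta X - \Delta\tilde X)'(\Delta Y - \Delta\tilde Y),$$

whereas for the smooth function $m(z)$ they propose $\hat m(z) = \rho^L(z)'\hat\gamma$, where $\hat\gamma = (P'P)^{-1}P'(\Delta Y - \Delta X\hat\beta)$. Under standard conditions of the series approach, Baltagi and Li (2002) establish the asymptotic properties of both estimators. We refer to the appendix in Baltagi and Li (2002) for the proofs of these results. Finally, note that they extend these results to the situation in which endogenous explanatory variables are allowed. As we will see in Section 6, they use a semiparametric regression model with IVs to deal with the endogeneity problem and to provide consistent estimators.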
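The differencing-plus-series idea can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation of Baltagi and Li (2002): it takes $q = 1$, uses a simple power series without a constant as the hypothetical basis $\rho^L(z)$ (the constant cancels in first differences), and relies on simulated data. The names `power_basis` and `fd_series_estimator` are hypothetical.

```python
import numpy as np

def power_basis(z, L):
    """Hypothetical series terms rho^L(z) = (z, z^2, ..., z^L)."""
    z = np.asarray(z, dtype=float)
    return np.stack([z ** k for k in range(1, L + 1)], axis=-1)

def fd_series_estimator(Y, X, Z, L=4):
    """First-difference series estimator. Y, Z: (N, T); X: (N, T, d)."""
    dY = (Y[:, 1:] - Y[:, :-1]).ravel()
    dX = (X[:, 1:] - X[:, :-1]).reshape(-1, X.shape[2])
    B = power_basis(Z, L)                                  # (N, T, L)
    # Rows rho^L(Z_it) - rho^L(Z_i(t-1)): the additive structure is imposed
    # by using the same coefficients for the current and lagged terms.
    P = (B[:, 1:] - B[:, :-1]).reshape(-1, L)

    def proj(A):
        # Projection of A on the column space of P.
        return P @ np.linalg.lstsq(P, A, rcond=None)[0]

    rY, rX = dY - proj(dY), dX - proj(dX)
    beta = np.linalg.lstsq(rX, rY, rcond=None)[0]          # least squares for beta
    gamma = np.linalg.lstsq(P, dY - dX @ beta, rcond=None)[0]
    return beta, gamma

# Simulated panel with fixed effects correlated with the regressors.
rng = np.random.default_rng(0)
N, T, beta0 = 500, 3, 1.5
mu = rng.normal(size=N)
Z = rng.uniform(0.0, 1.0, size=(N, T)) + 0.5 * mu[:, None]
X = (Z + mu[:, None] + rng.normal(size=(N, T)))[:, :, None]
Y = beta0 * X[:, :, 0] + Z ** 2 + mu[:, None] + 0.1 * rng.normal(size=(N, T))
beta_hat, gamma_hat = fd_series_estimator(Y, X, Z, L=4)
```

Given `gamma_hat`, the smooth function would be estimated as `power_basis(z, 4) @ gamma_hat`, which identifies $m(\cdot)$ only up to an additive constant because the constant is removed by differencing.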
An alternative to the approach of Baltagi and Li (2002) can be found in Qian and Wang (2012), who propose an estimator for the nonparametric component, $m(\cdot)$, that does not suffer from the curse of dimensionality. Once (100) is rewritten in terms of $m(Z_{it}, Z_{i(t-1)})$, however, the estimation of $m(\cdot)$ is cumbersome, as pointed out before, because $m(Z_{it}, Z_{i(t-1)})$ is an additive function.
Qian and Wang (2012) propose a noniterative method based on the marginal integration technique. More precisely, they develop a two-step procedure in which they first use conventional multivariate nonparametric techniques, such as the Nadaraya-Watson or the local linear regression estimators, and then obtain the function $m(\cdot)$ through marginal integration of the first-step estimator. Thus, using the local linear regression procedure to estimate $m(Z_{it}, Z_{i(t-1)})$, Qian and Wang (2012) propose to solve the locally weighted linear least-squares problem (166) for $\alpha$, where $z_1$ and $z_2$ are points in the interior of the support of $f(\cdot)$. Let $\hat\alpha$ be a minimizer of (166). The estimator $\hat m(z_1, z_2; H)$ of $m(Z_{it}, Z_{i(t-1)})$ in (167) is then a linear smoother built from $K_z$ and $Z_z$, which are now $N(T-1) \times N(T-1)$ and $N(T-1) \times (1+2q)$ matrices, respectively: $K_z$ is the diagonal matrix of kernel weights and the typical row of $Z_z$ is $(1, (Z_{it} - z_1)', (Z_{i(t-1)} - z_2)')$. Note that if our interest lies in the estimation of the partial derivatives of $m(\cdot)$, that is, $D_{m1}(z) = \mathrm{vec}(\partial m(z_1, z_2)/\partial z_1)$ and $D_{m2}(z) = \mathrm{vec}(\partial m(z_1, z_2)/\partial z_2)$, it would be enough to minimize (166) with respect to $\gamma_0$ and $\gamma_1$. Thus, we could propose as estimators $\mathrm{vec}(\hat D_{m1}(z_1; H)) = \hat\gamma_0$ and $\mathrm{vec}(\hat D_{m2}(z_2; H)) = \hat\gamma_1$, respectively. However, since the objective of these authors is to provide an estimator for the unknown function $m(Z_{it})$, they propose to integrate marginally the estimated function $\hat m(z_1, z_2)$, that is,

$$\hat m(z_1) = \int \hat m(z_1, z_2; H)\, q(z_2)\, dz_2, \qquad (168)$$

where $q(\cdot)$ is a predetermined density function.
With the aim of avoiding the strict identification restrictions usually required by the marginal integration technique, such as the assumption $\int m(z_1) q(z_1)\, dz_1 = 0$ proposed in Hengartner and Sperlich (2005), as well as numerical integration methods such as Simpson's or trapezoidal rules, Qian and Wang (2012) develop an alternative strategy. In particular, they propose to generate i.i.d. samples $Z^*_k$, $k = 1, \ldots, NT$, from the distribution $q(\cdot)$ and to compute the Monte Carlo average

$$\hat m_{MC}(z_1) = \frac{1}{NT}\sum_{k=1}^{NT} \hat m(z_1, Z^*_k; H). \qquad (169)$$

As emphasized in Qian and Wang (2012), if $NT$ is large enough, $\hat m_{MC}(\cdot)$ approximates $m(\cdot)$ considerably well. Moreover, if we choose $q(\cdot)$ to be the density function of $Z_{it}$, the sample version of (167) can be used rather than (168); that is, the estimator (170) averages $\hat m(z_1, \cdot\,; H)$ over the observed values of the conditioning variable. Under standard conditions of the marginal integration technique, these authors show that the nonparametric estimator (170) is asymptotically equivalent to (168) when $q(\cdot)$ is the density function of $Z_{it}$, is bounded and twice differentiable, and satisfies $\int m(z_1) q(z_1)\, dz_1 = 0$. They thus obtain the asymptotic distribution of this estimator, in which $\bar\sigma^2 = T^{-1}\sum_{t=2}^{T}\sigma^2_t$ and $H_m(z_1)$ is the Hessian matrix of $m(\cdot)$ evaluated at $z_1$. Analyzing these asymptotic results in detail, Qian and Wang (2012) point out that if $Z_{it}$ is i.i.d. across $t$ as well as across $i$, and $q(\cdot) = f(\cdot)$, the asymptotic variance takes the conventional form $T |H|^{1/2} f(z_1)^{-1}$. In addition, when $Z_{it}$ is accurately predictable from $Z_{i(t-1)}$, the conditional density function $f(z_1 \mid z_2)$ is close to zero, except in a small neighborhood of $z_2$, and this method can fail. Finally, note that if $m(z_1, z_2)$ is estimated using Nadaraya-Watson kernel smoothing, the asymptotic variance remains unchanged but the asymptotic bias is different. See Qian and Wang (2012) for further details.
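The two-step logic described above (a first-step bivariate smoother followed by averaging over the second argument) can be sketched as follows. The code assumes $q = 1$, a Gaussian product kernel with a common scalar bandwidth, and averaging over the observed lagged values as the sample version of the integral; these choices, the function names `local_linear_2d` and `marginal_integration`, and the noise-free simulated data are illustrative assumptions rather than the exact procedure of Qian and Wang (2012).

```python
import numpy as np

def local_linear_2d(y, Z1, Z2, z1, z2, h):
    """Local linear estimate of E(y | Z1 = z1, Z2 = z2), Gaussian kernel."""
    D = np.column_stack([np.ones_like(Z1), Z1 - z1, Z2 - z2])
    k = np.exp(-0.5 * (((Z1 - z1) / h) ** 2 + ((Z2 - z2) / h) ** 2))
    coef = np.linalg.solve(D.T @ (k[:, None] * D), D.T @ (k * y))
    return coef[0]                        # intercept = fitted value at (z1, z2)

def marginal_integration(y, Z1, Z2, z1, h):
    """Average the first-step estimate over the observed second argument."""
    return np.mean([local_linear_2d(y, Z1, Z2, z1, zk, h) for zk in Z2])

# Noise-free illustration with m(z) = 2z, so that the differenced regression
# function is m(z1) - m(z2) = 2 z1 - 2 z2 and the result can be checked exactly.
rng = np.random.default_rng(0)
n = 300
Z_cur = rng.uniform(0.0, 1.0, size=n)     # stand-in for Z_it
Z_lag = rng.uniform(0.0, 1.0, size=n)     # stand-in for Z_i(t-1)
dY = 2.0 * Z_cur - 2.0 * Z_lag
m_hat = marginal_integration(dY, Z_cur, Z_lag, z1=0.5, h=0.3)
```

In this illustration `m_hat` equals $m(0.5)$ minus the sample average of $m(Z_{i(t-1)})$, which shows that the integrated estimator recovers $m(\cdot)$ only up to a constant unless a normalization such as $\int m(z_1) q(z_1)\, dz_1 = 0$ holds.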
However, despite the great advantages that the procedures above offer for empirical analysis, the dimensionality problem characteristic of nonparametric models remains unsolved. As stated previously, when the dimension of the nonparametric component is large we have to deal with the curse of dimensionality. In order to avoid the slower rates of convergence of these nonparametric estimators, a possible solution is to analyze an additive alternative expression for $m(\cdot)$. More precisely, substituting (68) into (100) we obtain

$$Y_{it} = X_{it}'\beta + m_1(Z_{1it}) + \cdots + m_q(Z_{qit}) + \mu_i + v_{it}, \qquad i = 1, \ldots, N;\; t = 1, \ldots, T,$$

where now $m(\cdot) = (m_1(\cdot), \ldots, m_q(\cdot))'$ is a vector of unknown functions to be estimated and the remaining components are defined as in (100). In this context, Ai et al. (2014) propose to combine the polynomial spline series approximation with the profile least-squares procedure to obtain a semiparametric least-squares dummy variables (SLSDV) estimator for the parametric component and a series estimator for the nonparametric term. Under very weak conditions, these authors show that the SLSDV estimator is asymptotically normal and that the series estimator achieves the optimal rate of convergence of the nonparametric regression. Later, with the aim of obtaining estimators that exhibit the oracle efficiency property, a two-step local polynomial procedure is developed based on a series method that makes it possible to impose the additive structure of the $m(\cdot)$ function. Since the nonparametric smoothing spline technique is beyond the scope of this study, we refer to Ai et al. (2014) for a detailed analysis of the proposed procedure and the study of the main asymptotic properties of the resulting estimators.
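As a rough illustration of how an additive series approximation can be combined with a dummy-variable (within) transformation, consider the sketch below. It is not the estimator of Ai et al. (2014): a simple power series per component replaces their polynomial splines, the fixed effects are removed by within-individual demeaning, and the data are simulated. The names `within` and `additive_series_lsdv` are hypothetical.

```python
import numpy as np

def within(A):
    """Subtract individual means over time; A has shape (N, T, ...)."""
    return A - A.mean(axis=1, keepdims=True)

def additive_series_lsdv(Y, X, Z, L=3):
    """Y: (N, T); X: (N, T, d); Z: (N, T, q). Returns (beta, gamma).

    Each component m_j is approximated by a power series of order L
    (assumption: a stand-in for a polynomial spline basis).
    """
    N, T, q = Z.shape
    d = X.shape[2]
    basis = np.concatenate([Z ** k for k in range(1, L + 1)], axis=2)  # (N, T, q*L)
    R = within(np.concatenate([X, basis], axis=2)).reshape(N * T, -1)
    coef = np.linalg.lstsq(R, within(Y).reshape(N * T), rcond=None)[0]
    return coef[:d], coef[d:]

# Simulated additive partially linear panel with correlated fixed effects.
rng = np.random.default_rng(0)
N, T = 300, 4
mu = rng.normal(size=N)
Z = rng.uniform(-1.0, 1.0, size=(N, T, 2)) + 0.5 * mu[:, None, None]
X = (Z.sum(axis=2) + mu[:, None] + rng.normal(size=(N, T)))[:, :, None]
Y = (2.0 * X[:, :, 0] + Z[:, :, 0] ** 2 + np.sin(Z[:, :, 1])
     + mu[:, None] + 0.1 * rng.normal(size=(N, T)))
beta_hat, gamma_hat = additive_series_lsdv(Y, X, Z, L=3)
```

The number of series coefficients grows as $q \times L$ rather than exponentially in $q$, which is the practical sense in which the additive restriction mitigates the curse of dimensionality.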

Semiparametric Panel Data Models with Endogeneity
As we have already remarked in the fully nonparametric case, there exist many applied problems in which it is necessary to include lagged dependent variables as explanatory variables. Furthermore, the presence of endogeneity is also frequent in applied econometrics. In most cases, it is common to use IV techniques to solve these problems. For example, Li and Stengos (1996) and Baltagi and Li (2002) consider the estimation of partially linear dynamic panel data models using IV methods.

Endogenous Partially Linear Panel Data Models with Random Effects
Consider the partially linear panel data model introduced in Section 4 with the random effects specification. Instead of assuming $E(v_{it} \mid X_{it}, Z_{it}, \mu_i) = 0$, we are willing to assume only $E(v_{it} \mid Z_{it}, \mu_i) = 0$. In this context, Li and Stengos (1996) develop an IV technique that follows the proposal in Robinson (1988). These authors use a kernel estimation method to remove $m(\cdot)$ before proposing an estimator for $\beta$. Taking expectations conditional on $Z_{it}$ on both sides of (100) and subtracting the resulting expression from (100), the regression model to estimate is

$$Y_{it} - E(Y_{it} \mid Z_{it}) = [X_{it} - E(X_{it} \mid Z_{it})]'\beta + \epsilon_{it},$$

where $\epsilon_{it}$ denotes the error term of the transformed model. Assuming there is a vector of instruments $W_{it} \in \mathbb{R}^{d}$ that satisfies $E(\epsilon_{it} \mid Z_{it}, W_{it}) = 0$ and $E(X_{it} W_{it}') \neq 0$, the endogeneity problem can be avoided using the IV approach, that is,

$$\hat\beta_{IV} = (\tilde W'\tilde X)^{-1}\tilde W'\tilde Y,$$

where $\tilde W$ and $\tilde X$ are $NT \times d$ matrices whose typical rows are $\tilde W_{it} = W_{it} - E(W_{it} \mid Z_{it})$ and $\tilde X_{it} = X_{it} - E(X_{it} \mid Z_{it})$, respectively, whereas $\tilde Y$ is an $NT$-dimensional vector whose typical element is $\tilde Y_{it} = Y_{it} - E(Y_{it} \mid Z_{it})$. However, as in Li and Ullah (1998), the conditional expectations $E(W_{it} \mid Z_{it})$, $E(X_{it} \mid Z_{it})$, and $E(Y_{it} \mid Z_{it})$ are unknown and can be replaced with their consistent estimators, $\hat W_{it}$, $\hat X_{it}$, and $\hat Y_{it}$, respectively, to obtain feasible IV estimators for the parametric component of (100). Thus, Li and Stengos (1996) propose the feasible IV estimator obtained by replacing these conditional expectations with their kernel estimates, where $\hat X_{it}$, $\hat Y_{it}$, and $\hat f_h(Z_{it})$ are defined as in (109), (110), and (111), respectively, and $\hat W_{it}$ is defined analogously.
As pointed out in Section 4, in order to avoid the technical difficulties owing to the random denominator, $\hat f_{it}$, we again trim out small values of $\hat f_{it}$. For a constant $b > 0$, we define $I_{it} = 1(|\hat f_{it}| > b)$, where $1(\cdot)$ is the usual indicator function, and the feasible IV estimator for $\beta$ is written with each summand multiplied by $I_{it}$. Under some standard regularity conditions, these authors show that this IV estimator is asymptotically normal,

$$\sqrt{N}(\hat\beta_{IV} - \beta) \xrightarrow{d} N\left(0, \Phi^{-1}\Omega(\Phi')^{-1}\right),$$

where $\Phi = T^{-1}\sum_{t=1}^{T} E(\tilde W_{it}\tilde X_{it}' f^2_{it})$ and $\Omega = T^{-2}\sum_{t=1}^{T}\sum_{s=1}^{T} E(\epsilon_{it}\epsilon_{is}\tilde W_{it}\tilde W_{is}' f^2_{it} f^2_{is})$. Similarly, following this procedure and allowing for the presence of lagged dependent variables in the vector $X_{it}$, Baltagi and Li (2002) propose an alternative estimator that pays special attention to the choice of the instruments in order to avoid weak IVs. Finally, note that in both studies the authors leave the estimation of the nonparametric component for future research.
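The partialling-out and IV steps can be sketched as follows. This is a simplified illustration under stated assumptions and not the exact estimator of Li and Stengos (1996): $Z_{it}$ is scalar, the conditional expectations are estimated by Nadaraya-Watson regression with a Gaussian kernel, observations with a small estimated density are trimmed, no density weighting is applied, and the data are pooled and simulated. The names `nw_fit` and `semiparametric_iv` are hypothetical.

```python
import numpy as np

def nw_fit(Z, A, h):
    """Nadaraya-Watson estimates of E(A | Z) at the sample points and the
    kernel density estimate f_hat at each Z_j. Z: (n,), A: (n, k)."""
    U = (Z[:, None] - Z[None, :]) / h
    K = np.exp(-0.5 * U ** 2) / np.sqrt(2.0 * np.pi)
    f_hat = K.mean(axis=1) / h
    A_hat = (K @ A) / K.sum(axis=1, keepdims=True)
    return A_hat, f_hat

def semiparametric_iv(Y, X, W, Z, h, b):
    """IV estimator on kernel-partialled-out data. Y: (n,), X, W: (n, d), Z: (n,)."""
    stacked = np.column_stack([Y, X, W])
    fitted, f_hat = nw_fit(Z, stacked, h)
    resid = stacked - fitted                 # Y, X, W minus estimated E(. | Z)
    keep = f_hat > b                         # trimming indicator I = 1(f_hat > b)
    d = X.shape[1]
    Yt, Xt, Wt = resid[keep, 0], resid[keep, 1:1 + d], resid[keep, 1 + d:]
    return np.linalg.solve(Wt.T @ Xt, Wt.T @ Yt)

# Simulated data: X is endogenous through u, W is a valid instrument.
rng = np.random.default_rng(0)
n = 2000
Z = rng.uniform(0.0, 1.0, size=n)
W = (Z + rng.normal(size=n))[:, None]
u = rng.normal(size=n)
X = W + (0.8 * u + rng.normal(size=n))[:, None]
eps = u + 0.5 * rng.normal(size=n)
Y = 1.0 * X[:, 0] + np.sin(2.0 * np.pi * Z) + eps
beta_iv = semiparametric_iv(Y, X, W, Z, h=0.1, b=0.05)
```

In this design the true coefficient is 1, and least squares on the partialled-out data would be biased upward because $X_{it}$ and the error share the component `u`; the instrument removes that bias.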

Endogenous Partially Linear Panel Data Models with Fixed Effects
Consider a partially linear model as in (100) that fulfills all the conditions established in Section 5. Furthermore, as in Section 6.1, instead of assuming $E(v_{it} \mid X_{it}, Z_{it}, \mu_i) = 0$, we are willing to assume only $E(v_{it} \mid Z_{it}, \mu_i) = 0$. Then, in order to avoid the incidental parameters problem, (100) is first-differenced. Taking as a benchmark the technique developed in Li and Stengos (1996), Qian and Wang (2012) propose to estimate the linear component $\beta$ in a regression model such as

$$\Delta Y_{it} - E(\Delta Y_{it} \mid Z_{it}, Z_{i(t-1)}) = [\Delta X_{it} - E(\Delta X_{it} \mid Z_{it}, Z_{i(t-1)})]'\beta + \Delta v_{it}.$$

Assuming there is a vector of instruments, $W_{it} \in \mathbb{R}^{d}$, and replacing the unknown quantities $E(\Delta Y_{it} \mid Z_{it}, Z_{i(t-1)})$, $E(\Delta X_{it} \mid Z_{it}, Z_{i(t-1)})$, and $E(\Delta W_{it} \mid Z_{it}, Z_{i(t-1)})$ by their consistent estimators, the feasible IV estimator is of the form given in (179), where $I_{it}$ is defined as in (113), and the estimated counterparts of $\Delta X_{it}$ and $\Delta W_{it}$ are defined as in (180). Note that this technique makes it possible to avoid the random denominator problem that is usual in the nonparametric estimation of the regression model, but at the cost of having to define the resulting estimator on $\mathbb{R}^{2q}$ rather than $\mathbb{R}^{q}$. Adapting the assumptions in Li and Stengos (1996) and imposing that $f_h(Z_{it}, Z_{i(t-1)})$ is a bounded density function that is at least first-order partially differentiable with a Lipschitz-continuous remainder term, Qian and Wang (2012) obtain the asymptotic distribution of this estimator as $N$ tends to infinity, where $f_{it} = f_h(Z_{it}, Z_{i(t-1)})$ and $f_{is} = f_h(Z_{is}, Z_{i(s-1)})$.

Conclusion
In this paper, we have provided an intensive review of the recent developments for semiparametric and fully nonparametric panel data models that are linearly separable in the innovation and the individual-specific term. We have analyzed these developments under two alternative model specifications: fixed and random effects panel data models. More precisely, in the random effects setting we have focused our attention on the analysis of some efficiency issues that have to do with the so-called working independence condition. In the fixed effects setting, to cope with the so-called incidental parameters problem, we have considered two different estimation approaches: profiling techniques and differencing methods. We have also been interested in the endogeneity problem and in the use of IVs in this setting. In addition, for practitioners, we have also shown different ways of avoiding the so-called curse of dimensionality problem in pure nonparametric models. In this way, semiparametric and additive models appear as a solution when the number of explanatory variables becomes large. Note that Su and Ullah (2011) and Chen et al. (2013) focus on similar models, although here we include the most recent results and pay special attention to the so-called incidental parameters problem as well as to endogenous explanatory variables.