Assessing skewness, kurtosis and normality in linear mixed models

Linear mixed models provide a useful tool to ﬁt continuous longitudinal data, with the random e ﬀ ects and error term commonly assumed to have normal distributions. However, this restrictive assumption can result in a lack of robustness and needs to be tested. In this paper, we propose tests for skewness, kurtosis, and normality based on generalized least squares (GLS) residuals. To do it, estimating higher order moments is necessary and an alternative estimation procedure is developed. Compared to other procedures in the literature, our approach provides a closed form expression even for the third and fourth order moments. In addition, no further distributional assumptions on either random e ﬀ ects or error terms are needed to show the consistency of the proposed estimators and tests statistics. Their ﬁnite-sample performance is examined in a Monte Carlo study and the methodology is used to examine changes in the life expectancy as well as maternal and infant mortality rate of a sample of OECD countries.


Introduction
Linear mixed models are often used to study intra-group correlation patterns.They have attracted considerable attention, e.g., in biomedical, social and economic sciences.For mathematical tractability, a common assumption in these models is that the random effects and the error terms are normally distributed.When this is the case, it is well known that maximum likelihood estimation (MLE) and restricted maximum likelihood estimation (RMLE) perform quite well; see, e.g., [2,5,6].However, conclusions derived under these restrictive assumptions may not be robust to departures from Gaussianity, especially when data show multimodality and skewness; see, e.g., [16].
The impact of misspecification in the random effects distribution has been extensively investigated in the literature; see, e.g., [9,10,15,16].However, there seems to be no general consensus about its effect and the proposed alternatives are restricted to specific mixed models.Formal tests to detect mixture distributions in the random effects have also been proposed.For example, [7] provides a goodness-of-fit test for both random effects and error terms, but the statistic does not have an exact χ 2 distribution and it is a bit cumbersome to implement in practice.More recently, [11] proposed to check the normality of the random effects using gradient functions.
Therefore, it is of practical interest to develop a simple test that enables to check the normal distribution assumption of both random effects and error terms.This is not an easy task, however, because in linear mixed models the lack of Gaussianity can arise in more than one component of the regression error.The identification of which component is causing the departure from normality is then crucial.This paper is concerned with the efficient estimation and testing of linear mixed models when standard distributional assumptions such as normality or symmetry cannot be justified.Specifically, some very simple and intuitive tests to detect departures from normality in the form of skewness and/or kurtosis are proposed based on moment conditions, for which estimators of the higher order moments are necessary.
Note that despite the great importance of higher order moments for statistical inference, there are few references in the literature that define and study estimators for higher than second order moments.Two examples are [3,8] but the methods they advocate exhibit some weaknesses: the first is not easily extended to multivariate problems, and the second is not valid if symmetry fails in practice.To overcome this situation, [14] developed an alternative method that provides consistent estimators of higher order moments.However, their technique is somewhat problematic for the third and fourth moments.In these cases there are several choices of estimating equations that can provide consistent estimators, so finding efficient ones becomes a critical issue.
Thus, the aim of this paper is twofold: (i) to develop a new approach leading directly to efficient estimators of the higher order moments of the error term, thereby solving the efficiency issue in [14]; (ii) to propose tests for detecting departures from normality for both random effects and error terms, based on a proposal made by [4] in the context of longitudinal models.The method proposed here has the following promising features.First, it results in closed form expressions for the estimators of the higher order moments, including the third and fourth order moment.Second, using generalized least squares (GLS) residuals, no further distributional assumptions on either random effects or error terms are needed to show the consistency of these estimators.Third, the proposed tests enable one to identify departures from normality in the form of skewness and/or kurtosis of each component of the regression error, jointly or separately.Fourth, the estimators of the higher order moments allow one to study distributional properties of the estimators through their Fourier transform.To the best of our knowledge, all these results are original.For methodological reasons, we restrict our attention to the linear mixed model, but the underlying ideas are applicable to handle non-linear, semi-parametric or nonparametric models as well.
Finally, in order to illustrate the feasibility and possible gains of the proposed method, a Monte Carlo study is conducted to assess the finite-sample performance of our proposed estimators and test statistics.As a concrete example, an empirical study based on data from the Organization for Economic Co-operation and Development (OECD) is carried out to measure and assess the importance of various factors determining two commonly accepted measures of health care outcomes: life expectancy and infant mortality.
The rest of the article is organized as follows.In Section 2, we introduce the linear mixed model and describe the estimation method.In Section 3, the corresponding asymptotic properties are studied.In Section 4, we derive some tests to detect departures from normality in the form of skewness or kurtosis, and we study their asymptotic properties.Section 5 contains some simulation results and an empirical application to illustrate the usefulness of the method.Section 6 presents our main conclusions.All proofs are collected in the Appendix.

Statistical model and estimation procedure
Assume that data are available from a linear mixed model of the form where i ∈ {1, . . ., n} and j ∈ {1, . . ., i } denote the group index and the measurements within this group, respectively, i is the sample size within group i, and n is the number of subjects.Also, y i j is the response variable corresponding to the jth observation in the ith group, while x i j is a p × 1 vector of covariates.Further, the relation between x i j and y i j described in Eq. ( 1) contains an intercept parameter α, a p × 1 fixed effect parameter vector β, and some random effects b i , all of which are unknown.All these quantities are also perturbed by random errors i j .Throughout this paper, it is assumed that i j and b i are independent and identically distributed (iid), and i j is independent of all b i and x i j for all i and j.Without loss of generality, we further assume that the expectations of the random effects and errors are zero so that β is identifiable.Otherwise, unknown nonzero expectations can be incorporated in the intercept.
For each k ∈ {2, . . ., 8}, let ) be the kth moments of the unobserved random effects and the idiosyncratic errors, respectively.This paper is concerned with the efficient estimation and testing of these unknown terms.Since these estimators are based on a suitable combination of the GLS residuals of (1), in this section we first obtain the estimators of the parameters of interest, β and α, and later we focus on the estimation of γ k b and γ k .In all our developments, we are unwilling to impose any condition on the statistical relationship between b i and the covariates of the model.Rewriting the regression model (1) in vectorial form, we have where for any i ∈ {1, . . ., n}, v i is a composed error term, y i = (y i1 , . . ., y i i ) , v i = (v i1 , . . ., v i i ) , and i = ( i1 , . . ., i i ) are i -dimensional vectors for subject i, x i = (x i1 , . . ., x i i ) is an i × p dimensional matrix, and ı i is a unitary vector of length i .
In order to obtain consistent estimators for β, a standard solution is to sweep out b i by the i × i idempotent transformation matrix where I i is a i × i identity matrix.By regressing Q i y i on Q i x i , the ordinary least squares (OLS) estimators for β are obtained as in [14].However, as pointed out by [13], the OLS estimators are the best linear unbiased estimators (BLUEs) under model (3), but not under the original model (2) since they ignore the correlation within units.As it is well known, for all j, the variance-covariance matrix of the composed error term v i of Eq. ( 2) is of the form . Following [12], the square root of the inverse matrix of With the aim of obtaining spherical disturbances, we premultiply (2) by V −1/2 i , which leads to the expression Note that the resulting estimator from ( 4) is infeasible since V −1/2 i depends on some unknown terms, i.e., γ 2 and γ 2 b .Therefore, in order to get a feasible OLS estimator of β, this covariance matrix must be estimated.Based on the spectral decomposition of Ω i , we propose to replace the unknown covariance components by their consistent estimators, i.e., γ 2 and γ 2 b , that are computed using the methods proposed in the following subsection.Then, assuming for a moment that γ 2 and γ 2 b are given, the generalized least squares (GLS) estimators have the form where x is an i × p matrix and ÿi = y i −ı i y is an i -dimensional vector defined in terms of where N = 1 + • • • + n is the overall sample size, As stated in Section 3, under weak conditions on the group sizes and the design variables, β is the BLUE estimator for β under the true model.Therefore, efficient estimators for the kth order moments of the errors can be proposed using the GLS residuals.
Since we aim at considering much more general situations than those covered by [3] and, at the same time, we want to address the efficiency issue of [14], a variant of the fundamental lemma presented by the latter is proposed.In particular, instead of looking for a suitable combination of the polynomial functions of v i j = b i + i j as they propose, this alternative approach is based on the mean deviation error term of the form i j = i j − i• , where i In order to illustrate the proposed technique, we define the following nonlinear function for each m ∈ {1, . . ., k}: The following lemma can be used to derive estimating equations for γ k = E( k i j ) .
This lemma is slightly different from the one obtained in [14], but it still maintains its usefulness since many of the terms in the expansion of f k m (i) will vanish when we take expectations.In addition, it is very useful from an empirical point of view.The fundamental lemma presented in [14] provides several polynomial functions of the error term, the f k m (i)'s, that have to be combined in a proper way in order to obtain efficient estimates for γ k .The transformation proposed here addresses this efficiency issue since it enables one to obtain unique f k m (i)'s that provide efficient estimators for each γ k .Also, note that this proposal can be easily extended to longitudinal, semi-parametric and nonparametric models with additive structure.

Estimation of second order moments
Focus first on the estimation of the second order moments of the random effects and error terms.From Lemma 1, we obtain the following two polynomial functions of the error Upon rearranging terms, we find f 2 1 (i) = 0 and E{ f 2 2 (i)} = ( i − 1)γ 2 .Averaging over all i ∈ {1, . . ., n} and replacing the unknown i j by the residuals i j , we see that the estimator of γ 2 is of the form , In addition, since E(v i j v ih ) = γ 2 b , when j h, and E(v 2 i j ) = γ 2 b + γ 2 , the estimator for γ 2 b can be written as where v i j = y i j − α − x i j β.

Estimation of third order moments
In this section we consider the estimation of higher order moments of the random error terms with minimal variance.As before, using Lemma 1 for the third order moments of the idiosyncratic error term, we obtain Rearranging terms, we find f 3 2 (i) = 0 and f 3 1 (i) = 0.Then, unlike what happens in [14], Lemma 1 provides a unique polynomial function of the error term that leads to an estimating equation from which the estimator for γ 3 can be directly obtained.Taking expectations on f 3 3 (i), we get from which the estimator for γ 3 has the form For the estimation of γ 3 b , note that E(v i j v ih v i ) = γ 3 b when j h and h .In this situation, the estimator for γ 3 b can be written as

Estimation of fourth order moments
Finally, for the estimator of the fourth order moments of the error terms, we get from Lemma 1 that ) = 0 and f 4 1 (i) = 0, after rearranging terms.Lemma 1 then yields a unique expression for the estimation of the fourth order moment of the idiosyncratic error term.Taking expectations on f 4 4 (i), we find so that the corresponding estimator has the form Finally, for the fourth order moment estimator of the random effects, we propose where γ 2 and γ 2 b are the consistent estimators for γ 2 and γ 2 b , respectively, obtained previously.

Asymptotic properties
To study the asymptotic properties of the proposed GLS estimators, we first give a set of regularity conditions.
Comparing the results of this theorem with the corresponding in Theorem 2.1 in [14], it can be emphasized that a BLUE estimator for the true model can be obtained taking quasi-temporary differences instead of using a mean deviation transformation as those one in (3).More precisely, if γ 2 b is relatively smaller than γ 2 , it results that λ 1 and the OLS and GLS estimators will be very different.In fact, β GLS will be more efficient than OLS since its variance-covariance matrix will be smaller.In the opposite case, when λ 1, both estimation procedures will provide similar results.
In what follows we focus on the properties of γ k and γ k b , for k ∈ {2, 3, 4}.Theorem 2. Under Assumptions 1-4, when γ 4 and γ . Theorem 2 reveals that although the estimators that we propose here for γ 2 and γ 2 b are a bit different from those obtained in [14], their asymptotic properties are exactly the same.This result will be corroborated in the simulations.In addition, in Appendix A it is shown that γ 2 and γ 2 b have iid representations, i.e., and In the following theorems, the asymptotic normality of the estimators for the third and fourth order moments of the error term is established.
Based on the results of Theorems 3 and 4 we can conclude that once again the proposed estimators for γ 3 , γ 3 b , γ 4 , and γ 4 b are a bit different from those obtained in [14], but their asymptotic results are similar.In addition, the proposed estimators achieve the corresponding minimum variance and they are as efficient as the moment estimators based on the true unknown terms i j and b i .
To sum up, it can be pointed out that the alternative lemma proposed in this paper effectively provides a unique estimating equation for all higher order moments of the error term even when standard distributional assumptions are violated.Thus, this method enables one to solve the efficiency issue discussed in [14].In principle, one could go further beyond the fourth moment.For practical issues, however, when one wants to study distributional properties of our estimators through their Fourier transform, an expansion involving the first few moments is enough.

Test statistics
With the aim of detecting departures from normality of either random effects or error terms, jointly and separately, in this section we propose tests for skewness, kurtosis and normality based on the previous higher order estimators.First, we focus on the skewness and kurtosis tests and derive their limiting distribution under arbitrary skewness and kurtosis coefficients.Later, we extend the results to the specific normal setting.Finally, we focus on a normality test and develop its limiting properties.

Tests for skewness and kurtosis
When we are interested in testing for skewness in both random effects and error terms, the quantities of interest for each component are , whereas for kurtosis the corresponding quantities are b and γ k be the consistent estimators obtained previously for γ k b and γ k , respectively, when k ∈ {2, 3, 4}.We propose the following statistics to construct tests for skewness whereas to test for kurtosis the proposed statistics are of the form In the following, we focus on the limiting distribution of these statistics under any conjectured coefficients of skewness and kurtosis.
Theorem 5.Under Assumptions 1-4, when γ 6 b and γ 6 are finite, as n → ∞, When the underlying distribution is symmetric, s b = 0 and s = 0.Then, standardizing the above skewness test statistics we obtain the following corollary.
The following corollary can then be exploited to test whether the distribution of the random effects and error terms is mesokurtic.
Corollaries 1 and 2 imply that, under the null hypothesis, the proposed statistics for skewness and kurtosis are asymptotically standard normal.Simple Student t tests can thus be implemented by standardizing the corresponding statistics with the square root of a consistent estimator of their variances.

Test for normality
As pointed out by [1], the above skewness and kurtosis tests are useful since they can be used to test whether the data conform to any distribution of interest.In this section, we propose to extend the normality test proposed in [1] for time series data to a linear mixed model.Then, we are able to detect whether the random effects or the error terms departure from normality, jointly and separately.
Let π b3 and π b4 be the statistics for skewness and kurtosis evaluated under the null hypothesis of normality of the random effects, i.e., s b = 0 and κ b = 3, respectively.Then, the proposed statistic to construct tests for normality of the random effects is . Similarly, let π 3 and π 4 be the statistics for skewness and kurtosis evaluated under the null hypothesis of normality of the error terms, i.e., s = 0 and κ = 3, respectively.The proposed statistic to test for normality of the error terms is Theorem 7.Under Assumptions 1-4 and assuming that γ 8 b and γ 8 are finite, under the null hypothesis of normality, as n → ∞, π 34 χ 2 2 and π b34 χ 2 2 .Under the normality assumption, it can be proved that π 3 and π 4 are asymptotically independent, and similarly for π b3 and π b4 .Thus, the proof of this theorem follows the same lines as the corresponding statements for Theorem 5 and it is therefore omitted.
Looking at the results of Corollaries 1-2 and Theorem 7, it can be pointed out that the statistics proposed in this section have the following appealing features.First, the skewness test statistics are valid even if the null hypothesis is not normally distributed, albeit symmetric.Second, in [1] and [4] it is pointed that one of the main weaknesses of this type of statistics for kurtosis is that it can be influenced by skewness.However, the statistics proposed in this paper are not affected by this problem.Third, the statistics proposed for the random effects are not affected by skewness or kurtosis in the remainder terms.The same holds for the statistics proposed for the error terms.Fourth, the proposed normality tests allow to detect whether one specific term departs from normality.Fifth, all these tests can be computed directly without having to use bootstrap techniques as in [4].The only requirement is to compute consistent estimators up the eight moment of the random effects and error terms, but they are easily obtained using Lemma 1.These consistent estimators are given in Appendix B.

Monte Carlo simulation and application
To study the possible gains of the estimation procedures proposed in this paper, a simulation study was carried out.In Section 5.1, we compare the performance in small samples of these estimators with respect to the ones proposed in [14].As further evidence of the usefulness of the proposed method, Section 5.2 contains an application to a real data set.

Simulation study
Consider the linear mixed model defined, for all i ∈ {1, . . ., n} and j ∈ {1, . . ., i }, by where α = 1 and β = (1, 2) .The group values are randomly drawn from a Poisson distribution with mean 5. Also, the explanatory variables are jointly normal with zero means, unit variances and correlation 0.6.
where N(0, 1) refers to the univariate standard normal distribution, t 8 denotes the univariate Student t distribution with 8 degrees of freedom, and Γ(1, 1) is the univariate Gamma distribution with both scale and shape parameters equal to 1, i.e., a unit exponential distribution.
The following simulation results are based on 1000 samples of data in which the number of subjects is either 50, 100 or 150.The estimated mean, standard deviation (Std) and the root mean squared error (RMSE) of the estimators of the model parameters are listed in Table 1, while the results for the higher order moments are collected in Table 2.For the sake of comparison, in Table 1 the OLS estimators are included in brackets, while in Table 2 it is the estimators of the moments of [14] that are in brackets.
As can be seen from Tables 1-2, the proposed estimators perform quite well.As the sample size n increases, both the estimated standard deviation and the RMSE are lower in the five cases considered.In addition, when we compare the behavior between the proposed estimators in this paper and those obtained in [14], one can note that our estimators perform slightly better in all cases considered.This difference is reduced as the sample size increases, as one might expect from the theoretical results.
5.1.2.Finite-sample performance of the test statistic.In order to assess the performance of the proposed tests statistics, we used the same data generating process as in (10) and explored their effectiveness under the distributional processes (a)-(e).
In Table 3 we report the proportion of rejections over 1000 Monte Carlo replications, and the results are based on either 10%, 5% or 1% nominal size.For the sake of practical implementation, the following Wald tests were used: π 2  3 and π 2 b3 are used to test for skewness, π 2 4 and π 2 b4 for kurtosis, and π 34 and π b34 for normality.Thus, under the corresponding null hypothesis, the skewness and kurtosis statistics have χ 2  1 asymptotic distribution, whereas π 34 and π b34 follow a χ 2 2 distribution.Table 3, case (a), reports the results for the experiment when both random effects and error terms follow a standard normal distribution.Looking at these results, it can be pointed out that all proposed tests show a good empirical size for the different sample sizes.Specifically, tests for skewness in starts with oversized figures for n = 50 (0.132 when α = 0.05) but reduces to 0.126 as the sample size increases.The test for kurtosis starts with undersized values for n = 50 (0.038 when α = 0.05) but the level reaches 0.051 as n increases.Further, the tests for joint skewness and kurtosis achieve the correct empirical size as n increases in a similar way to the test of skewness.Meanwhile, for the random effects it can be seen that all tests are undersized for the smaller sample size, but all of them almost achieve the corresponding level of significance as n → ∞.
In addition, as noted in [1], there are numerous distributions for which tests based on higher order moments are not valid.In order to corroborate this fact, in cases (b)-(e) the performance of the proposed tests was analyzed under different non-Gaussian distributions.In particular, in case (b) we assumed b i ∼ 0.5 t 8 , so the power of the kurtosis tests for this term should increase in a similar way as the sample size.In fact, looking at this result it can be seen that effectively the power of the kurtosis test in b i increases with n, but the tests for skewness and kurtosis in i j achieve the empirical size.An opposite behavior is observed in case (e), where i j ∼ 0.5 t 8 is assumed.These results show that kurtosis in one component does not affect tests for kurtosis in the other error component.
In case (c) we assumed b j ∼ 0.5 Γ(1, 1) − 0.5.As it is well known, this gamma function is not symmetric and presents excess of kurtosis.Then, the tests for skewness and kurtosis for b i should have a non-trivial power.The opposite can be said for case (d), where i j ∼ 0.5 Γ(1, 1) − 0.5 is assumed.As the reader can see, all these results are corroborated in Table 3.
Finally, it can be pointed out that through these Monte Carlo experiments it has been proved that the methodology proposed in this paper enables us to obtain more efficient estimators in finite samples for the higher order moments of both components of the error term which are very useful for the empirical analysis.In addition, the proposed tests enable us to assess whether the data conform to any distribution of interest.In addition, they are not affected by the presence of skewness and/or kurtosis in the other component.

Application to a real data set
In order to illustrate our proposed method, in this section we perform an empirical analysis of the health care delivery system of the OECD countries.Furthermore, we compare our GLS results with the standard OLS estimators.
These data are from the OECD Health Data 2015 and the Education Statistics of The World Bank.With the aim of having enough sample variability for all covariates, we consider 19 OCDE countries, i.e., all OECD nations except Belgium, Canada, Chile, France, Germany, Greece, Ireland, Italy, Korea, Norway, Portugal, Sweden, Switzerland, the Netherlands, and Turkey.After removing the missing observations we obtain a dataset for which we do not have the same number of time observations per country.The total sample size of this study is 225 observations.The variables included in this study are defined in Table 4. Before specifying the structural model of health outcome, we wish to stress issues related to the characteristics of the health care delivery system of the OECD countries.In the last decade, expenditures on health and health care have increased considerably mainly due to medical advancement, population aging, and rising public demand.Specifically, as described in the OECD database, health expenditure as a percentage of gross domestic product (GDP) was around 7.23% in 2000, and this figure has increased in recent years reaching 8.99% in 2013.Nevertheless, in recent years there has been a slight reduction in this growth mainly due to the global economic crisis.In this situation, it is of great interest to be able to identify the relative importance of various factors in key aspects of health outcomes such as life expectancy or maternal and infant mortality in order to develop efficient public policies.Nevertheless, these estimators can lead to unreliable inference in those cases in which life expectancy or maternal and infant mortality follow a skewed distribution.Therefore, testing for skewness or normality is an interesting issue.
Specifically, in this application we propose to assess the impact of economic, institutional, and social factors in the health outcome, approximated through the life expectancy and the infant and maternity mortality rate.With this aim, we consider two different regression models.
In the first regression model, the dependent variable is life expectancy (LIFE), which represents the typical number of years that a person at that age can be anticipated to live, presuming that age-specific mortality levels remain constant.In the second regression model, the dependent variable is the maternal and infant mortality rate (MORT) that encompasses the number of deaths of children under one year of age that occurred in a given year.
Assuming that LIFE and MORT are functions of the countries' medical investment, the quality of the health care system, and the educational level of the population, the regression models to estimate are the following: where i ∈ {1, . . ., 19} denotes the individuals (i.e., countries) and j ∈ {1, . . ., i } is the number of observations per individual.Tables 5 and 6 collect the regression results of life expectancy and maternal and infant mortality, respectively, whereas Table 7 reports the skewness, kurtosis, and normality tests for both regression models.
Analyzing the results in Tables 5 and 6, it can be pointed out that the signs of the parameter estimates of all the variables in the life expectancy and mortality regression are as expected.From the life expectancy point of view, more medical resources (PHYS), a higher level of immunization (INMUNI), a higher health expenditure (HEXP), a higher literacy rate (EDU), and a better access to medical technology (TECH) have a positive impact.Contrary, high in-patient utilization (PATIENT) and high alcohol consumption (ALCO) reduce life expectancy.Also, higher in-patient utilization (PATIENT) and a greater number of children immunized (INMUNI) are related with an increment in the mortality rate, whereas the rest of the explanatory variables of this regression reduce the mortality rate.
In addition, in the first regression all the explanatory variables considered, except ALCO, are statistically significant at the 10% level or better using the statistical test proposed previously.For the second regression, all the explanatory variables, except PATIENT and INMUNI, are statistically significant even at the 1% level.Then, higher supply of physicians and the inversion in medical training and education should be the primary focus of the OECD national health care policy.Further, several differences can be noticed when we compare the results of the OLS and GLS estimators.Specifically, the OLS estimators show more significant coefficients, while the GLS method proposed in this paper is more conservative.
In addition, from Table 7 it is found that the idiosyncratic error term for the life expectancy regression model is symmetric but with large kurtosis, and the joint test for skewness and normality is rejected at the 5% significance level.Meanwhile, for the infant and maternity mortality rate, one finds that i j is symmetric with kurtosis close to the normal value of 3 and the joint test is rejected at the 5% significance level.As to the country level component, in both regression models it is shown to be symmetric with a kurtosis coefficient close to the normal value.Then, based on these results we can conclude that alternative estimation techniques for generalized linear mixed models should be used in order to provide more reliable results.This issue is left for future work.

Conclusion
Linear mixed models provide a useful tool to fit continuous longitudinal data, and the random effects and error terms are commonly assumed to have a normal distribution.However, these restrictive assumptions may result in a lack of robustness against departures from normality and invalid statistical inferences, especially when data exhibit multimodality and skewness.Therefore, it is of practical interest to test for normality.In this paper, we propose a very simple and intuitive test for skewness, kurtosis, and normality based on GLS residuals.This requires the estimation of higher order moments, and an alternative estimation procedure is developed to this end.Compared to other procedures in the literature, our method is very appealing since it results in closed form expressions for the estimators even for the third and fourth order moments.In addition, no further distributional assumptions on either random effects or error terms are needed to show the consistency of the proposed estimators and tests statistics.
Finally, to illustrate the value and practicality of this new methodology, the finite-sample behavior of the proposed estimators and tests statistics has been analyzed through several simulation experiments.A study of the health care delivery system of the OECD countries was also carried out.As a result it was shown that health care expenditures, higher supply of physicians, and the inversion in medical training and education should be priority aspects of the OECD national health care policies because of their positive impact in both lengthening life expectancy and reducing infant mortality.In addition, the test for skewness of the error terms suggests that alternative estimation techniques would provide more reliable results.

Appendix A
Proof of Theorem 1.In order to prove the claims in Theorem 1, we follow the standard proofs scheme as in [14].Fix a ∈ R p .Under the conditions of Theorem 1 we have to show Using the results of Theorem 2, it can be proved that Ω i is a consistent estimator for Ω i .Also, it is easy to show that the jth element of ÿi is y i j − λy i• and similarly for ẍi .Then, the estimator (5) can be rewritten as According to this expression, β − β is a sum of zero-mean independent random vectors.Then, to obtain the convergence in distribution of this estimator it is necessary to check the variance term and Lindeberg's condition.
Hence, the residual term v i j is Using the results of Theorem 1, the higher order terms of (x i j − x i• ) ( β − β) can be ignored.Then, replacing (A2) in ( 6) the estimator for γ 2 b has the form By arguments similar to those used previously, it is straightforward to show that II b , III b , and IV b are independent variables with zero mean and finite variance.Then, we can conclude that n 1/2 II b , n 1/2 III b , and n 1/2 IV b converge in probability to zero.Using these results in (A3) and under Assumption 4 we obtain Proof of Theorem 3. As in the previous proof, using the results of Theorem 1 we can show the higher order terms of (x i j − x i• ) ( β − β) to be negligible.Therefore, replacing v i j with (A2) in ( 7) we obtain Under Assumption 4, the first expression is the leading term since the rest of the elements are negligible and the resulting equation to study is Then, by the Central Limit Theorem, the first part of Theorem 3 is proved.On the other hand, for γ 3 , we can use the previous arguments.Then, ignoring once again the higher order terms of (x i j − x i• ) ( β − β), we can write Then, by the expansion of 3  i j the expression to study is So it is proved that after centering 3 i j , Finally, the conclusion for γ 3 now readily follows from the Central Limit Theorem and Theorem 3 is proved.
Proof of Theorem 4. For γ 4 b we need arguments similar to before.Using the results of Theorem 1 we neglect again all terms that contain (x i j − x i• ) ( β − β).Then, replacing v i j with (A2), we get that the expression to study, after rearranging terms, is Based on the previous results for γ 2 and γ 2 b , it is easy to show γ 2 p − → γ 2 and γ 2 b p − → γ 2 b .Using these results and the assumptions of the theorem, after rearranging terms, we find that the equation to analyze is and by the Central Limit Theorem the first part of this theorem is proved.Finally, for γ 4 , it can be shown that Then, after centering we get and the proof of Theorem 4 follows from the Central Limit Theorem.
and rearranging the expression to analyze leads to , where α = (1, −3γ 2 , −3s γ /2) and s i j = ( 3 i j − γ 3 , i j , 2 i j − γ 2 ) are 3 × 1 vectors.Under the assumptions of Theorem 5, it is easy to show that E( i j s i j / √ N) = 0 and its variance-covariance matrix is of the form This concludes the proof of Theorem 5.
Proof of Theorem 6. Focus on the asymptotic properties of the kurtosis statistics and insert (A6) and (A8), for k = 4, into (9).Then, κ can be written as and rearranging terms leads to , where β = (1, −4γ 3 , −2κ γ 2 ) and κ i j = ( 4 i j − γ 4 , i j , 2 i j − γ 2 ) are 3 × 1 vectors.Again, under the assumptions of Theorem 6 it is straightforward to show that E( i j κ i j / √ N) = 0 and its variance-covariance matrix is Then, using the results of Theorem As before, under the assumptions of this theorem, we have E( i κ b i / √ n) = 0 whereas its variance matrix is of the form

Appendix B
In this Appendix we collect the proposed estimators for the higher order moments up to the eight moment of both the random effects and the error terms.As before, using Lemma 1 and rearranging terms the proposed estimators for the error term are of the form

Lemma 1 .
For any a, b ∈ R, write a ∧ b = min(a, b) and a ∨ b = max(a, b).We have

)
Using the result of Theorem 2, γ 2 p − − → γ 2 , and by the Central Limit Theorem the first part of this theorem is proved.Similarly, s b can be written as√ n( s b − s b ) = α b + o P (1),whereα b = (1, −3γ 2 b , −3s b γ b /2) and s b i = (b 3 i − γ 3 b , b i , b 2 i − γ 2 b ) are 3 × 1 vectors.Again, under the assumptions of Theorem 5 E( n i=1 s i / √ n) = 0 whereas its variance-covariance matrix is 4 and the Central Limit Theorem, we can conclude the proof of the first part of the theorem.Finally, focusing on the behavior of κ b and inserting (A5) and (A8), for k = 4, into (9), we find that the expression to analyze in closed form is√ n ( κ b − κ b ) = β b β b = (1, −4γ 3 b , −2κ b γ 2 b ) and κ b i = (b 4 i − γ 4 b , b i , b 2 i − γ 2 b ) are 3 × 1 vectors.19

.
The conclusion for κ b follows easily from the Central Limit Theorem and Theorem 4. This completes the proof of Theorem 6.

Table 3 :
Finite-sample performance of skewness, kurtosis and normality tests in cases (a)-(e).

Table 5 :
Estimated results for life expectancy

Table 6 :
Estimated results for maternal and infant mortality Significant at the 10% level.* * Significant at the 5% level.* * * Significant at the 1% level.

Table 7 :
Estimated results for maternal and infant mortality Significant at the 5% level.* * * Significant at the 1% level.Note: Using the results of Corollaries 1-2 and Theorem 4, the proposed test statistics follow a chi-square distribution even when standard distributional assumptions for the disturbances do not hold.