A regularized maximum likelihood estimator for the period of a cyclostationary process

We derive an estimator of the cycle period of a univariate cyclostationary process based on an information-theoretic criterion. Transforming the univariate cyclostationary process into a vector-valued wide-sense stationary process allows us to obtain the structure of the covariance matrix, which is block-Toeplitz, and its block size depends on the unknown cycle period. Therefore, we sweep the block size and obtain the ML estimate of the covariance matrix, required for the information-theoretic criterion. Since there are no closed-form ML estimates of block-Toeplitz matrices, we asymptotically approximate them as block-circulant. Finally, some numerical examples show the good performance of the proposed estimator.


I. INTRODUCTION
Cyclostationary (CS) processes [1], [2] are commonly encountered in science and engineering since they can be used to model periodic phenomena in climatology, meteorology, astronomy, economics, medicine, mechanics and communications, among others. A process is said to be CS if its statistical properties vary periodically. Since we will be considering zeromean processes and second-order cyclostationarity, only the covariance function is assumed to be periodic in its global time variable.
Most techniques that exploit cyclostationarity assume that the cycle period is known a priori. For instance, this is a typical assumption for detectors of cyclostationarity [3]- [6], beamforming [7] and many other applications (see [2] and references therein). In practice, however, the cycle period must usually be estimated from the measurements.
Some estimators of the cycle period have been proposed in the past. The authors in [8] obtained the estimate from the Loève spectrum [9]. The estimator is based on a result in [10], which states that the support of a discrete CS process in the dual-frequency domain is composed by a finite number of parallel lines, which are separated by the cycle frequency. Hence, this estimator obtains the cycle period by estimating this separation. The authors in [11] proposed an estimator based on a measure, named variability. Basically, this technique obtains the value of the variability for different cycle periods and selects the period that maximizes it. However, the variability-based estimator does not work with the so-called ill cyclostationary signals, which includes many practically relevant signals, such as a communications signal with rectangular shaping. Finally, another estimator based on the CS detector in [4] was proposed in [11]. This estimator computes the detector statistic of [4] for different cycle periods and picks the best one. This principle of searching a detector statistic over candidate values of the cycle period may also be used with other detector statistics such as that of [5], [6].
All these techniques provide ad-hoc CS period estimators suited for specific scenarios, but they are not based on sounded estimation approaches such as maximum likelihood (ML), maximum a posteriori probability or minimum mean square error, etc. To fill this gap, in this work, we obtain the ML estimator for the cycle period of a CS process. The main tool we use is the relationship between the scalar-valued CS process and a vector-valued wide-sense stationary (WSS) process, obtained by vectorizing the CS time-series with a vector length given by the cycle period [10]. Since the cycle period is unknown, we must sweep the block size (vector length) to find the optimal value. Asymptotically, this seemingly complex search is actually a simple search in the frequency domain.
Assuming Gaussianity and for a given block-size, we must obtain the ML estimate of the covariance matrix, which is block-Toeplitz. Since there are no closed-form solutions for the ML estimate of Toeplitz matrices [12], we asymptotically approximate it as block-circulant. It was shown in [13] that this likelihood converges in mean square to the likelihood with the block-Toeplitz matrix. With the ML estimate of the block-circulant matrix for a fixed block size, we must sweep the block size to find the ML estimate of the cycle period. However, as the value of the investigated cycle period increases, the number of parameters to estimate also increases, thus yielding a larger likelihood. It is therefore necessary to add a penalty term. Following [14], we propose to use an information theoretic criterion to penalize large values of the cycle period. Specifically, we used the minimum description length criterion [15], which provides us a better estimate than the ML approach, and, as we will see, this estimator outperforms previously proposed estimators.

II. PROBLEM FORMULATION
We would like to estimate the cycle period, P , of the cyclostationary (CS) time series u[n]. We assume that u[n] is a scalar-valued and zero-mean process, and the cyclostationarity is exhibited in the covariance function, i.e., where the period P is an unknown natural number. Moreover, we also assume that P ≥ 2, since the case P = 1 corresponds to wide-sense stationarity.
To derive an estimator based on an information-theoretic criterion we first need the ML estimator. In this section, following our previous work in [3], we formulate the problem in a way that makes this possible. We assume that u[n] is proper complex Gaussian with zero mean.
We first assume that the cycle period is known. Thus, let us arrange u[n] in blocks of size P , yielding the vector-valued time series which is WSS [10]. We have therefore transformed a scalarvalued CS process into a vector-valued WSS process. We can now stack N realizations of x[n] into the vector which is a stack of N P samples of u[n], i.e., The former expression simplifies the derivation of the covariance matrix of y, which is [3], . The covariance matrix R ∈ C N P ×N P is a block-Toeplitz covariance matrix with block-size P . The observations are distributed as

III. DERIVATION OF THE ESTIMATOR
In this section we derive an estimator for the cycle period P based on an information theoretic criterion. First, we obtain the ML estimator and show that it cannot be directly applied since a larger P implies more degrees of freedom, and, therefore, a larger likelihood. Hence, we penalize the likelihood following an information theoretic approach, similar to that in [14].

A. Maximum Likelihood Estimator
To obtain the ML estimator we proceed as follows: For a fixed P we obtain the ML estimate of the covariance matrix R. Then, we sweep P and select the value that maximizes the (compressed) likelihood. Assuming M independent and identically distributed realizations of y, {y m } M −1 m=0 , the loglikelihood of the observations is given by where the sample covariance matrix iŝ We first obtain the ML estimate of R for a fixed P . That is, we need to obtain the ML estimator of a block-Toeplitz covariance matrix with block size P . However, it was shown in [12] that there is no closed-form solution for the ML estimate of Toeplitz matrices. We must therefore follow a different approach, which is based on the following theorem.
Theorem 1: As the number of samples N tends to infinity, the log-likelihood in (1) converges in the mean square sense to a log-likelihood with a particular block-circulant covariance matrix, i.e., where C ∈ C N P ×N P is the block-circulant covariance matrix obtained as Here, F N is the Fourier matrix with elements given by [F N ] lm = e −j2πlm/N / √ N , and V is a block-diagonal matrix, whose kth block is given by the discrete Fourier transform (DFT) of the covariance sequence where θ k = 2πk/N , with k = 0, . . . , N − 1.
Proof: The proof is very similar to that in [13].
Taking Theorem 1 into account, we may transform the observations as z = (F N ⊗ I P ) H y, which is simply the stack of N frequencies of the discrete Fourier transform of x[n]. Thus, the random vector z is asymptotically distributed as a proper complex Gaussian with zero mean and covariance matrix V. The covariance matrix is block-diagonal, and it is therefore relatively easy to estimate. In the following lemma, we present the ML estimator.
Lemma 1: The ML estimate of V is given bŷ where diag P (Ŝ) is the operator that builds a block-diagonal matrix with block sizes P from the blocks on the diagonal of S, which is the sample covariance matrix of the observations z m , i.e., .
Based on the previous lemma, the asymptotic compressed likelihood is where P max is the maximum period that is checked.
We now show why the ML estimator does not directly solve this problem. Consider an example with P = 5, N = 128 and M = 25. As we may see in Figure 1a, there are clear peaks at the cycle period and its multiples (5, 10, 15, . . .). However, there is also increasing trend with P , which precludes the use of the maximum as the estimator of the period. This trend may be explained as follows: Keeping the size of the covariance matrix fixed, which we denote by T , a larger P leaves a smaller N , since T = N P . Therefore, the covariance matrix has more structure as P increases. As we will see later, the degrees of freedom vary between T and T 2 , for P = 1, N = T and N = 1, P = T , respectively. Here, N = 1 is the extreme case, where the covariance matrix is no longer a block-Toeplitz (or block-circulant) matrix, but just a positive definite matrix. Hence, a penalty term is necessary.

B. Minimum Description Length
We propose to use the minimum description length (MDL) criterion [15] to overcome the overfitting problem. Other criteria, such as the Akaike information criterion [17], could also be applied.
The MDL criterion for our problem is given by where ν is the number of degrees of freedom of the model, which depends on P . We now count the number of degrees of freedom. One may be tempted to count the number of degrees of freedom of a block-Toeplitz covariance matrix. However, we are actually estimating the unknown parameters of a blockcirculant matrix, that is, the elements in the block-diagonal matrix V. For a given P , the matrix V is composed by N blocks of dimension P × P . Moreover, since V is Hermitian without further structure, each block is also Hermitian without further structure. Thus, the number of degrees of freedom of each block is where the first P comes from the P real elements in the main diagonal of the kth block, and the second term is two times the number of complex elements above the main diagonal. Thus, since there are N blocks, the total number of degrees of freedom is ν = N P 2 . Plugging ν and (4) into (6), the MDL criterion becomes MDL(P ) = N P M (log π + 1) and the MDL-based estimator iŝ Figure 1b shows the MDL criterion for the experiment presented in the previous subsection. In this figure, we can observe that the period that minimizes the criterion isP = 5.

IV. NUMERICAL SIMULATIONS
In this section we evaluate the performance of the proposed estimator, and compare it with that of previous detectors. Specifically, we consider the following model for the CS process where h[n] is a Rayleigh channel with 10 taps (at the symbol rate) and a flat power delay profile, w[n] is an additive Gaussian noise, generated by a moving average model of order 19 (at the sampling frequency). The transmitted signal s[n] is QPSK with rectangular shaping. Moreover, the symbol rate is 600 Ksymbols/second and the sampling frequency is 2.4 MHz, which yields a cycle period P = 4. Finally, M = 10 realizations and N = 64, which results in 4 · 10 · 64 = 2560 samples available at the estimator.
We have chosen the following cycle period estimators for comparison. The first one, presented in [8], is based on the Loève spectrum [9]. Basically, it uses a smoothed periodogram to estimate the Loève spectrum, which is normalized by the estimated power spectral density. Then, the magnitude squared of the normalized spectrum, or coherence, is integrated along lines parallel to the stationary manifold, and the fundamental cycle frequency is the one that maximizes this measure. In particular, we estimate the smoothed periodogram with a length 2048 FFT and a smoothing window of length 256. The second competitor is based on the detector of cyclostationarity presented in [4]. Basically, it obtains the statistic for different cycle periods, and selects the period with the largest value of the statistic. In our implementation, we use the first four lags of the cyclic covariance to build the detector in [4], and a Kaiser window of length 257 to estimate the cyclic power spectral density. Figure 2 shows the probability of missed detection vs. the signal-to-noise ratio (SNR) for the proposed estimator and the two competitors. As can be seen, the proposed estimator yields the best results for moderate and high SNRs. It is also worth noticing that the estimator based on the detector in [4] does not improve with the SNR. To shed some light on this observation we plot 1000 realizations of the statistic used in [4] against P for SNR = 10 dB, and its mean (as the thick line). As can be seen the value corresponding to P = 2 also presents high values. This phenomenon is due to the fact that P = 2 captures the cyclostationarity exhibited in the second harmonic of the fundamental cycle frequency. For some realizations the statistic corresponding to P = 2 may be larger than for P = 4, and that explains why the performance of the estimator does not improve with the SNR.
Two final comments are in order. First, the approach based on the detector [4] provides very good performance for cycle periods that are prime numbers, since the aforementioned effect does not occur for such values. Second, the signal in our example is an "ill-CS" signal 1 according to the definition given in [11]. This prevents a comparison with the variability technique [11].

V. CONCLUSIONS
We have obtained the ML estimator for the period of a CS process. The first ingredient to obtain the ML estimator is the relationship between the scalar-valued CS process and a vector-valued wide-sense stationary (WSS) process, obtained by arranging the CS process in blocks of size given by the cycle period. The resulting WSS process has a block-Toeplitz 1 A ill-CS signal is a CS signal with all the diagonal elements of Q[m] equal. covariance matrix, for which there is no closed-form ML estimate. We thus asymptotically approximate it by a blockcirculant matrix, for which we may obtain a closed-form estimate. The ML estimate of the covariance matrix assumes that the cycle period is known. Since this is not true, we must therefore sweep its value and select the value that maximizes the compressed likelihood as the ML estimator. However, we showed that the ML estimator is not a valid estimator, since the larger the cycle period is, the more degrees of freedom there are, and the likelihood obviously becomes larger. To solve this issue, we penalized large values of the cycle period using the MDL criterion. Finally, simulations showed the good performance of the proposed estimator.