Efficient Iteratively Reweighted LASSO Algorithm for Cross-Products Penalized Sparse Solutions

In this paper, we describe an efficient iterative algorithm for finding sparse solutions to a linear system. Apart from the well-known L1 norm regularization, we introduce an additional cost term promoting solutions without too-close activations. This additional term, which is expressed as a sum of cross-products of absolute values, makes the problem non-convex and difficult to solve. However, the application of the successive convex approximations approach allows us to obtain an efficient algorithm consisting in the solution of a sequence of iteratively reweighted LASSO problems. Numerical simulations on randomly generated waveforms and ECG signals show the good performance of the proposed method.


I. INTRODUCTION
The reconstruction of an unknown signal of interest from noisy observations is a common problem in signal processing. Often, this signal of interest corresponds to a sparse sequence of events that can be modelled as the output of a linear system whose input is a spike train. This happens, for instance, in electrocardiography, where the observed ECG is composed of a regular sequence of waveforms generated by the propagation of electrical impulses inside the heart. In this case, the spike train corresponds to the occurrence of the heartbeats and the linear system models the path from the heart to the sensor located on the body's surface [1], [2]. As a second example, we may consider the field of Gamma spectroscopy, whose objective is to identify radioactive sources and their activity. In this case, the spike train corresponds to the arrival times of the detected particles and the linear system models the measurement process [3], [4]. These two applications (as well as many others) have another important common characteristic: the negative co-occurrence of the events modelled by the spike train. Indeed, in the case of electrocardiograms (ECGs), after a cardiac cell activation there exists a so called refractory period where the cell cannot be excited [5]. Similarly, in the analysis of spectrometric data obtained through Type I counters, the detector (e.g., a Geiger counter) has an associated dead time during which it is unable to record another particle interaction [6]. These physical constraints impose inactivity periods that must be taken into account.
On the one hand, co-ocurrence has been extensively exploited in the field of computer vision [7], [8], but it has received much less attention in other signal processing and machine learning domains. On the other hand, many sparsityaware signal processing approaches have been proposed since the introduction of the LASSO by Tibshirani in 1996 [9]: the fused LASSO [10], which imposes both sparsity and flatness of the obtained coefficients profile, the elastic net, which favors sparsity obtained by correlated variables [11], model-based compressive sensing [12], etc.
Unfortunately, very few approaches have been proposed so far to simultaneously encourage both sparsity and negative cooccurrence, i.e., a minimum distance among activations in the latent spike train. In [1]- [4] an initial spike train was obtained by means of the LASSO and a post-processing stage was then applied to keep only the strongest activations and remove those that did not respect the negative co-occurrence period. Then, in [13] a novel penalty term, based on the cross-products of the reconstruction coefficients, was added to the LASSO cost function in order to enforce negative co-occurrence. The resulting cross-products LASSO (CP-LASSO) approach resulted in a non-convex optimization problem that was solved using the successive convex approximations (SCA) technique. However, although the proposed algorithm showed a good performance, its computational cost prevented its application to large datasets.
In this paper, we introduce an efficient iterative algorithm to obtain an approximate solution of the CP-LASSO problem, through a sequence of iteratively reweighted LASSO problems, that allows us to deal with thousands of samples. The paper is organized as follows. Section II shows the problem formulation. The CP-LASSO cost function and its efficient minimization using the iteratively reweighted LASSO are described in detail in Section III. Then, Section IV shows the numerical results on randomly generated waveforms and ECG signals. Finally, Section V concludes the paper.
II. PROBLEM FORMULATION Let us consider an observed discrete-time signal, y[n], which corresponds to the output of an LTI system contaminated by noise, i.e., where * denotes the standard linear convolution, h[n] is the length L h impulse response of the LTI system, w[n] ∼ N (0, σ 2 w ) is additive white Gaussian noise (AWGN) with zero mean and variance σ 2 w , and s[n] is a sparse latent sequence: with K indicating the total number of non-zero elements (i.e., spikes, activations or events), S k their amplitudes (which are strictly positive in some applications), and δ[n] denoting Kronecker's delta. Substituting (2) into (1), we obtain Alternatively, (3) can be expressed more compactly as where H is the N × N Toeplitz channel matrix, and ] is the noise vector.

III. PROBLEM SOLUTION A. Minimization Problem: Cross-Products Penalized Cost Function
Given the observation model in (4), our goal is finding a latent vectorŝ that fulfills the following conditions: •ŝ is P -sparse with P N and P ≈ K (note that the exact value of K is often unknown in practice) in order to ensure a good localization ability of the latent activations. • Hŝ is a good approximation of the unknown noiseless signal x = Hs. • Activations are not too close in order to respect the restrictions imposed by the addressed physical problem . Taking into account these constraints, we can express our optimization problem as [13] Here, with a slight abuse of notation, the first term represents the conventional sparsity promoting L 1 -norm penalization term. The second term measures the residual error, with the parameter β allowing us to establish the desired tradeoff between the residual error and the sparsity level of the recovered activation signals. Finally, the last term aims to promote solutions with not too close activations. Thus, the third term in (7) penalizes the cross-products between samples, where g[k] can be considered as a penalization window, which in practice will be symmetric (g[k] = g[−k]) and of a rather small length (g[k] = 0, ∀|k| > L g ).
The reformulation of the optimization problem (7) in matrix/vector form yields minimize s s 1 + β y − Hs where H is given by (6), G is a symmetric Toeplitz matrix with g[k] in its ±k-th diagonal, and |s| denotes the elementwise absolute value of the vector s B. Non-Convex Optimization Algorithm: Iteratively Reweighted LASSO Unfortunately, the optimization problem in (8) is not convex, and we have to resort to some approximations in order to find a local solution. However, the application of the successive convex approximations (SCA) approach provides a simple method which can be easily interpreted as an iteratively reweighted LASSO algorithm. Thus, considering a candidate solutionŝ, we can approximate the last term in (8) as which allows us to rewrite the optimization problem in (8) with γ = 1 + 2G |ŝ|. Interestingly, the above problem reduces to a weighted LASSO problem, where the weights in the vector γ summarize the penalization factors due to the sparsity promoting L 1 term, and the cross-product penalizations based on the candidate solutionŝ. With this in mind, and after the introduction of an adaptive step-size α, the Iteratively Reweighted LASSO algorithm is summarized in Algorithm 1.
From the description in Algorithm 1, it is easy to see that the cost function is decreasing with the iterations, which suffices to guarantee the convergence to a local minimum. Note also that the staring point for our algorithm corresponds exactly to the LASSO solution. Consequently, Algorithm 1 is always going to attain a lower value of the cost function than the LASSO algorithm and with the added benefit of enforcing a minimum distance among the detected events. Finally, let us point out that all the matrix-vector products in the weighted LASSO for solving (10), as well as in Algorithm 1, involve Toeplitz matrices, and therefore can be very efficiently implemented as linear filterings.
Algorithm 1 Iterative Reweighted LASSO Algorithm for solving the Cross-Product Penalized Problem in (8).

A. Case Study 1: Random Shapes
In this first case study we consider randomly generated shapes, h[n] ∼ N (0, 1) for 0 ≤ n ≤ L h − 1 and L h = 12. We generate N = 10 4 observed samples, y[n] for 0 ≤ n ≤ N − 1, with regularly occurring activations every M = 100 samples plus a random delay ∆M k ∼ U({1, 2, . . . , 10}) for k = 0, 1, . . . , K − 1 with K = N/M = 1000. Therefore, the activation times are m k ∼ U({100k+1, 100k+2, . . . , 100k+10}) for 0 ≤ k ≤ K − 1 and their amplitudes are also normally distributed: S k ∼ N (0, 1). For the restriction period, we set L g = 30, whereas the parameter β 2 = C/σ 2 w for several tested values of C. As performance measures, we consider both the reconstruction SNR, SNR(dB) = 10 · log 10 x x (x − Hŝ) (x − Hŝ) (11) and the recovered sparsity, measured as the number of nonzero values inŝ. We test a range of SNR values from 0 to 50 dB, with the constant C ranging from 0.001 to 0.05, performing 100 simulations for each case.  Figure 1 shows the SNR(dB) as a function of the input SNR for different values of C, whereas Figure 2 shows the sparsity ofŝ. From Fig. 1 we notice that the optimum value of β depends on the SNR of the observations, with smaller values of β providing better results as the SNR increases. However, from Fig. 2 we can also see that the improved reconstruction SNR may come at the cost of a lack of sparsity in the recovered  s. Note that the largest the value of β the more emphasis we place on a good reconstruction and the less importance we give to the sparsity ofŝ. Therefore, if we do not want to lose the spike localization ability provided by a sparse recovered latent sequenceŝ, we need to select the proper value of β that allows us to achieve the desired sparsity. For instance, for an SNR = 10 dB in the observations the optimum choice (out of the tested values) is β 2 = 0.1581 (i.e., C = 0.05), as this approach provides a 0.0046 average sparsity inŝ and the best reconstruction SNR, SNR(dB) = 2.89. Interestingly, the optimum value of β 2 = C/σ 2 w in order to achieve a good reconstruction with a sparse latent signal seems to remain quite stable as the SNR in the observations increases. Table  I shows the reconstruction SNR for the optimum value of β (out of the tested ones) that attains a sparsity level similar to the sparsity level of the original signal (0.01). Note that, even though the optimum value of C decreases as the observations SNR increases, as we set β 2 = C/σ 2 w , the optimum value of β is always either β 2 = 0.1 or β 2 = 0.1581. Finally, Figure  3 shows an example of the unobserved clean signal x[n], the observed noisy signal y[n], and the recovered signalx[n] for a low SNR situation (SNR = 20 dB) and a high SNR case (SNR = 40 dB). Note that all the spikes are properly detected in the high SNR case, whereas the small ones are masked by the noise (and thus cannot be recovered) in the low SNR case.

B. Case Study 2: Electrocardiographic (ECG) Signals
As a second case study, we consider an electrocardiographic (ECG) signal. ECG signals are composed of regularly occurring waveforms (the P and T waveforms and the QRS complex) and no activity intervals (isoelectric periods), contaminated by noise and interference. Therefore, they are intrinsecally sparse signals with a latent sequence of activations corresponding to the appearance of the different waveforms. Furthermore, due to physiological resctrictions a refractory period of 200-300 ms, during which no activations can occur, exists after each activation. Consequently, ECG signals fit perfectly within our model.
In this example, we focus on the detection of the R peak in the QRS complex, since this is the largest peak in the ECG and its detection is commonly the first stage in any ECG signal processing algorithm [14], [15]. Hence, for the dictionary we consider the typical shape of a QRS complex as extracted from the first heartbeat generated by the ECGSYN synthetic ECG generator [16]. used (after normalization) as h[n] to build matrix H. Then, Figure 5 shows an example of the unobserved clean signal x[n], generated from another run of ECGSYN with f s = 360 Hz and average heart rate (HR) of 60 bpm (with 6 bpm of standard HR deviation), the observed noisy signal y[n], for an SNR = 20 dB, and the recovered signalx [n]. Note that all the QRS complexes seem to have been properly detected. In order to verify this, we apply the well-known Pan-Tompkins algorithm [17] to the recovered signalx[n], and compare the result of the R peak detection w.r.t. the clean signal x[n].    Figure 6 shows the output of the Pan-Tompkins algorithm when applied to the recovered signalx[n]: all the 64 R peaks in the signal are properly detected. Indeed, this peaks are always within 1 sample (i.e., 2.8 ms) of the true location and their average distance is 0.3281 samples (i.e., 0.9 ms). Therefore, we conclude that the proposed approach is doing a good job in extracting the relevant information (i.e., locating all the R peaks) from the noisy signal.

V. CONCLUSIONS
In this paper, we have described an efficient algorithm to recover a sparse latent signal that requires the enforcement of a minimum distance among the activations. A cost function that includes both the sparsity and the minimum distance restrictions has been introduced and we have shown that this function can be minimized through an iterative reweighted LASSO approach. The proposed method has been tested on a first example with randomly generated samples and a second example with ECG signals with excellent results. Future lines include developing a method to obtain the optimum value of the sparsity parameter, performing extensive tests on real-world signals (e.g., ECG and gamma spectrometry signals), and developing a multi-channel extension.