Multiple Importance Sampling for Efficient Symbol Error Rate Estimation

Digital constellations formed by hexagonal or other non-square two-dimensional lattices are often used in advanced digital communication systems. The integrals required to evaluate the symbol error rate (SER) of these constellations in the presence of Gaussian noise are in general difficult to compute in closed-form, and therefore Monte Carlo simulation is typically used to estimate the SER. However, naive Monte Carlo simulation can be very inefficient and requires very long simulation runs, especially at high signal-to-noise ratios. In this letter, we adapt a recently proposed multiple importance sampling (MIS) technique, called ALOE (for `At Least One rare Event'), to this problem. Conditioned to a transmitted symbol, an error (or rare event) occurs when the observation falls in a union of half-spaces or, equivalently, outside a given polytope. The proposal distribution for ALOE samples the system conditionally on an error taking place, which makes it more efficient than other importance sampling techniques ALOE provides unbiased SER estimates with simulation times orders of magnitude shorter than conventional Monte Carlo.


I. INTRODUCTION
In several communication systems the transmitted signal belongs to a non-square 2D lattice. For instance, hexagonal constellations are the densest packing of regularly spaced points in 2D and, as the number of constellation points grows to infinity, they also minimize the probability of error in Gaussian noise under an average power constraint being therefore optimal [1]. Other examples of constellations that provide nonrectangular decision boundaries are the θ quadrature amplitude modulation (QAM) family [2], and the recently proposed family of improper constellations in [3].
When these constellations are used, maximum likelihood decoding amounts to finding the closest lattice point to a given noisy observation, x, which is known to be NP-hard for generic channels [4], [5] meaning that the problem complexity is exponential on the dimension of the lattice [6]. Further, to evaluate the symbol error rate probability conditioned to a given constellation point we need to compute the probability that the observation x lies outside a polytope defining the corresponding Voronoi region. The resulting integrals are in V. Elvira  general difficult to compute in closed form for arbitrary decision regions, and typically one resorts to standard Monte Carlo simulations for performance evaluation, which can be very time consuming especially when the number of symbols of the constellation is large and the targeted symbol error rate (SER) is below 10 −6 .
As an alternative to naive Monte Carlo, importance sampling (IS) has been used as a method for variance reduction in SER or BER simulations in a wide range of scenarios since the late seventies [7]- [10] (for a review we refer the reader to [11]). Despite this, many digital communication researchers are still unaware of the potential benefits of IS techniques to characterize the statistical performance of digital communication systems.
The basic IS methodology samples from a proposal distribution that increases the number of errors during simulation, and then weights the samples by the ratios of the target to the proposal densities [12]. However, designing a good proposal with high density in regions where samples should be drawn might not be easy, and typically is problem dependent. For instance, a biased channel distribution is proposed in [13] for the simulation of orthogonal space-time block codes (OSTBCs) on Nakagami channels. Other recent work is [14], where IS is used for the estimation of the outage probability of multi-antenna receivers with generalized selection combining. A general approach is using multiple importance sampling (MIS), where several proposals are used for simulating the samples [15]- [18].
In this letter, we apply a recently proposed multiple importance sampling technique called ALOE ("At Least One rare Event") [19] to estimate the SER in additive white Gaussian noise (AWGN) channels.
The conditional symbol error probability is the integral of a Gaussian outside a polytope formed by the intersection of K hyperplanes; or, in other words, the integral over the union of the half-spaces formed by these hyperplanes. ALOE estimates this integral by taking samples from a mixture of K truncated Gaussians where errors take place. The method can be easily extended to arbitrary 2D constellations and provides unbiased SER estimated with bounded variance.

A. Problem statement
Our motivating problem is the efficient SER estimation of digital constellations like the one depicted in Fig. 1a. Assume that symbol s m , marked with a star in Fig. 1b Gaussian noise. The conditional probability of error is the integral of a Gaussian distribution centered at s m outside the shaded decision region in Fig. 1b: , and, assuming that all symbols are transmitted with equal probability, the symbol error probability is Our goal is to develop an efficient multiple important sampling (MIS) scheme to estimate p m , m = 1, . . . , M in (1).
For notational simplicity let us denote the transmitted symbols as s, and let R be its associated Voronoi region, which is a polytope defined by the intersection of finitely many hyperplanes in R 2 . The probability of interest can be then expressed as p = whereπ(x) ≜ N (s, σ 2 I) is the Gaussian distribution of the observation, and I R (x) is the indicator function taking value 1 for all x ∉ R, i.e., out of the shaded decision region in Fig. 1b. The integral in Eq. (2) is intractable in the general case where R is an arbitrary polytope. However, for some constellations such as QAM modulations, the integral can of course be obtained in a closed-form, and there is no need to resort to any simulation procedure. A naive Monte Carlo estimator of p simulates N samples fromπ(x) and approximates (2) aŝ i.e., it simulates from the observation distribution and counts the rate of observations that are out of R. This estimator is unbiased, but its efficiency decays dramatically when p is small.

B. Importance Sampling
Importance sampling (IS) is a more advanced Monte Carlo methodology, used when sampling fromπ(x) is either not possible or not efficient. The N samples are simulated instead from a so-called proposal distribution, q(x), and the estimator of p is built aŝ where w n =π (xn) q(xn) is the importance weight. Note thatp (IS) is an unbiased and consistent estimator of p.
The variance of the estimatorp (IS) in Eq. (4) is given by It can be shown that the optimal proposal, q * , that minimizes the variance is q * (x) ∝ I R (x) ⋅π(x). Intuitively, the performance is poor when the targeted integrand I R (x) ⋅π(x) and the distribution that generates the samples have a large mismatch. In our problem, the performance is very bad when we find few or no errors for the simulated observation. This explains the very poor performance of the naive Monte Carlô p (MC) , whose variance can be obtained as a particular case of (5) as with a relative root mean square error (RRMSE) of Note that N needs to be inversely proportional to the true value p in order to provide a reliable estimate, which explains why N must be huge for high signal-to-noise ratio (SNR).

C. Multiple Importance Sampling
In the case of IS, it is difficult to find a good unique proposal with a low mismatch w.r.t. the multimodal product I R (x)⋅π(x). Multiple importance sampling (MIS) is a natural extension of IS in this setup, allowing for the simulation from a set of K proposals, {q k (x)} K k=1 , instead of just one [17]. However, the extension from one to several proposals is not straightforward and many sampling and weighting schemes can be devised (see [20] for a review). A conventional way to proceed is to simulate from the mixture proposal as where α = [α 1 , ..., α K ] is a simplex vector with all nonnegative weights in the mixture such that ∑ K k=1 α k = 1. The MIS extension of Eq. (4) iŝ We recall that we aim at integratingπ(x) in region R. This region can be described as the union of all half-spaces in R 2 generated by the K hyperplanes that define the border of R (e.g, K = 6 for the symbol in Fig. 1b). More precisely, defined by the k-th hyperplane, which is parametrized by γ k and β k .
We follow the choice of proposals in [19]. First, the number of proposals, K, is equal to the number of hyperplanes, being each proposal a truncated version of the target distribution (a Gaussian centered at the received symbol) beyond each hyperplane, i.e., q k (x) = I S k (x)π (x) P k , where P k = ∫ I S k (x)π (x)dx is the integral of the target distribution beyond the hyperplane (the procedure for the efficient simulation from a generic truncated Gaussian distribution is described in Appendices A and B). For reasons that will become apparent shortly, it is useful to define p = ∑ K k=1 P k , which is an upper union bound of p. 1 Then, Eq. (11) yieldŝ The weight of each proposal in the mixture defined in Eq. (10) is chosen as α k = P k p, for k = 1, ..., K. Then, Eq. (13) yields where C(x n ) = ∑ K k=1 I S k (x n ) is the number of half-spaces S k where x n is present. Note that we have used I R (x n ) = 1, since all samples are generated in R. Fig. 2 shows samples generated from the proposal corresponding to the symbol s m = 1 Note that the bound becomes an equality when S i ∩ S j = ∅, for all i ≠ j. 0.1343+0.2136j in the constellation of Fig. 1a. Note that there are K = 6 proposals in the mixture, and the samples of each proposal are represented by a different color.

D. Theoretical guarantees of the MIS estimator
The variance of the estimator (15) can be bounded by [19] Var(p (MIS) ) ≤ p(p − p) N .
First, note that when the upper union boundp gets closer to the true value p, the variance of the MIS estimator goes to zero. This is of special interest in our application since the performance of the raw Monte Carlo estimate deteriorates for high SNRs (i.e., when p is small). Conversely, the MIS estimator gets better for high SNR since the upper boundp becomes tighter. The reason is that most errors fall very close to the separating hyperplane, and hence in just only one region S k (i.e., C(x n ) = 1 for all x n ). Note that in the limit, when all C(x n ) = 1, the estimator (15) which turns to be p ≤ 1. Note that for moderate SNRs, the upper bound p is smaller than 1. Only for very low SNRs, p ≥ 1, which is not a problem since (a) we are comparing the MIS upper bound (not the variance itself), and (b) the low SNR scenario is not challenging and both MIS and raw Monte Carlo estimators are very accurate with a few samples. Eq. (16) gives us an upper bound for the variance of the conditional probability of error of an arbitrary symbol of the constellation that we denoted for simplicity as s. To estimate the symbol error probability in Eq. (1) we run M independent ALOEs, then, the variance of the P e estimator can be bounded as Note that several symbols in Fig. 1a have identical Voronoi regions, which can be exploited to avoid approximating all corresponding error probabilities. Moreover, one can allocate different number of samples, N m , for the approximation of the error probability associated to each of the M symbols. However, this fine tuning goes beyond the scope of this paper.

III. SIMULATION RESULTS
In this section we estimate the SER of arbitrary 2D digital constellations with the proposed MIS technique. As an example, we consider the family of improper constellations proposed in [3], whose two-dimensional signal points belong to a non-square lattice. Improper signals, which are correlated with their complex conjugate, have proven beneficial in several interference-limited wireless networks such as the interference channel [21]- [23], Z-interference channel [24], underlay cognitive radio networks [25], [26], and relay channels [27]. In all these scenarios, we wish to transmit a digital communications signal with a given circularity coefficient, which is the parameter that determines the degree of impropriety of the signal. The circularity coefficient of a zero-mean complex random variable X, is defined as [28] We consider a constellation with M = 64 symbols and circularity coefficient κ = 0.8, whose signal points are shown in Fig.  1a. We estimate the SER curve using ALOE and conventional Monte Carlo transmitting only N = 1280 symbols at each E b N 0 value, i.e., each estimator uses N = 1280 samples (we transmit 20 times each of the M = 64 symbols). Moreover, we also compare with a standard IS algorithm with a unique Gaussian proposal centered at each symbol s, and variance α 2 σ 2 I, with α > 1, i.e., the proposal is overdispersed w.r.t. the targetπ(x), since I R (x)π(x) is clearly more dispersed thañ π(x) (see the discussion about the optimal proposal in [29], [30] and the use of overdispersed IS proposals in [31]). We try a grid of values of α ∈ {1, 1.5, 2, ..., 5}, and we select the case with the smallest RRMSE. The experiment is repeated 200 times and the RRMSEs defined in (9) are depicted in Fig.  3. For high E b N 0 values, ALOE provides four to five orders of magnitude improvement over the usual naive Monte Carlo.

IV. CONCLUSIONS
A multiple importance sampling technique called ALOE is used in this paper to estimate the symbol error rate probability of 2D constellations designed over arbitrary lattices. ALOE is extremely efficient to estimate the integral of a Gaussian in a region defined by a union of half-spaces, which is precisely the error event in a digital communications system. At high signalto-noise ratios, ALOE provides orders of magnitude speedup with respect to conventional Monte Carlo simulation. As future work, we plan to extend the proposed SER estimation technique to other noise distributions, to higher dimensional lattices originated by multi-antenna systems, as well as to fading channels. constellations with enhanced features.

ACKNOWLEDGMENTS
The authors want thank Art B. Owen for his insightful comments.

APPENDIX
A. Simulation from a truncated Gaussian N (0, I) The proposals in the MIS implementation are truncated Gaussian distributions. Let us first describe the simulation of a truncated Gaussian N (0, I) in the half-space described by x T ω ≥ τ , which first proceeds by simulating the sample x from the complementary half-space (i.e., x T ω < τ ), and then delivering the −x for numerical stability. The algorithm described in [19] proceeds as follows: 1) Simulate z ∼ N (0, I) 2) Simulate u ∼ U(0, 1) 3) Let y = Φ −1 (uΦ(−τ )), where Φ(τ ) denotes the cumulative distribution function for the standard Gaussian 4) Let x = ωy + I − ωω T z 5) Output x = −x B. Extension to a generic truncated Gaussian N (µ, Σ) When the target distribution is a generic truncated Gaussian N (µ, Σ) that must be integrated over the union of the halfspaces described by the hyperplanes {γ k , β k } K k=1 , one can transform the problem as follows. First, we describe K halfspaces with the hyperplanes ω T k x ≥ τ k , where for k = 1, ..., K. Then, a Gaussian N (0, I) is integrated over the union of those half-spaces as described in previous section.