Efficient SER Estimation for MIMO Detectors via Importance Sampling Schemes

In this paper we propose two importance sampling methods for the efficient symbol error rate (SER) estimation of maximum likelihood (ML) multiple-input multiple-output (MIMO) detectors. Conditioned to a given transmitted symbol, computing the SER requires the evaluation of an integral outside a given polytope in a high-dimensional space, for which a closed-form solution does not exist. Therefore, Monte Carlo (MC) simulation is typically used to estimate the SER, although a naive or raw MC implementation can be very inefficient at high signal-to-noise-ratios or for systems with stringent SER requirements. A reduced variance estimator is provided by the Truncated Hypersphere Importance Sampling (THIS) method, which samples from a proposal density that excludes the largest hypersphere circumscribed within the Voronoi region of the transmitted vector. A much more efficient estimator is provided by the existing ALOE (which stands for "At Least One rare Event") method, which samples conditionally on an error taking place. The paper describes in detail these two IS methods, discussing their advantages and limitations, and comparing their performances.


I. INTRODUCTION
Monte Carlo (MC) simulation is a key methodology of performance evaluation for wireless communication links, for many of which an exact computation is typically unfeasible. Monte Carlo simulations are routinely used to estimate the symbol error rate (SER), the bit error rate (BER), or the outage probability in block fading channels. However, naive Monte Carlo simulation can be very inefficient and requires very long simulation runs when the target probability is small. For instance, the typical BER for lightwave communication systems is of the order of 10 −9 to 10 −12 [1], while ultrareliable low-latency communications (URLLC) have stringent requirements in terms of outage probability (e.g. lower than 10 −6 ) [2]. For systems with such a stringent requirements in terms of SER/BER or outage probability, the need to perform in excess of 10 12 trials makes MC simulation infeasible.
To overcome this limitation of raw MC simulation, this paper studies efficient SER estimation techniques for multipleinput multiple-output (MIMO) communication systems based on importance sampling (IS) techniques. IS has been used as a method for variance reduction in SER or BER simulations in a wide range of scenarios since the late seventies [3]- [7]. Despite this, many digital communication researchers are still unaware of the potential benefits of IS techniques to char-acterize the statistical performance of digital communication systems.
The basic IS methodology samples from a proposal distribution that increases the number of errors during simulation, and then weights the samples by the ratios of the target to the proposal densities [8]. As a single-proposal distribution for the MIMO detection problem, we propose in this paper to use a truncated Gaussian where the mass around its center has been removed. We call this IS method as truncated hypersphere importance sampling (THIS). However, covering the region where both the target density and the function to be evaluated take significant values with a single proposal is not possible in general. An alternative is to use more than one proposal through the so-called multiple importance sampling (MIS) technique [9], [10]. In this paper we use a MIS technique called ALOE ("At Least One rare Event") [11], which is particularly well suited for SER estimation. ALOE is extremely efficient to estimate the integral of a Gaussian in a region defined by a union of half-spaces, which is precisely the error event in a digital communications system. Conditioned to a transmitted symbol, an error occurs when the observation falls in a union of half-spaces or, equivalently, outside a given polytope. The proposal in ALOE simulates the system conditionally on an error taking place, which makes it more efficient than other importance sampling techniques. However, ALOE requires a perfect knowledge of the Voronoi region in the lattice of symbols transformed by the channel.

II. MIMO DETECTION
We consider a multiple-input multiple-output (MIMO) spatial multiplexing system with L transmit and receive antennas, where the channel is unknown at the transmiter but perfectly known at the receiver side. 1 The received signal follows the well-known baseband model where H is an L × L matrix whose columns represent the known complex channel gains from each transmit antenna to the L receive antennas, s = (s 1 , . . . , s L ) T is the vector with the unknown transmitted symbols, and n = (n 1 , . . . , n L ) T is the noise vector. The noise is modeled as n ∼ CN L (0, σ 2 I), where CN L (0, R) denotes a complex Gaussian distribution in C L with zero mean and covariance R. Each symbol s belongs to a discrete constellation that we will assume to be an square M -QAM (Quadrature Amplitude Modulation) signal. The average energy per symbol is normalized so that E[|s k | 2 ] =1. We will find convenient to reformulate the problem in terms of real variables as follows (2) Then, without loss of generality we can focus on the system model given by (1) but with real vectors and matrices, in this case the symbols take values on a finite alphabet of integers, n ∼ N 2L (0, σ 2 2 I), and the vector s ∈ D 2L belongs to a 2Ldimensional square real lattice with cardinality |D 2L | = M L .
The MIMO detection problem consists of estimating the complex symbol vectorŝ ∈ D 2L in (1) that minimizes the symbol error probability. Under white Gaussian noise, the optimal maximum likelihood (ML) decoder is obtained by solving an Integer Least Squares problem Note that this problem is equivalent to constructing the Voronoi diagram with all transformed points Hs with s ∈ D 2L , and estimatingŝ as the associated point to the region where x falls. Maximum likelihood decoding therefore amounts to finding the closest lattice point Hs to a given noisy observation, x, which is known to be NP-hard for generic channels [12], [13] meaning that the problem complexity is exponential on the dimension of the lattice [14]. Using the Finke-Phost sphere-decoding (SD) algorithm [15], ML detection can be achieved at an average complexity that is cubic in the number of transmit antennas, as was shown in [16]. Later Agrell et al. showed that the Schnorr-Euchner (SE) enumeration strategy [17] reduces the complexity of SD algorithms in comparison to [15], and very efficient SD implementations have been proposed in [18].

A. Problem statement
Let us assume that s m is transmitted and let V m be its associated Voronoi region, which is a polytope defined by the intersection of finitely many hyperplanes in R 2L . The observation x = Hs m + n is the 2L-dimensional lattice point distorted by the MIMO channel and perturbed by additive Gaussian noise. Assuming that all symbols are transmitted with equal probability, the symbol error probability (SER) is where is the conditional SER when s m is transmitted, and L j,m is the number of complex symbol errors when s m is transmitted and s j is detected. Each term Prob(Hs m + n ∈ V j ) is the integral of a Gaussian distribution centered at r m = Hs m in the polytope region V j . Note that transmitting s m and deciding s j can result in a different number of complex symbol errors L j,m ∈ {1, ..., L}; therefore each term in (5) is weighted by L j,m /L. The integrals required to evaluate (5) are in general difficult to compute in closed-form, and therefore Monte Carlo simulation is typically used to estimate the SER. However, naive Monte Carlo simulation (also called raw Monte Carlo) can be very inefficient and requires very long simulation runs, especially at high signal-to-noise ratios or when either the constellation size, M , or the number of antennas, L, is large. Our goal thus is to develop efficient sampling schemes to estimate p(e|s m ) in (5).

SAMPLING
For notational simplicity let us denote the transmitted vector as s, with associated decision region V ⊂ R 2L , and let p = p(e|s) be the conditional SER of interest, which can be expressed as whereπ(x) N 2L (Hs, σ 2 2 I) is the Gaussian distribution of the observation, and the function h(x) = (x)/L is the ratio of wrongly detected symbols, i.e., (x) is the number of errors when s is transmitted and x is received. Note that h(x) = 0 when x falls in the Voronoi decision region for s.

A. Single-proposal importance sampling
Importance sampling (IS) is a Monte Carlo technique used when sampling fromπ(x) is either not possible or not efficient. The N samples are simulated instead from a so-called proposal distribution, q(x), and the estimator of p is built as where w n =π (xn) q(xn) is the importance weight. Let us first consider a Gaussian proposal q(x) =π(x) = N 2L (Hs, σ 2 2 I). This particular case yields the standard raw Monte Carlo algorithm whose efficiency decays when σ 2 decreases. This is an illustrative example of how IS is an efficient technique if q(x) is close to h(x)π(x). However, the variance of the IS estimator increases when too many samples from q(x) fall in V, and hence h(x) takes value zero.
We present here a single-proposal IS method that increases the efficiency by avoiding the simulated samples to be close to Hs. We call the method as truncated hypersphere importance sampling (THIS). The method only requires to know the distance d min to the closest neighbor of Hs. Then, we set the IS proposal to a truncated Gaussian distribution q(x; d min /2) ∝ π(x)I R d min /2 (x), where R dmin/2 is the hypersphere of radius d min /2, i.e., q(x; d min /2) is a distribution proportional to the targeted Gaussian where the mass around its center has been removed. Note that by construction, q(x; d min /2) covers all the support where the integrand of Eq. (6) takes values different from zero because the hypersphere is inside the Voronoi decision region of Hs, i.e, R dmin/2 ⊂ V. The estimator of THIS is built as where Z q is the normalizing constant of the proposal, i.e., Z q = R d min /2 N 2L (Hs, σ 2 2 I)dx and it can be easily computed. In Appendix A we describe how to sample from the truncated Gaussian and the computation of Z q . Note that, for every sample x n , the SD algorithm is run in order to find the ML estimate of the transmitted vector of symbols, and hence, the value of the ratio of errors h(x n ). The estimator p (THIS) is unbiased, consistent, and guarantees a variance reduction w.r.t. the raw MC estimator, i.e., Var p (THIS) < Var p (MC) . See Appendix B for more details.

B. Multiple importance sampling
Multiple importance sampling (MIS) is a natural extension of IS allowing for the simulation from a set of K proposals, {q k (x)} K k=1 , instead of just one [9]. However, the extension from one to several proposals is not straightforward and many sampling and weighting schemes can be devised (see [10] for a review). A conventional way to proceed is to simulate from the mixture proposal as where α = [α 1 , ..., α K ] is a simplex vector with all nonnegative weights in the mixture such that K k=1 α k = 1. The MIS extension of Eq. (7) is i.e., the new importance weight is now w n =π (xn) qα(xn) . We recall that we aim at integratingπ(x) in region V, weighting each sample with h(x) = (x) L , the ratio of errors in the detector when x is received. The region V can be described as the union of all half-spaces in R 2L generated by the hyperplanes that define the border of V.
A natural way to adapt this MIS scheme to our problem is to follow the choice of proposals in [11] called ALOE for "At Least One rare Event". In ALOE the number of proposals is the number of hyperplanes defining the Voronoi region and each proposal is a truncated version of a Gaussian centered at the received symbol beyond each hyperplane. More specifically, q k (x) = , where P k = I S k (x)π (x)dx is the integral of the target distribution beyond the hyperplane S k = {x ∈ R 2L | x T γ k ≥ β k } (parametrized by γ k and β k ), and V = K k=1 S k . The procedure for the efficient simulation from a generic truncated Gaussian distribution is described in [11]. ALOE has recently been used for the SER estimation of single-input single-output (SISO) AWGN channels with non-square two-dimensional constellations in [19]. With twodimensional lattices or constellations the Voronoi regions are determined by just a few hyperplanes, which can easily be computed, and hence an exact implementation of ALOE is possible. However, for MIMO systems the number of hyperplanes determining a Voronoi region can be very large and computing all hyperplanes is in general not feasible. In this work we use the SD algorithm to find K lattice points that belong to a hypersphere centered at Hs, where K is a fixed number chosen to limit the complexity of this step. These points determine the K hyperplanes to be used in the mixture proposal (9). Our experiments suggest that selecting K = 2 log 2 (M )L 2 provides accurate SER estimates.
Let us summarize the implementation of a modified ALOE for approximating the integral of Eq. (6). We first define p = K k=1 P k , which is an upper union bound of p. Then, in ALOE, Eq. (10) yields The weight of each proposal in the mixture defined in Eq. (9) is chosen as α k = P k /p, for k = 1, ..., K. Then, ALOE estimator is where C(x n ) = K k=1 I S k (x n ) is the number of half-spaces S k where x n is present. ALOE estimator is unbiased, and i.e., the bound of the variance decays with the number of samples, and the constant factor depends on the true probability, p. It is straightforward to understand the high performance of ALOE for high SNR scenarios for which p is small. A deeper theoretical analysis of ALOE can be found in [11].

IV. SIMULATION RESULTS
We consider an example with L = 6 antennas and M = 64. As a figure of merit for the different estimators we use the relative root mean square error, defined as where p is the true probability. We compare the performance of raw Monte Carlo (Raw-MC), ALOE and THIS. For the three methods, the SER at each Eb/N 0 is estimated with a total of 10 5 random samples. More specifically, we generate 100 different vectors of symbols (i.e., lattice points) and, for each lattice point, N = 1000 samples are drawn from each  proposal. The RRMSE results are shown in Fig. 1. We observe that the gap between raw Monte Carlo, THIS and ALOE increases as the SNR increases. The SER curves for MC and ALOE along with their confidence intervals are depicted in Fig. 2. Moreover, Figs. 3 and 4 show the RRMSE and the SER curves for the different estimators when L = 12 and M = 4. Again, ALOE provides extremely accurate SER estimates with just 10 5 random samples.

V. DISCUSSION
The integral in Eq. (6) can be approximated using the raw Monte Carlo by sampling directly fromπ(x). However, the efficiency of the method decays when most of the samples fall inside the Voronoi decision region of Hs. Importance sampling can alleviate this problem by avoiding or at least decreasing the waste of samples that fall in V. The proposed THIS  estimator increases the efficiency by avoiding to sample in the hypersphere centered at Hs with radius d min /2. The advantage is that the only information it requires is the closest symbol to Hs and its distance d min . The disadvantage is that in higher dimension, the volume of the 2L-dimensional hypersphere of radius d min /2 can be small w.r.t. the total area of the Voronoi decision region of Hs. In that case, the efficiency of THIS can deteriorate, although it will always be better than raw Monte Carlo. The advantage of ALOE is that all samples have at least one error, i.e., h(x n ) > 0 for all x n , which is a key feature of the algorithm, particularly for high SNR when most samples in raw Monte Carlo give h(x n ) = 0.

ACKNOWLEDGMENT
The work of V. Elvira was partially supported by Agence Nationale de la Recherche of France under PISCES project , i.e., an isotropic Gaussian distribution truncated out of the hyper-sphere a = d min /2, by using standard methods used in directional statistics [20]. We proceed as follows: 1) Simulate r ∼ χ 2 (2L) (r|r > a), i.e., from the truncated χ 2 distribution with 2L degrees of freedom, defined in the interval (a, ∞). This can be efficiently done as follows: • Find c as the cdf of the distribution χ 2 (2L) at a σ 2 .
• Apply a modified inverse transform sampling by first sampling uniformly u ∼ U(c, 1), and then obtaining r as the evaluation of u in the inverse cdf of χ 2 (2L) . 2) Simulate z ∼ N (0, I) of dimension 2L.
3) Project z in the 2L − 1 unit sphere θ = z ||z|| . 4) Output x = σθ √ r. The normalizing constant of the THIS proposal can be computed as Z q = R d min /2 N 2L (Hs, σ 2 2 I)dx = 1 − c, where c is the volume of hypersphere that has been removed from the Gaussian distribution. h(x n )π (x n )I R d min /2 (x n ) since V ⊆ R dmin/2 . The variance of the estimator of THIS is Similarly, the variance of the raw MC estimator with q(x) = π(x) can be computed as By simply comparing Eqs. (15) and (16), and since Z q < 1, it follows that Var p (THIS) < Var p (MC) .