Multiple Importance Sampling for Symbol Error Rate Estimation of Maximum-Likelihood Detectors in MIMO Channels

In this paper we propose a multiple importance sampling (MIS) method for the efficient symbol error rate (SER) estimation of maximum likelihood (ML) multiple-input multiple-output (MIMO) detectors. Given a transmitted symbol from the input lattice, obtaining the SER requires the computation of an integral outside its Voronoi region in a high-dimensional space, for which a closed-form solution does not exist. Hence, the SER must be approximated through crude or naive Monte Carlo (MC) simulations. This practice is widely used in the literature despite its inefficiency, particularly severe at high signal-to-noise-ratio (SNR) or for systems with stringent SER requirements. It is well-known that more sophisticated MC-based techniques such as MIS, when carefully designed, can reduce the variance of the estimators in several orders of magnitude with respect to naive Monte Carlo in rare-event estimation, or equivalently, they need significantly less samples for attaining a desired performance. The proposed MIS method provides unbiased SER estimates by sampling from a mixture of components that are carefully chosen and parametrized. The number of components, the parameters of the components, and their weights in the mixture, are automatically chosen by the proposed method. As a result, the proposed method is flexible, easy-to-use, theoretically sound, and presents a high performance in a variety of scenarios. We show in our simulations that SERs lower than $10^{-8}$ can be accurately estimated with just $10^4$ random samples.

infeasible. Monte Carlo simulations are routinely used to estimate the symbol error rate (SER), the bit error rate (BER), or the outage probability in wireless systems. However, naive Monte Carlo simulation can be very inefficient and requires very long simulation runs when the target probability is small. For instance, the typical BER for lightwave communication systems or satellite links (after error correction coding) can be on the order of 10 −9 to 10 −12 [1], [2], the probability of a packet loss in packet switching systems is also on the order of 10 −9 , and ultra-reliable low-latency communications (URLLC) have requirements in terms of outage probability lower than 10 −6 [3]. A common rule of thumb is to simulate 10 or 100 times more samples than the inverse of the probability we are trying to estimate. For systems with such a stringent requirements, the MC simulation is infeasible due to the need of performing an unreasonable large number of trials. But even for more conventional systems with BER requirements on the order of 10 −3 − 10 −5 , MC simulation can be costly in time and computational resources, especially when it is necessary to evaluate the system performance in a large number of scenarios due to, for example, user mobility. In this paper, we develop an efficient simulation scheme to study the SER of multiple-input multiple-output (MIMO) systems using maximum likelihood decoding. The performance of suboptimal linear MIMO detectors, such as the zero-forcing (ZF) or the minimum mean-squared error (MMSE) detectors, have been analyzed in [4] based on a high signal-to-noise-ratio (SNR) approximation. The authors of [5] derived approximate bit error rate (BER) expressions for the MMSE MIMO receiver based on the Gamma distribution approximation for the signal-to-interference-plus-noise ratio (SINR). The exact error performance of MMSE detection, possibly with successive interference cancellation (SIC), was investigated in [6], and closedform SER expressions for the matched filter (MF) detector in MIMO systems have recently been derived in [7]. Based on the union bound, asymptotic SER expressions for ML-MIMO systems decoder have been derived in [8] [9]. However, these results only provide an upper bound, which may be loose for scenarios involving a large number of antennas, so to characterize the performance of ML-MIMO decoders one must resort to computer simulations. When the targeted SER is low and the number of antennas increases, MC simulation becomes rapidly infeasible because of excessively long run times required to generate errors (rare events) in sufficiently large numbers for obtaining statistically significant results.
To overcome this limitation of naive MC simulation, this paper studies efficient simulation schemes based on importance sampling (IS) methods [10]- [12], which are particularly suitable for rare-events estimation problems. The IS methodology has been used for variance reduction in SER or BER simulations in a wide range of scenarios since the late seventies [13]- [17]. However, many digital communication researchers are still unaware of the potential benefits of IS techniques to characterize the statistical performance of digital communication systems, mostly due to the difficulty to find an adequate IS algorithm [18].
In IS, the samples are simulated from a proposal probability density function (pdf), different from the target pdf, with the goal of increasing the number of errors during simulation. Then, the samples are properly weighted using the ratio of the target and the proposal pdfs in such a way the resulting estimators are unbiased and consistent with the number of samples [12]. In IS, the key for a successful performance is in the choice of this proposal pdf. With a well-chosen proposal, IS can reduce the variance of the estimators by several orders of magnitude with respect to naive Monte Carlo (or equivalently IS can work at a similar performance by using significantly less number of samples). Many works in the literature are devoted for the choice of the IS proposal, using usually a mixture of components as proposal in multiple IS (MIS) [19] and through an iterative adaptation of this proposal in adaptive IS (AIS) [20]. In [21] a biased channel distribution is proposed for the simulation of orthogonal space-time block codes (OSTBCs) on Nakagami channels. The authors of [22] used IS for the estimation of the outage probability of multi-antenna receivers with generalized selection combining. In a different setup, IS has been also used for approximating outage capacity over generalized fading channels [23]- [25]. Adaptive importance sampling has been used in [26] for error rate estimation in multiple access systems, and a nested IS estimator to estimate the random-coding error probability of coded modulation has been proposed in [27].
A multiple importance sampling technique called ALOE ("At Least One rare Event") [28], has recently been applied to SER estimation in single-input single-ouput systems [29] transmitting non-square 2D constellations [30]. ALOE is extremely efficient to estimate the integral of a Gaussian in a region defined by a union of half-spaces, which is precisely the error event in a digital communications system. Conditioned to a transmitted symbol, an error occurs when the observation falls in a union of half-spaces or, equivalently, outside a given Voronoi region. The proposal in ALOE simulates the system conditionally on an error taking place, which makes it more efficient than other importance sampling techniques. However, ALOE requires a perfect knowledge of the Voronoi decision regions. This information is not available in MIMO systems, where the Voronoi regions may be determined by a very large number of unknown hyperplanes, which precludes a direct application of ALOE to ML-MIMO decoders. A simple alternative that uses just a few hyperplanes obtained from the closest points to a given lattice point has been explored in [31], but it provides biased estimators especially at low SNRs.
In this paper, we propose a novel IS-based method for rareevent estimation, called defensive adaptive ALOE or DA 2 LOE, particularly suited for SER estimation of the ML-MIMO decoders. DA 2 LOE is a MIS method with a carefully designed mixture proposal. In particular, we consider a proposal with K + 1 components that are truncated versions of the targeted distribution (this truncation is key for a more efficient use of the computational budget). In this mixture, K components are defined in a region that guarantees that each sample produces at least one error (unlike in naive Monte Carlo, where especially at medium-high SNR, most samples produce zero errors), taking into account the K closest neighbors to the transmitted signal in the transformed lattice. The (K + 1)-th proposal is designed in such a way the estimator is unbiased, ensuring that all integration area is covered. DA 2 LOE automatically selects the number of components, their parameters, and the component weights in the mixture, all by taking into account the dimension (number of antennas) and the SNR. As a consequence, DA 2 LOE is a flexible, easy-to-use, interpretable, and high-performance method that requires very limited tuning.

A. Outline
The outline of this paper is as follows. In Section II we present the ML detection problem in MIMO systems. In Section III we briefly describe standard Monte Carlo, importance sampling, and multiple importance sampling methods. The proposed algorithm DA 2 LOE is presented in Section IV. Simulations results are provided in Section V. The paper is concluded with a final discussion in Section VI.

B. Notation
In this paper, matrices are denoted by bold-faced upper case letters, bold-faced lower case letters denote column vectors, and scalars are denoted by light-face lower case letters. A real matrix of dimension M × N is denoted A ∈ R M ×N and x ∈ R M indicates that x is a real vector of dimension M . The notation x ∼ N M (μ, R) indicates that x is an M -dimensional Gaussian random vector of mean μ and covariance matrix R and x ∼ CN M (μ, R) is used for complex normal vectors. E[·] represents the expectation operator and the superscript (·) denotes transpose.

II. MIMO DETECTION
We consider a multiple-input multiple-output (MIMO) spatial multiplexing system with T transmit and R receive antennas, where the channel is unknown at the transmiter but perfectly known at the receiver side. We assume R ≥ T . The received signal follows the well-known baseband model where H is an R × T matrix whose columns represent the known complex channel gains from each transmit antenna to the R receive antennas, s = (s 1 , . . . , s T ) is the vector with the unknown transmitted symbols, and n = (n 1 , . . . , n R ) is the noise vector. The noise is modeled as n ∼ CN R (0, σ 2 I). Each symbol s belongs to a discrete constellation that we will assume to be a square M -QAM (Quadrature Amplitude Modulation) signal. The average energy per symbol is normalized so that E[|s k | 2 ] = 1. We will find convenient to reformulate the problem in terms of real variables as follows (2) Then, without loss of generality we can focus on the system model given by (1) but with real vectors and matrices, in this case the symbols take values on a finite alphabet of integers, n ∼ N 2R (0, σ 2 2 I), and the vector s ∈ D 2T belongs to a 2Tdimensional square real lattice with cardinality |D 2T | = M T . For instance, if the number of antennas is T = 10 and we transmit 64-QAM symbols, the number of lattice points is 64 10 ≈ 1.15 · 10 18 .
The MIMO detection problem consists of estimating the symbol vectorŝ ∈ D 2T in (1) that minimizes the symbol error probability. Under white Gaussian noise, the optimal maximum likelihood (ML) decoder is obtained by solving an Integer Least Squares problemŝ Note that this problem is equivalent to constructing the Voronoi diagram with all transformed points Hs with s ∈ D 2T , and estimatingŝ as the associated point to the region where x falls. Maximum likelihood decoding therefore amounts to finding the closest lattice point Hs to a given noisy observation, x. This problem is known to be NP-hard for generic channels [32], [33] meaning that the problem complexity is exponential on the dimension of the lattice [34]. Using the Finke-Phost spheredecoding (SD) algorithm [35], ML detection can be achieved at an average complexity that is cubic in the number of transmit antennas, as was shown in [36]. Later Agrell et al. showed that the Schnorr-Euchner (SE) enumeration strategy [37] reduces the complexity of SD algorithms in comparison to [35], and very efficient SD implementations have been proposed in [38]. Our goal in this work is to develop efficient symbol error rate (SER) estimation techniques for ML MIMO detection. Although we focus on SER estimation, all methods in this paper can be generalized to estimate the uncoded bit error rate (BER).

A. Problem Statement
Let us assume that s q is transmitted and let H q be its associated Voronoi region, which is a polytope defined by the intersection of finitely many hyperplanes in R 2R . The observation x = Hs q + n is the 2R-dimensional lattice point distorted by the MIMO channel and perturbed by additive Gaussian noise. Assuming that all symbols are transmitted with equal probability, the symbol error probability (SER) is where is the conditional SER when s q is transmitted, and T j,q is the number of complex symbol errors when s q is transmitted and s j is detected. Each term Prob(Hs q + n ∈ H j ) is the integral of a Gaussian distribution centered at Hs q ∈ R 2R in the polytope region H j . Note that transmitting s q and deciding s j can result in a different number of complex symbol errors T j,q ∈ {1, . . ., T }; therefore each term in (5) is weighted by T j,q /T . If we weight each term in (5) by the ratio of uncoded bits in error instead of T j,q /T , then we would get an estimate of the uncoded BER. The integrals required to evaluate (5) are in general difficult to compute in closed-form, and therefore Monte Carlo simulation is typically used to estimate the SER. However, naive Monte Carlo simulation (also called crude Monte Carlo) can be very inefficient and requires very long simulation runs, especially at high signal-to-noise ratios or when either the constellation size, M , or the MIMO configuration, R × T , is large. Our goal thus is to develop efficient sampling schemes to estimate p e|s q in (5).

A. Standard Monte Carlo
For notational simplicity let us denote the transmitted vector as s, with associated decision region H ⊂ R 2R , and let p e p e (s) be the conditional SER of interest, which can be expressed as where π(x) N 2R (Hs, σ 2 2 I) is the Gaussian distribution of the observation, and the function h(x) = (x)/T is the ratio of wrongly detected symbols, i.e., (x) is the number of symbol errors when s is transmitted and x is received. Note that h(x) = 0 when x ∈ H, i.e., when it falls in the Voronoi decision region for s, whereas 1/T ≤ h(x) ≤ 1 when x ∈ H. The standard Monte Carlo procedure consists on simulating N i.i.d. samples from the targeted distribution and approximating Eq. (6) as This procedure is reasonably efficient when h(x n ) > 0 for most samples. Standard Monte Carlo is known to be very inefficient for rare events estimation [10], [11], i.e., when p e is small, as it happens at high SNR in MIMO detection.

B. Importance Sampling
Importance sampling (IS) is a Monte Carlo technique used when sampling from π(x) is either not possible or not efficient [11], [12]. The N samples are simulated instead from a so-called proposal distribution, q(x), and the estimator of p e is built as Note that the particular case with q(x) = π(x) = N 2R (Hs, σ 2 2 I) yields the standard Monte Carlo estimator of Eq. (7). In this way, IS can also be seen as a generalization of standard Monte Carlo. The key of IS is in the appropriate choice of the proposal q(x). The variance of (8) is and the optimal proposal, which achieves zero variance, is given by . The optimal proposal requires to know p e , which is what we wish to estimate, hence it is rarely available in practice [11], [12]. Intuitively, the variance of the IS estimator increases when too many samples from q(x) fall in H, and hence h(x) takes value zero. This is exactly what happens with the MC estimator when σ 2 decreases, making MC simulation infeasible to estimate very low error probabilities.

C. Multiple Importance Sampling and the ALOE Algorithm
Multiple importance sampling (MIS) is a natural extension of IS allowing for the simulation from a set of J proposals, {q j (x)} J j=1 , instead of just one [19]. However, the extension from one to several proposals is not straightforward and many sampling and weighting schemes can be devised (see [19] for a review). A conventional way to proceed is to simulate from the mixture proposal as where α = [α 1 , . . ., α J ] is a simplex vector with all nonnegative weights in the mixture such that J j=1 α j = 1. The MIS extension of Eq. (8) is where the importance weight is now T is the ratio of symbol errors when x is received. The region H can be described as the union of all half-spaces in R 2R generated by the hyperplanes that define the border of H.
A natural way to adapt a MIS scheme to our problem is to follow the choice of proposals in [28] called ALOE for "At Least One rare Event". In ALOE the number of proposals is the number of hyperplanes that define the polytope H. Let V = {v j } J j=1 be the set containing the J neighbors of Hs, where we define a neighbor of Hs as a point in the transformed (by the MIMO channel) lattice whose Voronoi region shares a border with the Voronoi region of Hs. Each proposal in ALOE is a truncated version of a Gaussian centered at Hs beyond a hyperplane that defines the border between Hs and its j-th neighbor v j .
More specifically, q j (x) = Under white Gaussian noise, it is a simple exercise to show that the integral in the half-space H j concides with the integral in the half-space the Euclidean distance between the j-th neighbor v j and the transformed symbol Hs. Since the noise variance is σ 2 /2 then is the Q-function or tail distribution of the standard normal pdf, and Φ(·) is the associated cdf.
The union set of the J proposals in ALOE covers the integration area, therefore the ALOE estimator is unbiased. Note also that each draw from the mixture x n falls in a region where (x n ) > 0, so there is at least one symbol error per draw.
In ALOE, the samples x n , n = 1, . . ., N, are independently simulated from q α . The procedure for the efficient simulation from a truncated Gaussian distribution is described in [28] and summarized in Appendix I.
Substituting the ALOE proposals in Eq. (11) yields the MIS estimator We first define p = J j=1 P j , which is a union upper bound of p e . The weights in the mixture proposal in Eq. (10) are chosen as α j = P j /p, for j = 1, . . ., J (see [28] for a justification). Then, the ALOE estimator is where C(x n ) = J j=1 I H j (x n ) is the number of half-spaces H j where x n is present. ALOE is unbiased, and its variance can be bounded as i.e., the bound of the variance decays with the number of samples, N , and the scale factor depends on the true probability, p e . It is straightforward to understand the high performance of ALOE for high SNR scenarios for which p e is very small. A deeper theoretical analysis of ALOE can be found in [28].
ALOE has recently been used for SER estimation of singleinput single-output (SISO) channels with non-square twodimensional constellations in [29]. With two-dimensional lattices or constellations the Voronoi regions are determined by just a few hyperplanes, which can easily be computed, and hence an exact implementation of ALOE is possible. However, for MIMO systems the number of hyperplanes J determining a Voronoi region can be very large. Therefore, finding all neighbors V = {v j } J j=1 of Hs (i.e., those whose associated polytopes share one hyperplane with the polytope that encloses H) for each lattice point under test, Hs, is in general not feasible. The best we can hope for is to select, applying a SD-like algorithm, a set C K = {c k } K k=1 containing the K closest points to Hs in the transformed lattice. We form in this way a new estimator that we call ALOE-K where α k = P k , for k = 1, . . ., K, as in ALOE and p (ALOE-K) = K j=1 P j . For ALOE-K to be unbiased, the proposal support is missed, then the mixture proposal will not cover the whole integration area and hence ALOE-K will be negatively biased with i.e., p (ALOE-K) e underestimates the probability of error. In the next section we propose an alternative MIS estimator that incorporates a defensive component in the mixture to recover the unbiasedness of the estimate, while adaptively selecting the number of hyperplanes K.

IV. DEFENSIVE ADAPTIVE ALOE
Here we present the defensive adaptive ALOE (DA 2 LOE) algorithm with three important features. First, DA 2 LOE is unbiased thanks to the introduction of an extra proposal in the mixture that plays a similar role as the defensive component in defensive IS [39]. However, in DA 2 LOE the defensive proposal is not the targeted distribution as in conventional defensive IS, but a more efficient distribution that takes into account the geometry on the problem as well as the rest of proposals already included in the mixture. Moreover, the weight of this proposal in the mixture is also automatically set. Second, in addition to the defensive component the proposal mixture in DA 2 LOE is composed of K Gaussians truncated beyond a hyperplane, as in ALOE. Finally, DA 2 LOE allows for an automatic choice of K jointly with the design of the aforementioned defensive proposal.

A. Mixture Proposal
The DA 2 LOE mixture proposal is with K + 1 components q k (x) and non-negative associated mixture weights α k . For each lattice point under test, Hs, we choose its K + 1 nearest neighbors C K+1 = {c k } K+1 k=1 , defined in decreasing order of Euclidean distance, e.g., c 1 is the closest point to Hs (below we discuss the choice of K jointly with the other parameters). The sphere decoding algorithm [38] can be applied to find the K + 1 nearest neighbors, C K+1 = {c k } K+1 k=1 , that belong to a hypersphere of appropriate radius centered at Hs. To this end, an increasing radius search procedure can be applied [40]. The K + 1 DA 2 LOE components are selected as follows.
r K ALOE-like components: We design the set of proposals {q k (x)} K k=1 similarly as in ALOE. Each proposal q k (x) is a truncated version of a Gaussian centered at Hs beyond each hyperplane that defines the pairwise border between Hs and its k-th neighbor c k . In Appendix I, we describe the procedure to both simulate from q k (x) and find its associated normalizing constant P k .
r One defensive component: The defensive proposal, q K+1 (x), is designed as follows. Let c K+1 be the (K + 1)th closest point to Hs at Euclidean distance d K+1 . Note that c K+1 is the next closest point after the K neighbors that determine the ALOE-like proposals. Then, the defensive proposal is a truncated Gaussian distribution conditionally on ||x − Hs|| 2 > d K+1 /2, i.e., where is the outside of a hypersphere of radius d K+1 /2 and P K+1 = S K+1 π(x)dx is its normalizing constant. The choice of the defensive proposal ensures that the mixture covers all the support where the integrand of Eq. (6) takes values different from zero, and hence DA 2 LOE is unbiased.
In Appendix Ic, we describe how to sample from a multivariate normal truncated outside a hypersphere, as well as the computation of P K+1 .

B. Algorithm, Choice of Mixture Weights, and Estimator
Algorithm 1 summarizes the DA 2 LOE method. In Step 1 (Proposal adaptation), DA 2 LOE algorithm starts with the defensive component in the mixture proposal, and iteratively incorporates ALOE-like components. The choice of K is done by sequentially adding points in increasing order of Euclidean distance until a stopping rule is fulfilled, forming the set C K+1 with the K + 1 closest points. Since the distances to the neighbors d 1 ≥ d 2 ≥ · · · ≥ d K are increasing, and P k = Q(d j /( √ 2σ)) is a monotonically decreasing function, then P 1 ≥ P 2 ≥ · · · ≥ P K . The algorithm stops incorporating proposals when the K-th component has received a small mixture weight, i.e., α K < γ, where γ is a threshold chosen a priori (see the discussion in Section IV-D). In Step 2 (Sampling), N independent samples, {x n } N n=1 , are simulated from the designed mixture proposal q α (x) as in Eq. (19). The standard importance weights are computed in Step 3 (Weighting). Finally, the (unnormalized) IS estimator is built in Step 4. Fig. 1 displays a graphical representation of DA 2 LOE and naive Monte Carlo.
Substituting the choice of the mixture weights in Eq. (22), the estimator can be written as where we recall that p (ALOE-K) = K k=1 P k and C(x n ) = K k=1 I H k (x n ) is the number of half-spaces H k where x n is present.

C. Analysis of DA 2 LOE
In the following we discuss the unbiasedness of DA 2 LOE and provide an upper bound for its variance.
Proof: See Appendix II-B for a proof. The bound in (25) provides useful insights on the choice of K and α K+1 that are further discussed in the next section.
We now analyze the variance of ALOE-K in Eq. (16). Let us first consider the case when the support of the mixture proposal covers the integration area (i.e., when Q K ] < p e is the expectation of the ALOE-K estimator (which is negatively biased in this case).
Regarding DA 2 LOE, we recall that P K+1 is the integral of the Gaussian out of a hypersphere with a radius that depends on K. When K grows, P K+1 gets smaller. Then, in the bound of (25), for a fixed α K+1 , we can make the term find c k and c k+1 , the k-th and (k + 1)-th closest points to Hs, at distances d k and d k+1 , respectively 5: define the half-space compute P k = I H k (x)π (x)dx, i.e., the integral of the target distribution in the half-space H k 7: Compute P k+1 = I S k+1 (x)π (x)dx, the integral of the target distribution outside the hypersphere S k+1 , and set α k+1 = min(P k+1 , ξ) 8: recompute the mixture weights 9: 10: end while 11: Set K := k as the final number of ALOE proposals 12: 2. Sampling. Simulate independently 13: 3. Weighting. The importance weight of the n-th sample is . (28) 14: 4. Estimator. The IS estimator is given by α K+1 is still negligible with respect to the second term in (25). In summary, by increasing K and carefully reducing α K+1 , we can cover better the integration area with the K ALOE-like proposals, while allowing us to reduce the weight of the defensive component. In this scenario, the bound in (25) is dominated by its second term as First, note that by increasing K, we would eventually cover the whole support of the integrand with the K ALOE-like proposals, and then we could get V (DA 2 LOE) arbitrarily close to V (ALOE) by reducing α K+1 . In other words, we would recover ALOE as a limiting case, since in this scenario, ALOE-K becomes ALOE.
Moreover, in this case, V (DA 2 LOE) can be re-written as The constant multiplying in the right-hand side of Eq. (32) can be interpreted as the price to pay in DA 2 LOE to ensure that the estimator is unbiased. Moreover, we note that (25) can be a loose bound, and in practice the MSE in DA 2 LOE will be in general smaller than in ALOE-K, especially at low SNR where the bias of ALOE-K is particularly significant (see the numerical validation in Section V). The extra cost per sample of DA 2 LOE compared to ALOE-K is negligible, and it is associated to checking whether x n is outside of the hypersphere that defines the defensive component. The variance of the naive Monte Carlo estimator is exactly (see for instance [11]). ALOE-K guarantees a variance reduction with respect to naive Monte Carlo when p (ALOE-K) < 1, which happens at medium/high SNR for a reasonable choice of K. Following similar arguments, when the choice of K and α K+1 makes the first term of (25) negligible, a variance reduction in DA 2 LOE is guaranteed if p (ALOE-K) 1−α K+1 < 1 (and, since the estimator is unbiased, also a MSE reduction is ensured). In practice, for most ranges of SNR, DA 2 LOE largely outperforms naive Monte Carlo in terms of variance for the same N , which is equivalent to say that in general it can obtain better results with much fewer samples.

D. Adaptive Choice of K and α K+1
The number of ALOE-like components in the mixture, K, and the weight assigned to the defensive component, α K+1 , determine the accuracy of the estimator as well as its complexity. In addition, at high SNR it is expected that only a few closest symbols (hyperplanes) will contribute to the conditional symbol error probability [41], so both K and α K+1 should be chosen as a function of the noise variance. Finally, note that DA 2 LOE has only two parameters, ξ and γ, that are closely connected to the choice of K and α K+1 .
First, note that α K+1 modulates the importance of the spherically truncated proposal, and balances the tradeoff between defensiveness (to give enough mass to regions not covered by the ALOE proposals) and efficiency loss (to not give too much mass to already covered regions or to regions in H). For moderate/high SNR, the non-covered probability mass is smaller, and it is possible to reduce the weight of the defensive proposal. A sensible choice is α K+1 = P K+1 . In this way, we guarantee that the mixture proposal pdf q α (x) ≥ π(x) for those points not covered by at least one of the K ALOE-like proposals, i.e., for all {x : ||x − Hs|| 2 > d K+1 /2}. However, for high-dimensional problems P K+1 can be very close to one (even for high K), which can compromise the efficiency of DA 2 LOE when most of the mass is assigned to the defensive proposal. For this reason, we propose to limit the defensive weight α K+1 ≤ ξ. For higher dimensions, P K+1 needs a higher SNR to drop significantly from 1, and hence other less-defensive strategies could be devised. Regarding, the choice of γ, which controls the adaptation of K, we recommend it to be selected on the order of magnitude of 1/N . The rationale is that when α K is smaller than 1/N , on average the proposal would be chosen to simulate less than one sample out of the N random draws, i.e., adding more components in the mixture would not significantly change it. We recall that the use of the defensive proposal attenuates the impact of missing potentially relevant hyperplanes/proposals, making DA 2 LOE unbiased.

V. SIMULATION RESULTS
In this section, we study the performance of the proposed MIS estimators for MIMO systems with M -QAM transmitted signals by means of Monte Carlo simulations. We consider Rayleigh MIMO channels with i.i.d. zero-mean and unit variance complex Gaussian entries h ij ∼ CN (0, 1). In each simulation, the SER for a given E b /N 0 is estimated as follows: 1) Q codewords are generated by sampling uniformly the 2T dimensional squared real lattice defined by the M −QAM constellation: s q ∈ D 2T , q = 1, . . . , Q. 2) For each noiseless received codeword, Hs q , we draw N samples from the proposal distribution x q,n ∼ q(x), n = 1, . . . , N. For the standard naive Monte Carlo algorithm, q(x) = N 2R (Hs q , σ 2 2 I). For ALOE-K, q(x) is the K component mixture proposal in (16). For DA 2 LOE, q(x) is the K + 1 component mixture proposal in (23), where the last term of the mixture corresponds to the defensive proposal. 3) We apply sphere decoding (SD) or any other approximate ML detection to each received vector. When the number of antennas is large, even efficient SD implementations can be computationally costly. For this reason, we apply instead an approximate fixed-complexity ML-decoding algorithm that takes advantage of the fact that the number of visited lattice points, Q, is in general much smaller than the total number of received vectors NQ. The idea is the following: for each of the Q visited lattice points, Hs q , we find its S nearest neighbors using a SD-like algorithm. In our simulations we take S = 300. The approximate ML-decoding algorithm then finds for each noisy received signal, x qn ∼ q(x), the closest lattice point to x qn among the fixed-size subset of S nearest neighbors. Note that for both MC and DA 2 LOE it is necessary to run the SD algorithm, or its approximate version, for each of the samples generated. In addition, the computational complexity of SD increases with the number of symbols in the constellation. Therefore, for large constellations such as 1024-QAM, used for example for both 5G+ and beyond-WiFi-6 applications, a simulation scheme such as DA 2 LOE that provides unbiased, low-variance SER estimates with as few samples as possible is even more important.

4) Finally, the SER is obtained by averaging the conditional SER estimates as
where p e|s q is the conditional error probability when s q is transmitted, which is estimated as (7), (16), and (23) for naive Monte Carlo (denoted as MC), ALOE-K and DA 2 LOE, respectively. With the proposed simulation setup, the SER is estimated at each E b /N 0 with a total of N t = QN samples. For the proposed MIS estimators a number of samples on the order of N t ≈ 10 4 (we use in our simulations Q = 100 lattice points and N = 100 random samples of the mixture for each lattice point), allows us to obtain low variance estimates for error probabilities lower than 10 −10 . Similar accuracies cannot be reached with standard MC or previously proposed importance sampling schemes.
As figures of merit to characterize the statistical performance of the different estimators, we use the relative root mean squared error (RRMSE) defined as and the bias where P e is the true symbol error probability. The expectations in (34) and (35) are estimated by averaging the results of 500 independent simulations, whereas the true P e is estimated running ALOE with a high number of mixture proposals and draws per mixture (we set K = 300).

A. Performance Comparison: MC, ALOE-K and DA 2 LOE
In this subsection we compare the performance of naive Monte Carlo (denoted as MC in the figures), ALOE-K, and  Fig. 2 shows the SER curves for MC, ALOE-K with K = 10, K = 50, and K = 100 mixture components and DA 2 LOE. Fig. 3 shows (a) the RRMSE and (b) the bias of the estimates. Fig. 3(b) only shows the results of ALOE with K mixture components, since naive Monte Carlo and DA 2 LOE are unbiased estimators.
At low or moderate E b /N 0 , ALOE with K mixture components provides biased SER estimates. Note that the Voronoi regions could be determined by many more hyperplanes than the prefixed value K. In other words, the intersection of the K half-spaces does not cover the whole integrating area where the integrand takes significant values and hence the estimator is biased. At high E b /N 0 , however, the conditional symbol error probability has an asymptotic behavior proportional to Q( d min √ 2σ ), where Q(·) denotes the Gaussian Q-function and d min is the distance to the closest lattice point [41]. Therefore, in this regime only the closest hyperplanes contribute to the integral and ALOE-K is practically unbiased.
The defensive component of DA 2 LOE eliminates the bias at low E b /N 0 values, so DA 2 LOE and MC are both unbiased estimators. In fact, in this regime MC is almost optimal and, as we observe in Fig. 3(a), DA 2 LOE replicates its behavior. As the SNR increases, the weight assigned to the defensive component decreases and the number of components of the mixture varies as shown in Fig. 4. When the E b /N 0 is low, DA 2 LOE retains all possible hyperplanes (in this case K = 100), whereas when the E b /N 0 is high only a few hyperplanes are kept in the mixture. For intermediate values, the behaviour can be non-monotonic   due to the interplay between the defensive and the non-defensive components on the mixture.
In summary, the DA 2 LOE mixture proposal composed of an adaptive number of hyperplanes plus a defensive component gets unbiased and low-variance estimates with affordable computational cost over the whole E b /N 0 range.
The SER curves for MC and DA 2 LOE along with their confidence intervals are depicted in Fig. 5. The confidence interval has been magnified to [−20σ, 20σ] for visualization purposes. Even so, for high E b /N 0 values the confidence interval region of DA 2 LOE is so small in comparison to that of the MC estimate that it is not visible in the figure.  Fig. 7. Finally, Fig. 8 shows the SER curves for MC and DA 2 LOE along with their confidence intervals. Again we see the good performance of DA 2 LOE, particularly good for the challenging range of high SNR values.
Let us finally remark that DA 2 LOE estimates the SER for a fixed MIMO channel realization, regardless of its structure (e.g., correlation between antennas) or fading distribution [42]. Its final performance and computational complexity, however, may depend on the channel characteristics. For instance, the number of hyperplanes that determine the decision regions increases as the number of antennas grows, or as the correlation between the matrix entries increases. This, in turn, increases the complexity of DA 2 LOE, since more hyperplanes need to be included in the mixture to ensure a satisfactory performace. Note also that DA 2 LOE is specifically designed for ML MIMO detectors. If a linear MIMO decoder, followed by a componentwise decisor,  is used, then the decision regions are hypercubes determined by a fixed number of 2 T hyperplanes. Consequently, for efficient SER estimation of linear MIMO detectors there is no need to use DA 2 LOE and the simpler ALOE-K estimator can be used instead.

VI. CONCLUSION
In this paper we have proposed an efficient SER estimator for ML-MIMO decoders. The so-called DA 2 LOE generates N samples from a mixture of K + 1 truncated Gaussians: each of the first K components is a Gaussian truncated to a half-space that ensures that each random sample produces at least one error, while the last K + 1 component is a Gaussian truncated to the outside of a hypersphere and acts as a defensive component that makes the estimator unbiased. The advantage of DA 2 LOE is that all samples from the first K components of the mixture produce at least one symbol error, which is a key feature of the algorithm, particularly for high SNR when most samples in naive Monte Carlo would give zero errors. DA 2 LOE is unbiased, and our simulations show that it provides low variance estimates for SER values lower than 10 −8 with as few as N = 10 4 samples, largely reducing the computational cost with respect to naive Monte Carlo. The paper has focused on SER estimates for a fixed MIMO channel realization. Nevertheless, the SER performance of fading MIMO channels could be analyzed by first sampling a channel realization from the fading distribution (probably using some IS methodology adapted to the fading distribution) and then applying DA 2 LOE. In this case, more efficient nested IS procedures might be worth studying. Another future line of research is to adapt DA 2 LOE to characterize the BER performance of coded modulations. It would be also possible to design other suitable defensive proposals that could be sequentially adapted.

A. Simulation From a Truncated Gaussian N 2 L(0, I) in a Half-Space
The proposals in the MIS implementation are truncated Gaussian distributions. Let us first describe the simulation of a truncated Gaussian π(x) = N 2R (x; 0, I) in the half-space described . It first proceeds by simulating the sample x from the complementary half-space (i.e., x ω < τ), and then delivering the −x for numerical stability. The algorithm described in [43] proceeds as follows: 1) Simulate z ∼ N 2R (0, I) 2) Simulate u ∼ U(0, 1) 3) Let y = Φ −1 (uΦ(−τ )), where Φ(·) denotes the cumulative distribution function for the standard Gaussian 4) Let x = ωy + (I − ωω T )z 5) Output x := −x Interestingly, Z = Φ(−τ ) = I x ω≥τ (x)N 2 L (x; 0, I)dx is the normalizing constant of the distribution. This normalizing constant is used by the ALOE-K and DA 2 LOE algorithms.

B. Extension to a Generic Truncated Gaussian N 2R (μ, Σ) in a Half-Space
When the target distribution is a generic truncated Gaussian N 2R (μ, Σ) that must be integrated over the union of the halfspaces described by the hyperplanes {γ k , β k } K k=1 , one can transform the problem as follows. First, we describe K half-spaces with the hyperplanes x ω k ≥ τ k , where for k = 1, . . . , K. Then, a Gaussian N 2R (0, I) is integrated over the union of the transformed half-spaces as described in previous section.

C. Simulation From a Truncated Gaussian Outside a Hyper-Sphere
We denote the defensive DA 2 LOE proposal as where π(x) = N 2R (x; μ, η 2 I), and is the region outside of a hypersphere of radius a (with a = d K+1 /2 ind DA 2 LOE) and Z = S(a) π(x)dx is its normalizing constant. We can efficiently sample from q(x) by using standard methods used in directional statistics [44]. We proceed as follows: 1) Simulate r ∼ χ 2 2R (r|r > a), i.e., from the truncated χ 2 distribution with 2R degrees of freedom, defined in the interval (a, ∞). This can be efficiently done as follows: r Find c as the cdf of the distribution χ 2 2R at ( a η ) 2 (e.g., using in MATLAB the function chi2inv).
r Apply a modified inverse transform sampling by first sampling uniformly u ∼ U(c, 1), and then obtaining r as the evaluation of u in the inverse cdf of χ 2 2R . 2) Simulate z ∼ N 2R (0, I) of dimension 2R.

A. Unbiasedness of DA 2 LOE
The expectation of the DA 2 LOE estimator, from Eq. (22), is given by where the expectation is with respect to the mixture proposal q α , with support Q K+1 = K k=1 H k S K+1 . Hence, from (39), where we have used that where we have used H ⊆ Q K+1 by construction of the proposal, and h(x) = 0 for all x ∈ Q K+1 \ H.

B. Variance of DA 2 LOE
Let us express the variance of the 1-sample DA 2 LOE estimator (N = 1, for N > 1, simply divide by N ) as where w(x) = 1 Next, we make use of an auxiliary variable z, where z = 0 whenever the sampling is done from the defensive proposal, and z = 1 otherwise (from the ALOE-like proposals). Hence Next, since q K+1 (x) = I S K+1 (x)P −1 K+1 π(x), where we have used that h(x) ≤ 1 and that, when z = 0, the sample is at least present in S K+1 (and could be also present in the K half-spaces H). Next, where we have used again that h(x) ≤ 1. Then, we use [28, Theorem 2], establishing where p (ALOE-K) ]. Note that p (ALOE-K) e ≤ p e since ALOE uses a subset of the set of half-spaces whose union defines the integration set H. Then, = p e p (ALOE-K) Finally, we are able to bound the variance as