Approximation to uniform distribution in SO(3)

Using the theory of determinantal point processes we give upper bounds for the Green and Riesz energies for the rotation group SO(3), with Riesz parameter up to 3. The Green function is computed explicitly, and a lower bound for the Green energy is established, enabling comparison of uniform point constructions on SO(3). The variance of rotation matrices sampled by the determinantal point process is estimated, and formulas for the L2 -norm of Gegenbauer polynomials with index 2 are deduced, which might be of independent interest. Also a simple but effective algorithm to sample points in SO(3) is given.


Introduction and Results
In this paper we study properties of a finite collection of randomly generated points in SO(3), the rotation group of 3-dimensional Euclidean space, sampled by determinantal point processes (dpp). It turns out that they tend to be well distributed, a property that is important for discretization, integration and approximation. Our goal is not to compute actual collections of evenly distributed rotation matrices, but rather to provide a comparison tool that allows to decide the effectiveness of any given method.
If one is given an algorithm to generate finite (but arbitrarily large) collections of matrices, common methods to measure how well distributed these are, include either calculating some discrete energy of them or looking at the speed of convergence of the counting measure towards uniform measure. Most work in this direction has been done on spheres of various dimensions, see for instance [7], etc.; the particular question of finding collections of points with very small energy was posed by Shub and Smale in [24] and is nowadays known as Smale's 7th problem [25].
In order to extend part of the work done on spheres to the context of rotation matrices, we will obtain bounds on various energies for points generated through the method of dpp, which are technically speaking counting measures where one identifies them with their set of atoms. In few words, such a process is obtained by taking a Hilbert space H(X) of an underlying measure space (X, µ) and an N -dimensional subspace V ⊂ H(X), with projection kernel K onto V -then, under mild conditions on X, one is guaranteed almost surely the existence of such a process with N distinct points in X associated to K.
The theory of those processes has been developed in [8]; there one also finds a pseudo-code which samples points based on the dpp -which seems hard to implement. A main feature of the underlying points is that they tend to "repel" each other, and hence have become the theoretical basis of construction of well-distributed points on various symmetric spaces, see for instance [2,6,7,22].
Since one can sometimes compute the expected value of the energy of points coming from these processes with high precision, they have been used as a tool to understand the asymptotic properties of the discrete energy in that context; and in particular, for even dimensional spheres with exception of the usual 2-sphere, the best known bounds for some energies have been proved using this approach.
We will employ the same method for SO (3), considering first the (discrete) Riesz s-energy for A = {α 1 , . . . , α N } ⊂ SO (3): with α j being thought of as rotation matrices, · F being the Frobenius or L 2 -norm, and s ∈ (0, 3]. In contrast to this, the continuous Riesz s-energy is given by replacing the double sum by the double integral over SO (3). We further set The investigation of these sums is very popular and results describe the behavior of the two leading terms. This seems particularly interesting in case s equals the dimension, where we have following result. for L ∈ N, then the Riesz 3-energy satisfies 12 √ 2π · E 3 R (N ) ≤ N 2 log(N ) + 3γ + log(8 2 · 6) − 21 4 N 2 + O(N 5/3 log(N )).
The right-hand side is the expected value of the Riesz 3-energy with underlying points generated by a dpp. Now, given any particular method of generating finite point sets in SO(3), one can compute, numerically, their 3-energy and compare it to the value above to decide if the points are evenly distributed. This comparison would clearly rise in significance at the presence of lower bounds on the 3-energy, which do not seem easy to find. For this reason we turn our attention to the Green energy, where we succeeded in this endeavor.
To recap, a Green function G L for a linear differential operator L is an integral kernel to produce solutions for inhomogeneous differential equations and is unique modulo kern(L). In our case, we deal with the Laplace-Beltrami operator ∆ g , and note that kern(∆ g ) is the set of harmonic functions -which are just constants on a compact Riemannian manifold (M, g). We will construct G = G ∆g in such a way, that it integrates to zero and speak of the Green function.
The (discrete) Green energy for A = {α 1 , . . . , α N } ⊂ SO(3) will be given by and we let It is noteworthy that G(α, β) · d(α, β) ≈ 1 for α close to β in geodesic distance d(·, ·), and a set of points with small Green energy is hence expected to be well-distributed, which is indeed the main result in [5]: We know that if {α 1 , . . . , α N } attains the minimal possible energy, then the associated discrete measure approaches the uniform distribution in SO(3) as N → ∞. A set of points with small Green energy is also expected to be well-separated, see [9]. Now, G(·, β) is for any β ∈ SO(3) a zero mean function by definition, and if α 1 , . . . , α N were simply chosen uniformly and independently in SO(3), then the expected value of the Green energy would equal 0, so in particular we have E G (N ) ≤ 0. In this note we prove the following much stronger result.
The right-hand side is the expected value of the Green energy with underlying points generated by a dpp, and that is where we have the restriction for N , as the process is related to subspaces V that we can project onto. The lower bound is valid for all N .
As mentioned above, another classical measure of the distribution properties of α 1 , . . . , α N is the speed of convergence to uniform measure, i.e. choosing some range sets {A j } j∈I measurable w.r.t. Haar measure µ and investigating the behavior of sup as N grows large. We will tackle this problem probabilistically, where we turn the count of points in A j into a random variable.
In analogy to spherical caps on spheres, the range sets for SO(3) will be chosen to be balls B(α, ε) := {β ∈ SO(3) : ω(α −1 β) < ε} for ε ∈ 0, π 2 and ω(·) being the rotation angle distance introduced in the following sections. For given random points α 1 , . . . , α N , we define random variables via characteristic functions Now, for a collection of random uniform points, chosen independently in SO (3) we have E[η α,ε ] = N µ(B(α, ε)) = N µ(B(1, ε)), and the variance can also be computed from the independence of the points: We are able to bound the variance of this quantity for our dpp, proving that it is much smaller than in the previous case.
for L ∈ N, and ε ∈ 0, π 2 be fixed, then the points generated by our determinantal point process satisfy E[η α,ε ] = N µ(B(α, ε)) = N µ(B(1, ε)), and moreover From Theorem 1.3 and for any fixed ε, we then have by Chebyshev's inequality for example, letting T = N 1/3 log(N ) and with some little arithmetic we obtain sup α∈SO(3) In other words, for large N the counting and Haar measures are very similar with large probability.

Introductory Concepts
In this section we collect some definitions and previous results that we will use and that intend to make this manuscript reasonably self-contained. Proofs and definitions of Chebyshev polynomials and alike are postponed to subsection 2.4.

Structure, distances and integration in SO(3)
The special orthogonal group SO(3) is the compact Lie group of 3 by 3 orthogonal matrices over R that represent rotations in R 3 , i.e. with determinant equal to one. Its exponential map is given by Rodrigues' rotation formula, and a closed expression for the Baker-Campbell-Hausdorff formula has been derived in [13]. It is a 3 dimensional manifold and since it is naturally included in R 9 it is customary to let it inherit its Riemannian submanifold structure. Following [16], using Euler angles (ϕ 1 , θ, ϕ 2 ) ∈ [0, 2π) × [0, π] × [0, 2π), every element R ∈ SO(3) can be decomposed as R = s z (ϕ 1 )s x (θ)s z (ϕ 2 ) where The Riemannian distance associated to the structure of SO(3) is certainly a natural and useful concept, but for us it will be more convenient to use the so called rotation angle distance defined as follows: for α, β ∈ SO (3), Its convenience stems from following fact, see for example [16, page 173]: Given a function f ∈ L 1 (SO(3)) such that we can findf ∈ L 1 ([0, π]) with f (x) =f (ω(x)), then

Laplace-Beltrami operator and Green function in SO(3)
The Laplace-Beltrami operator ∆ g is defined on any Riemannian manifold (M, g) in terms of the Levi-Civita connection. Following [11], if γ 1 (t), . . . , γ n (t) is a set of geodesics in an n-dimensional manifold such that γ j (0) = p ∈ M for all 1 ≤ j ≤ n, and such that {γ j (0)} form an orthonormal basis of the tangent space T p M (geodesic normal coordinates), then the action of ∆ g on C 2 -functions f at p is given by Note the convention given by the minus sign in front of the sum, which sometimes leads to confusion given the Laplacian in R n . The convention we use here is widely accepted, see for example [19]. A Green function G = G ∆g is a distributional solution to ∆ g G(·, y) = δ(·, y) − 1 µ dV (M) .
This way defined it is unique modulo kern(∆ g ) and it is common practice to add a constant in such a way that for all y ∈ M the function G(·, y) has zero mean, see [4]. We use this convention and simply refer to G as the Green function. It further follows from classical Fredholm theory for a linear differential operator L that where 0 = λ 0 < λ 1 ≤ λ 2 ≤ · · · is the sequence of eigenvalues for L and {φ j }, j ≥ 1 is a complete orthonormal set of associated eigenfunctions. This is hence true locally on any manifold, and the expression we obtain will be independent of any particular chart, thus valid globally. In the case M = SO(3), geodesics are dealt with in [21], the eigenvalues and eigenfunctions of ∆ g are known from the classical theory of continuous groups and have been intensively studied in the physics literature, see [16,18], [28, §15]: Moreover, if H is the eigenspace associated to λ , then the dimension of H is (2 + 1) 2 and an orthonormal basis of H is given by √ 2 + 1D m,n where − ≤ m, n ≤ and D m,n are Wigner's D-functions.
Moreover we have, see [18,Eq. 4.65] or [27, pp. 40-41] for a nice summary: ( where U 2 (x) is the Chebyshev polynomial of second kind and degree 2 . The following simple form for the Green function is derived, and to the best of our knowledge, this is the first time it has been formulated.

Lemma 2.2. The Green function for the Laplace-Beltrami operator on SO(3)
can be written in terms of the metric ω, i.e. for α, β ∈ SO(3) with α = β:

Determinantal point processes
We point the reader to the excellent monograph [8] for an introduction to point processes, and we briefly summarize part of this material below. As in [7] and [6], we will use only a fraction of the theory. A simple point process on a locally compact Polish space Λ with reference measure µ is a random, integer-valued positive Radon measure η, that almost surely assigns at most measure 1 to singletons -we shall think of it as a counting measure η = j=1 δ xj , with x j = x s for j = s. One usually identifies η with a discrete subset of Λ.
The joint intensities of η w.r.t. µ, if they exist, are functions ρ k : Λ k → [0, ∞) for k > 0, such that for pairwise disjoint sets {D s } k s=1 ⊂ Λ, the expected value of the product of number of points falling into D s is given by and ρ k (y 1 , . . . , y k ) = 0 in case y j = y s for some j = s.
A simple point process is determinantal with kernel K 1 , iff for every k ∈ N and all y j 's where E g(x 1 , . . . , x N ) means expected value of some function defined from M×· · ·×M (N copies of M) to [0, ∞], when x 1 , . . . , x N are chosen from the point process associated to H; the orthogonal projection of f onto H satisfies: Note that if ϕ 1 , . . . , ϕ N is an orthonormal basis of H, then we can write and clearly Coming back to the case of interest and following ideas in [7], we choose as subspace H the span of the first eigenspaces of ∆ g . (3)) be the span of the union of eigenspaces for eigenvalues λ 0 , . . . , λ L of ∆ g . Then, we define Moreover 2 , the projection kernel is:

Chebyshev polynomials and proofs of lemmas
The degree n + 1 Chebyshev polynomials of first and second kind satisfy the recurrence relation n (x) of degree n and index λ appear any time rotation invariance plays a role, and can be defined for integer λ as multiples of derivatives of Chebyshev polynomials of the second kind; sufficient for us is the one formula in (9). With this said, using (2), (3) and (5), we obtain Further we list some equations for later reference and the reader's convenience.
Proof of Lemma 2.3. Let y := cos ω(α −1 β) 2 , then by (7) and (9) The formula for the dimension of H L can be proved as follows. The eigenspace associated to λ = ( + 1) has dimension (2 + 1) 2 since this is the number of elements of its basis D m,n . Thus dim(H L ) is given by 2 Here, C 2L , L ≥ 0, is the sequence of Gegenbauer (ultra-spherical) polynomials.
Proof of Lemma 2.2. In (8) we apply the equality and reason, under the assumption w := ω(α −1 β) = 0, as follows where we used the well known fact, that the power series for log(1 − x) at 1 converges at the boundary of its disc of convergence (except for x = 1) and equals the logarithm at these values: and similarly Further, by 1 − e −iw = 2ie −i w 2 sin( w 2 ), we conclude where we used a property of the complex logarithm: log(re iϕ ) = log(r) + iϕ.
3 Riesz s-Energy: Proof of Theorem 1.1 Recall if A is a real matrix, we have A 2 F := Trace(A t A). We set throughout N = N (L) = C 2L (1) for L ∈ N, and note next a well known fact before we proceed, see for instance [17,Eq. (33)].
Proof. We abbreviate ω = ω(α −1 β), and use the half-angle formula for sine: Reminding us first of the Beta function B(a, b) := 1 0 t a−1 (1 − t) b−1 dt for a, b > 0, we are ready to state our first proposition. Proof. We use (4), Lemma 2.3, Lemma 3.1, invariance of Haar measure, and (1): The next line is, apart of the factor 4 8 s/2 π N 2 , the continuous Riesz s-energy: For 0 < s < 3, we hence obtain where we inferred [26, Eq. 7.33.6], i.e. for every c > 0 there is C ≥ 0 such that The case s = 1 is Lemma B.2; the case s = 2 follows from Lemma B.4: where c u,u = c 2 u,u (2L) with notation as in Lemma B.4.
To use (1) in the next proof, which is valid for L 1 functions -we argue as follows: Use Lebesgue's monotone convergence theorem with (1) on f n = min{n, f } → f . We will further use the digamma function ψ, see Appendix B.
Proof of Theorem 1.1. We proceed as in the previous proof and use Lemma B.4: We use (19): (4) , and obtain ( ) = 2(γ + log (4) and hence by well known summation formulas due to Faulhaber: Invoking Lemma 3.3 yields Since N 2 = C proving the claim when multiplied by 4 8 3/2 π . Lemma 3.3. Let ψ(t) be the digamma function and m ≥ 0, then n k=1 k m ψ k as the sum can be bounded from above and below by the same integral, apart from integration boundaries, where we obtain the error term. We finish by applying the anti-derivative: t m+1 m+1 log(t) − t m+1 (m+1) 2 .

Green Energy: Proof of Theorem 1.2
We prove the lower and upper bound separately in the following two sections.

Estimate of the Green Energy: Lower Bound
We follow an exposition due to N. Elkies, found in [20, pp. 149-154]. This has been pointed out to the authors by E. Saff, and his help is thankfully acknowledged. The idea is to find a function with nice properties smaller than G, and to bound its energy from below. For α, β ∈ SO(3) and t > 0, the following will do: To show that it really is smaller, we infer an adaptation of [20, Lem. 5.2].
Proof. Using uniform convergence, we differentiate term by term and define Given a smooth test function φ, with uniformly converging representation as where we interchanged integration and summation by uniform convergence and used that {D m,n √ 2 + 1} is an orthonormal basis. Now we have uniformly For t > 0 fixed, we can interchange differentiation and integration yielding ∆ g u(α, t) + ∂ t u(α, t) = 0.
By the strong maximum principle Theorem A.2, we have for every t > 0: The same PDE and estimates hold for v(α, t) = u(α, t) + φ 0 .
If φ ≥ 0, then so is v(α, t) for all t > 0 by the maximum principle as v(α, 0) = φ(α). Hence We further set where we interchanged sum and integral again. The limit t → 0 exists and equals the integral of G(α, β) · φ(β). Differentiating term-wise for t > 0 yields Finally, for fixed α let t > > 0, then by the fundamental theorem of calculus: and thus, for all non-negative test functions φ Since G(α, β) is continuous and locally integrable in β away of α, this proves the lemma. Now by Lemma 4.1, we have for some t > 0 which will be determined later, and some collection of distinct points {α 1 , . . . , α N } ⊂ SO (3): Thus our remaining task is to find an asymptotic for G t (α, α) in t. First we note that e − ( +1)·t where C < 1/4 is some constant. For 0 < t 1 we then obtain with erf(x) := 2 √ π x 0 e −y 2 dy.

Estimate of the Green Energy: Upper Bound
According to (4), we have to estimate the integral which by Lemmas 2.2 and 2.3 and by invariance of Haar measure equals The integrand is in L 1 (SO (3)) since the singularity of the cotangent is removed by the zero of the difference of Gegenbauer polynomials, thus being a continuous function on a compact set. We hence can apply (1) getting: We simplify by noticing that where we used that odd functions integrate to zero over symmetric intervals. But by the following equality, valid for ν > 1 2 and found in [15,Eq. 7.314, p.789]: We have then proved that Next we use Lemma B.1 and Lemma B.2 in and obtain Finally we use so that, by (12) Hence where equality follows from Lemma 3.1. Note that by rotation invariance it suffices to study the variance of the random variable where α 1 , . . . , α N are generated by our dpp. The expected value of η satisfies E[η ] = µ(A)N , and the variance of η A is by definition (using χ A (α k ) 2 = χ A (α k )): The expected value of the right-hand side equals by (4) In other words, we have 2L cos ω(α −1 β) 2 2 dµ(β, α), and therefore, using invariance of Haar measure, (1) and (12) All in one we have proved the variance version of [23,Eq. 28]:

A The Strong Maximum Principle on Manifolds
We state the classical strong maximum principle Theorem A.1 for open, bounded, and connected subsets U ⊂ R n , and regard second order parabolic partial differential operators L + ∂ ∂t acting on functions C 2 1 (U × (0, T ]), i.e. twice differentiable with respect to spatial variables and once w.r.t. time. T > 0. A special case of this is extended in Theorem A.2. We set for smooth coefficients: and without loss of generality, a ij (x, t) = a ji (x, t).
Definition A.1. L + ∂ ∂t is said to be uniformly parabolic if there is a C > 0, s.t.
Theorem A.1 (Thm. 11, page 396 of [14]). Let u ∈ C 2 for U ⊂ R n as above, L + ∂ ∂t uniformly parabolic, and L as in (14). If the maximum or minimum of u is attained at a point (x 0 , t 0 ) ∈ U × (0, T ], then u equals this value everywhere in U × [0, t 0 ].
Given a manifold M with or without boundary, we set M • = M \ ∂M , and for x ∈ M , define M x as the connected component of M containing x. Now, the next theorem should be known, but we haven't found a reference.
is an open ball B α , and the local representation of ∆ g in U α is of type (14), and satisfies (15) for C = 1/2. This follows from the fact that the Laplace-Beltrami operator at a point β in the interior can be written as the usual Laplacian at β, and by continuity of the coefficients, there is an open set of β where the inequality (15) is true for C = 1/2. Assume there were a t 0 > 0 such that the maximum/minimum of u would be attained at (α, t 0 ). Writing ∆ g w.r.t. the chart x α as ∆ α , and regarding the equation The maximum/minimum is in particular attained at the boundary as claimed. Further, M α is covered by finitely many intersecting charts as above, and Theorem A.1 would yield that u is constant and equals u(α, t 0 ) in all of M α × [0, t 0 ].
Lemma B.4. Let n, λ ∈ N be fixed, j, k ∈ {0, . . . , n} and define Proof. We will use Lemma B.3 with [15, Eq. 8.934]: in conjunction with the angle-sum and half-angle formula for cosine and sine: An application of (19) finishes the proof.

C Sampling on SO(3)
So far we obtained theoretical bounds for the Green energy on SO(3) via a lemma due to N. Elkies and properties of points sampled by a dpp. The upper bound cannot be best possible, as it is an expected value -and hence there must be fluctuations above and in particular below that value. In this section we will introduce an algorithm to sample points in SO (3), that is simple to implement and numerically outperforms points sampled by a dpp. We are not giving any proofs regarding this algorithm, but rather show that it exists and how our bounds could be used as a comparison tool.
In 1987 a probabilistic algorithm was introduced by P. Diaconis and M. Shahshahani for compact groups in [10] and seemingly a special case of that was re-discovered by J. Arvo for SO(3) in [3]. We will use a variant of this, replacing random points by a Halton sequence in the unit cube, which we baptize HArDiSh algorithm, and it does very well according to numerics. See Figure 1.
Following closely to [3], we sample N points as follows: For x 1 , x 2 , x 3 to be determined later, let M = −HR where H In [3], the x j were chosen uniformly at random, and as Arvo already mentions, generating x j by stratified or jittered sampling should yields less clumping for the matrices M . Our humble modification is to sample x j via Halton sequences, i.e. let vdC(p, j) denote the j-th element of the van der Corput seqence in base p, set then we obtain matrices {M k } N k=1 via (22) by setting x j (k) = H(j, k). We do not know if the algorithm will continue to perform well for high numbers N .