The polynomial eigenvalue problem is well conditioned for random inputs

We compute the exact value of the squared condition number for the polynomial eigenvalue problem, when the input matrices have entries coming from the standard complex Gaussian distribution, showing that in general this problem is quite well conditioned.

where A = (A 0 , . . . , A d ) and A solution (α, β) ∈ P(C 2 ) of P (A, α, β) = 0 is called a generalized eigenvalue, or simply an eigenvalue, of A. Let us denote by Eig(A) the set of eigenvalues of A.
For generic input A, there exist nd such eigenvalues. Right and left eigenvectors of A are defined similarly to the GEVP case. If we set d = 1 we get (up to a sign convention) the GEVP problem.
A good deal of effort has been dedicated in the last years to find efficient algorithms for particular instances of the PEVP such as the quadratic eigenvalue problem, see [6,27,24] (the inverse version [25] is also of interest). The nowadays method of choice for solving the PEVP is to linearize it obtaining a GEVP (in dimension nd) that can be solved using standard eigenvalue solvers. Different ways to linearize the problem can be analyzed in search for optimal ones in terms of stability or numerical convenience, see [22].
A fundamental question regarding the numerical solution of the PEVP is the stability, governed by the so-called condition number [16,23,21]. The condition number for the GEVP has a nice expression computed in [16,Section 6]: for a matrix pair (A, B) ∈ C n×n × C n×n and an eigenvalue (α, β) ∈ P(C 2 ), we have: µ((A, B), (α, β)) = (α, β) x y |ᾱy * Ax +βy * Bx| (A, B) F , where x and y are the corresponding right and left eigenvectors, and · F denotes Frobenius norm (the factor (A, B) F is missing in [16] since Dedieu and Tisseur compute the absolute condition number instead of the relative condition number). This formula is indeed a consequence of the definition of the condition number as the maximum of the change in the eigenvalue when the input is locally perturbed (see Section 2 for a more formal definition). The same definition as the maximum possible change in the eigenvalue when the input is perturbed is valid for the more general PEVP. An explicit formula for the condition number for the PEVP was derived in [16,Th. 4.2]: Note that the condition number depends both on the input A, and on the particular eigenvalue (α, β). It is customary to consider Computing the condition number for any numerical problem (including the PEVP) is a time-consuming task that suffers from intrinsic stability problems as pointed out in [15]. It is hence usual to estimate average values of the condition number for a given family of inputs, in such a way that we can rely on probabilistic arguments instead of computing conditions of particular inputs. The linear algebra case is probably the most studied one, see [19,14] for estimates on Turing's condition number of linear algebra with different normalizations. Theoretical results describing the average condition number of the standard eigenvalue-eigenvector problem have been recently obtained [4,2], but we are not aware of any previous similar result for the PEVP case. In this paper we fill this gap: )n 2 be chosen at random with independent entries following the standard complex Gaussian distribution N C (0, 1). Then, the expected value of the squared condition number for the PEVP satisfies: A trivial consequence of Theorem 1.1 is: . . , A d ) ∈ C (d+1)n 2 be chosen at random with independent entries with distribution N C (0, 1). Then, and in particular Since the number of digits needed to describe accurately the output of a numerical problem is controlled by the logarithm of the condition number, we can now see from Corollary 1.2 that there is no intrinsic obstruction for computing solutions to the PEVP, even for very large values of n and d.
The proof of Theorem 1.1 (see Section 3.4) will be a consequence of a much more general result, Theorem 2.1. We also prove similar results for other numerical problems, see sections 3.1, 3.2, 3.3, 3.5, 3.6 for other examples of application.

Remark 1.3
Since the condition number satisfies µ(tA, (α, β)) = µ(A, (α, β)) for any nonzero t ∈ C, the results above also apply in the case that the coefficients of the input matrices are N C (0, σ) for σ ∈ (0, ∞), as far as all the coefficients follow the same distribution.

Remark 1.4
The standard approach to solve a polynomial equation p(x) = 0 in one complex variable, is to construct the companion matrix of the given polynomial and compute its eigenvalues (this is the way used by Matlab's command root). It is usually convenient to balance the companion matrix before applying QR or some other standard eigensolver. Specific eigensolvers that take advance of the special structure of companion matrices have also been developped, see [10,11,7,5] and references therein. The stability of the process can be improved by using a companion matrix different from the standard one, see [18].
Currently existing eigenvalue solvers exhibit remarkable stability and accuracy properties, and indeed the above process shows excellent practical performance. In contrast, computing eigenvalues of a matrix writing down its characteristic polynomial and solving by Newton's method or other basic procedure shows quite a poor performance in practice. This may have contributed to support the idea, extended in some part of the community, that polynomial root finding is poor-conditioned, while eigenvalue computation is well-conditioned. Indeed, in [31, p. 92] root finding is presented as a classic example of ill-conditioned problem, and so is the computation of eigenvalues in the non-symmetric case. Theorem 1.1 shows that in a homogeneous context, at least if the coefficients of the input instances are derived from N C (0, 1), we actually have the following.
Both polynomial root finding and polynomial eigenvalue problems are pretty well conditioned on the average, and indeed polynomial root finding has average squared condition number exactly equal to 1.

Remark 1.5
The condition number in a numerical problem measures the change in the solution when the input is locally perturbed, giving a worst case bound: the worst (local) perturbation in the input defines the condition of a problem.
In some situations, perturbations of the input can be expected to be "random" in the sense that there is no prefered direction, see [1]. Of course, in this case the observed condition number of the problem may seem better than the theoretical bound, since most perturbations will be quasiorthogonal to the worst direction from the concentration of measure phenomenon. The condition number for random direction in the perturbation was called in [1] the stochastic condition number µ st . Its formula for the PEVP is given (using [16,Th. 4.2]) by: . . ,Ȧ d ) lies in the unit sphere in C n 2 (d+1) and v is given by (2). From [1, Theorem 1] we have µ 2 = (d + 1)n 2 µ 2 st , which gives the equality Thus, for a random eigenvalue of a random input and a random direction in the perturbation, one should expect the relative change in the solution to be around 1/d times the size of the perturbation (relative to A F = (A 0 , . . . , A d ) F ).

Geometric framework: a general result
We will follow the path of [32,16,30] but considering a general setting of input and output spaces which can be applied to many different situations (see for example [12], or [1]). In many numerical problems there is a space of inputs I that we can identify with C m , and a space of outputs O that we can identify with a one-dimensional projective irreducible algebraic subvariety of P(C b ), for some b ∈ N. We denote by d O the degree of O. We denote elements in I by p and elements in O by z, and we consider a polynomial F : bihomogeneous in its two variables with degrees: We look at the problem: Then, we consider the algebraic variety as well as the two natural projections π 1 : V → I, π 2 : V → O.
The space C m is equipped with the canonical Hermitian inner product, and P(C b ) is equipped with the induced Fubini-Study metric and Hermitian structure (see [12] for example).
A random choice of p ∈ I is to be understood as a random variable for the Gaussian density 1 π m e − p 2 . That is, under the equivalence I ≡ C m , the elements of I follow the standard complex Gaussian distribution.
We will use the following notation for norms: • By · we mean the Euclidean vector norm. In particular, if A is a matrix, we have A = A F (Frobenius norm).
• By A 2 (with A a matrix or a linear operator) we denote the operator 2-norm of A.
• By · z we mean the norm in the tangent space T z P(C b ).
Here are some basic examples of the general setting above, including the PEVP case: A I ≡ C N +1 is the set of polynomials of degree at most N , homogeneous in two variables, and we search for a zero z ∈ P(C 2 ). Here, B I ≡ C n×n × C n×n ≡ C 2n 2 is the set of pairs of matrices (A, B), and we search for a generalized eigenvalue (α, β) ∈ P(C 2 ), i.e. a solution of In general, there exist n generalized eigenvalues.
C I ≡ C n×n × · · · × C n×n ≡ C (d+1)n 2 (there are d + 1 copies of C n×n ), and we want to solve the PEVP, that is given where P is given in (1). In general, there exist dn eigenvalues. Note that A and B can be seen as a particular cases of C.
D One can also consider sparse or lacunary versions of the problems above, namely the same problems where some of the entries of the matrices, or some of the coefficients of the polynomials, are set to 0.
The relative condition number in these and other numerical problems is defined as follows (see [12], [1], or [13, Sec. 14.1] for a general setting, or [16] for an specification to the PEVP). Let z 0 be a solution for an input p 0 . Assume that z 0 is a smooth point of O and letż 0 be any nonzero vector in T z 0 O. When the directional derivative is not equal to 0, from the implicit function theorem, there is a locally defined solution map, denoted by Sol, which sends an input p (close to p 0 ) to its output z (close to z 0 ). This map is given by the composition π 2 • π −1 1 , locally defined in a neighborhood of p 0 . The condition number at (p 0 , z 0 ) is then defined by the operator norm of the derivative of the solution map, normalized by the norm of the input In the case that Sol is just Lipschitz, the condition number uses the local Lipschitz constant instead of the 2-norm of the derivative, but we will not deal with this case here. The condition number is set to ∞ if DF(p 0 , z 0 )(0,ż 0 ) = 0 for some (i. e. for all) nonzero vectorż 0 ∈ T z 0 O.
This geometric definition of the condition number is inspired in the intuitive definition: it is a local bound for the change in the output under perturbations on the input, both measured in relative error terms. The classical condition number of linear algebra κ(A) = A 2 A −1 2 does not exactly follow this definition since in our general frame we measure the relative error with respect to the norm of the input as a vector. Indeed, our definition gives the so called Demmel's condition number: Our main result is a general formula for the expected value of the condition number. Its main feature is that it is valid for the general input-output setting described above. For p ∈ I, and z ∈ O, let n(p) = ♯π −1 1 (p), and V z = π −1 2 (z), i.e. where we set n(p) = +∞ in case π −1 1 (p) not finite.
Theorem 2.1 With the notations above, assume that : • n(p) is finite for all p ∈ I out of some m − 2 dimensional subvariety, and there exists p 0 ∈ I such that n(p 0 ) = sd O .
• V z is an m − 1 dimensional variety for all z ∈ O, and there exists z 0 ∈ O such that the degree of V z 0 is equal to r.
Then, for all p ∈ I out of some zero-measure set, the equation F(p, z) = 0 has exactly sd O solutions in O. Moreover, the expected squared condition number satisfies:

Proof of Theorem 2.1
It will be helpful to recall the notion of a constructible subset of an algebraic variety, see [26, p. 393]: it is a set of the form where V i , W i are algebraic subvarieties of X. In other words, a constructible set is a finite union of quasialgebraic varieties. Chavalley's theorem [26, p. 395] asserts that the projection of a constructible set is a constructible set. We organize the proof of Theorem 2.1 in several claims. Then, π 1 restricted toV is a submersion ontoÎ. From Claim 1, given (p, z) ∈ V, z is a solution to F(p, z) = 0 of multiplicity 1, that is ∂ ∂z F(p, z) = 0 and from the implicit function theoremV is a m-dimensional complex manifold, its tangent space is and the projection π 1 |V :V →Î is a submersion at (p, z).

Claim 4 :
The restriction of π 2 toV is a submersion onto some Zariski open subset ofÔ. Note that π 2 : V → O is surjective by the hypotheses of the theorem. Now, from Claim 2 we have that for all but a finite number of points z ∈Ô there is some p ∈ I such that p ∈ V z \ S z , and hence φ 2 = π 2 |V containŝ O except at most a finite number of points. Now, from Sard's theorem the set of singular values of φ 2 is zero-measure, and it is also quasialgebraic since it is the intersection ofÔ with the projection of the algebraic variety {p, z : F(p, z) = 0, ∂ ∂z F(p, z) = 0}. It must then be at most a finite number of points, and the claim is proved. g(p, z) dp = z∈O p∈Vz N Jπ 1 (p, z) N Jπ 2 (p, z) g(p, z) dp dz, (5) where N J means Normal Jacobian (i.e. the Jacobian of the derivative restricted to the orthogonal to its kernel). In particular, these integrals are well defined. This is a consecuence of the smooth coarea formula applied to the two projectionsV →Î andV →Ô (see for example [12] or [13, p. 244]). From Claim 3 we have: g(p, z) dp.
On the other hand, from Claim 4 we have: N Jπ 1 (p, z) N Jπ 2 (p, z) g(p, z) dp.
Finally, from Claims 1 and 2 we can substituteÎ andÔ by I and O respectively in this last equality, and the claim follows.
Claim 6 : The quotient of the Normal Jacobians satisfies Given (p, z) ∈V, let Dπ 2 (p, z) * : T zÔ → T (p,z)V be the adjoint operator to Dπ 2 (p, z), and let whereż 0 is any non zero vector in T zÔ , and let (ṗ j ,ż j ), 1 ≤ j ≤ m − 1 be an orthonormal basis of the orthogonal complement of (ṗ 0 ,ż 0 ) in T (p,z)V (this tangent space was computed in Claim 3). Since the image of Dπ 2 (p, z) * is the orthogonal complement to the kernel of Dπ 2 (p, z), then we haveż j = 0 anḋ p j orthogonal toṗ 0 for 1 ≤ j ≤ m − 1. Then, writing the linear operators Dπ 1 (p, z) and Dπ 2 (p, z) on this basis we get: Now, note from the definition of the condition number that µ(p, z) is precisely p times the operator norm of the linear operator DSol(p, z) given by We have thus proved the equality in the claim.

Claim 7 :
The following equality holds: (And this readily implies the theorem). In order to prove Claim 7 we use claims 5 and 6 with g(p, z) = e − p 2 µ(p, z) 2 /π m , getting: π m z∈O p∈Vz p 2 e − p 2 dp dz.
Now, for any z ∈ O (except at most a finite number) using Lemma A.1 we have p∈Vz p 2 e − p 2 dp = π m−1 r (m − 1), since the set V z is (for almost all z ∈ O) an algebraic subvariety of dimension m − 1 and degree r. Using that the volume of O is d O V ol(P(C 2 )) = πd O (see for example [13,Cor. 20.10]) we then get the claimed formula.

Examples
We now analyze some of the consequences of Theorem 2.1, visiting the scenarios A, B, C and D in the introduction.

Solving univariate polynomials
Let H N [X, Y ] be the space of degree N homogeneous polynomials with unknowns X and Y . Let so a pair (p, z) is in V if and only if z is a (projective) zero of p. Note that the hypotheses of Theorem 2.1 are trivially satisfied. We then have: independently of the degree. Since we are endowing the space H N [X, Y ] with the metric given by the 2-norm of the vector of coefficients of p (which makes monomials of different degrees an orthonormal basis), the condition number has the form µ(p, (z, 1)) = p (1, z, . . . , z N ) |p ′ (z)|(1 + |z| 2 ) .
A random polynomial is obtained by choosing coefficients a j , associated to the monomial X j Y N −j (with 0 ≤ j ≤ N ), independent with distribution N C (0, 1). Equation (7) then implies that the condition number for such a random polynomial is quite small.

Lacunary polynomial solving
be the space of polynomials containing only monomials of those degrees. Let The hypotheses of Theorem 2.1 are again satisfied (note that the polynomial X N − Y N has exactly N projective zeros). As a subspace of H N [X, Y ], the input space H i N [X, Y ] can be endowed with the induced Hermitian products given in the example before. Then, from Theorem 2.1 we have This equality describes quantitatively how solving lacunary systems exhibit better stability properties than solving dense polynomials, which was qualitatively easily shown since the perturbations on the input are more restricted.

Generalized eigenvalue problem
As pointed out above, with we get the GEVP. Let us now apply our Theorem 2.1. Note that the degrees of F in its two entries are r = s = n. We check that: • The number of eigenvalues is finite for all nonsingular choices of A, B, that is out of a 2n 2 − 2 dimensional subvariety, and there exists (A, B) such that the number of eigenvalues is equal to n.
• For all (α, β) ∈ P(C 2 ), the set V (α,β) of (A, B) such that det(βA − αB) = 0 is a 2n 2 − 1 variety. Moreover, V (1,0) is the set of (A, B) such that det(B) = 0 which has degree n Thus, the hypotheses of Theorem 2.1 are satisfied. We then have: The expected value of the squared condition number for the GEVP is thus essentially equal to the size of the input.

Polynomial eigenvalue problem
This case is just: where A = (A 0 , . . . , A d ) ∈ I. Again the hypotheses of Theorem 2.1 are easily checked, and we conclude: which proves our Theorem 1.1.

Sparse polynomial eigenvalue problem
The same preceding argument can be applied to sparse versions of the PEVP case. Instead of writing down a generic result, we show how to use it in a particular case: assume for example that we deal with the Quadratic Eigenvalue Problem F((A, B, C), (α, β)) = det(α 2 A + αβB + β 2 C) = 0, where we impose on the input matrices some structure. For example, assume that A is diagonal and C is upper triangular, and assume that the nonzero entries follow again an independent distribution N C (0, 1). We check the hypotheses of Theorem 2.1: • The number of solutions is finite as far as A and C are nonsingular, thus out of a codimension 2 variety, and for input A = Id n (the identity matrix), B = 0 and generic C the number of solutions is equal to 2n.
• V (α,β) is a codimension 1 variety for all (α, β) ∈ O and V (1,0) is the set of (A, B, C) such that det(A) = 0, that is a degree n variety.

More elaborated eigenvalue problems
We finally include one example of use in the case that O = P(C 2 ). Assume that our problem is the following: on input A, B, C (three n × n matrices), find (α, β, γ) ∈ P(C 3 ) such that: αβ + αγ + βγ = 0, det(αA + βB + γC) = 0. This is, in some sense, a system of two homogeneous equations and three variables, thus one would expect the solution set to be a finite collection of points (α, β, γ) ∈ P(C 3 ). We can treat this problem inside our framework as follows. Let which is a degree 2 irreducible projective algebraic subvariety of P(C 3 ). Assume now that A, B, C have random independent entries with distribution N C (0, 1). We check that the hypotheses of Theorem 2.1 hold: • If A is nonsingular, then (1, 0, 0) ∈ O is not a solution of For these points to be in O we need −m(γ + 1 + γ) + (γ + 1)γ = 0, that is for each m there are exactly 2 values of γ that make (−m, γ +1, γ) ∈ O. We thus have found an input with exactly 2n solutions.
We can thus apply Theorem 2.1 and we conclude that the expected condition number squared for the problem in this section equals:

A A useful integral
In this section we prove the following lemma.
Lemma A.1 Let J ⊆ C a be a homogeneous complex algebraic variety of degree d and dimension n. Then, p∈J p 2 2 e − p 2 2 dp = π n nd.
(In order to compute this Normal Jacobian just consider any o.n. basis of T p J whose first vector isṗ = p).
The lemma follows.