Convexity properties of the condition number

We define in the space of $n \times m$ matrices of rank $n$, with $n \le m$, the condition Riemannian structure as follows: for a given matrix $A$, the tangent space at $A$ is equipped with the Hermitian inner product obtained by multiplying the usual Frobenius inner product by the inverse of the square of the smallest singular value of $A$, denoted $\sigma_n(A)$. When this smallest singular value has multiplicity 1, the function $A \mapsto \log(\sigma_n(A)^{-2})$ is convex with respect to the condition Riemannian structure, that is, $t \mapsto \log(\sigma_n(A(t))^{-2})$ is convex in the usual sense for any geodesic $A(t)$. In a more abstract setting, a function $\alpha$ defined on a Riemannian manifold $(M, \langle\cdot,\cdot\rangle)$ is said to be self-convex when $\log \alpha(\gamma(t))$ is convex for any geodesic $\gamma$ in $(M, \alpha\langle\cdot,\cdot\rangle)$. Necessary and sufficient conditions for self-convexity are given when $\alpha$ is $C^2$. When $\alpha(x) = d(x, \mathcal{N})^{-2}$, where $d(x, \mathcal{N})$ is the distance from $x$ to a $C^2$ submanifold $\mathcal{N}$ of $\mathbb{R}^j$, we prove that $\alpha$ is self-convex when restricted to the largest open set of points $x$ where there is a unique closest point in $\mathcal{N}$ to $x$. We also show, using this more general notion, that the square of the condition number $\|A\|_F / \sigma_n(A)$ is self-convex in projective space and the solution variety.


Introduction
Let two integers $1 \le n \le m$ be given, and let us consider the space of matrices $\mathbb{K}^{n\times m}$, $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$, equipped with the Frobenius Hermitian product $\langle M, N\rangle_F = \operatorname{trace}(N^* M) = \sum_{i,j} m_{ij}\overline{n_{ij}}$. Given an absolutely continuous path $A(t)$, $a \le t \le b$, its length is given by the integral $L = \int_a^b \left\| \frac{dA(t)}{dt} \right\|_F dt$, and the shortest path connecting $A(a)$ to $A(b)$ is the segment connecting them. Consider now the problem of connecting these two matrices with the shortest possible path while staying, as much as possible, away from the set of "singular matrices", that is, the matrices with non-maximal rank.
The singular values of a matrix $A \in \mathbb{K}^{n\times m}$ are denoted in non-increasing order: $\sigma_1(A) \ge \dots \ge \sigma_{n-1}(A) \ge \sigma_n(A) \ge 0$. We denote by $GL_{n,m}$ the space of matrices $A \in \mathbb{K}^{n\times m}$ with maximal rank, $\operatorname{rank} A = n$, that is, $\sigma_n(A) > 0$, so that the set of singular matrices is $\mathcal{N} = \mathbb{K}^{n\times m} \setminus GL_{n,m} = \{A \in \mathbb{K}^{n\times m} : \sigma_n(A) = 0\}$.
Since the smallest singular value of a matrix is equal to its distance from the set of singular matrices, $\sigma_n(A) = d_F(A, \mathcal{N}) = \min_{S \in \mathcal{N}} \|A - S\|_F$, given an absolutely continuous path $A(t)$, $a \le t \le b$, we define its "condition length" by the integral $L_\kappa = \int_a^b \left\| \frac{dA(t)}{dt} \right\|_F \sigma_n(A(t))^{-1}\, dt$. A good compromise between length and distance to $\mathcal{N}$ is obtained by minimizing $L_\kappa$. We call "minimizing condition geodesic" an absolutely continuous path, parametrized by arc length, which minimizes $L_\kappa$ in the set of absolutely continuous paths with given endpoints, and "condition distance" $d_\kappa(A, B)$ between two matrices the length $L_\kappa$ of a minimizing condition geodesic with endpoints $A$ and $B$, if any. In this paper our objective is to investigate the properties of the smallest singular value $\sigma_n(A(t))$ along a condition geodesic. Our main result says that the map $\log(\sigma_n(A(t))^{-1})$ is convex. Thus $\log \sigma_n(A(t))$ is concave, and the minimum value of $\sigma_n(A(t))$ along the path is reached at one of the endpoints.
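The two quantities introduced above can be explored numerically. The following is a minimal sketch, assuming NumPy and real matrices: it checks that $\sigma_n(A)$ equals the Frobenius distance from $A$ to the matrix obtained by zeroing its smallest singular value, and it approximates the condition length of a straight segment by a midpoint rule. The function names and the discretization are illustrative choices, not part of the paper.

```python
import numpy as np

def smallest_singular_value(A):
    """Return sigma_n(A) for an n-by-m matrix with n <= m."""
    return np.linalg.svd(A, compute_uv=False)[-1]

def nearest_singular_matrix(A):
    """Zero out the smallest singular value of A to get a rank-deficient matrix."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    s = s.copy()
    s[-1] = 0.0
    return (U * s) @ Vh

def condition_length(path):
    """Midpoint-rule approximation of L_kappa for a polygonal path of matrices.

    Each segment contributes ||A_{k+1} - A_k||_F / sigma_n(midpoint).
    """
    total = 0.0
    for A0, A1 in zip(path[:-1], path[1:]):
        mid = 0.5 * (A0 + A1)
        total += np.linalg.norm(A1 - A0) / smallest_singular_value(mid)
    return total

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((3, 5))

S = nearest_singular_matrix(A)
dist_to_singular = np.linalg.norm(A - S)   # Frobenius distance from A to S
sigma_n = smallest_singular_value(A)

# Straight segment from A to B, sampled at 201 points.
segment = [(1 - t) * A + t * B for t in np.linspace(0.0, 1.0, 201)]
L_kappa = condition_length(segment)
L_frobenius = np.linalg.norm(B - A)

print(dist_to_singular, sigma_n)
print(L_frobenius, L_kappa)
```

The segment is the shortest path for the Frobenius length but in general not for $L_\kappa$; the sketch only evaluates $L_\kappa$ on it and does not compute a condition geodesic.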
Note that a similar property holds in the case of hyperbolic geometry where instead of $\mathbb{K}^{n\times m}$ we take $\mathbb{R}^{n-1} \times\, ]0, \infty[$, instead of $\mathcal{N}$ we have $\mathbb{R}^{n-1} \times \{0\}$, and where the length of a path $a(t) = (a_1(t), \dots, a_n(t))$ is defined by the integral $\int \left\| \frac{da(t)}{dt} \right\| a_n(t)^{-1}\, dt$.
Geodesics in that case are arcs of circles centered on $\mathbb{R}^{n-1} \times \{0\}$ or segments of vertical lines, and $\log(a_n(t)^{-1})$ is convex along such paths.
The approach used here to prove our theorems is heavily based on Riemannian geometry. We define on $GL_{n,m}$ the following Riemannian structure: $\langle M, N\rangle_{\kappa,A} = \sigma_n(A)^{-2}\, \mathrm{Re}\, \langle M, N\rangle_F$, where $M, N \in \mathbb{K}^{n\times m}$ and $A \in GL_{n,m}$. The minimizing condition geodesics defined previously are clearly geodesics in $GL_{n,m}$ for this Riemannian structure, so that we may use the toolbox of Riemannian geometry. In fact things are not so simple: the smallest singular value $\sigma_n(A)$ is only a locally Lipschitz map on $GL_{n,m}$; it is smooth on the open subset $GL^>_{n,m} = \{A \in GL_{n,m} : \sigma_{n-1}(A) > \sigma_n(A)\}$, that is, when the smallest singular value of $A$ is simple. On the open subset $GL^>_{n,m}$ the metric $\langle\cdot,\cdot\rangle_\kappa$ defines a smooth Riemannian structure, and we call "condition geodesics" the geodesics related to this structure. Such a path is not necessarily a minimizing geodesic. Our first main theorem establishes a remarkable property of the condition Riemannian structure:
Theorem 1. $\sigma_n^{-2}$ is logarithmically convex on $GL^>_{n,m}$, i.e. for any geodesic curve $\gamma(t)$ in $GL^>_{n,m}$ for the condition metric, the map $\log(\sigma_n^{-2}(\gamma(t)))$ is convex.
Problem 1. The condition Riemannian structure $\langle\cdot,\cdot\rangle_\kappa$ is defined on $GL_{n,m}$, where it is only locally Lipschitz. Let us define condition geodesics in $GL_{n,m}$ as the extremals of the condition length $L_\kappa$ (see for example [3] Chapter 4, Theorem 4.4.3, for the definition of such extremals in the Lipschitz case). Is Theorem 1 still true for $GL_{n,m}$? All the examples we have studied confirm that convexity holds, even if $\sigma_n^{-1}(\gamma(t))$ fails to be $C^1$. See Boito-Dedieu [2]. We intend to address this issue in a future paper.
In a second step we extend these results to other spaces of matrices: the sphere $S_r(GL^>_{n,m})$ of radius $r$ in $GL^>_{n,m}$ in Corollary 6, and the projective space $P(GL^>_{n,m})$ in Corollary 7. We also consider the case of the solution variety of the homogeneous equation $M\zeta = 0$, that is, the set of pairs $\{(M, \zeta) \in \mathbb{K}^{n\times(n+1)} \times \mathbb{K}^{n+1} : M\zeta = 0\}$. Now our function $\alpha$ is the square of the condition number studied by Demmel in [4]. This is done in the affine context in Theorem 3 and in the projective context in Corollary 8.
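Since $\sigma_n(A)$ is equal to the distance from $A$ to the set of singular matrices, a natural question is whether our main result remains valid for the inverse of the distance from certain sets or for more general functions. Let $(M, \langle\cdot,\cdot\rangle)$ be a Riemannian manifold and let $\alpha : M \to \mathbb{R}$ be a function with positive values. We denote by $M_\kappa$ the manifold $M$ equipped with the new metric $\langle\cdot,\cdot\rangle_{\kappa,x} = \alpha(x) \langle\cdot,\cdot\rangle_x$, called condition Riemann structure. We say that $\alpha$ is self-convex when $\log \alpha(\gamma(t))$ is convex for any geodesic $\gamma$ in $M_\kappa$.
For example, with $M = \{x = (x_1, \dots, x_n) \in \mathbb{R}^n : x_n > 0\}$ equipped with the usual metric, $\alpha(x) = x_n^{-2}$ is self-convex. The space $M_\kappa$ is the Poincaré model of hyperbolic space.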
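The half-space example can be checked by hand on one explicit geodesic. The sketch below works in the upper half-plane and uses the standard unit-speed parametrization of a half-circle centered on the boundary, $a(s) = (c - R\tanh s,\; R/\cosh s)$; the center, radius, sampling step, and function names are illustrative assumptions. Along this curve $\log(1/a_2(s)) = \log\cosh s - \log R$, which is convex, while the height $a_2(s)$ itself is not concave over the whole range.

```python
import math

def half_circle_geodesic(s, center=0.0, radius=2.0):
    """Unit-speed hyperbolic geodesic in the upper half-plane {(x, y): y > 0}."""
    return (center - radius * math.tanh(s), radius / math.cosh(s))

def second_differences(values):
    """Discrete second differences of an equally spaced sample."""
    return [values[i - 1] - 2.0 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]

step = 0.05
params = [-3.0 + step * k for k in range(121)]
heights = [half_circle_geodesic(s)[1] for s in params]
log_inverse_height = [math.log(1.0 / y) for y in heights]

# log(1/y) is convex along the geodesic: second differences are nonnegative.
log_convex = all(d >= -1e-12 for d in second_differences(log_inverse_height))

# The height y itself is not concave on the whole interval.
height_concave = all(d <= 1e-12 for d in second_differences(heights))

def hyperbolic_speed(s, h=1e-5):
    """Central-difference estimate of |a'(s)| / y, the speed in the condition metric."""
    x0, y0 = half_circle_geodesic(s - h)
    x1, y1 = half_circle_geodesic(s + h)
    y = half_circle_geodesic(s)[1]
    return math.hypot(x1 - x0, y1 - y0) / (2.0 * h) / y

speeds = [hyperbolic_speed(s) for s in params]

print(log_convex, height_concave)
print(min(speeds), max(speeds))
```

The speed check confirms that the parameter $s$ is arc length for the metric $\alpha(x)\langle\cdot,\cdot\rangle$ with $\alpha(x) = x_n^{-2}$, which is the parametrization in which convexity is tested.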
In the following theorem we prove self-convexity for the distance function to a $C^2$ submanifold without boundary $\mathcal{N} \subset \mathbb{R}^j$. Let us denote by $\rho(x) = d(x, \mathcal{N})$ the distance from $\mathcal{N}$ to $x$, and let $\alpha(x) = \rho(x)^{-2}$. Let $U$ be the largest open set in $\mathbb{R}^j$ such that, for any $x \in U$, there is a unique closest point in $\mathcal{N}$ to $x$. When $U$ is equipped with the new metric $\alpha(x)\langle\cdot,\cdot\rangle$ we have:
Theorem 2. The function $\alpha : U \setminus \mathcal{N} \to \mathbb{R}$ is self-convex.
Theorem 2 is then extended to the projective case. Let $\mathcal{N}$ be a $C^2$ submanifold without boundary of $P(\mathbb{R}^j)$. Let us denote by $d_R$ the Riemannian distance in projective space (points in the projective space are lines through the origin, and the distance $d_R$ between two lines is the angle they make). Let us denote $d_P = \sin d_R$ (this is also a distance), define $\alpha(x) = d_P(x, \mathcal{N})^{-2}$, and let $U$ be the largest open subset of $P(\mathbb{R}^j)$ such that for $x \in U$ there is a unique closest point from $\mathcal{N}$ to $x$ for the distance $d_P$. Then:
Corollary 1. The map $\alpha : U \setminus \mathcal{N} \to \mathbb{R}$ is self-convex.
The extension of Theorem 1 and Theorem 2 to other types of sets or functions is not obvious. In Example 1 we prove that $\alpha(A) = \sigma_1(A)^{-2} + \dots + \sigma_n(A)^{-2}$ is not self-convex in $GL_{n,m}$.
In Example 2 we take for $\mathcal{N}$ the unit circle in $\mathbb{R}^2$ and for $U$ the unit disk, so that $U$ contains a point (the center) which has many closest points from $\mathcal{N}$. In that case the corresponding function $\alpha : U \setminus \mathcal{N} \to \mathbb{R}$ is self-convex but it fails to be smooth at the center of the disk.
In Example 3 we provide an example of a submanifold $\mathcal{N} \subset \mathbb{R}^2$ such that the function $\alpha(x) = d(x, \mathcal{N})^{-2}$ is not self-convex.
Our interest in considering the condition metric in the space of matrices comes from recent papers by Shub [8] and Beltrán-Shub [1], where these authors use the condition length along a path in certain solution varieties to estimate the step size for continuation methods that follow these paths. They give bounds on the number of steps required in terms of the condition length of the path. If geodesics in the condition metric are followed, the known bounds on polynomial system solving are vastly improved. To understand the properties of these geodesics we have begun in this paper with linear systems, where we can investigate their properties more deeply. We find self-convexity in the context of this paper remarkable. We do not know if similar issues may naturally arise in linear algebra, even for solving systems of linear equations. Similar issues do clearly arise when studying continuation methods for the eigenvalue problem.

Self-convexity
Let us first recall some basic definitions about convexity on Riemannian manifolds. A good reference on this subject is Udrişte [9].
We say that a function $f : M \to \mathbb{R}$ is convex whenever $f(\gamma_{xy}(t)) \le (1 - t) f(x) + t f(y)$ for every $x, y \in M$, for every geodesic arc $\gamma_{xy}$ joining $x$ and $y$, and for every $0 \le t \le 1$.
The convexity of $f$ in $M$ is equivalent to the convexity in the usual sense of $f \circ \gamma_{xy}$ on $[0, 1]$ for every $x, y \in M$ and every geodesic $\gamma_{xy}$ joining $x$ and $y$, or also to the convexity of $f \circ \gamma$ for every geodesic $\gamma$ ([9] Chap. 3, Th. 2.2). Thus, we see that:
Lemma 1. Self-convexity of a function $\alpha : M \to \mathbb{R}$ is equivalent to the convexity of $\log \circ\, \alpha$ in the condition Riemannian manifold $M_\kappa$.
When $f$ is a function of class $C^2$ on the Riemannian manifold $M$, we define its second derivative $D^2 f(x)$ as the second covariant derivative. It is a symmetric bilinear form on $T_x M$. Note ([9, This second derivative depends on the Riemannian connection on $M$. Since $M$ is equipped with two different metrics, $\langle\cdot,\cdot\rangle$ and $\langle\cdot,\cdot\rangle_\kappa$, we have to distinguish between the corresponding second derivatives; they are denoted by $D^2 f(x)$ and $D^2_\kappa f(x)$ respectively. No such distinction is necessary for the first derivative $Df(x)$.
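Convexity on a Riemannian manifold is characterized by the following (see [9] Chap. 3, Th. 6.2):
Proposition 1. A function $f : M \to \mathbb{R}$ of class $C^2$ is convex if and only if its second derivative $D^2 f(x)$ is positive semidefinite for every $x \in M$.
We use this proposition to obtain a characterization of self-convexity: $\alpha$ is self-convex if and only if the second derivative $D^2_\kappa (\log \circ\, \alpha)(x)$ is positive semidefinite for any $x \in M_\kappa$. We get:
Proposition 2. For a function $\alpha : M \to \mathbb{R}$ of class $C^2$ with positive values, self-convexity is equivalent to $2\alpha(x) D^2\alpha(x)(\dot x, \dot x) + \|D\alpha(x)\|_x^2 \|\dot x\|_x^2 - 4 (D\alpha(x)\dot x)^2 \ge 0$ for any $x \in M$ and for any vector $\dot x \in T_x M$, the tangent space at $x$.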
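Proof. Let $x \in M$ be given. Let $\varphi : \mathbb{R}^m \to M$ be a coordinate system such that $\varphi(0) = x$, with first fundamental form $g_{ij}(0) = \delta_{ij}$ (Kronecker's delta) and Christoffel symbols $\Gamma^i_{jk}(0) = 0$; we keep writing $\alpha$ for $\alpha \circ \varphi$. Such coordinates are called "normal" or "geodesic". Note that this implies $\frac{\partial g_{ij}}{\partial z_k}(0) = 0$ for all $i, j, k$. We denote by $g_{\kappa,ij}$ and $\Gamma^i_{\kappa,jk}$ respectively the first fundamental form and the Christoffel symbols for $\varphi$ in $M_\kappa$. Let us compute them. Note that $g_{\kappa,ij}(z) = \alpha(z) g_{ij}(z)$, so that $g_{\kappa,ij}(0) = \alpha(x)\delta_{ij}$ and $\frac{\partial g_{\kappa,ij}}{\partial z_k}(0) = \frac{\partial \alpha}{\partial z_k}(0)\,\delta_{ij}$. That is,
$$\Gamma^i_{\kappa,jk}(0) = \frac{1}{2\alpha(x)} \left( \delta_{ij} \frac{\partial \alpha}{\partial z_k}(0) + \delta_{ik} \frac{\partial \alpha}{\partial z_j}(0) - \delta_{jk} \frac{\partial \alpha}{\partial z_i}(0) \right).$$
The second derivative of the composition of two maps is given by the identity (see [9] Chap. 1.3, Hessian)
$$D^2(\psi \circ f)(x) = \psi''(f(x))\, Df(x) \otimes Df(x) + \psi'(f(x))\, D^2 f(x).$$
This gives in our context, that is when $f = \alpha$ and $\psi = \log$,
$$D^2_\kappa(\log \circ\, \alpha)(x) = \frac{1}{\alpha(x)} D^2_\kappa \alpha(x) - \frac{1}{\alpha(x)^2} D\alpha(x) \otimes D\alpha(x).$$
According to Proposition 1 our objective is now to give a necessary and sufficient condition for $D^2_\kappa(\log \circ\, \alpha)(x)$ to be positive semidefinite for each $x \in M$. In our system of local coordinates the components of $D^2_\kappa \alpha(x)$ are (see [9] Chap. 1.3)
$$\frac{\partial^2 \alpha}{\partial z_j \partial z_k}(0) - \sum_i \Gamma^i_{\kappa,jk}(0) \frac{\partial \alpha}{\partial z_i}(0).$$
If we replace the Christoffel symbols in this last sum by the values previously computed we obtain, when $j = k$,
$$\frac{\partial^2 \alpha}{\partial z_j^2}(0) - \frac{1}{\alpha(x)} \left( \frac{\partial \alpha}{\partial z_j}(0) \right)^2 + \frac{1}{2\alpha(x)} \sum_i \left( \frac{\partial \alpha}{\partial z_i}(0) \right)^2,$$
and, when $j \ne k$,
$$\frac{\partial^2 \alpha}{\partial z_j \partial z_k}(0) - \frac{1}{\alpha(x)} \frac{\partial \alpha}{\partial z_j}(0) \frac{\partial \alpha}{\partial z_k}(0).$$
Both cases are subsumed in the identity
$$D^2_\kappa \alpha(x) = D^2 \alpha(x) - \frac{1}{\alpha(x)} D\alpha(x) \otimes D\alpha(x) + \frac{1}{2\alpha(x)} \|D\alpha(x)\|_x^2\, \langle\cdot,\cdot\rangle_x.$$
Putting together all these identities gives the following expression for the second derivative of $\log \circ\, \alpha$ in $M_\kappa$:
$$D^2_\kappa(\log \circ\, \alpha)(x) = \frac{1}{2\alpha(x)^2} \left( 2\alpha(x) D^2\alpha(x) + \|D\alpha(x)\|_x^2\, \langle\cdot,\cdot\rangle_x - 4\, D\alpha(x) \otimes D\alpha(x) \right).$$
Thus $D^2_\kappa(\log \circ\, \alpha)(x)$ is positive semidefinite for each $x \in M$ if and only if $2\alpha(x) D^2\alpha(x)(\dot x, \dot x) + \|D\alpha(x)\|_x^2 \|\dot x\|_x^2 - 4(D\alpha(x)\dot x)^2 \ge 0$ for any $x \in M$ and for any vector $\dot x \in T_x M$. This finishes the proof.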
An easy consequence of Proposition 2 is the following. See also Example 3.
Corollary 2. When a function α : M → R of class C 2 is self-convex then any critical point of α has a positive semi-definite second derivative D 2 α(x). Such a function cannot have a strict local maximum or a non-degenerate saddle.
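The pointwise criterion can be evaluated directly when the gradient and Hessian of $\alpha$ are known. The sketch below assumes a Euclidean ambient metric and takes the criterion of Proposition 2 to be $2\alpha D^2\alpha(\dot x,\dot x) + \|D\alpha\|^2\|\dot x\|^2 - 4(D\alpha\,\dot x)^2 \ge 0$ (our reading of the statement, stated here as an assumption). It tests the half-space function $\alpha(x) = x_n^{-2}$ and, as an illustration of Corollary 2, a hypothetical function $\alpha(x) = 2 - \|x\|^2$ that has a strict local maximum at the origin.

```python
import numpy as np

def self_convexity_form(alpha, grad, hess, x, v):
    """Evaluate 2*alpha*D2alpha(v,v) + |Dalpha|^2 |v|^2 - 4 (Dalpha v)^2 at x.

    The ambient metric is assumed Euclidean, so norms are the usual ones.
    """
    g = grad(x)
    return (2.0 * alpha(x) * (v @ hess(x) @ v)
            + (g @ g) * (v @ v)
            - 4.0 * (g @ v) ** 2)

# alpha(x) = x_n^{-2} on the upper half-space {x_n > 0}.
def alpha_half(x):
    return x[-1] ** -2.0

def grad_half(x):
    g = np.zeros_like(x)
    g[-1] = -2.0 * x[-1] ** -3.0
    return g

def hess_half(x):
    H = np.zeros((x.size, x.size))
    H[-1, -1] = 6.0 * x[-1] ** -4.0
    return H

# Hypothetical counterexample: alpha(x) = 2 - |x|^2, strict local maximum at 0.
def alpha_bump(x):
    return 2.0 - x @ x

def grad_bump(x):
    return -2.0 * x

def hess_bump(x):
    return -2.0 * np.eye(x.size)

rng = np.random.default_rng(0)
values = []
for _ in range(200):
    x = rng.standard_normal(4)
    x[-1] = 0.5 + 1.5 * rng.random()      # keep x_n in [0.5, 2]
    v = rng.standard_normal(4)
    values.append(self_convexity_form(alpha_half, grad_half, hess_half, x, v))
min_half_space = min(values)

origin = np.zeros(4)
e1 = np.array([1.0, 0.0, 0.0, 0.0])
form_at_maximum = self_convexity_form(alpha_bump, grad_bump, hess_bump, origin, e1)

print(min_half_space)     # nonnegative up to rounding
print(form_at_maximum)    # negative: the criterion fails at a strict local maximum
```

For the half-space function the form reduces to $4x_n^{-6}(\|\dot x\|^2 - \dot x_n^2)$, which is why every sampled value is nonnegative.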
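Proposition 3. A $C^2$ function $\alpha = 1/\rho^2 : M \to \mathbb{R}$ is self-convex on $M$ if and only if, for every $x \in M$ and $\dot x \in T_x M$, $\|\dot x\|^2 \|D\rho(x)\|^2 \ge (D\rho(x)\dot x)^2 + \rho(x) D^2\rho(x)(\dot x, \dot x)$, or, what is the same, $2 \|\dot x\|^2 \|D\rho(x)\|^2 \ge D^2\rho^2(x)(\dot x, \dot x)$. Proof. Note that $D\alpha(x)\dot x = -2\rho(x)^{-3} D\rho(x)\dot x$ and $D^2\alpha(x)(\dot x, \dot x) = 6\rho(x)^{-4} (D\rho(x)\dot x)^2 - 2\rho(x)^{-3} D^2\rho(x)(\dot x, \dot x)$. Hence, the necessary and sufficient condition of Proposition 2 reads $\frac{4}{\rho(x)^6} \left( \|\dot x\|^2 \|D\rho(x)\|^2 - (D\rho(x)\dot x)^2 - \rho(x) D^2\rho(x)(\dot x, \dot x) \right) \ge 0$, and the proposition follows, since $D^2\rho^2(x)(\dot x, \dot x) = 2 (D\rho(x)\dot x)^2 + 2\rho(x) D^2\rho(x)(\dot x, \dot x)$.
Corollary 3. Each of the following conditions is sufficient for a function $\alpha = 1/\rho^2 : M \to \mathbb{R}$ to be self-convex at $x \in M$: For every $\dot x \in T_x M$, In the following proposition we obtain a weaker condition on $\alpha$ that gives convexity in $M_\kappa$ instead of self-convexity.
for any $x \in M$ and any vector $\dot x \in T_x M$.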
Proof. We follow the lines of the proof of Proposition 2 with ψ equal to the identity map instead of ψ = log.
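
Some general formulas for matrices
Proposition 5. Let $A = (\Sigma, 0) \in GL^>_{n,m}$, where $\Sigma = \operatorname{diag}(\sigma_1 \ge \dots \ge \sigma_{n-1} > \sigma_n) \in \mathbb{K}^{n\times n}$. The map $\sigma_n : GL^>_{n,m} \to \mathbb{R}$ is a smooth map and, for every $U = (u_{ij}) \in \mathbb{K}^{n\times m}$, $D\sigma_n(A)U = \mathrm{Re}(u_{nn})$ and $D^2\sigma_n^2(A)(U, U) = 2\sum_{j=1}^m |u_{nj}|^2 - 2\sum_{k=1}^{n-1} \frac{|u_{kn}\sigma_n + \overline{u_{nk}}\sigma_k|^2}{\sigma_k^2 - \sigma_n^2}$.
Proof. Since $\sigma_n^2$ is an eigenvalue of $AA^*$ with multiplicity 1, the implicit function theorem proves the existence of smooth functions $\sigma_n^2(B) \in \mathbb{R}$ and $u(B) \in \mathbb{K}^n$, defined in an open neighborhood of $A$ and satisfying $BB^* u(B) = \sigma_n^2(B) u(B)$ and $\|u(B)\|^2 = 1$. Differentiating these equations at $B$ gives, for any $U \in \mathbb{K}^{n\times m}$,
Corollary 4. Let $A = (\Sigma, 0) \in GL^>_{n,m}$, where $\Sigma = \operatorname{diag}(\sigma_1 \ge \dots \ge \sigma_{n-1} > \sigma_n > 0) \in \mathbb{K}^{n\times n}$. Let us define $\rho(A) = \sigma_n(A)/\|A\|_F$. Then, for any $U \in \mathbb{K}^{n\times m}$ such that $\mathrm{Re}\,\langle A, U\rangle_F = 0$, we have and the first assertion of the corollary follows from Proposition 5. For the second one, note that $h = h_1/h_2$ (for real valued $C^2$ functions $h, h_1, h_2$ with $h_2(0) \ne 0$) implies F , and $D^2\sigma_n^2(A)(U, U)$ is known from Proposition 5. The formula for $D^2\rho^2(A)$ follows after some elementary calculations.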
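The derivative formulas for the smallest singular value can be sanity-checked by finite differences. The sketch below is restricted to real matrices and takes the formulas of Proposition 5 to be $D\sigma_n(A)U = u_{nn}$ and $D^2\sigma_n^2(A)(U,U) = 2\sum_j u_{nj}^2 - 2\sum_{k<n} (u_{kn}\sigma_n + u_{nk}\sigma_k)^2/(\sigma_k^2 - \sigma_n^2)$; this is our reading, obtained from second-order perturbation theory for the smallest eigenvalue of $AA^*$, and is stated as an assumption. The sizes, singular values, and step $h$ are illustrative.

```python
import numpy as np

def sigma_min_squared(B):
    """Square of the smallest singular value of B."""
    return np.linalg.svd(B, compute_uv=False)[-1] ** 2

n, m = 3, 4
sigma = np.array([3.0, 2.0, 1.0])                 # sigma_1 > sigma_2 > sigma_3
A = np.hstack([np.diag(sigma), np.zeros((n, m - n))])

rng = np.random.default_rng(1)
U = rng.standard_normal((n, m))

# Closed-form values (assumed formulas, real case).
first_formula = U[n - 1, n - 1]
second_formula = 2.0 * np.sum(U[n - 1, :] ** 2) - 2.0 * sum(
    (U[k, n - 1] * sigma[n - 1] + U[n - 1, k] * sigma[k]) ** 2
    / (sigma[k] ** 2 - sigma[n - 1] ** 2)
    for k in range(n - 1)
)

# Central finite differences of t -> sigma_n(A + tU)^2 at t = 0.
h = 1e-4
f_minus = sigma_min_squared(A - h * U)
f_zero = sigma_min_squared(A)
f_plus = sigma_min_squared(A + h * U)

# D(sigma_n^2) = 2 sigma_n D(sigma_n), so divide by 2 sigma_n to get D sigma_n.
first_fd = (f_plus - f_minus) / (2.0 * h) / (2.0 * sigma[n - 1])
second_fd = (f_plus - 2.0 * f_zero + f_minus) / h ** 2

print(first_formula, first_fd)
print(second_formula, second_fd)
```

Agreement of the two pairs of numbers only supports the formulas at this one matrix and direction; it is a consistency check, not a proof.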

The affine linear case
We consider here the Riemannian manifold $M = GL^>_{n,m}$ equipped with the usual Frobenius Hermitian product. Let $\alpha : GL^>_{n,m} \to \mathbb{R}$ be defined as $\alpha(A) = 1/\sigma_n^2(A)$.
Corollary 5. The function $\alpha$ is self-convex in $GL^>_{n,m}$.
Proof. From Proposition 3, it suffices to see that $2 \|U\|_F^2 \|D\sigma_n(A)\|_F^2 \ge D^2\sigma_n^2(A)(U, U)$ for every $U \in \mathbb{K}^{n\times m}$. Since unitary transformations are isometries in $GL^>_{n,m}$ with respect to the condition metric we may suppose, via a singular value decomposition, that $A = (\Sigma, 0) \in GL^>_{n,m}$, where $\Sigma = \operatorname{diag}(\sigma_1 \ge \dots \ge \sigma_{n-1} > \sigma_n) \in \mathbb{K}^{n\times n}$. Now, the inequality to verify is obvious from Proposition 5, as $\|D\sigma_n(A)\|_F = 1$ and $D^2\sigma_n^2(A)(U, U) \le 2\sum_{j=1}^m |u_{nj}|^2 \le 2\|U\|_F^2$.
Corollary 6. Let $r > 0$. The function $\alpha$ is self-convex in the sphere $S_r(GL^>_{n,m})$ of radius $r$ in $GL^>_{n,m}$.
Proof. It is enough to prove that any geodesic in $(S_r(GL^>_{n,m}), \alpha)$ is also a geodesic in $(GL^>_{n,m}, \alpha)$. Indeed, suppose that $A$ and $B$ are matrices in $S_r(GL^>_{n,m})$ and the minimal geodesic in $(GL^>_{n,m}, \alpha)$ between $A$ and $B$ is $X(t)$, $a \le t \le b$. Then we claim that $L_\kappa\!\left( \frac{rX(t)}{\|X(t)\|_F} \right) \le L_\kappa(X(t))$. Indeed, for any $t$, $\left\| \frac{d}{dt} \frac{rX(t)}{\|X(t)\|_F} \right\|_F \le \frac{r \|\dot X(t)\|_F}{\|X(t)\|_F}$, with equality only if $\mathrm{Re}\,\langle X(t), \dot X(t)\rangle_F = 0$, while $\sigma_n\!\left( \frac{rX(t)}{\|X(t)\|_F} \right) = \frac{r\,\sigma_n(X(t))}{\|X(t)\|_F}$. Therefore $X(t)$ can only be a minimizing geodesic if it belongs to $S_r(GL^>_{n,m})$. Since all geodesics are locally minimizing geodesics, Corollary 6 follows.
The following gives an example of a smooth function that is not self-convex in $GL_{n,m}$.
Proof. For simplicity we consider the case of real square matrices. We have
The matter of this subsection is mainly taken from Gallot-Hulin-Lafontaine [6] sect. 2.A.5.
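The inequality behind Corollary 5 can be probed numerically without relying on any closed form. The sketch below, for real matrices, estimates $D^2\sigma_n^2(A)(U,U)$ by a central second difference and compares it with $2\|U\|_F^2$, using $\|D\sigma_n(A)\|_F = 1$ as stated in the proof. The matrix sizes, sample count, and step size are illustrative assumptions.

```python
import numpy as np

def sigma_min(B):
    """Smallest singular value of B."""
    return np.linalg.svd(B, compute_uv=False)[-1]

def second_derivative_sigma_min_sq(A, U, h=1e-4):
    """Central second difference of t -> sigma_n(A + tU)^2 at t = 0."""
    def f(t):
        return sigma_min(A + t * U) ** 2
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h ** 2

rng = np.random.default_rng(2)
gaps = []
for _ in range(200):
    A = rng.standard_normal((3, 5))
    U = rng.standard_normal((3, 5))
    # Left side: 2 ||U||_F^2 ||D sigma_n(A)||_F^2 with ||D sigma_n(A)||_F = 1.
    lhs = 2.0 * np.linalg.norm(U) ** 2
    rhs = second_derivative_sigma_min_sq(A, U)
    gaps.append(lhs - rhs)
smallest_gap = min(gaps)

print(smallest_gap)   # positive on all samples supports the inequality
```

Random matrices are used directly, without reducing to the form $(\Sigma, 0)$, since the inequality is invariant under the unitary changes of basis used in the proof.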
Let V be a Hermitian space of complex dimension dim C V = d + 1. We denote by P(V ) the corresponding projective space that is the quotient of V \ {0} by the group C * of dilations of V ; P(V ) is equipped with its usual smooth manifold structure with complex dimension dim P(V ) = d. We denote by p the canonical surjection.
Let $V$ be considered as a real vector space of dimension $\dim_{\mathbb{R}} V = 2d + 2$ equipped with the scalar product $\mathrm{Re}\,\langle\cdot,\cdot\rangle_V$. The sphere $S(V)$ is a submanifold in $V$ of real dimension $2d + 1$. This sphere, equipped with the induced metric, becomes a Riemannian manifold and, as usual, we identify the tangent space at $z \in S(V)$ with $T_z S(V) = \{u \in V : \mathrm{Re}\,\langle u, z\rangle_V = 0\}$. The projective space $P(V)$ can also be seen as the quotient $S(V)/S^1$ of the unit sphere in $V$ by the unit circle in $\mathbb{C}$ for the action given by $(\lambda, z) \in S^1 \times S(V) \mapsto \lambda z \in S(V)$. The canonical map is denoted by $p_V : S(V) \to P(V)$; it is the restriction of $p$ to $S(V)$.
The horizontal space at $z \in S(V)$ related to $p_V$ is defined as the (real) orthogonal complement of $\ker Dp_V(z)$ in $T_z S(V)$. This horizontal space is denoted by $H_z$. Since $V$ is decomposed in the (real) orthogonal sum $V = \mathbb{R}z \oplus \mathbb{R}iz \oplus z^\perp$ and since $\ker Dp_V(z) = \mathbb{R}iz$ (the tangent space at $z$ to the circle $S^1 z$) we get $H_z = z^\perp = \{u \in V : \langle u, z\rangle_V = 0\}$. There exists on $P(V)$ a unique Riemannian metric such that $p_V$ is a Riemannian submersion, that is, $p_V$ is a smooth submersion and, for any $z \in S(V)$, $Dp_V(z)$ is an isometry between $H_z$ and $T_{p(z)} P(V)$. Thus, for this Riemannian structure, one has $\langle Dp_V(z)u, Dp_V(z)v\rangle_{T_{p(z)}P(V)} = \mathrm{Re}\,\langle u, v\rangle_V$ for any $z \in S(V)$ and $u, v \in H_z$. Proposition 6. Let $z \in S(V)$ be given.
2. Its derivative at 0 is the restriction of $Dp(z)$ to $H_z$, $Dp(z)|_{H_z} : H_z \to T_{p(z)} P(V)$, which is an isometry.
The following result will be helpful.
. Proof. Note that p : S(GL > n,m ) → P(GL > n,m ) is a Riemannian submersion and α 2 = α • p where α is as in Corollary 6. The corollary follows from Proposition 7.

The solution variety.
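Let us denote by $p_1 : S_1 \to P(\mathbb{K}^{n\times(n+1)})$ and $p_2 : S_2 \to P_n(\mathbb{K})$ the canonical maps, where $S_1$ is the unit sphere in $\mathbb{K}^{n\times(n+1)}$ and $S_2$ is the unit sphere in $\mathbb{K}^{n+1}$. Consider the affine solution variety, $\hat{W}^> = \{(M, \zeta) \in S_1 \times S_2 : M \in GL^>_{n,n+1} \text{ and } M\zeta = 0\}$. It is a Riemannian manifold equipped with the metric induced by the product metric on $\mathbb{K}^{n\times(n+1)} \times \mathbb{K}^{n+1}$. The tangent space to $\hat{W}^>$ is given by
The projective solution variety considered here is $W^> = \{(p_1(M), p_2(\zeta)) \in P(\mathbb{K}^{n\times(n+1)}) \times P_n(\mathbb{K}) : M \in GL^>_{n,n+1} \text{ and } M\zeta = 0\}$, which is also a Riemannian manifold equipped with the metric induced by the product metric on $P(\mathbb{K}^{n\times(n+1)}) \times P_n(\mathbb{K})$.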
which is a consequence of our Proposition 5.

Self-convexity of the distance from a submanifold of $\mathbb{R}^j$
Let $\mathcal{N}$ be a $C^k$ submanifold without boundary, $\mathcal{N} \subset \mathbb{R}^j$, $k \ge 2$. Let us denote by $\rho(x) = d(x, \mathcal{N}) = \inf_{y \in \mathcal{N}} \|x - y\|$ the distance from $\mathcal{N}$ to $x \in \mathbb{R}^j$ (here $d(x, y) = \|x - y\|$ denotes the Euclidean distance). Let $U$ be the largest open set in $\mathbb{R}^j$ such that, for any $x \in U$, there is a unique closest point from $\mathcal{N}$ to $x$. This point is denoted by $K(x)$, so that we have a map $K : U \to \mathcal{N}$ defined by $\rho(x) = d(x, K(x))$.
Classical properties of ρ and K are given in the following (see also Foote [5], Li and Nirenberg [7]).
This last quantity is equal to 1 2 d 2 . It is nonnegative by the second order optimality condition.
Proof of Theorem 2 and Corollary 1. We are now able to prove our second main theorem. Let us denote $\alpha(x) = 1/\rho(x)^2$. We shall prove that $\alpha$ is self-convex on $U$. From Proposition 3 it suffices to prove that, for every $\dot x \in \mathbb{R}^j$, $2 \|\dot x\|^2 \|D\rho(x)\|^2 \ge D^2\rho^2(x)(\dot x, \dot x)$ or, according to Proposition 8.4 and $\|D\rho\| = 1$, that $2 \|\dot x\|^2 \ge 2 \|\dot x\|^2 - 2 \langle DK(x)\dot x, \dot x\rangle$. This is obvious from Proposition 8.4. Now we prove Corollary 1. Let $S_1(\mathbb{R}^j)$ be the sphere of radius 1 in $\mathbb{R}^j$ and let $p_{\mathbb{R}^j}$ denote the canonical projection $p_{\mathbb{R}^j} : \mathbb{R}^j \to P(\mathbb{R}^j)$. Note that the preimage of $\mathcal{N}$ by $p_{\mathbb{R}^j}$ satisfies $d(y, p_{\mathbb{R}^j}^{-1}(\mathcal{N})) = d_P(p_{\mathbb{R}^j}(y), \mathcal{N})\, \|y\|$.
As in the proof of Corollary 6, the mapping $1/\rho(x)^2$ is self-convex in the set $S_1(\mathbb{R}^j) \cap p_{\mathbb{R}^j}^{-1}(U)$. Now, apply Proposition 7 to the Riemannian submersion $p_{\mathbb{R}^j}$ to conclude the corollary.
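The inequality used in the proof of Theorem 2 can be illustrated on a concrete submanifold. The sketch below assumes $\mathcal{N}$ is the unit sphere in $\mathbb{R}^3$, for which $\rho(x) = |\,\|x\| - 1|$ and $K(x) = x/\|x\|$ on $\mathbb{R}^3 \setminus \{0\}$, and checks by finite differences that $D^2\rho^2(x)(\dot x, \dot x) = 2\|\dot x\|^2 - 2\langle DK(x)\dot x, \dot x\rangle \le 2\|\dot x\|^2$. The choice of submanifold, the sampling range, and the step sizes are illustrative.

```python
import numpy as np

def rho_squared(x):
    """Squared Euclidean distance from x to the unit sphere."""
    return (np.linalg.norm(x) - 1.0) ** 2

def closest_point(x):
    """Closest point K(x) on the unit sphere, defined for x != 0."""
    return x / np.linalg.norm(x)

def second_directional_derivative(f, x, v, h=1e-4):
    """Central second difference of t -> f(x + t v) at t = 0."""
    return (f(x + h * v) - 2.0 * f(x) + f(x - h * v)) / h ** 2

def directional_derivative(F, x, v, h=1e-6):
    """Central first difference of t -> F(x + t v) at t = 0."""
    return (F(x + h * v) - F(x - h * v)) / (2.0 * h)

rng = np.random.default_rng(3)
records = []
for _ in range(100):
    x = rng.standard_normal(3)
    x *= (0.5 + 1.5 * rng.random()) / np.linalg.norm(x)   # 0.5 <= ||x|| <= 2
    v = rng.standard_normal(3)
    hess = second_directional_derivative(rho_squared, x, v)
    dk = directional_derivative(closest_point, x, v) @ v    # <DK(x) v, v>
    records.append((2.0 * (v @ v), hess, dk))

worst_identity_error = max(abs(hess - (two_v2 - 2.0 * dk))
                           for two_v2, hess, dk in records)
smallest_dk = min(dk for _, _, dk in records)

print(worst_identity_error)   # small: the two sides of the identity agree
print(smallest_dk)            # nonnegative up to rounding
```

For the sphere, $\langle DK(x)\dot x, \dot x\rangle = (\|\dot x\|^2 - \langle x, \dot x\rangle^2/\|x\|^2)/\|x\|$, which is nonnegative by the Cauchy-Schwarz inequality, in agreement with the general argument.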

Two examples
Example 2. Take $U$ the unit disk in $\mathbb{R}^2$ and $\mathcal{N}$ the unit circle. The corresponding function is given by $\alpha(x) = \frac{1}{(1 - \|x\|)^2}$. According to Theorem 2, the map $\log \alpha(x)$ is convex along the condition geodesics in $U \setminus \{(0, 0)\} = \{x \in \mathbb{R}^2 : 0 < \|x\| < 1\}$.
Example 3. Take $\mathcal{N} \subset \mathbb{R}^2$ equal to the union of the two points $(-1, 0)$ and $(1, 0)$. In that case $\alpha(x) = \frac{1}{\min\left( (x_1 - 1)^2 + x_2^2,\; (x_1 + 1)^2 + x_2^2 \right)}$. It may be shown that for any $0 < a \le 1/10$, the straight line segment is the only minimizing geodesic joining the points $(0, -a)$ and $(0, a)$. Since $\log \alpha(0, t) = -\log(1 + t^2)$ has a maximum at $t = 0$, the map $t \mapsto \alpha(0, t)$, $-a \le t \le a$, cannot be log-convex. Here $\{0\} \times \mathbb{R}$ is the locus in $\mathbb{R}^2$ of points equally distant from the two points of $\mathcal{N}$, which is the set we avoid in Theorem 2.
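The failure of log-convexity in Example 3 amounts to a single midpoint comparison along the vertical segment. The sketch below evaluates $\log\alpha$ at the midpoint and at the endpoints of the segment from $(0, -a)$ to $(0, a)$; the value $a = 0.1$ and the function names are illustrative, and the code takes for granted the claim of the example that this segment is the minimizing geodesic.

```python
import math

def alpha(x1, x2):
    """Inverse squared distance from (x1, x2) to the two-point set {(-1, 0), (1, 0)}."""
    d2 = min((x1 - 1.0) ** 2 + x2 ** 2, (x1 + 1.0) ** 2 + x2 ** 2)
    return 1.0 / d2

def log_alpha_on_axis(t):
    """log alpha(0, t), which equals -log(1 + t^2)."""
    return math.log(alpha(0.0, t))

a = 0.1
midpoint_value = log_alpha_on_axis(0.0)
chord_value = 0.5 * (log_alpha_on_axis(-a) + log_alpha_on_axis(a))

# A convex function satisfies f(midpoint) <= average of the endpoint values.
violates_convexity = midpoint_value > chord_value

print(midpoint_value, chord_value, violates_convexity)
```

The midpoint value is $0$ while the chord value is $-\log(1.01) < 0$, so the midpoint inequality required for convexity fails.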