Convexity properties of the condition number II

In our previous paper [SIMAX 31 n.3 1491-1506 (2010)], we studied the condition metric in the space of maximal rank matrices. Here, we show that this condition metric induces a Lipschitz-Riemann structure on that space. After investigating geodesics in such a nonsmooth structure, we show that the inverse of the smallest singular value of a matrix is a log-convex function along geodesics (Theorem 1). We also show that a similar result holds for the solution variety of linear systems (Theorem 31). Some of our intermediate results, such as Theorem 12, on the second covariant derivative or Hessian of a function with symmetries on a manifold, and Theorem 29, on piecewise self-convex functions, are of independent interest. Those results were motivated by our investigations on the complexity of path-following algorithms for solving polynomial systems.


Introduction
Let two integers $1 \le n \le m$ be given and let us consider the space of matrices $\mathbb{K}^{n\times m}$, $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$, equipped with the Frobenius inner product
$$\langle A, B\rangle_F = \operatorname{Re}\operatorname{trace}(B^* A).$$
We denote by $\sigma_1(A) \ge \dots \ge \sigma_n(A) \ge 0$ the singular values of a matrix $A \in \mathbb{K}^{n\times m}$, by $GL_{n,m}$ the space of matrices $A \in \mathbb{K}^{n\times m}$ with maximal rank, that is $\operatorname{rank} A = n$ or, equivalently, $\sigma_n(A) > 0$, and by $\mathcal{N}$ the set of singular (or rank deficient) matrices:
$$\mathcal{N} = \mathbb{K}^{n\times m} \setminus GL_{n,m} = \{A \in \mathbb{K}^{n\times m} : \sigma_n(A) = 0\}.$$
The distance of a matrix $A \in \mathbb{K}^{n\times m}$ from $\mathcal{N}$ is given by its smallest singular value:
$$d_F(A, \mathcal{N}) = \sigma_n(A).$$
Consider now the problem of connecting two matrices with the shortest possible path staying, as much as possible, away from the set of singular matrices. We realize this objective by considering an absolutely continuous path $A(t)$, $a \le t \le b$, with given endpoints (say $A(a) = A$ and $A(b) = B$) which minimizes its condition length, defined by
$$L_\kappa(A, a, b) = \int_a^b \left\|\frac{dA(t)}{dt}\right\|_F \sigma_n(A(t))^{-1}\, dt.$$
We call a minimizing condition path an absolutely continuous path which minimizes this integral in the set of absolutely continuous paths with the same endpoints. We define a minimizing condition geodesic as a minimizing condition path parametrized by the condition arc length, that is, when
$$\left\|\frac{dA(t)}{dt}\right\|_F \sigma_n(A(t))^{-1} = 1 \quad \text{a.e.}$$
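As a concrete illustration (not part of the paper), the condition length of a discretized matrix path can be approximated numerically. The helper below, with names of our own choosing, uses finite differences for the speed and the smallest singular value at midpoints.

```python
import numpy as np

def condition_length(path, ts):
    """Approximate the condition length of a matrix path A(t):
    the integral of ||dA/dt||_F / sigma_n(A(t)) over the grid `ts`,
    using finite differences and the midpoint rule."""
    total = 0.0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = t1 - t0
        speed = np.linalg.norm(path(t1) - path(t0), 'fro') / dt
        sigma_min = np.linalg.svd(path(0.5 * (t0 + t1)), compute_uv=False)[-1]
        total += speed / sigma_min * dt
    return total

# Example: A(t) = diag(1 + 2t, 2) on [0, 1].  Here ||dA/dt||_F = 2 and
# sigma_2(A(t)) = min(1 + 2t, 2), so the condition length is
# \int_0^{1/2} 2/(1+2t) dt + \int_{1/2}^1 1 dt = log(2) + 1/2 ≈ 1.1931.
A = lambda t: np.diag([1.0 + 2.0 * t, 2.0])
print(condition_length(A, np.linspace(0.0, 1.0, 2001)))
```

Note that in this example the integrand is Lipschitz but not smooth at $t = 1/2$, where the smallest singular value changes branch, which is exactly the phenomenon the paper must deal with.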
A condition geodesic is an absolutely continuous path which is locally a minimizing condition geodesic. This concept of geodesic is related to the Riemannian structure defined on $GL_{n,m}$ by
$$\langle u, v\rangle_{\kappa,A} = \langle u, v\rangle_F\, \sigma_n(A)^{-2}, \qquad u, v \in T_A GL_{n,m}.$$
We call it the condition Riemann structure on $GL_{n,m}$. Our objective is to investigate the properties of the smallest singular value $\sigma_n(A(t))$ along a condition geodesic. Our main result says:

Theorem 1. For any condition geodesic $t \to A(t)$ in $GL_{n,m}$, the map $t \to \log(\sigma_n^{-2}(A(t)))$ is convex.

This theorem extends our main result in [1]. In that paper, the same theorem is proven for those condition geodesic arcs contained in the open subset $GL^{>}_{n,m} = \{A \in GL_{n,m} : \sigma_{n-1}(A) > \sigma_n(A)\}$, that is, when the smallest singular value $\sigma_n(A)$ is simple. The reason for this restriction is easy to explain. The smallest singular value $\sigma_n(A)$ is smooth in $GL^{>}_{n,m}$, and, in that case, we can use the toolbox of Riemannian geometry. But it is only locally Lipschitz in $GL_{n,m}$; for this reason we call the condition structure on $GL_{n,m}$ a Lipschitz-Riemannian structure.
Motivation: Let us now say a word about our motivations. The (today) classical papers [24], [25], and [26] by Shub and Smale relate complexity bounds for homotopy methods to solve Bézout's Theorem to the condition number of the encountered problems along the considered homotopy path. Ill-conditioned problems slow the algorithm and increase its complexity. For this reason it is natural to consider paths which avoid ill-posed problems, and, at the same time, are as short as possible. The condition metric has been designed to construct such paths. It has been introduced by Shub in [23], then studied by Beltrán and Shub in [3] in spaces of polynomial equations. The case of linear maps (and related spaces) appears in Beltrán-Dedieu-Malajovich-Shub [1] and Boito-Dedieu [6].
The main result relating condition metric and complexity appears in [11] (see also [2] and [19]): there is an algorithm that, given a continuous path $(f_t)_{t\in[0,1]}$ in the space of systems of homogeneous polynomial equations, computes a mesh $0 = t_0 < t_1 < \dots < t_k = 1$ and approximate zeros (in the sense of Smale) $x_i$ associated to $\zeta_{t_i}$, where $f_t(\zeta_t) \equiv 0$. This algorithm terminates in time linear in a quantity involving $D$, the maximum degree of the equations, an arbitrary parameter, and the condition length $L(f_t, \zeta_t)$ of the path $(f_t, \zeta_t)$.
In the linear case, it is rather a remarkable fact that the inverse of the squared distance to singular matrices σ −2 n (A(t)) is log-convex along the condition geodesics. So, in particular, the maximum of log (σ n (A(t)) −2 ) along such paths is necessarily obtained at its endpoints and the condition geodesics stay away from singular matrices.
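A one-dimensional analogue (ours, purely illustrative) makes this behavior transparent: on $M = (0, \infty)$ the distance to the "singular set" $\{0\}$ is $x$ itself, the condition metric is $dx^2/x^2$, its geodesics are the exponentials $x(t) = x_0 e^{ct}$, and $\log(x(t)^{-2})$ is affine, hence convex, with its maximum at an endpoint:

```python
import numpy as np

# One-dimensional analogue of the condition metric (illustrative only):
# on M = (0, oo), the distance to the singular set {0} is x, the condition
# metric is dx^2 / x^2, and its geodesics are x(t) = x0 * exp(c * t).
x0, c = 2.0, 0.7
t = np.linspace(0.0, 1.0, 101)
x = x0 * np.exp(c * t)

f = np.log(x ** -2.0)                  # log of the squared inverse distance
d2 = f[:-2] - 2.0 * f[1:-1] + f[2:]    # discrete second differences

print(np.max(np.abs(d2)))              # ~0: f is affine, hence convex
print(np.max(f) == max(f[0], f[-1]))   # maximum attained at an endpoint
```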
In our last theorem (Theorem 31) we prove a version of our main result in the context of the solution variety
$$\mathcal{W} = \{(A, x) \in GL_{n,n+1} \times \mathbb{P}(\mathbb{K}^{n+1}) : Ax = 0\}.$$
Above, the notation $\mathbb{P}(E)$ denotes the projectivization of a linear space $E$, namely the space (manifold) of real or complex lines in $E$ passing through the origin. For instance, $\mathbb{P}(\mathbb{R}^3)$ is the classical projective plane, which can also be obtained by identifying antipodal points of the sphere $S^2$. We think this property may help to find good preconditioners for solving linear systems, which would be a good area for future research.
We don't know if the analogue of Theorem 31 holds in the solution variety corresponding to the space of polynomial systems, although Proposition 10 proves it for systems which vanish at a given point. If self-convexity of the condition number were to hold in the solution variety itself, we would have a good geometric picture of what is involved in choosing a homotopy path in an optimal (or near-optimal) way.

Outline of the paper
The condition number is not of class C 1 , hence we cannot apply the usual Riemannian geometry to the condition metric. In Section 2, we introduce Lipschitz-Riemann structures and develop the basic results, that allow us to do differential geometry in the non-smooth case. Using nonsmooth analysis techniques, we prove that any condition geodesic is C 1 with a locally Lipschitz derivative (Theorem 3). Such techniques are already present in Boito-Dedieu [6].
In Section 3 we develop an important tool for proving self-convexity, allowing a more systematic use of the symmetries. (A symmetry is an isometry of a manifold that leaves a function invariant). Theorem 12 gives a simplified computation of the Hessian when there is a Lie group of symmetries. This theorem may be of independent interest. It is so natural we would not be surprised if it is already known, but we have not found it anywhere. We were led to this theorem sometime after a conversation with John Lott on Hessians and Riemannian submersions while he was visiting the University of Toronto.
The strategy for proving the main theorem is to decompose the space of matrices in a finite union of smooth manifolds, so that in each of them the metric is smooth. In Section 4 we produce this decomposition, we study the group of symmetries of the condition number and then, using Theorem 12, we establish self-convexity on each piece.
In Section 5, we prove a result that may be of independent interest, Theorem 29: piecing together convexity results on restrictions of the Lipschitz-Riemann structure to a union of submanifolds of varying dimensions, where the structure is smooth, to obtain a global result.
In Section 6, we use all these tools to finish the proof of Theorem 1. We use the same tools in Section 7 to state and prove Theorem 31 about self-convexity in the solution variety.

Lipschitz-Riemann structures
Most textbooks of Riemannian geometry define a Riemannian structure on a smooth manifold M as a scalar product ·, · x on each tangent space T x M, depending smoothly on x. Here we drop the smoothness hypothesis.

Definition 2.
A Lipschitz-Riemann structure on a $C^2$ manifold $M$ is a scalar product $\langle \cdot, \cdot\rangle_x$ on each tangent space $T_x M$, such that its coefficients are locally Lipschitz functions of $x$. Also, let $\|u\|_x = \langle u, u\rangle_x^{1/2}$ be the associated norm on $T_x M$.
The length of an absolutely continuous path $x(t)$, $a \le t \le b$, is
$$L(x, a, b) = \int_a^b \|\dot{x}(t)\|_{x(t)}\, dt,$$
where $\dot{x}(t)$ denotes the derivative with respect to $t$. Its arc length is given by the map
$$t \mapsto \int_a^t \|\dot{x}(\tau)\|_{x(\tau)}\, d\tau.$$
The distance $d(a, b)$ between two points $a, b \in M$ is the infimum of the lengths of all the paths containing $a$ and $b$ in their image. We call a minimizing path an absolutely continuous path such that $L(x, a, b) = d(a, b)$.
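To make the definition concrete, here is a small numerical sketch (the helper names are ours): it approximates the length of a path for a metric field $H$ that is only Lipschitz, not smooth.

```python
import numpy as np

def riemann_length(x, H, ts):
    """Approximate the length of t -> x(t) for the metric field H:
    the integral of sqrt( xdot^T H(x) xdot ) over the grid `ts`."""
    length = 0.0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = t1 - t0
        v = (x(t1) - x(t0)) / dt              # finite-difference velocity
        Hm = H(x(0.5 * (t0 + t1)))            # metric at the midpoint
        length += np.sqrt(v @ Hm @ v) * dt
    return length

# A Lipschitz, non-smooth metric on R^2: H(p) = (1 + |p_1|) I_2.
H = lambda p: (1.0 + abs(p[0])) * np.eye(2)
path = lambda t: np.array([t, 0.0])           # segment from (-1,0) to (1,0)
print(riemann_length(path, H, np.linspace(-1.0, 1.0, 2001)))
# exact length: 2 * \int_0^1 sqrt(1+s) ds = (4/3)(2*sqrt(2) - 1) ≈ 2.4379
```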
It is usual in differential geometry textbooks to construct geodesics as solutions of a certain second order differential equation, the geodesic differential equation. Unfortunately, the coefficients of this equation are given by a formula in terms of the partial derivatives of the metric coefficients. In a Lipschitz-Riemann structure, those coefficients are assumed to be Lipschitz, not necessarily differentiable functions. Also, it turns out that minimizing paths are not necessarily smooth.
We define a minimizing geodesic as a minimizing path parametrized by arc length, that is, when $\|\dot{x}(t)\|_{x(t)} = 1$ a.e. A path in $M$ parametrized by arc length is a geodesic when it is locally a minimizing geodesic.
The main result of this section is the following: Theorem 3. Any geodesic for a Lipschitz-Riemann structure belongs to the class $C^{1+\mathrm{Lip}}$, that is, it is $C^1$ with a locally Lipschitz derivative.
This theorem is proved in the following subsections. It extends a similar result by Charles Pugh [21], who proves the existence of locally minimizing $C^{1+\mathrm{Lip}}$ geodesics. His argument is based on a smooth approximation of the Lipschitz structure, where the classical toolbox of Riemannian geometry applies, followed by a passage to the limit. Using different techniques, we prove here this regularity property for all geodesics.

Existence of geodesics in a Lipschitz-Riemann structure
Existence of minimizing geodesics with given endpoints may be deduced from the Hopf-Rinow Theorem. Because we cannot assume the smoothness of geodesics, we refer instead to Gromov's version for complete, locally compact length spaces. Two examples of such spaces are given by Boito-Dedieu [6] for linear maps ($X$ is one of the connected components of $GL_{n,m}$ equipped with the condition structure), and by Shub [23] when $X$ is the solution variety associated with the homogeneous polynomial system solving problem, equipped with the corresponding condition structure.

Lipschitz-Riemann structures in $\mathbb{R}^k$, generalized gradients and the problem of Bolza
An important example of a Lipschitz-Riemann structure is given by an open set $\Omega \subset \mathbb{R}^k$ equipped with the scalar product
$$\langle u, v\rangle_x = u^T H(x)\, v,$$
where $H$ is a locally Lipschitz map from $\Omega$ into the set of positive definite $k \times k$ matrices. A minimizing geodesic minimizes the integral
$$\int_a^b \sqrt{\dot{y}(t)^T H(y(t))\, \dot{y}(t)}\, dt \qquad (2.1)$$
in the set of absolutely continuous paths $y(t)$ with endpoints $y(a) = x(a)$ and $y(b) = x(b)$. This is an instance of the Bolza problem. For a smooth integrand $L$, a local solution $x(t)$ of the Bolza problem $\inf \int_a^b L(y(t), \dot{y}(t))\, dt$, where the infimum is taken in the set of a.c. paths with given endpoints, satisfies the Euler-Lagrange differential equation
$$\frac{d}{dt} \frac{\partial L}{\partial \dot{x}}(x(t), \dot{x}(t)) = \frac{\partial L}{\partial x}(x(t), \dot{x}(t)).$$
In our context, it is possible to differentiate $L(x, \dot{x}) = \sqrt{\dot{x}^T H(x)\, \dot{x}}$ with respect to the second argument by ordinary differential calculus:
$$\frac{\partial L}{\partial \dot{x}}(x, \dot{x}) = \frac{H(x)\,\dot{x}}{\sqrt{\dot{x}^T H(x)\, \dot{x}}}.$$
If we avoid $\dot{x} = 0$ (which will be the case), we deduce that $L$ is smooth in the variable $\dot{x}$ and locally Lipschitz in the variable $x$. For this reason we replace the classical geodesic differential equation by a generalized version of the Euler-Lagrange equation based on generalized gradients.

Let $f : \Omega \subset \mathbb{R}^k \to \mathbb{R}$ be a locally Lipschitz function defined on an open set. Its one-sided directional derivative at $x \in \Omega$ in the direction $d \in \mathbb{R}^k$ is defined as
$$f'(x; d) = \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}.$$
The generalized directional derivative in Clarke's sense of $f$ at $x \in \Omega$ in the direction $d$ is defined as
$$f^\circ(x; d) = \limsup_{y \to x,\ t \downarrow 0} \frac{f(y + td) - f(y)}{t},$$
and the generalized gradient of $f$ at $x$ is the nonempty compact subset of $\mathbb{R}^k$ given by
$$\partial f(x) = \{\xi \in \mathbb{R}^k : f^\circ(x; d) \ge \langle \xi, d\rangle \text{ for all } d \in \mathbb{R}^k\}.$$
It turns out that the generalized gradient is always a convex set. When $f \in C^1(\Omega)$ the generalized gradient is just the usual one: $\partial f(x) = \{\nabla f(x)\}$. The generalized directional derivative is related to the gradient via the equality
$$f^\circ(x; d) = \max\{\langle \xi, d\rangle : \xi \in \partial f(x)\}.$$
We say that $f$ is regular at $x$ when the two directional derivatives exist and are equal: $f'(x; d) = f^\circ(x; d)$ for all $d$. When $f$ is defined on a $C^1$ manifold $M$, we say that $f$ is regular at $m \in M$ when its composition with a local chart at $m$ gives a regular map in the usual meaning.
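As a quick sanity check (an illustration of ours, not from the paper), the absolute value is the simplest locally Lipschitz, regular function: $f(x) = |x|$ has $f'(0; d) = f^\circ(0; d) = |d|$ and generalized gradient $\partial f(0) = [-1, 1]$.

```python
def dir_deriv(f, x, d, h=1e-7):
    """One-sided directional derivative f'(x; d) via a forward difference."""
    return (f(x + h * d) - f(x)) / h

# f(x) = |x| is locally Lipschitz, nondifferentiable at 0, and convex,
# hence regular in Clarke's sense: f'(0; d) = f°(0; d) = |d|.  This is
# max{xi * d : xi in [-1, 1]}, i.e. the generalized gradient at 0 is the
# interval [-1, 1].
for d in (1.0, -1.0, 0.5):
    print(d, dir_deriv(abs, 0.0, d), abs(d))  # last two columns agree
```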
Good references for this topic are Clarke [10] and Schirotzek [22]. For the problem of Bolza described above, the counterpart of the Euler-Lagrange equation is given by the following result (see [10], Theorem 4.4.3, and [9]).

Theorem 5. Let $x$ solve the Bolza problem (2.1) in the case in which $L(x, \dot{x})$ is a locally Lipschitz map, and suppose that $\dot{x}$ is essentially bounded. Then there is an absolutely continuous map $p$ such that
$$\dot{p}(t) \in \partial_x L(x(t), \dot{x}(t)) \quad \text{and} \quad p(t) \in \partial_{\dot{x}} L(x(t), \dot{x}(t)) \quad \text{a.e.}$$

Proof of Theorem 3
Since Theorem 3 is of local nature, it suffices to prove it locally in $\mathbb{R}^k$. Once this is done, take a local chart and transfer the Lipschitz-Riemann structure of $M$ to an open set $\Omega \subset \mathbb{R}^k$ where the theorem is already proved. Therefore, let us show the theorem in $\mathbb{R}^k$.
By definition, a geodesic is a locally minimizing geodesic. Thus, it suffices to establish the theorem in this case.
A minimizing geodesic $x(t) \in \Omega$, $a \le t \le b$, is parametrized by arc length, so that
$$\dot{x}(t)^T H(x(t))\, \dot{x}(t) = 1 \quad \text{a.e.} \qquad (2.2)$$
Thus, $\dot{x}(t) \ne 0$ and is essentially bounded. Moreover, $x$ minimizes the integral $\int_a^b \sqrt{\dot{y}(t)^T H(y(t))\, \dot{y}(t)}\, dt$ in the set of absolutely continuous paths with endpoints $y(a) = x(a)$ and $y(b) = x(b)$. Thus, according to Theorem 5, there is an absolutely continuous arc $p$ such that $\dot{p}(t) \in \partial_x L(x(t), \dot{x}(t))$ and $p(t) \in \partial_{\dot{x}} L(x(t), \dot{x}(t))$ for almost all $t \in [a, b]$. Since our integrand is smooth in the $\dot{x}$ variable, we may write
$$p(t) = \frac{H(x(t))\,\dot{x}(t)}{\sqrt{\dot{x}(t)^T H(x(t))\,\dot{x}(t)}} = H(x(t))\,\dot{x}(t) \quad \text{a.e.} \qquad (2.3)$$
Thus, $\dot{x}(t) = H(x(t))^{-1} p(t)$ is absolutely continuous, and $x(t)$ possesses a.e. a second derivative $\ddot{x}(t) \in L^1([a, b], \mathbb{R}^k)$. We now have to show that this second derivative is essentially bounded. This comes from (2.2). Since $\sqrt{\cdot}$ is a smooth function away from $0$, we get from Proposition 2.3.3 and Theorem 2.3.9 of Clarke's book [10] a chain-rule bound for $\partial_x L$ in terms of the generalized gradients of the coefficients of $H$. From the hypothesis, the functions $h_{ij}(x)$ are locally Lipschitz. Their generalized gradients are compact convex sets in $\mathbb{R}^k$. The union of all these sets along the path $x(t)$ is a bounded set. Since the curve $\dot{x}(t)$ is continuous, we deduce from these considerations that $\dot{p}(t)$ is bounded a.e. Thus $p(t)$ is Lipschitz, and $\dot{x}(t) = H(x(t))^{-1} p(t)$ is also Lipschitz. The second derivative $\ddot{x}(t)$ is thus bounded by the Lipschitz constant of $\dot{x}(t)$, and we are done.
Remark 6. The previous lines give the following properties for a geodesic $x$ in $\Omega$: $x \in C^{1+\mathrm{Lip}}$, $\dot{x}^T H(x)\,\dot{x} = 1$, and
$$\frac{d}{dt}\big(H(x(t))\,\dot{x}(t)\big) \in \partial_x L(x(t), \dot{x}(t)) \quad \text{a.e.}$$
The initial value problem, and even the boundary value problem, associated with this second order differential inclusion may have many solutions. Examples are given in [6]. Moreover, solutions are not necessarily locally minimizing geodesics, and geodesics are not necessarily unique.

Conformal Lipschitz-Riemann structure
The example of a Lipschitz-Riemann structure which motivates this paper is the condition structure on $GL_{n,m}$. It is obtained by multiplying the Frobenius scalar product by the locally Lipschitz function $\sigma_n^{-2}$. Let us put it in a more general setting. Let $M$ be a Riemannian manifold and let $\alpha : M \to \mathbb{R}$ be a positive locally Lipschitz function. We denote by $M_\kappa$ the manifold $M$ equipped with the new metric
$$\langle \cdot, \cdot\rangle_{\kappa,x} = \alpha(x)\, \langle \cdot, \cdot\rangle_x,$$
called the condition Riemann structure, or simply the condition structure. We say that $\alpha$ is self-convex when $\log \alpha(\gamma(t))$ is convex for any geodesic $\gamma$ in $M_\kappa$.
We denote by $L$ (respectively $L_\kappa$) the length of a curve $\gamma$ in the $M$-structure (respectively in the $M_\kappa$-structure). We will speak of length or condition length, and also of distance or condition distance, geodesics or condition geodesics, and so on.
Examples of self-convex maps are given in [1] where this concept is introduced for the first time.
Using this definition, Theorem 1 above reads: the map $\alpha = \sigma_n^{-2}$ is self-convex in $GL_{n,m}$.

Self-convexity in the smooth case and the computation of Hessians

Self-convexity in the smooth case

Self-convexity in the smooth case was studied in our previous paper [1] in this journal. We refer the reader to Section 2 of [1] for basic definitions regarding convexity and geodesic convexity. A snapshot of the main features of self-convexity in the smooth case follows.
We denote by D the Levi-Civita connection and by D X T the covariant derivative of a tensor T in the direction given by a vector field X. Recall that if we assume geodesic coordinates in the neighborhood of a point p, then (D X T ) p is the same as the ordinary directional (or Lie) derivative. The covariant derivative is coordinate independent, in the sense that D X T is a tensor.
If $f$ is a function, then its derivative with respect to a vector field $X$ is denoted by $D_X f = df(X)$. The second covariant derivative of a function $f$ (sometimes also known as the Hessian) is defined by
$$D^2 f(X, Y) = D_X(D_Y f) - D_{D_X Y} f, \qquad (3.1)$$
where $X$ and $Y$ are smooth vector fields. The operator above is symmetric, in the sense that $D^2 f(X, Y) = D^2 f(Y, X)$. When $\alpha : M \to \mathbb{R}$ is $C^2$, self-convexity of $\alpha$ is equivalent to the second covariant derivative of $\log(\alpha)$ being positive semi-definite in the $\alpha$-condition Riemann structure (see [28], Chap. 3, Theorem 6.2). Note that the second covariant derivative of a map $M \to \mathbb{R}$ is different in $M$ and in $M_\kappa$; we denote them respectively by $D^2$ and $D^2_\kappa$. Self-convexity of $\alpha$ is thus equivalent to
$$D^2_\kappa \log \alpha(x)(\dot{x}, \dot{x}) \ge 0$$
for any $x \in M$ and any vector $\dot{x} \in T_x M$, the tangent space at $x$.

Self-convexity in a product space
Proposition 2 of [1] has an immediate corollary which can be useful. Suppose $N$ is another $C^2$ Riemannian manifold. Give $M \times N$ the product metric. Let $\pi : M \times N \to M$ be the projection on the first factor and $\tilde{\alpha} : M \times N \to \mathbb{R}$ be the composition $\tilde{\alpha} = \alpha \circ \pi$.

Proposition 9. The map $\tilde{\alpha}$ is self-convex in $M \times N$ if and only if $\alpha$ is self-convex in $M$.
We thank an anonymous referee for pointing out the if part of this Proposition and simplifying the proof.
Proof. We prove first the only if part. Let (x, y) ∈ M × N and assume normal (geodesic) coordinates in a neighborhood of x ∈ M. Also, assume normal coordinates around y ∈ N with respect to the inner product ·, · N .
We claim that this defines a system of normal coordinates in $M \times N$. This can be seen from the fact that the exponential map of a product manifold $M \times N$ is the product of the exponential maps of $M$ and $N$. However, we give a direct proof below.
Let $g_{ij}$ and $\Gamma^k_{ij}$ denote respectively the coefficients of the first fundamental form $\langle \cdot, \cdot\rangle_{x,y}$ and the Christoffel symbols. By construction, $g_{ij}(x, y) = \delta_{ij}$. Also, it is easy to see that for all indices $i, j, k$,
$$\frac{\partial g_{ij}}{\partial x_k}(x, y) = 0.$$
Indeed, if the indices $(i, j, k)$ correspond to the same component $M$ or $N$, this follows from the choice of normal coordinates in each component. Otherwise, say that $i, j$ correspond to coordinates in $M$ and $k$ to coordinates in $N$. Then $g_{ik} \equiv g_{jk} \equiv 0$ and, furthermore, $g_{ij}$ depends only on the $M$-coordinates, so its derivative in the $x_k$ direction vanishes. Thus $\Gamma_{ikj}(x, y) = 0$ for all indices $i, j, k$. This implies that $\Gamma^k_{ij}(x, y) = 0$ as well. Thus we have a normal system of coordinates around $(x, y) \in M \times N$.
In that system of coordinates, the second covariant derivative of $\tilde{\alpha}$ has a block structure: the $M$-block is the second covariant derivative of $\alpha$, and the remaining blocks vanish. From this block structure, it is clear that $D^2\tilde{\alpha}$ is positive semi-definite if and only if $D^2\alpha$ is.

We have raised the question in the introduction of whether self-convexity of the condition number holds for the condition Riemann structure on the solution variety considered in [23]. The theorems proven in this paper apply to the case of linear systems, but with the use of Proposition 9 they give us some information on polynomial systems almost for free. The proof follows from Propositions 19, 9 and Theorem 29.
An important point is that self-convexity is well-defined for Riemannian manifolds. Therefore, if we want to speak of self-convexity in $P_{d,0}$, we need to make it into an inner product vector space. We will follow [5] and assume the unitarily invariant metric in the space of degree $d_i$ polynomials. This is the same as the metric for symmetric $d_i$-tensors. Then we define the product metric for $P_d$, and it is inherited by the subspace $P_{d,0}$. The unscaled, normalized condition number is then defined for $f \in P_{d,0}$ as in [5].

Computation of the Hessian
When analyzing the convexity properties of $\sigma_n(A)$, we first note that this function is invariant under unitary changes of coordinates, namely $\sigma_n(U A V^*) = \sigma_n(A)$ for unitary matrices $U \in \mathcal{U}_n$ and $V \in \mathcal{U}_m$ (resp. orthogonal matrices $U \in O_n$ and $V \in O_m$). Let us consider this situation in a general framework. A Lie group is a group that is also a smooth manifold, and such that the group operations (multiplication and inversion) are smooth. We say that a Lie group $G$ acts (smoothly) on a manifold $M$ if there is a smooth map $G \times M \to M$, $(g, p) \mapsto g \cdot p$, with $(g_1 g_2) \cdot p = g_1 \cdot (g_2 \cdot p)$ and $1 \cdot p = p$.
In the example above, $G = \mathcal{U}_n \times \mathcal{U}_m$ acts on $GL_{n,m}$ by $(U, V) \cdot p = U p V^*$. For simplicity, we may write $g(p)$ for $g \cdot p$ and identify $g$ with the mapping $p \mapsto g(p)$. We say that the Lie group $G$ acts by isometries when, for all $g$, the corresponding map $g : p \mapsto g(p)$ is an isometry of $M$.
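The invariance that motivates this discussion is easy to confirm numerically (a sketch of ours using random matrices): the action $(U, V) \cdot A = U A V^*$ preserves every singular value.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n):
    """A unitary matrix, obtained from the QR factorization of a
    complex Gaussian matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
U, V = random_unitary(3), random_unitary(5)

s_before = np.linalg.svd(A, compute_uv=False)
s_after = np.linalg.svd(U @ A @ V.conj().T, compute_uv=False)
print(np.allclose(s_before, s_after))  # True: sigma_i(U A V*) = sigma_i(A)
```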
Definition 11. Let $\alpha : M \to \mathbb{R}$. A group of symmetries of $\alpha$ is a Lie group acting smoothly by isometries on $M$ and leaving $\alpha$ invariant (that is, $\alpha(g(p)) = \alpha(p)$ for all $g \in G$ and $p \in M$).
Note that it may happen (for instance, if $G$ is a discrete group) that the orbit of a point is not an embedded submanifold. Given $p \in M$, $G(p) = \{g(p) : g \in G\}$ will denote the $G$-orbit of $p$. The orbit $G(p)$ is a manifold; if the group $G$ is compact, the orbit is an embedded manifold. In any case, $T_p G(p)$ will denote the tangent space of the orbit $G(p)$ at $p$, as a subspace of $T_p M$. It can also be described as the set of all
$$\frac{d}{dt}\big(\exp(ta)\, p\big)\Big|_{t=0}$$
for $a \in \mathfrak{g}$, the Lie algebra of $G$.
For instance, when $G = \mathcal{U}_n \times \mathcal{U}_m$, then $\mathfrak{g}$ is $\mathcal{A}_n \times \mathcal{A}_m$ (the anti-symmetric matrices) and $\exp(ta)$ is the usual matrix exponential.

Theorem 12. Let $G$ be a group of symmetries of a $C^2$ function $\alpha : M \to \mathbb{R}$. Let $p \in M$ and let $v = b + k \in T_p M$, with $k \in T_p G(p)$ and $b$ orthogonal to $T_p G(p)$. Let the vector field $K$ be the infinitesimal generator associated with some element $a$ in the Lie algebra $\mathfrak{g}$, with $K(p) = k$. Let $\phi_t(q) = \phi(t, q)$ be the flow of $\operatorname{grad} \alpha$, defined for $t \in (-\varepsilon, \varepsilon)$ and $q$ close enough to $p$. Let $B$ be a smooth vector field in $M$ such that $B(\phi_t(p)) = D\phi_t(p)\, b$, where $D$ denotes the usual derivative applied to the diffeomorphism $\phi_t : M \to M$. Then, the following equality holds:
$$D^2\alpha(p)(v, v) = D^2\alpha(p)(b, b) + \operatorname{grad} \alpha(\langle B, K\rangle)(p) + \tfrac{1}{2}\, \operatorname{grad} \alpha(\langle K, K\rangle)(p).$$
Above, $\operatorname{grad} \alpha(\langle B, K\rangle)(p) = \langle \operatorname{grad} \alpha(p), \operatorname{grad}\langle B, K\rangle_p\rangle_p$ is the directional derivative of $\langle B, K\rangle$ with respect to $\operatorname{grad} \alpha$.
Let us recall from (3.1) the intrinsic definition of the second covariant derivative or Hessian:
$$D^2 f(v, w) = \big(D_X(D_Y f) - D_{D_X Y} f\big)(p),$$
where $X, Y$ are vector fields, $X(p) = v$, $Y(p) = w$, and $D$ is the Levi-Civita connection. Also, $[X, Y]$ is the Lie bracket of two vector fields $X$ and $Y$; it is defined, for any $\alpha$ of class $C^2$, by
$$[X, Y]\,\alpha = X(Y\alpha) - Y(X\alpha).$$
It turns out that this is a first order differential operator, hence $[X, Y]$ is a vector field.
Another useful identity relating the Lie bracket and the Levi-Civita connection is
$$[X, Y] = D_X Y - D_Y X. \qquad (3.3)$$
The proof of Theorem 12 is a consequence of the two following lemmas.

Lemma 13. For any vector field $X$ on $M$, we have
$$D^2\alpha(X, K) = \langle D_{\operatorname{grad} \alpha} K, X\rangle.$$
Moreover,
$$D^2\alpha(K, K)(p) = \tfrac{1}{2}\, \operatorname{grad} \alpha(\langle K, K\rangle)(p). \qquad (3.4)$$
Proof. We recall that, for vector fields $X, Y, Z$,
$$Z\langle X, Y\rangle = \langle D_Z X, Y\rangle + \langle X, D_Z Y\rangle.$$
Note that $K(p) = k$ and $K(q) \in T_q G(q)$ for $q \in M$. As $\alpha$ is $G$-invariant,
$$K\alpha = \langle \operatorname{grad} \alpha, K\rangle \equiv 0.$$
Moreover, the one-parameter group generated by $K$ consists of global isometries, thus $K$ is a Killing vector field, which implies that, for any pair of vector fields $X, Y$,
$$\langle D_X K, Y\rangle + \langle D_Y K, X\rangle = 0.$$
We can now compute
$$D^2\alpha(X, K) = X(K\alpha) - (D_X K)\alpha = -\langle D_X K, \operatorname{grad} \alpha\rangle.$$
Using $\langle \operatorname{grad} \alpha, K\rangle = 0$ and the Killing property, we conclude
$$D^2\alpha(X, K) = \langle D_{\operatorname{grad} \alpha} K, X\rangle,$$
which proves the first assertion. When $X = K$, using (3.3),
$$D^2\alpha(K, K)(p) = \langle D_{\operatorname{grad} \alpha} K, K\rangle(p) = \tfrac{1}{2}\, \operatorname{grad} \alpha(\langle K, K\rangle)(p).$$

Lemma 14. With the notations of Theorem 12,
$$2\, D^2\alpha(p)(b, k) = \operatorname{grad} \alpha(\langle B, K\rangle)(p).$$

Proof. By continuity of the formulas in the lemma, we can assume that $k = 0$ and that $b$, $\operatorname{grad} \alpha(p)$ are linearly independent. Let $N_0$ be a codimension 2 submanifold of $M$ with $p$ in its interior. Assume that $b \in T_p N_0$, $k$ is orthogonal to $T_p N_0$, and $\operatorname{grad} \alpha(p) \in T_p N_0$. Let $N = \cup\, \phi_t(N_0)$, with $\phi_t$ the flow associated with $\operatorname{grad} \alpha$ and where the union is taken over a small interval around $t = 0$. $N$ is a codimension 1 submanifold. For small $\varepsilon$, the integral curve of $\operatorname{grad} \alpha$ is thus contained in $N$, and the claimed equality follows.

Proof of Theorem 12. The second covariant derivative is a symmetric bilinear form. Thus, Theorem 12 follows from Lemmas 13 and 14.
Corollary 15. Assume that, for every $p \in M$:

• $D^2_\kappa \log \alpha(p)$ is positive semi-definite on $(T_p G(p))^\perp$;

• for every $b$ orthogonal to $T_p G(p)$, the vector $D\phi_t(p)\, b$ remains orthogonal to the orbit $T_{\phi_t(p)} G(\phi_t(p))$ for small enough $t$, where $\phi_t(q) = \phi(t, q)$ is the flow of $\operatorname{grad} \alpha$, defined for $t \in (-\varepsilon, \varepsilon)$ and $q$ close enough to $p$;

• for every $a \in \mathfrak{g}$, the associated vector field $K$ satisfies $\langle \operatorname{grad}_\kappa(\langle K, K\rangle_\kappa)(p), \operatorname{grad}_\kappa \log(\alpha)(p)\rangle_{\kappa, p} \ge 0$.

Then $\alpha$ is self-convex.

Proof. $\alpha$ is self-convex if and only if $D^2_\kappa \log(\alpha)$ is positive semi-definite. Let $v = b + k \in T_p M$. According to Theorem 12,
$$D^2_\kappa \log \alpha(p)(v, v) = D^2_\kappa \log \alpha(p)(b, b) + \operatorname{grad}_\kappa \log \alpha(\langle B, K\rangle_\kappa)(p) + \tfrac{1}{2}\, \operatorname{grad}_\kappa \log \alpha(\langle K, K\rangle_\kappa)(p),$$
where $K$ is as defined in Theorem 12 and $B$ is a vector field such that $B(\phi_t(p)) = D\phi_t(p)\, b$. Note that $\operatorname{grad}_\kappa \log \alpha(\langle B, K\rangle_\kappa)$ depends only on the values of $B$ and $K$ along the integral curve $\phi_t(p)$. Moreover, $\langle B, K\rangle_\kappa$ vanishes along this curve, by the second item in the hypotheses of our corollary. Thus, we have
$$D^2_\kappa \log \alpha(p)(v, v) = D^2_\kappa \log \alpha(p)(b, b) + \tfrac{1}{2}\, \operatorname{grad}_\kappa \log \alpha(\langle K, K\rangle_\kappa)(p).$$
This quantity has to be non-negative for every $v$ or, equivalently:

• $D^2_\kappa \log \alpha(p)$ has to be positive semi-definite on $(T_p G(p))^\perp$, and

• $\langle \operatorname{grad}_\kappa(\langle K, K\rangle_\kappa)(p), \operatorname{grad}_\kappa \log(\alpha)(p)\rangle_{\kappa, p} \ge 0$ for every vector field $K$, $K(q) = \frac{d}{dt}(\exp(ta)\, q)\big|_{t=0}$, where $a \in \mathfrak{g}$.

The second of these two items can be re-written using the original Riemannian structure $\langle \cdot, \cdot\rangle$. The corollary follows.

Self-convexity in spaces of matrices
Let $u \le n$ and $(k) = (k_1, \dots, k_u) \in \mathbb{N}^u$ be such that $k_1 + \dots + k_u = n$. We define $P_{(k)}$ as the set of matrices $A \in GL_{n,m}$ with $u$ distinct singular values $\sigma_1 > \dots > \sigma_u > 0$, of respective multiplicities $k_1, \dots, k_u$; equivalently, the set of matrices $U D V^*$ with $U \in \mathcal{U}_n$, $V \in \mathcal{U}_m$ and $D \in D_{(k)}$ as defined below. Above, $\mathcal{U}_n$ is the group of unitary $n \times n$ matrices. If $\mathbb{K} = \mathbb{R}$, it should be replaced by the group of orthogonal $n \times n$ matrices.
We also let
$$D_{(k)} = \{\operatorname{diag}(\sigma_1 I_{k_1}, \dots, \sigma_u I_{k_u}) \in \mathbb{K}^{n\times m} : \sigma_1 > \dots > \sigma_u > 0\},$$
the diagonal block being completed with $m - n$ zero columns. Notice that the singular values $\sigma_1 > \dots > \sigma_u$ can vary within each $P_{(k)}$ or each $D_{(k)}$.
Proof. To prove that $P_{(k)}$ is a real smooth embedded submanifold of $GL_{n,m}$ we use Lemma 33 (see the appendix). We take $G = \mathcal{U}_n \times \mathcal{U}_m$, $M = GL_{n,m}$, and $D = D_{(k)}$. The group action of $G$ on $M$ is given by $(U, V) \cdot A = U A V^*$. Under this action, the image of $D_{(k)}$ is $P_{(k)}$, and $R$ denotes the induced equivalence relation. The last point to check in order to apply Lemma 33 is the continuity of the inverse of $i$. Suppose that $X_p \to X$ with $X_p, X \in \operatorname{Im} i = P_{(k)}$. We can write them $X_p = U_p D_p V_p^*$ and $X = U D V^*$. Let $(U_{p_q}, V_{p_q})$ be a subsequence which converges to $(\tilde{U}, \tilde{V})$ ($G$ is compact). Since $X_{p_q} \to X$, we have $D_{p_q} \to \tilde{U}^* X \tilde{V} = \tilde{D}$, and $\tilde{U}\tilde{D}\tilde{V}^* = U D V^*$. Now we consider the sequence $\tilde{U}^* X_p \tilde{V}$. It is a convergent sequence, hence it has a unique limit $\tilde{D}$, and $(\tilde{U}, \tilde{V}, \tilde{D})\, R\, (U, V, D)$. Thus, $\pi(\tilde{U}^* U_p, \tilde{V}^* V_p, D_p)$ converges to $\pi(I, I, \tilde{D})$. By the left $\mathcal{U}_n \times \mathcal{U}_m$ action, we conclude that $\pi(U_p, V_p, D_p)$ converges to $\pi(U, V, D)$, as required.
Thus, the hypothesis of Lemma 33 is satisfied and P (k) is a real smooth embedded submanifold of GL n,m .
The computation of its dimension is easy: it is given by the difference of the dimension of $G \times D_{(k)}$ and the dimension of the fiber above any point in the quotient space. The tangent space $T_D P_{(k)}$, $D = \operatorname{diag}(\sigma_1 I_{k_1}, \dots, \sigma_u I_{k_u})$, is the image of the tangent space $T_{(I_n, I_m, D)}\, G \times D_{(k)}$ by the derivative $D(i \circ \pi)(I_n, I_m, D)$. It is the set of matrices $A D + \dot{D} - D B$, with $\dot{D} = \operatorname{diag}(\lambda_1 I_{k_1}, \dots, \lambda_u I_{k_u})$, and $A$ and $B$ skew-symmetric of sizes $n$ and $m$. They all have the type described in Proposition 16, and this space of matrices has the right dimension.
Let us prove the smoothness of the map X ∈ P (k) → σ i (X) ∈ R. Since the map (U, V, D) ∈ G×D (k) → σ i (D) is smooth, and constant in the equivalence classes, the map π(U, V, D) ∈ (G × D (k) )/R → σ i (D) = σ i (U DV * ) is also smooth. Thus the map X = U DV * ∈ P (k) → σ i (X) is smooth as the composition of the previous map by i −1 .
Lemma 17. Let $I$ be an open interval and let $(\gamma(t))_{t \in I}$ be a smooth path in $P_{(k)}$. Then there are smooth paths $U(t) \in \mathcal{U}_n$, $V(t) \in \mathcal{U}_m$ and $\Sigma(t) \in D_{(k)}$ such that
$$\gamma(t) = U(t)\, \Sigma(t)\, V(t)^* \qquad (4.1)$$
for all $t \in I$.
Proof. We will show that $U(t)$, $V(t)$ and $\Sigma(t)$ are solutions of a certain differential equation on the manifold $\mathcal{U}_n \times \mathcal{U}_m \times D_{(k)}$. An important fact to be used below is that $T_I \mathcal{U}_n$ is the space of skew-hermitian matrices; in the real case, $T_I O_n$ is the space of skew-symmetric matrices. Let us assume for a while that (4.1) admits a solution. Differentiating (4.1) with respect to $t$, we obtain after a few trivial manipulations that
$$U(t)^* \dot{\gamma}(t)\, V(t) = A(t)\Sigma(t) + \dot{\Sigma}(t) - \Sigma(t) B(t),$$
where $A(t) = U(t)^*\dot{U}(t)$ and $B(t) = V(t)^*\dot{V}(t)$ are skew-hermitian. For shortness, let $M(t) = U(t)^* \dot{\gamma}(t)\, V(t)$. Using block notation, we obtain for $i < j$ that
$$M_{ij}(t) = \sigma_j(t) A_{ij}(t) - \sigma_i(t) B_{ij}(t).$$
The equation for the block $M_{ji}(t)$ reads
$$M_{ji}(t) = \sigma_i(t) A_{ji}(t) - \sigma_j(t) B_{ji}(t).$$
Transposing, and using that $A$ and $B$ are skew-hermitian, we obtain
$$M_{ji}(t)^* = -\sigma_i(t) A_{ij}(t) + \sigma_j(t) B_{ij}(t). \qquad (4.2)$$
Since $\sigma_i \ne \sigma_j$, this linear system determines the off-diagonal blocks $A_{ij}(t)$ and $B_{ij}(t)$. The blocks on the diagonal (that is, $i = j$) are of the form
$$M_{ii}(t) = \sigma_i(t)\big(A_{ii}(t) - B_{ii}(t)\big) + \dot{\sigma}_i(t)\, I_{k_i}, \qquad (4.3)$$
hence we can solve by a suitable choice of the diagonal blocks of $A$, $B$ and $\dot{\Sigma}$.
Equations (4.2)-(4.3) are a system of smooth non-autonomous ordinary differential equations in the variables $U \in \mathcal{U}_n$, $V \in \mathcal{U}_m$ and $\Sigma \in D_{(k)}$. The Lipschitz condition holds. Hence, for every $t_0 \in I$, there are $\varepsilon > 0$ and local solutions $U(t)$, $V(t)$ and $\Sigma(t)$ for $t \in (t_0 - \varepsilon, t_0 + \varepsilon)$, solving (4.1). In order to show the existence of a global solution on the whole interval, we need to check that, as $t \to t_0 + \varepsilon$, the solution converges to a limit in $\mathcal{U}_n \times \mathcal{U}_m \times D_{(k)}$. The convergence of $U(t)$ and $V(t)$ follows from the compactness of the unitary group. Because $\gamma(t_0 + \varepsilon) \in P_{(k)}$, the singular values $\Sigma(t)$ converge to an element of $D_{(k)}$ as well. Hence, the solution $(U(t), V(t), \Sigma(t))$ can be extended to a subinterval that is open and closed in $I$, hence to all of $I$.
Let $\alpha : GL_{n,m} \to \mathbb{R}$ be defined by $\alpha(A) = \sigma_n(A)^{-2}$. We also denote by $\alpha = \sigma_u^{-2}$ its restriction to $P_{(k)}$ or to $D_{(k)}$. We first consider the case of diagonal matrices, then we prove self-convexity of $\alpha$ in $P_{(k)}$.
Proposition 18. The following hold:

1. If $\Sigma_1, \Sigma_2 \in D_{(k)}$, then any minimizing condition geodesic in $P_{(k)}$ joining $\Sigma_1$ and $\Sigma_2$ lies in $D_{(k)}$.

2. The set $D_{(k)}$ is a totally geodesic submanifold of $P_{(k)}$ for the condition metric; namely, every geodesic in $D_{(k)}$ for the induced structure is also a geodesic in $P_{(k)}$. Equivalently:

3. If $\Sigma \in D_{(k)}$ and $\dot{\Sigma} \in T_\Sigma D_{(k)}$, then the unique geodesic in $P_{(k)}$ through $\Sigma$ with tangent vector $\dot{\Sigma}$ at $\Sigma$ remains in $D_{(k)}$.
Proof. According to Proposition 16, $P_{(k)}$ is a smooth Riemannian manifold for the condition structure. Let $\gamma(t)$, $0 \le t \le T$, be a minimizing condition geodesic with endpoints $\Sigma_1, \Sigma_2 \in D_{(k)}$. Let $\gamma(t) = U_t \Sigma_t V_t^*$ be a singular value decomposition of $\gamma(t)$, chosen as in Lemma 17. Let $\sigma_u(t)$ be the smallest singular value of $\gamma(t)$. It suffices to see that $\|\dot{\Sigma}_t\|_F \le \|\dot{\gamma}(t)\|_F$, because the diagonal terms in $\dot{\Sigma}_t$ are real numbers and those of $A_t \Sigma_t - \Sigma_t B_t$ are purely imaginary when $\mathbb{K} = \mathbb{C}$, and vanish when $\mathbb{K} = \mathbb{R}$. When $\gamma_t$ does not belong to $D_{(k)}$, the inequality above is strict.
The second assertion is an easy consequence of the first one. The third assertion is another classical characterization of totally geodesic submanifolds; see [20], Chapter 4, Proposition 13 or Theorem 5.
Finally, for the log-convexity of $\alpha(X) = \sigma_u(X)^{-2}$, using [1], Proposition 3, it suffices to check inequality (4.4) below for $\Sigma \in D_{(k)}$ and $\dot{\Sigma} \in T_\Sigma D_{(k)}$, where the second derivative involved is computed in the Frobenius metric structure. The quantity to be bounded is maximized for the 'unit vector' (in block representation) associated with the smallest singular value, and equation (4.4) follows.
Proposition 19. The map $\alpha = \sigma_u^{-2}$ is self-convex in $P_{(k)}$.

Proof. By unitary invariance, we may choose as initial point a matrix $\Sigma \in D_{(k)}$ with ordered distinct diagonal entries $\sigma_1 > \dots > \sigma_u > 0$. We use Corollary 15, with the group $G$ being $\mathcal{U}_n \times \mathcal{U}_m$ and the action $(U, V) \cdot A = U A V^*$. The Lie algebra of $G$ is the set $\mathcal{A}_n \times \mathcal{A}_m$, where $\mathcal{A}_k$ is the set of $k \times k$ skew-symmetric (in the complex case, skew-hermitian) matrices. We write $G(L)$ for the $G$-orbit of a point $L \in P_{(k)}$; in our case, this is the manifold of all $U L V^*$ with $U \in \mathcal{U}_n$, $V \in \mathcal{U}_m$. The tangent space to the Lie group action at $L$ is the tangent manifold $T_L G(L) \subseteq T_L P_{(k)}$.
First, we note that for any $L \in D_{(k)}$, we have
$$T_L G(L) = \{B_1 L + L B_2^* : (B_1, B_2) \in \mathcal{A}_n \times \mathcal{A}_m\},$$
and we consider its orthogonal complement $(T_L G(L))^\perp$ in $T_L P_{(k)}$. Let us denote by $S$ this last set. We claim that $S = D_{(k)}$. Indeed, $D_{(k)} \subseteq S$, because the diagonal of any matrix of the form $B_1 L + L B_2^*$ is purely imaginary and hence orthogonal to $D_{(k)}$. The other inclusion is easily checked by a dimensional argument: the dimension of $D_{(k)}$ is $u$, and the dimension of $S$ is
$$\dim(P_{(k)}) - \dim\{B_1 L + L B_2^* : (B_1, B_2) \in \mathcal{A}_n \times \mathcal{A}_m\},$$
that is, $\dim(P_{(k)})$ minus the dimension of the orbit of $L$ under the action of $\mathcal{U}_n \times \mathcal{U}_m$. We have computed these two quantities in Proposition 16, and we immediately conclude that $\dim(S) = u$, for both $\mathbb{K} = \mathbb{C}$ and $\mathbb{K} = \mathbb{R}$. Thus, for all $L \in D_{(k)}$, $(T_L G(L))^\perp = D_{(k)}$.
We now check the three conditions of Corollary 15.
• We have to check that, for small enough $t$ and for $\dot{\Sigma}$ orthogonal to the orbit $T_\Sigma G(\Sigma)$, the vector $D\phi_t(\Sigma)(\dot{\Sigma})$ remains orthogonal to the orbit at $\phi_t(\Sigma)$, where $\phi_t$ is the flow of $\operatorname{grad}_\kappa \alpha$. In our case, $\phi_t$ can be computed exactly: $\operatorname{grad} \alpha$ preserves the diagonal form, and $\phi_t(\Sigma) \in D_{(k)}$ is a diagonal matrix, for every $t$ while defined. Thus, $D\phi_t(\Sigma)(\dot{\Sigma})$ is again a diagonal matrix, for every diagonal matrix $\dot{\Sigma}$. This proves that the second condition of Corollary 15 applies to our case.
• For (B_1, B_2) ∈ A_n × A_m, the vector field K on GL_{n,m} generated by (B_1, B_2) is Note that 1. K^* as a linear operator on GL_{n,m} satisfies K^* Thus, Hence, it suffices to see that J ≥ 0, where Expanding this expression and writing Σ = Σ_* − σ_u E_*, we have which by Lemma 20 below is a non-negative quantity. The proposition follows.
and let us write B, C by blocks, where B_1, C_1 are of the size of L and B_4, C_4 are of the size of I_{k_u}. Then, Thus, . We will prove that these two terms are non-negative. For the first one, note that For the second one, we check that for every l, 1 ≤ l ≤ n − k_u, the l-th diagonal entry of the matrix ( L has a positive real part. Indeed, if we denote by v ∈ K^{k_u} the l-th row of B_2, by w ∈ K^{k_u} the l-th row of C_2, and by x the l-th row of C_3, we have since σ_u < σ_l. This finishes the proof of Lemma 20 and hence of Proposition 19.

Putting the pieces together
Before stating the main result of this section we have to introduce the following machinery:

Second symmetric derivatives
In the case of Lipschitz-Riemann structures, the mappings we want to consider are not necessarily C², and, to study their convexity properties, an approach based on the usual covariant second derivative is insufficient. We will use instead the second symmetric upper derivative. Let U ⊆ R^k be an open set and φ : U → R be any function. The second symmetric upper derivative of φ at x ∈ U in the direction v ∈ R^k is SD²φ(x; v) = lim sup_{h→0} (φ(x + hv) − 2φ(x) + φ(x − hv))/h², which is allowed to be ±∞. If U ⊆ R is an interval, we simply write SD²φ(x) for SD²φ(x; 1). It is well known that a continuous function φ on an interval is convex if and only if SD²φ(x) ≥ 0 for all x (see for example [27] Theorem 5.29). There is a stronger result due to Burkill [8] Theorem 1.1 (see also [27] Corollary 5.31) which uses a weaker hypothesis: Theorem 21 (Burkill). Let φ : ]a, b[ → R be a continuous function such that SD²φ(x) ≥ 0 for almost all x ∈ ]a, b[, and assume that SD²φ(x) > −∞ for all x ∈ ]a, b[. Then φ is a convex function.
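To make the criterion concrete, here is a small numerical sketch (our own illustration, not part of the paper): `sym_second_diff` below is the symmetric second difference quotient whose lim sup as h → 0 defines SD²φ(x); for the convex examples exp and |·| it is nonnegative, consistent with Burkill's criterion.

```python
# Illustration (ours, not from the paper): the symmetric second
# difference quotient whose limsup as h -> 0 is SD^2 phi(x).
import math

def sym_second_diff(phi, x, h):
    # (phi(x + h) - 2 phi(x) + phi(x - h)) / h^2
    return (phi(x + h) - 2.0 * phi(x) + phi(x - h)) / (h * h)

# For a C^2 function the quotient converges to the ordinary second
# derivative: here phi = exp, so SD^2 phi(x) = exp(x) >= 0.
x0 = 0.3
for h in (1e-2, 1e-3):
    assert abs(sym_second_diff(math.exp, x0, h) - math.exp(x0)) < 1e-3

# phi(t) = |t| is convex but not differentiable at 0: the symmetric
# quotient there equals 2/h, so SD^2 phi(0) = +infinity, again >= 0.
assert sym_second_diff(abs, 0.0, 2.0 ** -10) == 2048.0
```

Note that |·| shows why the symmetric quotient is the right tool here: it is finite or +∞ at kinks of a convex function, whereas the ordinary second derivative simply does not exist there.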
Theorem 21 will allow us to assemble the pieces where convexity is proven in Proposition 19 to prove our main results (Theorems 1 and 31). We proceed a little more generally, as the result may be of interest in other circumstances. Let M be a k-dimensional C² manifold (not necessarily having a Riemannian structure).
The following lemma is a consequence of Definition 22.
Lemma 23. Let M be a C² manifold and let α : M → ]0, ∞[ be a locally Lipschitz mapping. Then SD²α > −∞ if and only if, for any function Proof. The 'if' part is trivial (just take φ(t) = t). In order to prove the 'only if' part, we assume that SD²α > −∞. Let x ∈ M and let ϕ_x : U_x → R^k be a coordinate chart such that ϕ_x(x) = 0 and and similarly Notice that lim_{p→∞} is Lipschitz in a neighborhood of 0, we have, for a suitable constant D > 0, H_p² ≤ D h_p² and K_p² ≤ D h_p². Thus, taking the lim sup as p → ∞ gives SD²(φ ∘ α ∘ ϕ_x^{-1})(0; v) ≥ C + D > −∞ and we are done.

Projecting geodesics on submanifolds: the Euclidean case
The following technical lemma, interesting by itself, is a consequence of Lebesgue's Density Theorem.
Lemma 24. Let f be a locally integrable function defined on R with values in R^n, and let x ∈ R be a Lebesgue point of f. This means that Proof. Notice that, by Lebesgue's differentiation theorem, an antiderivative F of f is differentiable a.e., with F′ = f a.e., and it is absolutely continuous. Suppose that F(x) = 0. Let us define so that h is a continuous function and F(y) = (y − x)h(y) for any y. Integrating by parts gives Since h is continuous, by the Mean Value Theorem, there exists ζ ∈ [x, x + ε] such that as ε → 0. On the other hand and we are done.
Our aim is now to see how close a geodesic in a Lipschitz-Riemannian manifold and a geodesic in a submanifold are when they have the same tangent at a given point. Let us start by studying a simple case.
Let us consider the Lipschitz-Riemann structure defined on an open, k-dimensional set Ω ⊂ R^k containing 0 by the scalar product ⟨u, v⟩_x = v^T H(x) u (see section 2.3).
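The condition metric of the Introduction is an instance of such a structure, with H(A) = σ_n(A)^{-2} Id on GL_{n,m}. As a numerical sanity check (our own sketch; the helper names `sigma_min_2x2` and `condition_length` are ours), one can compare the condition length of a simple 2 × 2 diagonal path with its closed form log(b/a):

```python
# Numerical sketch (our own illustration, not from the paper): the
# condition length of a path A(t) is the integral of
# ||A'(t)||_F / sigma_n(A(t)) dt, computed here for 2x2 matrices.
import math

def sigma_min_2x2(a, b, c, d):
    # Smallest singular value of [[a, b], [c, d]], via the closed-form
    # smallest eigenvalue of the 2x2 symmetric matrix A^T A.
    p = a * a + c * c          # (A^T A)_{11}
    q = a * b + c * d          # (A^T A)_{12}
    r = b * b + d * d          # (A^T A)_{22}
    lam_min = (p + r) / 2.0 - math.sqrt(((p - r) / 2.0) ** 2 + q * q)
    return math.sqrt(max(lam_min, 0.0))

def condition_length(A_of_t, n=5000):
    # Midpoint Riemann sum of ||A'(t)||_F / sigma_min(A(t)) over [0, 1];
    # A_of_t(t) returns the entries (a, b, c, d) of a 2x2 matrix.
    total, h = 0.0, 1e-6
    for i in range(n):
        t = (i + 0.5) / n
        A0, A1 = A_of_t(t - h), A_of_t(t + h)
        Adot = [(x1 - x0) / (2.0 * h) for x0, x1 in zip(A0, A1)]
        frob = math.sqrt(sum(v * v for v in Adot))
        total += frob / sigma_min_2x2(*A_of_t(t)) / n
    return total

# For A(t) = diag(1, a + t*(b - a)) with 0 < a <= b < 1, we have
# sigma_2(A(t)) = a + t*(b - a), and the condition length is log(b/a).
a_, b_ = 0.1, 0.9
L = condition_length(lambda t: (1.0, 0.0, 0.0, a_ + t * (b_ - a_)))
assert abs(L - math.log(b_ / a_)) < 1e-3
```

Along such a diagonal path the condition length grows like the logarithm of the ratio of the smallest singular values, the behavior one expects from a metric that blows up near the set N of singular matrices.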

The matrix H(0) is supposed to have the following block structure
We also suppose that (see section 2.3) 2. The entries h_ij(x) of H(x) are regular at x = 0, The set Ω_p = Ω ∩ (R^p × {0}) is a submanifold of Ω. We suppose that so that Ω_p is in fact a smooth C² Riemannian manifold for the induced H-structure. Let us now consider a vector a ∈ R^p × {0} and three parametrized curves denoted by x, x_p, and y, defined in a neighborhood of 0 in R, and such that: 6.
x is a geodesic in R^k for the H-structure, 7. x_p is the orthogonal projection of x onto R^p × {0}, 8. y is a geodesic in R^p × {0} for the induced structure.
According to Theorem 3, x has regularity C^{1+Lip}, so that its second derivative exists a.e. We suppose here that 9. The second derivative ẍ(t) is defined at t = 0, and In this context we have: Lemma 25. Under hypotheses 1 to 9 above, the curves x_p and y have a contact of order 2 at 0: x_p(s) = y(s) + o(s²).
We want to prove that x_p(s) = y(s) + o(s²). According to Taylor's formula with integral remainder, we have From Lemma 24 and hypothesis 9, the limit of this expression exists at s = 0, and it is equal to ẍ_p(0) − ÿ(0) = 0. This completes the proof.
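A contact of order 2 at 0, x_p(s) = y(s) + o(s²), means in particular that the two curves share their value and first two derivatives there. A toy numerical check (ours, with exp and its degree-2 Taylor polynomial standing in for the two curves):

```python
# Illustration (ours, not from the paper): two curves with the same
# value and first two derivatives at s = 0 differ by o(s^2), i.e.
# they have a contact of order 2.
import math

def x_curve(s):
    return math.exp(s)

def y_curve(s):
    # degree-2 Taylor polynomial of exp at 0: same value, first and
    # second derivative as x_curve at s = 0
    return 1.0 + s + s * s / 2.0

# |x(s) - y(s)| / s^2 -> 0 as s -> 0 (here the ratio behaves like s/6)
for s in (1e-1, 1e-2, 1e-3):
    assert abs(x_curve(s) - y_curve(s)) / (s * s) < s
```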
Proposition 26. Let x : [a, b] → R^k be a geodesic in R^k with respect to the Lipschitz-Riemann structure. Then, there exists a zero-measure set Z ⊆ [a, b] such that for t_0 ∈ [a, b] \ Z the following holds: x_p(t) has a contact of order 2 with y(t), the unique geodesic in R^p with respect to the Lipschitz-Riemann structure H_p with initial conditions Proof. From Lemma 25, for every such t_0, if in addition x(t_0) ∈ R^p × {0} and ẋ(t_0) ∈ R^p × {0}, then x_p(t) has a contact of order 2 with y(t) and we are done.

Projecting geodesics on submanifolds: the Riemannian case
Our aim in this section is to prove another version of Lemma 25 in a different geometric context. Let M be a C³ Riemannian manifold with distance d, of dimension k, and let N be a submanifold of dimension p. Let us first define the projection onto N (Fig. 1). To each q ∈ N and to each vector u ≠ 0 normal to N at q we associate the geodesic γ_{q,u} in M such that γ_{q,u}(0) = q and γ̇_{q,u}(0) = u. Let n ∈ N be given, and let U be an open neighborhood of n such that, for each m ∈ U, there exists a unique geodesic arc See Li-Nirenberg [18] or Beltran-Dedieu-Malajovich-Shub [1].
We call it the α-structure. We suppose that α is C² when restricted to N, so that N is C² and not only Lipschitz for the induced α-structure. "If γ(t_0) ∈ N and γ̇(t_0) ∈ T_{γ(t_0)} N, then the projection γ_N(t) = (K ∘ γ)(t) of γ onto N has a contact of order 2 with δ(t), the unique geodesic in N such that δ(t_0) = γ(t_0) and δ̇(t_0) = γ̇(t_0)." Remark 28. If M, N and α are assumed to be smooth, then Z = ∅ in Proposition 27. See for example the proof of Proposition 5.9 in [29].
Proof. The proof consists in a transfer from M to R^k, where we apply Proposition 26. Let We have to check that Z is a zero-measure set. It suffices to see that for every t ∈ (a, b) there is an open interval I containing t such that I ∩ Z has zero measure. Without loss of generality, we may assume that t = 0. Thus, let t = 0 ∈ (a, b) and let n = γ(0).
Since M is C³, the normal bundle to N is C², and there exists a C² diffeomorphism φ : U → V ⊂ R^k, where V is an open set containing 0, satisfying 3. For any q ∈ N and any vector u ≠ 0 normal to N at q, φ(γ_{q,u}) is a straight line in R^k orthogonal to R^p × {0}.
We make φ an isometry by defining on V ⊂ R^k a Lipschitz-Riemannian structure by ⟨Dφ(m)u, Dφ(m)v⟩_{φ(m)} = α(m)⟨u, v⟩_m for any m ∈ U and u, v ∈ T_m M. Let us denote x = φ(m), a = Dφ(m)u, b = Dφ(m)v; we also write this scalar product where H is a locally Lipschitz map from V into the k × k positive definite matrices.
Notice that H is regular because α is regular in N. Since for every n ∈ N ∩ U, H(x) has the block structure Since α is C² when restricted to N, we have the same regularity for the restriction of H to R^p × {0}.
Since φ is an isometry, the curves φ ∘ γ and φ ∘ δ are geodesics in R^k and R^p × {0} respectively, and, from the definition of φ, the orthogonal projection (in the Euclidean meaning Thus, the hypotheses of Proposition 26 are satisfied, so that φ ∘ γ_N and φ ∘ δ have an order-2 contact at every t outside a zero-measure set Z_0. This easily gives an order-2 contact for γ_N and δ at t ∉ Z_0 in M in terms of the α-distance, but also, since 1/α is locally Lipschitz, in terms of the initial Riemannian distance. The proposition follows.

Arriving at the main theorem
We are now ready to state the main theorem in this section: Theorem 29. Let M be a Riemannian manifold, enumerable union of the submanifolds M_i. Let α : M → ]0, ∞[ be a locally Lipschitz mapping. Assume that: 1. α is regular, 2. For each i, the restriction of α to M_i is C² and self-convex in M_i, 3. SD²α > −∞. Then, α is self-convex in M.
Proof. Once again, we add to M the α-structure. If this theorem is false, there exists a geodesic γ in M for the α-structure such that SD² log(α(γ(t))) < 0 on a positive-measure set P ⊂ R (Theorem 21 and Lemma 23). Since an enumerable union of zero-measure sets is also a zero-measure set, we can suppose that γ(P) ⊂ M_i for some i, that is, γ(t) ∈ M_i for every t ∈ P.
According to the Lebesgue Density Theorem, almost all points t ∈ P are density points, that is We remove the "non-density points" from P to obtain a new set, also called P, with positive measure and only density points. Since γ ∈ C^{1+Lip} (Theorem 3), the second derivative γ̈(t) exists for almost all t. We also remove from P the zero-measure set of Proposition 27. Let t ∈ P be given. Since it is a density point of P, we have s ∈ P for "a lot of points" close to t. Since γ(s) ∈ M_i for such points, and since γ is Take now the geodesic δ in M_i for the induced α-structure such that δ(t) = γ(t) and δ̇(t) = γ̇(t). As we have removed the zero-measure set of Proposition 27, γ_i and δ have a contact of order 2 at t. By self-convexity of α in M_i, and since δ is C², we get Let us now consider It is not difficult to prove that t is a density point of Let us denote by γ_i the projection of γ on M_i (see section 5.3). For the points From the contact of order 2 between γ_i and δ we then conclude, Since δ is C², taking the limit as h → 0 gives Since this last expression is nonnegative, we obtain SD² log(α(γ(t))) ≥ lim ∆²(h) ≥ 0, which contradicts our hypothesis SD² log(α(γ(t))) < 0 on P.

Proof of Theorem 1
Theorem 1 is a consequence of Theorem 29 applied to M = GL_{n,m}, considered as the union of the submanifolds P^{(k)} (see section 4), and to the mapping α(A) = σ_n(A)^{-2}, the inverse of the square of the smallest singular value of A ∈ GL_{n,m}. According to Propositions 16 and 19, we just have to prove that α is a regular map and that SD²α > −∞. Let us start with this last inequality. We must prove that for every A ∈ GL_{n,m}, B ∈ K^{n×m}, where A_h = A + hB. Now, let S_n^+ be the set of symmetric, positive definite n × n matrices. Then, where λ_n denotes the smallest eigenvalue. Since, for any S ∈ S_n^+, λ_n(S) = inf_{‖x‖=1} ⟨Sx, x⟩ is an infimum of functions that are linear in S, it is a concave function of S, and λ_n^{-1} is convex. Thus, We conclude that This last quantity is bounded in absolute value because λ_n^{-1} is locally Lipschitz, so in particular SD²σ_n^{-2}(A; B) > −∞. To prove that α is regular, it suffices to write it as the composition of C¹ maps and of the convex λ_n^{-1}, which is also a regular map (see [10] Prop. 2.3.6). This finishes the proof of our Main Theorem 1.
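The two key facts used above, concavity of λ_n on symmetric matrices and convexity of λ_n^{-1} on S_n^+, can be checked numerically in the 2 × 2 case, where λ_min has a closed form (an illustration of ours, not from the paper):

```python
# Numerical sanity check (illustration only): for a 2x2 symmetric
# matrix S = [[a, b], [b, c]],
#   lambda_min(S) = (a + c)/2 - sqrt(((a - c)/2)^2 + b^2).
# lambda_min is concave (infimum of linear functions of S), hence
# 1/lambda_min is convex on positive definite matrices, since 1/x is
# convex and decreasing on (0, infinity).
import math
import random

def lam_min(a, b, c):
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)

def random_spd():
    # random positive definite 2x2 matrix S = M^T M + 0.1 * I
    m = [random.uniform(-1.0, 1.0) for _ in range(4)]
    a = m[0] ** 2 + m[2] ** 2 + 0.1
    b = m[0] * m[1] + m[2] * m[3]
    c = m[1] ** 2 + m[3] ** 2 + 0.1
    return a, b, c

random.seed(0)
for _ in range(1000):
    (a1, b1, c1), (a2, b2, c2) = random_spd(), random_spd()
    mid = lam_min((a1 + a2) / 2, (b1 + b2) / 2, (c1 + c2) / 2)
    # concavity of lambda_min (midpoint form):
    assert mid >= (lam_min(a1, b1, c1) + lam_min(a2, b2, c2)) / 2 - 1e-12
    # convexity of 1/lambda_min on positive definite matrices:
    assert 1 / mid <= (1 / lam_min(a1, b1, c1) + 1 / lam_min(a2, b2, c2)) / 2 + 1e-12
```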
Remark 30. In [1] we have sometimes taken A to lie in the unit sphere of K^{n×m} or even in the projective space P(K^{n×m}). The interested reader can check [1] for the relations between self-convexity in these various settings.
As we have done in the case of GL_{n,m}, we divide the proof into several sections.
Proposition 32. For any choice of (k), the set W^{(k)} is a smooth submanifold of W, σ_u is a smooth function, and α = σ_u^{-2} is self-convex in W^{(k)}.
Proof. Let us consider the map which is a smooth mapping between two smooth manifolds. Since 0 is a regular value of ψ, its preimage ψ^{-1}(0) is a smooth submanifold of P^{(k)} × (K^{n+1} \ {0}). Moreover, σ_u is the composition of the projection onto the first coordinate W^{(k)} → P^{(k)} and the function σ_u, which is smooth by Proposition 16. To check that α = σ_u^{-2} is self-convex in W^{(k)}, we use Corollary 15 and proceed as in the proof of Proposition 19. Let G = U_n × U_{n+1}, and consider the action G × W^{(k)} → W^{(k)}, ((U, V), (A, x)) → (UAV^*, Vx). Let p = (Σ, e_{n+1}), where e_{n+1}^T = (0, ..., 0, 1) and Σ ∈ D^{(k)} has ordered distinct singular values σ_1 > ··· > σ_u > 0. Recall that T_p G(p) is the tangent space at p of the orbit G(p) of p by the Lie group G. As in Propositions 16 and 19, we have Note that T_p G(p)^⊥ is isometric to the set of diagonal n × n matrices with eigenvalues σ_1 > ··· > σ_u > 0 of respective multiplicities k_1, ..., k_u. Let us check the conditions of Corollary 15. By unitary invariance, we can choose a pair p = (Σ, e_{n+1}) as above.

2. We have to check that for small enough t, and for b = (Σ̇, 0) ∈ T_p G(p)^⊥, Dφ_t(p)b is perpendicular to where φ_t is the flow of grad_κ α in W^{(k)}. Now, as in the proof of Proposition 19, the operator grad preserves the diagonal form of (Σ, e_{n+1}), and hence Dφ_t(p)b is of the form (Σ′, 0), where Σ′ is diagonal with Σ′e_{n+1} = 0. In particular, it is orthogonal to T_{φ_t(p)} G(φ_t(p)). Thus, the second condition of Corollary 15 applies to our case.

Appendix
In this appendix we prove the following lemma, which gives a sufficient condition for the image of a submanifold under a group action to be a submanifold. 3. For every sequence (x_k) ∈ (G × D)/R such that (i(x_k)) converges to some y ∈ P, the sequence (x_k) converges. Then P is an embedded submanifold of M.
Proof. Let X be a manifold and let R denote an equivalence relation defined on X. A classical necessary and sufficient condition to define on the quotient space X/R a unique quotient manifold structure making the canonical surjection π : X → X/R a submersion is the following: the graph G of the relation is a closed submanifold of X × X and the first projection pr_1 : G → X is a submersion.
In the context of our lemma this condition comes from the first hypothesis and from the definition of the equivalence relation via the group action.
Let f : Y → Z be a smooth map between two manifolds. Its image f (Y) is a submanifold in Z when f is an immersion and a homeomorphism onto its image.
By construction, i is smooth. It is a homeomorphism onto its image by the third hypothesis and an immersion by the second one. To check that it is injective, we have to show that if gd = g′d′, then (g, d) R (g′, d′). This follows from the construction of the relation R.