Improved approximation rates for a parabolic control problem with an objective promoting directional sparsity

We discretize a directionally sparse parabolic control problem governed by a linear equation by means of control approximations that are piecewise constant in time and continuous piecewise linear in space. By discretizing the objective functional with the help of appropriate numerical quadrature formulas, we are able to show that the discrete optimal solution exhibits a directional sparse pattern alike the one enjoyed by the continuous solution. Error estimates are obtained and a comparison with the cases of having piecewise approximations of the control or a semilinear state equation are discussed. Numerical experiments that illustrate the theoretical results are included.


Introduction
Sparsity promoting techniques in optimal control are of great practical importance. A sparse control is usually easier and cheaper to implement physically than another one that does not have this property. Moreover, a small support for the control is essential in many practical implementations. An important issue is to determine the small region from where the action of the control is the most efficient. Consequently, a great effort has been made during the last years to develop mathematical techniques that lead to problems that have sparse solutions and to solve them.
In [26] it is proved that a term involving the L 1 -norm of the control in the objective functional leads to a solution with sparse structure. The problem treated in that work is a stationary problem governed by a linear elliptic equation. In [6,7] and [29] the finite element discretization of this kind of problems is studied. An alternative to the L 1 approach is to use measures as controls. See [4,13,14,23].
Sparse time-dependent problems have been investigated in [1, 5, 8-10, 12, 15, 16]. In this type of problems, there are different possible formulations leading to optimal controls that have different sparse structures; see [8]. Here, we are interested in controls that are sparse in space, not necessarily in time. Moreover, it is desirable that the sparsity pattern does not change in time. Such kind of sparsity is called directional sparsity. The term was coined in the paper [15], where an optimal control problem governed by a linear parabolic equation is studied; to obtain directional sparsity, the authors introduce a term involving the norm in L 1 (Ω; L 2 (0, T )). They describe the sparsity properties of the optimal control as well as a semi-smooth Newton method to solve the problem.
In this work, we continue our study, started in [10], about finite element approximations of problems where the optimal solutions are directionally sparse.
Both in the paper at hand and in [10] the state equation is a parabolic partial differential equation and the functional involves a tracking-type term for the state, a Tikhonov regularization term for the control and a non differentiable term that promotes spatial sparsity. Also in both works, the state and the adjoint state equations are discretized using a discontinuous in time and continuous in space Galerkin scheme, namely dG0-cG1, and the controls are discretized in time using piecewise constant functions.
While in [10] the controls are discretized in space using piecewise constant functions, in this work we discretize the controls using functions that are piecewise constant in time and continuous piecewise linear in space. For easier comparison of the results, we will refer to the space of approximation functions used in [10] as Uσ and to the space of approximation functions used in this work as Vσ.
The theoretical analysis becomes more involved for functions in Vσ. One of the main consequences of using elements in this space is that, with the usual discretization of the functional, the discrete solution may lose its directionally sparsity properties.
To overcome this difficulty, we approximate the integrals of the cost functional involving the control by a numerical quadrature formula (the analogous in 2D or 3D to the well known composite trapezoid rule). Proceeding in this way, we recover the same sparsity pattern exhibited by the continuous solution of the control problem. This sparse structure was also proved in [10] for approximations of the controls by piecewise constant functions in time and space. This numerical integration leads to new difficulties in the derivation of error estimates. Finally, we are able to obtain an order of convergence for the L 2 (Q)error in the control variable of order O(τ + h) in the case of having a parabolic linear state equation. Numerical experiments in Section 8 confirm this obtained order of convergence. This order seems optimal for our approach, regarding the H 1 (Q)-regularity of the optimal control -see [10,Theorem 3.3]-, which limits the order of convergence we can achieve. This is not a special situation in sparse control; see e.g. [18,Theorem 6.1] or [11,Theorem 2.14], where it is emphasized how the approximation error by discrete functions affects the error estimates of the solutions of different optimal control problems. In [19] a problem similar to ours but without sparsity promoting terms is studied and, as in our case, the optimal control exhibits H 1 (Q)-regularity. If no extra assumptions are done, the order of convergence obtained is also O(τ + h); see [19,Corollary 5.3].
In [10] an order of O( √ τ + h) is obtained. In that paper, the equation is semilinear. With the technique of proof that we show in this work, it can be proven that the order of convergence for approximations in Uσ is also O(τ + h) if the state equation is linear. On the other hand, once we have shown how to deal with elements of Vσ, the proofs of [10] can be adapted to obtain an order of convergence of O( √ τ +h) when the state equation is semilinear. We comment on this in Section 7.
A close inspection to the proofs of [10], in comparison with our Theorem 6.7 and the corresponding result for the semilinear elliptic case [6] gives some insight on the reason we obtain an order of convergence of just √ τ in time for the case of having a semilinear parabolic equation. In the elliptic case, the second order sufficient optimality conditions do not involve the non-differentiable terms of the functional, and in our Theorem 6.7 we use the linearity of the equation to get rid of this terms in the obtention of error estimates. Nevertheless, second order sufficient optimality conditions for the parabolic case involve the non-differentiable term of the functional, see [10,Theorem 4.2], which is what eventually leads to this order √ τ .
In [16] a different approach is followed to obtain directional sparsity. Instead of using functions, the control variable is a measure in M(Ω; L 2 (0, T )), whose norm in this space is introduced in the functional. The problem is governed by a linear parabolic equation and the error obtained for the L 2 (Q) norm of the state The plan of the paper is as follows. We introduce the problem in Section 2 and recall some results about the continuous problem in Section 3. The problem is discretized in Section 4. Our main results are in Section 5, where we show the sparsity pattern of the discrete solution, and in Section 6, where we prove the convergence of the discretization and provide error estimates. In Section 7 we comment how to adapt the results of [10] to obtain an error estimate of order O( √ τ + h) in the case of having a semilinear equation and how our methods of proof are also valid to show order of convergence O(τ + h) for piecewise constant approximations of the control if the equation is linear. Finally, we perform numerical experiments in Section 8.
3 Some results about the continuous problem.
Let us recall some results concerning problem (P). Since the equation is linear, the study of the differentiable part of the functional is classical. See e.g. [17] or [28]. Under the Assumptions 1-2, for every u ∈ L 2 (Q) there exists a unique yu ∈ Y = H 2,1 (Q) = L 2 (0, T ; H 2 (Ω)) ∩ H 1 (0, T ; L 2 (Ω)) solution of (1) and the controlto-state mapping is C ∞ . Also, F : A * being the adjoint operator of A. Next we state the properties of the nondifferentiable part of the functional. The proof of the following result can be found in [8].
for a.a. x ∈ Ωu and t ∈ (0, T ), (5) where We finish this section stating existence, uniqueness, first order optimality conditions and regularity of the solution. Existence of a solution in L 2 (Q) can be proved by the usual method of taking a minimizing sequence that is bounded in L 2 (Q). We omit the details of the proof since it is completely standard. Uniqueness follows from the strict convexity of the functional with respect to u. First order optimality conditions and regularity of the solution are proved as in [10,Theorem 3.3].

Numerical approximation
In this section, we will further assume that Ω is convex. We will discretize both the state and the control using functions that are continuous piecewise linear in space and piecewise constant in time. To this aim, we will consider, cf. [2, def. (4.4.13)], a quasi-uniform family of triangulations {K h } h>0 ofΩ and a quasiuniform family of partitions of [0, T ], 0 = t 0 < t 1 < · · · < t Nτ = T . We will denote Ω h = int ∪ K∈K h K, N h and N I,h the number of nodes and interior nodes of K h , . We assume that every boundary node of Ω h is a point of Γ . Additionally we suppose that the which is always satisfied if n = 2 and Γ is of class C 2 ; see, for instance, [24,Section 5.2]. Under this assumption we have that where | · | denotes the Lebesgue measure. In the sequel we denote Q h = Ω h × (0, T ). Now we consider the finite dimensional spaces where P 1 (K) is the space of polynomials of degree less or equal than 1 on the element K, and The elements of Yσ can be written as i=1 of the triangulation and χ j denotes the characteristic function of the interval I j = (t j−1 , t j ). For every u ∈ L 2 (Q h ), we define its associated discrete state as the unique element yσ ∈ Yσ such that This corresponds to an implicit Euler scheme. It is proved in [18,20] that there exist h 0 > 0 and τ 0 > 0 such that To discretize the controls, we will also use continuous piecewise linear in space and piecewise constant in time functions.
The elements of Vσ can be written as To formulate the discrete problem we will use numerical quadrature formulas to approximate the norms appearing in the functional J. For every This leads to the following scalar product in Vσ: We will denote the related norm as uσ σ = (uσ, uσ) The discrete problem reads as This numerical quadrature scheme is related to a mass lumping approximation, whose necessity in obtaining component-wise subdifferential relations for sparsity penalties is already pointed out in the dissertations of Pieper [22,Section 4.5.3] and Trautmann [27, Section 6.5].
5 Sparsity properties of the discrete solutions.
In this section we analyze the discrete control problem (Pσ). We prove that the unique discrete optimal solution has a sparse structure similar to the one established for the continuous optimal control. This sparsity structure follows from the optimality system. Firstly, we observe that under the assumptions 1-2, Fσ : Vσ → R is of class C ∞ . Moreover, for every uσ, vσ ∈ Vσ, we have that where ϕσ ∈ Yσ is the discrete adjoint state associated with uσ. We recall that the discrete adjoint state ϕσ(u) ∈ Yσ associated with an element u ∈ L 2 (Q h ) is the solution of the equation where yσ = yσ(u) is the discrete state corresponding to u.
We rewrite now F σ (uσ) using Carstensen's quasi-interpolation operator, cf. [3], Notice that for every z ∈ L 1 (Ω h ) and Given ϕσ(uσ), we denote The derivative of the differentiable part of the approximated functional can be expressed as Let us study the non-differentiable part of the cost functional. First, we compute the directional derivatives. To this end let us set The directional derivative of jσ at a point uσ ∈ Vσ in the direction vσ ∈ Vσ can be written as Now, we focus on the subdifferentiability of jσ. We have the following characterization of the elements of the subdifferential of jσ(uσ).

From this and
Adding this inequality and (22), we obtain that λσ satisfies the subgradient condition.
Proof Existence of solution follows from the coercivity and continuity of Jσ and the uniqueness is a consequence of its strict convexity. From the optimality ofūσ and the convexity of jσ(uσ), we have that This proves that J σ (ūσ; vσ) ≥ 0 ∀vσ ∈ Vσ. To obtain (24), first we observe that the previous chain of inequalities implies that Taking into account the definition (15) of the scalar product (·, ·)σ, the expression for the derivative (16) and the property (17) of Carstensen's quasi-interpolation operator, we have that for any uσ ∈ Vσ we have that Finally, from (25) and the definition of subdifferential we infer that − 1 µ (Π hφσ + νūσ) ∈ ∂jσ(ūσ) and, hence, (24) follows. The uniqueness ofλσ follows from (24).
For a local minimumūσ ∈ Vσ with related discrete adjoint stateφσ, we will denote the components of Cartensen's quasi interpolation ofφ h,j as i.e. Π hφσ = Nτ j=1 N h i=1φ i,j e i χ j . We finish this section by characterizingλσ and proving the sparsity ofūσ. Theorem 5.3 Letūσ be the solution of (Pσ). Then, the elementλσ ∈ ∂jσ(ūσ) satisfying (24) can be written as The optimal control satisfies Finally, we have the following property Proof The proof follows the lines of that of [10, Th 5.2] with the obvious changes and taking into account formulas (19) and (24).

Convergence and error estimates
In this section we prove the convergenceūσ →ū and then we establish some error estimates forūσ −ū. First, we prove some properties of the mesh dependent norms · σ and | · |σ and we recall a useful continuity property of Cartensen's quasi-interpolation operator.
Lemma 6.1 The following inequalities hold ∀uσ ∈ Vσ: Moreover we have that for every z ∈ L 2 (Ω) Proof By the convexity of the real function g(s) = s 2 and the fact that we get To deduce (29) we use the convexity of the norm in R Nτ Finally, we prove the inequality (30). Using again the convexity of g(s) = s 2 and the properties of {e i } N h i=1 as above, we get Using this properties, we are able to show the boundness of the discrete optimal controls. Lemma 6.2 Letūσ ∈ Yσ be the solution of (Pσ). Then, there exist C > 0 independent of σ such that Proof Using (28), we have that and therefore we can deduce that there exists C > 0 independent of σ such that ūσ 2 Hence, πτ u is the projection of u on the space of piecewise constant functions and πτ u L 2 (0,T ) ≤ u L 2 (0,T ) ∀u ∈ L 2 (0, T ).
where we have used (31). Inequality (34) is proved with (32) as follows The inequalities (35) and (36) are obvious. Finally we prove the convergence of the sequence {uσ}σ. From (30) and the convergence of πτ u and Π h u we obtain Finally, using (36) and the continuous embedding L ∞ (0, T ) → L 2 (0, T ), So, from (27) we deduce that for µ ≥ µc,ūσ ≡ 0 for all σ. Now we can show that the solutions of the discretized problems converge strongly to the solution of problem (P) in L 2 (Q).
whereū is the solution of (P).
Proof Through the proof, we will consider that all the elements of Vσ are extended by zero to Q h \ Q From Lemma 6.2 we know that {ūσ}σ is bounded in L 2 (Q). Hence, we can extract a subsequence, still denoted in the same way, such that uσ u * weakly in L 2 (Q). We are going to prove that u * =ū.

Error Estimates
We will establish now the error estimates for Cartensen quasi-interpolation operator as well as for the projection operator.
Lemma 6.6 There exists C > 0 independent of h and τ such that where uσ = πτ Π h u in Q h and we extend uσ = u in Q \ Q h in the last inequality.
In the following theorem we will extend the elements of Uσ byū in Q \ Q h . Notice that, if µ > 0, using the sparsity property of the control (10) and the zero boundary condition of the adjoint state equation, we have that for h > 0 small enough,ū = 0 in Q \ Q h . Theorem 6.7 Letū be the solution of (P) and letūσ be the solution of (Pσ). Then, there exists C > 0 independent of σ such that for all h < h 0 and τ < τ 0 , Proof Let yū σ and ϕū σ be the continuous state and adjoint state related to the discrete optimal controlūσ, defined according to (1) and (4) andȳ andφ the state and adjoint state related to the optimal controlū, as defined in (7) and (8). Using integration by parts and taking into account the zero boundary conditions for the state and the adjoint state, the equalities yū σ (0) =ȳ(0) = y 0 , and the zero final conditions for the adjoint states, it follows in a standard way that and hence, from (3) we readily deduce that Let us denote now uσ = πτ Π hū . From the first order optimality conditions for (P) and (Pσ) and the definition of subdifferential, we have Adding this inequalities and taking into account that j(ūσ) ≤ jσ(ūσ), see (29), and jσ(uσ) ≤ j(ū), see (34), we obtain From the previous inequality and (46) we have that The first term is estimated with where we can use the finite element error estimates in [18,20] thanks to the stability estimate in Lemma 6.2 and (43) thanks to the regularity of the optimal control stated in Theorem 3.2. The next two terms satisfy where we have used (28) and that uσ = πτ Π hū together with the fact that πτ is a projection in L 2 (0, T ) and the projection-like formula for Carstensen's quasiinterpolation (17) to deduce that (ūσ, uσ)σ = Qū σūdxdt.
To estimate the last term in the right hand side of (47) we write where we have used the Lipschitz dependance of the adjoint state with respect to the control (the dependance is linear continuous indeed), (43), (44) and the H 1 (Q) regularity of the optimal control and its related adjoint state provided in Theorem 3.2.
And the proof follows collecting all the estimates.

Some extensions
In [10] we investigated the approximation of the controls by means of piecewise constant functions in both space and time. We obtained an order of O( √ τ + h) for a problem governed by a semilinear parabolic equation. The proofs of Theorem 5.7 of that reference can be adapted to the approximation shown in the work at hand to obtain the same order of convergence for a problem with a semilinear state equation.
Conversely, the proof of Theorem 6.7 can be adapted to the case of using piecewise constant approximations for the control to obtain also O(τ + h) for a problem governed by a linear equation. The main difference is that we should take the L 2 (Ω) projection π h instead of Carstensen's quasi-interpolation, and the proof can be slightly simplified: there is no need for numerical integration in the computation of the norms that appear in the objective functional, the terms that appear in the left had side of (49) are zero and so is the midterm in the right hand side of (50).

Numerical experiments
We report on three numerical experiments. In the first one, we describe an example with known solution and show error estimates (cf. Theorem 6.7 and [10, Theorem 5.7]). In the second one, we show how the sparsity properties of the solution change as µ changes (cf. Remark 6.4 and [8, Remark 2.10]). In the third one, we show the sparsity pattern for a 2D example.
8.1 Experiment 1. Error estimates for an example with known solution.
We have solved the problem using approximations of the control in Vσ, the space of continuous piecewise linear functions in space described in this work, For easier comparison, we also include the results provided in [10] using piecewise constant control functions.
In both cases, we confirm the estimates We take two families of uniform partitions in space and time, with h = 2 −i , i = i 0 , . . . , I, and τ = 2 −j j = j 0 , . . . , J for some values of I and J big enough. We have been able to achieve I = J = 13 in a PC with Matlab. To solve the discrete problems, we use a semismooth Newton method as described in [15].
We perform three tests: 1. We take h = h i and τ = τ i , this is, we let both parameter tend to 0 at the same time. 2. We fix a small τ = τ J and take h = h i for i = i 0 , . . . , I * . This is, we refine only in space. 3. We fix a small h = h I and take τ = τ j for j = j 0 , . . . , J * , so we refine only in time.
To measure the error, we compute whereπσū =πτπ hū . The operatorπτ is the numerical approximation of the L 2 (0, T ) projection onto the set of piecewise constant functions given by the midpoint rule:πτ f = Nτ j=1 f ((t j−1 + t j )/2)χ (tj−1,tj ) . The operatorπ h is the usual nodal interpolation in space for the experiment with continuous piecewise linear functions in space andπ h is the numerical approximation of the L 2 (Ω) projection onto the set of piecewise constant functions given by the midpoint rule. Let us denote σ i,j = (h i , τ j ). The experimental order of convergence is measured as in the first cases and analogously in the other cases. For the first test (h = τ ), we obtain the results shown in Table 1. The results are as expected from Theorem 6.7.

Vσ
Uσ   For the second test (τ fixed and small, refinements only in the space step), we get the results summarized in Table 2. The error due to τ = 2 −13 is small, but not zero. So the values obtained for the error due to the discretization in space are not of the form Ch i , but of the form Ch i ± Eτ J . So it seems reasonable to discard the results for which the error in time starts to be big enough. For i ≥ 10 it maybe more than 10% of the error, so we stop at I * = 9. We obtain an order of convergence of O(h), as expected.
In Table 3 we show the results for the third test (h fixed and small, refinements in the time step). Since the spatial error is not zero, we discard the results for which it is at least the 10% of the global error and stop at J * = 8. We obtain an order of convergence close to O(τ ).

Experiment 2. Directional sparsity properties of the control
Let Ω = (0, 1) ⊂ R and let T = 1. We have solved the unconstrained version of the example shown in [8,Remark 2.11]. The data for the example are y 0 = 0, f = 0, A = −∂ 2 xx , ν = 1e − 4, µ = µ 0 = 4e − 3 and We solve the problem in a mesh with h = τ = 2 −8 . In Figure 1, we show the support of the optimal control for the values µ = M µ 0 , M = 0, . . . , 8. For µ = 0, we have no sparsity pattern for the control. Then we see how the control is directionally sparse for µ > 0 and how the support of the control is smaller as µ increases. After a few essays, we find thatū ≡ 0 for µ ≥ 7.4803µ 0 . As expected, the value of the objective functional increases as µ increases. You may find the obtained numerical values for Jσ(ūσ) in Table 4.

Experiment 3. Directional sparsity properties of the control in 2D
Let Ω = (0, 1) × (0, 1) ⊂ R 2 and let T = 1. We solve a 2D version of the previous example. The data for the example are y 0 = 0, f = 0, A = −∆, ν = 1e − 4, µ = µ 0 = 4e − 3 and We solve the problem in a mesh with h = τ = 2 −8 . In Figures 2 and 3 we see graphs of the approximations obtained using piecewise constant and piecewise linear function respectively in different moments of time.