PARABOLIC CONTROL PROBLEMS IN MEASURE SPACES WITH SPARSE SOLUTIONS∗

Optimal control problems in measure spaces lead to controls that have small support, which is desirable, e.g., in the context of optimal actuator placement. For problems governed by parabolic partial differential equations, well-posedness is guaranteed in the space of square-integrable measure-valued functions, which leads to controls with a spatial sparsity structure. A conforming approximation framework allows one to derive numerically accessible optimality conditions as well as convergence rates. In particular, although the state is discretized, the control problem can still be formulated and solved in the measure space. Numerical examples illustrate the structural features of the optimal controls.


1. Introduction.
This paper is concerned with the analysis and approximation of the optimal control problem

(P)   min_{u ∈ L^2(I, M(Ω))} J(u) = (1/2) ‖y − y_d‖_{L^2(Ω_T)}^2 + α ‖u‖_{L^2(M)},

where I = [0, T] and y is the unique solution to the initial-boundary value problem

(1.1)   ∂_t y − Δy = u   in Ω_T = Ω × (0, T),
        y = 0            on Σ_T = Γ × (0, T),
        y(x, 0) = y_0    in Ω,

for given y_0 ∈ L^2(Ω). We assume that α > 0, y_d ∈ L^2(Ω_T), and Ω is a bounded domain in R^n, 1 ≤ n ≤ 3, which is supposed to either be convex or have a C^{1,1} boundary Γ. Hereafter M(Ω) denotes the space of regular Borel measures in Ω, and ‖u‖_{L^2(M)} denotes the norm of u in the space L^2(I, M(Ω)); see section 2 below for details.
Formulating the control problem in a measure space is motivated by the observation that the resulting optimal controls possess sparsity properties (i.e., have small support), which is desirable in many applications such as optimal sensor or actuator placement; see [5, 2] in the context of elliptic equations. Although similar features can be achieved using L^1 control costs, the corresponding control problem in general does not admit a solution in the absence of further regularization because L^1 spaces lack the necessary compactness properties. For parabolic problems, the situation is even more delicate, since (1.1) is not well-posed for right-hand sides in M(Ω_T) (which would require C(Ω̄_T) regularity for the adjoint equation; see Definition 2.1 below). This leads to considering controls in L^2(I, M(Ω)). The associated norm ‖u‖_{L^2(M)} for the control is a natural one from the point of view of well-posedness of the state equation (1.1) and allows for sparsity in space. The numerical results will illustrate precisely this property of our formulation. The spatiotemporal coupling of the corresponding control cost, however, presents a challenge for deriving numerically useful optimality conditions.
Besides the analysis of the control problem (P), the main focus of this paper consists in providing an approximation framework which, in spite of the difficulties due to the measure space setting, leads to implementable schemes for which a priori error estimates can be provided. We show that the optimal measure controls can be approximated efficiently by linear combinations of Dirac measures in space which are piecewise constant in time. We point out that even after discretization, the control problem is formulated and solved in the measure space.
Let us mention some related works. A similar approximation framework for elliptic control problems in measure spaces was proposed in [2]. Differently from the elliptic case, parabolic control problems with sparsity-promoting constraints have received very little attention. In [3], the approximate control of y(T) by measures u ∈ M([t_0, t_1] × Ω) with 0 < t_0 < t_1 < T is discussed (using the smoothing property of the heat equation to ensure y(T) ∈ L^2(Ω)); finite-dimensional approximation and numerical solution are not addressed. Although not specifically concerned with parabolic equations, the approach of [9] covers control problems with L^1(Ω, L^2([0, T])) control costs (together with additional pointwise control constraints). The resulting optimal controls have directional sparsity; i.e., their support is constant in time. In contrast, we will show that solutions to (P) have a nonseparable sparsity structure.
This paper is organized as follows. In the next section, we discuss the functional analytic setting of the control problem and analyze well-posedness of the state equation. Section 3 is concerned with existence of and optimality conditions for solutions to (P), the latter implying a sparsity property of the optimal controls. The proposed approximation framework is the subject of section 4, where we introduce the discretization (section 4.1) and show convergence of solutions to the discretized state equation (section 4.2) and to the discrete optimal control problem (section 4.3). Convergence rates are derived in section 5. Section 6 addresses the numerical solution of the discrete control problem, for which we derive a reformulated optimality system that is amenable to solution by a semismooth Newton method. (The continuous counterpart of this optimality system is sketched in the appendix.) Finally, section 7 illustrates the structure of the optimal controls with some numerical examples.

2. Function spaces and well-posedness of the state equation.
In this section we first define the control space and give some of its properties.Then we turn to the analysis of the state equation.
2.1. The control space. Associated to the interval I = [0, T] we define the spaces L^2(I, C_0(Ω)) and L^2(I, M(Ω)), where L^2(I, C_0(Ω)) is the space of measurable functions z : [0, T] → C_0(Ω) for which the associated norm

‖z‖_{L^2(C_0)} = ( ∫_0^T ‖z(t)‖_{C_0(Ω)}^2 dt )^{1/2}

is finite. Due to the fact that C_0(Ω) is a separable Banach space, L^2(I, C_0(Ω)) is also a separable Banach space; see, e.g., [18, Theorem I.5.18]. As a consequence of the nonseparability of M(Ω), the definition of the space L^2(I, M(Ω)) is more delicate. Indeed, we need to distinguish between weakly and strongly measurable functions u : [0, T] → M(Ω). Hereafter we denote by L^2(I, M(Ω)) the space of weakly measurable functions u for which the norm

‖u‖_{L^2(M)} = ( ∫_0^T ‖u(t)‖_{M(Ω)}^2 dt )^{1/2}

is finite. This choice makes L^2(I, M(Ω)) a Banach space and guarantees that it can be identified with the dual of L^2(I, C_0(Ω)), where the duality relation is given by

⟨u, z⟩ = ∫_0^T ⟨u(t), z(t)⟩ dt = ∫_0^T ∫_Ω z(x, t) du(t)(x) dt.

2.2. Analysis of the state equation. Given 1 < p < ∞, we denote by W_0^{1,p}(Ω) the Sobolev space of functions of L^p(Ω) with distributional derivatives in L^p(Ω) and having a zero trace on Γ, and we set W^{−1,p'}(Ω) to be the dual of W_0^{1,p}(Ω), where 1/p + 1/p' = 1. These spaces are reflexive and separable, and hence the spaces L^2(I, W_0^{1,p}(Ω)) formed by the measurable functions y : [0, T] → W_0^{1,p}(Ω) for which the norm

‖y‖_{L^2(W_0^{1,p})} = ( ∫_0^T ‖y(t)‖_{W_0^{1,p}(Ω)}^2 dt )^{1/2}

is finite are separable and reflexive Banach spaces whose dual is identified with L^2(I, W^{−1,p'}(Ω)); see [6, Theorem 8.25.5].
The notion of solution to the state equation makes use of the following space of test functions:

Z = { z ∈ L^2(I, H^2(Ω) ∩ H_0^1(Ω)) : ∂_t z ∈ L^2(Ω_T), z(T) = 0 },

endowed with the graph norm. By the Rellich-Kondrachov theorem, Z embeds compactly into L^2(I, C_0(Ω)).

Definition 2.1. We say that y ∈ L^1(Ω_T) is a solution to (1.1) if for every z ∈ Z

(2.1)   ∫∫_{Ω_T} y (−∂_t z − Δz) dx dt = ∫_0^T ⟨u(t), z(t)⟩ dt + ∫_Ω y_0 z(x, 0) dx.

Theorem 2.2. Problem (1.1) has a unique solution y ∈ L^2(I, W_0^{1,p}(Ω)) for every 1 < p < n/(n−1), and there exist constants C_p > 0 such that

(2.2)   ‖y‖_{L^2(I, W_0^{1,p}(Ω))} ≤ C_p ( ‖u‖_{L^2(M)} + ‖y_0‖_{L^2(Ω)} ).

Proof. We adapt the proof of [1]. Let {u_k}_k be a sequence in C(Ω̄_T) satisfying

From the last two equations we get for any

In the following estimate we use maximal regularity of the heat equation in an essential way. If Ω is convex, its boundary is of Lipschitz class, and hence there exists a p̄ with p̄ > 4 if n = 2 and p̄ > 3 when n = 3 such that Δ : W_0^{1,p}(Ω) → W^{−1,p}(Ω) is an isomorphism for each p̄' < p < p̄, where 1/p̄ + 1/p̄' = 1; see [10].
(If Γ is of class C^{1,1}, this operator is an isomorphism for every 1 < p < +∞.) In particular, combining [8, Theorem 5.4] and (2.3), we obtain for every p̄' < p < n/(n−1) < p̄ the existence of a constant Ĉ_p such that

(2.6)   ‖y_k‖_{L^2(I, W_0^{1,p}(Ω))} ≤ Ĉ_p ( ‖u_k‖_{L^2(I, L^1(Ω))} + ‖y_0‖_{L^2(Ω)} ).

Using the reflexivity of L^2(I, W_0^{1,p}(Ω)), we can obtain a subsequence, denoted in the same way, and an element y ∈ L^2(I, W_0^{1,p}(Ω)) such that y_k ⇀ y in L^2(I, W_0^{1,p}(Ω)). For ψ_0 ∈ L^2(Ω_T) arbitrary and z ∈ Z the solution to (2.5) for ψ_j = 0, 1 ≤ j ≤ n, it follows from (2.4) and (2.5) that

Passing to the limit in this identity and in (2.6), we obtain (2.1) and (2.2). Using the fact that ∂_t + Δ is an isomorphism from Z to L^2(Ω_T) and (2.1), we conclude the uniqueness of y. Finally, independence of y with respect to p follows from the existence of a solution y in L^2(I, W_0^{1,p}(Ω)) for every p̄' < p < n/(n−1) and its uniqueness.

Remark 2.3. (i) Note that W_0^{1,p}(Ω) ↪ L^2(Ω) for p > 2n/(n+2), with p̄ as in the proof of Theorem 2.2, and hence y ∈ L^2(Ω_T). As a consequence, we deduce that y ∈ C(I, L^2(Ω)); see [16, Proposition III.1.2].
(ii) Under our regularity conditions, an equivalent definition for the solution to (1.1) is the following. A function y ∈ L^2(I, W_0^{1,p}(Ω)) with p̄' < p < n/(n−1) is called a solution to (1.1) if (2.1) holds for every z ∈ L^2(I, H_0^1(Ω)) with ∂_t z + Δz ∈ L^2(Ω_T) and z(T) = 0. This follows from (2.1) and the density of Z in this new space of test functions. Theorem 2.2 remains valid with this definition if we only assume that Ω has a Lipschitz boundary. This is the regularity of Ω required to have maximal parabolic regularity; see [8]. We have chosen the above definition because it is more convenient for the numerical analysis to be developed later in this paper.
(iii) The preceding theorem as well as the rest of the results given in this paper are valid if we replace the heat operator in (1.1) by a more general parabolic operator ∂ t + A that enjoys maximal parabolic regularity.
We finish this section by proving a continuity result of the states with respect to the controls.

Theorem 2.4. Let {u_k}_{k=1}^∞ ⊂ L^2(I, M(Ω)) converge weakly-* to u in L^2(I, M(Ω)), and let y(u_k) and y(u) denote the corresponding solutions to (1.1). Then y(u_k) → y(u) strongly in L^2(Ω_T).

Proof. From Definition 2.1 and using the boundedness of {u_k}_{k=1}^∞ in L^2(I, M(Ω)), we have

This convergence and the above inequality conclude the proof.

3. Analysis of the control problem.
In this section we establish existence of an optimal control and derive the optimality conditions.
Proposition 3.1.The control problem (P) has a unique solution ū.
Proof. Let {u_k}_{k=1}^∞ be a minimizing sequence, which is thus bounded in the space L^2(I, M(Ω)). Since the predual L^2(I, C_0(Ω)) is separable, there exists a subsequence, denoted in the same way, converging weakly-* to some ū ∈ L^2(I, M(Ω)). From Theorem 2.4 we get that y(u_k) → y(ū) strongly in L^2(Ω_T). Hence, the weak-* lower semicontinuity of the norm ‖·‖_{L^2(M)} implies that ū is a solution. The uniqueness is a consequence of the strict convexity of J, which follows from the injectivity of the control-to-state mapping.
Hereafter ū will denote the solution to (P) and ȳ the associated state. Now we give the first-order optimality conditions, which are necessary and sufficient due to the convexity of (P).
From now on, we will assume that the optimal control ū ≠ 0. By using (3.2) and (3.3) we can prove a sparsity property for ū. Let us consider the Jordan decomposition ū(t) = ū^+(t) − ū^−(t) for almost every t ∈ I. Then we have the following theorem.
Theorem 3.3. For almost every t ∈ I the following inclusions hold:

We have to check that v : I → M(Ω) is weakly measurable. To this end, the only delicate point is the weak measurability of t ∈ I ↦ δ_{x_t} ∈ M(Ω). This follows from the measurability of the mapping t ↦ x_t and the continuity of x ∈ Ω ↦ δ_x ∈ M(Ω) when M(Ω) is endowed with the weak-* topology. By definition of v we get

From (3.2), (3.9), (3.5), and (3.10) we obtain

As a consequence of these inequalities and (3.9) we conclude that

Lemma 3.4. Let μ ∈ M(Ω) and z ∈ C_0(Ω), both of them nonzero, be such that (3.12) holds, and let μ = μ^+ − μ^− be the Jordan decomposition of μ. Then (3.13) and (3.14) hold.

Proof. We will prove (3.13), the proof of (3.14) being analogous. First we observe that due to (3.12) we obtain for all measures ν ∈ M(Ω)

We have as well that

Moreover, the inequality is strict unless μ^+ and μ^− are concentrated on the sets of points x ∈ Ω where z(x) ≥ 0 and z(x) ≤ 0, respectively. Let us define the sets A^+ and A^− and the measures ν^±. Because of (3.15) we conclude that Supp(μ^+) ⊂ A^+ and Supp(μ^−) ⊂ A^−. Now we distinguish two cases in the proof of (3.13), depending on whether the norm bound is attained.

4. Approximation of the control problem.
We consider a dG(0)-cG(1) discontinuous Galerkin approximation of the state equation (1.1) (i.e., piecewise constant in time and linear nodal basis finite elements in space; see, e.g., [17]). Associated with a parameter h we consider a family of triangulations {K_h}_{h>0} of Ω. To every element K ∈ K_h we assign two parameters ρ(K) and ϑ(K), where ρ(K) denotes the diameter of K and ϑ(K) is the diameter of the biggest ball contained in K. The size of the grid is given by h = max_{K ∈ K_h} ρ(K). We will denote by {x_j}_{j=1}^{N_h} the interior nodes of the triangulation K_h. In this section Ω will be assumed to be convex. In addition, the following usual regularity assumptions on the triangulation are assumed:

(i) There exist two positive constants ρ_Ω and ϑ_Ω such that ρ(K)/ϑ(K) ≤ ρ_Ω and h/ρ(K) ≤ ϑ_Ω hold for all K ∈ K_h and all h > 0.

(ii) Set Ω̄_h = ∪_{K ∈ K_h} K, with Ω_h and Γ_h being its interior and boundary, respectively. We assume that the vertices of K_h placed on the boundary Γ_h are also points of Γ and that there exists a constant C_Γ > 0 such that dist(x, Γ) ≤ C_Γ h^2 for every x ∈ Γ_h. This always holds if Γ is a C^2 boundary.
In the case of polygonal or polyhedral domains, it is reasonable to assume that the triangulation satisfies Γ_h = Γ. From this assumption we know [14, section 5.2] that

(4.1)   |Ω \ Ω_h| ≤ C h^2,

where |·| denotes the Lebesgue measure. We also introduce a temporal grid 0 = t_0 < t_1 < ⋯ < t_{N_τ} = T with step sizes τ_k = t_k − t_{k−1} and set τ = max_{1 ≤ k ≤ N_τ} τ_k. We assume that there exist ρ_T > 0, C_{Ω,T} > 0, and c_{Ω,T} > 0 independent of h and τ such that the step-size condition (4.2) holds. We will use the notation σ = (τ, h) and Ω_hT = Ω_h × (0, T).

4.1. Discretization of the controls and states.
We first discuss the spatial discretization, which follows [2]. Associated to the interior nodes {x_j}_{j=1}^{N_h} of K_h we consider the spaces

U_h = span{ δ_{x_j} : 1 ≤ j ≤ N_h } ⊂ M(Ω),   Y_h = span{ e_j : 1 ≤ j ≤ N_h } ⊂ C_0(Ω),

where {e_j}_{j=1}^{N_h} is the nodal basis formed by the continuous piecewise linear functions such that e_j(x_i) = δ_ij for every 1 ≤ i, j ≤ N_h. Such functions attain their maximum and minimum at one of the nodes, and thus

‖y_h‖_{C_0(Ω)} = max_{1 ≤ j ≤ N_h} |y_j| = |y_h|_∞

for all y_h ∈ Y_h, where we have identified y_h with the vector y_h = (y_1, ..., y_{N_h})^T ∈ R^{N_h} of its expansion coefficients, and |·|_p denotes the usual p-norm in R^{N_h}. Similarly, we have

‖u_h‖_{M(Ω)} = Σ_{j=1}^{N_h} |u_j| = |u_h|_1

for all u_h ∈ U_h. Hence, endowed with these norms, U_h is the topological dual of Y_h with respect to the duality pairing ⟨u_h, y_h⟩ = Σ_{j=1}^{N_h} u_j y_j. For every σ we define the spaces of discrete controls and states by

U_σ = { u_σ = Σ_{k=1}^{N_τ} Σ_{j=1}^{N_h} u_kj χ_k δ_{x_j} }   and   Y_σ = { y_σ = Σ_{k=1}^{N_τ} Σ_{j=1}^{N_h} y_kj χ_k e_j },

where χ_k denotes the characteristic function of the interval I_k = (t_{k−1}, t_k]. Thus U_σ and Y_σ are finite-dimensional spaces of dimension N_τ × N_h, and bases are given by {χ_k δ_{x_j}}_{k,j} and {χ_k e_j}_{k,j}. Identifying again u_σ with the vector of expansion coefficients u_kj and setting u_k = (u_k1, ..., u_kN_h)^T, we have for all u_σ ∈ U_σ that

‖u_σ‖_{L^2(M)} = ( Σ_{k=1}^{N_τ} τ_k |u_k|_1^2 )^{1/2},

and analogously ‖y_σ‖_{L^2(C_0)} = ( Σ_{k=1}^{N_τ} τ_k |y_k|_∞^2 )^{1/2} for all y_σ ∈ Y_σ. It is thus straightforward to verify that, endowed with these norms, U_σ is the topological dual of Y_σ with respect to the duality pairing

(4.3)   ⟨u_σ, y_σ⟩ = Σ_{k=1}^{N_τ} τ_k Σ_{j=1}^{N_h} u_kj y_kj.

Next we define the linear operators Λ_h : M(Ω) → U_h and Π_h : C_0(Ω) → Y_h. The operator Π_h is the nodal interpolation operator for Y_h. Concerning the operator Λ_h we have the following result.
Finally, we prove (4.9). From (4.7) we know that {Φ_σ u}_σ is bounded in the space L^2(I, M(Ω)). Then there exists a subsequence, denoted in the same way, and an element ũ ∈ L^2(I, M(Ω)) such that Φ_σ u ⇀* ũ in L^2(I, M(Ω)). Then for every

Using (4.6) and (4.10), we find

Combining these two equalities, we have that u = ũ, and therefore the whole sequence {Φ_σ u}_σ converges weakly-* to u. By the convergence Φ_σ u ⇀* u and (4.7) we obtain

which concludes the proof of (4.9). We finish this section by proving the following approximation result.

Theorem 4.3. Let y and y_σ be the solutions to (1.1) corresponding to u and Φ_σ u, respectively. Then there exists a constant C > 0 independent of u and σ such that (4.15) holds.

Proof. Let f ∈ L^2(Ω_T) be arbitrary and take z ∈ Z satisfying (4.16). Due to the convexity of Ω, there exists a constant C independent of f such that

By (2.1) and (4.6) we get (4.17)

Now, we will prove that

From the error estimates of the interpolation in Sobolev spaces [4, Chapter 3] we get (4.19)

Here and below, C denotes a constant independent of σ. By an inverse inequality (see [4, Theorem 17.2]) and using (4.2) for the last inequality in the following estimate, we obtain

which implies (4.15).
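As a minimal numerical illustration of the discrete norms and the duality pairing above (our own sketch, not code from the paper; all array names are ours), the coefficients u_kj and y_kj can be stored as N_τ × N_h arrays:

```python
import numpy as np

def control_norm(u, tau):
    # ||u_sigma||_{L^2(M)} = ( sum_k tau_k * |u_k|_1^2 )^{1/2}
    return np.sqrt(np.sum(tau * np.abs(u).sum(axis=1) ** 2))

def state_norm(y, tau):
    # ||y_sigma||_{L^2(C_0)} = ( sum_k tau_k * |y_k|_inf^2 )^{1/2}
    return np.sqrt(np.sum(tau * np.abs(y).max(axis=1) ** 2))

def pairing(u, y, tau):
    # <u_sigma, y_sigma> = sum_k tau_k * sum_j u_kj * y_kj
    return np.sum(tau * np.sum(u * y, axis=1))

rng = np.random.default_rng(0)
Ntau, Nh = 4, 6
tau = np.full(Ntau, 0.25)            # uniform time steps on [0, 1]
u = rng.standard_normal((Ntau, Nh))  # coefficients u_kj
y = rng.standard_normal((Ntau, Nh))  # coefficients y_kj
# Duality: |<u, y>| <= ||u|| * ||y||  (Hoelder in j, Cauchy-Schwarz in k).
assert abs(pairing(u, y, tau)) <= control_norm(u, tau) * state_norm(y, tau) + 1e-12
```

The 1-norm/sup-norm pairing in space and the τ-weighted 2-norm coupling in time mirror the identification of U_σ as the dual of Y_σ.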

4.2. Discrete state equation.
In this section we approximate the state equation and provide error estimates. We recall that I_k was defined as (t_{k−1}, t_k], and consequently the intervals {I_k}_{k=1}^{N_τ} form a partition of (0, T]. To approximate the state equation in time we use a dG(0) discontinuous Galerkin method, which can be formulated as an implicit Euler time stepping scheme. Given a control u ∈ L^2(I, M(Ω)), for k = 1, ..., N_τ and z_h ∈ Y_h we set

(4.21)   (y_k − y_{k−1}, z_h)_{L^2(Ω_h)} + τ_k a(y_k, z_h) = ∫_{I_k} ⟨u(t), z_h⟩ dt,   y_σ|_{I_k} = y_k,

where a(y, z) = ∫_{Ω_h} ∇y · ∇z dx and y_0 is replaced by an approximation y_{0h} ∈ Y_h. For instance, we can choose for y_{0h} the projection P_h y_0 of y_0 on Y_h given by the variational equation (P_h y_0, z_h)_{L^2(Ω_h)} = (y_0, z_h)_{L^2(Ω_h)} for all z_h ∈ Y_h. For any such choice of y_{0h}, the estimate (4.22) implies that there exists a constant C_1 > 0 independent of h such that (4.23) holds. Indeed, this follows by using an inverse inequality and the well-known estimates for the projection operator. Obviously, (4.21) defines a unique solution y_σ. Let us observe that from (4.5) we have the following important consequence.
Lemma 4.4. Let y_σ and ỹ_σ denote the solutions to (4.21) associated to the controls u and Φ_σ u, respectively. Then the identity y_σ = ỹ_σ holds.
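To make the time stepping concrete, here is a small self-contained 1D sketch of the implicit Euler form of (4.21). This is our own illustration, with uniform grids, a control equal to a single Dirac measure held fixed in time, and hypothetical parameter values; it is not the paper's implementation.

```python
import numpy as np

# Uniform P1 grid on (0, 1) with Nh interior nodes; M and A are the
# tridiagonal mass and stiffness matrices of linear finite elements.
Nh, Ntau, T = 49, 100, 1.0
h, tau = 1.0 / (Nh + 1), T / Ntau
ones = np.ones(Nh - 1)
M = (h / 6.0) * (4.0 * np.eye(Nh) + np.diag(ones, 1) + np.diag(ones, -1))
A = (1.0 / h) * (2.0 * np.eye(Nh) - np.diag(ones, 1) - np.diag(ones, -1))

# Control u(t) = delta_{x_{j0}} for all t: right-hand side <u(t), e_j> = e_j(x_{j0}).
j0 = Nh // 2
rhs = np.zeros(Nh)
rhs[j0] = 1.0

# Implicit Euler / dG(0): M (y_k - y_{k-1}) + tau * A y_k = tau * rhs, y_0 = 0.
y = np.zeros(Nh)
for _ in range(Ntau):
    y = np.linalg.solve(M + tau * A, M @ y + tau * rhs)

# The discrete state is nonnegative and peaks at the source node.
assert np.argmax(y) == j0
```

Because the control is a linear combination of nodal Diracs, the load vector reduces to evaluating the nodal basis functions at the support points, which is the discrete analogue of the pairing ⟨u(t), z_h⟩.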
The rest of the section is devoted to the proof of the stability of the scheme (4.21) and to the derivation of error estimates for ‖y − y_σ‖_{L^2(Ω_T)}, where y and y_σ are the solutions to (1.1) and (4.21) associated to a given control u ∈ L^2(I, M(Ω)). To this end, we introduce some operators that will be used in the proof of the theorems. For every h we consider the Ritz projection R_h : H_0^1(Ω) → Y_h. From the theory of finite elements we know that for all z ∈ H^2(Ω) ∩ H_0^1(Ω),

(4.24)   ‖z − R_h z‖_{L^2(Ω_h)} + h ‖∇(z − R_h z)‖_{L^2(Ω_h)} ≤ C h^2 ‖z‖_{H^2(Ω)}.

Indeed, for every k = 1, ..., N_τ we have

Theorem 4.5. Given a control u ∈ L^2(I, M(Ω)), let y_σ be the solution to (4.21) corresponding to u. Then there exist constants C_i > 0, i = 1, 2, independent of u and σ, such that the stability estimates (4.26) and (4.27) hold.

Proof. From here we get, with the aid of an inverse estimate [4, Theorem 17.2],

Downloaded 08/20/13 to 193.144.185.39. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

In the last inequality we used (4.2). Summing from k = 1 to m and using (4.2), it follows that

Hence

(4.28)

Here we have used an inverse inequality, (4.2), and (4.23). Finally, since 1 ≤ m ≤ N_τ is arbitrary, (4.26) follows from (4.28). Now we prove (4.27). Given f ∈ L^2(Ω_T), we take z ∈ Z satisfying (4.16). Integrating by parts, we get

Taking z_σ = R_σ z, we get from the above identity and (4.25) that

(4.29)

Let us estimate each of these terms. From the definition of z_σ and (4.23) we obtain

(4.30)

where we have used that there exists a constant C > 0 independent of σ such that

Using (4.24), we deduce that for every w ∈ H^2(Ω) ∩ H_0^1(Ω),

with κ = 1 if n ≤ 2 and κ = 1/2 if n = 3. Then (4.31) follows from the above inequalities.
Concerning the last term of (4.29), we will prove

(4.32)

where κ is defined as above. First we observe that (4.26) implies

(4.33)

From the definition of z_σ and (4.24) we deduce

which is equivalent to (4.27).
In the next theorem we show error estimates for the discretization of the state equation.

Integrating by parts, we get
From this identity, (4.21), and (4.25) we deduce

Let us estimate each of these three terms. For the first term we observe that

The proof of this inequality is the same as that of (4.18); it is enough to replace Π_h by R_h and to use (4.24). Using this inequality, we obtain the first estimate as follows:

For the second term we proceed with the aid of (4.23):

Finally, the third term of (4.36) was estimated in (4.32). Thus, using (4.37), (4.38), and (4.32) in (4.36), the inequality is obtained, which leads to (4.34).

4.3.
Discrete optimal control problem. The approximation of the optimal control problem (P) is defined as

(P_σ)   min_{u ∈ L^2(I, M(Ω))} J_σ(u) = (1/2) ∫_{Ω_hT} (y_σ − y_d)^2 dx dt + α ‖u‖_{L^2(M)},

where y_σ is the discrete state associated to u, i.e., the solution to (4.21). Let us observe that, analogously to J, the functional J_σ is convex. However, it is not strictly convex, due to the noninjectivity of the control-to-discrete-state mapping and the nonstrict convexity of the norm of L^2(I, M(Ω)). Although the existence of a solution can be shown in the same way as for problem (P), we therefore cannot deduce its uniqueness.
On the other hand, if ũ_σ is a solution to (P_σ) and if we take ū_σ = Φ_σ ũ_σ, then Lemma 4.4 and the inequality (4.7) imply that J_σ(ū_σ) ≤ J_σ(ũ_σ), and hence ū_σ is also a solution to (P_σ). Since for u_σ ∈ U_σ the mapping u_σ ↦ y_σ(u_σ), with y_σ(u_σ) the solution to (4.21) for u = u_σ, is linear and injective, and dim U_σ = dim Y_σ, this mapping is bijective. Therefore, the cost functional J_σ is strictly convex on U_σ, and hence (P_σ) has a unique solution in U_σ, which will be denoted by ū_σ hereafter. We summarize this discussion in the following theorem.
Remark 4.8. The fact that (P_σ) has exactly one solution in U_σ is of practical interest. Indeed, recall that ū_σ, as an element of U_σ, can be uniquely represented as

ū_σ = Σ_{k=1}^{N_τ} Σ_{j=1}^{N_h} ū_kj χ_k δ_{x_j}.

The numerical computation of ū_σ therefore is equivalent to the computation of the coefficients ū_kj. We finish this section by analyzing the convergence of the solution in U_σ to (P_σ) to the solution to (P).

Theorem 4.9. For every σ, let ū_σ be the unique solution to (P_σ) belonging to U_σ, and let ū be the solution to (P). Then the convergence properties (4.39)-(4.42) hold for σ → 0^+, where ȳ and ȳ_σ are the continuous and discrete states associated to ū and ū_σ, respectively.
Proof. First of all, let us show that if u_σ ⇀* u in L^2(I, M(Ω)), then

(4.43)   y_σ → y strongly in L^2(Ω_T),

where y_σ and y are the discrete and continuous states associated to the controls u_σ and u, respectively. Indeed, let us write y − y_σ = (y − y_{u_σ}) + (y_{u_σ} − y_σ), where y_{u_σ} is the continuous state associated to u_σ. Then by Theorems 2.4 and 4.6 we deduce (4.43).
Turning to the verification of (4.39), we observe that J_σ(ū_σ) ≤ J_σ(0) = (1/2) ‖ŷ_σ0 − y_d‖_{L^2(Ω_hT)}^2, with ŷ_σ0 denoting the uncontrolled discrete state, which implies the boundedness of {ū_σ}_σ in L^2(I, M(Ω)). By taking a subsequence, we have that ū_σ ⇀* u in L^2(I, M(Ω)). Then using (4.1), (4.43), lower semicontinuity of the norm ‖·‖_{L^2(M)}, and (4.9), we obtain

Hence u = ū by the uniqueness of the solution to (P), and the whole sequence {ū_σ}_σ converges weakly-* to ū. In addition, the above inequality implies (4.42). Using again (4.43), we deduce (4.41). Finally, (4.40) follows immediately from (4.41) and (4.42).

5. Error estimates.
We now turn to the proof of error estimates for the optimal costs and for the optimal states. We still require Ω to be convex and assume in addition (5.1). Recall that ȳ and ȳ_σ denote the continuous and discrete states associated to the optimal controls ū and ū_σ, respectively.

Theorem 5.1. There exists a constant C > 0 independent of σ such that (5.2) holds.

Proof. Taking r as in (5.1) and using Hölder's inequality and (4.1), we deduce that for all φ ∈ L^2(I, L^r(Ω)) and n = 2 or 3 the estimate (5.3) holds. Observe that Ω = Ω_h for n = 1; consequently (5.3) holds with C = 0. Let y and y_σ be the continuous and discrete states associated to a given control u. As a consequence of (4.34) and (5.3), with φ = y − y_d, we obtain (5.4). Now, by the optimality of ū and ū_σ we have (5.5), and hence

From (4.40) we deduce that {ū_σ}_σ is bounded in L^2(I, M(Ω)). Therefore, (2.2) implies that the continuous associated states {y_{ū_σ}}_σ are bounded in L^2(I, W_0^{1,p}(Ω)) for every 1 ≤ p < n/(n−1), and therefore in L^2(I, L^r(Ω)) as well. We now apply (5.4) with u = ū_σ and u = ū, respectively. Together with (5.5) this establishes (5.2).
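Rates such as (5.2) are typically verified numerically by estimating the observed order of convergence from errors computed on a sequence of grids. A generic helper for this log-log fit (our own sketch, not from the paper):

```python
import numpy as np

def observed_order(hs, errors):
    """Least-squares slope of log(error) vs. log(h), i.e., error ~ C * h^rate."""
    rate, _ = np.polyfit(np.log(hs), np.log(errors), 1)
    return rate

# Synthetic data with error = 0.5 * h, i.e., an exactly linear rate.
hs = np.array([1/8, 1/16, 1/32, 1/64])
errs = 0.5 * hs
assert abs(observed_order(hs, errs) - 1.0) < 1e-10
```

Applied to the cost differences of section 7, this fit yields the approximately linear rate reported there.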
In the following theorem we establish a rate of convergence for the states.

Theorem 5.2. There exists a constant C > 0 independent of h such that the estimate (5.6) holds, with κ as defined in Theorem 4.1.
Proof. Let S : L^2(I, M(Ω)) → L^2(Ω_T) and S_σ : L^2(I, M(Ω)) → L^2(Ω_T) be the solution operators associated to (1.1) and (4.21), respectively. From (4.34) it follows that (5.7) holds.

6. Numerical solution.
In this section we present a numerical algorithm for (Q_σ), an equivalent alternative formulation of (P_σ). Due to the spatiotemporal coupling of the norm in L^2(I, M(Ω)), its subdifferential is difficult to characterize. However, using Fenchel duality combined with an equivalent reformulation that decouples the spatiotemporal structure, we can obtain optimality conditions that can be solved using a semismooth Newton method.
For the reader's convenience, we recall the Fenchel duality theory, e.g., from [7, Chapter 4]. Let V and Y be Banach spaces with topological duals V* and Y*, respectively, and let Λ : V → Y be a continuous linear operator. Setting R̄ = R ∪ {∞}, let F : V → R̄, G : Y → R̄ be convex lower semicontinuous functionals which are not identically equal to ∞ and for which there exists a v_0 ∈ V such that F(v_0) < ∞, G(Λv_0) < ∞, and G is continuous at Λv_0. Let F* : V* → R̄ denote the Fenchel conjugate of F defined by

F*(q) = sup_{v ∈ V} ( ⟨q, v⟩ − F(v) ),

which we can calculate using the fact that (6.1) holds. Here, ∂F denotes the subdifferential of the convex function F, which reduces to the Gâteaux derivative if it exists, and the left-hand side arises from differentiating the duality pairing. The Fenchel duality theorem states that under the assumptions given above,

(6.2)   inf_{v ∈ V} [ F(v) + G(Λv) ] = sup_{q ∈ Y*} [ −F*(Λ*q) − G*(−q) ]

holds, and that the right-hand side of (6.2) has at least one solution. Furthermore, the equality in (6.2) is attained at (v̄, q̄) if and only if

(6.3)   Λ*q̄ ∈ ∂F(v̄)   and   −q̄ ∈ ∂G(Λv̄),

where the derivative of the duality pairing again enters the left-hand side. We now apply the Fenchel duality theorem to (Q_σ), which we express in terms of the expansion coefficients ū_kj. Let N_σ = N_τ × N_h and identify as above u_σ ∈ U_σ with the vector u_σ = (u_11, ..., u_1N_h, ..., u_{N_τ}N_h)^T ∈ R^{N_σ} of coefficients, and similarly y_{d,σ} ∈ Y_σ; see section 4.1. To keep the notation simple, we will omit the vector arrows from here on. Denote by M_h = ((e_j, e_k))_{j,k=1}^{N_h} the mass matrix and by A_h = (a(e_j, e_k))_{j,k=1}^{N_h} the stiffness matrix corresponding to Y_h. For the sake of presentation, we fix y_0 = 0. Then the discrete state equation (4.21) can be expressed as L_σ y_σ = u_σ. (Note that the "mass matrix" corresponding to ((δ_{x_j}, e_k))_{j,k=1}^{N_h} is the identity.) Introducing for v_σ ∈ R^{N_σ} the vectors v_k = (v_k1, ..., v_kN_h)^T ∈ R^{N_h}, 1 ≤ k ≤ N_τ, the discrete optimal control problem (Q_σ) can be stated in the following reduced form.
We now set Λ : R^{N_σ} → R^{N_σ}, Λv = L_σ^{−1} v, and calculate the Fenchel conjugates with respect to the topology induced by the duality pairing (4.3). For G, we have by direct calculation that

since the supremum is attained if and only if q_k = M_h(v_k − y_{d,k}) for each 1 ≤ k ≤ N_τ, due to (6.1) and the definition of the duality pairing. For F, we appeal to the fact that in any Banach space the Fenchel conjugate (with respect to the weak-* topology) of a norm is the indicator function of the unit ball with respect to the dual norm (see, e.g., [15, Example 2.2.6]), and to the duality between U_σ and Y_σ, to obtain

The adjoint Λ* : R^{N_σ} → R^{N_σ} (with respect to the above duality pairing) is given by L_σ^{−T}. Dropping the constant term in G* and substituting p_σ = Λ* q_σ, i.e., q_σ = L_σ^T p_σ, we obtain the dual problem

(6.4)

Since v_0 = 0 = Λv_0 satisfies the regular point condition, the Fenchel duality theorem is applicable, implying the existence of a solution p̄_σ, which is unique due to the strict convexity in (6.4). While the second relation of (6.3),

(6.5)

can in principle be used to obtain ū_σ from p̄_σ, the first relation remains impractical for numerical computation. We thus consider the following equivalent reformulation of (6.4), which decouples the spatiotemporal constraint given by the term ι_α(p_σ):

where c_σ = (c_1, ..., c_{N_τ})^T ∈ R^{N_τ}. Since the constraints satisfy a Slater condition (take p_σ = 0 and c_k = T^{−1/2} α, 1 ≤ k ≤ N_τ), we obtain (e.g., from [12]) the existence of Lagrange multipliers μ^1, μ^2 and λ ∈ R such that the (unique) solution (p̄_σ, c̄_σ) satisfies the optimality conditions

(6.6)

where M_σ ∈ R^{N_σ × N_σ} is a block diagonal matrix containing N_τ copies of M_h. We now rewrite the optimality system in a form amenable to numerical solution using a semismooth Newton method. First, μ^1_k and μ^2_k are scaled by τ_k > 0 to eliminate this factor from the first and second relations (which does not affect the complementarity conditions). Using the componentwise max and min functions, the complementarity conditions for μ^1_k, μ^2_k and p̄_k can be expressed equivalently for any γ > 0 as

μ^1_k = max(0, μ^1_k + γ(p̄_k − c̄_k)).

We argue similarly for the min term. Furthermore, comparing the first relation of (6.6) with (6.5), we deduce that ū_k = μ^1_k + μ^2_k. Inserting these relations into (6.6), we obtain for every γ > 0 the optimality system (6.7). Since the max and min functions are globally Lipschitz mappings in finite dimensions, this defines a semismooth equation which can be solved using a generalized Newton method; see, e.g., [13, 11]. Here we recall that the Newton derivative of max(0, v) with respect to v is given componentwise by 1 if v_j > 0 and 0 otherwise, and similarly that of min(0, v). In practice, we have to account for the possibly local convergence of the Newton method. To compute a suitable starting point, as an initialization step we successively solve a sequence of approximating problems that are obtained from (6.7) by replacing the max and min terms with max(0, γ(p_k − c_k)) and min(0, γ(p_k + c_k)), respectively, and letting γ tend to infinity. (This can be interpreted as a Moreau-Yosida regularization of the complementarity conditions.) Since now u_k no longer appears in the argument of the max and min functions, it can be eliminated from the optimality system using the third equation (which also allows computing ū_k given (p̄_k, c̄_k)), yielding the regularized optimality system (6.8).
Starting with γ = 1 and p^0 = y^0 = 0, c^0 = T^{−1/2} α, and λ^0 = 1, we solve (6.8) using a semismooth Newton method, increase γ, and compute a new solution for the increased γ with the previous solution as starting point. Once a solution satisfies the constraints (or a stopping value γ* is reached), we use it as a starting point for the solution of (6.7) with γ = 1.

Remark 6.1. By virtue of the chosen discretization (specifically, the adjoint consistency of discontinuous Galerkin methods and the discrete topology mirroring the continuous one), the discrete optimality system (6.8) coincides with the discretization of the continuous optimality system obtained by applying Fenchel duality, the relaxation approach, and a Moreau-Yosida approximation to problem (P). Since the continuous optimality system may be of independent interest, the derivation is sketched in the appendix.

7. Numerical examples.
We consider the target z_1 and α = 0.1. Figure 7.6(a) shows the difference |J_h − J_{h*}| for a series of successively refined grids with N_h = 32, 40, ..., 128 and N_τ(h) = (1/16) N_h^2. The observed approximately linear convergence rate agrees with the rate obtained in Theorem 5.1. The corresponding L^2 error ‖y_h − y_{h*}‖_{L^2} of the discrete states also decays with a linear rate, which is faster than predicted by Theorem 5.2. A similar behavior was observed in the elliptic case; see [2].
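As an illustration of the generalized Newton mechanism used above (the max/min complementarity reformulation and its 0/1 Newton derivative), the following self-contained sketch applies a semismooth Newton (primal-dual active set) iteration to a small ℓ1-penalized quadratic model problem. It is our own finite-dimensional analogue, not the paper's solver for (6.7); all names are ours.

```python
import numpy as np

def ssn_l1(Q, b, alpha, gamma=1.0, maxit=50, tol=1e-10):
    """Semismooth Newton for min_u 0.5*u^T Q u - b^T u + alpha*||u||_1 via
    F(u) = u - max(0, u + gamma*(p - alpha)) - min(0, u + gamma*(p + alpha)) = 0,
    with the dual variable p = b - Q u (|p_j| <= alpha at the solution)."""
    n = len(b)
    u = np.zeros(n)
    for _ in range(maxit):
        p = b - Q @ u
        act_max = u + gamma * (p - alpha) > 0   # where the max term is active
        act_min = u + gamma * (p + alpha) < 0   # where the min term is active
        F = (u - np.maximum(0.0, u + gamma * (p - alpha))
               - np.minimum(0.0, u + gamma * (p + alpha)))
        if np.linalg.norm(F) < tol:
            break
        # Newton derivative: identity rows where inactive, gamma*Q rows where active.
        D = np.diag((act_max | act_min).astype(float))
        J = np.eye(n) - D @ (np.eye(n) - gamma * Q)
        u = u - np.linalg.solve(J, F)
    return u

# For Q = I the method reproduces soft thresholding of b at level alpha.
u = ssn_l1(np.eye(3), np.array([2.0, -0.5, -3.0]), alpha=1.0)
assert np.allclose(u, [1.0, 0.0, -2.0])
```

The active sets play the role of the support of the sparse control: components leave the iteration at exactly zero, which is the finite-dimensional counterpart of the sparsity structure discussed in section 3. A continuation in γ, as described above, can be added to globalize the iteration.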

Conclusion.
For the appropriate functional-analytic setting of parabolic optimal control problems in measure spaces, there exists a straightforward approximation framework that retains the structural properties of the norm in the measure-valued Banach space and allows deriving numerically accessible optimality conditions as well as convergence rates. In particular, although the state is discretized, the control problem is still formulated and solved in the measure space. The numerical results demonstrate that the optimal controls exhibit the expected sparsity pattern.
Appendix. Continuous optimality system. In this section we sketch the derivation of the continuous optimality system using Fenchel duality and the relaxation approach. Let S : L^2(I, M(Ω)) → L^2(Ω_T) denote the solution operator corresponding to the state equation (1.1) with homogeneous initial conditions. It will be convenient to introduce the parabolic differential operator L such that the solution y to (1.1) satisfies Ly = u. Then we can express problem (P) in reduced form. By approximating p_γ and y_γ in Y_σ, using the fact that for linear finite elements the pointwise maximum and minimum are attained at the nodes, and the adjoint consistency of discontinuous Galerkin methods (i.e., (L*)_σ = L_σ^T), we recover (6.8).