Optimal Control of Semilinear Parabolic Equations by BV-Functions

Here, we assume that Ω is a bounded domain in ℝ^n, 1 ≤ n ≤ 3, with a Lipschitz boundary Γ, Q = Ω × (0, T), Σ = Γ × (0, T), and y_0 ∈ L^∞(Ω). BV(0, T) denotes the space of functions of bounded variation on (0, T), with 0 < T < ∞ given. The controllers in (P) are supposed to be separable functions with respect to fixed spatial shape functions g_j and free temporal amplitudes u_j. The specific new feature in (P) is the choice of the BV-seminorm ‖u′_j‖_{M(0,T)} as the control cost. It favors optimal controls that are piecewise constant in time and penalizes their jumps. The weights in (P) are assumed to satisfy α_j > 0 and β_j ≥ 0. Thus the goal of the optimal control problem (P) is to achieve a simple control strategy while being as close to the target y_d as possible.

Let us further comment on the importance of this fact. If we consider the classical formulation of the control problem with a quadratic cost functional for the control, then the optimal control ū is a multiple of the optimal adjoint state. Hence, while it is a regular function of time, its practical implementation can be involved in comparison to piecewise constant controls. Of course, ū can be approximated by piecewise constant functions, but a good approximation may require many jumps. Looking for a simpler structure for ū, one can consider the bang-bang formulation of the control problem by introducing pointwise constraints on the control: α ≤ u(t) ≤ β. Then one can expect ū to take only the values α and β. A drawback of this approach is that ū frequently takes the extreme values at all times, which can lead to undesirable amounts of energy used to control the system. Our formulation pursues an optimal control ū with a simple structure and with lower energy than in the bang-bang case: we look for a piecewise constant control with just a few jumps. Corollary 10 shows that this goal can be achieved with our formulation, and the numerical tests confirm the desired simple structure of the optimal controls. The use of the BV-seminorm requires the development of novel techniques for the analysis and numerical realization of (P).
The appearance of the mean ∫_0^T u_j(t) dt in the cost is related to the kernel of the BV-seminorm, which consists of the constant functions. For linear and certain classes of nonlinear functions f the choice β_j = 0 is admissible, while for more severe nonlinearities we choose β_j > 0 to guarantee the existence of a solution to (P).
The choice of control costs related to BV-norms or BV-seminorms has not received much attention in the literature. However, let us mention [10], where the effects of L²-, H¹-, measure-valued, and BV-valued control costs on the qualitative behavior of the optimal control were pointed out and compared. In [13] the use of BV-costs was investigated further for linear elliptic equations. BV-seminorm control costs are also employed in [5], where the control appears as a coefficient in the p-Laplace equation.
Let us also compare the use of the BV-term in (P) with the efforts that have been made to study optimal control problems with sparsity constraints. These formulations involve either measure-valued norms of the control or L¹-functionals combined with pointwise constraints on the control. We cite [4, 14] from among the many results now available. Thus the BV-seminorm can also be understood as a sparsity constraint on the first derivative, which in our case is the temporal derivative.
Let us briefly outline the following sections. Section 2 contains a precise problem statement, the analysis of the state equation, and the differentiability properties of the cost functional. The analysis of the optimal control problem, sparsity properties of the optimal controls, and second order necessary and sufficient optimality conditions are contained in section 3. Section 4 is devoted to a finite element approximation of the control problem and its well-posedness. A convergence analysis of this approximation scheme is provided in section 5. In section 6 we derive an algorithm to solve the control problem. Numerical results showing that the desired behavior of the optimal controls is indeed observed are presented in section 7.
By using [1, Theorem 3.44] it is easy to deduce that there exists a constant C_T such that (2) holds. In addition, we mention that BV(0, T) is the dual space of a separable Banach space. Therefore, every bounded sequence {u_k}_{k=1}^∞ in BV(0, T) has a subsequence converging weakly* to some u ∈ BV(0, T). The weak* convergence u_k ⇀* u implies that u_k → u strongly in L¹(0, T) and u′_k ⇀* u′ in M(0, T); see [1, pp. 124-125]. We will also use that BV(0, T) is continuously embedded in L^∞(0, T) and compactly embedded in L^p(0, T) for every p < +∞; see [1, Corollary 3.49]. From this property we deduce that the convergence u_k ⇀* u in BV(0, T) implies that u_k → u strongly in L^p(0, T) for all p < +∞.
By using these assumptions, the following theorem can be proved in a standard way; see, for instance, [2] or [23, Theorem 5.5].
Proposition 1. For every u ∈ L^p(0, T)^m, with p > 1, the state equation (1) has a unique solution y_u ∈ L^∞(Q) ∩ L²(0, T; H¹_0(Ω)). In addition, for every M > 0 there exists a constant K_M such that […]

In what follows, we denote Y = L^∞(Q) ∩ L²(0, T; H¹_0(Ω)) and by S : L^p(0, T)^m → Y, with p > 1, the mapping associating to each control u the corresponding state S(u) = y_u. By the implicit function theorem, we deduce in the classical way the following result [7, Theorem 5.1].
For all elements u, v, and w of L^p(0, T)^m, the functions z_v = S′(u)v and z_vw = S″(u)(v, w) are the solutions of the problems […], respectively.
Next, we analyze the differentiability of the cost functional. In J we separate the smooth and the convex parts, J(u) = F(u) + G(u), where G collects the BV-seminorm terms and g : M(0, T) → ℝ is given by g(µ) = ‖µ‖_{M(0,T)}. From Proposition 2 and the chain rule the following proposition can be obtained.
The derivatives of F are given by (10), where ϕ_u is the adjoint state associated with u. The L^∞(Q) regularity of ϕ_u follows from the assumptions on y_d and the fact that y_u ∈ L^∞(Q). For the continuity of ϕ_u in Q it is enough to use that the terminal and boundary conditions are zero.
Proposition 5 (see [6, Proposition 3.3]). Let µ, ν ∈ M(0, T); then […]

Now we analyze the mapping G. To this end, let us introduce the operator D_t : BV(0, T) → M(0, T) by D_t u = u′. Its adjoint operator D_t^* is defined by […]. The following identities hold for all u ∈ BV(0, T): […], where dv′ = h_v d|u′| + dv′_s is the Lebesgue decomposition of v′ with respect to |u′|.
Proof. Since g : M(0, T) → ℝ is convex and continuous and D_t : BV(0, T) → M(0, T) is a linear and continuous mapping, we can apply the chain rule [11, Chapter I, Proposition 5.7] to deduce that ∂(g ∘ D_t)(u) = D_t^* ∂g(u′), which immediately leads to (15).
To verify (16) it is enough to observe that (g ∘ D_t)′(u; v) = g′(u′; v′) and to apply (14). This completes the proof.
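For measures carried by finitely many common atoms, the directional derivative of g(µ) = ‖µ‖_{M(0,T)} can be checked by hand. The sketch below assumes µ = Σ_i a_i δ_{t_i} and ν = Σ_i b_i δ_{t_i}. Then ν has density b_i/|a_i| with respect to |µ| on the atoms where a_i ≠ 0, and the atoms with a_i = 0 carry its singular part ν_s, which gives the familiar one-sided derivative of the ℓ¹ norm, g′(µ; ν) = Σ_{a_i≠0} sign(a_i) b_i + Σ_{a_i=0} |b_i|. This is only the atomic special case, checked against a difference quotient; formula (14) itself is not reproduced, and the function names are illustrative.

```python
import math

# Illustrative sketch for atomic measures only; names are hypothetical.


def measure_norm(a):
    """Total variation norm of the atomic measure sum_i a[i] * delta_{t_i}."""
    return sum(abs(x) for x in a)


def directional_derivative(a, b):
    """One-sided derivative of the total variation norm at mu in direction nu.

    mu = sum_i a[i] delta_{t_i} and nu = sum_i b[i] delta_{t_i} share the atoms t_i.
    Atoms with a[i] != 0 carry the part of nu that is absolutely continuous
    with respect to |mu|; atoms with a[i] == 0 carry the singular part nu_s.
    """
    absolutely_continuous = sum(math.copysign(1.0, x) * y for x, y in zip(a, b) if x != 0.0)
    singular = sum(abs(y) for x, y in zip(a, b) if x == 0.0)
    return absolutely_continuous + singular


if __name__ == "__main__":
    a = [1.0, -2.0, 0.0]
    b = [0.5, 1.0, -3.0]
    eps = 1e-7
    quotient = (measure_norm([x + eps * y for x, y in zip(a, b)]) - measure_norm(a)) / eps
    print(directional_derivative(a, b), quotient)  # both close to 2.5
```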
3. Analysis of the optimal control problem (P). This section is devoted to the proof of the existence of at least one solution of (P), to the optimality conditions, and to their consequences.
Theorem 7. Let us assume that one of the following assumptions holds:
1. β_j > 0 for every 1 ≤ j ≤ m.
2. There exist q ∈ [1, 2) and C > 0 such that condition (17) holds.
Then problem (P) has at least one solution. Moreover, if f is affine with respect to y, the solution is unique.
Let us observe that condition (17) is satisfied in the case of affine functions with respect to y.
Proof. Let {u_k}_{k=1}^∞ ⊂ BV(0, T)^m be a minimizing sequence. We prove that this sequence is bounded in BV(0, T)^m. As introduced in section 2, we consider the decomposition u_k = a_k + û_k, where a_k = (a_{k,1}, …, a_{k,m}), û_k = (û_{k,1}, …, û_{k,m}), and […]. Since {J(u_k)} is bounded and α_j > 0, the seminorms ‖u′_{k,j}‖_{M(0,T)} are bounded, so it remains to prove that {a_k}_{k=1}^∞ is bounded in ℝ^m. This boundedness is obvious if the first assumption is satisfied. Otherwise, let us denote by y_k and ŷ_k the solutions of (1) associated with the controls u_k and û_k, respectively. From the inequalities […] we infer that {y_k} and {ŷ_k} are bounded in L²(Q). Now we define z_k = y_k − ŷ_k, which produces a bounded sequence in L²(Q) as well. Subtracting the equations satisfied by y_k and ŷ_k and using the mean value theorem, we infer (18). To argue by contradiction, let us assume that {a_k} is unbounded; passing to a subsequence, ρ_k = |a_k| → ∞. Then, introducing ζ_k = z_k/ρ_k, we obtain from (18) the equation (19). From this equation, using (4), (5), and the boundedness of the right-hand side in L^∞(Q), we have that ‖ζ_k‖_{L^∞(Q)} ≤ M for some M > 0 and all k. Moreover, the boundedness of […], (17), and Hölder's inequality with exponents 2/q and 2/(2 − q) lead to […]. Combined with the aforementioned properties of {ζ_k}_{k=1}^∞, this shows that the left-hand side of the partial differential equation in (19) converges to zero in the distribution sense. However, by the definition of ρ_k the right-hand side does not converge to zero, which is a contradiction. Consequently, {a_k}_{k=1}^∞ is a bounded sequence in ℝ^m, hence the minimizing sequence {u_k}_{k=1}^∞ is bounded in BV(0, T)^m because of (2). Therefore, we can extract a subsequence, denoted in the same way, such that u_k ⇀* ū in BV(0, T)^m, which implies u_k → ū strongly in L^p(0, T)^m for every p < +∞. As a consequence of Proposition 2, y_k → ȳ strongly in Y, where ȳ is the state associated with ū, and thus F(u_k) → F(ū).
Furthermore, the convergence u′_{k,j} ⇀* ū′_j in M(0, T) for every 1 ≤ j ≤ m yields ‖ū′_j‖_{M(0,T)} ≤ liminf_{k→∞} ‖u′_{k,j}‖_{M(0,T)}. Hence J(ū) ≤ liminf_{k→∞} J(u_k) = inf (P), and ū is a solution of (P). The uniqueness of a solution when f is affine with respect to y is an immediate consequence of the strict convexity of F and the convexity of G.
Next, we analyze the first order optimality conditions. Since (P) is not a convex problem, it is convenient to deal with local solutions.

Definition 8. Let ū ∈ BV(0, T)^m. We call ū a local solution of (P) if there exists ε > 0 such that J(ū) ≤ J(u) for all u ∈ BV(0, T)^m with ‖u − ū‖_{BV(0,T)^m} ≤ ε. We say that ū is an L^p(0, T)^m-local solution (1 ≤ p ≤ ∞) if the above statement is true with the L^p(0, T)^m norm in place of the BV(0, T)^m norm. Finally, ū is called a strong local solution if J(ū) ≤ J(u) for all u ∈ BV(0, T)^m with ‖y_u − ȳ‖_{L^∞(Q)} ≤ ε, for some ε > 0, where ȳ and y_u denote the states associated with ū and u, respectively. The solution is said to be strict in any of the previous senses if the inequality J(ū) < J(u) holds in the above statements whenever u ≠ ū.
We have the following relationships among these concepts. Since BV(0, T) is continuously embedded into L^p(0, T) for any p ∈ [1, +∞], we deduce that if ū is an L^p(0, T)^m-local solution of (P), then it is a local solution. On the other hand, from Propositions 1 and 2 we infer that any strong local solution is an L^p(0, T)^m-local solution for 1 < p ≤ +∞.
Given ū ∈ BV(0, T)^m with associated state ȳ and adjoint state φ̄, we define the functions Φ̄_j, 1 ≤ j ≤ m, by (20). This quantity will allow us to obtain information on the structure of the optimal control ū. From Corollary 10 below we shall deduce that the support of ū′_j is contained in the set where |Φ̄_j(t)| = α_j. In particular, jumps in ū_j can only occur at points t with |Φ̄_j(t)| = α_j. But first we need to derive the following structure theorem for Φ̄_j.
Corollary 10. Under the assumptions of Theorem 9, the following inclusions are valid for each j ∈ {1, …, m} for which ū_j is not a constant function on [0, T]: […], where ū′_j = (ū′_j)^+ − (ū′_j)^− is the Jordan decomposition of the measure ū′_j.
This corollary is a straightforward consequence of (21), (22), Proposition 4 with λ = −(1/α_j) Φ̄_j, and the fact that ū′_j ≠ 0 if ū_j is not a constant function on [0, T].

Remark 11. 1. Let us observe that if there are only finitely many t with Φ̄_j(t) ∈ {−α_j, +α_j}, then ū′_j is a combination of Dirac measures centered at those points. In particular, ū_j is piecewise constant on [0, T]. This will be illustrated in the numerical examples; cf. sections 7.1 and 7.2.
2. Given α = (α_j)_{j=1}^m, let us denote by ū_α = (ū_{α,j})_{j=1}^m a solution of (P) and by (ȳ_α, φ̄_α) the associated state and adjoint state. We note that if α_j is decreased, then the BV(0, T)-seminorm of ū_{α,j} increases; conversely, if α_j is increased, then the BV(0, T)-seminorm of ū_{α,j} decreases. In fact, there is a threshold M_j < +∞ such that if α_j > M_j, then ū′_{α,j} = 0, i.e., ū_{α,j} is constant on [0, T]. Moreover, there exists a vector ξ ∈ ℝ^m such that, for any α with α_j > M_j for all 1 ≤ j ≤ m, the constant function ξ is a solution of (P). Let us provide an upper bound for these values M_j.
Let y^0 be the solution of the state equation associated with the control u ≡ 0. From the optimality of ū_α we get […]. From these inequalities we deduce […]. From the adjoint state equation we obtain […], where C_Ω is the constant satisfying ‖z‖_{L²(Ω)} ≤ C_Ω ‖∇z‖_{L²(Ω)} for any z ∈ H¹_0(Ω). From the definition of Φ̄_j and the above estimates we get, for every t ∈ [0, T], […]. Relations (25) imply that ū′_{α,j} = 0 if α_j > M_j.
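To make the constant C_Ω concrete: for a one-dimensional domain Ω = (0, L), which the case n = 1 admits, the best constant in ‖z‖_{L²(Ω)} ≤ C_Ω ‖∇z‖_{L²(Ω)} is C_Ω = 1/√λ_1 = L/π, where λ_1 = (π/L)² is the first Dirichlet eigenvalue of −Δ. The sketch below recovers this value from a finite-difference Dirichlet Laplacian. The interval setting, the discretization, and the function name are our own choices, made for illustration only.

```python
import numpy as np

# Illustrative sketch: Omega = (0, length) and finite differences are assumptions.


def poincare_constant_1d(length, n_interior=400):
    """Approximate C_Omega = 1 / sqrt(lambda_1) for Omega = (0, length)."""
    h = length / (n_interior + 1)
    main = np.full(n_interior, 2.0 / h**2)
    off = np.full(n_interior - 1, -1.0 / h**2)
    # Second-order finite-difference Dirichlet Laplacian (symmetric positive definite).
    laplacian = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    lambda_1 = np.linalg.eigvalsh(laplacian)[0]  # eigenvalues come in ascending order
    return 1.0 / np.sqrt(lambda_1)


print(poincare_constant_1d(2.0), 2.0 / np.pi)  # approximately equal
```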
To prepare for the second order necessary conditions we introduce the critical cone
C_ū = {v ∈ BV(0, T)^m : F′(ū)v + G′(ū; v) = 0}. (26)
It is natural that the second order optimality conditions are imposed only on those directions where the directional derivatives vanish. Let us point out some properties of this critical cone.
Proposition 12. C_ū is a closed convex cone that can equivalently be expressed in the form (27), where v′_{js} is the singular part of the measure v′_j with respect to |ū′_j|.
The identity (27) shows that the criterion for v to belong to C_ū can be expressed in terms of the singular part of v′_j with respect to |ū′_j| for 1 ≤ j ≤ m. In particular, any function v ∈ BV(0, T)^m such that v′_j is absolutely continuous with respect to |ū′_j| for every j is an element of the critical cone.
Proof.The cone property and closedness of C ū are a straightforward consequence of the continuity and positive homogeneity of the mapping v → F (ū)v + G (ū; v).Let us prove the convexity property.First, we observe that (10) and the definition of Φj implies that ( 28) Taking into account (23), using the definition of the subdifferential and passing to the limit as ρ 0 we infer for 1 Multiplying this inequality by α j and summing in j we get with (28) 28), making an integration by parts as in the proof of Theorem 9, and using the Lebesgue decomposition dv j = h v j d|ū j | + dv js , we obtain From (25) we deduce that d|ū j | = 1 αj Φj dū j for 1 ≤ j ≤ m.Inserting this identity in the above equality we infer (30) Now, using ( 16) it follows that This equality and (30) lead to which is equivalent to the expressions given in (27) for 1 ≤ j ≤ m.
Now we formulate the second order necessary optimality conditions.
Theorem 13. If ū is a local minimum of (P), then F″(ū)v² ≥ 0 for all v ∈ C_ū.
Proof. Let v be an element of C_ū and consider the Lebesgue decompositions dv′_j = h_{v_j} d|ū′_j| + dv′_{js}, 1 ≤ j ≤ m. For every integer k ≥ 1 we set […]. Let us take v_{j,k} ∈ L¹(0, T) as the primitive of v′_{j,k} with ∫_0^T (v_j − v_{j,k}) dt = 0, and set v_k = (v_{j,k})_{j=1}^m ∈ BV(0, T)^m. Moreover, since the singular parts of v′_{j,k} and v′_j with respect to |ū′_j| coincide and v ∈ C_ū, (27) implies that v_k ∈ C_ū for every k.
For any 0 < ρ < 1/k, using (13) and (14), we find […]. Using that ū is a local minimum of J and making a Taylor expansion, we obtain for every k and 0 < ρ < 1/k the existence of θ = θ(k, ρ), with 0 < θ < 1, such that […]. Finally, dividing the last term by ρ²/2 and taking the limit first as ρ → 0 and subsequently as k → ∞, we arrive at F″(ū)v² ≥ 0.
As usual, we have to consider an extended cone of critical directions to formulate a sufficient second order condition for optimality. For every τ > 0 we denote […], where z_v = S′(ū)v, with S defined just above Proposition 2. The second order condition involves this cone as follows:
(SSOC) There exist positive constants κ and τ such that (31) holds.

Theorem 14. Let ū ∈ BV(0, T)^m satisfy the first order optimality conditions (21)-(22) and (SSOC). Then there exist constants ε > 0 and ν > 0 such that (32) holds.

The proof of this theorem can be done along the lines of [8, Theorem 9]. Let us point out some small differences. First, the parameter γ in [8] must be taken as zero. Second, we have a nondifferentiable part in the cost functional and a slightly different cone of critical directions. To deal with the nondifferentiable term G we use (29) together with the convexity and Lipschitz continuity of G: for every u ∈ BV(0, T)^m, […]. In this way we eliminate the nondifferentiable part of the cost functional. The rest of the proof is the same.
Corollary 15. Under the assumptions of Theorem 14 there exist two constants ε > 0 and δ > 0 such that […]. This is an immediate consequence of (32) and the estimate […]; see [8, Corollary 3] for the proof. We observe that the sufficient second order optimality condition (31), along with the first order optimality conditions, implies that ū is a strong local solution of (P).

4. Approximation of the control problem.
In this section we assume that Ω is convex and y_0 ∈ L^∞(Ω) ∩ H¹_0(Ω). Then it is well known that the solutions of (1) belong to H^{2,1}(Q); see [21, Proposition 2.4]. We consider a dG(0)cG(1) discontinuous Galerkin approximation of the state equation (1), i.e., piecewise constant in time and continuous piecewise linear (nodal basis) finite elements in space; see, e.g., [22]. Let {K_h}_{h>0} be a quasi-uniform family of triangulations of Ω; see [9]. We set Ω̄_h = ∪_{K∈K_h} K, with Ω_h and Γ_h being its interior and boundary, respectively. We assume that the vertices of K_h placed on the boundary Γ_h are also points of Γ and that there exists a constant C_Γ > 0 such that dist(x, Γ) ≤ C_Γ h² for every x ∈ Γ_h. This always holds if Γ is a C² boundary and n = 2. In the case of polygonal or polyhedral domains it is reasonable to assume that the triangulation satisfies Γ_h = Γ, hence this condition obviously holds. It also holds if n = 1. From this assumption we know [19, section 5.2] that (34) holds, where |·| denotes the Lebesgue measure.

We also introduce a temporal grid 0 = t_0 < t_1 < ⋯ < t_{N_τ} = T, with I_k = (t_{k−1}, t_k), τ_k = t_k − t_{k−1}, and τ = max_{1≤k≤N_τ} τ_k. We assume that there exists ρ_T > 0 such that τ ≤ ρ_T τ_k for 1 ≤ k ≤ N_τ. We will use the notation σ = (h, τ) and
U_τ = {u_τ = Σ_{k=1}^{N_τ} u_k χ_k : u_k ∈ ℝ},
where χ_k denotes the characteristic function of the interval I_k. Let us observe that the elements u_τ ∈ U_τ are piecewise constant functions whose distributional derivative is given by
u′_τ = Σ_{k=1}^{N_τ−1} (u_{k+1} − u_k) δ_{t_k}, (35)
where δ_t denotes the Dirac measure concentrated at the point t. In particular, ‖u′_τ‖_{M(0,T)} = Σ_{k=1}^{N_τ−1} |u_{k+1} − u_k|. We further define the projection operator Λ_τ : BV(0, T) → U_τ by Λ_τ u = Σ_{k=1}^{N_τ} ((1/τ_k) ∫_{I_k} u dt) χ_k.

Proposition 16. For any u ∈ BV(0, T) the following properties hold: […]

Proof. The inequality (36) is simple to establish for u ∈ C¹[0, T]. Henceforth, let u ∈ BV(0, T). Then there exists a sequence {u_j}_{j=1}^∞ of smooth functions satisfying (39); see [1, Remark 3.22]. Now we estimate as follows: […]. Using (39) we can pass to the limit in the above inequality as j → ∞ to deduce (36).
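Formula (35) reduces the BV-seminorm of u_τ ∈ U_τ to the sum of the absolute jumps, ‖u′_τ‖_{M(0,T)} = Σ_k |u_{k+1} − u_k|. The sketch below computes it, and also averages a piecewise constant function given on a finer grid over the cells of a coarser grid, which is how we read Λ_τ (taking interval means is consistent with Remark 19, but the formula here is our assumption). In the example, averaging does not increase the seminorm. Coarse grids are described by index lists into the fine edges to avoid floating-point comparisons; all names are illustrative.

```python
# Illustrative sketch; Lambda_tau is assumed to act by interval averages.


def jumps(values):
    """Jumps u_{k+1} - u_k at the interior grid points t_k, cf. (35)."""
    return [right - left for left, right in zip(values[:-1], values[1:])]


def bv_seminorm(values):
    """||u_tau'||_{M(0,T)} for u_tau = sum_k values[k] * chi_k."""
    return sum(abs(jump) for jump in jumps(values))


def average_onto_coarse_grid(fine_values, fine_edges, coarse_index):
    """Mean value of a fine piecewise constant function on each coarse cell.

    The coarse edges are fine_edges[i] for i in coarse_index (starting at 0 and
    ending at len(fine_edges) - 1), so every coarse cell is a union of fine cells.
    """
    means = []
    for start, stop in zip(coarse_index[:-1], coarse_index[1:]):
        integral = sum(fine_values[i] * (fine_edges[i + 1] - fine_edges[i]) for i in range(start, stop))
        means.append(integral / (fine_edges[stop] - fine_edges[start]))
    return means


fine_edges = [0.0, 0.25, 0.5, 0.75, 1.0]
oscillating = [0.0, 1.0, 0.0, 1.0]
coarse = average_onto_coarse_grid(oscillating, fine_edges, [0, 2, 4])
print(coarse, bv_seminorm(oscillating), bv_seminorm(coarse))  # [0.5, 0.5] 3.0 0.0
```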
4.2. Discrete state equation. Associated with the interior nodes {x_j}_{j=1}^{N_h} of the triangulation we consider the space Y_h = span{e_1, …, e_{N_h}}, where {e_j}_{j=1}^{N_h} is the nodal basis formed by the continuous piecewise linear functions such that e_j(x_i) = δ_ij for every 1 ≤ i, j ≤ N_h. For every σ we define the space of discrete states by Y_σ = {y_σ = Σ_{k=1}^{N_τ} y_{k,h} χ_k : y_{k,h} ∈ Y_h}. We approximate the state equation (1) as follows. For any control u ∈ BV(0, T)^m we define the associated discrete state y_σ ∈ Y_σ as the solution of the system (41), where (·, ·) denotes the scalar product in L²(Ω), a is the bilinear form associated with the operator −Δ, i.e., a(y, z) = ∫_Ω ∇y · ∇z dx, and y_{0h} is the projection P_h y_0 of y_0 on Y_h given by the variational equation […]. It is well known that y_{0h} → y_0 in H¹_0(Ω).
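Assuming that (41) is the standard dG(0)cG(1) scheme, each time step with f ≡ 0 reduces to a backward Euler step in which the control enters only through the integrals ∫_{I_k} u_j dt: (M + τ_k A) y_k = M y_{k−1} + Σ_j (∫_{I_k} u_j dt) G_j, where M and A are the mass and stiffness matrices and (G_j)_i = (g_j, e_i). The sketch below implements this for Ω = (0, L) with a uniform mesh. The one-dimensional setting, the uniform mesh, and all function names are assumptions made for illustration; this is not the code behind the numerical results of section 7.

```python
import numpy as np

# Illustrative sketch: 1D, uniform mesh, f = 0; names are hypothetical.


def p1_matrices(n_interior, length=1.0):
    """Mass and stiffness matrices of continuous P1 elements on a uniform mesh of (0, length)."""
    h = length / (n_interior + 1)
    ones = np.ones(n_interior - 1)
    mass = h / 6.0 * (4.0 * np.eye(n_interior) + np.diag(ones, 1) + np.diag(ones, -1))
    stiffness = (2.0 * np.eye(n_interior) - np.diag(ones, 1) - np.diag(ones, -1)) / h
    return mass, stiffness, h


def dg0_states(mass, stiffness, loads, control_integrals, taus, y0):
    """Discrete states y_1, ..., y_N of the dG(0)cG(1) scheme for f = 0.

    loads:             (N_h, m) array with entries (g_j, e_i)
    control_integrals: (m, N_tau) array with entries int_{I_k} u_j dt
    taus:              time steps tau_k
    y0:                nodal values of y_{0h}
    """
    y = np.asarray(y0, dtype=float)
    states = []
    for k, tau in enumerate(taus):
        rhs = mass @ y + loads @ control_integrals[:, k]
        y = np.linalg.solve(mass + tau * stiffness, rhs)
        states.append(y)
    return np.array(states)


mass, stiffness, h = p1_matrices(20)
loads = h * np.ones((20, 1))              # g_1 = 1, and (1, e_i) = h for every hat function
taus = np.full(100, 0.05)                 # uniform steps on (0, 5)
control_integrals = taus[None, :] * 1.0   # u_1 = 1, so int_{I_k} u_1 dt = tau_k
ys = dg0_states(mass, stiffness, loads, control_integrals, taus, np.zeros(20))
steady = np.linalg.solve(stiffness, loads[:, 0])  # stationary problem A y = G u
print(np.max(np.abs(ys[-1] - steady)))    # tiny: the state has settled
```

Because only the interval integrals of u_j enter, replacing u_j by its interval means leaves the output unchanged, which is the mechanism behind Remark 19.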
Proposition 17. For every u ∈ BV(0, T)^m the system (41) has a unique solution y_σ ∈ Y_σ. In addition, if either f is affine with respect to the state or n < 3, then the estimate […] holds, where C is independent of σ.
Remark 18. These results are proved in [16] and [17] for affine and nonlinear f, respectively. The constant C there depends on the norm of the state in H^{2,1}(Q), and also on its L^∞(Q) norm in the semilinear case. In our case these quantities can be estimated by the L²(0, T)^m norm of u. During the preparation of this manuscript the following result was proved by Boris Vexler: assuming that τ ≤ C_0 h^θ for some C_0 > 0 and θ > 0, and y_0 ∈ H²(Ω) ∩ H¹_0(Ω), the estimate […] holds.
Remark 19. Given {u_j}_{j=1}^m ⊂ BV(0, T), we observe that ∫_{I_k} Λ_τ u_j dt = ∫_{I_k} u_j dt for 1 ≤ k ≤ N_τ and 1 ≤ j ≤ m. Using this in (41), we deduce that the discrete states associated with {u_j}_{j=1}^m and {Λ_τ u_j}_{j=1}^m coincide.

4.3. Discrete optimal control problem. The discrete control problem (P_σ) is defined as […], where y_σ is the discrete state associated with u = (u_j)_{j=1}^m. The following assumption will be used to analyze the existence and uniqueness of a solution of (P_σ):
(A) If a ∈ ℝ^m satisfies Σ_{j=1}^m a_j (g_j, z_h) = 0 for all z_h ∈ Y_h, then a = 0.

There exists h_0 > 0 such that (A) holds for every h < h_0.
Proof. Let us recall that {e_k}_{k=1}^{N_h} denotes the nodal basis of Y_h. Since the supports ω_j of the functions g_j are compact and pairwise disjoint, there exists ĥ > 0 such that for every h < ĥ the following holds: if supp(e_k) ∩ ω_j ≠ ∅ for some e_k and some 1 ≤ j ≤ m, then supp(e_k) ∩ ω_i = ∅ for every i ≠ j.
Moreover, there exists h̄ > 0 with the following property: for all h < h̄ and all j there exists some k such that (g_j, e_k) ≠ 0. Indeed, if this were not the case, there would exist some j and a sequence {h_i}_{i=1}^∞ decreasing to 0 such that (g_j, z_{h_i}) = 0 for every z_{h_i} ∈ Y_{h_i}. In particular, taking z_{h_i} equal to the L²(Ω)-projection of g_j on Y_{h_i}, we obtain 0 = (g_j, z_{h_i}) = ‖z_{h_i}‖²_{L²(Ω)} → ‖g_j‖²_{L²(Ω)}, which contradicts the assumption g_j ≠ 0 imposed for (P).
Finally, for any h < h_0 = min{ĥ, h̄} the assumption (A) holds. If not, there exists a vector a = (a_i)_{i=1}^m ≠ 0 such that Σ_{i=1}^m a_i (g_i, z_h) = 0 for all z_h ∈ Y_h. For any j we choose e_k ∈ Y_h such that (g_j, e_k) ≠ 0. Hence supp(e_k) ∩ ω_j ≠ ∅, and therefore supp(e_k) ∩ ω_i = ∅ for every i ≠ j. Then 0 = Σ_{i=1}^m a_i (g_i, e_k) = a_j (g_j, e_k), which implies that a_j = 0. Since j was arbitrary in {1, …, m}, we arrive at a contradiction.
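Read as the condition "Σ_j a_j (g_j, z_h) = 0 for all z_h ∈ Y_h implies a = 0", which is how assumption (A) is used in the proofs above, it says that the N_h × m matrix G with entries G_ij = (g_j, e_i) has full column rank m. The sketch below assembles G for Ω = (0, 1) with hat functions on a uniform mesh, approximating the integrals by a composite trapezoidal rule, and checks the rank. The domain, the quadrature, and the helper names are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch: 1D hat functions, trapezoidal quadrature; names are hypothetical.


def load_matrix(shape_functions, n_interior, length=1.0, n_quad=4000):
    """N_h x m matrix with entries (g_j, e_i), approximated by the trapezoidal rule."""
    h = length / (n_interior + 1)
    nodes = h * np.arange(1, n_interior + 1)
    x = np.linspace(0.0, length, n_quad + 1)
    weights = np.full(x.size, length / n_quad)
    weights[[0, -1]] *= 0.5
    # Row i holds the hat function e_i evaluated at the quadrature points.
    hats = np.maximum(0.0, 1.0 - np.abs(x[None, :] - nodes[:, None]) / h)
    return np.column_stack([hats @ (weights * g(x)) for g in shape_functions])


def assumption_a_holds(loads):
    """(A) holds iff loads @ a = 0 forces a = 0, i.e. the matrix has full column rank."""
    return np.linalg.matrix_rank(loads) == loads.shape[1]


def indicator(a, b):
    return lambda x: ((x >= a) & (x <= b)).astype(float)


g1, g2 = indicator(0.1, 0.3), indicator(0.6, 0.8)
print(assumption_a_holds(load_matrix([g1, g2], n_interior=9)))                     # True
print(assumption_a_holds(load_matrix([g1, lambda x: 2.0 * g1(x)], n_interior=9)))  # False
```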
Theorem 21. Let us assume that (A) holds. Then problem (P_σ) has at least one solution. Moreover, if ũ is a solution of (P_σ), then ū_τ = (Λ_τ ũ_j)_{j=1}^m is also a solution of (P_σ). In addition, if f is affine with respect to y, then ū_τ is the unique solution belonging to U^m_τ.

Proof. To establish the existence of a solution ũ we follow the lines of the proof of Theorem 7. The only concern is the boundedness of the sequence {a_k}_{k=1}^∞ in ℝ^m. For this purpose we consider the difference z_{σ,k} = y_{σ,k} − ŷ_{σ,k}, where y_{σ,k} and ŷ_{σ,k} are the solutions of (41) corresponding to u_k and û_k, respectively. Thus, z_{σ,k} is the solution of the system (43), where ξ_{i,h;k} = ŷ_{i,h;k} + θ_{i,h;k}(x, t) z_{i,h;k} with 0 ≤ θ_{i,h;k}(x, t) ≤ 1.
As in the proof of Theorem 7 we have that […] {ŷ_{σ,k}}_{k=1}^∞ and {y_{σ,k}}_{k=1}^∞ are bounded in L^∞(Q) as well. Therefore, the sequences {ξ_{i,h;k}}_{k=1}^∞ are also bounded in L^∞(Ω × I_i). Again we argue by contradiction: we assume that ρ_k = |a_k| → ∞ and set ζ_{σ,k} = z_{σ,k}/ρ_k and â_k = a_k/ρ_k. By taking a subsequence we have that ζ_{σ,k} → 0 in L^∞(Q) and â_{k,j} → â_j, 1 ≤ j ≤ m, for some {â_j}_{j=1}^m ⊂ ℝ. We observe that, by the definition of ρ_k, the vector â ≠ 0. Dividing (43) by ρ_k we obtain, for this subsequence, […]. Passing to the limit in this system as k → ∞, we infer that Σ_{j=1}^m â_j (g_j, z_h) = 0 for all z_h ∈ Y_h. Hence assumption (A) implies â = 0, which is the desired contradiction. Consequently, the sequence {a_k}_{k=1}^∞ is bounded, and the existence of a solution ũ follows by standard arguments.
The fact that ū_τ = (Λ_τ ũ_j)_{j=1}^m is also a solution of (P_σ) is an immediate consequence of Remark 19 and inequality (37). Finally, we prove the uniqueness of a solution in U^m_τ if f is affine with respect to the state. First, we observe that both terms in the cost functional are convex in this case. Moreover, the first term is strictly convex on U^m_τ provided that the affine mapping u_τ ↦ y_σ is injective. To verify this, we assume that for some u_τ = (u_j)_{j=1}^m ∈ U^m_τ, with u_j = Σ_{k=1}^{N_τ} u_{j,k} χ_k, the associated discrete state y_σ is identically zero. Then from (41) we have that […]. Again by assumption (A) we infer that u_j = 0 for every 1 ≤ j ≤ m, hence u_τ = 0.
Remark 22. In the case that β_j > 0 for all 1 ≤ j ≤ m, condition (A) is not needed to establish the existence of a solution of (P_σ). However, it is still necessary for the uniqueness in the case that f is affine with respect to y.
The rest of this section is devoted to the formulation of the first order optimality conditions for problem (P_σ). Arguing as for the continuous problem (P), we separate J_σ into its smooth part F_σ and its convex part, where y_σ is related to u by (41). The derivative of F_σ is expressed by […], where ϕ_σ ∈ Y_σ is the adjoint state associated with u, i.e., the solution of […]. Using this expression for F′_σ and arguing exactly as in the proof of Theorem 9, we obtain the first order optimality conditions for a local solution ū_τ ∈ BV(0, T)^m of (P_σ). For this purpose we introduce the functions Φ̄_{σ,j}, 1 ≤ j ≤ m, given by […], where φ̄_σ ∈ Y_σ is the adjoint state associated with ū_τ.
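The functions Φ̄_{σ,j} are continuous and piecewise linear in time and vanish at t = 0 and t = T (this is used in the proof of Corollary 24 below), so their extreme values are taken at interior grid points, and the grid points where |Φ̄_{σ,j}| reaches α_j can be read off from nodal values. The sketch below lists the interior indices k with |Φ̄_{σ,j}(t_k)| = α_j up to a tolerance, for sample nodal values. It does not compute Φ̄_{σ,j}, whose formula is not reproduced here; the function name and the tolerance are illustrative assumptions.

```python
# Illustrative sketch; phi_nodal holds sample values of a piecewise linear Phi at t_0, ..., t_N.


def candidate_jump_indices(phi_nodal, alpha, tol=1e-10):
    """Interior indices k (1 <= k <= N_tau - 1) with |Phi(t_k)| = alpha up to tol.

    phi_nodal[k] is the value at t_k of a continuous piecewise linear Phi with
    phi_nodal[0] = phi_nodal[-1] = 0, so its extreme values sit at grid points.
    """
    if phi_nodal[0] != 0.0 or phi_nodal[-1] != 0.0:
        raise ValueError("Phi is expected to vanish at t = 0 and t = T")
    return [k for k in range(1, len(phi_nodal) - 1) if abs(abs(phi_nodal[k]) - alpha) <= tol]


phi = [0.0, 0.5, 1.0, 0.3, -1.0, 0.0]
print(candidate_jump_indices(phi, alpha=1.0))  # [2, 4]
```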
Corollary 24. Let ū_τ = (ū_{τ,j})_{j=1}^m ∈ U^m_τ be a local solution of (P_σ). Then, for each j ∈ {1, …, m} such that ū_{τ,j} is not a constant function on [0, T], we have […], where ū′_{τ,j} = (ū′_{τ,j})^+ − (ū′_{τ,j})^− is the Jordan decomposition of the measure ū′_{τ,j}.

Proof. This result is a consequence of the representation formula (35) for ū′_τ. In addition, we use (1/α_j) Φ̄_{σ,j} ∈ ∂g(ū′_{τ,j}) along with Proposition 4, and the fact that ū′_{τ,j} ≠ 0 by assumption. Finally, we take into account that Φ̄_{σ,j} is piecewise linear and continuous, and Φ̄_{σ,j}(0) = Φ̄_{σ,j}(T) = 0. Consequently, its maximal and minimal values are attained at the interior grid points {t_k}_{k=1}^{N_τ−1}.

5. Convergence analysis. The goal of this section is to prove the convergence of solutions of (P_σ) to solutions of (P) as σ → 0. Additionally, we give some error estimates for the difference between the optimal discrete and continuous states.
Theorem 25. Let us assume that either f is affine with respect to y or β_j > 0 for every 1 ≤ j ≤ m, and let {ū_τ}_τ ⊂ BV(0, T)^m be a family of global solutions of the problems (P_σ), σ = (h, τ). Then this family is bounded in BV(0, T)^m. In addition, if f is affine or n < 3, then any weak* limit ū of a subsequence as σ → 0 is a global solution of (P). For such a subsequence we have […], where ȳ and ȳ_σ are the continuous and discrete states associated with ū and ū_τ, respectively.
For the proof we will use the following lemma.
Lemma 26. Let d_σ ∈ L²(Q) and take y_σ ∈ Y_σ to be the solution of (51). Then there exists a constant C_Ω > 0 depending only on Ω such that (52) holds.

Proof. The proof is standard, except for the nonlinear term. Choosing z_h = y_{k,h} in (51), we obtain […]. Using the monotonicity of f with respect to y we deduce […]. The rest of the proof can be completed as in the linear case.

Proof of Theorem 25. Let ŷ_τ be the discrete state associated with û_τ. The proof is divided into three steps.
The compactness of the embedding BV(0, T) ⊂ L^p(0, T) for every p ∈ [1, +∞) implies the strong convergence ū_τ → ū in L^p(0, T)^m. Let us denote by ȳ and ŷ_σ the continuous and discrete states corresponding to ū. From Proposition 17 we know that ŷ_σ → ȳ in L²(Q) as σ → 0. Subtracting the equations satisfied by ȳ_σ and ŷ_σ we obtain, for ζ_σ = ȳ_σ − ŷ_σ, […]. In the case of an affine function f we simply have ∂_y f(x, t, ξ_{k,h}) = c_0(x, t). Arguing as in Lemma 26 and using that ∂_y f ≥ 0, we get […]. Hence ȳ_σ = ŷ_σ + ζ_σ → ȳ in L²(Q). Now, the following relations hold: […]. As a consequence we have G(ū) = lim_{τ→0} G(ū_τ). Finally, taking into account that ‖ū′_j‖_{M(0,T)} ≤ liminf_{τ→0} ‖ū′_{τ,j}‖_{M(0,T)} for 1 ≤ j ≤ m, we deduce ‖ū′_{τ,j}‖_{M(0,T)} → ‖ū′_j‖_{M(0,T)} for 1 ≤ j ≤ m. This completes the proof.

The next theorem addresses the approximation of local solutions of (P) by local minima of (P_σ). It is in some sense a converse of the previous theorem.
We consider the problem (P_{σ,ρ}) […]. The existence of at least one solution ū_τ of (P_{σ,ρ}), σ = (h, τ), is obvious. Arguing as in the proof of the previous theorem, we deduce that {ū_τ}_τ has convergent subsequences and that each of their limits is a solution of the corresponding continuous problem (P_ρ). Since ū is the unique solution of (P_ρ), the entire family {ū_τ}_τ converges to ū in the sense of (49) and (50). Due to the convergence ‖ū − ū_τ‖_{L^p(0,T)^m} → 0, there exists σ_0 such that ū_τ ∈ B_ρ(ū) for every |σ| ≤ |σ_0|, and hence ū_τ is a local minimum of (P_σ) in the ball B_ρ(ū).
The rest of this section is devoted to the analysis of the rate of convergence of the states, ‖ȳ − ȳ_σ‖_{L²(Q)}. Let ū be a local solution of (P) such that the sufficient second order condition (SSOC) (31) holds. Theorem 14 implies that ū is a strict strong local solution, and hence a strict L^p(0, T)^m-local solution as well. Let ρ > 0 be such that ū is a global minimum of J in B̄_ρ(ū) ∩ BV(0, T)^m. Let {ū_τ}_τ be a family of global minima of J_σ on B̄_ρ(ū) ∩ BV(0, T)^m converging to ū in L^p(0, T)^m, for p > 1. Then we have the following rate of convergence of the associated states.
Theorem 28. Let us assume that ū satisfies (SSOC) and that either f is affine or n < 3. Then, under the above notation, there exists C > 0 independent of σ such that, for all σ sufficiently small, […].

Proof. Since ū_τ → ū in L^p(0, T)^m with p > 1, we have that ‖y_{ū_τ} − ȳ‖_{L^∞(Q)} → 0 as σ → 0, where y_{ū_τ} is the continuous state corresponding to ū_τ. Let ε > 0 be as introduced in Corollary 15. Then there exists σ_ε such that […], where […]. Let us estimate these terms. For the first term we use Proposition 17 as follows: […]. The third term is estimated in the same way, and for the second it is enough to observe […], the last inequality being a consequence of the fact that J_σ achieves its minimum value in the ball B̄_ρ(ū) ∩ BV(0, T)^m at ū_τ. Altogether this leads to […]. Finally, we obtain […], where we have used again Proposition 17.
Remark 29. In the case that f is nonlinear and n = 3, arguing as in the proof of the above theorem and using the inequality of Remark 18, we obtain the estimate […].

Remark 30. Under the assumptions of the above theorem, supposing in addition that y_d ∈ L²(0, T; L⁴(Ω)), and using (34) and Proposition 17, we can argue as in [4, Theorem 5.1] to deduce that |J(ū) − J_σ(ū_τ)| ≤ C(τ + h²). In the case of a nonlinear function f and n = 3, Remark 18 implies […].

6. Numerical solution. In this section we show how (P_σ) can be solved numerically. We take f ≡ 0 and y_0 ≡ 0 in (1), i.e., we consider the case of a linear state equation with zero initial state.

6.1. A fully discrete formulation. Defining y_{d,σ} as the L²(Q_h) projection of y_d onto Y_σ, problem (P_σ) can be equivalently expressed as […]. Therefore, Theorem 21 guarantees that we can find a solution of (P_σ) by solving the problem (Q_σ) […]. In the following we denote N_ρ = m N_τ and v_τ = (v_11, v_12, …, v_1N_τ, v_21, …, v_mN_τ)^T for every v_τ ∈ ℝ^{N_ρ}. Furthermore, let us set […]. Using that every u_τ ∈ U^m_τ can be represented by a coefficient vector û_τ ∈ ℝ^{N_ρ}, and defining d_τ ∈ ℝ^{N_ρ} by d_j1 = u_j1 and d_jk = u_jk − u_j(k−1) for 1 ≤ j ≤ m and 2 ≤ k ≤ N_τ, we infer from (35) that (Q_σ) is equivalent to the finite-dimensional optimization problem (Q_ρ) […], where S ∈ ℝ^{N_σ×N_ρ} is the discrete control-to-state mapping d ↦ y(d), and M_σ ∈ ℝ^{N_σ×N_σ} and Q ∈ ℝ^{N_ρ×N_ρ} are the matrix representations of the quadratic forms appearing in the first and last terms of (Q_σ). The precise form of these matrices can be found in the preprint of this paper.
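By (35), in the coordinates d_τ the BV-term becomes an ℓ¹ norm of the jumps, ‖u′_{τ,j}‖_{M(0,T)} = Σ_{k=2}^{N_τ} |d_jk|, while d_j1 = u_j1 is not penalized by it. The sketch below converts between u_τ and d_τ for the stacked ordering (v_11, …, v_1N_τ, v_21, …, v_mN_τ) used above and evaluates Σ_j α_j Σ_{k≥2} |d_jk|. The function names, and the assumption that the weights α_j multiply the seminorms as in (P), are ours.

```python
# Illustrative sketch; alpha_j-weighting of the seminorms is an assumption.


def to_jumps(u, m, n_tau):
    """d_j1 = u_j1 and d_jk = u_jk - u_j(k-1), blockwise for j = 1..m."""
    d = []
    for j in range(m):
        block = u[j * n_tau:(j + 1) * n_tau]
        d.append(block[0])
        d.extend(b - a for a, b in zip(block[:-1], block[1:]))
    return d


def from_jumps(d, m, n_tau):
    """Inverse map: u_jk = d_j1 + d_j2 + ... + d_jk (cumulative sums per block)."""
    u = []
    for j in range(m):
        total = 0.0
        for dk in d[j * n_tau:(j + 1) * n_tau]:
            total += dk
            u.append(total)
    return u


def weighted_bv_term(d, alphas, n_tau):
    """sum_j alpha_j * sum_{k >= 2} |d_jk|, i.e. sum_j alpha_j ||u'_{tau,j}||_M by (35)."""
    return sum(alpha * sum(abs(x) for x in d[j * n_tau + 1:(j + 1) * n_tau])
               for j, alpha in enumerate(alphas))


u = [1.0, 1.0, 3.0, 2.0, 0.0, -1.0, -1.0, -1.0]  # m = 2 amplitudes, N_tau = 4
d = to_jumps(u, m=2, n_tau=4)
print(d)                                   # [1.0, 0.0, 2.0, -1.0, 0.0, -1.0, 0.0, 0.0]
print(from_jumps(d, m=2, n_tau=4) == u)    # True
print(weighted_bv_term(d, [0.5, 2.0], 4))  # 3.5 (= 0.5 * 3 + 2.0 * 1)
```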

6.2. Discrete optimality conditions and regularization. Both the differentiable and the nondifferentiable part of $J_\rho$ are continuous, so we obtain from the sum rule that 0 ∈ … Here we have used that $M_\sigma$ and Q are symmetric. Thus, $\hat d^*_\tau$ is optimal for $(Q_\rho)$ if and only if there exists $\hat\lambda^*_\tau \in \mathbb R^{N_\rho}$ such that the conditions (60) hold. The sum rule and the chain rule (cf. [11, Chapter I, Proposition 5.7]) yield that $\partial\Psi(\hat d^*_\tau) \subset \mathbb R^{N_\rho}$ is given by …, where $\psi: \mathbb R \to \mathbb R$ denotes $\psi(x) = |x|$. We recognize in … the discrete version of $(\bar\Phi_j)_{j=1}^m$ (cf. (20)). This indicates that first-discretize-then-optimize and first-optimize-then-discretize coincide.

To enable the use of semismooth Newton methods we proceed in two steps. The first step is to apply a regularization to $(Q_\rho)$. More precisely, instead of $(Q_\rho)$ we consider for γ > 0 the problem $(Q_{\rho,\gamma})$ …, where $\Psi_\gamma$ is defined by … for $1 \le j \le m$ and $2 \le k \le N_\tau$. We notice that $(Q_{\rho,\gamma})$ can be interpreted as the discrete counterpart of min … For this problem $u_j' \in L^1(0,T)$, hence $\|u_j'\|_{L^1(0,T)} = \|u_j'\|_{M(0,T)}$. Therefore the problem can be regarded as a regularized version of (P).
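This section does not reproduce the definition of $\Psi_\gamma$. As a purely hypothetical stand-in, the sketch below uses the Huber smoothing of ψ(x) = |x|, which is a standard choice for this kind of regularization; the paper's $\psi_\gamma$ may differ. Three properties matter here:

- The Huber function stays within γ/2 of |x|.
- Its derivative x/max(γ, |x|) is piecewise linear, hence semismooth.
- That derivative takes values in [-1, 1].

The sketch also contains the membership test λ ∈ ∂ψ(x) behind the conditions (60): ∂ψ(x) = {sign x} for x ≠ 0, and ∂ψ(0) = [-1, 1].

```python
import numpy as np


def psi_gamma(x, gamma):
    """Huber smoothing of |x| (hypothetical choice of psi_gamma)."""
    x = np.asarray(x, dtype=float)
    ax = np.abs(x)
    return np.where(ax >= gamma, ax - 0.5 * gamma, 0.5 * x**2 / gamma)


def dpsi_gamma(x, gamma):
    """Derivative x / max(gamma, |x|): piecewise linear with values in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return x / np.maximum(gamma, np.abs(x))


def in_subdifferential_abs(lam, x, tol=1e-10):
    """Componentwise test lam in d|x|: lam = sign(x) if x != 0, |lam| <= 1 if x == 0."""
    lam, x = np.asarray(lam, dtype=float), np.asarray(x, dtype=float)
    ok_nonzero = np.abs(lam - np.sign(x)) <= tol
    ok_zero = np.abs(lam) <= 1.0 + tol
    return np.where(x != 0.0, ok_nonzero, ok_zero)


x = np.linspace(-2.0, 2.0, 401)
for gamma in (1.0, 1e-1, 1e-3):
    gap = np.max(np.abs(x) - psi_gamma(x, gamma))
    print(gamma, gap)    # the gap never exceeds gamma / 2 (up to rounding)
```

As γ → 0, `dpsi_gamma` tends to sign(x) away from the origin. This is the sense in which the regularized optimality system approximates (60).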
Arguing as above, we obtain that $(Q_{\rho,\gamma})$ has the optimality conditions (60), but with $\partial\Psi$ replaced by $\partial\Psi_\gamma$. In addition, $\partial\Psi_\gamma$ has the same structure as $\partial\Psi$, but with $\partial\psi$ in the component jk replaced by $\partial\psi^k_\gamma$, where … Therefore, the optimality conditions of $(Q_{\rho,\gamma})$ can be recast as … Here we have employed the definition $(\hat\lambda^\alpha_\tau)_{jk} = \alpha_j\lambda_{jk}$ for $1 \le j \le m$ and $1 \le k \le N_\tau$. We have also used, for $1 \le j \le m$, the mappings $F_{\gamma,j}: \mathbb R^{N_\rho}\times\mathbb R^{N_\rho} \to \mathbb R^{N_\tau}$ given by … Since $F_\gamma$ is semismooth, we can apply a semismooth Newton method to solve $F_\gamma = 0$.
6.3. Path-following algorithm. Since we have approximated $(Q_\rho)$ by $(Q_{\rho,\gamma})$, we consider a path-following algorithm that drives γ to zero. We refer to it as Algorithm BV. In this algorithm we use the definition v …

Algorithm BV: Path-following method to solve $(Q_\rho)$.
Several variants of this algorithm are conceivable. For instance, a damping strategy could be included, $\mathrm{TOL}_F$ could depend on $\gamma_k$, and ν could vary with k.
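The steps of Algorithm BV are not reproduced in this section. Below is a minimal NumPy sketch of the same idea, a semismooth Newton method wrapped in a path-following loop for γ, on a small dense model problem. It rests on several assumptions:

- There is a single control (m = 1), and the smooth part is written as $\tfrac12\hat d^T A\hat d - b^T\hat d$. For $\beta_j = 0$ one may think of $A = S^T M_\sigma S$ and $b = S^T M_\sigma\hat y_{d,\sigma}$.
- $\psi_\gamma$ is the hypothetical Huber smoothing.
- A dense direct solve replaces GMRES.
- Damping is an Armijo line search on the objective, one of the variants listed above.

The Newton step uses the generalized derivative of x ↦ x/max(γ, |x|). The outer loop applies γ ← νγ and warm-starts each solve at the previous solution. All function names are hypothetical.

```python
import numpy as np


def huber(x, gamma):
    # Hypothetical smoothing psi_gamma of |x| (Huber function).
    ax = np.abs(x)
    return np.where(ax >= gamma, ax - 0.5 * gamma, 0.5 * x * x / gamma)


def newton_fixed_gamma(A, b, alpha, gamma, d, tol_F=1e-10, max_iter=200):
    """Damped semismooth Newton for F(d) = A d - b + alpha * h(d) = 0, where
    h_k = d_k / max(gamma, |d_k|) for k >= 1 and h_0 = 0 (unpenalized initial value)."""
    pen = np.arange(d.size) >= 1

    def objective(x):
        return 0.5 * x @ A @ x - b @ x + alpha * huber(x[pen], gamma).sum()

    def residual(x):
        h = np.zeros_like(x)
        h[pen] = x[pen] / np.maximum(gamma, np.abs(x[pen]))
        return A @ x - b + alpha * h

    for _ in range(max_iter):
        F = residual(d)
        if np.max(np.abs(F)) <= tol_F:
            break
        # Generalized Jacobian: d/dx [x / max(gamma, |x|)] is 1/gamma if |x| < gamma, else 0.
        D = np.where(pen & (np.abs(d) < gamma), alpha / gamma, 0.0)
        delta = np.linalg.solve(A + np.diag(D), -F)
        slope = F @ delta            # negative, since A + diag(D) is SPD
        if slope >= 0.0:
            break
        f0, t = objective(d), 1.0
        # Damping: Armijo backtracking on the objective; a tiny slack absorbs round-off.
        while t > 1e-18 and (objective(d + t * delta)
                             > f0 + 1e-4 * t * slope + 1e-12 * (1.0 + abs(f0))):
            t *= 0.5
        d = d + t * delta
    return d


def path_following(A, b, alpha, gamma0=1.0, nu=0.1, tol_gamma=1e-8):
    """Drive gamma to zero, warm-starting each Newton solve at the previous solution."""
    d = np.zeros_like(b, dtype=float)      # d^0 = 0
    gamma = gamma0
    while True:
        d = newton_fixed_gamma(A, b, alpha, gamma, d)
        if gamma <= tol_gamma:
            return d
        gamma *= nu
```

For A = I the problem decouples. As γ → 0, the solution tends to componentwise soft thresholding of b, except for the unpenalized first entry, which gives a simple check. A dense solve is only adequate at toy sizes. In the PDE setting each product with S or $S^T$ costs a PDE solve, which is why an iterative solver such as GMRES is used instead. The toy tolerances here are milder than the values $\mathrm{TOL}_F = 10^{-12}$ and $\mathrm{TOL}_\gamma = 10^{-14}$ reported for the actual experiments.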

7. Numerical examples.
We illustrate our findings with three examples. Our main goal is to exemplify the structure of optimal controls for (P). Throughout, we treat the case where f ≡ 0, $\beta_j = 0$ for all j, and $y_0 \equiv 0$. In particular, (P) is convex, and Theorem 7 yields the existence of a unique global optimal solution.
In all examples we consider controls defined on (0, T) = (0, 2) and employ uniformly spaced temporal and spatial grids. We found the following to be reliable choices in Algorithm BV: $\gamma_0 = 1$, $\mathrm{TOL}_F = 10^{-12}$, $\mathrm{TOL}_\gamma = 10^{-14}$, and ν = 0.1 for the majority of examples (ν = 0.5 for some). We use $\hat d^0_\tau = 0$. We take $\hat\lambda^0_\tau$ such that $(\hat d^0_\tau, \hat\lambda^0_\tau)$ satisfies the condition $S^T M_\sigma(S\hat d_\tau - \hat y_{d,\sigma}) + \hat\lambda^\alpha_\tau = 0$ in the optimality system $F_\gamma = 0$. When $\gamma_k$ reaches $\mathrm{TOL}_\gamma$, the inner while loop in Algorithm BV is executed until the stopping criteria … $\le \mathrm{TOL}_F$ are satisfied for three consecutive i. We use GMRES to solve the nonsymmetric linear system (61) to a relative accuracy of $10^{-12}$.
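The initialization condition can be solved for the starting multiplier explicitly. The following is a direct computation from the stated condition with $\hat d^0_\tau = 0$, using $(\hat\lambda^{0,\alpha}_\tau)_{jk} = \alpha_j\lambda^0_{jk}$:

$$S^T M_\sigma\big(S\hat d^0_\tau - \hat y_{d,\sigma}\big) + \hat\lambda^{0,\alpha}_\tau = -S^T M_\sigma\hat y_{d,\sigma} + \hat\lambda^{0,\alpha}_\tau = 0 \iff \lambda^0_{jk} = \frac{1}{\alpha_j}\big(S^T M_\sigma\hat y_{d,\sigma}\big)_{jk},\qquad 1\le j\le m,\ 1\le k\le N_\tau.$$

So the starting multiplier requires a single application of $S^T$ and no application of S.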
Due to the presence of S and $S^T$ in (61), each GMRES iteration requires the solution of two PDEs. These PDE solves are performed to a relative accuracy of $10^{-12}$ using preconditioned GMRES.
7.1. Example 1: One control and one spatial dimension. We start with an example in which m = 1, Ω = (−1, 1), and ω = (0, 1). The remaining specifications are chosen such that an exact analytic solution $\bar u$ of (P) is known. The optimal control $\bar u$ exhibits $l \in \mathbb N$ jumps and is constant apart from these jumps. Consider the problem $\min_{u\in BV(0,T)}$ …, where $y_u$ is the solution to the parabolic state equation … We take g ≡ 1 in ω and g ≡ 0 elsewhere, i.e., $g = \chi_\omega$. Let κ > 0, $l \in \mathbb N$, and … In particular, this implies $\bar u$ = …

To conclude that $\bar u$ is the optimal solution of the above problem, we check whether $\bar u$ satisfies the necessary optimality conditions of Theorem 9. Since the problem is convex, this is already sufficient for global optimality. Alternatively, the optimality of $\bar u$ can be established using the conditions from Theorem 14, in particular the condition (SSOC).

Considering the first-order conditions from Theorem 9, we first note that the adjoint equation $L^*\varphi_{\bar u} = y_{\bar u} - y_d$, together with boundary conditions, is satisfied by construction. Second, we confirm that …, which establishes (21) and (22). Thus, $\bar u$ is optimal. In view of Corollary 10 we note …, where the inclusion is an equality if and only if all $c_k$ are positive. Since we have … and we easily compute …, the optimal value is given by …

For the numerical experiments we choose l = 5, κ = 0.01, $c_1 = c_3 = c_5 = 2$, and $c_2 = c_4 = 1$. This yields $\bar\alpha = 1/(125\pi^2) \approx 8.1\cdot 10^{-4}$ and $J(\bar u) \approx 1.9\cdot 10^{-2}$. Furthermore, $\bar u$ then exhibits five jumps, which occur exactly at those t where $\bar\Phi(t) = \bar\alpha$.
Unless indicated otherwise, we employ $N_t = 2560$ and $N_h = 255$, which corresponds to τ = 1/1280 and h = 1/128. Algorithm BV yields $\bar y_\sigma$, $\bar u_\tau$, and the optimal dual variable $\bar\lambda_\tau$. The latter can be interpreted as a discretization of $\bar\lambda = \frac{1}{\bar\alpha}\bar\Phi = \frac12(1 - \cos(5\pi t))$. Linear interpolations of these quantities are depicted together with $y_{d,\sigma}$ in Figure 1. We observe that $\bar u_\tau$ and $\bar\lambda_\tau$ closely resemble their continuous counterparts $\bar u$ and $\bar\lambda$. In particular, $\bar u_\tau$ clearly displays the five distinct jumps of $\bar u$.
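The statement that $\bar u$ jumps exactly where $\bar\Phi = \bar\alpha$ can be made concrete. Since $\bar\lambda = \bar\Phi/\bar\alpha = \frac12(1 - \cos(l\pi t))$, these are the points where cos(lπt) = −1, i.e. t = (2i+1)/l in (0, T). The sketch below lists them and evaluates the value of $\bar\alpha$ given in the text. The function names are illustrative only.

```python
import numpy as np

T = 2.0
alpha_bar = 1.0 / (125.0 * np.pi**2)   # alpha-bar for l = 5, kappa = 0.01 and the c_k above


def lam_bar(t, l):
    """Continuous dual variable lambda-bar(t) = (1 - cos(l*pi*t)) / 2."""
    return 0.5 * (1.0 - np.cos(l * np.pi * np.asarray(t, dtype=float)))


def jump_times(l, T=T):
    """Points in (0, T) where lambda-bar = 1, i.e. t = (2i + 1) / l."""
    i = np.arange(int(np.ceil(l * T)))
    t = (2 * i + 1) / l
    return t[t < T]


print(alpha_bar)          # approximately 8.1e-4
print(jump_times(5))      # five points: 0.2, 0.6, 1.0, 1.4, 1.8
```

With l replaced by 3, the same formula gives the three isolated maxima of $\bar\lambda_j = \frac12(1 - \cos(3\pi t))$ in Example 2. There, however, each $\bar u_j$ jumps at only one of these candidate points.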
To assess the discretization errors we apply Algorithm BV on different grids, each of which satisfies $N_\tau = 10((N_h + 1)/16)^2$. We use $N_h + 1 = 2^j$ with $4 \le j \le 8$. The resulting errors $\|\bar y - \bar y_\sigma\|_{L^2(Q)}$ and $|J(\bar u) - J_\sigma(\bar u_\tau)|$ are plotted in Figure 2, together with the error $\|\bar y - y_\sigma(\bar u)\|_{L^2(Q)}$.

To evaluate $\|\bar y - \bar y_\sigma\|_{L^2(Q)}$ and $\|\bar y - y_\sigma(\bar u)\|_{L^2(Q)}$ we require $\bar y$. Since $\bar y$ is not known explicitly, we compute $y_\sigma(\bar u)$ on a very fine grid and use it as a replacement. This grid is described by $N_h + 1 = 2^9$ and, as before, $N_\tau = 10((N_h + 1)/16)^2$, which gives $N_\tau = 10240$ and $N_h = 511$. The large number of time steps is a consequence of our choice $\tau = \tau(h) = O(h^2)$. We make this choice because the error estimates in Theorem 28 and Remark 30 predict convergence orders $O(\sqrt{\tau} + h)$ and $O(\tau + h^2)$, respectively.

For the error $\|\bar y - \bar y_\sigma\|_{L^2(Q)}$ we observe quadratic convergence in Figure 2, which is better than the result of Theorem 28. This agrees to some extent with previous contributions on optimal control with measures (cf. [3, 4, 15, 18]), where it is also observed that this error decays faster than linearly. The error $\|\bar y - y_\sigma(\bar u)\|_{L^2(Q)}$ converges quadratically, in accordance with Proposition 17. The optimal objective value appears to converge at a cubic rate. This is faster than we would expect from Remark 30.

Next, we investigate the influence of α on solutions of (P). For this purpose we continue to work with l = 5, κ = 0.01, $c_1 = c_3 = c_5 = 2$, and $c_2 = c_4 = 1$. In particular, we keep the corresponding $y_d$. However, instead of $\bar\alpha = 1/(125\pi^2)$ we use $\alpha_\theta = \theta\bar\alpha$ in the objective. We stress that for θ ≠ 1 we do not know the exact solution of (P). From $L^*\bar\varphi = \kappa\big(\frac{\pi^2}{4}\sin(l\pi t)\cos(\frac{\pi}{2}x) - l\pi\cos(l\pi t)\cos(\frac{\pi}{2}x)\big)$ and the definition of $y_d$, it follows that $y_d$ does not satisfy the initial condition $y(x,0) \equiv 0$ of the state equation. This implies $\bar y \ne y_d$ regardless of the value of θ.

Figures 3, 4, and 5 show $\bar y_\sigma = \bar y^\theta_\sigma$, $\bar u_\tau = \bar u^\theta_\tau$, and $\bar\lambda_\tau = \bar\lambda^\theta_\tau$ for different values of θ. We observe that $\bar u^\theta_\tau$ is constant for θ = 100. Although not depicted, this is also true for every θ > 100 that we tested. Hence, in accordance with Remark 11, the optimal control is constant for sufficiently large values of α. As θ decreases, the number of jumps of $\bar u^\theta_\tau$ increases. For θ < 1, jumps with negative height occur. From approximately θ = 0.1 on, the sets $\mathrm{supp}((\bar u^\theta_\tau)')$ and $\{t \in (0,T) : \bar\lambda^\theta_\tau(t) = \pm 1\}$ have positive measure. As θ decreases further, these measures increase further.
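The grid rule $N_\tau = 10((N_h+1)/16)^2$ couples the step sizes so that τ is proportional to h². In that regime $O(\sqrt\tau + h)$ reduces to O(h) and $O(\tau + h^2)$ to O(h²). The sketch below reproduces the grid parameters and computes experimental orders of convergence (EOC) from a sequence of errors. It assumes a uniform spatial mesh of (−1, 1) with $N_h$ interior nodes, so h = 2/(N_h+1); this matches $N_h = 255 \leftrightarrow h = 1/128$ in the text. The error values used in the check are synthetic, not data from the paper.

```python
import numpy as np

T = 2.0  # time horizon


def grid(j):
    """N_h + 1 = 2**j intervals on (-1, 1) and N_tau = 10((N_h + 1)/16)**2 steps on (0, T)."""
    n_int = 2 ** j
    n_tau = 10 * (n_int // 16) ** 2
    return n_int - 1, n_tau, 2.0 / n_int, T / n_tau   # N_h, N_tau, h, tau


def eoc(errors, hs):
    """Experimental orders of convergence log(e_i / e_{i+1}) / log(h_i / h_{i+1})."""
    e, h = np.asarray(errors, dtype=float), np.asarray(hs, dtype=float)
    return np.log(e[:-1] / e[1:]) / np.log(h[:-1] / h[1:])


for j in range(4, 10):
    n_h, n_tau, h, tau = grid(j)
    print(j, n_h, n_tau, h, tau, tau / h**2)   # tau / h^2 = 12.8 (up to rounding)
```

For instance, errors that shrink by a factor of 4 each time h is halved give an EOC of 2. This is how quadratic convergence shows up in a log-log plot.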
To compare (P) with the classical $L^2$-regularized tracking problem, we now replace $\alpha_\theta\|u'\|_{M(0,T)}$ in the objective by $\frac{\alpha_\theta}{2}\|u\|^2_{L^2(0,T)}$. The discretization of $\frac{\alpha_\theta}{2}\|u\|^2_{L^2(0,T)}$ is given by $\frac{\alpha_\theta}{2}\hat d_\tau^T Q^T Q\,\hat d_\tau$ with $Q \in \mathbb R^{N_\tau\times N_\tau}$. The precise form of Q can be found in the preprint of this paper.

Figure 6 depicts the optimal controls $\bar u^\theta_{\tau,L^2}$ that we obtain for $\alpha_\theta = \theta\bar\alpha$ and various values of θ. Figure 7 shows the corresponding tracking errors together with those for (P). It also displays the norms of the controls as they appear in the objective. The data point for the norm of the BV-control at θ = 100 is missing because that control is constant, so its BV-seminorm equals zero. We observe that the tracking errors of both control problems have a similar order of magnitude. From a practical point of view, however, the controls of (P) have a simpler structure. In particular, for θ ≈ 5 the tracking errors are approximately equal in the $L^2$ and BV-seminorm cases. The BV-control, however, is cheaper and reproduces four jumps, whereas the $L^2$-control has a complicated structure.
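The matrix Q is only specified in the preprint. For piecewise constant controls on a uniform grid, one admissible choice follows from two facts: the coefficients are partial sums of the jumps, u = C d̂ with C the lower triangular matrix of ones, and $\|u\|^2_{L^2(0,T)} = \tau\sum_k u_k^2$. Together they give Q = √τ C. This choice is an assumption made for illustration and is not necessarily the paper's Q. The sketch computes both control costs for the same control.

```python
import numpy as np


def l2_factor(n_tau, tau):
    """One admissible Q with ||u||_{L^2(0,T)}^2 = d^T Q^T Q d for piecewise constant u
    on a uniform grid (an assumption, not necessarily the Q of the paper)."""
    C = np.tril(np.ones((n_tau, n_tau)))   # u = C d: coefficients are partial sums of jumps
    return np.sqrt(tau) * C


tau = 0.5
u = np.array([1.0, 1.0, 3.0, 3.0, 2.0])          # values on 5 intervals of length tau
d = np.concatenate(([u[0]], np.diff(u)))          # jump coordinates
Q = l2_factor(u.size, tau)
l2_sq = d @ Q.T @ Q @ d                           # tau * sum(u**2) = 12 (up to rounding)
bv = np.abs(d[1:]).sum()                          # BV-seminorm of u: 3
print(l2_sq, bv)
```

The two costs react differently to the structure of u. The BV term counts only the jumps and vanishes for a constant control, whereas the $L^2$ term charges the magnitude of u on every interval.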
7.2. Example 2: Three controls and one spatial dimension. The second example generalizes the first one by allowing for $m \in \mathbb N$ controls rather than only one. Moreover, we demonstrate that even in the absence of strict complementarity, Algorithm BV yields optimal controls that retain the simple structure of their continuous counterparts. In this example we have Ω = (−1, 1), and …

The following construction ensures that for every j the optimal control $\bar u_j$ has exactly $0 \le l_j \le m$ jumps and is constant apart from these jumps. We consider …, where $y_u$ denotes the solution to (62), but with $ug$ replaced by $\sum_{j=1}^m u_j g_j$. We take $g_j = \chi_{\omega_j}$ for all j. Let κ > 0 and $c_{jk} \ge 0$ for $1 \le j, k \le m$. Define …, where $L = \frac{\partial}{\partial t} - \Delta$ and $\bar y = y_{\bar u}$. Observing that $\bar\Phi_j(t) = \frac{\alpha_j}{2}(1 - \cos(m\pi t))$ for all j, we readily confirm the optimality of $\bar u = (\bar u_j)_{j=1}^m$ in the same manner as in the first example.

The numerical results that follow are obtained by choosing m = 3, …, $\omega_3 = (\frac12, 1)$, $\kappa = 10^{-2}$, $c_{11} = 5$, $c_{22} = 3$, $c_{33} = 1$, and all other $c_{jk}$ equal to zero. This implies that $\bar u_1$, $\bar u_2$, and $\bar u_3$ each have exactly one jump. These choices are specifically made to study the numerical behavior in situations where the inclusion $\mathrm{supp}((\bar u_j')^+) \subset \{t \in [0,T] : \bar\Phi_j(t) = \alpha_j\}$ is strict, that is, where strict complementarity does not hold. As in Example 1, we use $y_\sigma(\bar u)$ as a replacement for $\bar y$. We apply Algorithm BV with $N_t = 6144$ and $N_h = 255$, which corresponds to τ = 1/3072 and h = 1/128.

Figure 8 displays $y_{d,\sigma}$, $\bar y_\sigma$, $(\bar u_{\tau,j})_j$, and $(\bar\lambda_{\tau,j})_j$. The dual variables $(\bar\lambda_{\tau,j})_j$ closely resemble their continuous counterparts $(\bar\lambda_j)_j = (\frac{1}{\alpha_j}\bar\Phi_j)_j = \frac12(1 - \cos(3\pi t))$. In particular, each of them has three isolated maxima with value approximately 1. The approximate optimal controls $(\bar u_{\tau,j})_j$ appear to be very similar to the continuous optimal controls $(\bar u_j)_j$. Each of them exhibits exactly one jump and thus reproduces the simple structure of its continuous analogue very well. Summarizing, we conclude from this example and other experiments that Algorithm BV handles the case of strict inclusion $\mathrm{supp}(\bar u_j') \subsetneq \{t \in [0,T] : \bar\Phi_j(t) = \pm\alpha_j\}$ very well.

7.3. Example 3: One control and two spatial dimensions. The first two examples are structurally similar to each other. In particular, in both examples the desired states $y_d$ have rather low temporal regularity. In contrast, the third example is constructed in such a way that $y_d$ is $C^\infty$ with respect to time and space. Moreover, the spatial domain Ω is two-dimensional in this example. In this entirely different setup we again observe that the optimal control has a very simple structure.

We choose m = 1, $\Omega = (-1,1)^2$, $\omega = (0,1)^2$, and consider the same objective function and state equation as in the first example, except that Ω and ω are different. We take $g = \chi_\omega$, $y_d(x_1,x_2,t) = (x_1 - 1.2)(x_1 + 1)(x_2 + 1)(x_2 - 0.9)\,te^{-t}$, and $\bar\alpha = 10^{-3}$. This choice of $y_d$ yields $\bar y \ne y_d$, since $y_d$ does not satisfy the boundary conditions of the state equation. We apply Algorithm BV with $N_t = 512$ and $N_h = 63^2$, which corresponds to τ = 1/256 and $h = (2 - \sqrt2)/64$.

Figure 9 shows $y_{d,\sigma}$ and $\bar y_\sigma$ at different points in time. It also depicts $\bar u_\tau = \bar u_{\tau,BV}$ and $\bar\lambda_\tau$, as well as the optimal control $\bar u_{\tau,L^2}$ obtained through classical $L^2$-regularization (analogously to Example 1). It seems that in this example the set $\{t \in [0,T] : \bar\Phi(t) = \pm\bar\alpha\}$ does not consist of finitely many points but has positive measure. However, the structure of $\bar u$ is still very simple; in particular, $\bar u$ is constant on large parts of its domain.
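The claim that $y_d$ in Example 3 cannot coincide with the optimal state can be checked by evaluating $y_d$ on the boundary of $(-1,1)^2$. The check assumes homogeneous Dirichlet boundary conditions; the state equation itself is not reproduced in this section. Under that assumption, $y_d$ vanishes on the edges $x_1 = -1$ and $x_2 = -1$ and at t = 0, which is consistent with $y_0 \equiv 0$. It does not vanish on $x_1 = 1$ or $x_2 = 1$.

```python
import numpy as np


def y_d(x1, x2, t):
    """Desired state of Example 3."""
    return (x1 - 1.2) * (x1 + 1.0) * (x2 + 1.0) * (x2 - 0.9) * t * np.exp(-t)


s = np.linspace(-1.0, 1.0, 201)      # sample points along one edge of (-1, 1)^2
t = np.linspace(0.0, 2.0, 101)       # sample times in [0, T]
S, Tt = np.meshgrid(s, t)

# Maximum of |y_d| on each boundary edge over all sampled times.
edges = {
    "x1 = -1": np.max(np.abs(y_d(-1.0, S, Tt))),
    "x1 = +1": np.max(np.abs(y_d(1.0, S, Tt))),
    "x2 = -1": np.max(np.abs(y_d(S, -1.0, Tt))),
    "x2 = +1": np.max(np.abs(y_d(S, 1.0, Tt))),
}
initial = np.max(np.abs(y_d(*np.meshgrid(s, s), 0.0)))   # y_d at t = 0
print(edges, initial)
```

The nonzero values on the edges $x_1 = 1$ and $x_2 = 1$ show that $y_d$ violates the boundary condition, so $\bar y \ne y_d$ for every choice of the control.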
While the tracking errors associated with the controls in Figure 9 are comparable, the structure of the BV-control is simpler than that of the $L^2$-control. For the control terms in the objectives we have $\bar\alpha\|(\bar u_{\tau,BV})'\|_{M(0,T)} \approx 4\cdot 10^{-4}$ and $\frac{\bar\alpha}{2}\|\bar u_{\tau,L^2}\|^2_{L^2(0,T)} \approx 1\cdot 10^{-2}$.

8. Conclusions.
In this paper we gave a rather complete analysis of optimal control problems governed by semilinear parabolic equations for the case where the temporal control cost is realized in the BV-seminorm. This leads to optimal controls that are piecewise constant in time. This simple structure of the optimal controls, which is confirmed analytically and numerically, is desirable from a practical point of view. It is distinctly different from the structure of optimal controls arising from quadratic control-cost functionals. The obtained results can be extended in several directions. For instance, it would be interesting to consider controls that are BV functions in space and time, or to use BV functionals in the context of switching controls.

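4.1. Discretization of the controls. Associated with the grid $\{t_k\}_{k=0}^{N_\tau}$ we define the subspace … The elements $y_\sigma \in Y_\sigma$ can be represented in the form
$$y_\sigma = \sum_{k=1}^{N_\tau} y_{k,h}\,\chi_k = \sum_{k=1}^{N_\tau}\sum_{j=1}^{N_h} y_{kj}\,\chi_k e_j \qquad (40)$$
with $\{y_{k,h}\}_{k=1}^{N_\tau} \subset Y_h$ and $\{y_{kj}\}_{1\le k\le N_\tau,\,1\le j\le N_h} \subset \mathbb R$.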
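The representation (40) separates time and space. In time, $y_\sigma$ is piecewise constant, since $\chi_k$ is the characteristic function of the k-th time interval; in space, it is a finite element function. The sketch below evaluates $y_\sigma(x,t)$ from the coefficients $y_{kj}$. It makes two hypothetical choices that this excerpt does not fix: a one-dimensional domain (a, b) with piecewise linear hat functions $e_j$ at interior nodes that vanish on the boundary, and $\chi_k$ taken as the indicator of $(t_{k-1}, t_k]$.

```python
import numpy as np


def eval_y_sigma(Y, t_grid, x_nodes, a, b, x, t):
    """Evaluate y_sigma(x, t) = sum_k sum_j Y[k-1, j-1] chi_k(t) e_j(x).

    Y       : array (N_tau, N_h) of coefficients y_kj
    t_grid  : increasing array t_0 < t_1 < ... < t_{N_tau}
    x_nodes : interior nodes of a 1D mesh of (a, b) carrying P1 hat functions e_j
    """
    # chi_k is the indicator of (t_{k-1}, t_k]; t = t_0 is assigned to the first interval.
    k = np.searchsorted(t_grid, t, side="left")
    k = min(max(k, 1), len(t_grid) - 1)
    # The hat functions vanish at x = a and x = b, so pad the nodal values with zeros.
    nodes = np.concatenate(([a], x_nodes, [b]))
    values = np.concatenate(([0.0], Y[k - 1], [0.0]))
    return np.interp(x, nodes, values)


Y = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
t_grid = np.array([0.0, 0.5, 1.0])
x_nodes = np.array([-0.5, 0.0, 0.5])
print(eval_y_sigma(Y, t_grid, x_nodes, -1.0, 1.0, 0.25, 0.75))   # 5.5
```

In this example, t = 0.75 lies in the second time interval, so the second row of coefficients is interpolated linearly in space between the nodes 0 and 0.5.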

