Joint Scheduling and Coding For Low In-Order Delivery Delay Over Lossy Paths With Delayed Feedback

We consider the transmission of packets across a lossy end-to-end network path so as to achieve low in-order delivery delay. This can be formulated as a decision problem, namely deciding whether the next packet to send should be an information packet or a coded packet. Importantly, this decision is made based on delayed feedback from the receiver. While an exact solution to this decision problem is challenging, we exploit ideas from queueing theory to derive scheduling policies based on prediction of a receiver queue length that, while suboptimal, can be efficiently implemented and offer substantially better performance than state of the art approaches. We obtain a number of useful analytic bounds that help characterise design trade-offs and our analysis highlights that the use of prediction plays a key role in achieving good performance in the presence of significant feedback delay. Our approach readily generalises to networks of paths and we illustrate this by application to multipath transport scheduler design.


I. INTRODUCTION
In this paper we revisit the transmission of packets across a lossy end-to-end network path so as to achieve low in-order delivery delay. Consideration of end-to-end packet transmission is motivated by improving operation at the transport layer, and with this in mind we also assume the availability of feedback from client to server. This feedback is delayed by the path propagation delay and, in contrast to the link layer, this feedback delay may be substantial. For example, on a 50 Mbps path with 25 ms RTT there are around 100 packets in flight, so the server only learns of the fate of a packet after a further 100 packets have been sent. In other words, the server has to make predictive decisions about what to transmit in those 100 packets, in particular whether they are information or redundant/coded packets. Information theory tells us that we do not need to make use of feedback in order to be capacity achieving on a packet erasure channel. However, it also tells us that feedback can be used to reduce in-order delivery delay, possibly very considerably [1]. More generally, there is a trade-off between rate and delay; feedback can be used to modify this trade-off, and it is this which is of interest.
While much attention in 5G has been focused on the physical and link layers, it is increasingly being realised that a wider redesign of network protocols is also needed in order to meet 5G requirements. Transport protocols are of particular relevance for end-to-end performance, including end-to-end latency. For example, ETSI have recently set up a working group to study next generation protocols for 5G [2]. The requirement for major upgrades to current transport protocols is also reflected in initiatives such as Google QUIC [3] and the Open Fast Path Alliance [4] as well as by recent work such as [5]. In part, this reflects the fact that low delay is already coming to the fore in network services. For example, Amazon estimates that a 100ms increase in delay reduces its revenue by 1% [6], Google measured a 0.74% drop in web searches when delay was artificially increased by 400ms [7] while Bing saw a 1.2% reduction in per-user revenue when the service delay was increased by 500ms [8]. But the requirement for low latency also reflects the needs of next generation applications, such as augmented reality and the tactile Internet.
As we will describe in more detail shortly, by use of modern low-delay streaming code constructions, the task at the transport layer can be formulated as one of deciding whether the next packet to send should be an information packet or a coded packet, with this decision being made based on stale/delayed feedback from the receiver. The use of feedback in ARQ has of course been well studied, but primarily in the case of instantaneous feedback, i.e., where there is no delay in the server receiving the feedback. When feedback is delayed the problem becomes significantly more challenging, and it has received almost no attention in the literature (notable exceptions include [9], [10], [11]). While the decision task can be formulated as a dynamic programming problem, the complexity grows combinatorially with the feedback delay and so quickly becomes unmanageable for even quite small delays.
In particular, such solutions are unsuited to the real-time decision-making required within next generation networks.
In this paper we take a different approach and make use of a helpful connection between coding and queuing theory. We use this connection to derive scheduling policies based on the prediction of the receiver queue length that, while suboptimal, can be efficiently implemented and offer substantially better performance than state of the art solutions. This approach also allows us to obtain a number of useful analytic bounds that help characterise design trade-offs. Our analysis highlights that the use of prediction plays a key role in achieving good performance in the presence of significant feedback delay, and that it is prediction errors that drive the rate-delay trade-off. To the best of our knowledge this work is the first to make use of prediction with delayed feedback. Although our main focus is on single paths, our approach readily generalises to networks of paths and we illustrate this by application to multipath transport scheduler design.

II. RELATED WORK
The literature contains several different proposals for coding schemes that make use of feedback. For instance, Sundararajan et al. introduce in [12] a new linear coding scheme that includes feedback. They exploit it so that the encoder learns which packets have been "seen" by the receivers, thus speeding up the decoding process. A similar approach, considering wireless multicast communications, is described in [13], which proposes a joint coding/feedback scheme that is scalable with respect to the number of receivers. The authors of [14] propose an extension of LT and Raptor codes that adds information feedback, with the objective of reducing the coding overhead. Hagedorn et al. present in [15] a generalized LT coding scheme that relies on feedback information. Other interesting approaches include Hybrid ARQ [16], which combines a forward error correction scheme with automatic repeat-request. A recent work that promotes the use of Hybrid ARQ for low latency and ultra reliable applications is, for example, that of Cabrera et al. [17].
However, most of the existing literature does not consider the impact of feedback delay. When feedback is not delayed, it is well known that ARQ is optimal both in terms of capacity and delay [9]. However, when feedback is delayed the situation changes fundamentally, and the end-to-end delay with ARQ can greatly increase. The use of coding schemes can reduce this end-to-end delay, even when the feedback delay is not small [9]. The importance of considering feedback is also addressed in [10], where the authors study how the performance of block coding varies with and without feedback, especially when considering the impact of delayed feedback.
The analysis of coding schemes with delayed feedback remains largely open. In [11] the authors study the throughput and end-to-end delay of a variable-length block coding scheme, focusing on regimes where the feedback delay is shorter than the minimum block size. In addition, the authors focus on saturated network conditions, where the sender has an unlimited number of packets waiting to be sent.

Fig. 1: Example of two codes with different throughput-delay characteristics. Shaded squares indicate coded packets, unshaded indicate information packets.

A. Low Delay Streaming Codes
We model an end-to-end network path as a packet erasure channel (packets carry a unique sequence number and a checksum, thus losses can be detected). Most previous work on packet erasure channels has been based on the use of block codes, whereby the sequence of information packets to be transmitted is partitioned into blocks of size k and n − k coded packets are appended to these to create a block of n information plus coded packets, which implies a code with rate k/n, see Fig. 1a. As already noted, the requirement for low latency in next generation networks has led to renewed interest in whether alternative code constructions can yield a more favourable trade-off between throughput and in-order delivery delay. To see that this may indeed be the case, consider a rate k/n systematic block code and suppose that the code is an ideal one in the sense that receipt of any k of the n packets allows all of the k information packets to be reconstructed. Furthermore, assume that the first information packet is lost. All remaining information packets have to be buffered until the first coded packet is received. At this point, the first information packet can be reconstructed and all of the information packets can be delivered in-order. The in-order delivery delay is therefore proportional to k. Alternatively, suppose that the n − k coded packets are distributed uniformly among the information packets, rather than all being placed after the k information packets, see Fig. 1b. To keep the code causal, suppose that each coded packet only protects the preceding information packets in the block. Assume again that the first information packet is lost. This loss can now be recovered on receipt of the first coded packet, resulting in a delay that is proportional to k/(n − k) (i.e., much lower than k when n is large).
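To make the k versus k/(n − k) comparison concrete, the following sketch (our own illustration with hypothetical helper names, not code from [1]) computes the repair delay of the first information packet under the two layouts of Fig. 1, assuming an ideal code in which each received coded packet repairs one preceding erasure:

```python
import math

def block_schedule(k, n):
    """Systematic block code: k information packets, then n - k coded."""
    return ["I"] * k + ["C"] * (n - k)

def interleaved_schedule(k, n):
    """Same rate k/n, but the n - k coded packets are spread uniformly:
    one coded packet after every ~k/(n-k) information packets."""
    spacing = math.ceil(k / (n - k))
    sched, sent = [], 0
    while sent < k:
        run = min(spacing, k - sent)
        sched.extend(["I"] * run)
        sent += run
        sched.append("C")
    return sched

def repair_delay(schedule):
    """Slots until the first coded packet arrives when the first
    information packet is erased (an ideal causal code repairs one
    preceding erasure per received coded packet)."""
    return schedule.index("C") + 1
```

For k = 90, n = 100, `repair_delay(block_schedule(90, 100))` gives 91 slots versus 10 for the interleaved layout, matching the k versus k/(n − k) scaling.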
With the aim of obtaining an improved trade-off between rate and delay, [1] recently proposed an alternative code construction for packet erasure channels, referred to as a streaming code (a form of convolutional code). The code is constructed by interleaving information packets u_j, j = 1, 2, . . ., with coded packets c_i, i = 1, 2, . . .. One coded packet is inserted after every l − 1 information packets and transmitted over the network path, resulting in a code of rate (l − 1)/l. Fig. 2 illustrates this code construction. Coded packet c_i can only recover an erasure of packets already transmitted, and it is generated by taking random linear combinations of the previously transmitted information packets within the coding window {u_L, . . . , u_{(l−1)i}}, where L denotes the first packet protected; the coding window can be reduced by setting L according to the last packet acknowledged by the receiver. With the left-hand edge of the coding window equal to 1 (L = 1), a coded packet is generated by:

c_i = Σ_{j=1}^{(l−1)i} w_{ij} u_j   (1)

where each information packet u_j is treated as a vector in F_Q, each coefficient w_{ij} ∈ F_Q is chosen i.i.d. uniformly at random, and F_Q is an appropriate choice of finite field, for instance GF(2^8).

Fig. 3: Example generator matrix for the low delay code with sliding window, showing the coefficients used to produce each packet. In this example, we assume that the transmitter has obtained knowledge from the receiver by time 10 indicating that it has successfully received/decoded packets u_1 and u_2, allowing it to adjust the left-hand edge of the coding window to exclude them from packet c_2. Image adapted from [1].
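The encoding step (1) can be sketched as follows. This is our own illustration: for brevity it works over the prime field GF(257) rather than GF(2^8) (whose arithmetic needs polynomial/log tables), and packet payloads are short symbol vectors:

```python
import random

Q = 257  # prime field used as a stand-in; the paper suggests e.g. GF(2^8),
         # whose arithmetic needs lookup tables -- a prime field keeps this
         # sketch short while preserving the linear-algebraic idea

def coded_packet(info, L, i, l, rng):
    """Build c_i as a random linear combination of the information
    packets in the coding window {u_L, ..., u_{(l-1)i}} (1-indexed),
    in the spirit of (1). Returns the coefficients and the payload."""
    window = info[L - 1:(l - 1) * i]
    coeffs = [rng.randrange(1, Q) for _ in window]
    payload = [0] * len(info[0])
    for w, u in zip(coeffs, window):
        for t, sym in enumerate(u):
            payload[t] = (payload[t] + w * sym) % Q
    return coeffs, payload

rng = random.Random(0)
u = [[1, 2], [3, 4], [5, 6]]  # three 2-symbol information packets
coeffs, c1 = coded_packet(u, L=1, i=1, l=4, rng=rng)  # window {u_1,...,u_3}
```

Sliding the left-hand edge forward on feedback simply means calling `coded_packet` with a larger `L`, shrinking the window.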
Note that in practice the left-hand edge L of the coding window can be made larger than 1. In particular, suppose that the receiver has received or decoded all information packets up to and including packet u_j. Feedback can be used to communicate this to the transmitter, allowing it to use L = j + 1 for all subsequent coded packets. The generator matrix shown in Fig. 3 illustrates this sliding window approach, where the columns indicate the information packets that need to be sent and the rows indicate the composition of the packet transmitted at any given time.
The receiver decodes on-the-fly once enough packets/degrees of freedom have been received. In more detail, the receiver maintains a generator matrix G_t at time t, which is similar to that shown in Fig. 3 except that it is composed only of the coefficients obtained from received packets. If G_t is full rank, Gaussian elimination is used to recover from any packet erasures that may have occurred during transit. We will make the standing assumption that the field size Q is sufficiently large that, with probability approaching one, each coded packet helps the receiver recover from one information packet erasure, i.e., each coded packet row added to the generator matrix G_t increases the rank of G_t by one.
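The on-the-fly decoding step can be sketched in the same spirit. The sketch below is our own: each packet carries a single field symbol for brevity, and a prime field again stands in for GF(2^8). It runs Gaussian elimination on the receiver's matrix and reports rank deficiency when more degrees of freedom are needed:

```python
def gf_inv(a, q=257):
    """Multiplicative inverse in GF(q) via Fermat's little theorem."""
    return pow(a, q - 2, q)

def decode(rows, rhs, q=257):
    """Attempt to recover the information symbols from received packets.
    rows: one coefficient row per received packet (a row of the
    receiver's generator matrix G_t); rhs: the symbol carried by that
    packet. Returns the solution if G_t is full rank, else None."""
    m, n = len(rows), len(rows[0])
    A = [row[:] + [b % q] for row, b in zip(rows, rhs)]  # augmented matrix
    for r in range(n):
        piv = next((i for i in range(r, m) if A[i][r] % q != 0), None)
        if piv is None:
            return None  # rank deficient: wait for more packets
        A[r], A[piv] = A[piv], A[r]
        inv = gf_inv(A[r][r], q)
        A[r] = [(v * inv) % q for v in A[r]]
        for i in range(m):
            if i != r and A[i][r] % q != 0:
                f = A[i][r]
                A[i] = [(v - f * w) % q for v, w in zip(A[i], A[r])]
    return [A[r][n] for r in range(n)]
```

For example, if u_1 = 5 is erased but u_2 = 9 and a coded packet with coefficients (3, 4) arrive, `decode([[0, 1], [3, 4]], [9, 51])` recovers [5, 9].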
In summary, this streaming code construction generates coded packets that are (i) individually streamed between information packets (rather than being transmitted in groups of size n − k packets) and (ii) each coded packet protects all preceding information packets (rather than just the information packets within its block). See [1] for a detailed analysis of the throughput and delay performance of this code, but for a given code rate it is easy to see that this code construction tends to decrease the overall in-order delivery delay at the receiver compared to a block code, as illustrated in the example above.

B. Decision Problem
Our interest in the above streaming code construction is twofold. Firstly, for a given coding rate it offers, under a wide range of conditions, lower in-order delivery delay compared to standard block codes [1]. Thus it provides a useful starting point for developing methods for low delay transmission across lossy network paths. Secondly, it lends itself to being embedded within a clean decision problem, namely one where, rather than transmitting coded packets periodically according to a predetermined schedule, at each transmission opportunity the transmitter dynamically decides whether to send an information packet or a coded packet based on feedback from the receiver.^3
Formally, assume a time-slotted system where each slot corresponds to transmission of a packet. We have an arrival process consisting of a sequence of information packets {A_k, k = 1, 2, . . .}, where A_k ∈ {0, 1} is the number of new information packets in slot k, and define ā := lim_{k→∞} (1/k) Σ_{i=1}^k A_i as the average arrival rate. These information packets are buffered at the transmitter and then sent across a lossy path to a receiver. The queue occupancy Q^t_k at the transmitter^4 in slot k behaves according to:

Q^t_{k+1} = Q^t_k + A_k − S_k

^3 Use of block codes leads to a significantly more complex decision problem. To see this, observe that losing more than n − k packets within a block requires transmission of additional coded packets from that block in order to avoid a decoding failure. These are then received interleaved with later blocks. Thus we lose the renewal structure of open-loop block code constructions, and the decision-maker needs to (i) keep track of multiple generations of interleaved blocks, each perhaps of a different size, and (ii) decide from which block to send a coded packet as well as deciding whether to send an information or a coded packet.

^4 Note that packets dequeued from Q^t are held in an encoding buffer at the transmitter until the receiver has signalled that they have been successfully received, so that the left-hand edge L in (1) can be updated; see the earlier discussion.

Fig. 4: Schematic of the decision problem setup. Packets arrive at Tx with mean rate ā, are transmitted from Tx to Rx and may be erased with probability p. Rx informs Tx of its state via feedback, which is delayed by d slots.
where S_k ∈ {0, 1} is the number of information packets transmitted in slot k and Q^t_1 = 0. We let s̄ := lim_{k→∞} (1/k) Σ_{i=1}^k S_i denote the average rate at which information packets are transmitted. Define a random variable X_k, which takes value 1 when a packet transmitted in slot k is erased and 0 otherwise. We will assume the sequence of random variables {X_k} is i.i.d., X_k ∼ X with Prob(X = 1) = p, and that when p = 0 then X_k = 0 for all k (so that as k → ∞ the occurrence of a non-zero but finite number of losses is excluded).
Received packets are buffered at the receiver until they can be delivered in-order to an application, i.e., when an information packet is erased then subsequently arriving information packets are buffered until the lost packet can be recovered. A coded packet sent in slot k is built as a random linear combination of all information packets sent before slot k. In each slot k the receiver also sends feedback to the transmitter, informing it of the packets received as of slot k. This feedback arrives at the transmitter after delay d, in slot k + d. It is assumed, for simplicity, that none of these feedback packets are lost. Fig. 4 illustrates this problem setup. In each slot k the transmitter has the choice of (i) doing nothing, (ii) sending the information packet at the head of the transmitter queue, or (iii) sending a coded packet. Our task is to solve the transmitter decision problem while satisfying a number of constraints: both the transmitter and receiver queues are stabilized, the link capacity is respected, and the buffering delay at the receiver is kept small.

A. Introduction
When the feedback delay is zero the decision problem in Fig. 4 is akin to ARQ, which has of course been well studied and for which fairly complete results are known. However, situations where the feedback delay is non-zero have received far less attention in the literature. In part this is because most work has focussed on the link layer, where feedback delays are low, and because it is well known that open-loop block codes (which do not use feedback) are capacity achieving. And in part it is because of the complexity of the decision problem with delayed feedback, which grows combinatorially with the feedback delay. As already noted, next generation transport protocols seek to achieve low delay transmission over end-to-end paths. This means that they are required to operate with significant delays before feedback is received. This, together with our observation in Section III-B that the low delay streaming code construction in Section III-A lends itself to the use of feedback to make more refined decisions as to when to send coded packets, motivates revisiting the analysis and design of schedulers using delayed feedback.
A basic difficulty is that the complexity of deciding on an optimal packet schedule grows exponentially with the feedback delay. This means that optimal decision-making quickly becomes unmanageable for real-time operation. Ad hoc heuristic approaches are of course possible, but they typically remain difficult to analyze and come with few performance guarantees. To make progress we make use of the observation that the decoding process at the receiver can be modelled using a queueing approach. Namely, information packets arriving at the receiver are delivered in-order to an application until an information packet is lost, at which point subsequent information packets are buffered until the lost packet can be recovered. Each arriving coded packet can repair the loss of any one preceding information packet, with decoding taking place once the number of received coded packets matches the number of erased information packets. We thus define a virtual queue at the receiver, with occupancy Q^r_k, which behaves according to:

Q^r_{k+1} = max{Q^r_k + X_k S_k − (1 − X_k) C_k, 0}

where X_k = 1 when packet k is erased (lost) and 0 otherwise, S_k = 1 when an information packet is sent in slot k, C_k = 1 if a coded packet is sent, and C_k = S_k = 0 when no transmission is made. The queue occupancy Q^r_k increases whenever an information packet is erased and decreases when a coded packet is successfully received. Decoding events occur at slots k where Q^r_k = 0. While low queue occupancy is, by itself, no guarantee of low decoding delay, in practice it tends to encourage frequent emptying of the virtual queue and so short decoding delay.
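The virtual queue dynamics just described can be sketched in a few lines (our own helper names; the update is our reading of the recursion above):

```python
def qr_next(qr, x, s, c):
    """One-slot update of the receiver virtual queue Q^r: an erased
    information packet (x = 1, s = 1) adds one pending loss, a
    successfully received coded packet (x = 0, c = 1) repairs one,
    and the queue never goes negative. A decoding event occurs
    whenever the queue returns to zero."""
    return max(qr + x * s - (1 - x) * c, 0)

# lose an info packet, then repair it with one coded packet
q = qr_next(0, x=1, s=1, c=0)   # one pending loss
q = qr_next(q, x=0, s=0, c=1)   # back to zero: decoding event
```

Note that an erased coded packet (x = 1, c = 1) leaves the queue unchanged, matching the dynamics described in the text.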
Intuitively, the length of this virtual queue is correlated with the in-order delivery delay at the receiver -as Q r k grows the number of information packets buffered at the receiver will also tend to grow. The relationship is not one to one, and we explore it further in the next section, but as we will see it is sufficient to form the basis of simple yet effective scheduling policies. Importantly, by taking this approach we are able to obtain bounds on delay and rate which can be used for analysis and design.

B. Relating Delay and Queue Occupancy
We proceed by considering in more detail the relationship between the end-to-end in-order delivery delay, the transmitter queue occupancy Q^t_k and the receiver virtual queue occupancy Q^r_k. First, observe that the end-to-end delay can be divided into: (i) the time between a packet being enqueued at the sender and being first transmitted, D_qt, and (ii) the time between being first transmitted and being successfully delivered to the application layer, D_qr. We expect that D_qt is related to Q^t_k and D_qr to Q^r_k, and indeed this can be seen in Fig. 5. This figure plots the per-packet averages of D_qt and D_qr, over 100 repetitions of the experiment, against the queue occupancies Q^t_k and Q^r_k for a path with erasure rate p = 0.2 and with coded packets sent periodically every p/(1 − p) information packets. Also indicated is the 95% confidence interval. The strong correlation between delay and queue occupancy is clearly evident. Further, it can be seen that the impact on delay of the receiver queue occupancy Q^r_k is much larger than that of the transmitter queue occupancy Q^t_k. This is perhaps to be expected, since a loss causes all subsequent information packets to be delayed at the receiver until the loss is repaired and decoding takes place (Q^r_k becomes zero), hence amplifying the effect of a non-zero queue occupancy Q^r_k on delay. Although the data in Fig. 5 is for a particular choice of loss and arrival rates, it is representative of the behaviour seen for other choices.

Fig. 5: Impact of queue lengths Q^t_k and Q^r_k on the average delay at the transmitter, D_qt, and at the receiver, D_qr. In this experiment, the erasure rate is p = 0.2 and the arrival rate is ā = 0.7.

C. Transmission Policies
Based on the insight provided by the above analysis we consider the following class of transmission policies:

C_k = 1 if F(Q^r_{k−d}, Q̂^r_k, Q^t_k) > 0, and C_k = 0 otherwise,

where the function F(·) is a design parameter, which we will discuss in more detail shortly. Observe that the selection of C_k uses only information available at the sender at time k. Since S_k = min{Q^t_k + A_k, 1 − C_k}, an information packet is transmitted when (i) 1 − C_k = 1, and (ii) the transmission queue contains a packet to be sent. Furthermore, Q^r_{k−d} is only available at the sender after feedback delay d. We will focus on the estimator

Q̂^r_k = θ̂(Q^r_{k−d}) := max{Q^r_{k−d} + Σ_{j=k−d}^{k−1} (p S_j − (1 − p) C_j), 0},   (7)

which simplifies to θ̂(Q^r_{k−d}) = Q^r_k when the feedback delay d = 0. This estimator makes a d-step ahead prediction of the value of Q^r_k based on Q^r_{k−d} and the average path loss rate p. We will consider the impact of the accuracy of the estimator's predictions in more detail shortly. Other choices of estimator are of course possible, but (7) has the virtues of simplicity and tractability.

This class of transmission policies includes ARQ and open-loop FEC as special cases. Namely, when F(Q^r_{k−d}, Q̂^r_k, Q^t_k) = Q^r_k and d = 0, then C_k = 1 when Q^r_k > 0, i.e., a coded packet is sent whenever the receiver reassembly queue is non-empty. Since for the code construction considered this coded packet will actually be an information packet, we have ARQ. Similarly, as d → ∞ the estimator comes to depend only on the average loss rate p and we recover the open-loop FEC in [1], whereby a coded packet is sent every p/(1 − p) information packets.

Recall from Section IV-B that the delay is much more strongly affected by the receiver queue occupancy Q^r than by the transmitter queue occupancy Q^t. With this in mind, Fig. 6 compares the end-to-end system delay for different transmission policies, taking ARQ as a baseline and considering policies in which a configuration parameter ρ modulates the weight given to the transmission queue length. As can be seen, the more weight that is given to Q^t (higher ρ), the longer the end-to-end system delay.
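The estimator can be sketched by rolling the queue recursion forward over the d unacknowledged slots with each unknown erasure indicator X_j replaced by its mean p — our reading of (7), with hypothetical helper names:

```python
def qr_hat(qr_delayed, sent_coded, p):
    """d-step-ahead prediction of Q^r from the d-slot-old observation
    Q^r_{k-d}: roll the queue recursion forward over the d
    unacknowledged slots with each erasure indicator X_j replaced by
    its mean p. sent_coded is the list of (S_j, C_j) pairs for those
    slots, which the sender knows exactly."""
    est = float(qr_delayed)
    for s, c in sent_coded:
        est = max(est + p * s - (1 - p) * c, 0.0)
    return est

# with d = 0 (no slots in flight) the estimate is just Q^r_k itself
assert qr_hat(3, [], p=0.2) == 3.0
```

Each in-flight information packet raises the estimate by p (its erasure probability) and each in-flight coded packet lowers it by 1 − p (its delivery probability).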
This suggests that we should favour policies P such that:

P: F(Q^r_{k−d}, Q̂^r_k, Q^t_k) = Q̂^r_k − γ,

where γ ≥ 0 is a design parameter. Observe that this class of policies corresponds to a threshold rule, namely C_k = 1 when Q̂^r_k − γ > 0 and C_k = 0 otherwise. As noted above, when d = 0 and γ = 1 this transmission policy reduces to ARQ, while as d → ∞ it reduces to open-loop FEC. That is, in these two boundary cases this transmission policy reverts to the state of the art.
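Putting the pieces together, the threshold rule can be exercised in a Monte-Carlo sketch (our own simulation scaffold, using the estimator form assumed above; parameter values are illustrative only):

```python
import random

def simulate_policy(p=0.2, d=20, gamma=1, a=0.7, T=20000, seed=1):
    """Monte-Carlo sketch of threshold policy P: send a coded packet
    when the predicted receiver queue exceeds gamma, otherwise send a
    queued information packet. Returns the peak receiver queue and the
    fraction of slots carrying redundant coded packets."""
    rng = random.Random(seed)
    qt, qr = 0, 0
    qr_log, sc_log = [], []          # per-slot Q^r and (S_k, C_k) history
    redundant = 0
    for k in range(T):
        if rng.random() < a:          # arrival A_k in {0, 1}
            qt += 1
        est = 0.0
        if k >= d:                    # d-step-ahead prediction from Q^r_{k-d}
            est = float(qr_log[k - d])
            for s, c in sc_log[k - d:]:
                est = max(est + p * s - (1 - p) * c, 0.0)
        c_k = 1 if est - gamma > 0 else 0
        s_k = 1 if (c_k == 0 and qt > 0) else 0
        qr_log.append(qr)
        sc_log.append((s_k, c_k))
        if c_k == 1 and qr == 0:
            redundant += 1            # coded packet with nothing to repair
        x = 1 if rng.random() < p else 0
        qr = max(qr + x * s_k - (1 - x) * c_k, 0)
        qt -= s_k
    return max(qr_log), redundant / T

peak, r_red = simulate_policy()
```

In runs of this sketch the receiver queue stays small (consistent with the bound derived below) and only a small fraction of slots carry redundant coded packets.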

D. Estimator Accuracy
Before proceeding to analyse transmission policy P we first derive some bounds on the accuracy of estimator (7) that will prove useful later.
The following lemma is a restatement of [18, Proposition 3.1.2].

Lemma 1. For estimator (7), |Q^r_k − Q̂^r_k| ≤ |Σ_{j=k−d}^{k−1} (X_j − p)|.

Hence,

|Q^r_k − Q̂^r_k| ≤ d · max{p, 1 − p}.   (12)

We can obtain sharper bounds on Σ_{j=k−d}^{k−1} (X_j − p) by taking more advantage of the fact that X_j is a random variable. For example, when losses are i.i.d. then the {X_j} are also i.i.d. and we can apply Hoeffding's inequality [19] for Bernoulli random variables to obtain

Prob(|Σ_{j=k−d}^{k−1} (X_j − p)| ≥ ǫ) ≤ 2e^{−2ǫ²/d}

where ǫ > 0. It follows immediately that Prob(|Q^r_k − Q̂^r_k| ≥ δ) ≤ 2e^{−2(δ/(2dp))² d} and so

|Q^r_k − Q̂^r_k| ≤ 2dp √(ln(2/(1 − q)) / (2d))   (14)

with probability at least q. The bound (14) is generally substantially sharper than bound (12), as can be seen in Fig. 7.
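A quick numerical comparison of the two accuracy bounds, under our reconstruction of (12) and (14) above (the function names are our own):

```python
import math

def worst_case_bound(d, p):
    """Deterministic bound (12) as reconstructed above: the prediction
    error over d in-flight slots is at most d * max(p, 1 - p)."""
    return d * max(p, 1 - p)

def hoeffding_bound(d, p, q=0.9):
    """Probabilistic bound (14) as reconstructed above: holds with
    probability at least q."""
    return 2 * d * p * math.sqrt(math.log(2 / (1 - q)) / (2 * d))
```

For p = 0.6 and d = 100 the worst-case bound gives 60 while the Hoeffding-based bound gives roughly 15, consistent with the gap shown in Fig. 7.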

E. Bounding Virtual Receiver Queue
Armed with these bounds on estimator accuracy we are now in a position to bound the receiver queue occupancy (recall that we have already seen that the end-to-end delay mostly depends on the receiver queue occupancy). The following establishes that, for the class of policies P with estimator (7), the receiver queue occupancy remains bounded.

Fig. 7: Comparing the bounds on |Q^r_k − Q̂^r_k| obtained using worst-case analysis (12) and using Hoeffding's inequality (14) (with q = 0.9). Loss rate p = 0.6.

Theorem 1. Consider transmission policy P using estimator (7). Suppose estimate Q̂^r_k satisfies |Q^r_k − Q̂^r_k| ≤ δ, k = 1, 2, . . .. When 0 < p < 1 then Q^r_k converges almost surely to the interval [0, γ + 1 + δ].

Proof: The estimate evolves according to

Q̂^r_{k+1} = Q̂^r_k + p S_k − (1 − p) C_k + (X_{k−d} − p)(S_{k−d} + C_{k−d})   (16)

when Q^r_{k−d} ≥ 1. Conversely, when Q^r_{k−d} < 1 then Q^r_{k−d} = 0 since it is non-negative and integer valued, and

Q̂^r_{k+1} = Q̂^r_k + p S_k − (1 − p) C_k + (X_{k−d} − p) S_{k−d} + (1 − p) C_{k−d}.   (17)
We proceed by considering the following two cases.

Case (i): −γ + Q̂^r_k ≥ 1. Since −γ + Q̂^r_k > 0 then C_k = 1, S_k = 0. When Q^r_{k−d} ≥ 1 then (16) applies and, since C_k = 1 and S_k = 0, we have Q̂^r_{k+1} = Q̂^r_k + ∆¹_k with ∆¹_k := −(1 − p) + (X_{k−d} − p)(S_{k−d} + C_{k−d}). When Q^r_{k−d} = 0 then (17) applies and Q̂^r_{k+1} = Q̂^r_k + ∆²_k with ∆²_k := −(1 − p) + (X_{k−d} − p) S_{k−d} + (1 − p) C_{k−d}. Observe that ∆¹_k is strictly less than zero when X_{k−d} = 0, and ∆²_k is strictly less than zero when X_{k−d} = 0 and C_{k−d} = 0. By assumption 0 < p = Prob(X_k = 1) < 1 and the X_{k−d}, ∀k, are independent of Q̂^r_k. Hence, if −γ + Q̂^r_k ≥ 1 persists then, with probability one, a slot will occur where X_{k−d} = 0 and so ∆¹_k < 0. Further, when −γ + Q̂^r_k ≥ 1 and Q^r_{k−d} < 1 then it follows that C_j = 0 for at least ⌈d(1 − p)⌉ of the slots in the sum. Therefore, regardless of (S_{k−d} + C_{k−d}), with positive probability over any d slots a slot will occur where X_{k−d} = 0, C_{k−d} = 0 and ∆²_k < 0.

Case (ii): −γ + Q̂^r_k ≤ 1. We now have two subcases to consider: (a) When 0 ≤ −γ + Q̂^r_k ≤ 1, then C_k = 1 and S_k = 0. By update (16), Q̂^r_{k+1} ≤ Q̂^r_k and, therefore, −γ + Q̂^r_{k+1} ≤ 1. (b) When −γ + Q̂^r_k ≤ 0 then S_k ∈ {0, 1} and C_k = 0. By update (16), Q̂^r_{k+1} ≤ Q̂^r_k + 1 and so −γ + Q̂^r_{k+1} ≤ 1.

We have that −γ + Q̂^r_k never increases and strictly decreases with positive probability when −γ + Q̂^r_k > 1, and when −γ + Q̂^r_k ≤ 1 then −γ + Q̂^r_{k+1} never goes above 1. Hence, we can conclude that Q̂^r_k converges almost surely and that it is indeed upper bounded via −γ + Q̂^r_{k+1} ≤ 1, i.e., Q̂^r_{k+1} ≤ γ + 1. Since |Q^r_k − Q̂^r_k| ≤ δ it follows that Q^r_k ≤ Q̂^r_k + δ ≤ γ + 1 + δ, and the stated interval now follows from the fact that Q^r_k ≥ 0.
Importantly, observe that the bound in Theorem 1 is in terms of the instantaneous queue length Q^r_k and applies to every sample path. It is therefore much stronger than a bound on the average queue length. One immediate consequence, for example, is that the requirement that the estimator is accurate in the sense that |Q^r_k − Q̂^r_k| ≤ δ can be relaxed to one that holds only with a given probability q. The bound in Theorem 1 then applies to those sample paths for which the estimator is sufficiently accurate, i.e., it also holds with probability q. For example, using this observation we can immediately use the Hoeffding bound on estimator accuracy (14) to select a value for δ.
From Theorem 1 it can be seen that the maximum queue length Q^r_k, and so the delay, tends to increase with the design parameter γ. Hence, to minimise delay we should choose γ small. This is also confirmed by simulation; see e.g. Fig. 8, which plots delay vs. traffic load for various values of γ. Combining Theorem 1 with bound (14) also tells us that the maximum queue length tends to increase with the feedback delay d and with the loss rate p, although no more than linearly in both.

F. Impact of Imperfect Prediction: Rate Sub-Optimality
Transmission policies P use estimator θ̂(·) to make a d-step ahead prediction of Q^r_k. As discussed in more detail later, the use of prediction lowers delay. However, this estimator will inevitably make mistakes when predicting Q^r_k, due to the uncertainty in the fate of the packets "in flight", i.e., those transmitted but not yet acknowledged. When Q̂^r_k > γ and Q^r_k = 0 the scheduler sends extra coded packets that are not useful (since Q^r_k = 0 there are no outstanding losses at the receiver). Prediction errors therefore translate into a loss in capacity, since these extra coded packets replace information packets that would otherwise have been sent.

1) Capacity Achieving Transmission Policies:
We begin by introducing the following technical lemma.

Lemma 2. Combining updates (16) (which applies when Q^r_{k−d} ≥ 1) and (17) (which applies when Q^r_{k−d} = 0), so as to cover all values of Q^r_{k−d}, and taking expectations with respect to the packet arrival and loss processes yields

E[Q̂^r_{k+1}] ≤ E[Q̂^r_k] + p E[S_k] − (1 − p) E[C_k] + (1 − p) E[C_{k−d}],

where we have used the fact that X_{k−d} is independent of S_{k−d} and C_{k−d}. We proceed by considering the following two cases.
Case (i): −γ + Q̂^r_k ≤ 0. Then C_k = 0 and S_k ∈ {0, 1}. Applying this recursively, combining with (30), and taking expectations over the loss and arrival processes, where we use the fact that X_i is independent of Q^r_i, yields a bound on E[Q̂^r_k]. By Lemma 2, the claimed result now follows by rearranging and using the fact that η → 0 as k → ∞.
Theorem 2 says that as the parameter γ → ∞ the fraction of transmission slots E[S_k] available for sending information packets tends to the path capacity 1 − p. That is, the transmission policy is capacity achieving as γ → ∞. The requirement that E[S_k | Q̂^r_k] ≥ ǫ > 0 when Q̂^r_k ≤ γ excludes transient arrival processes (for instance, ones where packets arrive for a period of time and then no further arrivals occur) and is satisfied when, for example, the packet arrival process is ergodic and independent of the receiver queue.
2) Estimating Rate Sub-Optimality For Small γ: Theorem 2 tells us that for γ large enough our scheduler is capacity achieving, even when the feedback delay d is greater than zero. However, this is not as comforting as it might seem at first sight, since Theorem 1 also tells us that large γ can lead to a large receiver queue and so large decoding delays. By taking a different analysis approach, however, we can obtain fairly good estimates of the capacity loss induced by prediction errors when γ is small. These estimates indicate that the capacity loss is moderate.
Recall that under transmission policy P, when Q̂^r_k > γ a coded packet is transmitted. However, if Q^r_k = 0 then this coded packet is not useful, since the receiver queue is empty, i.e., there is no outstanding packet loss that would benefit from the coded packet. Defining a random variable R_k, which takes value 1 when Q̂^r_k > γ and Q^r_k = 0, and 0 otherwise, then r̄ := lim_{K→∞} (1/K) Σ_{k=1}^K R_k is the transmission rate of redundant coded packets. We would like to estimate r̄.
To proceed we make the following simplifying assumptions: (i) feedback is only received every d slots, (ii) either an information packet or a coded packet is transmitted in every slot, (iii) γ = 0, and (iv) Q^r_k = 0 for k = id + 1, i = 0, 1, . . .. These assumptions mean that we have less knowledge of the decoder status and so expect the number of redundant packets transmitted to be larger, i.e., we expect our estimate of r̄ to be larger than the true value. Also, assumption (iv) implies that after every d slots the receiver queue has emptied. By assumption (i), at slots k = id + 1, i = 0, 1, . . ., we update estimate (7), where by assumption (ii) S_j + C_j = 1. By assumptions (iii) and (iv), over the next d slots {k + 1, . . . , k + d} at most ⌈dp⌉ coded packets will be sent (fewer packets may be sent depending on the sample path S_j, C_j, j ∈ {k − d, . . . , k} and when the threshold γ > 0).
Despite the assumptions made in deriving (35), empirical tests indicate that the estimator is nevertheless quite accurate. For example, Fig. 9 compares the estimate r̂ with the measured average rate r̄ of redundant packets transmitted as the feedback delay d and loss rate p are varied. It can be seen that r̂ is essentially an upper bound on r̄ and, while it becomes less accurate as the loss rate p increases, it stays reasonably close to the true value. Observe also that in Fig. 9c the mean rate ā = 0.6 of packet arrivals is significantly less than the path capacity 1 − p = 0.9, so assumption (ii) (persistent queue backlog at the transmitter) is violated; nevertheless the estimate r̂ remains accurate.
The results in Fig. 9 indicate that the capacity loss due to the transmission of redundant packets generally stays below 5%. Observe also that when the delay d is less than the reciprocal of the loss rate, 1/p, no redundant packets are sent, i.e. r̄ = 0. This behaviour is accurately captured by r̂ (since ⌈dp⌉ = 1 when d < 1/p in (35)). Hence, on links with lower loss, larger feedback delays can be tolerated without incurring redundant packet transmissions, e.g. for p = 0.01 (a typical path loss rate in the Internet) feedback delays of up to 100 slots yield r̂ = 0.
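The ⌈dp⌉ = 1 condition can be checked mechanically. A small sketch (function names are ours; exact rational arithmetic is used to avoid floating-point artefacts near the boundary d = 1/p):

```python
from fractions import Fraction
import math

def no_redundancy(d, p):
    """True when ceil(d*p) <= 1, i.e. at most one coded slot is
    provisioned per feedback round, so the estimate r_hat is 0."""
    q = Fraction(p).limit_denominator(10**6)   # exact rational loss rate
    return math.ceil(q * d) <= 1

def max_tolerable_delay(p):
    """Largest feedback delay d (in slots) with r_hat = 0, i.e. floor(1/p)."""
    q = Fraction(p).limit_denominator(10**6)
    return int(1 / q)
```

For p = 0.01 this gives a maximum tolerable feedback delay of 100 slots, matching the observation above.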

G. Performance Evaluation
We conclude this section by comparing the performance of the transmission policies introduced here with state-of-the-art alternatives, namely (i) ARQ, (ii) random linear block codes and (iii) the low delay code construction from [1]. Note that the latter two are open-loop approaches, i.e. they do not make use of feedback to trigger transmission of extra packets. Fig. 10 plots the measured end-to-end delay vs. the feedback delay for these schemes. As expected, for the block code and low delay coding schemes the end-to-end delay is constant and does not vary with the feedback delay; moreover, the end-to-end delay with the low delay code construction is around half of that for the block code (consistent with the results reported in [1]). It can be seen that with ARQ the end-to-end delay increases linearly with the feedback delay d, and for delays greater than 40 slots the end-to-end delay with ARQ is larger than with any of the other approaches. However, for lower feedback delays the end-to-end delay with ARQ is lower than for the two open-loop coding schemes. The class of transmission policies introduced here strikes a balance between ARQ and the low delay code construction. Namely, when the feedback delay is low its end-to-end delay performance is similar to that of ARQ (which is known to be delay optimal when the feedback delay is zero), and as the feedback delay becomes large its end-to-end performance is similar to that of the low delay code construction. For intermediate values of feedback delay, the proposed class of transmission policies offers lower end-to-end delay than any of the competing approaches.
As noted above, the use of prediction introduces a trade-off between delay and rate, since prediction errors lead to transmission of redundant coded packets. In comparison, ARQ, which is purely reactive and involves no prediction, is capacity achieving but at the cost of increased end-to-end delay compared to when prediction is used (see Fig. 10). The trade-off between delay and rate seems a fundamental one, since predictions allow lower delay to be achieved, but prediction errors are inevitable when losses are stochastic. Fig. 11 explores this trade-off in more detail. For a fixed packet loss rate p this figure plots the achieved transmission rate of information packets as the arrival rate ā approaches capacity, namely ā = 1 − p − ε where ε is indicated on the x-axis of the plot. It can be seen that when ε ≫ 0 the achieved transmission rate equals the arrival rate for all four schemes. However, as the arrival rate approaches capacity (ε → 0) the achieved transmission rate falls below the arrival rate for all schemes apart from ARQ. Since we use a fixed block size, the block code is not capacity achieving and this behaviour is to be expected. Similarly, the low delay code construction incurs an overhead at the end of a connection. Interestingly, observe that the achieved transmission rate with the transmission policy introduced here is higher than with either of these schemes, i.e. the loss in capacity due to redundant coded transmissions is lower.

V. GENERALISING TO NETWORKS OF FLOWS
In this section we first make the observation that the policy P introduced in Section IV-C can also be seen as an approximate dual-subgradient update for an associated convex optimisation problem. This connection allows us to exploit convex optimisation results to extend policy P to networks with multiple flows sharing multiple lossy paths. We illustrate this using a simple multipath example.

A. Relating Policy P with Convex Optimization
Assuming, for simplicity, that the arrival queue Q^t is persistently backlogged, transmission policy P sends a coded packet when Q̂^r_k > γ and an information packet otherwise. This is a natural threshold-based policy, namely a coded packet is sent whenever the virtual receiver queue is larger than γ, combined with the use of predictor Q̂^r_k to mitigate the impact of the feedback delay d. Consider now the convex optimisation C: min_{s ∈ [0,1]} −s subject to s − (1 − p) ≤ 0, where it will be helpful to think of s as the average transmission rate of information packets. Letting c = 1 − s (which can be thought of as the average transmission rate of coded packets), the constraint s ≤ 1 − p ensures that sp ≤ c(1 − p), i.e. enough coded packets are sent to recover from packet losses. This optimisation has the trivial solution s = 1 − p, but that is not where our interest lies; rather, we focus on the relationship between this optimisation and policy P. Optimisation C is convex and, provided 0 ≤ p < 1, the interior of the feasible set is non-empty, i.e. the Slater condition is satisfied and so strong duality holds. The Lagrangian is L(s, λ) := −s + λ(s − (1 − p)) and the standard dual subgradient update for solving the optimisation is s_{k+1} ∈ arg min_{s ∈ [0,1]} L(s, λ_k) =(a) arg min_{s ∈ {0,1}} (λ_k − 1)s and λ_{k+1} = [λ_k + (1/γ)(s_{k+1} − (1 − p))]^+, where the step size is 1/γ > 0 and equality (a) follows by dropping the terms in L(s, λ) that do not depend on s (and so do not affect the s that minimises L(s, λ)) and noting that the solution must lie at an extreme point, i.e. 0 or 1.
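The dual subgradient iteration for optimisation C is simple enough to run directly. The sketch below (ours) implements the two-line update with step size 1/γ; the running average of s_k converges to the optimum s = 1 − p:

```python
def dual_subgradient(p, gamma, K=20_000):
    """Standard dual subgradient update for optimisation C:
       min_{s in [0,1]} -s  s.t.  s - (1 - p) <= 0,
    with step size 1/gamma. Each s_k lies at an extreme point {0, 1}."""
    lam = 0.0       # dual multiplier lambda_k
    s_sum = 0.0
    for _ in range(K):
        s = 1.0 if lam < 1.0 else 0.0   # arg min over {0,1} of (lam - 1) s
        lam = max(lam + (s - (1.0 - p)) / gamma, 0.0)
        s_sum += s
    return s_sum / K                    # running average of s_k
```

With p = 0.1 and γ = 10, λ_k climbs to 1 and then oscillates just below it, so s_k = 1 in roughly nine slots out of ten and the average settles at 1 − p = 0.9.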
Defining q_k = γλ_k, this dual subgradient update can be rewritten equivalently as: s_{k+1} = 1 and c_{k+1} = 0 if q_k ≤ γ (otherwise s_{k+1} = 0, c_{k+1} = 1), with q_{k+1} = [q_k + s_{k+1}p − c_{k+1}(1 − p)]^+. The similarity of this update to transmission policy P is immediately apparent, including the fact that s_k and c_k are {0, 1} valued. However, it can be seen that there are also some important differences: (i) the average loss rate p is used, rather than the loss rate process {X_k}; (ii) the scaled multiplier q_k is a real-valued quantity, whereas the packet queue Q_k is integer valued; (iii) the feedback delay d is ignored. Nevertheless, despite these differences, recent results on approximate convex optimisation in [20] can be used to establish a strong connection between the update generated by policy P and the optimal solution to problem C.
Letting ε_k = (Q̂^r_k − Q^r_k)/γ and δ_k = (S_k X_k − C_k(1 − X_k)) − (S_k p − C_k(1 − p)) = X_k − p, we can write transmission policy P equivalently as the perturbed update (47)-(50). We now recall the following, which corresponds to [20, Theorem 1].

Theorem 3. Consider the convex optimisation: min_{x ∈ X} f(x) s.t. g_j(x) + δ_j ≤ 0, j = 1, ..., m, where f : X → R and g : X → R^m are convex functions, X is a bounded convex subset of R^n and δ ∈ R^m. Let the dual function be h(λ, δ) := inf_{x ∈ X} f(x) + λ^T(g(x) + δ) and consider the update (51), where μ_k = λ_k + ε_k with λ_1 ∈ R^m_+ and {ε_k} a sequence of points from R^m such that μ_k ≥ 0 for all k. Suppose the Slater condition is satisfied, that δ_k is an ergodic stochastic process with expected value δ and E(‖δ_k − δ‖²₂) = σ²_δ for some finite σ²_δ, and that (1/k) Σ_{i=1}^k ‖ε_i‖₂ ≤ ε̄ for all k and some ε̄ ≥ 0. Then the iterates generated by the update converge to a neighbourhood of the optimum whose size is determined by ε̄ and σ²_δ.

Applying this to optimisation C, and identifying (47)-(50) as the perturbed update (51), we obtain Lemma 3.

Proof. To apply Theorem 3 we need to show, for ε_k = (Q̂^r_k − Q^r_k)/γ and δ_k = X_k − p, that: (i) (1/k) Σ_{i=1}^k ‖ε_i‖₂ ≤ ε̄; and (ii) δ_k has finite variance σ²_δ ≤ 1. Observing that |Q̂^r_k − Q^r_k| ≤ d, (i) follows immediately with ε̄ = d/γ. Since X_k ∈ {0, 1}, we have |δ_k| ≤ 1 and it follows immediately that δ_k has finite variance σ²_δ ≤ 1.

Observe that the bound on E[S_k] in Lemma 3 is not particularly useful when γ is small; we previously showed that much tighter bounds can be derived for policy P. On the other hand, the approach used to derive Lemma 3 is nonetheless rather interesting, because it can be used to show that transmission policy P can be embedded as a building block within solution updates for general convex optimisation problems, and not only will it behave sensibly, but it can be analysed via Theorem 3 and related results from the area of approximate convex optimisation. We illustrate this in more detail in the next section, using multipath communication as an illustrative example.
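To see the perturbed update in action, the sketch below (ours) replaces the average loss rate p in the multiplier update with a realised Bernoulli loss process {X_k}, which is exactly the perturbation δ_k = X_k − p considered above, while ignoring feedback delay and assuming a persistent backlog. Consistent with Theorem 3, the long-run information rate stays close to the optimum 1 − p of problem C.

```python
import random

def policy_p_update(p, gamma, K=200_000, seed=3):
    """Perturbed dual update: the average loss rate p is replaced by the
    realised loss process {X_k}, as in policy P (feedback delay ignored,
    persistent backlog assumed, q_k integer-valued)."""
    rng = random.Random(seed)
    q = 0
    s_sum = 0
    for _ in range(K):
        s = 1 if q <= gamma else 0          # threshold rule: info vs coded
        x = 1 if rng.random() < p else 0    # loss indicator X_k
        # q_{k+1} = [q_k + S_k X_k - C_k (1 - X_k)]^+
        q = max(q + s * x - (1 - s) * (1 - x), 0)
        s_sum += s
    return s_sum / K
```

Despite the stochastic perturbation, with p = 0.1 and γ = 20 the measured information rate remains within a small neighbourhood of 1 − p = 0.9.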

B. Example: Multipath Transmission
Consider the following lossy network multi-commodity flow setup, which is a variant of the setup in [21]. Let G = (V, E) denote a graph with vertices V and edges E ⊂ V × V. Time is slotted, edges have unit capacity and p_e denotes the packet loss rate on edge e, with losses being i.i.d. across slots. The network carries a set F ⊂ V × V of flows, with flow f = (s, d) ∈ F having source/transmitter s and destination/receiver d. Each source s has a single destination d, but multiple paths may be used to transmit packets from each source to the corresponding destination.
Let P_f ⊂ 2^E denote the set of usable paths between source s and destination d. In general P_f will be a subset of all possible paths from s to d, determined by delay requirements, routing protocols etc. We assume paths in P_f have no loops and that the time taken to send a packet from source s to destination d is the same for all paths in P_f. Let g_{i,e} denote the number of hops along path i between the source and edge e, with g_{i,e} = ∞ for edges not on path i. Associate with path i ∈ P_f the vector a_{f,i}. The resulting dual subgradient update selects s_{f,k+1} and r_{k+1} as the solutions of linear programmes, where Q_{f,k} = λ_{f,k}/α and α > 0 is a design parameter. Since s_{f,k+1} is the solution of a linear programme, its value lies at an extreme point and so the update can equivalently be replaced by s_{f,k+1} ∈ arg min_{s ∈ {0, Σ_{i ∈ P_f} r_{f,i}}} (−1/α + Q_{f,k})s. Similarly, since r_{k+1} is the solution of a linear programme, it is an extreme point of the set {r ∈ R^n_+ : Ar ≤ 1}. When A is unimodular the extreme points (and so r_{k+1}) are integer-valued. More generally, we can always use randomised time-sharing to select a vector R_{k+1} with {0, 1}-valued elements such that E[R_{k+1}] = r_{k+1}.
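Randomised time-sharing is straightforward to implement: given extreme points and weights expressing r_{k+1} as a convex combination, pick one extreme point at random with the corresponding probability, so that the expectation recovers the fractional allocation. A minimal sketch (function names are ours):

```python
import random

def timeshare(extreme_points, weights, rng=None):
    """Randomised time-sharing: return extreme_points[j] with probability
    weights[j], so E[R] = sum_j weights[j] * extreme_points[j] = r.
    Assumes weights are non-negative and sum to 1."""
    rng = rng or random.Random()
    u, acc = rng.random(), 0.0
    for point, w in zip(extreme_points, weights):
        acc += w
        if u < acc:
            return point
    return extreme_points[-1]   # guard against rounding in the weights
```

Each draw is a {0, 1}-valued extreme point (so per-slot link constraints are respected), while the time average matches the fractional r_{k+1}.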
Replacing Q_{f,k} with the virtual receiver queue Q^r_{f,k} yields update (59)-(60), where the X_{i,k} are i.i.d. random variables with E[X_{i,k}] = p_i, X_{i,k} taking value 1 when a packet is erased on path i in slot k and 0 otherwise. By Theorem 3 this update converges to a ball around the solution of optimisation (53)-(54), with the size of the ball decreasing as the scaling/step-size parameter α is decreased.
We can map this update onto the following physical setup. S_{f,k} is the number of information packets from flow f to be transmitted in slot k. Note that Ŝ_{f,k+1} might take a value greater than 1 if multiple link transmission slots are available to flow f. Selecting R_{f,i,k} = 1 corresponds to allocating transmission slot k on link i to a packet from flow f. When S_{f,k} = 0 (there is no information packet to be sent) a coded packet is transmitted (C_{f,i,k} = 1). The occupancy of virtual receiver queue Q^r_{f,k+1} increases when an information packet is lost and decreases upon receipt of a coded packet. S_{f,k+1} is selected according to a threshold rule, namely it is nonzero when Q^r_{f,k} < 1/α and a transmission slot is available (i.e. when Σ_{i ∈ P_f} R_{f,i} > 0). When there is feedback delay we can replace Q^r_{f,k} in this threshold rule with the prediction Q̂^r_{f,k}.

We illustrate the application of this update using the simple multipath topology shown in Fig. 12, which has three paths between source s and destination d. These paths are shared by three flows. In this case update (59)-(60) simplifies to element R^f_{i,k+1} taking value 1 (corresponding to transmitting a packet from flow f on link i in slot k + 1) when the receiver backlog Q^r_{f,k}(1 − p_i) for flow f is the largest amongst the three flows. Fig. 13 shows the performance obtained as we vary the packet loss rate from 0.0 to 0.4 over the first path, keeping the other two paths at a fixed erasure probability of 0.1. Note that flows sharing the same path see the same loss probability. Fig. 13a shows the individual rates for each flow and, as can be seen, the available capacity is equally shared between the flows. Fig. 13b shows the aggregate rate, obtained by summing the individual flow throughputs. It can be seen that the rate of the proposed multipath scheduler almost reaches the system capacity.
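The mapping above can be exercised with a toy simulation. The sketch below (ours) is a heavily simplified version of the three-path, three-flow setup: it assumes instantaneous feedback, Bernoulli arrivals and losses, a repair threshold 1/α on the virtual receiver queue, and a tie-breaking rule of our own (links go first to a flow needing repair, otherwise to the flow with the longest transmit queue). With equal arrival rates below capacity, each flow sustains its full arrival rate, mirroring the equal sharing seen in Fig. 13a.

```python
import random

def multipath_sim(losses, arrival_rate=0.75, alpha=0.05, T=200_000, seed=7):
    """Toy multipath scheduler in the spirit of update (59)-(60): each
    unit-capacity link is allocated per slot to one flow; coded (repair)
    packets are triggered by the threshold 1/alpha on the virtual
    receiver queue. Feedback is assumed instantaneous."""
    rng = random.Random(seed)
    n = len(losses)              # number of flows (= paths in this toy setup)
    Qr = [0] * n                 # virtual receiver queues (unrepaired losses)
    Qt = [0] * n                 # transmit queues (arrived, not yet sent)
    delivered = [0] * n
    for _ in range(T):
        for f in range(n):       # Bernoulli arrivals, one potential packet/slot
            if rng.random() < arrival_rate:
                Qt[f] += 1
        for p in losses:         # allocate each link for this slot
            # flows above the repair threshold win the link; otherwise the
            # flow with the longest transmit queue does (our tie-break)
            f = max(range(n),
                    key=lambda g: (Qr[g] > 1 / alpha,
                                   Qr[g] * (1 - p) if Qr[g] > 1 / alpha
                                   else Qt[g]))
            lost = rng.random() < p
            if Qr[f] > 1 / alpha:
                if not lost:     # useful coded packet: one loss repaired
                    Qr[f] -= 1
                    delivered[f] += 1
            elif Qt[f] > 0:      # send an information packet
                Qt[f] -= 1
                if lost:
                    Qr[f] += 1   # outstanding loss, repaired later
                else:
                    delivered[f] += 1
    return [d / T for d in delivered]
```

With three paths of loss rate 0.1 and per-flow arrivals of 0.75 packets/slot (total 2.25, below the aggregate capacity 2.7), each flow's goodput settles at its arrival rate.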
We now compare the application end-to-end delay of our proposed scheme with that exhibited by a legacy ARQ solution as the feedback delay increases, as was done for the single link scenario. We still use three links and three different paths (m = n = 3), but now fix the packet loss rate to 0.2 over all paths (recall that all flows are equally affected by such erasures). We also assume an arrival rate of ā = 3/4 for each of the three flows. The traditional ARQ scheme assumes that the scheduler uses a round-robin approach to distribute flow transmissions across the paths. It can be seen that the behaviour is much the same as that observed over a single path and, in particular, that as the feedback delay increases the proposed scheme clearly outperforms ARQ, yielding much lower delays.

VI. CONCLUSIONS
In this paper we have proposed a joint coding/scheduling scheme for use over packet erasure paths. We posed an optimisation problem which is solved by means of discrete decisions: in each slot the source node can decide to send a native (information) packet, to transmit a coded packet, or to do nothing. We have shown that this discrete decision method yields close to optimal behaviour while also ensuring system stability. We assessed the validity of the model by means of an extensive simulation-based analysis, in which we considered the impact of delayed feedback.
The fact that the sender only learns the status of the decoder after some delay has usually been overlooked when studying the performance of coding solutions. For ideal (delay-free) feedback channels it is well known that ARQ yields the best performance. However, under realistic conditions our results show that the joint coder/scheduler clearly outperforms legacy solutions: the proposed approach achieves the same throughput as ARQ without the corresponding increase in end-to-end delay.
We have also derived practical bounds for the corresponding queue lengths, which were then used to analyse the overhead caused by the transmission of unneeded (redundant) packets. The simulation results show that these bounds are indeed rather tight, and that the proposed predictor for the queue occupancy is quite accurate. Hence, they can be exploited to make better coding/scheduling decisions in different setups. Finally, we have also studied the proposed scheme in a multipath communication scenario, where it again outperforms a legacy solution based on ARQ.