Improving the performance of heterogeneous data centers through redundancy

We analyze the performance of redundancy in a multi-type job and multi-type server system. We assume the job dispatcher is unaware of the servers' capacities, and we set out to study under which circumstances redundancy improves the performance. With redundancy, an arriving job dispatches redundant copies to all its compatible servers, and departs as soon as one of its copies completes service. As a benchmark comparison, we take the non-redundant system in which an arriving job is routed to only one randomly selected compatible server. Service times are generally distributed and all copies of a job are identical, i.e., have the same service requirement. In our first main result, we characterize the sufficient and necessary stability conditions of the redundancy system. This condition coincides with that of a system where each job type only dispatches copies into its least-loaded servers, and those copies need to be fully served. In our second result, we compare the stability region of the system under redundancy to that of no redundancy. We show that if the servers' capacities are sufficiently heterogeneous, the stability region under redundancy can be much larger than that without redundancy. We apply the general solution to particular classes of systems, including redundancy-d and nested models, to derive simple conditions on the degree of heterogeneity required for redundancy to improve the stability. As such, our result is the first to show that redundancy can improve the stability and hence performance of a system when copies are non-i.i.d.

Key words: redundancy models; load balancing; stochastic stability; processor sharing.

1 Introduction

The main motivation for studying redundancy models comes from the fact that both empirical ([1, 2, 9, 30]) and theoretical ([12, 14, 19, 22, 23, 29]) evidence shows that redundancy might improve the performance of real-world applications.
Under redundancy, a job that arrives to the system dispatches multiple copies into the servers, and departs when a first copy completes service. By allowing for redundant copies, the aim is to minimize the latency of the system by exploiting the variability in the queue lengths and the capacity of the different servers.

Most of the theoretical results on redundancy systems consider the performance analysis when either FCFS or Processor-Sharing (PS) service policies are implemented in the servers. Under the assumption that all the copies of a job are i.i.d. (independent and identically distributed) and exponentially distributed, [3, 5, 14] show that the stability condition of the system is independent of the number of redundant copies and that performance (in terms of delay and number of jobs in the system) improves as the number of copies increases. However, [12] showed that the assumption that copies of a job are i.i.d. can be unrealistic, and that it might lead to theoretical results that do not reflect the results of replication schemes in real-life computer systems. The latter has triggered interest in other modeling assumptions for the correlation structure of the copies of a job. For example, for identical copies (all the copies of a job have the same size), [3] showed that under both FCFS and PS service policies, the stability region of the system with homogeneous servers decreases as the number of copies increases.

arXiv:2003.01394v2 [cs.NI] 15 Dec 2020

The above observation provides the motivation for our study: to understand when redundancy is beneficial. In order to do so, we analyze a general multi-type job and multi-type server system. A dispatcher needs to decide to which server(s) to route each incoming job. We assume that there is no signaling between the dispatcher and the servers, that is, the dispatcher is oblivious to the capacities of the servers and unaware of the states of the queues.
The latter can be motivated by (i) design constraints, (ii) (slowly) fluctuating capacity of a server due to external users, or (iii) the impossibility of exchanging information among dispatchers and servers. The only information that is available to the dispatcher is the type of job and its set of compatible servers. However, we do allow signaling among servers, which is needed in order to cancel the copies in redundancy schemes.

In the mathematical analysis we consider two different models: the redundancy model, where the dispatcher sends a copy to all the compatible servers of the job type, and the Bernoulli model, where a single copy is sent to a uniformly selected compatible server of the job type. From the dispatcher's viewpoint, the comparison between these two policies is reasonable under the assumption that the dispatcher only knows the type of the job and the set of its compatible servers. Hence, we do not compare analytically the performance of redundancy with other routing policies (such as Join the Shortest Queue, Join the Idle Server, Power of d, etc.) that have more information on the state of the system.

We hence aim to understand when having redundant copies is beneficial for the performance of the system in this context. Observe that the answer is not clear upfront, as adding redundant copies has two opposite effects: on the one hand, redundancy helps to exploit the variability across servers' capacities, but on the other hand, it induces a waste of resources as servers work on copies that do not end up being completely served.

To answer the above question, we analyze the stability of an arbitrary multi-type job and multi-type server system with redundancy. Job service requirements are generally distributed, and copies are identical. The scheduling discipline implemented by servers is PS, which is a common policy in server farms and web servers, see for example [16, Chapter 24].
In our main result, we derive sufficient and necessary stability conditions for the redundancy system. This general result allows us to characterize when redundancy can increase the stability region with respect to Bernoulli routing. To the best of our knowledge, our analytical results are the first showing that, when copies are non-i.i.d., adding redundancy to the system can be beneficial from the stability point of view.

We believe that our result can motivate further research in order to thoroughly understand when redundancy is beneficial in other settings, for example, for different scheduling disciplines, different correlation structures among copies, different redundancy schemes, etc. In Section 8 we investigate some of these issues numerically, namely, the performance of redundancy when the scheduling discipline is FCFS and Random Order of Service (ROS), and the performance gap between redundancy and a variant of the Join the Shortest Queue policy according to which each job is dispatched to the compatible server that has the least number of jobs.

We briefly summarize the main findings of the paper:

• The characterization of sufficient and necessary stability conditions of any general redundancy system with heterogeneous server capacities and arrivals, under mild assumptions on the service time distribution.
• We prove that when servers are heterogeneous enough (conditions stated in Section 6), redundancy has a larger stability region than Bernoulli.
• By exploring these conditions numerically, we observe that the degree of heterogeneity needed in the servers for redundancy to be better decreases in the number of servers, and increases in the number of redundant copies.

The rest of the paper is organized as follows. In Section 2 we discuss related work. Section 3 describes the model, and introduces the notion of capacity-to-fraction-of-arrivals ratio that plays a key role in the stability result.
Section 4 gives an illustrative example in order to obtain intuition about the structure of the stability conditions. Section 5 states the stability condition for the redundancy model. Section 6 provides conditions on the heterogeneity of the system under which redundancy outperforms Bernoulli. The proof of the main result is given in Section 7. Simulations are given in Section 8, and concluding remarks are given in Section 9. For the sake of readability, proofs are deferred to the Appendix.

2 Related work

When copies of a job are i.i.d. and exponentially distributed, [5, 14] have shown that redundancy with FCFS employed in the servers does not reduce the stability region of the system. In this case, the stability condition is that for any subset of job types, the sum of the arrival rates must be smaller than the sum of service rates associated with these job types.

In [27], the authors consider i.i.d. copies with highly variable service time distributions. They focus on redundancy-d systems where each job chooses a subset of d homogeneous servers uniformly at random. The authors show that with FCFS, the stability region increases (without bound) in both the number of copies, d, and in the parameter that describes the variability in service times.

In [20], the authors investigate when it is optimal to replicate a job. They show that for so-called New-Worse-Than-Used service time distributions, the best policy is to replicate as much as possible. In [13], the authors investigate the impact that scheduling policies have on the performance of so-called nested redundancy systems with i.i.d. copies. The authors show that when FCFS is implemented, the performance might not improve as the number of redundant copies increases, while under other policies proposed in the paper, such as Least-redundant-first or Primaries-first, the performance improves as the number of copies increases.

Anton et al.
[3] study the stability conditions when the scheduling policies PS, Random Order of Service (ROS) or FCFS are implemented. For the redundancy-d model with homogeneous server capacities and i.i.d. copies, they show that the stability region is not reduced if either PS or ROS is implemented. When instead copies belonging to one job are identical, [3] showed that (i) ROS does not reduce the stability region, (ii) FCFS reduces the stability region and (iii) PS dramatically reduces the stability region, and this coincides with the stability region of a system where all copies need to be fully served, i.e., $\lambda < \mu K / d$. In [28], the authors show that the stability result for PS in a homogeneous redundancy-d system with identical copies extends to generally distributed service times. In the present paper, we extend [3, 28] by characterizing the stability condition under PS with identical copies to the general setting of heterogeneous servers, generally distributed service times, and arbitrary redundancy structures.

Table 1: The stability condition of redundancy models under different modeling assumptions. The entry in square brackets marks the modeling assumptions we consider for the present paper.

Policy | Service time distribution | Homogeneous servers, i.i.d. copies | Homogeneous servers, identical copies | Heterogeneous servers, i.i.d. copies | Heterogeneous servers, identical copies
FCFS | Exponential | General red., [14] | Redundancy-d, [3] | General red., [14] | -
FCFS | Scaled Bernoulli | Redundancy-d, [27] (Asymptotic regime) | - | - | -
PS | Exponential | Redundancy-d, [3] | Redundancy-d, [3] | - | -
PS | General | Redundancy-d, [28] (Necessary condition) | Redundancy-d, [28] | - | [General red. (Light-tailed)]
ROS | Exponential | Redundancy-d, [3] | Redundancy-d, [3] | - | -

Hellemans et al. [18] consider identical copies that are generally distributed. For a redundancy-d model with FCFS, they develop a numerical method to compute the workload and response time distribution when the number of servers tends to infinity, i.e., the mean-field regime.
The authors can numerically infer whether the system is stable, but do not provide any characterization of the stability region. In a recent paper, Hellemans et al. [17] extend this study to include many replication policies, and a general correlation structure among the copies.

Gardner et al. [12] introduce a new dependency structure among the copies of a job, the S&X model. The service time of each copy of a job is decoupled into two components: one related to the inherent job size of the task, which is identical for all the copies of a job, and the other one related to the server's slowdown, which is independent among all copies. The paper proposes and analyzes the redundant-to-idle-queue scheme with homogeneous servers, and proves that it is stable and performs well.

In Table 1 we summarize the stability results presented above, organized by service policy, service time distribution, servers' capacities and redundancy correlation structure. In brackets we specify the additional assumptions that the authors considered in their respective papers. In the bold square, we outline the modeling assumptions we consider for the present paper. To the best of our knowledge, no analytical results have been obtained so far for performance measures when PS is implemented, servers are heterogeneous and copies are identical or of any other non-i.i.d. structure.

3 Model description

We consider a $K$ parallel-server system with heterogeneous capacities $\mu_k$, for $k = 1, \dots, K$. Each server has its own queue, where the Processor Sharing (PS) service policy is implemented. We denote by $S = \{1, \dots, K\}$ the set of all servers. Jobs arrive to the system according to a Poisson process of rate $\lambda$. Each job is labelled with a type $c$ that represents the subset of compatible servers to which type-$c$ jobs can be sent, i.e., $c = \{s_1, \dots, s_n\}$, where $n \le K$, $s_1, \dots, s_n \in S$ and $s_i \neq s_l$ for all $i \neq l$. A job is of type $c$ with probability $p_c$, where $\sum_{c \in \mathcal{C}} p_c = 1$.
We denote by $\mathcal{C}$ the set of all types in the system, i.e., $\mathcal{C} = \{c \in \mathcal{P}(S) : p_c > 0\}$, where $\mathcal{P}(S)$ contains all the possible subsets of $S$. Furthermore, we denote by $\mathcal{C}(s)$ the subset of types that have server $s$ as compatible server, that is, $\mathcal{C}(s) = \{c \in \mathcal{C} : s \in c\}$. For instance, the N-model is a two-server system with jobs of types $c = \{2\}$ and $c = \{1, 2\}$, see Figure 1 b). Thus, $\mathcal{C} = \{\{2\}, \{1, 2\}\}$, $\mathcal{C}(1) = \{\{1, 2\}\}$ and $\mathcal{C}(2) = \{\{2\}, \{1, 2\}\}$, with $p_{\{2\}}, p_{\{1,2\}} > 0$.

Job sizes are distributed according to a general random variable $X$ with cumulative distribution function $F$ and unit mean. Additionally, we assume that

1. $F$ has no atoms.
2. $F$ is a light-tailed distribution in the following sense:
\[ \lim_{r \to \infty} \sup_{a \ge 0} E\big[(X - a)\,\mathbf{1}_{\{X - a > r\}} \,\big|\, X > a\big] = 0. \tag{1} \]

Remark 1. These technical conditions have been used previously in the literature to prove stochastic stability from fluid limit arguments (see [24] and [26]) in the context of processor sharing networks and cannot be avoided easily. However, it can be seen (as observed in [26]) that Equation (1) also implies
\[ \sup_{a \ge 0} E\big[(X - a) \,\big|\, X > a\big] < \infty, \tag{2} \]
which is a usual light-tail condition (see [11]). Hence, although Equations (1) and (2) exclude heavy-tailed distributions like Pareto, they include large sets of distributions such as phase-type distributions (which are dense in the set of all distributions on $\mathbb{R}_+$), distributions with bounded support, and exponential and hyper-exponential distributions.

We consider two load balancing policies, which determine how the jobs are dispatched to the servers. Note that both load balancers are oblivious to the capacities of the servers.

• Bernoulli routing: a type-$c$ job is sent with uniform probability to one of its compatible servers in $c$.
• Redundancy model: a type-$c$ job sends identical copies to its $|c|$ compatible servers. That is, all the copies of a job have exactly the same size. The job (and corresponding copies) departs the system when one of its copies completes service.
In this paper, we will study the stability condition under both load balancing policies. We call the system stable when the underlying process is positive Harris recurrent, and unstable when the process is transient. A stochastic process is positive Harris recurrent if there exists a petite set $C$ for which $P(\tau_C < \infty) = 1$, where $\tau_C$ is the stopping time of $C$, see e.g., [4, 6, 25] for the corresponding definitions. We note that when the state descriptor is Markovian, positive Harris recurrent is equivalent to positive recurrent.

We define $\lambda^R$ as the value of $\lambda$ such that the redundancy model is stable if $\lambda < \lambda^R$ and unstable if $\lambda > \lambda^R$. Similarly, we define $\lambda^B$ for the Bernoulli routing system. We aim to characterize when $\lambda^R > \lambda^B$, that is, when redundancy improves the stability condition compared to no redundancy.

For Bernoulli, $\lambda^B$ can be easily found. Under Bernoulli routing, a job chooses a server uniformly at random, hence type-$c$ jobs arrive at server $s$ at rate $\lambda p_c / |c|$. Thus, the Bernoulli system reduces to $K$ independent servers, where server $s$ receives arrivals at rate $\lambda \big( \sum_{c \in \mathcal{C}(s)} p_c / |c| \big)$ and has a departure rate $\mu_s$, for all $s \in S$. The stability condition is hence
\[ \lambda < \lambda^B = \min_{s \in S} \left\{ \frac{\mu_s}{\sum_{c \in \mathcal{C}(s)} \frac{p_c}{|c|}} \right\}. \tag{3} \]

In order to characterize $\lambda^R$, we need to study the system under redundancy in more detail. For that, we denote by $N_c(t)$ the number of distinct type-$c$ jobs that are present in the redundancy system at time $t$ and $\vec{N}(t) = (N_c(t), c \in \mathcal{C})$. Furthermore, we denote the number of copies per server by $M_s(t) := \sum_{c \in \mathcal{C}(s)} N_c(t)$, $s \in S$, and $\vec{M}(t) = (M_1(t), \dots, M_K(t))$. For the $j$-th type-$c$ job, let $b_{cj}$ denote the service requirement of this job, for $j = 1, \dots, N_c(t)$, $c \in \mathcal{C}$. Let $a_{cjs}(t)$ denote the attained service in server $s$ of the $j$-th type-$c$ job at time $t$. We denote by $A_c(t) = (a_{cjs}(t))_{js}$ a matrix on $\mathbb{R}_+$ of dimension $N_c(t) \times |c|$. Note that the number of type-$c$ jobs increases by one at rate $\lambda p_c$, which implies that a row composed of zeros is added to $A_c(t)$.
When one element $a_{cjs}(t)$ in matrix $A_c(t)$ reaches the required service $b_{cj}$, the corresponding job departs and all of its copies are removed from the system. Hence, row $j$ in matrix $A_c(t)$ is removed. We further let $\phi_s(\vec{M}(t))$ be the capacity that each of the copies in server $s$ obtains when in state $\vec{M}(t)$, which under PS is given by $\phi_s(\vec{M}(t)) := \mu_s / M_s(t)$. The cumulative service that a copy in server $s$ gets during the time interval $(v, t)$ is
\[ \eta_s(v, t) := \int_{x = v}^{t} \phi_s(\vec{M}(x))\, dx. \]

Figure 1: From left to right, a) the redundancy-d model (for $K = 4$ and $d = 2$), b) the N-model, c) the W-model and d) the WW-model.

In order to characterize the stability condition, we define the capacity-to-fraction-of-arrivals ratio of a server in a subsystem:

Definition 1 (Capacity-to-fraction-of-arrival ratio). For any given set of servers $\tilde{S} \subseteq S$ and its associated set of job types $\tilde{\mathcal{C}} = \{c \in \mathcal{C} : c \subseteq \tilde{S}\}$, the capacity-to-fraction-of-arrival ratio of server $s \in \tilde{S}$ in this so-called $\tilde{S}$-subsystem is defined by
\[ \frac{\mu_s}{\sum_{c \in \tilde{\mathcal{C}}(s)} p_c}, \]
where $\tilde{\mathcal{C}}(s) = \tilde{\mathcal{C}} \cap \mathcal{C}(s)$ is the subset of types in $\tilde{\mathcal{C}}$ that are served in server $s$.

Some common models

A well-known structure is the redundancy-d model, see Figure 1 a). Within this model, each job has $d$ out of $K$ compatible servers, where $d$ is fixed. That is, $p_c > 0$ for all $c \in \mathcal{P}(S)$ with $|c| = d$, and $p_c = 0$ otherwise, so that there are $|\mathcal{C}| = \binom{K}{d}$ types of jobs. If additionally $p_c = 1 / \binom{K}{d}$ for all $c \in \mathcal{C}$, we say that the arrival process of jobs is homogeneously distributed over types. We will call this model the redundancy-d model with homogeneous arrivals. The particular case where server capacities are also homogeneous, i.e., $\mu_k = \mu$ for all $k = 1, \dots, K$, will be called the redundancy-d model with homogeneous arrivals and server capacities.

In [21] the nested redundancy model was introduced, where for all $c, c' \in \mathcal{C}$, either i) $c \subset c'$ or ii) $c' \subset c$ or iii) $c \cap c' = \emptyset$. First of all, note that the redundancy-d model does not fit in the nested structure.
The smallest nested system is the so-called N-model (Figure 1 b)): this is a $K = 2$ server system with types $\mathcal{C} = \{\{2\}, \{1, 2\}\}$. Another nested system is the W-model (Figure 1 c)), that is, $K = 2$ servers and types $\mathcal{C} = \{\{1\}, \{2\}, \{1, 2\}\}$. In Figure 1 d), a nested model with $K = 4$ servers and 7 different job types, $\mathcal{C} = \{\{1\}, \{2\}, \{3\}, \{4\}, \{1, 2\}, \{3, 4\}, \{1, 2, 3, 4\}\}$, is given. This model is referred to as the WW-model.

4 An illustrative example

Before formally stating the main results in Section 5.1, we first illustrate through a numerical example some of the key aspects of our proof, and in particular the essential role played by the capacity-to-fraction-of-arrival ratio defined in Definition 1.

Figure 2: Trajectory of the number of copies per server with respect to time for a $K = 4$ redundancy-2 system with exponentially distributed job sizes. Panels a) and b) consider homogeneous capacities $\mu_k = 1$ for $k = 1, \dots, 4$ and homogeneous arrival rates per type, $p_c = 1/6$ for all $c \in \mathcal{C}$, with a) $\lambda = 1.8$ and b) $\lambda = 2.1$. Panels c) and d) consider heterogeneous server capacities $\vec{\mu} = (1, 2, 4, 5)$ and arrival rates per type $\vec{p} = (0.25, 0.1, 0.1, 0.2, 0.2, 0.15)$ for types $\mathcal{C}$, with c) $\lambda = 7.5$ and d) $\lambda = 9$.

In Figure 2 we plot the trajectories of the number of copies per server with respect to time for a $K = 4$ redundancy-2 system (Figure 1 a)), that is, $\mathcal{C} = \{\{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}, \{3, 4\}\}$. Our proof techniques will rely on fluid limits, and therefore we chose large initial points. Figures 2 a) and b) show the trajectories when servers and arrivals of types are homogeneous for $\lambda = 1.8$ and $\lambda = 2.1$, respectively. Figures 2 c) and d) consider a heterogeneous system (see the caption for the parameters) for $\lambda = 7.5$ and $\lambda = 9$, respectively.

The homogeneous example (Figure 2 a) and b)) falls within the scope of [3]. There it is shown that the stability condition is $\lambda < \mu K / d$.
We note that this condition coincides with the stability condition of a system in which all the $d$ copies need to be fully served. In Figure 2 a) and b), the values of $\lambda$ are chosen such that they represent a stable and an unstable system, respectively. As formally proved in [3], at the fluid scale, when the system is stable the largest queue length decreases, whereas in the unstable case the minimum queue length increases. It thus follows that in the homogeneous case, either all classes are stable, or all are unstable.

The behavior of the heterogeneous case is rather different. The parameters corresponding to Figures 2 c) and d) are such that the system is stable in c), but not in d). In Figure 2 c) we see that the trajectories of all queue lengths are not always decreasing, including the maximum queue length. In Figure 2 d), we observe that the numbers of copies in servers 3 and 4 are decreasing, whereas those of servers 1 and 2 are increasing.

When studying stability for the heterogeneous setting, one needs to reason recursively. First, assume that each server $s$ needs to handle its full load, i.e., $\lambda \sum_{c \in \mathcal{C}(s)} p_c / \mu_s$. Hence, one can simply compare the servers' capacity-to-fraction-of-arrival ratios, $\mu_s / \sum_{c \in \mathcal{C}(s)} p_c$, to see which server is the least-loaded server and could hence potentially empty first. In this example, server 4 has the maximum capacity-to-fraction-of-arrival ratio, and, in fluid scale, will reach zero in finite time, and remain zero, since $\mu_4 / \sum_{c \in \mathcal{C}(4)} p_c = 5 / (p_{\{1,4\}} + p_{\{2,4\}} + p_{\{3,4\}}) = 11.11$ is larger than $\lambda = 7.5$. Whenever, at fluid scale, server 4 is still positive, the other servers might either increase or decrease. However, the key insight is that once the queue length of server 4 reaches 0, the fluid behavior of the other classes no longer depends on the jobs that also have server 4 as compatible server.
That is, we are sure that all jobs that have server 4 as compatible server will be fully served in server 4, since server 4 is in fluid scale empty and all the other servers are overloaded. Therefore, jobs with server 4 as compatible server can be ignored, and we are left with a subsystem formed by servers $\{1, 2, 3\}$ and without the job types served by server 4. Now again, we consider the maximum capacity-to-fraction-of-arrival ratio in order to determine the least-loaded server, but now for the subsystem $\{1, 2, 3\}$. This time, server 3 has the maximum capacity-to-fraction-of-arrival ratio, which is $4 / (p_{\{1,3\}} + p_{\{2,3\}}) = 4 / 0.3 = 13.33$. Since this value is larger than $\lambda = 7.5$, it is a sufficient condition for server 3 to empty.

Similarly, once server 3 is empty, we consider the subsystem with servers 1 and 2 only. Hence, there is only one type of jobs, $\{1, 2\}$. Now server 2 is the least-loaded server and its capacity-to-fraction-of-arrival ratio is $2 / p_{\{1,2\}} = 8$. This value being larger than the arrival rate implies that server 2 (and hence server 1, because there is only one job type) will be stable too. Indeed, in Figure 2 c) we also observe that as soon as the number of copies in server 3 is relatively small compared to that of server 1 and server 2, the number of copies in both server 1 and server 2 decreases.

We can now explain the evolution observed in Figure 2 d) when $\lambda = 9$. The evolution for servers 4 and 3 can be argued as before: both their capacity-to-fraction-of-arrival ratios are larger than $\lambda = 9$, hence they empty in finite time. However, the capacity-to-fraction-of-arrival ratio of the subsystem with servers 1 and 2, which is 8, is strictly smaller than the arrival rate. We thus observe that, unlike in the homogeneous case, in the heterogeneous case some servers might be stable, while others (here servers 1 and 2) are unstable. Proposition 1 formalizes the above intuitive explanation, by showing that the stability of the system can be derived recursively.
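The recursive reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `car_levels`, the dictionary-based encoding of capacities and type probabilities, and the choice to skip servers that have no remaining job type are ours and do not come from the paper. Exact fractions are used so that ties between servers are detected reliably.

```python
from fractions import Fraction


def car_levels(mu, p):
    """Peel off the least-loaded servers, one subsystem at a time.

    mu: dict mapping server -> capacity.
    p:  dict mapping frozenset(compatible servers) -> type probability.
    Returns a list of (least_loaded_servers, ratio), one entry per subsystem.
    """
    servers = set(mu)
    levels = []
    while servers:
        # Job types of the current subsystem: all compatible servers remain.
        active = {c: pc for c, pc in p.items() if pc > 0 and c <= servers}
        if not active:
            break  # no job type left, so the recursion stops here
        ratios = {}
        for s in servers:
            fraction = sum(pc for c, pc in active.items() if s in c)
            if fraction > 0:  # assumption: ignore servers without any type
                ratios[s] = mu[s] / fraction
        best = max(ratios.values())
        least_loaded = {s for s, r in ratios.items() if r == best}
        levels.append((least_loaded, best))
        servers -= least_loaded
    return levels


# Heterogeneous example of Figure 2 c) and d): capacities (1, 2, 4, 5).
mu = {1: Fraction(1), 2: Fraction(2), 3: Fraction(4), 4: Fraction(5)}
p = {
    frozenset({1, 2}): Fraction(25, 100),
    frozenset({1, 3}): Fraction(10, 100),
    frozenset({1, 4}): Fraction(10, 100),
    frozenset({2, 3}): Fraction(20, 100),
    frozenset({2, 4}): Fraction(20, 100),
    frozenset({3, 4}): Fraction(15, 100),
}

levels = car_levels(mu, p)
threshold = min(ratio for _, ratio in levels)
for least_loaded, ratio in levels:
    print(sorted(least_loaded), float(ratio))
print("smallest ratio over all subsystems:", threshold)
```

For these parameters the servers are removed in the order 4, 3, 2, and the smallest ratio over the subsystems is 8, which lies between the arrival rates 7.5 and 9 used in Figure 2 c) and d).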
The capacity-to-fraction-of-arrival ratio now allows us to reinterpret the homogeneous case depicted in Figure 2 a) and b). In this case, the capacity-to-fraction-of-arrival ratio of all the servers is the same, which implies (i) that either all servers will be stable, or all unstable, and (ii) that from the stability viewpoint it is as if all copies received service until completion.

5 Stability condition

5.1 Multi-type job multi-type server system

In this section we discuss the stability condition of the general redundancy system with PS. In order to do so, we first define several sets of subsystems, similar to what we did in the illustrative example of Section 4.

The first subsystem includes all servers, that is, $S_1 = S$. We denote by $\mathcal{L}_1$ the set of servers with highest capacity-to-fraction-of-arrival ratio in the system $S_1 = S$. Thus,
\[ \mathcal{L}_1 = \left\{ s \in S_1 : s = \arg\max_{\tilde{s} \in S_1} \left\{ \frac{\mu_{\tilde{s}}}{\sum_{c \in \mathcal{C}(\tilde{s})} p_c} \right\} \right\}. \]

Figure 3: $K = 4$ server system under redundancy-2. In a) subsystem $S_1$, in b) subsystem $S_2$ and in c) subsystem $S_3$.

For $i = 2, \dots, K$, we define recursively
\[ S_i := S \setminus \bigcup_{l=1}^{i-1} \mathcal{L}_l, \]
\[ \mathcal{C}_i := \{ c \in \mathcal{C} : c \subseteq S_i \}, \]
\[ \mathcal{C}_i(s) := \mathcal{C}_i \cap \mathcal{C}(s), \]
\[ \mathcal{L}_i := \left\{ s \in S_i : s = \arg\max_{\tilde{s} \in S_i} \left\{ \frac{\mu_{\tilde{s}}}{\sum_{c \in \mathcal{C}_i(\tilde{s})} p_c} \right\} \right\}. \]

The $S_i$-subsystem will refer to the system consisting of the servers in $S_i$, with only jobs of types in the set $\mathcal{C}_i$. The set $\mathcal{C}_i(s)$ is the subset of types that are served in server $s$ in the $S_i$-subsystem. We let $\mathcal{C}_1 = \mathcal{C}$. The set $\mathcal{L}_i$ represents the set of servers $s$ with highest capacity-to-fraction-of-arrival ratio in the $S_i$-subsystem, or in other words, the least-loaded servers in the $S_i$-subsystem. Finally, we denote by $i^* := \arg\max_{i = 1, \dots, K} \{ \mathcal{C}_i : \mathcal{C}_i \neq \emptyset \}$ the last index $i$ for which the subsystem $S_i$ is not empty of job types.

Remark 2. We illustrate the above definitions by applying them to the particular example considered in Section 4.
The first subsystem consists of servers $S_1 = S = \{1, 2, 3, 4\}$ and all job types, see Figure 3 a). The capacity-to-fraction-of-arrival ratios in the $S_1$-subsystem are $\{2.2, 3.07, 8.8, 11.1\}$, and thus $\mathcal{L}_1 = \{4\}$. The second subsystem is formed by $S_2 = \{1, 2, 3\}$, and job types that are compatible with server 4 can be ignored, that is, $\mathcal{C}_2 = \{\{1, 2\}, \{1, 3\}, \{2, 3\}\}$, see Figure 3 b). The capacity-to-fraction-of-arrival ratios for servers in the $S_2$-subsystem are given by $\{2.8, 4.4, 13.3\}$, and thus $\mathcal{L}_2 = \{3\}$. The third subsystem consists of servers $S_3 = \{1, 2\}$, and job types that are compatible with servers 3 or 4 can be ignored, that is, $\mathcal{C}_3 = \{\{1, 2\}\}$, see Figure 3 c). The capacity-to-fraction-of-arrival ratios for servers in the $S_3$-subsystem are given by $\{4, 8\}$. Hence, $\mathcal{L}_3 = \{2\}$. Then, $S_4 = \{1\}$, but $\mathcal{C}_4 = \emptyset$, so that $i^* = 3$.

The value of the highest capacity-to-fraction-of-arrival ratio in the $S_i$-subsystem is denoted by
\[ \mathrm{CAR}_i := \max_{\tilde{s} \in S_i} \left\{ \frac{\mu_{\tilde{s}}}{\sum_{c \in \mathcal{C}_i(\tilde{s})} p_c} \right\}, \quad \text{for } i = 1, \dots, i^*. \]
Note that $\mathrm{CAR}_i = \mu_s / \sum_{c \in \mathcal{C}_i(s)} p_c$ for any $s \in \mathcal{L}_i$.

In the following proposition we characterize the stability condition for servers in terms of the capacity-to-fraction-of-arrival ratio corresponding to each subsystem. It states that servers that have highest capacity-to-fraction-of-arrival ratio in subsystem $S_i$ can be stable if and only if all servers in $S_1, \dots, S_{i-1}$ are stable as well. The proof can be found in Section 7.

Proposition 1. For a given $i \le i^*$, servers $s \in \mathcal{L}_i$ are stable if $\lambda < \mathrm{CAR}_l$ for all $l = 1, \dots, i$. Servers $s \in \mathcal{L}_i$ are unstable if there is an $l \in \{1, \dots, i\}$ such that $\lambda > \mathrm{CAR}_l$.

Corollary 2. The redundancy system is stable if $\lambda < \mathrm{CAR}_i$ for all $i = 1, \dots, i^*$. The redundancy system is unstable if there exists an $i \in \{1, \dots, i^*\}$ such that $\lambda > \mathrm{CAR}_i$.

We note that $\mathrm{CAR}_l$, $l = 1, \dots, i$, are not necessarily ordered with respect to $l$.
From the corollary, we hence obtain that the stability region under redundancy is given by
\[ \lambda^R = \min_{i = 1, \dots, i^*} \mathrm{CAR}_i. \tag{4} \]

We now write an equivalent representation of the stability condition (see the Appendix for the proof). Denote by $\mathcal{R}(c)$ the set of servers where type-$c$ jobs achieve maximum capacity-to-fraction-of-arrival ratio, or in other words, the set of least-loaded servers for type $c$:
\[ \mathcal{R}(c) := \{ s : \exists i \text{ s.t. } c \in \mathcal{C}_i(s) \text{ and } s \in \mathcal{L}_i \}. \]
Note that there is a unique subsystem $S_i$ for which this happens, i.e., $\mathcal{R}(c) \subseteq \mathcal{L}_i$ for exactly one $i$. We note that for a type-$c$ job, if the first of the servers in $c$ to be removed is removed in the $i$-th iteration, then $\mathcal{R}(c) \subseteq \mathcal{L}_i$. We further let $\mathcal{R} := \bigcup_{c \in \mathcal{C}} \mathcal{R}(c)$.

Corollary 3. The redundancy system is stable if $\lambda \sum_{c : s \in \mathcal{R}(c)} p_c < \mu_s$ for all $s \in \mathcal{R}$. The redundancy system is unstable if there exists an $s \in \mathcal{R}$ such that $\lambda \sum_{c : s \in \mathcal{R}(c)} p_c > \mu_s$.

From the above corollary, we directly observe that the stability condition for the redundancy system coincides with the stability condition corresponding to $K$ individual servers where each type-$c$ job is only dispatched to its least-loaded servers.

5.2 Particular redundancy structures

In this subsection we discuss the stability condition for some particular cases of redundancy: redundancy-d and nested systems.

Redundancy-d

We focus here on the redundancy-d structure (defined in Section 3) with homogeneous arrivals, i.e., $p_c = 1 / \binom{K}{d}$ for all $c \in \mathcal{C}$.

In case the server capacities are homogeneous, $\mu_k = \mu$ for all $k$, the model fits in the setting of [3], where it was proved to be stable if $\lambda d < \mu K$. This would also follow from Corollary 2: since arrivals are homogeneous, the arrival rate to each server is $\lambda d / K$, thus the capacity-to-fraction-of-arrival ratio at every server is $\mu K / d$. This implies that $\mathcal{L}_1 = S$, $i^* = 1$ and $\mathcal{R}(c) = c$ for all $c \in \mathcal{C}$. From Corollary 2, we obtain that the system is stable if $\lambda d < \mu K$.

For heterogeneous server capacities, which were not studied in [3], we have the following:

Corollary 4.
Under redundancy-d with homogeneous arrivals and $\mu_1 < \dots < \mu_K$, the system is stable if for all $i = d, \dots, K$,
\[ \lambda < \frac{\binom{K}{d}}{\binom{i-1}{d-1}} \mu_i. \]
The system is unstable if there exists $i \in \{d, \dots, K\}$ such that
\[ \lambda > \frac{\binom{K}{d}}{\binom{i-1}{d-1}} \mu_i. \]

In the homogeneous case, it is easy to deduce that the stability region, $\lambda d < \mu K$, shrinks as $d$ increases. However, in the heterogeneous case, both the numerator and denominator are non-monotone functions of $d$, and as a consequence it is not straightforward how the stability condition depends on $d$. This dependence on $d$ will be numerically studied in Section 6.1.

Nested systems

In this section we consider two nested redundancy systems.

5.2.1 N-model

The simplest nested model is the N-model. This is a $K = 2$ server system with capacities $\vec{\mu} = \{\mu_1, \mu_2\}$ and types $\mathcal{C} = \{\{2\}, \{1, 2\}\}$, see Figure 1 b). A job is of type $\{2\}$ with probability $p$ and of type $\{1, 2\}$ with probability $1 - p$. The stability condition is $\lambda < \lambda^R$ where
\[ \lambda^R = \begin{cases} \mu_2, & 0 \le p \le \frac{\mu_2 - \mu_1}{\mu_2}, \\ \mu_1 / (1 - p), & \frac{\mu_2 - \mu_1}{\mu_2} < p \le \frac{\mu_2}{\mu_1 + \mu_2}, \\ \mu_2 / p, & \frac{\mu_2}{\mu_1 + \mu_2} < p \le 1. \end{cases} \]

The above is obtained as follows. The capacity-to-fraction-of-arrival ratios of the system are $\mu_1 / (1 - p)$ and $\mu_2$, respectively for server 1 and server 2. First assume $\mu_1 / (1 - p) > \mu_2$. Then $\mathcal{L}_1 = \{1\}$ and the second subsystem is composed of server $S_2 = \{2\}$ and $\mathcal{C}_2 = \{\{2\}\}$, with arrival rate $\lambda p$ to server 2. Hence the capacity-to-fraction-of-arrival ratio of server 2 in the $S_2$-subsystem is $\mu_2 / p$. From Corollary 2, it follows that $\lambda^R = \min\{\mu_1 / (1 - p), \mu_2 / p\}$. On the other hand, if $\mu_1 / (1 - p) < \mu_2$, then $\mathcal{L}_1 = \{2\}$ and $S_2 = \{1\}$, but $\mathcal{C}_2 = \emptyset$. Thus, $\lambda^R = \mu_2$. Lastly, if $\mu_1 / (1 - p) = \mu_2$, then $\mathcal{L}_1 = \{1, 2\}$, thus $S_2 = \emptyset$ and $\mathcal{C}_2 = \emptyset$. Hence, $\lambda^R = \mu_2$.

We observe that the stability condition $\lambda^R$ is a continuous function of $p$, reaching the maximum value $\lambda^R = \mu_1 + \mu_2$ at $p = \mu_2 / (\mu_1 + \mu_2)$. It thus follows that for $p = \mu_2 / (\mu_1 + \mu_2)$, redundancy achieves the maximum stability condition.
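As a quick consistency check of the N-model formula, the following Python sketch evaluates the stability threshold in two ways: by following the recursive argument (compare the two ratios, then look at the remaining subsystem) and by the piecewise closed form. The function names are hypothetical and chosen for this illustration only; exact fractions avoid rounding issues at the breakpoints.

```python
from fractions import Fraction

INF = float("inf")


def n_model_threshold_recursive(mu1, mu2, p):
    """Threshold of the N-model obtained by following the recursion."""
    mu1, mu2, p = Fraction(mu1), Fraction(mu2), Fraction(p)
    # Ratios in the full system: server 1 serves type {1,2}, server 2 all jobs.
    car1 = mu1 / (1 - p) if p < 1 else INF
    car2 = mu2
    if car1 > car2:
        # Server 1 is removed first; server 2 is left with type {2} only.
        car2_sub = mu2 / p if p > 0 else INF
        return min(car1, car2_sub)
    # Server 2 (or both servers) removed first: no job type remains.
    return car2


def n_model_threshold_closed_form(mu1, mu2, p):
    """Piecewise expression for the same threshold."""
    mu1, mu2, p = Fraction(mu1), Fraction(mu2), Fraction(p)
    if p <= (mu2 - mu1) / mu2:
        return mu2
    if p <= mu2 / (mu1 + mu2):
        return mu1 / (1 - p)
    return mu2 / p


# Both computations agree on a grid of probabilities.
for m1, m2 in [(1, 2), (2, 1), (1, 1)]:
    for k in range(21):
        q = Fraction(k, 20)
        assert (n_model_threshold_recursive(m1, m2, q)
                == n_model_threshold_closed_form(m1, m2, q))

# The maximum mu1 + mu2 is attained at p = mu2 / (mu1 + mu2).
best_p = Fraction(2, 1 + 2)
print(n_model_threshold_closed_form(1, 2, best_p))
```

With capacities 1 and 2, the threshold at $p = 2/3$ equals 3, the sum of the two capacities.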
We note however that in this paper our focus is not on finding the best redundancy probabilities, but instead on whether, given the probabilities $p_c$ (which are determined by the characteristics of the job types and matchings), the system can benefit from redundancy.

5.2.2 W-model

The W-model is a $K = 2$ server system with capacities $\vec{\mu} = \{\mu_1, \mu_2\}$ and types $C = \{\{1\}, \{2\}, \{1,2\}\}$, see Figure 1 c). A job is of type $\{1\}$ with probability $p_{\{1\}}$, of type $\{2\}$ with probability $p_{\{2\}}$ and of type $\{1,2\}$ with probability $p_{\{1,2\}}$. W.l.o.g., assume $(1 - p_{\{2\}})/\mu_1 \ge (1 - p_{\{1\}})/\mu_2$, that is, the load on server 1 is larger than or equal to that on server 2. The stability condition is then given by:

$$\lambda^R = \begin{cases} \mu_2/(1 - p_{\{1\}}), & p_{\{1\}} \le \frac{\mu_1}{\mu_1 + \mu_2}, \\ \mu_1/p_{\{1\}}, & p_{\{1\}} \ge \frac{\mu_1}{\mu_1 + \mu_2}, \end{cases}$$

if $(1 - p_{\{2\}})/\mu_1 > (1 - p_{\{1\}})/\mu_2$. And,

$$\lambda^R = \mu_2/(1 - p_{\{1\}})$$

if $(1 - p_{\{2\}})/\mu_1 = (1 - p_{\{1\}})/\mu_2$. Similar to the N-model, the above can be obtained from Corollary 2. When $p_{\{1\}} = \mu_1/(\mu_1 + \mu_2)$, the maximum stability $\lambda^R = \mu_1 + \mu_2$ is obtained.

6 When does redundancy improve stability

In this section, we compare the stability condition of the general redundancy system to that of the Bernoulli routing. Each job type has its own compatible servers, denoted by $c$. Hence, given the compatible servers and the arrival rates of each type of jobs, we study whether redundancy can improve the stability condition.

From Corollary 2, it follows that $\lambda^R = \min_{i=1,\ldots,i^*} CAR_i$. Together with (3), we obtain the following sufficient and necessary conditions for redundancy to improve the stability condition.

Corollary 5. The stability condition under redundancy is larger than under Bernoulli routing if and only if

$$\min_{i=1,\ldots,i^*,\, s \in L_i} \left\{ \frac{\mu_s}{\sum_{c \in C_i(s)} p_c} \right\} \ge \min_{s \in S} \left\{ \frac{\mu_s}{\sum_{c \in C(s)} \frac{p_c}{|c|}} \right\}.$$

From inspecting the condition of Corollary 5, it is not clear upfront when redundancy would be better than Bernoulli.
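To make the comparison in Corollary 5 concrete, the sketch below evaluates both sides numerically. It assumes the recursive construction behind Corollary 2 works as it is used in the N-model derivation of Section 5.2.1: in each iteration the servers with the largest capacity-to-fraction-of-arrival ratio form $L_i$, and these servers together with every job type that contains one of them are removed. The function names and the dictionary-based input format are hypothetical choices made for this illustration.

```python
import math


def redundancy_limit(mu, p):
    """lambda^R = min_i CAR_i, computed by the recursive removal procedure.

    mu: dict mapping server -> capacity.
    p:  dict mapping job type (iterable of servers) -> probability.
    """
    servers = set(mu)
    types = {frozenset(c): q for c, q in p.items() if q > 0}
    cars = []
    while types:
        # Capacity-to-fraction-of-arrival ratio of each server that still
        # receives copies in the current subsystem.
        ratio = {}
        for s in servers:
            frac = sum(q for c, q in types.items() if s in c)
            if frac > 0:
                ratio[s] = mu[s] / frac
        car = max(ratio.values())
        least_loaded = {s for s, r in ratio.items() if math.isclose(r, car)}
        cars.append(car)
        # Remove L_i and every type with a copy in L_i.
        servers -= least_loaded
        types = {c: q for c, q in types.items() if not (c & least_loaded)}
    return min(cars)


def bernoulli_limit(mu, p):
    """lambda^B = min_s mu_s / sum_{c containing s} p_c / |c|."""
    limits = []
    for s in mu:
        load = sum(q / len(frozenset(c)) for c, q in p.items() if s in c)
        if load > 0:
            limits.append(mu[s] / load)
    return min(limits)


if __name__ == "__main__":
    # W-model with equal type probabilities and capacities (1, 1.2).
    mu = {1: 1.0, 2: 1.2}
    p = {(1,): 1 / 3, (2,): 1 / 3, (1, 2): 1 / 3}
    print(redundancy_limit(mu, p), bernoulli_limit(mu, p))
```

Ties between ratios are detected with `math.isclose`, so that servers with numerically equal ratios are removed in the same iteration.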
In the rest of the section, by applying Corollary 5 to redundancy-d and nested models, we will show that when the capacities of the servers are sufficiently heterogeneous, the stability of redundancy is larger than that of Bernoulli. In addition, numerical computations allow us to conclude that the degree of heterogeneity needed in the servers in order for redundancy to be beneficial decreases in the number of servers, and increases in the number of redundant copies.

6.1 Redundancy-d

In this section, we compare the stability condition of the redundancy-d model with homogeneous arrivals to that of Bernoulli routing. From (3), we obtain that

$$\lambda^B = d \min_{i=1,\ldots,K} \frac{\mu_i}{\sum_{c \in C(i)} p_c} = K \min_{i=1,\ldots,K} \mu_i. \tag{5}$$

From Corollary 4, we obtain that $\lambda^R = \min_{i=d,\ldots,K} \frac{\binom{K}{d}}{\binom{i-1}{d-1}} \mu_i$. The following corollary is straightforward.

Corollary 6. Let $\mu_1 < \ldots < \mu_K$. The system under redundancy-d and homogeneous arrivals has a strictly larger stability condition than the system under Bernoulli routing if and only if

$$K \mu_1 < \min_{i=d,\ldots,K} \left\{ \frac{\binom{K}{d}}{\binom{i-1}{d-1}} \mu_i \right\}.$$

The following is straightforward, since $\binom{i-1}{d-1}$ is increasing in $i$.

Corollary 7. Assume $\mu_1 < \ldots < \mu_K$ and homogeneous arrivals. The system under redundancy-d has a larger stability region than the Bernoulli routing if $\mu_1 d < \mu_d$.

Hence, if there exists a redundancy parameter $d$ such that $\mu_1 d < \mu_d$, then adding $d$ redundant copies to the system improves its stability region. In that case, the stability condition of the system will improve by at least a factor $\mu_d/(d \mu_1)$.

In Table 2, we analyze how the heterogeneity of the server capacities impacts the stability of the system. We chose $\mu_k = \mu^{k-1}$, $k = 1, \ldots, K$, so that the minimum capacity equals 1. Hence, for Bernoulli, $\lambda^B = K$. Under redundancy we have the following: for $\mu = 1$ the system is a redundancy-d system with homogeneous arrivals and server capacities, so that $\lambda^R = K/d$, [3]. Thus, $\lambda^R < \lambda^B$ in that case.
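The closed-form expressions of Corollary 4 and Equation (5) are easy to evaluate numerically. The Python sketch below is an illustration only (the function names are hypothetical); it assumes homogeneous arrivals and sorts the capacities in increasing order before applying the formula.

```python
from math import comb


def redundancy_d_limit(capacities, d):
    """lambda^R = min over i = d..K of C(K, d) / C(i-1, d-1) * mu_i.

    capacities: iterable of server capacities (any order).
    d: number of copies per job, with 1 <= d <= K.
    """
    mu = sorted(capacities)
    K = len(mu)
    if not 1 <= d <= K:
        raise ValueError("d must satisfy 1 <= d <= K")
    # mu[i - 1] is the i-th smallest capacity (1-based index i).
    return min(comb(K, d) * mu[i - 1] / comb(i - 1, d - 1)
               for i in range(d, K + 1))


def bernoulli_d_limit(capacities):
    """lambda^B = K * min_i mu_i under homogeneous arrivals."""
    mu = list(capacities)
    return len(mu) * min(mu)


if __name__ == "__main__":
    base = 1.2  # geometric capacities mu_k = base**(k-1)
    for K, d in ((3, 2), (4, 2), (10, 2), (4, 3)):
        mu = [base ** k for k in range(K)]
        print(K, d, round(redundancy_d_limit(mu, d), 2), bernoulli_d_limit(mu))
```

Running the loop with geometric capacities gives a quick way to explore how the two limits move as $K$, $d$ and the heterogeneity of the capacities change.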
For $\mu > 1$, that is, heterogeneous servers, we can apply Corollary 2 in order to find $\lambda^R$, that is, use Equation (4). More precisely, we recursively create the $i^*$ subsystems and calculate $CAR_i$ for each $i = 1, \ldots, i^*$, so that $\lambda^R = \min_{i=1,\ldots,i^*} CAR_i$. We denote by $\mu^*$ the value of $\mu$ for which the stability region of the redundant system coincides with that of Bernoulli routing, i.e., the value of $\mu$ such that $\lambda^R = \lambda^B$. For $\mu < \mu^*$ (the area on the left-hand side of the thick line in Table 2), Bernoulli has a larger stability region, while for $\mu > \mu^*$ (the area on the right-hand side of the thick line in Table 2), redundancy outperforms Bernoulli.

First, we observe that, for a fixed $d$, $\mu^*$ decreases as $K$ increases, and is always less than $\mu = 2$. Therefore, as the number of servers increases, the level of heterogeneity that is needed in the servers in order to improve the stability under redundancy decreases. Second, for fixed $K$, we also observe that $\mu^*$ increases as $d$ increases. This means that as the number of redundant copies $d$ increases, the server capacities need to be more heterogeneous in order to improve the stability region under redundancy. Finally, focusing on the numbers in bold, we observe that when the number of servers $K$ is large enough and the servers are heterogeneous enough (large $\mu$), the stability region increases in the number of redundant copies $d$.

Table 2: The maximum arrival rates $\lambda^R$ and $\lambda^B$ in a redundancy-d system with homogeneous arrivals and capacities $\mu_k = \mu^{k-1}$.
                  mu = 1   mu = 1.2   mu = 1.4   mu = 2   mu = 3   mu*
K = 3    Red-2    1.5      2.16       2.94       6        9        1.41
         BR       3        3          3          3        3
K = 4    Red-2    2        3.45       5.48       12       18       1.26
         BR       4        4          4          4        4
K = 5    Red-2    2.5      5.18       9.14       20       30       1.19
         BR       5        5          5          5        5
K = 10   Red-2    5        22.39      41.16      90       135      1.08
         BR       10       10         10         10       10
K = 4    Red-3    1.33     2.30       3.65       10.66    36       1.44
         BR       4        4          4          4        4
K = 5    Red-3    1.66     3.45       6.40       26.66    90       1.31
         BR       5        5          5          5        5
K = 10   Red-3    3.33     17.19      60.23      320      1080     1.13
         BR       10       10         10         10       10

In Table 3, we consider linearly increasing capacities on the interval $[1, M]$, that is, $\mu_k = 1 + \frac{M-1}{K-1}(k-1)$, for $k = 1, \ldots, K$. In the area on the right-hand side of the thick line, redundancy outperforms Bernoulli. For this specific system, the following corollary is straightforward.

Table 3: The maximum arrival rates $\lambda^R$ and $\lambda^B$ in a redundancy-d system with homogeneous arrivals and capacities $\mu_k = 1 + \frac{M-1}{K-1}(k-1)$.

                  M = 1   M = 2   M = 3   M = 4   M = 6
K = 3    Red-2    1.5     3       4.5     6       9
         BR       3       3       3       3       3
K = 4    Red-2    2       4       6       8       12
         BR       4       4       4       4       4
K = 5    Red-2    2.5     5       7.5     10      15
         BR       5       5       5       5       5
K = 10   Red-2    5       10      15      20      30
         BR       10      10      10      10      10
K = 4    Red-3    1.33    2.66    4       5.33    8
         BR       4       4       4       4       4
K = 5    Red-3    1.66    3.33    5       6.66    10
         BR       5       5       5       5       5
K = 10   Red-3    3.33    6.66    10      13.33   20
         BR       10      10      10      10      10

Corollary 8. Under a redundancy-d system with homogeneous arrivals and capacities $\mu_k = 1 + \frac{M-1}{K-1}(k-1)$, for $k = 1, \ldots, K$, the redundancy system has stability condition $\lambda^R = \frac{MK}{d}$, for $d > 1$, while $\lambda^B = K$. Hence, the redundancy system outperforms the stability condition of the Bernoulli routing if and only if $M \ge d$.

Simple qualitative rules can be deduced. If $M \ge d$, redundancy is a factor $M/d$ better than Bernoulli. Hence, increasing $M$, that is, the heterogeneity among the servers, is significantly beneficial for the redundancy system. However, the stability condition of the redundancy system degrades as the number of copies $d$ increases.

6.2 Nested systems

6.2.1 N-model

The stability condition of the N-model with Bernoulli routing is given by the following expression:

$$\lambda^B = \begin{cases} 2 \min\{\mu_1, \mu_2\}, & \text{if } p = 0, \\ 2\mu_1/(1-p), & \text{if } 0 \le p \le \frac{\mu_2 - \mu_1}{\mu_1 + \mu_2}, \\ 2\mu_2/(1+p), & \text{if } \frac{\mu_2 - \mu_1}{\mu_1 + \mu_2} < p \le 1. \end{cases}$$

The above set of conditions is obtained from the fact that under Bernoulli routing, $\lambda^B = \min\{2\mu_1/(1-p),\ \mu_2/(p + \frac{1}{2}(1-p))\}$. Note that $\lambda^B$ is a continuous function with a maximum $\mu_1 + \mu_2$ at the point $p = \frac{\mu_2 - \mu_1}{\mu_1 + \mu_2}$. Now, comparing $\lambda^B$ to $\lambda^R$ as obtained in Section 5.2.1 leads to the following:

Corollary 9. Under an N-model, the stability condition under redundancy is larger than under Bernoulli routing under the following conditions:
If $\mu_2 \le \mu_1$, then $p \in \left( \frac{2\mu_2 - \mu_1}{2\mu_2 + \mu_1}, 1 \right)$.
If $\mu_2 > \mu_1$, then $p \in \left( 0, \left( \frac{\mu_2 - 2\mu_1}{\mu_2} \right)^+ \right) \cup \left( \frac{2\mu_2 - \mu_1}{2\mu_2 + \mu_1}, 1 \right)$.

From the above we conclude that if $\mu_1$ is larger than $2\mu_2$, then redundancy is always better than Bernoulli, independent of the arrival rates of job types. For the case $\mu_2 > \mu_1$, we observe that for $\mu_2$ large enough, redundancy will outperform Bernoulli.

6.2.2 W-based nested systems

We consider the following structure of nested systems: W (see Figure 1 c)), WW (Figure 1 d)) and WWWW. The latter is a $K = 8$ server system that is composed of 2 WW models and an additional job type $c = \{1, \ldots, 8\}$ for which all servers are compatible. For all three models, we assume that a job is of type $c$ with probability $p_c = 1/|C|$.

In Table 4, we analyze how heterogeneity in the server capacities impacts the stability. First of all, note that $\lambda^B = K$. For redundancy, the value of $\lambda^R$ is given by (4), which depends on the server capacities. In the table, we present these values for different values of the server capacities.

In the upper part of the table, we let $\mu_k = \mu^{k-1}$ for $k = 1, \ldots, K$. We denote by $\mu^*$ the value of $\mu$ for which $\lambda^R = \lambda^B$. We observe that as the number of servers doubles, $\mu^*$ decreases, and is always smaller than 1.5. Hence, as the number of servers increases, the level of heterogeneity that is needed in order for redundancy to outperform Bernoulli decreases too.

In the second part of the table we assume $\mu_k = 1 + \frac{M-1}{K-1}(k-1)$ for $k = 1, \ldots, K$.
We observe that when $M \ge K$ the stability condition under redundancy equals $\lambda^R = |C|$, which is always larger than $\lambda^B = K$. However, as the number of servers increases, the maximum capacity of the servers, $M$, needs to increase in order for redundancy to outperform Bernoulli.

Table 4: The maximum arrival rates $\lambda^R$ and $\lambda^B$ in nested systems.

mu_k = mu^(k-1)           mu = 1   mu = 1.2   mu = 1.4   mu = 2   mu*
K = 2    W-model          1.5      1.8        2.10       3        1.33
         BR               2        2          2          2
K = 4    WW-model         2.33     4.03       4.90       7        1.19
         BR               4        4          4          4
K = 8    WWWW-model       3.75     8.64       10.5       15       1.17
         BR               8        8          8          8

mu_k = 1 + (M-1)(k-1)/(K-1)   M = 1   M = 2   M = 4   M = 6   M = 8
K = 2    W-model              1.5     3       3       3       3
         BR                   2       2       2       2       2
K = 4    WW-model             2.33    4.66    7       7       7
         BR                   4       4       4       4       4
K = 8    WWWW-model           3.75    7.14    10.71   12.85   15
         BR                   8       8       8       8       8

7 Proof of Proposition 1

In this section, we prove that the condition in Proposition 1 is sufficient and necessary for the respective subsystem to be stable. As we observe in Section 4, there are two main issues concerning the evolution of redundancy systems with heterogeneous capacities. First of all, the number of copies in a particular server decreases only if a certain subset of servers is already in steady state. Secondly, for a particular server $s \in S$, the instantaneous departure rate of that server might be larger than $\mu_s$ due to copies leaving in servers other than $s$. This makes the dynamics of the system complex. In order to prove Proposition 1, we therefore construct upper and lower bounds of our system for which the dynamics are easier to characterize. Proving that the upper bound (lower bound) is stable (unstable) directly implies that the original system is also stable (unstable). This will be done in Proposition 12 and Proposition 15. All proofs of this section can be found in Appendix B.

Sufficient stability condition

We define the Upper Bound (UB) system as follows. Upon arrival, each job is with probability $p_c$ of type $c$ and sends identical copies to all servers $s \in c$.
In the UB system, a type-$c$ job departs the system only when all copies in the set of servers $R(c)$ are fully served. We recall that the set $R(c)$ denotes the set of servers where a type-$c$ job achieves the maximum capacity-to-fraction-of-arrivals ratio. When this happens, the remaining copies that are still in service (necessarily not in a server in $R(c)$) are immediately removed from the system. We denote by $N_c^{UB}(t)$ the number of type-$c$ jobs present in the UB system at time $t$.

We note that the UB system is closely related to the one in which copies of type-$c$ jobs are only sent to servers in $R(c)$. However, the latter system is of no use for our purposes as it is neither an upper bound nor a lower bound of the original system.

We can now show the first implication of Proposition 1, that is, we prove that $\lambda < CAR_l$, for all $l = 1, \ldots, i$, implies stability of the servers in the set $L_i$. We do this by analyzing the UB system, for which stability of the servers $L_i$ follows intuitively as follows. Given a server $s \in L_1$ and any type $c \in C(s)$, it holds that $R(c) \subseteq L_1(c)$. Hence, a server in $L_1$ will need to fully serve all arriving copies. Therefore each server $s$, with $s \in L_1$, behaves as an M/G/1 PS queue, which is stable if and only if its arrival rate of copies, $\lambda \sum_{c \in C(s)} p_c$, is strictly smaller than its departure rate, $\mu_s$.

Assume now that for all $l = 1, \ldots, i-1$ the subsystems $S_l$ are stable and we want to show that servers in $L_i$ are stable as well. First of all, note that in the fluid limit, all types $c$ that do not exist in the $S_i$-subsystem, i.e., $c \notin C_i(s)$, will after a finite amount of time equal (and remain) zero, since they are stable. For the remaining types $c$ that have copies in server $s \in L_i$, i.e., $s \in c$ with $s \in L_i$, it will hold that their servers with maximum capacity-to-fraction-of-arrivals ratio are $R(c) \subseteq L_i$. Due to the characteristics of the upper-bound system, all copies sent to these servers will need to be served.
Hence, a server $s \in L_i$ behaves in the fluid limit as an M/G/1 PS queue with arrival rate $\lambda \sum_{c \in C_i(s)} p_c$ and departure rate $\mu_s$. In particular, such a queue is stable if and only if $\lambda \sum_{c \in C_i(s)} p_c < \mu_s$.

Proposition 10. For $i \le i^*$, the set of servers $s \in L_i$ in the UB system is stable if $\lambda < CAR_l$, for all $l = 1, \ldots, i$.

In the following, we prove that UB provides an upper bound on the original system. To do so, we show that every job departs earlier in the original system than in the UB system. In the statement, we assume that in case a job has already departed in the original system, but not in the UB system, then its attained service in all its servers in the original system is set equal to its service requirement $b_{cj}$.

Proposition 11. Assume $N_c(0) = N_c^{UB}(0)$ and $a_{cjs}(0) = a_{cjs}^{UB}(0)$, for all $c, j, s$. Then, $N_c(t) \le N_c^{UB}(t)$ and $a_{cjs}(t) \ge a_{cjs}^{UB}(t)$, for all $c, j, s$ and $t \ge 0$.

Together with Proposition 10, we obtain the following result for the original system.

Proposition 12. For a given $i \le i^*$, servers $s \in L_i$ are stable if $\lambda < CAR_l$, for all $l = 1, \ldots, i$.

Remark 3. In [3], the authors show that for the redundancy-d system with homogeneous arrivals and server capacities, the system where all the copies need to be served is an upper bound. We note that this upper bound coincides with our upper bound (in that case $L_1 = S$). Nevertheless, the proof approach is different. In [3], see also [28], the proof followed directly, as each server in the upper bound system behaved as an M/G/1 PS queue. In the heterogeneous server setting studied here, the latter is no longer true. Instead, it does apply recursively when considering the fluid regime: in order to see a server as a PS queue in the fluid regime, one first needs to argue that the types that have copies in servers with a higher capacity-to-fraction-of-arrivals ratio are 0 at a fluid scale.

Remark 4.
We note that the light-tail assumption on the service time distribution, see Section 3, is an assumption needed in order to prove Lemma 18 (see Appendix B for more details).

Necessary stability condition

In this section we prove the necessary stability condition of Proposition 1. Let us first define

$$\iota := \min\{ l = 1, \ldots, i^* : \lambda > CAR_l \}.$$

We note that for any $i < \iota$, $\lambda < CAR_i$. Hence, the servers in $L_i$, with $i < \iota$, are stable, see Proposition 10. We are left to prove that the servers in $S_\iota$ cannot be stable. In order to do so, we construct a lower-bound system.

In the $S_\iota$ subsystem, the capacity-to-fraction-of-arrivals ratios are such that for all $s \in S_\iota$, $\mu_s / (\sum_{c \in C_\iota(s)} p_c) \le CAR_\iota$. We will construct a lower bound (LB) system in which the resulting capacity-to-fraction-of-arrivals ratio is $CAR_\iota$ for all servers $s \in S_\iota$. We use the superscript LB in the notation to refer to this system, which is defined as follows. First of all, we only want to focus on the $S_\iota$ system, hence, we set the arrival rate $p_c^{LB} = 0$ for types $c \in C \setminus C_\iota$, whereas the arrival rates for types $c \in C_\iota$ remain unchanged, i.e., $p_c^{LB} = p_c$. The capacity of servers $s \in S_\iota$ in the LB-system is set to

$$\mu_s^{LB} := \mu_{\tilde{s}} \frac{\sum_{c \in C_\iota(s)} p_c}{\sum_{c \in C_\iota(\tilde{s})} p_c} = CAR_\iota \cdot \Big( \sum_{c \in C_\iota(s)} p_c \Big),$$

where $\tilde{s} \in L_\iota$. Additionally, in the LB-system, we assume that each copy of a type-$c$ job receives the same amount of capacity, which is equal to the highest value of $\mu_s^{LB}/M_s^{LB}(t)$, $s \in c$. We therefore define the service rate for a job of type $c$ by

$$\phi_c^{LB}(\vec{N}^{LB}(t)) := \max_{s \in c} \left\{ \frac{\mu_s^{LB}}{M_s^{LB}(t)} \right\}, \tag{6}$$

where $c \in C_\iota$ (instead of $\phi_s(\cdot)$ for a copy in server $s$ in the original system). The cumulative amount of capacity that a type-$c$ job receives is

$$\eta_c^{LB}(v, t) := \int_{x=v}^{t} \phi_c^{LB}(\vec{N}^{LB}(x))\, dx, \quad \text{for } c \in C_\iota.$$

Proposition 13. In the LB-system, the set of servers $s \in S_\iota$ is unstable if $\lambda > CAR_\iota$.

We now prove that LB is a lower bound for the original system.

Proposition 14. Assume $N_c(0) = N_c^{LB}(0)$, for all $c$. Then, $N_c(t) \ge_{st} N_c^{LB}(t)$, for all $c \in C$ and $t \ge 0$.
Combining Proposition 13 with Proposition 14, we obtain the following result for the original system.

Proposition 15. Servers $s \in S_\iota$ are unstable if there is an $l = 1, \ldots, i^*$ such that $\lambda > CAR_l$.

Remark 5. In the special case of redundancy-d with homogeneous arrivals and server capacities, [3] used a lower bound that consisted in modifying the service rate obtained per job type, as in (6). This lower bound coincides with our lower bound, since with homogeneous arrivals and servers it holds that $\mu_s^{LB} = \mu_s = \mu$. The difficulty when studying heterogeneous servers in a general redundancy structure, as we do in this paper, lies in the fact that the load received in each server is different. In order to show that the fluid limit of the server with the minimum number of copies is increasing (in the lower bound), we need to adequately modify the server capacities in order to make sure that the capacity-to-fraction-of-arrival ratios in all servers are equal.

8 Numerical analysis

We have implemented a simulator in order to assess the impact of redundancy. In particular, we evaluate the following:

• For PS servers, we numerically compare the performance of redundancy with Bernoulli routing (in Section 6 this was done analytically for the stability conditions).

• We compare redundancy to the Join the Shortest Queue (JSQ) policy, according to which each job is dispatched to the compatible server that has the least number of jobs (ties are broken at random). In a recent paper, [7], it was shown that JSQ, with exponential service time distributions and combined with size-unaware scheduling disciplines such as FCFS, ROS or PS, is maximum stable, i.e., if there exists a static dispatching policy that achieves stability, so will JSQ.

• We compare the performance between PS, FCFS and Random Order of Service (ROS), when the service time distribution is exponential and bounded Pareto.
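The two benchmark dispatching rules compared against redundancy can be stated in a few lines. The sketch below is not the simulator used for the experiments; it is a minimal Python illustration with hypothetical function names, assuming queue lengths are stored in a dictionary keyed by server.

```python
import random


def jsq_dispatch(queue_lengths, compatible, rng=random):
    """Join the Shortest Queue among the compatible servers.

    queue_lengths: dict mapping server -> current number of jobs.
    compatible: sequence of servers compatible with the arriving job.
    Ties are broken uniformly at random.
    """
    shortest = min(queue_lengths[s] for s in compatible)
    candidates = [s for s in compatible if queue_lengths[s] == shortest]
    return rng.choice(candidates)


def bernoulli_dispatch(compatible, rng=random):
    """Bernoulli routing: one compatible server chosen uniformly at random."""
    return rng.choice(list(compatible))


def redundancy_dispatch(compatible):
    """Redundancy: a copy is sent to every compatible server."""
    return list(compatible)


if __name__ == "__main__":
    lengths = {1: 3, 2: 1, 3: 1}
    print(jsq_dispatch(lengths, [1, 2]))        # server 2 is strictly shorter
    print(jsq_dispatch(lengths, [2, 3]))        # tie, either server may be returned
    print(bernoulli_dispatch([1, 2, 3]))
    print(redundancy_dispatch([1, 2]))
```

Note that JSQ needs the current queue lengths of all compatible servers at every arrival, whereas Bernoulli routing and redundancy need no state information.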
Our simulations consider a large number of busy periods (10 ), so that the variance and confidence intervals of the mean number of jobs in the system are sufficiently small.

Exponential service time distributions: In Figure 4 we consider the W-model with exponential service time distributions. We set $p_{\{1\}} = 0.35$ and $p_{\{2\}} + p_{\{1,2\}} = 0.65$, and vary the value of $p_{\{1,2\}}$. We consider either $\vec{\mu} = (1, 2)$ or $\vec{\mu} = (2, 1)$. The only redundant job type is $\{1,2\}$, thus as $p_{\{1,2\}}$ increases, we can observe how increasing the fraction of redundant jobs affects the performance. We also note that when $p_{\{1,2\}}$ increases, the load in server 1 increases as well, whereas the load in server 2 stays constant.

In Figure 4 a) and b) we depict the mean number of jobs under redundancy, Bernoulli routing and JSQ when the server policy is PS. In Figure 4 c) we plot $\lambda^R$, $\lambda^B$ and $\lambda^J$ using the analysis of Section 5.2.2 and [7], respectively.

We observe from Figure 4 a) and b) that when $\vec{\mu} = (1, 2)$, redundancy performs better than Bernoulli routing. This difference becomes larger as $p_{\{1,2\}}$ increases. This is due to the fact that the redundancy policy does better in exploiting the larger capacity of server 2 than Bernoulli, which becomes more important as $p_{\{1,2\}}$ increases. In addition, we note that for redundancy, Bernoulli and JSQ, the mean number of jobs increases as $p_{\{1,2\}}$ increases. The reason for this is that as $p_{\{1,2\}}$ increases, the load on server 1 increases. Since server 1 is the slow server, this increases the mean number of jobs.

In the opposite case, i.e., $\vec{\mu} = (2, 1)$, the mean number of jobs is non-increasing in $p_{\{1,2\}}$. This is because as $p_{\{1,2\}}$ increases, the load on server 1 increases. Since server 1 is now the fast server, this has a positive effect on the performance (decreasing mean number of jobs). However, as $p_{\{1,2\}}$ gets larger, the additional load (created by the copies) can negatively impact the performance.
This happens for $\lambda = 2$, where the mean number of jobs under redundancy is a U-shaped function. We furthermore observe that in the $\vec{\mu} = (2, 1)$ case, redundancy outperforms Bernoulli for any value of $p_{\{1,2\}}$ when $\lambda = 1.5$. However, when $\lambda = 2$, Bernoulli outperforms redundancy when $p_{\{1,2\}} > 0.49$. This is due to the additional load, generated under redundancy, that becomes more pronounced as $p_{\{1,2\}}$ becomes larger.

We also observe in Figure 4 that under both $\vec{\mu} = (1, 2)$ and $\vec{\mu} = (2, 1)$, JSQ outperforms redundancy. For small values of $p_{\{1,2\}}$ the difference is rather small, however it becomes larger as $p_{\{1,2\}}$ increases due to the additional load that redundancy creates. Note, however, that this improvement does not come for free, as JSQ requires precise information of the queue lengths at all times.

In Figure 4 c), we observe that redundancy consistently has a larger stability region than Bernoulli in the $\vec{\mu} = (1, 2)$ case and for $p_{\{1,2\}} \in [0, 0.5)$ in the $\vec{\mu} = (2, 1)$ case. We let $\lambda^J$ be the value of $\lambda$ such that JSQ is stable if $\lambda < \lambda^J$ and unstable if $\lambda > \lambda^J$. Using [7],

$$\lambda^J = \max \left\{ \min_{s} \frac{\mu_s}{\sum_{c} p_{c,s}} : p_{c,s} \ge 0,\ \sum_{s} p_{c,s} = p_c \right\}.$$

We observe that the stability condition under redundancy coincides on a large region with that of JSQ, which, in view of the results of [7], implies that redundancy is in that region maximum stable.

In Figure 5 we simulate the performance of the W-model for different values of $\mu_2$, while keeping fixed $\vec{p} = (p_{\{1\}}, p_{\{2\}}, p_{\{1,2\}})$ and $\mu_1 = 1$. In Figure 5 a) we plot the mean number of jobs and we see that for both configurations of $\vec{p}$, the performance of redundancy with PS, Bernoulli and JSQ improves as $\mu_2$ increases. The gap between redundancy and Bernoulli is significant in both cases. The reason can be deduced from Figure 5 b), where we plot $\lambda^R$, $\lambda^B$, and $\lambda^J$ with respect to $\mu_2$. We observe in Figure 5 a) that redundancy and JSQ converge to the same performance as $\mu_2$ grows large.
Intuitively, we can explain this by observing that for very large values of $\mu_2$, with both redundancy and JSQ, all jobs of type $\{1,2\}$ get served in server 2. We observe in Figure 5 b) that the stability conditions with redundancy and JSQ are very similar.

Figure 4: W-model with $p_{\{1\}} = 0.35$, $p_{\{2\}} = 1 - p_{\{1\}} - p_{\{1,2\}}$. a) and b) depict the mean number of jobs under redundancy with PS, Bernoulli routing and JSQ for $\lambda = 1.5$ and $\lambda = 2$, respectively. c) depicts the stability regions $\lambda^R$, $\lambda^B$ and $\lambda^J$.

General service time distributions: In Figure 6 a) we investigate the performance of redundancy with PS for several non-exponential distributions. In particular, we consider the following distributions for the service times: deterministic, hyperexponential, and bounded Pareto. With the hyperexponential distribution, job sizes are exponentially distributed with parameter $\mu_1$ ($\mu_2$) with probability $q$ ($1-q$). For the bounded Pareto distribution, the cumulative distribution function is $\frac{1 - (k/x)^{\alpha}}{1 - (k/\tilde{q})^{\alpha}}$, for $k \le x \le \tilde{q}$. We choose the parameters so that the mean service time equals 1. Namely, the parameters for the hyperexponential distribution are $q = 0.2$, $\mu_1 = 0.4$ and $\mu_2 = 1.6$, and those for the bounded Pareto distribution are $\alpha = 0.5$, $\tilde{q} = 6$ and $k = 1/\tilde{q}$.

In Figure 6 a), we plot the mean number of jobs as a function of $\lambda$ for the N, W, WW, redundancy-2 ($K = 5$) and redundancy-4 ($K = 5$) models. The respective parameters $\vec{p}$ are chosen such that the system is stable for the simulated arrival rates. We observe that for the five systems, performance seems to be nearly insensitive to the service time distribution, beyond its mean value.

Markov-modulated capacities: In Figure 6 b) we consider a variation of our model where servers' capacities fluctuate over time. More precisely, we assume that each server has an exponential clock, with mean . Every time the clock rings, the server samples a new value for $S$ from Dolly(1,12), see Table 5, and sets its capacity equal to $1/S$.
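The capacity resampling step can be sketched as follows, using the slowdown probabilities listed in Table 5. This is an illustration and not the simulator behind Figure 6; the names `DOLLY_VALUES`, `DOLLY_PROBS`, `sample_capacity` and `next_ring_time` are hypothetical, and the mean of the exponential clock is left as a parameter `mean_clock` because its value is not assumed here.

```python
import random

# Slowdown values S and their probabilities, as listed in Table 5.
DOLLY_VALUES = list(range(1, 13))
DOLLY_PROBS = [0.23, 0.14, 0.09, 0.03, 0.08, 0.10,
               0.04, 0.14, 0.12, 0.021, 0.007, 0.002]


def sample_capacity(rng):
    """Draw a slowdown S from Dolly(1,12) and return the capacity 1/S."""
    slowdown = rng.choices(DOLLY_VALUES, weights=DOLLY_PROBS, k=1)[0]
    return 1.0 / slowdown


def next_ring_time(now, mean_clock, rng):
    """Time of the next capacity change for an exponential clock.

    mean_clock is a free parameter of this sketch (assumption).
    """
    return now + rng.expovariate(1.0 / mean_clock)


if __name__ == "__main__":
    rng = random.Random(42)
    t, capacity = 0.0, sample_capacity(rng)
    for _ in range(5):
        t = next_ring_time(t, mean_clock=1.0, rng=rng)
        capacity = sample_capacity(rng)
        print(round(t, 3), round(capacity, 3))
```

Each server would run its own independent clock; between two rings its capacity stays constant.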
The Dolly(1,12) distribution is a 12-valued discrete distribution that was empirically obtained by analyzing traces in Facebook and Microsoft clusters, see [1, 12]. In Figure 6 b) we plot the mean number of jobs for a $K = 5$ server system with redundancy-2 and redundancy-4, and for the W-model under redundancy, and we compare it with Bernoulli routing. Arrival rates are equal for all classes. It can be seen that with Bernoulli routing, both redundancy-2 and redundancy-4 become equivalent systems, and hence their respective curves overlap. The general observation is that in this setting with identical servers, Bernoulli routing performs better than redundancy. Further research is needed to understand whether with heterogeneous Markov-modulated servers, redundancy can be beneficial.

Table 5: The Dolly(1,12) empirical distribution for the slowdown [1]. The capacity is set to $1/S$.

S      1      2      3      4      5      6      7      8      9      10      11      12
Prob   0.23   0.14   0.09   0.03   0.08   0.10   0.04   0.14   0.12   0.021   0.007   0.002

Figure 5: W-model with fixed parameters $\vec{p}$ and $\mu_1 = 1$: a) depicts the mean number of jobs under redundancy, Bernoulli routing and JSQ, and b) depicts the stability regions $\lambda^R$ and $\lambda^B$.

Figure 6: Mean number of jobs in the system with respect to $\lambda$: a) Non-exponential service times for the N, W, WW, redundancy-2 ($K = 5$) and redundancy-4 ($K = 5$) models. We chose $\vec{\mu} = (1, 2)$ for the N and W models, $\vec{\mu} = (1, 2, 4, 6)$ for the WW model, and $\vec{\mu} = (1, 2, 4, 6, 8)$ for redundancy-d. b) Markov-modulated server capacities in the W, redundancy-2 ($K = 5$) and redundancy-4 ($K = 5$) models.

FCFS and ROS scheduling discipline: The stability condition under FCFS or ROS and identical copies is not known. An exception is the redundancy-d model with homogeneous arrivals and server capacities, for which [3] characterizes the stability condition under ROS, FCFS and PS. There it was shown that ROS is maximum stable, i.e., the stability condition is $\lambda < \mu K$, and that under FCFS the stability condition is $\lambda < \ell \mu$, where $\ell$ is the mean number of jobs in service in a so-called associated saturated system. In addition, it was shown that for this specific setting, the stability region under PS is smaller than under FCFS and ROS.

In Figure 7 a) and b) we consider a W-model and compare the performance for the different policies PS, FCFS and ROS. We take exponentially distributed service times. We plot the mean number of jobs with respect to $p_{\{1,2\}}$, with $p_{\{1\}} = 0.35$ and $p_{\{2\}} = 1 - p_{\{1\}} - p_{\{1,2\}}$. In Figure 7 a) we set $\lambda = 1$, and in Figure 7 b) we set $\lambda = 2$. The stability condition under PS is given in Figure 4 c).

In the case of $\vec{\mu} = (1, 2)$, we observe that FCFS always outperforms ROS. Intuitively we can explain this as follows. Since $p_{\{1\}}$ is kept fixed, as $p_{\{1,2\}}$ increases, the load in server 1 increases. With FCFS, it is more likely that both servers work on the same copy, and hence that the fast server 2 "helps" the slow server 1 (with high load). With ROS however, both servers tend to work on different copies, and the loaded slow server 1 will take a long time serving copies that could have been served faster in the fast server 2. On the other hand, with $\vec{\mu} = (2, 1)$ and sufficiently large $p_{\{1,2\}}$, ROS outperforms FCFS. In this case, the loaded server 1 is the fast server, and hence having both servers working on the same copy becomes inefficient, which explains why the performance under ROS becomes better. As a rule of thumb, it seems that for a redundancy model, if slow servers are highly loaded, then FCFS is preferable, but if fast servers are highly loaded, then ROS is preferable.

From Figures 7 a) and b) we further observe that for all values of $p_{\{1,2\}}$, FCFS and ROS outperform PS, and that the gap increases when $\lambda$ increases.
In Figure 7 c) we consider exponential and bounded Pareto (with $\alpha = 0.5$ and $\tilde{q} = 15$) service time distributions and plot the mean number of jobs for different values of $\mu_2$, when $\lambda = 1.5$, $\vec{p} = (0.35, 0.4, 0.25)$ and $\mu_1 = 2$. As before, with exponentially distributed service times, FCFS and ROS slightly outperform PS. In the case where jobs have bounded Pareto distributed service times, PS outperforms both FCFS and ROS. This seems to indicate that as the variability of the service time distribution increases, PS might become a preferable choice over FCFS and ROS in redundancy systems. Additionally, under PS we observe that the mean number of jobs is nearly insensitive to the service time distribution.

The main insight we obtain from Figure 7 is that the stability and performance of heterogeneous redundancy systems strongly depend on the employed service policy in the servers. We leave the stability analysis of other scheduling policies (such as FCFS or ROS) for future work as they require a different proof approach.

Figure 7: Mean number of jobs with redundancy combined with PS, FCFS, and ROS. a) and b) For the W-model with respect to $p_{\{1,2\}}$ and exponentially distributed service times, with $p_{\{1\}} = 0.35$ and $p_{\{2\}} = 1 - p_{\{1\}} - p_{\{1,2\}}$; a) $\lambda = 1$, b) $\lambda = 2$. c) For the W-model under exponentially and bounded Pareto ($\alpha = 0.5$, $\tilde{q} = 15$) distributed service times, with respect to $\mu_2$, for $\lambda = 1.5$, $\vec{p} = (0.35, 0.40, 0.25)$ and $\mu_1 = 2$.

9 Conclusion

With exponentially distributed jobs and i.i.d. copies, it has been shown that redundancy does not reduce the stability region of a system, and that it improves the performance. This happens in spite of the fact that redundancy necessarily implies a waste of computation resources in servers that work on copies that are canceled before being fully served. The modeling assumptions thus play a crucial role, and as argued in several papers, e.g. [12], the i.i.d.
assumption might lead to insights that are qualitatively wrong.

In the present work, we consider the more realistic situation in which copies are identical, and the service times are generally distributed. We have shown that redundancy can help improve the performance in case the server capacities are sufficiently heterogeneous. To the best of our knowledge, this is the first positive result on redundancy with identical copies, and it illustrates that the negative result proven in [3] critically depends on the fact that the capacities were homogeneous.

We thus believe that our work opens the avenue for further research to understand when redundancy is beneficial in other settings. For instance, it would be interesting to investigate what happens in case servers implement other scheduling policies. It is also important to consider other cross-correlation structures for the copies, in particular the S&X model recently proposed in the literature. Another interesting situation is when the capacities of the servers fluctuate over time. Another possible extension is to consider the cancel-on-start variant of redundancy, in which as soon as one copy enters service, all the others are removed. For conciseness, in this paper we have restricted ourselves to what we considered one of the most basic, yet interesting and relevant settings.

Acknowledgments

The PhD project of E. Anton is funded by the French "Agence Nationale de la Recherche (ANR)" [Project ANR-15-CE25-0004 (ANR JCJC RACON)]. This work (in particular, research visits of E. Anton and M. Jonckheere) was partially funded by a STIC AMSUD GENE project. U. Ayesta received funding from the Department of Education of the Basque Government through the Consolidated Research Group MATHMODE (IT1294-19).

References

[1] Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica. 2013. Effective Straggler Mitigation: Attack of the Clones. In NSDI, Vol. 13. 185–198.
[2] Ganesh Ananthanarayanan, Srikanth Kandula, Albert G. Greenberg, Ion Stoica, Yi Lu, Bikas Saha, and Edward Harris. 2010. Reining in the Outliers in Map-Reduce Clusters using Mantri. In OSDI’10 Proceedings of the 9th USENIX conference on Operating systems design and implementation. 265–278.
[3] Elene Anton, Urtzi Ayesta, Matthieu Jonckheere, and Ina Maria Verloop. 2020. On the stability of redundancy models. To appear in Operations Research (2020).
[4] Soeren Asmussen. 2002. Applied Probability and Queues. Springer.
[5] Thomas Bonald and Céline Comte. 2017. Balanced fair resource sharing in computer clusters. Performance Evaluation 116 (2017), 70–83.
[6] Maury Bramson. 2008. Stability of Queueing Networks. Springer.
[7] James Cruise, Matthieu Jonckheere, and Seva Shneer. 2020. Stability of JSQ in queues with general server-job class compatibilities. Queueing Systems 95 (2020), 271–279.
[8] Jim G. Dai. 1996. A fluid limit model criterion for instability of multiclass queueing networks. The Annals of Applied Probability 6 (1996), 751–757.
[9] Jeffrey Dean and Luiz André Barroso. 2013. The tail at scale. Commun. ACM 56, 2 (2013), 74–80.
[10] Regina Egorova. 2009. Sojourn time tails in processor-sharing systems. Ph.D. Dissertation. Technische Universiteit Eindhoven.
[11] Sergey Foss, Dmitry Korshunov, and Stan Zachary. 2013. An introduction to heavy-tailed and subexponential distributions (2nd ed.). Springer.
[12] Kristen Gardner, Mor Harchol-Balter, Alan Scheller-Wolf, and Benny van Houdt. 2017. A Better Model for Job Redundancy: Decoupling Server Slowdown and Job Size. IEEE/ACM Transactions on Networking 25, 6 (2017), 3353–3367.
[13] Kristen Gardner, Esa Hyytiä, and Rhonda Righter. 2019. A Little Redundancy Goes a Long Way: Convexity in Redundancy Systems. Performance Evaluation (2019).
[14] Kristen Gardner, Samuel Zbarsky, Sherwin Doroudi, Mor Harchol-Balter, Esa Hyytiä, and Alan Scheller-Wolf. 2016.
Queueing with redundant requests: exact analysis. Queueing Systems 83, 3-4 (2016), 227–259.
[15] H. Christian Gromoll, Philippe Robert, and Bert Zwart. 2008. Fluid Limits for Processor Sharing Queues with Impatience. Math. Oper. Res. 33 (2008), 375–402.
[16] Mor Harchol-Balter. 2013. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press.
[17] Tim Hellemans, Tejas Bodas, and Benny van Houdt. 2019. Performance Analysis of Workload Dependent Load Balancing Policies. POMACS 3, 2 (2019), 35:1–35:35.
[18] Tim Hellemans and Benny van Houdt. 2018. Analysis of redundancy(d) with identical Replicas. Performance Evaluation Review 46, 3 (2018), 1–6.
[19] Gauri Joshi, Emina Soljanin, and Gregory Wornell. 2015. Queues with redundancy: Latency-cost analysis. ACM SIGMETRICS Performance Evaluation Review 43, 2 (2015), 54–56.
[20] Ger Koole and Rhonda Righter. 2007. Resource allocation in grid computing. Journal of Scheduling (2007).
[21] Kristen Gardner, Esa Hyytiä, and Rhonda Righter. 2018. A little redundancy goes a long way: convexity in redundancy systems. Preprint submitted to Elsevier (2018).
[22] Kangwook Lee, Ramtin Pedarsani, and Kannan Ramchandran. 2017. On scheduling redundant requests with cancellation overheads. IEEE/ACM Transactions on Networking (TON) 25, 2 (2017), 1279–1290.
[23] Kangwook Lee, Nihar B. Shah, Longbo Huang, and Kannan Ramchandran. 2017. The MDS queue: Analysing the latency performance of erasure codes. IEEE Transactions on Information Theory 63, 5 (2017), 2822–2842.
[24] Nam H. Lee. 2008. A sufficient condition for stochastic stability of an Internet congestion control model in terms of fluid model stability. Ph.D. Dissertation. UC San Diego.
[25] Sean Meyn and Richard Tweedie. 1993. Generalized resolvents and Harris recurrence of Markov processes. Contemp. Math. 149 (1993), 227–250.
[26] Fernando Paganini, Ao Tang, Andrés Ferragut, and Lachlan Andrew. 2012.
Network Stability under Alpha Fair Bandwidth Allocation with General File Size Distribution. IEEE Transactions on Automatic Control 57, 3 (2012), 579–591.
[27] Youri Raaijmakers, Sem Borst, and Onno Boxma. 2019. Redundancy scheduling with scaled Bernoulli service requirements. Queueing Systems 93, 1-2 (2019).
[28] Youri Raaijmakers, Sem Borst, and Onno Boxma. 2020. Stability of Redundancy Systems with Processor Sharing. In Proceedings of the 13th EAI International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS ’20). Association for Computing Machinery, New York, NY, USA, 120–127.
[29] Nihar B. Shah, Kangwook Lee, and Kannan Ramchandran. 2016. When do redundant requests reduce latency? IEEE Transactions on Communications 64, 2 (2016), 715–722.
[30] Ashish Vulimiri, Philip Brighten Godfrey, Radhika Mittal, Justine Sherry, Sylvia Ratnasamy, and Scott Shenker. 2013. Low latency via redundancy. In Proceedings of the ACM conference on Emerging networking experiments and technologies. ACM, 283–294.

APPENDIX

A Proofs of Section 5

Proof of Corollary 3. Let us consider $s \in \mathcal{R}$. Let $i$ be such that $s \in \mathcal{L}_i$, which is unique since $\{\mathcal{L}_i\}_{i=1}^{i^*}$ is a partition of $\mathcal{R}$. We will show that for this $s$ and $i$, it holds that $\mathrm{CAR}_i = \mu_s / \sum_{c : s \in \mathcal{R}(c)} p_c$. Hence, together with Corollary 2 this concludes the result.

First, note that $\mathrm{CAR}_i = \mu_s / \sum_{c \in C_i(s)} p_c$. Hence, we need to prove that $\sum_{c : s \in \mathcal{R}(c)} p_c = \sum_{c \in C_i(s)} p_c$, or equivalently, $\{c : s \in \mathcal{R}(c)\} = C_i(s)$. For any $c \in C(s)$, $\mathcal{R}(c) = \mathcal{L}_l(c)$ with $l \le i$. We note that $C_i(s) = C(s) \setminus \{c \in C(s) : \mathcal{R}(c) = \mathcal{L}_l(c) \text{ with } l < i\}$. Therefore, for $s \in \mathcal{L}_i$, $C_i(s) = \{c \in C : s \in c,\ c \in C_i,\ s \in \mathcal{L}_i(c)\} = \{c \in C : s \in \mathcal{R}(c)\}$. The last equality holds by definition of $\mathcal{R}(c)$. □

Proof of Corollary 4. The stability condition of such a system is given by Corollary 2. We note that each server $s \in S$ receives $|C(s)| = \binom{K-1}{d-1}$ different job types, that is, by fixing a copy in server $s$, all possible combinations of $d-1$ servers out of $K-1$.
Thus, $\mathcal{L}_1 = \arg\max_{s \in S} \left\{ \mu_s \binom{K}{d} / \binom{K-1}{d-1} \right\} = K$, $S_2 = S \setminus \{K\}$, and the condition is $\lambda \binom{K-1}{d-1} / \binom{K}{d} < \mu_K$.

We note that each server $s \in S_i$ receives $\binom{|S_i|-1}{d-1}$ different job types, for $i = 1, \dots, i^*$, and thus the maximum capacity-to-fraction-of-arrivals ratio in the subsystem with servers $S_i$ only depends on the capacities of servers in $S_i$, that is, $\mathcal{L}_i = \arg\max_{s \in S_i} \{\mu_s\}$. Additionally, since $\mu_1 < \dots < \mu_K$, one obtains that $\mathcal{L}_i = K - i + 1$, for $i = 1, \dots, K - d + 1$. The associated conditions are $\lambda \binom{K-i}{d-1} / \binom{K}{d} < \mu_{K-i+1}$ for $i = 1, \dots, K - d + 1$. This set of conditions is equivalent to that in Corollary 4. □

B Proofs of Section 7

We first introduce some notation. We denote by $E_c(t) = \max\{j : U_{cj} < t\}$ the number of type-$c$ jobs that arrived during the time interval $(0, t)$ and by $U_{cj}$ the instant of time at which the $j$th type-$c$ job arrived to the system. We recall that $b_{cj}$ denotes its service realization. We denote by $b'_{cms}$ the residual job size of the $m$th eldest type-$c$ job in server $s$ that is already in service at time 0.

Sufficient stability condition

Proof of Proposition 10. We now prove the stability of the UB system. For that, we first describe the dynamics of the number of type-$c$ jobs in the UB system, denoted by $N_c^{UB}(t)$. We recall that a type-$c$ job departs only when all the copies in the set of servers $\mathcal{R}(c)$ are completely served. We let $\eta^{\min}_{\mathcal{R}(c)}(v, t) = \min_{\tilde{s} \in \mathcal{R}(c)} \{\eta_{\tilde{s}}(v, t)\}$ be the minimum cumulative amount of capacity received by a copy in one of its servers $\mathcal{R}(c)$ during the interval $(v, t)$. Therefore,

$$N_c^{UB}(t) = \sum_{m=1}^{N_c^{UB}(0)} \mathbf{1}\left\{\exists \tilde{s} \in \mathcal{R}(c) : b'_{cm\tilde{s}} > \eta_{\tilde{s}}(0, t)\right\} + \sum_{j=1}^{E_c(t)} \mathbf{1}\left(b_{cj} > \eta^{\min}_{\mathcal{R}(c)}(U_{cj}, t)\right).$$

We denote the number of type-$c$ copies in server $s$ by $M^{UB}_{s,c}(t)$. We note that for a type-$c$ job in server $s$ there are two possibilities:
• If $s \in \mathcal{R}(c)$, the copy of the type-$c$ job leaves the server as soon as it is completely served. The cumulative amount of capacity that the copy receives during $(v, t)$ is $\eta_s(v, t)$.
• If $s \notin \mathcal{R}(c)$, the copy of the type-$c$ job in server $s$ leaves the system either if it is completely served or if all copies of this type-$c$ job in the servers $\mathcal{R}(c)$ are served. We note that for any $\tilde{s} \in \mathcal{R}(c)$, $\tilde{s} \in \mathcal{L}_l$, with $l < i$.

Hence, the number of type-$c$ jobs in server $s \in \mathcal{L}_i$ is given by the following expression. If $s \in \mathcal{R}(c)$,

$$M^{UB}_{s,c}(t) = \sum_{m=1}^{M^{UB}_{s,c}(0)} \mathbf{1}\left(b'_{cms} > \eta_s(0, t)\right) + \sum_{j=1}^{E_c(t)} \mathbf{1}\left(b_{cj} > \eta_s(U_{cj}, t)\right),$$

and if $s \notin \mathcal{R}(c)$,

$$M^{UB}_{s,c}(t) = \sum_{m=1}^{M^{UB}_{s,c}(0)} \mathbf{1}\left(\left\{\exists \tilde{s} \in \mathcal{R}(c) : b'_{cm\tilde{s}} > \eta_{\tilde{s}}(0, t)\right\} \cap \left\{b'_{cms} > \eta_s(0, t)\right\}\right) + \sum_{j=1}^{E_c(t)} \mathbf{1}\left(b_{cj} > \eta_{\mathcal{R}(c),s}(U_{cj}, t)\right),$$

where $\eta_{\mathcal{R}(c),s}(v, t) = \max\{\eta_s(v, t), \eta^{\min}_{\mathcal{R}(c)}(v, t)\}$. The first terms in both equations correspond to the type-$c$ jobs that were already in the system by time $t = 0$; the second terms correspond to the type-$c$ jobs that arrived during the time interval $(0, t)$.

In the following we obtain the number of copies per server. Before doing so, we need to introduce some additional notation. Let $\mathcal{D}_l(s) = \{c \in C(s) : \mathcal{R}(c) \subseteq \mathcal{L}_l(c)\}$ be the set of types in server $s$ for which the set of servers where these types receive maximum capacity-to-fraction-of-arrivals ratio is $\mathcal{R}(c) \subseteq \mathcal{L}_l(c)$. If $s \in \mathcal{L}_i$, then, by definition, $\mathcal{D}_l(s) \neq \emptyset$ if $l \le i$ and $\{\mathcal{D}_l(s)\}_{l=1}^{i}$ forms a partition of $C(s)$. Furthermore, $\mathcal{D}_i(s) = C_i(s)$, for all $s \in \mathcal{L}_i$. Therefore, for a server $s \in \mathcal{L}_i$, the number of copies in the server is given by the following expression:

$$M_s^{UB}(t) = \sum_{c \in C(s)} M^{UB}_{s,c}(t) = \sum_{l=1}^{i-1} \sum_{c \in \mathcal{D}_l(s)} M^{UB}_{s,c}(t) + \sum_{c \in C_i(s)} M^{UB}_{s,c}(t).$$

The first term of the RHS of the equation corresponds to the type-$c$ jobs in server $s$ that have $\mathcal{R}(c) \subseteq \mathcal{L}_l(c)$ for some $l < i$. The second term of the RHS corresponds to type-$c$ jobs in server $s$ that have $\mathcal{R}(c) \subseteq \mathcal{L}_i(c)$. In particular, we note that in the UB system, $M_s^{UB}(t) \le \sum_{c \in C(s)} N_c^{UB}(t)$, since copies might have left while the job is still present.

In order to prove the stability condition, we investigate the fluid-scaled system.
The fluid-scaling consists in studying the rescaled sequence of systems indexed by parameter $r$. For $r > 0$, denote by $M^{UB,r}_{s,c}(t)$ the system where the initial state satisfies $M^{UB}_{s,c}(0) = r\, m^{UB}_{s,c}(0)$, for all $c \in C$ and $s \in S$. We define

$$M^{UB,r}_{s,c}(t) = \frac{M^{UB}_{s,c}(rt)}{r}, \quad \text{and} \quad M^{UB,r}_{s}(t) = \frac{M^{UB}_{s}(rt)}{r}.$$

In the following, we give the characterization of the fluid model.

Definition 2. Non-negative continuous functions $m_s^{UB}(\cdot)$ are a fluid model solution if they satisfy the functional equations

$$m_s^{UB}(t) = \sum_{l=1}^{i-1} \sum_{c \in \mathcal{D}_l(s)} \left[ m^{UB}_{s,c}(0) \left(1 - G\left(\eta_{\mathcal{R}(c),s}(0, t)\right)\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F\left(\eta_{\mathcal{R}(c),s}(x, t)\right)\right) dx \right] + \sum_{c \in C_i(s)} \left[ m^{UB}_{s,c}(0) \left(1 - G(\eta_s(0, t))\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F(\eta_s(x, t))\right) dx \right], \quad (7)$$

for $s \in \mathcal{L}_i$ and $i = 1, \dots, i^*$, where $G(\cdot)$ is the distribution of the remaining service requirements, $F(\cdot)$ the service time distribution of arriving jobs, and

$$\eta_s(v, t) = \int_{x=v}^{t} \phi_s(\vec{m}^{UB}(x))\, dx,$$
$$\eta^{\min}_{\mathcal{R}(c)}(v, t) = \min_{\tilde{s} \in \mathcal{R}(c)} \{\eta_{\tilde{s}}(v, t)\},$$
$$\eta_{\mathcal{R}(c),s}(v, t) = \max\{\eta_s(v, t), \eta^{\min}_{\mathcal{R}(c)}(v, t)\}.$$

The existence and convergence of the fluid limit to the fluid model can now be proved.

Proposition 16. The limit point of any convergent subsequence of $(M^{UB,r}(t); t \ge 0)$ is almost surely a solution of the fluid model (7).

Proof of Proposition 16. The proof is identical to the proof of Theorem 5.2.1 in [10] (which is itself based on Lemma 5 in [15]). We only need to ensure that $\eta^{\min}_{\mathcal{R}(c)}(v, t)$ and $\eta_{\mathcal{R}(c),s}(v, t)$ are decreasing in $v$ and continuous on $v \in [\tau_s(t) + \epsilon, t]$, where $\tau_s(t) = \sup\{u \in [0, t] : m_s(u) = 0\}$.

Let us verify that $\eta^{\min}_{\mathcal{R}(c)}(v, t)$ and $\eta_{\mathcal{R}(c),s}(v, t)$ are decreasing and continuous in $v$. We note that the function $\eta_s(\tau, t)$, which gives the cumulative service that a copy in server $s$ received during the time interval $(\tau, t)$, is a Lipschitz continuous function, increasing for $t < \tau_s$ and non-decreasing for $t > \tau_s$, where $\tau_s = \inf\{t > 0 : M_s(t) = 0\}$.
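If $\eta^{\min}_{\mathcal{R}(c)}(v, t) = \eta_{s_1}(v, t)$ and $\eta_{\mathcal{R}(c),s}(v, t) = \eta_{s_2}(v, t)$ for all $v \in [0, t)$ and some $s_1, s_2 \in S$, then both $\eta^{\min}_{\mathcal{R}(c)}(v, t)$ and $\eta_{\mathcal{R}(c),s}(v, t)$ are decreasing and continuous in $v$, since by definition $\eta_s(v, t)$ is decreasing and continuous in $v$ for all $s \in S$.

Let us assume that $v_0 \in [0, t)$ is such that $\eta^{\min}_{\mathcal{R}(c)}(v, t) = \eta_{\tilde{s}^1}(v, t)$ for $v \le v_0$ and $\eta^{\min}_{\mathcal{R}(c)}(v_0^+, t) = \eta_{\tilde{s}^2}(v_0^+, t)$, for some $\tilde{s}^1, \tilde{s}^2 \in \mathcal{R}(c)$. We first verify that $\eta^{\min}_{\mathcal{R}(c)}(v, t)$ is continuous at $v = v_0$. Since $\eta_{\tilde{s}^1}(v, t)$ and $\eta_{\tilde{s}^2}(v, t)$ are continuous at $v = v_0$, then

$$\lim_{x \to v_0^-} \eta^{\min}_{\mathcal{R}(c)}(x, t) = \eta_{\tilde{s}^1}(v_0, t) = \eta_{\tilde{s}^2}(v_0, t) = \lim_{x \to v_0^+} \eta^{\min}_{\mathcal{R}(c)}(x, t).$$

Therefore, we conclude that $\eta^{\min}_{\mathcal{R}(c)}(v, t)$ is continuous on $v \in [0, t)$. Analogously, one can verify that $\eta_{\mathcal{R}(c),s}(v, t)$ is continuous on $v \in [0, t)$.

We now verify that $\eta^{\min}_{\mathcal{R}(c)}(v, t)$ is decreasing on $v \in [0, t)$. Let us consider $0 < t_1 < v_0 < t_2 < t$. Then for $\eta^{\min}_{\mathcal{R}(c)}(v, t)$,

$$\eta^{\min}_{\mathcal{R}(c)}(t_1, t) = \eta_{\tilde{s}^1}(t_1, t) \ge \eta_{\tilde{s}^1}(t_2, t) \ge \eta_{\tilde{s}^2}(t_2, t) = \eta^{\min}_{\mathcal{R}(c)}(t_2, t),$$

where the first inequality holds since $\eta_{\tilde{s}^1}(v, t)$ is decreasing in $v$. We conclude that $\eta^{\min}_{\mathcal{R}(c)}(v, t)$ is decreasing in $v$.

Let us verify that $\eta_{\mathcal{R}(c),s}(v, t)$ is decreasing in $v$. W.l.o.g. we assume that there exists $v_0 \in [0, t)$ such that $\eta_{\mathcal{R}(c),s}(v, t) = \eta^{\min}_{\mathcal{R}(c)}(v, t)$ for $v < v_0$ and $\eta_{\mathcal{R}(c),s}(v, t) = \eta_s(v, t)$ for $t > v > v_0$. Then,

$$\eta_{\mathcal{R}(c),s}(t_1, t) = \eta^{\min}_{\mathcal{R}(c)}(t_1, t) \ge \eta_s(t_1, t) \ge \eta_s(t_2, t) = \eta_{\mathcal{R}(c),s}(t_2, t),$$

where the first inequality holds by the definition of $\eta_{\mathcal{R}(c),s}$ as a maximum and the second since $\eta_s(v, t)$ is decreasing in $v$. We conclude that $\eta_{\mathcal{R}(c),s}(v, t)$ is decreasing in $v$. □

We now give a further characterization of the fluid model (7).

Proposition 17. Let $i \le i^*$ and assume $\lambda \sum_{c \in C_l(s)} p_c < \mu_s$ for all $l \le i - 1$ and $s \in \mathcal{L}_l$.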
Then, there is a time $T \ge 0$, such that for $t \ge T$ and for $s \in \cup_{l=1}^{i-1} \mathcal{L}_l$, $m_s^{UB}(t) = 0$, and for $s \in \mathcal{L}_i$,

$$m_s^{UB}(t) = \sum_{c \in C_i(s)} \left[ m^{UB}_{s,c}(0) \left(1 - G(\eta_{s,i}(0, t))\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F(\eta_{s,i}(x, t))\right) dx \right], \quad (8)$$

with

$$\eta_{s,i}(v, t) := \int_{x=v}^{t} \phi_{s,i}(\vec{m}(x))\, dx, \quad \text{and} \quad \phi_{s,i}(\vec{m}(x)) := \frac{\mu_s}{\sum_{c \in C_i(s)} m_{s,c}(x)}.$$

Proof of Proposition 17. For simplicity in notation, we remove the superscript UB throughout the proof. First assume $s \in \mathcal{L}_1$. Since the sets $\mathcal{D}_l(s)$ with $l < 1$ are empty, from Equation (7), we directly obtain

$$m_s(t) = \sum_{c \in C_1(s)} \left[ m_{s,c}(0) \left(1 - G(\eta_s(0, t))\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F(\eta_s(x, t))\right) dx \right], \quad \forall t > 0.$$

This expression coincides with the fluid limit of an M/G/1 PS queue with arrival rate $\lambda \sum_{c \in C_1(s)} p_c$ and server speed $\mu_s$. Since $\lambda \sum_{c \in C_1(s)} p_c < \mu_s$, we know that there exists a $\tau_s$ such that $m_s(t) = 0$, for all $t \ge \tau_s$.

The remainder of the proof is by induction. Consider now a server $s \in \mathcal{L}_l$ and assume there exists a time $\tilde{T}$ such that $m_s(t) = 0$, for all $t \ge \tilde{T}$ and $s \in \cup_{j=1}^{l-1} \mathcal{L}_j$. Thus, for $t \ge \tilde{T}$, also $m_{s,c}(t) = 0$ for all $s \in \cup_{j=1}^{l-1} \mathcal{L}_j$, $c \in \mathcal{D}_j(s)$, $j = 1, \dots, l - 1$. We consider server $s \in \mathcal{L}_l$. From (7) its drift is then given by:

$$m_s(t) = \sum_{j=1}^{l-1} \sum_{c \in \mathcal{D}_j(s)} m_{s,c}(t) + \sum_{c \in C_l(s)} m_{s,c}(t) = \sum_{c \in C_l(s)} \left[ m_{s,c}(0) \left(1 - G(\eta_s(0, t))\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F(\eta_s(x, t))\right) dx \right],$$

for all $t \ge \tilde{T}$. Now note that $\phi_s(\vec{m}(t)) = \frac{\mu_s}{m_s(t)} = \frac{\mu_s}{\sum_{c \in C_l(s)} m_{s,c}(t)} = \phi_{s,l}(\vec{m}(t))$, where the second equality follows from the fact that $m_{s,c}(t) = 0$ for all $s \in \cup_{j=1}^{l-1} \mathcal{L}_j$, $c \in \mathcal{D}_j(s)$, $j = 1, \dots, l - 1$.

To finish the proof, (8) coincides with the fluid limit of an M/G/1 system with PS, arrival rate $\lambda \sum_{c \in C_l(s)} p_c$ and server speed $\mu_s$. Hence, if $l < i$, the standard PS queue is stable, and we are sure that it equals and remains zero in finite time. □

Below we prove that the UB system is Harris recurrent. Note that the concept of Harris recurrence is needed here since the state space is obviously not countable (as we need to keep track of residual service times).
We first establish the fluid stability, that is, the fluid model is 0 in finite time. The latter is useful, as we can use the results of [24], which establish that under some suitable conditions fluid stability implies Harris recurrence; see the lemma below.

Lemma 18. If the fluid limit is fluid stable, then the stochastic system is Harris recurrent.

Proof of Lemma 18. In [24], the author considers bandwidth sharing networks (with processor sharing policies), and shows that under mild conditions, the stability of the fluid model (describing the Markov process of the number of per-class customers with their residual job sizes) is sufficient for stability (positive Harris recurrence). Our system, though slightly different, satisfies the same assumptions, and as a consequence those results are directly applicable to our model. More precisely, given the assumptions on the service time distribution, our model satisfies the assumptions given in [24, Section 2.2] for inter-arrival times and job sizes. (In particular, exponential inter-arrival times satisfy the conditions given in [24, Assumption 2.2.2].) □

Equation (8) coincides with the fluid limit of an M/G/1 PS system with arrival rate $\lambda \sum_{c \in C_i(s)} p_c$ and server speed $\mu_s$. If $\lambda < \mathrm{CAR}_l$, or equivalently $\lambda \sum_{c \in C_l(s)} p_c < \mu_s$, for all $l = 1, \dots, i$, Equation (8) equals zero in finite time. Hence, from Lemma 18 we conclude that for servers $s \in \mathcal{L}_i$, the associated stochastic number of copies in server $s$ is Harris recurrent, as stated in the corollary below. □

Proof of Proposition 11. We assume that both systems are coupled as follows: at time $t = 0$, both systems start at the same initial state $N_c(0) = N_c^{UB}(0)$ and $a_{cjs}(0) = a^{UB}_{cjs}(0)$ for all $c, j, s$. Arrivals and service times are also coupled.
For simplicity in notation, we assume that when in the original system a type-$c$ copy reaches its service requirement $b$, the attained service of its $d - 1$ additional copies is fixed to $b$ and the job remains in the system until the copy of that same job in the UB system is fully served at all servers in $\mathcal{R}(c)$.

We prove this result by induction on $t$. It holds at time $t = 0$. We assume that for $u \le t$ it holds that $N_c(u) \le N_c^{UB}(u)$ and $a_{cjs}(u) \ge a^{UB}_{cjs}(u)$ for all $c, j, s$. We show that this inequality holds for $t^+$.

We first assume that at time $t$, it holds that $N_c(t) = N_c^{UB}(t)$ for some $c \in C$. The inequality is violated only if there is a job for which the copy in the UB system is fully served at all servers $\mathcal{R}(c)$, but none of the copies in the original system is completed. That means there exists a $j$ such that $a_{cjs}(t) < a^{UB}_{cj\mathcal{R}(c)}(t) = b_{cj}$ for all $s \in c$. However, this cannot happen, since by hypothesis $a_{cjs}(t) \ge a^{UB}_{cjs}(t)$ for all $s \in c$.

We now assume that at time $t$, $a_{cjs}(t) = a^{UB}_{cjs}(t)$ for some $c, j, s$. There are now two cases. If this copy (and job) has already left in the original system, then $a_{cjs}(t) = a_{cjs}(t^+) = b_{cj}$ and hence $a_{cjs}(t^+) \ge a^{UB}_{cjs}(t^+)$. If instead the copy has not left in the original system, then by hypothesis it holds that $N_c(t) \le N_c^{UB}(t)$ and thus $M_s(t) \le M_s^{UB}(t)$ and $\frac{\mu_s}{M_s(t)} \ge \frac{\mu_s}{M_s^{UB}(t)}$. That means that the copy in the original system has a higher service rate at time $t$ than the same copy in the UB system. Hence, $a_{cjs}(t^+) \ge a^{UB}_{cjs}(t^+)$. □

Necessary stability condition

Proof of Proposition 13. In order to show that the LB system is unstable, we investigate the fluid-scaled system. For $r > 0$, denote by $N_c^{LB,r}(t)$ the system where the initial state satisfies $N_c^{LB}(0) = r\, n_c^{LB}(0)$, for all $c \in C$. We write for the fluid-scaled number of jobs per type

$$N_c^{LB,r}(t) = \frac{N_c^{LB}(rt)}{r}.$$

In the following we give the characterization of the fluid model.

Definition 3.
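Non-negative continuous functions $n_c^{LB}(\cdot)$ are a fluid model solution if they satisfy the functional equations

$$n_c^{LB}(t) = 0, \quad c \in C \setminus C_\iota,$$
$$n_c^{LB}(t) = n_c^{LB}(0) \left(1 - G\left(\eta_c^{LB}(0, t)\right)\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F\left(\eta_c^{LB}(x, t)\right)\right) dx, \quad c \in C_\iota,$$

where $G(\cdot)$ is the distribution of the remaining service requirements of initial jobs, $F(\cdot)$ the service time distribution of arriving jobs, and

$$\eta_c^{LB}(v, t) = \int_{x=v}^{t} \phi_c^{LB}(\vec{n}^{LB}(x))\, dx, \quad \text{with } c \in C_\iota.$$

The existence and convergence of the fluid-scaled number of jobs $N_c^{LB,r}(t)$ to the fluid model $\vec{n}^{LB}(t)$ can be proved as before. The statement of Proposition 16 indeed directly translates to the process $N_c^{LB,r}(t)$, since $\eta_c^{LB}(v, t)$ is both decreasing and continuous in $v$. Therefore, it is left out.

Next, we characterize the fluid model solution $\vec{n}^{LB}(t)$ in terms of $m_s^{LB}(t) = \sum_{c \in C(s)} n_c^{LB}(t)$. We show that if the initial condition for all servers is such that $m_s^{LB}(0)/\mu_s^{LB} = \gamma(0)$ for all $s \in S_\iota$, then $m_s^{LB}(t)/\mu_s^{LB} = \gamma(t)$ for all $s \in S_\iota$, where $\gamma(t)$ is given below.

Lemma 19. Let us assume that the initial condition is such that $n_c^{LB}(0) = 0$ for all $c \in C \setminus C_\iota$ and, for $c \in C_\iota$, $n_c^{LB}(0)$ are such that $m_s^{LB}(0)/\mu_s^{LB} = \gamma(0)$ for all $s \in S_\iota$. Let

$$\gamma(t) = \gamma(0) \left(1 - G\left(\eta^{LB}(0, t)\right)\right) + \frac{\lambda}{\mathrm{CAR}_\iota} \int_{x=0}^{t} \left(1 - F\left(\eta^{LB}(x, t)\right)\right) dx, \quad (9)$$

where $\eta^{LB}(v, t) = \int_{x=v}^{t} \phi^{LB}(\gamma(x))\, dx$, with $\phi^{LB}(\gamma(t)) = \frac{1}{\gamma(t)}$. Then, $n_c^{LB}(t) = 0$ for all $t \ge 0$ and $c \in C \setminus C_\iota$, and

$$m_s^{LB}(t)/\mu_s^{LB} = \gamma(t),$$

for all $t \ge 0$ and $s \in S_\iota$.

Proof of Lemma 19. From Definition 3, we obtain that for each server $s \in S_\iota$,

$$\frac{m_s^{LB}(t)}{\mu_s^{LB}} = \frac{1}{\mu_s^{LB}} \sum_{c \in C_\iota(s)} n_c^{LB}(t) = \sum_{c \in C_\iota(s)} \left[ \frac{n_c^{LB}(0)}{\mu_s^{LB}} \left(1 - G\left(\eta_c^{LB}(0, t)\right)\right) + \frac{\lambda p_c}{\mu_s^{LB}} \int_{x=0}^{t} \left(1 - F\left(\eta_c^{LB}(x, t)\right)\right) dx \right].$$

We recall that $\gamma(t)$ is defined as

$$\gamma(t) = \gamma(0) \left(1 - G\left(\eta^{LB}(0, t)\right)\right) + \frac{\lambda}{\mathrm{CAR}_\iota} \int_{x=0}^{t} \left(1 - F\left(\eta^{LB}(x, t)\right)\right) dx.$$

We let the initial condition be such that $\frac{m_s^{LB}(0)}{\mu_s^{LB}} = \gamma(0)$ for all $s \in S_\iota$ and we will prove by contradiction that for all $t > 0$,

$$\frac{m_s^{LB}(t)}{\mu_s^{LB}} = \gamma(t), \quad \text{for all } s \in S_\iota.$$

Let us assume that $t_0$ is the first time such that there exists $\tilde{s} \in S_\iota$ such that $\gamma(t_0) \neq m_{\tilde{s}}^{LB}(t_0)/\mu_{\tilde{s}}^{LB}$.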
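Since $\sum_{c \in C_\iota(s)} \frac{n_c^{LB}(0)}{\mu_s^{LB}} = \frac{m_s^{LB}(0)}{\mu_s^{LB}} = \gamma(0)$ and $\sum_{c \in C_\iota(s)} \frac{p_c}{\mu_s^{LB}} = 1/\mathrm{CAR}_\iota$, this implies that there exist $\tilde{c} \in C_\iota$ and $t_1$, $0 \le v \le t_1 < t_0$, such that $\eta_{\tilde{c}}^{LB}(v, t_1) \neq \eta^{LB}(v, t_1)$. However, since $\gamma(t) = m_s^{LB}(t)/\mu_s^{LB}$ for all $t < t_0$, this implies that $\phi_c^{LB}(\vec{n}(t)) = 1/\gamma(t)$ for all $t < t_0$ and $c \in C(s)$, and hence $\eta_{\tilde{c}}^{LB}(v, t) = \eta^{LB}(v, t)$, for all $t < t_0$. We have hence reached a contradiction, which concludes the proof. □

We note that Equation (9) corresponds to the fluid limit of an M/G/1 system with PS, arrival rate $\lambda/\mathrm{CAR}_\iota$ and server speed 1. Assuming $\lambda > \mathrm{CAR}_\iota$, it follows that the fluid limit $\gamma(t)$, and hence $m_s^{LB}(t)$, $s \in S_\iota$, diverges. Now, by using similar arguments as in Dai [8], the fact that the limit diverges implies that the corresponding stochastic process cannot be tight, and hence cannot be stable. □

Proof of Proposition 14. The number of type-$c$ jobs in the system is given by $N_c^{LB}(t) = 0$ for $c \in C \setminus C_\iota$, and for $c \in C_\iota$,

$$N_c^{LB}(t) = \sum_{m=1}^{N_c^{LB}(0)} \mathbf{1}\left(b'_{cms} > \eta_c^{LB}(0, t)\right) + \sum_{j=1}^{E_c(t)} \mathbf{1}\left(b_{cj} > \eta_c^{LB}(U_{cj}, t)\right).$$

We note that for all $c \in C \setminus C_\iota$ the result is direct since $p_c = 0$ for all $c \in C \setminus C_\iota$ in the LB system. Then, let us consider $c \in C_\iota$. For any $\vec{N}$ and $\vec{N}^{LB}$ such that $\vec{N} \ge \vec{N}^{LB}$, the following inequalities hold:

$$\phi_s(\vec{N}) = \frac{\mu_s}{M_s} = \frac{\left(\sum_{c \in C_\iota(s)} p_c\right) \mu_s}{\left(\sum_{c \in C_\iota(s)} p_c\right) M_s} = \frac{\left(\sum_{c \in C_\iota(s)} p_c\right) \mu_s / \left(\sum_{c \in C_\iota(s)} p_c\right)}{\sum_{c \in C(s) \setminus C_\iota(s)} N_c + \sum_{c \in C_\iota(s)} N_c}$$
$$\le \frac{\left(\sum_{c \in C_\iota(s)} p_c\right) \mu_s / \left(\sum_{c \in C_\iota(s)} p_c\right)}{\sum_{c \in C_\iota(s)} N_c^{LB}} \le \frac{\mathrm{CAR}_\iota \left(\sum_{c \in C_\iota(s)} p_c\right)}{\sum_{c \in C_\iota(s)} N_c^{LB}} \le \max_{\tilde{s} \in c} \frac{\mu_{\tilde{s}}^{LB}}{M_{\tilde{s}}^{LB}} = \phi_c^{LB}(\vec{N}^{LB}).$$

The second last inequality holds since $\mathrm{CAR}_\iota \ge \mu_s / \left(\sum_{c \in C_\iota(s)} p_c\right)$ for all $s \in S_\iota$ and $N_c \ge N_c^{LB}$ for all $c \in C_\iota$. We note that $\sum_{c \in C_\iota(s)} N_c^{LB} = M_s^{LB}(t)$. It follows from straightforward sample-path arguments that $N_c(t) \ge N_c^{LB}(t)$ for all $t \ge 0$ and $c \in C$. □

Mathematics arXiv (Cornell University)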
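The redundancy-d conditions obtained in the proof of Corollary 4 lend themselves to a quick numerical check. The Python sketch below reads those conditions as $\lambda \binom{K-i}{d-1}/\binom{K}{d} < \mu_{K-i+1}$ for $i = 1, \dots, K-d+1$, and compares the resulting bound on the arrival rate with the bound under Bernoulli routing. It is an illustration under stated assumptions and not the authors' code: service requirements are assumed to have unit mean, each job is assumed to pick its $d$ servers uniformly at random (so that every type has probability $1/\binom{K}{d}$ and, under Bernoulli routing, every server receives a fraction $1/K$ of the arrivals), and the function names are hypothetical.

```python
from math import comb


def redundancy_d_threshold(mu, d):
    """Arrival-rate bound implied by the redundancy-d conditions
    lam * C(K-i, d-1) / C(K, d) < mu_(K-i+1), for i = 1, ..., K-d+1.

    Assumes unit-mean service requirements and uniformly chosen types.
    """
    caps = sorted(mu)  # mu_1 <= ... <= mu_K, stored 0-indexed
    K = len(caps)
    if not 1 <= d <= K:
        raise ValueError("d must satisfy 1 <= d <= K")
    # caps[K - i] is mu_(K-i+1) in the 1-indexed notation of the text.
    return min(
        caps[K - i] * comb(K, d) / comb(K - i, d - 1)
        for i in range(1, K - d + 2)
    )


def bernoulli_threshold(mu):
    """Bound under Bernoulli routing: each server receives lam / K,
    so the slowest server is the bottleneck."""
    return len(mu) * min(mu)


for caps in ([1.0, 1.0], [1.0, 3.0]):
    print(caps, redundancy_d_threshold(caps, d=2), bernoulli_threshold(caps))
```

With two servers and $d = 2$, homogeneous capacities (1, 1) give a bound of 1 under redundancy against 2 under Bernoulli routing, whereas capacities (1, 3) give 3 against 2, in line with the claim that redundancy helps once capacities are sufficiently heterogeneous. Setting $d = 1$ makes the two bounds coincide, which serves as a sanity check on the sketch.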

Improving the performance of heterogeneous data centers through redundancy

Mathematics , Volume 2020 (2003) – Mar 3, 2020

ISSN
2476-1249
eISSN
ARCH-3343
DOI
10.1145/3428333

Abstract

We analyze the performance of redundancy in a multi-type job and multi-type server system. We assume the job dispatcher is unaware of the servers’ capacities, and we set out to study under which circumstances redundancy improves the performance. With redundancy an arriving job dispatches redundant copies to all its compatible servers, and departs as soon as one of its copies completes service. As a benchmark comparison, we take the non-redundant system in which a job arrival is routed to only one randomly selected compatible server. Service times are generally distributed and all copies of a job are identical, i.e., have the same service requirement. In our first main result, we characterize the sufficient and necessary stability conditions of the redundancy system. This condition coincides with that of a system where each job type only dispatches copies into its least-loaded servers, and those copies need to be fully served. In our second result, we compare the stability regions of the system under redundancy to that of no redundancy. We show that if the servers’ capacities are sufficiently heterogeneous, the stability region under redundancy can be much larger than that without redundancy. We apply the general solution to particular classes of systems, including redundancy-d and nested models, to derive simple conditions on the degree of heterogeneity required for redundancy to improve the stability. As such, our result is the first in showing that redundancy can improve the stability and hence performance of a system when copies are non-i.i.d.

Key words: redundancy models; load balancing; stochastic stability; processor sharing.

1 Introduction

The main motivation for studying redundancy models comes from the fact that both empirical ([1, 2, 9, 30]) and theoretical ([12, 14, 19, 22, 23, 29]) evidence shows that redundancy might improve the performance of real-world applications.
Under redundancy, a job that arrives to the system dispatches multiple copies into the servers, and departs when a first copy completes service. By allowing for redundant copies, the aim is to minimize the latency of the system by exploiting the variability in the queue lengths and the capacity of the different servers.

arXiv:2003.01394v2 [cs.NI] 15 Dec 2020

Most of the theoretical results on redundancy systems consider the performance analysis when either FCFS or Processor-Sharing (PS) service policies are implemented in the servers. Under the assumption that all the copies of a job are i.i.d. (independent and identically distributed) and exponentially distributed, [3, 5, 14] show that the stability condition of the system is independent of the number of redundant copies and that performance (in terms of delay and number of jobs in the system) improves as the number of copies increases. However, [12] showed that the assumption that copies of a job are i.i.d. can be unrealistic, and that it might lead to theoretical results that do not reflect the results of replication schemes in real-life computer systems. The latter has triggered interest in other modeling assumptions for the correlation structure of the copies of a job. For example, for identical copies (all the copies of a job have the same size), [3] showed that under both FCFS and PS service policies, the stability region of the system with homogeneous servers decreases as the number of copies increases.

The above observation provides the motivation for our study: to understand when redundancy is beneficial. In order to do so, we analyze a general multi-type job and multi-type server system. A dispatcher needs to decide to which server(s) to route each incoming job. We assume that there is no signaling between the dispatcher and the servers, that is, the dispatcher is oblivious to the capacities of the servers and unaware of the states of the queues.
The latter can be motivated by (i) design constraints, (ii) (slowly) fluctuating capacity of a server due to external users, or (iii) the impossibility of exchanging information among dispatchers and servers. The only information that is available to the dispatcher is the type of job and its set of compatible servers. However, we do allow signaling among servers, which is needed in order to cancel the copies in redundancy schemes.

In the mathematical analysis we consider two different models: the redundancy model, where the dispatcher sends a copy to all the compatible servers of the job type, and the Bernoulli model, where a single copy is sent to a uniformly selected compatible server of the job type. From a dispatcher's viewpoint, the comparison between these two policies is reasonable under the assumption that the dispatcher only knows the type of the job and the set of its compatible servers. Hence, we do not compare analytically the performance of redundancy with other routing policies, such as Join the Shortest Queue, Join the Idle Server, Power of d, etc., that have more information on the state of the system.

We hence aim to understand when having redundant copies is beneficial for the performance of the system in this context. Observe that the answer is not clear upfront, as adding redundant copies has two opposite effects: on the one hand, redundancy helps exploit the variability across servers’ capacities, but on the other hand, it induces a waste of resources as servers work on copies that do not end up being completely served.

To answer the above question, we analyze the stability of an arbitrary multi-type job and multi-type server system with redundancy. Job service requirements are generally distributed, and copies are identical. The scheduling discipline implemented by servers is PS, which is a common policy in server farms and web servers, see for example [16, Chapter 24].
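The Bernoulli benchmark described above can be made concrete with a short numerical sketch. Under Bernoulli routing the Poisson arrival stream is split at random among the servers, so each server behaves as an independent PS queue and is stable when the work routed to it stays below its capacity. The Python sketch below assumes unit-mean service requirements; the function name, the dictionary layout and the example numbers are hypothetical and are not taken from the paper.

```python
def bernoulli_threshold(mu, p):
    """Largest arrival rate (exclusive) for which every server is stable
    under Bernoulli routing, assuming unit-mean service requirements.

    mu: dict mapping server -> capacity
    p:  dict mapping job type (frozenset of compatible servers) -> probability
    """
    # Fraction of all arrivals routed to each server: a type-c job picks one
    # of its len(c) compatible servers uniformly at random.
    share = {s: 0.0 for s in mu}
    for c, p_c in p.items():
        for s in c:
            share[s] += p_c / len(c)
    # Server s is stable iff lam * share[s] < mu[s]; servers that receive no
    # traffic impose no constraint.
    return min(mu[s] / share[s] for s in mu if share[s] > 0)


# Hypothetical two-server example with job types {1}, {2} and {1, 2}.
mu = {1: 1.0, 2: 2.0}
p = {frozenset({1}): 0.25, frozenset({2}): 0.25, frozenset({1, 2}): 0.5}
print(bernoulli_threshold(mu, p))
```

In this example both servers receive half of the arrivals, so the slower server (capacity 1) is the bottleneck and the bound on the total arrival rate is 2. The redundancy model cannot be handled server by server in this way, because copies of the same job are coupled across servers.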
In our main result, we derive sufficient and necessary stability conditions for the redundancy system. This general result allows us to characterize when redundancy can increase the stability region with respect to Bernoulli routing. To the best of our knowledge, our analytical results are the first showing that, when copies are non-i.i.d., adding redundancy to the system can be beneficial from the stability point of view.

We believe that our result can motivate further research in order to thoroughly understand when redundancy is beneficial in other settings, for example, for different scheduling disciplines, different correlation structures among copies, different redundancy schemes, etc. In Section 8 we investigate through numerics some of these issues, namely, the performance of redundancy when the scheduling discipline is FCFS and Random Order of Service (ROS), and the performance gap between redundancy and a variant of the Join the Shortest Queue policy according to which each job is dispatched to the compatible server that has the least number of jobs.

We briefly summarize the main findings of the paper:
• The characterization of sufficient and necessary stability conditions of any general redundancy system with heterogeneous server capacities and arrivals, under mild assumptions on the service time distribution.
• We prove that when servers are heterogeneous enough (conditions stated in Section 6), redundancy has a larger stability region than Bernoulli.
• By exploring numerically these conditions, we observe that the degree of heterogeneity needed in the servers for redundancy to be better decreases in the number of servers, and increases in the number of redundant copies.

The rest of the paper is organized as follows. In Section 2 we discuss related work. Section 3 describes the model, and introduces the notion of capacity-to-fraction-of-arrivals ratio that plays a key role in the stability result.
Section 4 gives an illustrative example in order to obtain intuition about the structure of the stability conditions. Section 5 states the stability condition for the redundancy model. Section 6 provides conditions on the heterogeneity of the system under which redundancy outperforms Bernoulli. The proof of the main result is given in Section 7. Simulations are given in Section 8, and concluding remarks are given in Section 9. For the sake of readability, proofs are deferred to the Appendix.

2 Related work

When copies of a job are i.i.d. and exponentially distributed, [5, 14] have shown that redundancy with FCFS employed in the servers does not reduce the stability region of the system. In this case, the stability condition is that for any subset of job types, the sum of the arrival rates must be smaller than the sum of service rates associated with these job types.

In [27], the authors consider i.i.d. copies with highly variable service time distributions. They focus on redundancy-d systems where each job chooses a subset of d homogeneous servers uniformly at random. The authors show that with FCFS, the stability region increases (without bound) in both the number of copies, d, and in the parameter that describes the variability in service times.

In [20], the authors investigate when it is optimal to replicate a job. They show that for so-called New-Worse-Than-Used service time distributions, the best policy is to replicate as much as possible. In [13], the authors investigate the impact that scheduling policies have on the performance of so-called nested redundancy systems with i.i.d. copies. The authors show that when FCFS is implemented, the performance might not improve as the number of redundant copies increases, while under other policies proposed in the paper, such as Least-redundant-first or Primaries-first, the performance improves as the number of copies increases.

Anton et al.
[3] study the stability conditions when the scheduling policies PS, Random Order of Service (ROS) or FCFS are implemented. For the redundancy-d model with homogeneous server capacities and i.i.d. copies, they show that the stability region is not reduced if either PS or ROS is implemented. When instead copies belonging to one job are identical, [3] showed that (i) ROS does not reduce the stability region, (ii) FCFS reduces the stability region and (iii) PS dramatically reduces the stability region, and this coincides with the stability region of a system where all copies need to be fully served, i.e., $\lambda < \mu K/d$. In [28], the authors show that the stability result for PS in a homogeneous redundancy-d system with identical copies extends to generally distributed service times. In the present paper, we extend [3, 28] by characterizing the stability condition under PS with identical copies to the general setting of heterogeneous servers, generally distributed service times, and arbitrary redundancy structures.

Table 1: The stability condition of redundancy models under different modeling assumptions. In bold square, the modeling assumptions we consider for the present paper.
Columns: service policy; service time distribution; homogeneous servers (i.i.d. copies, identical copies); heterogeneous servers (i.i.d. copies, identical copies).
FCFS, Exponential: General red., [14]; Redundancy-d, [3]; General red., [14].
FCFS, Scaled Bernoulli: Redundancy-d, [27] (Asymptotic regime).
PS, Exponential: Redundancy-d, [3]; Redundancy-d, [3].
PS, General: Redundancy-d, [28]; Redundancy-d, [28]; General red. (Necessary condition) (Light-tailed).
ROS, Exponential: Redundancy-d, [3]; Redundancy-d, [3].

Hellemans et al. [18] consider identical copies that are generally distributed. For a redundancy-d model with FCFS, they develop a numerical method to compute the workload and response time distribution when the number of servers tends to infinity, i.e., the mean-field regime.
The authors can numerically infer whether the system is stable, but do not provide any characterization of the stability region. In a recent paper, Hellemans et al. [17] extend this study to include many replication policies and a general correlation structure among the copies.

Gardner et al. [12] introduce a new dependency structure among the copies of a job, the S&X model. The service time of each copy of a job is decoupled into two components: one related to the inherent job size of the task, which is identical for all the copies of a job, and the other related to the server's slowdown, which is independent among all copies. The paper proposes and analyzes the redundant-to-idle-queue scheme with homogeneous servers, and proves that it is stable and performs well.

In Table 1 we summarize the stability results presented above, organized by service policy, service time distribution, servers' capacities and redundancy correlation structure. In brackets we specify the additional assumptions that the authors considered in their respective paper. In the bold square, we outline the modeling assumptions we consider for the present paper. To the best of our knowledge, no analytical results were obtained so far for performance measures when PS is implemented, servers are heterogeneous and copies are identical or of any other non-i.i.d. structure.

3 Model description

We consider a $K$ parallel-server system with heterogeneous capacities $\mu_k$, for $k = 1, \ldots, K$. Each server has its own queue, where the Processor Sharing (PS) service policy is implemented. We denote by $S = \{1, \ldots, K\}$ the set of all servers. Jobs arrive to the system according to a Poisson process of rate $\lambda$. Each job is labelled with a type $c$ that represents the subset of compatible servers to which type-$c$ jobs can be sent, i.e., $c = \{s_1, \ldots, s_n\}$, where $n \le K$, $s_1, \ldots, s_n \in S$ and $s_i \neq s_l$ for all $i \neq l$. A job is of type $c$ with probability $p_c$, where $\sum_{c \in \mathcal{C}} p_c = 1$.
We denote by $\mathcal{C}$ the set of all types in the system, i.e., $\mathcal{C} = \{c \in \mathcal{P}(S) : p_c > 0\}$, where $\mathcal{P}(S)$ contains all the possible subsets of $S$. Furthermore, we denote by $\mathcal{C}(s)$ the subset of types that have server $s$ as compatible server, that is, $\mathcal{C}(s) = \{c \in \mathcal{C} : s \in c\}$. For instance, the N-model is a two-server system with jobs of types $c = \{2\}$ and $c = \{1, 2\}$, see Figure 1 b). Thus, $\mathcal{C} = \{\{2\}, \{1, 2\}\}$, $\mathcal{C}(1) = \{\{1, 2\}\}$ and $\mathcal{C}(2) = \{\{2\}, \{1, 2\}\}$, with $p_{\{2\}}, p_{\{1,2\}} > 0$.

Job sizes are distributed according to a general random variable $X$ with cumulative distribution function $F$ and unit mean. Additionally, we assume that
1. $F$ has no atoms.
2. $F$ is a light-tailed distribution in the following sense:
$$\lim_{r \to \infty} \sup_{a \ge 0} E\left[(X - a)\,\mathbf{1}_{\{X - a > r\}} \mid X > a\right] = 0. \tag{1}$$

Remark 1. These technical conditions have been used previously in the literature to prove stochastic stability from fluid limit arguments (see [24] and [26]) in the context of processor sharing networks and cannot be avoided easily. However, it can be seen (as observed in [26]) that Equation (1) also implies
$$\sup_{a \ge 0} E\left[X - a \mid X > a\right] < \infty, \tag{2}$$
which is a usual light-tail condition (see [11]). Hence, Equations (1) and (2), though they exclude heavy-tailed distributions like Pareto, include large sets of distributions such as phase-type distributions (which are dense in the set of all distributions on $\mathbb{R}_+$), distributions with bounded support, and exponential and hyper-exponential distributions.

We consider two load balancing policies, which determine how the jobs are dispatched to the servers. Note that both load balancers are oblivious to the capacities of the servers.
• Bernoulli routing: a type-$c$ job is sent with uniform probability to one of its compatible servers in $c$.
• Redundancy model: a type-$c$ job sends identical copies to its $|c|$ compatible servers. That is, all the copies of a job have exactly the same size. The job (and its corresponding copies) departs the system when one of its copies completes service.
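To make the notation concrete, the following minimal sketch encodes the N-model and evaluates, for each server, the fraction of arriving jobs that place work on it under the two policies. The encoding of a type as a `frozenset`, the helper names `types_at` and `fraction_at`, and the value p = 1/4 are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

# Hypothetical encoding (not from the paper): a job type is a frozenset of
# server labels, mapped to its probability p_c.
p = Fraction(1, 4)  # assumed probability of type {2} in the N-model
type_probs = {frozenset({2}): p, frozenset({1, 2}): 1 - p}


def types_at(s, type_probs):
    """C(s): the types that list server s as a compatible server."""
    return {c for c in type_probs if s in c}


def fraction_at(s, type_probs, redundancy):
    """Fraction of arriving jobs that place work at server s."""
    if redundancy:
        # Redundancy: every compatible type sends a copy to s.
        return sum(type_probs[c] for c in types_at(s, type_probs))
    # Bernoulli routing: a type-c job picks s with probability 1/|c|.
    return sum(type_probs[c] / len(c) for c in types_at(s, type_probs))


for s in (1, 2):
    print(s, fraction_at(s, type_probs, False), fraction_at(s, type_probs, True))
```

Multiplying these fractions by the total arrival rate gives the per-server arrival rate of jobs (Bernoulli) or copies (redundancy); exact rationals are used only to keep the illustration free of rounding.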
In this paper, we will study the stability condition under both load balancing policies. We call the system stable when the underlying process is positive Harris recurrent, and unstable when the process is transient. A stochastic process is positive Harris recurrent if there exists a petite set $C$ for which $P(\tau_C < \infty) = 1$, where $\tau_C$ is the stopping time of $C$; see e.g. [4, 6, 25] for the corresponding definitions. We note that when the state descriptor is Markovian, positive Harris recurrence is equivalent to positive recurrence.

We define $\lambda^R$ as the value of $\lambda$ such that the redundancy model is stable if $\lambda < \lambda^R$ and unstable if $\lambda > \lambda^R$. Similarly, we define $\lambda^B$ for the Bernoulli routing system. We aim to characterize when $\lambda^R > \lambda^B$, that is, when redundancy improves the stability condition compared to no redundancy.

For Bernoulli, $\lambda^B$ can be easily found. Under Bernoulli routing, a job chooses a server uniformly at random, hence type-$c$ jobs arrive at server $s$ at rate $\lambda p_c / |c|$. Thus, the Bernoulli system reduces to $K$ independent servers, where server $s$ receives arrivals at rate $\lambda \sum_{c \in \mathcal{C}(s)} p_c / |c|$ and has departure rate $\mu_s$, for all $s \in S$. The stability condition is hence
$$\lambda < \lambda^B = \min_{s \in S} \frac{\mu_s}{\sum_{c \in \mathcal{C}(s)} p_c / |c|}. \tag{3}$$

In order to characterize $\lambda^R$, we need to study the system under redundancy in more detail. For that, we denote by $N_c(t)$ the number of distinct type-$c$ jobs that are present in the redundancy system at time $t$, and $\vec{N}(t) = (N_c(t), c \in \mathcal{C})$. Furthermore, we denote the number of copies per server by $M_s(t) := \sum_{c \in \mathcal{C}(s)} N_c(t)$, $s \in S$, and $\vec{M}(t) = (M_1(t), \ldots, M_K(t))$. For the $j$-th type-$c$ job, let $b_{cj}$ denote the service requirement of this job, for $j = 1, \ldots, N_c(t)$, $c \in \mathcal{C}$. Let $a_{cjs}(t)$ denote the attained service in server $s$ of the $j$-th type-$c$ job at time $t$. We denote by $A_c(t) = (a_{cjs}(t))_{js}$ a matrix on $\mathbb{R}_+$ of dimension $N_c(t) \times |c|$. Note that the number of type-$c$ jobs increases by one at rate $\lambda p_c$, which implies that a row composed of zeros is added to $A_c(t)$.
When one element $a_{cjs}(t)$ in matrix $A_c(t)$ reaches the required service $b_{cj}$, the corresponding job departs and all of its copies are removed from the system. Hence, row $j$ in matrix $A_c(t)$ is removed. We further let $\phi_s(\vec{M}(t))$ be the capacity that each of the copies in server $s$ obtains when in state $\vec{M}(t)$, which under PS is given by $\phi_s(\vec{M}(t)) := \mu_s / M_s(t)$. The cumulative service that a copy in server $s$ gets during the time interval $(v, t)$ is
$$\eta_s(v, t) := \int_{x=v}^{t} \phi_s(\vec{M}(x))\,dx.$$

Figure 1: From left to right, a) the redundancy-d model (for $K = 4$ and $d = 2$), b) the N-model, c) the W-model and d) the WW-model.

In order to characterize the stability condition, we define the capacity-to-fraction-of-arrivals ratio of a server in a subsystem:

Definition 1 (Capacity-to-fraction-of-arrival ratio). For any given set of servers $\tilde{S} \subseteq S$ and its associated set of job types $\tilde{\mathcal{C}} = \{c \in \mathcal{C} : c \subseteq \tilde{S}\}$, the capacity-to-fraction-of-arrival ratio of server $s \in \tilde{S}$ in this so-called $\tilde{S}$-subsystem is defined by $\mu_s / \sum_{c \in \tilde{\mathcal{C}}(s)} p_c$, where $\tilde{\mathcal{C}}(s) = \tilde{\mathcal{C}} \cap \mathcal{C}(s)$ is the subset of types in $\tilde{\mathcal{C}}$ that are served in server $s$.

Some common models

A well-known structure is the redundancy-d model, see Figure 1 a). Within this model, each job has $d$ out of $K$ compatible servers, where $d$ is fixed. That is, $p_c > 0$ for all $c \in \mathcal{P}(S)$ with $|c| = d$, and $p_c = 0$ otherwise, so that there are $|\mathcal{C}| = \binom{K}{d}$ types of jobs. If additionally $p_c = 1/\binom{K}{d}$ for all $c \in \mathcal{C}$, we say that the arrival process of jobs is homogeneously distributed over types. We will call this model the redundancy-d model with homogeneous arrivals. The particular case where server capacities are also homogeneous, i.e., $\mu_k = \mu$ for all $k = 1, \ldots, K$, will be called the redundancy-d model with homogeneous arrivals and server capacities.

In [21] the nested redundancy model was introduced, where for all $c, c' \in \mathcal{C}$, either i) $c \subset c'$, or ii) $c' \subset c$, or iii) $c \cap c' = \emptyset$. First of all, note that the redundancy-d model does not fit in the nested structure.
The smallest nested system is the so-called N-model (Figure 1 b)): this is a $K = 2$ server system with types $\mathcal{C} = \{\{2\}, \{1, 2\}\}$. Another nested system is the W-model (Figure 1 c)), that is, $K = 2$ servers and types $\mathcal{C} = \{\{1\}, \{2\}, \{1, 2\}\}$. In Figure 1 d), a nested model with $K = 4$ servers and 7 different job types, $\mathcal{C} = \{\{1\}, \{2\}, \{3\}, \{4\}, \{1, 2\}, \{3, 4\}, \{1, 2, 3, 4\}\}$, is given. This model is referred to as the WW-model.

4 An illustrative example

Before formally stating the main results in Section 5.1, we first illustrate through a numerical example some of the key aspects of our proof, and in particular the essential role played by the capacity-to-fraction-of-arrival ratio defined in Definition 1.

Figure 2: Trajectory of the number of copies per server with respect to time for a $K = 4$ redundancy-2 system with exponentially distributed job sizes. Figures a) and b) consider homogeneous capacities $\mu_k = 1$ for $k = 1, \ldots, 4$ and homogeneous arrival rates per type, $p_c = 1/6$ for all $c \in \mathcal{C}$, with a) $\lambda = 1.8$ and b) $\lambda = 2.1$. Figures c) and d) consider heterogeneous server capacities $\vec{\mu} = (1, 2, 4, 5)$ and arrival rates per type $\vec{p} = (0.25, 0.1, 0.1, 0.2, 0.2, 0.15)$ for types $\mathcal{C}$, with c) $\lambda = 7.5$ and d) $\lambda = 9$.

In Figure 2 we plot the trajectories of the number of copies per server with respect to time for a $K = 4$ redundancy-2 system (Figure 1 a)), that is, $\mathcal{C} = \{\{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}, \{3, 4\}\}$. Our proof techniques will rely on fluid limits, and therefore we chose large initial points. Figures 2 a) and b) show the trajectories when servers and arrivals of types are homogeneous for $\lambda = 1.8$ and $\lambda = 2.1$, respectively. Figures 2 c) and d) consider a heterogeneous system (see the caption for the parameters) for $\lambda = 7.5$ and $\lambda = 9$, respectively.

The homogeneous example (Figure 2 a) and b)) falls within the scope of [3]. There it is shown that the stability condition is $\lambda < \mu K / d$.
We note that this condition coincides with the stability condition of a system in which all the $d$ copies need to be fully served. In Figure 2 a) and b), the values of $\lambda$ are chosen such that they represent a stable and an unstable system, respectively. As formally proved in [3], at the fluid scale, when the system is stable the largest queue length decreases, whereas in the unstable case the minimum queue length increases. It thus follows that, in the homogeneous case, either all classes are stable or all are unstable.

The behavior of the heterogeneous case is rather different. The parameters corresponding to Figures 2 c) and d) are such that the system is stable in c), but not in d). In Figure 2 c) we see that the trajectories of the queue lengths are not always decreasing, including the maximum queue length. In Figure 2 d), we observe that the numbers of copies in servers 3 and 4 are decreasing, whereas those of servers 1 and 2 are increasing.

When studying stability for the heterogeneous setting, one needs to reason recursively. First, assume that each server $s$ needs to handle its full load, i.e., $\lambda \sum_{c \in \mathcal{C}(s)} p_c / \mu_s$. Hence, one can simply compare the servers' capacity-to-fraction-of-arrival ratios, $\mu_s / \sum_{c \in \mathcal{C}(s)} p_c$, to see which server is the least-loaded server and could hence potentially empty first. In this example, server 4 has the maximum capacity-to-fraction-of-arrival ratio and, in fluid scale, will reach zero in finite time and remain zero, since $\mu_4 / \sum_{c \in \mathcal{C}(4)} p_c = 5/(p_{\{1,4\}} + p_{\{2,4\}} + p_{\{3,4\}}) = 11.11$ is larger than $\lambda = 7.5$. Whenever, at fluid scale, server 4 is still positive, the other servers might either increase or decrease. However, the key insight is that once the queue length of server 4 reaches 0, the fluid behavior of the other classes no longer depends on the jobs that also have server 4 as compatible server.
That is, we are sure that all jobs that have server 4 as compatible server will be fully served in server 4, since server 4 is empty in fluid scale and all the other servers are overloaded. Therefore, jobs with server 4 as compatible server can be ignored, and we are left with a subsystem formed by servers $\{1, 2, 3\}$ and without the job types served by server 4. Now again, we consider the maximum capacity-to-fraction-of-arrival ratio in order to determine the least-loaded server, but now for the subsystem $\{1, 2, 3\}$. This time, server 3 has the maximum capacity-to-fraction-of-arrival ratio, which is $4/(p_{\{1,3\}} + p_{\{2,3\}}) = 4/0.3 = 13.33$. Since this value is larger than $\lambda = 7.5$, it is a sufficient condition for server 3 to empty.

Similarly, once server 3 is empty, we consider the subsystem with servers 1 and 2 only. Hence, there is only one type of jobs, $\{1, 2\}$. Now server 2 is the least-loaded server and its capacity-to-fraction-of-arrival ratio is $2/p_{\{1,2\}} = 8$. This value being larger than the arrival rate implies that server 2 (and hence server 1, because there is only one job type) will be stable too. Indeed, in Figure 2 c) we also observe that as soon as the number of copies in server 3 is relatively small compared to that of server 1 and server 2, the number of copies in both server 1 and server 2 decreases.

We can now explain the evolution observed in Figure 2 d) when $\lambda = 9$. The evolution for servers 4 and 3 can be argued as before: both their capacity-to-fraction-of-arrival ratios are larger than $\lambda = 9$, hence they empty in finite time. However, the capacity-to-fraction-of-arrival ratio of the subsystem with servers 1 and 2, which is 8, is strictly smaller than the arrival rate. We thus observe that, unlike in the homogeneous case, in the heterogeneous case some servers might be stable, while others (here servers 1 and 2) are unstable. Proposition 1 formalizes the above intuitive explanation, by showing that the stability of the system can be derived recursively.
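The recursive reasoning of this example can be traced in a few lines of code. The sketch below is only an illustration: the function name `car_recursion`, the dictionary encoding of capacities and types, and the tie tolerance `tol` are assumptions made here, while the capacities and type probabilities are those of Figures 2 c) and d). With these parameters the first ratios are 5/0.45 ≈ 11.1 for server 4, then 4/0.3 ≈ 13.3 for server 3 in the subsystem {1, 2, 3}, then 2/0.25 = 8 for server 2.

```python
def car_recursion(mu, type_probs, tol=1e-12):
    """Repeatedly remove the servers with the largest capacity-to-fraction-of-
    arrival ratio; return the list of (removed servers, ratio) per step."""
    remaining = set(mu)
    steps = []
    while remaining:
        # Keep only the types whose compatible servers all remain.
        live = {c: q for c, q in type_probs.items() if c <= remaining}
        if not live:
            break
        ratios = {}
        for s in remaining:
            frac = sum(q for c, q in live.items() if s in c)
            # Assumption: a server with no live type gets an infinite ratio.
            ratios[s] = mu[s] / frac if frac > 0 else float("inf")
        best = max(ratios.values())
        least_loaded = {s for s, r in ratios.items() if r >= best - tol}
        steps.append((least_loaded, best))
        remaining -= least_loaded
    return steps


# Parameters of the heterogeneous example (Figure 2 c) and d)).
mu = {1: 1.0, 2: 2.0, 3: 4.0, 4: 5.0}
pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
probs = [0.25, 0.1, 0.1, 0.2, 0.2, 0.15]
type_probs = {frozenset(c): q for c, q in zip(pairs, probs)}

steps = car_recursion(mu, type_probs)
lambda_R = min(ratio for _, ratio in steps)
for servers, ratio in steps:
    print(sorted(servers), round(ratio, 2))
print("threshold:", lambda_R)
```

The smallest ratio met along the recursion is 8, which separates the two arrival rates of the example: 7.5 lies below it and 9 lies above it.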
The capacity-to-fraction-of-arrival ratio now allows us to reinterpret the homogeneous case depicted in Figure 2 a) and b). In this case, the capacity-to-fraction-of-arrival ratio of all the servers is the same, which implies (i) that either all servers will be stable, or all unstable, and (ii) that, from the stability viewpoint, it is as if all copies received service until completion.

5 Stability condition

5.1 Multi-type job multi-type server system

In this section we discuss the stability condition of the general redundancy system with PS. In order to do so, we first define several sets of subsystems, similar to what we did in the illustrative example of Section 4.

The first subsystem includes all servers, that is, $S_1 = S$. We denote by $\mathcal{L}_1$ the set of servers with highest capacity-to-fraction-of-arrival ratio in the system $S_1 = S$. Thus,
$$\mathcal{L}_1 = \left\{ s \in S_1 : s = \arg\max_{\tilde{s} \in S_1} \frac{\mu_{\tilde{s}}}{\sum_{c \in \mathcal{C}(\tilde{s})} p_c} \right\}.$$

Figure 3: $K = 4$ server system under redundancy-2. In a) subsystem $S_1$, in b) subsystem $S_2$ and in c) subsystem $S_3$.

For $i = 2, \ldots, K$, we define recursively
$$S_i := S \setminus \bigcup_{l=1}^{i-1} \mathcal{L}_l,$$
$$\mathcal{C}_i := \{c \in \mathcal{C} : c \subseteq S_i\},$$
$$\mathcal{C}_i(s) := \mathcal{C}_i \cap \mathcal{C}(s),$$
$$\mathcal{L}_i := \left\{ s \in S_i : s = \arg\max_{\tilde{s} \in S_i} \frac{\mu_{\tilde{s}}}{\sum_{c \in \mathcal{C}_i(\tilde{s})} p_c} \right\}.$$
The $S_i$-subsystem will refer to the system consisting of the servers in $S_i$, with only jobs of types in the set $\mathcal{C}_i$. The set $\mathcal{C}_i(s)$ is the subset of types that are served in server $s$ in the $S_i$-subsystem. We let $\mathcal{C}_1 = \mathcal{C}$. The set $\mathcal{L}_i$ represents the set of servers $s$ with highest capacity-to-fraction-of-arrival ratio in the $S_i$-subsystem, or in other words, the least-loaded servers in the $S_i$-subsystem. Finally, we denote by $i^* := \max\{i \in \{1, \ldots, K\} : \mathcal{C}_i \neq \emptyset\}$ the last index $i$ for which the subsystem $S_i$ is not empty of job types.

Remark 2. We illustrate the above definitions by applying them to the particular example considered in Section 4.
The first subsystem consists of servers $S_1 = S = \{1, 2, 3, 4\}$ and all job types, see Figure 3 a). The capacity-to-fraction-of-arrival ratios in the $S_1$-subsystem are $\{2.2, 3.07, 8.8, 11.1\}$, and thus $\mathcal{L}_1 = \{4\}$. The second subsystem is formed by $S_2 = \{1, 2, 3\}$, and job types that are compatible with server 4 can be ignored, that is, $\mathcal{C}_2 = \{\{1, 2\}, \{1, 3\}, \{2, 3\}\}$, see Figure 3 b). The capacity-to-fraction-of-arrival ratios for servers in the $S_2$-subsystem are given by $\{2.8, 4.4, 13.3\}$, and thus $\mathcal{L}_2 = \{3\}$. The third subsystem consists of servers $S_3 = \{1, 2\}$, and job types that are compatible with servers 3 or 4 can be ignored, that is, $\mathcal{C}_3 = \{\{1, 2\}\}$, see Figure 3 c). The capacity-to-fraction-of-arrival ratios for servers in the $S_3$-subsystem are given by $\{4, 8\}$. Hence, $\mathcal{L}_3 = \{2\}$. Then, $S_4 = \{1\}$, but $\mathcal{C}_4 = \emptyset$, so that $i^* = 3$.

The value of the highest capacity-to-fraction-of-arrival ratio in the $S_i$-subsystem is denoted by
$$CAR_i := \max_{\tilde{s} \in S_i} \left\{ \frac{\mu_{\tilde{s}}}{\sum_{c \in \mathcal{C}_i(\tilde{s})} p_c} \right\}, \quad \text{for } i = 1, \ldots, i^*.$$
Note that $CAR_i = \mu_s / \sum_{c \in \mathcal{C}_i(s)} p_c$ for any $s \in \mathcal{L}_i$.

In the following proposition we characterize the stability condition for servers in terms of the capacity-to-fraction-of-arrival ratio corresponding to each subsystem. It states that servers that have highest capacity-to-fraction-of-arrival ratio in subsystem $S_i$ can be stable if and only if all servers in $S_1, \ldots, S_{i-1}$ are stable as well. The proof can be found in Section 7.

Proposition 1. For a given $i \le i^*$, servers $s \in \mathcal{L}_i$ are stable if $\lambda < CAR_l$ for all $l = 1, \ldots, i$. Servers $s \in \mathcal{L}_i$ are unstable if there is an $l \in \{1, \ldots, i\}$ such that $\lambda > CAR_l$.

Corollary 2. The redundancy system is stable if $\lambda < CAR_i$ for all $i = 1, \ldots, i^*$. The redundancy system is unstable if there exists an $i \in \{1, \ldots, i^*\}$ such that $\lambda > CAR_i$.

We note that $CAR_l$, $l = 1, \ldots, i$, are not necessarily ordered with respect to $l$.
From the corollary, we hence obtain that the stability region under redundancy is given by
$$\lambda^R = \min_{i = 1, \ldots, i^*} CAR_i. \tag{4}$$

We now write an equivalent representation of the stability condition (see the Appendix for the proof). Denote by $\mathcal{R}(c)$ the set of servers where type-$c$ jobs achieve maximum capacity-to-fraction-of-arrival ratio, or in other words, the set of least-loaded servers for type $c$:
$$\mathcal{R}(c) := \{s : \exists i \text{ s.t. } c \in \mathcal{C}_i(s) \text{ and } s \in \mathcal{L}_i\}.$$
Note that there is a unique subsystem $S_i$ for which this happens, i.e., $\mathcal{R}(c) \subseteq \mathcal{L}_i$ for exactly one $i$. We note that for a type-$c$ job, if $c$ contains at least a server that was removed in the $i$th iteration, then $\mathcal{R}(c) \subseteq \mathcal{L}_i$. We further let $\mathcal{R} := \bigcup_{c \in \mathcal{C}} \mathcal{R}(c)$.

Corollary 3. The redundancy system is stable if $\lambda \sum_{c : s \in \mathcal{R}(c)} p_c < \mu_s$ for all $s \in \mathcal{R}$. The redundancy system is unstable if there exists an $s \in \mathcal{R}$ such that $\lambda \sum_{c : s \in \mathcal{R}(c)} p_c > \mu_s$.

From the above corollary, we directly observe that the stability condition for the redundancy system coincides with the stability condition corresponding to $K$ individual servers where each type-$c$ job is only dispatched to its least-loaded servers.

5.2 Particular redundancy structures

In this subsection we discuss the stability condition for some particular cases of redundancy: redundancy-d and nested systems.

Redundancy-d

We focus here on the redundancy-d structure (defined in Section 3) with homogeneous arrivals, i.e., $p_c = 1/\binom{K}{d}$ for all $c \in \mathcal{C}$.

In case the server capacities are homogeneous, $\mu_k = \mu$ for all $k$, the model fits in the setting of [3], where it was proved to be stable if $\lambda d < \mu K$. This would also follow from Corollary 2: since arrivals are homogeneous, the arrival rate to each server is $\lambda d / K$, thus the capacity-to-fraction-of-arrival ratio at every server is $\mu K / d$. This implies that $\mathcal{L}_1 = S$, $i^* = 1$ and $\mathcal{R}(c) = c$ for all $c \in \mathcal{C}$. From Corollary 2, we obtain that the system is stable if $\lambda d < \mu K$.

For heterogeneous server capacities, which were not studied in [3], we have the following:

Corollary 4.
Under redundancy-d with homogeneous arrivals and $\mu_1 < \ldots < \mu_K$, the system is stable if for all $i = d, \ldots, K$, $\lambda \binom{i-1}{d-1} / \binom{K}{d} < \mu_i$. The system is unstable if there exists $i \in \{d, \ldots, K\}$ such that $\lambda \binom{i-1}{d-1} / \binom{K}{d} > \mu_i$.

In the homogeneous case, it is easy to deduce that the stability condition, $\lambda d < \mu K$, decreases as $d$ increases. However, in the heterogeneous case, both the numerator and denominator are non-monotone functions of $d$, and as a consequence it is not straightforward how the stability condition depends on $d$. This dependence on $d$ will be numerically studied in Section 6.1.

Nested systems

In this section we consider two nested redundancy systems.

5.2.1 N-model

The simplest nested model is the N-model. This is a $K = 2$ server system with capacities $\vec{\mu} = \{\mu_1, \mu_2\}$ and types $\mathcal{C} = \{\{2\}, \{1, 2\}\}$, see Figure 1 b). A job is of type $\{2\}$ with probability $p$ and of type $\{1, 2\}$ with probability $1 - p$. The stability condition is $\lambda < \lambda^R$ where:
$$\lambda^R = \begin{cases} \mu_2, & 0 \le p \le \frac{\mu_2 - \mu_1}{\mu_2}, \\ \mu_1/(1 - p), & \frac{\mu_2 - \mu_1}{\mu_2} < p \le \frac{\mu_2}{\mu_1 + \mu_2}, \\ \mu_2/p, & \frac{\mu_2}{\mu_1 + \mu_2} < p \le 1. \end{cases}$$

The above is obtained as follows. The capacity-to-fraction-of-arrival ratio of the system is $\mu_1/(1 - p)$ and $\mu_2$, respectively for server 1 and server 2. First assume $\mu_1/(1 - p) > \mu_2$. Then $\mathcal{L}_1 = \{1\}$ and the second subsystem is composed of server $S_2 = \{2\}$ and $\mathcal{C}_2 = \{\{2\}\}$, with arrival rate $\lambda p$ to server 2. Hence the capacity-to-fraction-of-arrival ratio of server 2 in the $S_2$-subsystem is $\mu_2/p$. From Corollary 2, it follows that $\lambda^R = \min\{\mu_1/(1 - p), \mu_2/p\}$. On the other hand, if $\mu_1/(1 - p) < \mu_2$, then $\mathcal{L}_1 = \{2\}$ and $S_2 = \{1\}$, but $\mathcal{C}_2 = \emptyset$. Thus, $\lambda^R = \mu_2$. Lastly, if $\mu_1/(1 - p) = \mu_2$, then $\mathcal{L}_1 = \{1, 2\}$, thus $S_2 = \emptyset$ and $\mathcal{C}_2 = \emptyset$. Hence, $\lambda^R = \mu_2$.

We observe that the stability condition $\lambda^R$ is a continuous function of $p$ reaching the maximum value $\lambda^R = \mu_1 + \mu_2$ at $p = \mu_2/(\mu_1 + \mu_2)$. It thus follows that for $p = \mu_2/(\mu_1 + \mu_2)$, redundancy achieves the maximum stability condition.
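As a quick check of the case distinction for the N-model, the sketch below evaluates the threshold directly from the two-step argument (compare $\mu_1/(1-p)$ with $\mu_2$, then take the minimum over the subsystems). The function name `n_model_threshold` and the sample capacities $\mu_1 = 1$, $\mu_2 = 2$ are assumptions chosen for illustration; exact rationals avoid rounding at the boundary points.

```python
from fractions import Fraction
from math import inf


def n_model_threshold(mu1, mu2, p):
    """Redundancy threshold of the N-model; p is the probability of type {2}."""
    if p < 1 and mu1 / (1 - p) <= mu2:
        # Server 2 has the largest ratio and no job type is left afterwards.
        return mu2
    # Server 1 is removed first; server 2 then only sees type {2}.
    first = mu1 / (1 - p) if p < 1 else inf
    second = mu2 / p if p > 0 else inf
    return min(first, second)


mu1, mu2 = Fraction(1), Fraction(2)  # assumed sample capacities
for p in (Fraction(1, 4), Fraction(2, 3), Fraction(9, 10)):
    print(p, n_model_threshold(mu1, mu2, p))
```

For these capacities the value at $p = \mu_2/(\mu_1 + \mu_2) = 2/3$ equals $\mu_1 + \mu_2 = 3$, and no other value of $p$ gives a larger threshold.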
We note, however, that in this paper our focus is not on finding the best redundancy probabilities, but instead on whether, given the probabilities $p$ (which are determined by the characteristics of the job types and matchings), the system can benefit from redundancy.

5.2.2 W-model

The W-model is a $K = 2$ server system with capacities $\vec{\mu} = \{\mu_1, \mu_2\}$ and types $\mathcal{C} = \{\{1\}, \{2\}, \{1, 2\}\}$, see Figure 1 c). A job is of type $\{1\}$ with probability $p_{\{1\}}$, of type $\{2\}$ with probability $p_{\{2\}}$ and of type $\{1, 2\}$ with probability $p_{\{1,2\}}$. W.l.o.g., assume $(1 - p_{\{2\}})/\mu_1 \ge (1 - p_{\{1\}})/\mu_2$, that is, the load on server 1 is larger than or equal to that on server 2. The stability condition is then given by
$$\lambda^R = \begin{cases} \mu_2/(1 - p_{\{1\}}), & p_{\{1\}} \le \frac{\mu_1}{\mu_1 + \mu_2}, \\ \mu_1/p_{\{1\}}, & p_{\{1\}} > \frac{\mu_1}{\mu_1 + \mu_2}, \end{cases}$$
if $(1 - p_{\{2\}})/\mu_1 > (1 - p_{\{1\}})/\mu_2$, and by
$$\lambda^R = \mu_2/(1 - p_{\{1\}})$$
if $(1 - p_{\{2\}})/\mu_1 = (1 - p_{\{1\}})/\mu_2$. Similar to the N-model, the above can be obtained from Corollary 2. When $p_{\{1\}} = \mu_1/(\mu_1 + \mu_2)$, the maximum stability $\lambda^R = \mu_1 + \mu_2$ is obtained.

6 When does redundancy improve stability

In this section, we compare the stability condition of the general redundancy system to that of Bernoulli routing. Each job type has its own compatible servers, denoted by $c$. Hence, given the compatible servers and the arrival rates of each type of jobs, we study whether redundancy can improve the stability condition.

From Corollary 2, it follows that $\lambda^R = \min_{i = 1, \ldots, i^*} CAR_i$. Together with (3), we obtain the following sufficient and necessary condition for redundancy to improve the stability condition.

Corollary 5. The stability condition under redundancy is larger than under Bernoulli routing if and only if
$$\min_{i = 1, \ldots, i^*,\ s \in \mathcal{L}_i} \left\{ \frac{\mu_s}{\sum_{c \in \mathcal{C}_i(s)} p_c} \right\} \ge \min_{s \in S} \left\{ \frac{\mu_s}{\sum_{c \in \mathcal{C}(s)} p_c/|c|} \right\}.$$

From inspecting the condition of Corollary 5, it is not clear upfront when redundancy would be better than Bernoulli.
In the rest of the section, by applying Corollary 5 to redundancy-d and nested models, we will show that when the capacities of the servers are sufficiently heterogeneous, the stability region of redundancy is larger than that of Bernoulli. In addition, numerical computations allow us to conclude that the degree of heterogeneity needed in the servers in order for redundancy to be beneficial decreases in the number of servers, and increases in the number of redundant copies.

6.1 Redundancy-d

In this section, we compare the stability condition of the redundancy-d model with homogeneous arrivals to that of Bernoulli routing. From (3), we obtain that
$$\lambda^B = d \min_{i = 1, \ldots, K} \frac{\mu_i}{\sum_{c \in \mathcal{C}(i)} p_c} = K \min_{i = 1, \ldots, K} \mu_i. \tag{5}$$
From Corollary 4, we obtain that $\lambda^R = \min_{i = d, \ldots, K} \frac{\binom{K}{d}}{\binom{i-1}{d-1}} \mu_i$. The following corollary is straightforward.

Corollary 6. Let $\mu_1 < \ldots < \mu_K$. The system under redundancy-d and homogeneous arrivals has a strictly larger stability condition than the system under Bernoulli routing if and only if
$$K \mu_1 < \min_{i = d, \ldots, K} \frac{\binom{K}{d}}{\binom{i-1}{d-1}} \mu_i.$$

The following is straightforward, since $\binom{i-1}{d-1}$ is increasing in $i$.

Corollary 7. Assume $\mu_1 < \ldots < \mu_K$ and homogeneous arrivals. The system under redundancy-d has a larger stability region than the Bernoulli routing if $\mu_1 d < \mu_d$.

Hence, if there exists a redundancy parameter $d$ such that $\mu_1 d < \mu_d$, then adding $d$ redundant copies to the system improves its stability region. In that case, the stability condition of the system will improve by at least a factor $\mu_d/(d \mu_1)$.

In Table 2, we analyze how the heterogeneity of the server capacities impacts the stability of the system. We chose $\mu_k = \mu^{k-1}$, $k = 1, \ldots, K$, so that the minimum capacity equals 1. Hence, for Bernoulli, $\lambda^B = K$. Under redundancy we have the following. For $\mu = 1$ the system is a redundancy-d system with homogeneous arrivals and server capacities, so that $\lambda^R = K/d$ [3]. Thus, $\lambda^R < \lambda^B$ in that case.
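The closed forms of this subsection, $\lambda^R = \min_{i = d, \ldots, K} \binom{K}{d} \mu_i / \binom{i-1}{d-1}$ and $\lambda^B = K \mu_1$, are easy to evaluate numerically. The sketch below does so for the geometric capacities $\mu_k = \mu^{k-1}$; the function names and the chosen sample values are assumptions made for illustration, and the capacities are sorted increasingly as in Corollary 4.

```python
from math import comb


def redundancy_d_threshold(capacities, d):
    """min over i = d..K of C(K, d) * mu_i / C(i-1, d-1), mu sorted increasingly."""
    caps = sorted(capacities)
    K = len(caps)
    return min(caps[i - 1] * comb(K, d) / comb(i - 1, d - 1)
               for i in range(d, K + 1))


def bernoulli_threshold(capacities):
    """K times the smallest capacity (homogeneous arrivals)."""
    return len(capacities) * min(capacities)


def geometric(K, base):
    """Capacities mu_k = base**(k-1) for k = 1..K."""
    return [base ** (k - 1) for k in range(1, K + 1)]


for K, d, base in [(3, 2, 1), (3, 2, 2), (4, 2, 2), (4, 3, 3)]:
    caps = geometric(K, base)
    print(K, d, base, redundancy_d_threshold(caps, d), bernoulli_threshold(caps))
```

With homogeneous capacities (base 1) the redundancy threshold is $K/d$, below the Bernoulli value $K$, while for base 2 and $d = 2$ redundancy already exceeds Bernoulli for both $K = 3$ and $K = 4$.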
For $\mu > 1$, that is, heterogeneous servers, we can apply Corollary 2 in order to find $\lambda^R$, that is, use Equation (4). More precisely, we recursively create the $i^*$ subsystems and calculate $CAR_i$ for each $i = 1, \ldots, i^*$, so that $\lambda^R = \min_{i = 1, \ldots, i^*} CAR_i$. We denote by $\mu^*$ the value of $\mu$ for which the stability region of the redundant system coincides with that of Bernoulli routing, i.e., the value of $\mu$ such that $\lambda^R = \lambda^B$. For $\mu < \mu^*$ (the columns of Table 2 to the left of $\mu^*$), Bernoulli has a larger stability region, while for $\mu > \mu^*$ (the columns of Table 2 to the right of $\mu^*$), redundancy outperforms Bernoulli.

First, we observe that, for a fixed $d$, $\mu^*$ decreases as $K$ increases, and is always less than 2. Therefore, as the number of servers increases, the level of heterogeneity that is needed in the servers in order to improve the stability under redundancy decreases. Second, for fixed $K$, we also observe that $\mu^*$ increases as $d$ increases. This means that as the number of redundant copies $d$ increases, the server capacities need to be more heterogeneous in order to improve the stability region under redundancy. Finally, focusing on the numbers in bold, we observe that when the number of servers $K$ is large enough and the servers are heterogeneous enough (large $\mu$), the stability region increases in the number of redundant copies $d$.

Table 2: The maximum arrival rates $\lambda^R$ and $\lambda^B$ in a redundancy-d system with homogeneous arrivals and capacities $\mu_k = \mu^{k-1}$.
K     Policy   μ = 1   μ = 1.2   μ = 1.4   μ = 2   μ = 3   μ*
3     Red-2    1.5     2.16      2.94      6       9       1.41
3     BR       3       3         3         3       3
4     Red-2    2       3.45      5.48      12      18      1.26
4     BR       4       4         4         4       4
5     Red-2    2.5     5.18      9.14      20      30      1.19
5     BR       5       5         5         5       5
10    Red-2    5       22.39     41.16     90      135     1.08
10    BR       10      10        10        10      10
4     Red-3    1.33    2.30      3.65      10.66   36      1.44
4     BR       4       4         4         4       4
5     Red-3    1.66    3.45      6.40      26.66   90      1.31
5     BR       5       5         5         5       5
10    Red-3    3.33    17.19     60.23     320     1080    1.13
10    BR       10      10        10        10      10

In Table 3, we consider linearly increasing capacities on the interval $[1, M]$, that is, $\mu_k = 1 + \frac{M-1}{K-1}(k - 1)$, for $k = 1, \ldots, K$. In the area on the right-hand side of the thick line, redundancy outperforms Bernoulli. For this specific system, the following corollary is straightforward.

Corollary 8. Under a redundancy-d system with homogeneous arrivals and capacities $\mu_k = 1 + \frac{M-1}{K-1}(k - 1)$, for $k = 1, \ldots, K$, the redundancy system has stability condition $\lambda^R = \frac{MK}{d}$ for $d > 1$, while $\lambda^B = K$. Hence, the redundancy system outperforms the stability condition of the Bernoulli routing if and only if $M \ge d$.

Simple qualitative rules can be deduced. If $M \ge d$, redundancy is a factor $M/d$ better than Bernoulli. Hence, increasing $M$, that is, the heterogeneity among the servers, is significantly beneficial for the redundancy system. However, the stability condition of the redundancy system degrades as the number of copies $d$ increases.

6.2 Nested systems

6.2.1 N-model

The stability condition of the N-model with Bernoulli routing is given by the following expression:
$$\lambda^B = \begin{cases} 2 \min\{\mu_1, \mu_2\}, & \text{if } p = 0, \\ 2\mu_1/(1 - p), & \text{if } 0 < p \le \frac{\mu_2 - \mu_1}{\mu_1 + \mu_2}, \\ 2\mu_2/(1 + p), & \text{if } \frac{\mu_2 - \mu_1}{\mu_1 + \mu_2} < p \le 1. \end{cases}$$
The above set of conditions is obtained from the fact that under Bernoulli routing, $\lambda^B = \min\{2\mu_1/(1 - p),\ \mu_2/(p + \tfrac{1}{2}(1 - p))\}$. Note that $\lambda^B$ is a continuous function with a maximum $\mu_1 + \mu_2$ at the point $p = \frac{\mu_2 - \mu_1}{\mu_1 + \mu_2}$. Now, comparing $\lambda^B$ to $\lambda^R$ as obtained in Section 5.2.1 leads to the following:

Table 3: The maximum arrival rates $\lambda^R$ and $\lambda^B$ in a redundancy-d system with homogeneous arrivals and capacities $\mu_k = 1 + \frac{M-1}{K-1}(k - 1)$.
K     Policy   M = 1   M = 2   M = 3   M = 4   M = 6
3     Red-2    1.5     3       4.5     6       9
3     BR       3       3       3       3       3
4     Red-2    2       4       6       8       12
4     BR       4       4       4       4       4
5     Red-2    2.5     5       7.5     10      15
5     BR       5       5       5       5       5
10    Red-2    5       10      15      20      30
10    BR       10      10      10      10      10
4     Red-3    1.33    2.66    4       5.33    8
4     BR       4       4       4       4       4
5     Red-3    1.66    3.33    5       6.66    10
5     BR       5       5       5       5       5
10    Red-3    3.33    6.66    10      13.33   20
10    BR       10      10      10      10      10

Corollary 9. Under an N-model, the stability condition under redundancy is larger than under Bernoulli routing under the following conditions: If $\mu_2 \le \mu_1$, then $p \in \left( \left( \frac{2\mu_2 - \mu_1}{2\mu_2 + \mu_1} \right)^+, 1 \right)$. If $\mu_2 > \mu_1$, then $p \in \left( 0, \left( \frac{\mu_2 - 2\mu_1}{\mu_2} \right)^+ \right) \cup \left( \frac{2\mu_2 - \mu_1}{2\mu_2 + \mu_1}, 1 \right)$.

From the above we conclude that if $\mu_1$ is larger than $2\mu_2$, then redundancy is always better than Bernoulli, independent of the arrival rates of job types. For the case $\mu_2 > \mu_1$, we observe that for $\mu_2$ large enough, redundancy will outperform Bernoulli.

6.2.2 W-based nested systems

We consider the following structure of nested systems: W (see Figure 1 c)), WW (Figure 1 d)) and WWWW. The latter is a $K = 8$ server system that is composed of 2 WW models and an additional job type $c = \{1, \ldots, 8\}$ for which all servers are compatible. For all three models, we assume that a job is of type $c$ with probability $p_c = 1/|\mathcal{C}|$.

In Table 4, we analyze how heterogeneity in the server capacities impacts the stability. First of all, note that $\lambda^B = K$. For redundancy, the value of $\lambda^R$ is given by (4), which depends on the server capacities. In the table, we present these values for different values of the server capacities.

In the upper part of the table, we let $\mu_k = \mu^{k-1}$ for $k = 1, \ldots, K$. We denote by $\mu^*$ the value of $\mu$ for which $\lambda^R = \lambda^B$. We observe that as the number of servers doubles, $\mu^*$ decreases, and is always smaller than 1.5. Thus, as the number of servers increases, the level of heterogeneity that is needed in order for redundancy to outperform Bernoulli decreases too.

In the second part of the table we assume $\mu_k = 1 + \frac{M-1}{K-1}(k - 1)$ for $k = 1, \ldots, K$.
We observe that when $M \geq K$ the stability condition under redundancy equals $\lambda^R = |C|$, which is always larger than $\lambda^B = K$. However, as the number of servers increases, the maximum capacity of the servers, $M$, needs to increase in order for redundancy to outperform Bernoulli.

Table 4: The maximum arrival rates $\lambda^R$ and $\lambda^B$ in nested systems.

| $\mu_k = \mu^{k-1}$ | Policy | $\mu = 1$ | $\mu = 1.2$ | $\mu = 1.4$ | $\mu = 2$ | $\mu^*$ |
|---|---|---|---|---|---|---|
| $K = 2$ | W-model | 1.5 | 1.8 | 2.10 | 3 | 1.33 |
| $K = 2$ | BR | 2 | 2 | 2 | 2 | |
| $K = 4$ | WW-model | 2.33 | 4.03 | 4.90 | 7 | 1.19 |
| $K = 4$ | BR | 4 | 4 | 4 | 4 | |
| $K = 8$ | WWWW-model | 3.75 | 8.64 | 10.5 | 15 | 1.17 |
| $K = 8$ | BR | 8 | 8 | 8 | 8 | |

| $\mu_k = 1 + \frac{M-1}{K-1}(k-1)$ | Policy | $M = 1$ | $M = 2$ | $M = 4$ | $M = 6$ | $M = 8$ |
|---|---|---|---|---|---|---|
| $K = 2$ | W-model | 1.5 | 3 | 3 | 3 | 3 |
| $K = 2$ | BR | 2 | 2 | 2 | 2 | 2 |
| $K = 4$ | WW-model | 2.33 | 4.66 | 7 | 7 | 7 |
| $K = 4$ | BR | 4 | 4 | 4 | 4 | 4 |
| $K = 8$ | WWWW-model | 3.75 | 7.14 | 10.71 | 12.85 | 15 |
| $K = 8$ | BR | 8 | 8 | 8 | 8 | 8 |

7 Proof of Proposition 1

In this section, we prove that the condition in Proposition 1 is sufficient and necessary for the respective subsystem to be stable. As we observe in Section 4, there are two main issues concerning the evolution of redundancy systems with heterogeneous capacities. First of all, the number of copies in a particular server decreases only if a certain subset of servers is already in steady state. Secondly, for a particular server $s \in S$, the instantaneous departure rate of that server might be larger than $\mu_s$ due to copies leaving in servers other than $s$. This makes the dynamics of the system complex. In order to prove Proposition 1, we therefore construct upper and lower bounds of our system for which the dynamics are easier to characterize. Proving that the upper bound (lower bound) is stable (unstable) directly implies that the original system is also stable (unstable). This will be done in Proposition 12 and Proposition 15. All proofs of this section can be found in Appendix B.

Sufficient stability condition

We define the Upper Bound (UB) system as follows. Upon arrival, each job is of type $c$ with probability $p_c$ and sends identical copies to all servers $s \in c$.
In the UB system, a type-$c$ job departs the system only when all copies in the set of servers $R(c)$ are fully served. We recall that $R(c)$ denotes the set of servers where a type-$c$ job achieves its maximum capacity-to-fraction-of-arrivals ratio. When this happens, the remaining copies that are still in service (necessarily not in a server in $R(c)$) are immediately removed from the system. We denote by $N_c^{UB}(t)$ the number of type-$c$ jobs present in the UB system at time $t$.

We note that the UB system is closely related to the one in which copies of type-$c$ jobs are only sent to servers in $R(c)$. However, the latter system is of no use for our purposes as it is neither an upper bound nor a lower bound of the original system.

We can now show the first implication of Proposition 1, that is, we prove that $\lambda < CAR_l$, for all $l = 1, \dots, i$, implies stability of the servers in the set $L_i$. We do this by analyzing the UB system, for which stability of the servers $L_i$ follows intuitively as follows. Given a server $s \in L_1$ and any type $c \in C(s)$, it holds that $R(c) \subseteq L_1(c)$. Hence, a server in $L_1$ will need to fully serve all arriving copies. Therefore each server $s$, with $s \in L_1$, behaves as an M/G/1 PS queue, which is stable if and only if its arrival rate of copies, $\lambda \sum_{c \in C(s)} p_c$, is strictly smaller than its departure rate, $\mu_s$.

Assume now that for all $l = 1, \dots, i-1$ the subsystems $S_l$ are stable and we want to show that servers in $L_i$ are stable as well. First of all, note that in the fluid limit, all types $c$ that do not exist in the $S_i$-subsystem, i.e., $c \notin C_i(s)$, will after a finite amount of time equal (and remain) zero, since they are stable. For the remaining types $c$ that have copies in server $s \in L_i$, i.e., $s \in c$ with $s \in L_i$, it will hold that their servers with maximum capacity-to-fraction-of-arrivals ratio are $R(c) \subseteq L_i$. Due to the characteristics of the upper-bound system, all copies sent to these servers will need to be served.
Hence, a server $s \in L_i$ behaves in the fluid limit as an M/G/1 PS queue with arrival rate $\lambda \sum_{c \in C_i(s)} p_c$ and departure rate $\mu_s$. In particular, such a queue is stable if and only if $\lambda \sum_{c \in C_i(s)} p_c < \mu_s$.

Proposition 10. For $i \leq i^*$, the set of servers $s \in L_i$ in the UB system is stable if $\lambda < CAR_l$, for all $l = 1, \dots, i$.

In the following, we prove that UB provides an upper bound on the original system. To do so, we show that every job departs earlier in the original system than in the UB system. In the statement, we assume that in case a job has already departed in the original system, but not in the UB system, then its attained service in all its servers in the original system is set equal to its service requirement $b_{cj}$.

Proposition 11. Assume $N_c(0) = N_c^{UB}(0)$ and $a_{cjs}(0) = a_{cjs}^{UB}(0)$, for all $c, j, s$. Then, $N_c(t) \leq N_c^{UB}(t)$ and $a_{cjs}(t) \geq a_{cjs}^{UB}(t)$, for all $c, j, s$ and $t \geq 0$.

Together with Proposition 10, we obtain the following result for the original system.

Proposition 12. For a given $i \leq i^*$, servers $s \in L_i$ are stable if $\lambda < CAR_l$, for all $l = 1, \dots, i$.

Remark 3. In [3], the authors show that for the redundancy-$d$ system with homogeneous arrivals and server capacities, the system where all the copies need to be served is an upper bound. We note that this upper bound coincides with our upper bound (in that case $L_1 = S$). Nevertheless, the proof approach is different. In [3], see also [28], the proof followed directly, as each server in the upper bound system behaved as an M/G/1 PS queue. In the heterogeneous server setting studied here, the latter is no longer true. Instead, it does apply recursively when considering the fluid regime: in order to see a server as a PS queue in the fluid regime, one first needs to argue that the types that have copies in higher capacity-to-fraction-of-arrivals servers are 0 at a fluid scale.

Remark 4.
We note that the light-tail assumption on the service time distribution, see Section 3, is needed in order to prove Lemma 18 (see Appendix B for more details).

Necessary stability condition

In this section we prove the necessary stability condition of Proposition 1. Let us first define

$$\iota := \min\{l = 1, \dots, i^* : \lambda > CAR_l\}.$$

We note that for any $i < \iota$, $\lambda < CAR_i$. Hence, the servers in $L_i$, with $i < \iota$, are stable, see Proposition 10. We are left to prove that the servers in $S_\iota$ cannot be stable. In order to do so, we construct a lower-bound system.

In the $S_\iota$ subsystem, the capacity-to-fraction-of-arrivals ratios are such that for all $s \in S_\iota$, $\mu_s / (\sum_{c \in C_\iota(s)} p_c) \leq CAR_\iota$. We will construct a lower bound (LB) system in which the resulting capacity-to-fraction-of-arrivals ratio is $CAR_\iota$ for all servers $s \in S_\iota$. We use the superscript LB in the notation to refer to this system, which is defined as follows. First of all, we only want to focus on the $S_\iota$ system, hence, we set the arrival rate $p_c^{LB} = 0$ for types $c \in C \setminus C_\iota$, whereas the arrival rate for types $c \in C_\iota$ remains unchanged, i.e., $p_c^{LB} = p_c$. The capacity of servers $s \in S_\iota$ in the LB-system is set to

$$\mu_s^{LB} := \mu_{\tilde{s}} \frac{\sum_{c \in C_\iota(s)} p_c}{\sum_{c \in C_\iota(\tilde{s})} p_c} = CAR_\iota \cdot \Big(\sum_{c \in C_\iota(s)} p_c\Big),$$

where $\tilde{s} \in L_\iota$. Additionally, in the LB-system, we assume that each copy of a type-$c$ job receives the same amount of capacity, which is equal to the highest value of $\mu_s^{LB} / M_s^{LB}(t)$, $s \in c$. We therefore define the service rate for a job of type $c$ by

$$\phi_c^{LB}(\vec{N}^{LB}(t)) := \max_{s \in c} \left\{ \frac{\mu_s^{LB}}{M_s^{LB}(t)} \right\}, \qquad (6)$$

where $c \in C_\iota$ (instead of $\phi_s(\cdot)$ for a copy in server $s$ in the original system). The cumulative amount of capacity that a type-$c$ job receives is

$$\eta_c^{LB}(v, t) := \int_{x=v}^{t} \phi_c^{LB}(\vec{N}^{LB}(x))\, dx, \quad \text{for } c \in C_\iota.$$

Proposition 13. In the LB-system, the set of servers $s \in S_\iota$ is unstable if $\lambda > CAR_\iota$.

We now prove that LB is a lower bound for the original system.

Proposition 14. Assume $N_c(0) = N_c^{LB}(0)$, for all $c$. Then, $N_c(t) \geq_{st} N_c^{LB}(t)$, for all $c \in C_\iota$ and $t \geq 0$.
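The thresholds $CAR_l$ used throughout this section can be computed with a short recursion over the subsystems $S_l$. The following Python sketch is illustrative only: the function names `redundancy_threshold` and `redundancy_d`, the dictionary-based inputs and the tie tolerance `tol` are assumptions of this sketch and not part of the analysis, and servers that receive no copies in a subsystem are simply skipped. It returns $\min_l CAR_l$, the value below which Proposition 12 gives stability of all servers.

```python
from itertools import combinations


def redundancy_threshold(mu, p, tol=1e-12):
    """Return (min_l CAR_l, levels) for capacities `mu` and type probabilities `p`.

    mu: dict mapping server -> capacity mu_s.
    p:  dict mapping frozenset of compatible servers (a job type c) -> p_c.
    levels: list of (L_l, CAR_l) in the order the subsystems are peeled off.
    """
    servers = set(mu)
    types = {c: q for c, q in p.items() if q > 0}
    levels = []
    while types:
        # Fraction of arrivals that send a copy to each remaining server.
        frac = {s: sum(q for c, q in types.items() if s in c) for s in servers}
        # Assumption of this sketch: servers that receive no copies are ignored.
        ratio = {s: mu[s] / frac[s] for s in servers if frac[s] > 0}
        car = max(ratio.values())
        top = {s for s, r in ratio.items() if car - r <= tol * car}
        levels.append((top, car))
        # Next subsystem: remove the servers in L_l and keep only the types
        # whose compatible servers all remain.
        servers -= top
        types = {c: q for c, q in types.items() if c <= servers}
    if not levels:
        raise ValueError("at least one job type with positive probability is needed")
    return min(car for _, car in levels), levels


def redundancy_d(K, d):
    """Equiprobable job types of a redundancy-d system with servers 1..K."""
    all_types = [frozenset(c) for c in combinations(range(1, K + 1), d)]
    return {c: 1.0 / len(all_types) for c in all_types}


# W-model: types {1}, {2}, {1,2}, each with probability 1/3.
w_types = {frozenset({1}): 1 / 3, frozenset({2}): 1 / 3, frozenset({1, 2}): 1 / 3}
lam_w, _ = redundancy_threshold({1: 1.0, 2: 1.2}, w_types)

# Redundancy-2 with K = 3 servers and capacities mu_k = 1.2**(k-1).
lam_r2, _ = redundancy_threshold(
    {k: 1.2 ** (k - 1) for k in (1, 2, 3)}, redundancy_d(3, 2)
)
print(round(lam_w, 2), round(lam_r2, 2))
```

For the W-model with capacities $(1, 1.2)$ the sketch gives approximately 1.8, and for redundancy-2 with $K = 3$ and $\mu_k = 1.2^{k-1}$ approximately 2.16, in agreement with the corresponding entries of the tables in Section 6.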
Combining Proposition 13 with Proposition 14, we obtain the following result for the original system.

Proposition 15. Servers $s \in S_\iota$ are unstable if there is an $l = 1, \dots, i^*$ such that $\lambda > CAR_l$.

Remark 5. In the special case of redundancy-$d$ with homogeneous arrivals and server capacities, [3] used a lower bound that consisted in modifying the service rate obtained per job type, as in (6). This lower bound coincides with our lower bound, since with homogeneous arrivals and servers it holds that $\mu_s^{LB} = \mu_s = \mu$. The difficulty when studying heterogeneous servers in a general redundancy structure, as we do in this paper, lies in the fact that the load received in each server is different. In order to show that the fluid limit of the server with the minimum number of copies is increasing (in the lower bound), we need to adequately modify the server capacities in order to make sure that the capacity-to-fraction-of-arrivals ratios in all the servers are equal.

8 Numerical analysis

We have implemented a simulator in order to assess the impact of redundancy. In particular, we evaluate the following:

• For PS servers, we numerically compare the performance of redundancy with Bernoulli routing (in Section 6 this was done analytically for the stability conditions).

• We compare redundancy to the Join the Shortest Queue (JSQ) policy, according to which each job is dispatched to the compatible server that has the least number of jobs (ties are broken at random). In a recent paper, [7], it was shown that JSQ, with exponential service time distributions and combined with size-unaware scheduling disciplines such as FCFS, ROS or PS, is maximum stable, i.e., if there exists a static dispatching policy that achieves stability, so will JSQ.

• We compare the performance between PS, FCFS and Random Order of Service (ROS), when the service time distribution is exponential and bounded Pareto.
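As a rough illustration of the three dispatching rules compared in this section, the following Python sketch returns the servers that receive a copy of an arriving job. It is not the simulator used for the figures: the function name `dispatch`, the policy labels and the dictionary of queue lengths are hypothetical choices made for this example only.

```python
import random


def dispatch(policy, job_type, queue_lengths, rng=random):
    """Return the list of servers that receive a copy of an arriving job.

    policy:        "redundancy", "bernoulli" or "jsq" (labels of this sketch).
    job_type:      iterable of compatible servers (the type c).
    queue_lengths: dict mapping server -> current number of jobs.
    """
    compatible = sorted(job_type)
    if policy == "redundancy":
        # One copy in every compatible server.
        return compatible
    if policy == "bernoulli":
        # A single compatible server chosen uniformly at random.
        return [rng.choice(compatible)]
    if policy == "jsq":
        # The compatible server with the fewest jobs, ties broken at random.
        shortest = min(queue_lengths[s] for s in compatible)
        ties = [s for s in compatible if queue_lengths[s] == shortest]
        return [rng.choice(ties)]
    raise ValueError(f"unknown policy: {policy}")


queues = {1: 3, 2: 0}
for policy in ("redundancy", "bernoulli", "jsq"):
    print(policy, dispatch(policy, {1, 2}, queues))
```

Only JSQ reads `queue_lengths`, which reflects the remark that JSQ needs queue-length information while redundancy and Bernoulli routing do not.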
Our simulations consider a large number of busy periods (10 ), so that the variance and confidence intervals of the mean number of jobs in the system are sufficiently small.

Exponential service time distributions: In Figure 4 we consider the W-model with exponential service time distributions. We set $p_{\{1\}} = 0.35$ and $p_{\{2\}} + p_{\{1,2\}} = 0.65$, and vary the value of $p_{\{1,2\}}$. We consider either $\vec{\mu} = (1, 2)$ or $\vec{\mu} = (2, 1)$. The only redundant job type is $\{1, 2\}$, thus as $p_{\{1,2\}}$ increases, we can observe how increasing the fraction of redundant jobs affects the performance. We also note that when $p_{\{1,2\}}$ increases, the load in server 1 increases as well, whereas the load in server 2 stays constant.

In Figure 4 a) and b) we depict the mean number of jobs under redundancy, Bernoulli routing and JSQ when the server policy is PS. In Figure 4 c) we plot $\lambda^R$, $\lambda^B$ and $\lambda^J$ using the analysis of Section 5.2.2 and [7], respectively.

We observe from Figure 4 a) and b) that when $\vec{\mu} = (1, 2)$, redundancy performs better than Bernoulli routing. This difference becomes larger as $p_{\{1,2\}}$ increases. This is due to the fact that the redundancy policy does better in exploiting the larger capacity of server 2 than Bernoulli, which becomes more important as $p_{\{1,2\}}$ increases. In addition, we note that for redundancy, Bernoulli and JSQ, the mean number of jobs increases as $p_{\{1,2\}}$ increases. The reason for this is that as $p_{\{1,2\}}$ increases, the load on server 1 increases. Since server 1 is the slow server, this increases the mean number of jobs.

In the opposite case, i.e., $\vec{\mu} = (2, 1)$, the mean number of jobs is non-increasing in $p_{\{1,2\}}$. This is because as $p_{\{1,2\}}$ increases, the load on server 1 increases. Since server 1 is now the fast server, this has a positive effect on the performance (decreasing mean number of jobs). However, as $p_{\{1,2\}}$ gets larger, the additional load (created by the copies) can negatively impact the performance.
This happens for $\lambda = 2$, where the mean number of jobs under redundancy is a U-shaped function. We furthermore observe that in the $\vec{\mu} = (2, 1)$ case, redundancy outperforms Bernoulli for any value of $p_{\{1,2\}}$ when $\lambda = 1.5$. However, when $\lambda = 2$, Bernoulli outperforms redundancy when $p_{\{1,2\}} > 0.49$. This is due to the additional load, generated under redundancy, that becomes more pronounced as $p_{\{1,2\}}$ becomes larger.

We also observe in Figure 4 that under both $\vec{\mu} = (1, 2)$ and $\vec{\mu} = (2, 1)$, JSQ outperforms redundancy. For small values of $p_{\{1,2\}}$ the difference is rather small; however, it becomes larger as $p_{\{1,2\}}$ increases due to the additional load that redundancy creates. Note, however, that this improvement does not come for free, as JSQ requires precise information of the queue lengths at all times.

In Figure 4 c), we observe that redundancy consistently has a larger stability region than Bernoulli in the $\vec{\mu} = (1, 2)$ case and for $p_{\{1,2\}} \in [0, 0.5)$ in the $\vec{\mu} = (2, 1)$ case. We let $\lambda^J$ be the value of $\lambda$ such that JSQ is stable if $\lambda < \lambda^J$ and unstable if $\lambda > \lambda^J$. Using [7],

$$\lambda^J = \max \left\{ \min_{s \in S} \frac{\mu_s}{\sum_{c \in C(s)} p_{c,s}} \; : \; p_{c,s} \geq 0,\ \sum_{s \in c} p_{c,s} = p_c \right\}.$$

We observe that the stability condition under redundancy coincides on a large region with that of JSQ, which, in view of the results of [7], implies that redundancy is in that region maximum stable.

In Figure 5 we simulate the performance of the W model for different values of $\mu_2$, while keeping fixed $\vec{p} = (p_{\{1\}}, p_{\{2\}}, p_{\{1,2\}})$ and $\mu_1 = 1$. In Figure 5 a) we plot the mean number of jobs and we see that for both configurations of $\vec{p}$, the performance of redundancy with PS, Bernoulli and JSQ improves as $\mu_2$ increases. The gap between redundancy and Bernoulli is significant in both cases. The reason can be deduced from Figure 5 b), where we plot $\lambda^R$, $\lambda^B$, and $\lambda^J$ with respect to $\mu_2$. We observe in Figure 5 a) that redundancy and JSQ converge to the same performance as $\mu_2$ grows large.
Intuitively, we can explain this by observing that for very large values of $\mu_2$, with both redundancy and JSQ, all jobs of type $\{1, 2\}$ get served in server 2. We observe in Figure 5 b) that the stability conditions with redundancy and JSQ are very similar.

Figure 4: W-model with $p_{\{1\}} = 0.35$, $p_{\{2\}} = 1 - p_{\{1\}} - p_{\{1,2\}}$. a) and b) depict the mean number of jobs under redundancy with PS, Bernoulli routing and JSQ for $\lambda = 1.5$ and $\lambda = 2$. c) depicts the stability regions $\lambda^R$, $\lambda^B$ and $\lambda^J$.

General service time distributions: In Figure 6 a) we investigate the performance of redundancy with PS for several non-exponential distributions. In particular, we consider the following distributions for the service times: deterministic, hyperexponential, and bounded Pareto. With the hyperexponential distribution, job sizes are exponentially distributed with parameter $\mu_1$ ($\mu_2$) with probability $q$ ($1 - q$). For the bounded Pareto, the cumulative distribution function is $\frac{1 - (k/x)^{\alpha}}{1 - (k/\tilde{q})^{\alpha}}$, for $k \leq x \leq \tilde{q}$. We choose the parameters so that the mean service time equals 1. Namely, for the hyperexponential distribution the parameters are $q = 0.2$, $\mu_1 = 0.4$ and $\mu_2 = 1.6$, and for the bounded Pareto distribution they are $\alpha = 0.5$, $\tilde{q} = 6$ and $k = 1/\tilde{q}$.

In Figure 6 a), we plot the mean number of jobs as a function of $\lambda$ for the N, W, WW, redundancy-2 ($K = 5$) and redundancy-4 ($K = 5$) models. The respective parameters $\vec{p}$ are chosen such that the system is stable for the simulated arrival rates. We observe that for the five systems, performance seems to be nearly insensitive to the service time distribution, beyond its mean value.

Markov-modulated capacities: In Figure 6 b) we consider a variation of our model where servers' capacities fluctuate over time. More precisely, we assume that each server has an exponential clock with a given mean. Every time the clock rings, the server samples a new value for $S$ from Dolly(1,12), see Table 5, and sets its capacity equal to $1/S$.
The Dolly(1,12) distribution is a 12-valued discrete distribution that was empirically obtained by analyzing traces in Facebook and Microsoft clusters, see [1, 12]. In Figure 6 b) we plot the mean number of jobs for a $K = 5$ server system with redundancy-2 and redundancy-4, and for the W-model under redundancy, and we compare it with Bernoulli routing. Arrival rates are equal for all classes. It can be seen that with Bernoulli routing, redundancy-2 and redundancy-4 become equivalent systems, and hence their respective curves overlap. The general observation is that in this setting with identical servers, Bernoulli routing performs better than redundancy. Further research is needed to understand whether with heterogeneous Markov-modulated servers, redundancy can be beneficial.

Table 5: The Dolly(1,12) empirical distribution for the slowdown [1]. The capacity is set to $1/S$.

| $S$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Prob | 0.23 | 0.14 | 0.09 | 0.03 | 0.08 | 0.10 | 0.04 | 0.14 | 0.12 | 0.021 | 0.007 | 0.002 |

Figure 5: W-model with fixed parameters $\vec{p}$ and $\mu_1 = 1$: a) depicts the mean number of jobs under redundancy, Bernoulli routing and JSQ, and b) depicts the stability regions $\lambda^R$, $\lambda^B$ and $\lambda^J$.

Figure 6: Mean number of jobs in the system with respect to $\lambda$: a) Non-exponential service times for the N, W, WW, redundancy-2 ($K = 5$) and redundancy-4 ($K = 5$) models. We chose $\vec{\mu} = (1, 2)$ for the N and W model, $\vec{\mu} = (1, 2, 4, 6)$ for the WW model, and $\vec{\mu} = (1, 2, 4, 6, 8)$ for redundancy-$d$. b) Markov-modulated server capacities in the W, redundancy-2 ($K = 5$) and redundancy-4 ($K = 5$) models.

FCFS and ROS scheduling discipline: The stability condition under FCFS or ROS and identical copies is not known. An exception is the redundancy-$d$ model with homogeneous arrivals and server capacities, for which [3] characterizes the stability condition under ROS, FCFS and PS. There it was shown that ROS is maximum stable, i.e., the stability condition is $\lambda < K\mu$, and that under FCFS the stability condition is $\lambda < \bar{\ell}\mu$, where $\bar{\ell}$ is the mean number of jobs in service in a so-called associated saturated system. In addition, it was shown that for this specific setting, the stability region under PS is smaller than under FCFS and ROS.

In Figure 7 a) and b) we consider a W-model and compare the performance for the different policies PS, FCFS and ROS. We take exponentially distributed service times. We plot the mean number of jobs with respect to $p_{\{1,2\}}$, with $p_{\{1\}} = 0.35$ and $p_{\{2\}} = 1 - p_{\{1\}} - p_{\{1,2\}}$. In Figure 7 a) we set $\lambda = 1$, and in Figure 7 b) we set $\lambda = 2$. The stability condition under PS is given in Figure 4 c).

In the case of $\vec{\mu} = (1, 2)$, we observe that FCFS always outperforms ROS. Intuitively we can explain this as follows. Since $p_{\{1\}}$ is kept fixed, as $p_{\{1,2\}}$ increases, the load in server 1 increases. With FCFS, it is more likely that both servers work on the same copy, and hence that the fast server 2 "helps" the slow server 1 (with high load). With ROS however, both servers tend to work on different copies, and the loaded slow server 1 will take a long time serving copies that could have been served faster in the fast server 2. On the other hand, with $\vec{\mu} = (2, 1)$ and sufficiently large $p_{\{1,2\}}$, ROS outperforms FCFS. In this case, the loaded server 1 is the fast server, and hence having both servers working on the same copy becomes inefficient, which explains why the performance under ROS becomes better. As a rule of thumb, it seems that for a redundancy model, if slow servers are highly loaded, then FCFS is preferable, but if fast servers are highly loaded, then ROS is preferable.

From Figures 7 a) and b) we further observe that for all values of $p_{\{1,2\}}$, FCFS and ROS outperform PS, and that the gap increases when $\lambda$ increases.
In Figure 7 c) we consider exponential and bounded Pareto (with $\alpha = 0.5$ and $\tilde{q} = 15$) service time distributions and plot the mean number of jobs for different values of $\mu_2$, when $\lambda = 1.5$, $\vec{p} = (0.35, 0.4, 0.25)$ and $\mu_1 = 2$. As before, with exponentially distributed service times, FCFS and ROS slightly outperform PS. In the case where jobs have bounded Pareto distributed service times, PS outperforms both FCFS and ROS. This seems to indicate that as the variability of the service time distribution increases, PS might become a preferable choice over FCFS and ROS in redundancy systems. Additionally, under PS we observe that the mean number of jobs is nearly insensitive to the service time distribution.

The main insight we obtain from Figure 7 is that the stability and performance of heterogeneous redundancy systems strongly depend on the service policy employed in the servers. We leave the stability analysis of other scheduling policies (such as FCFS or ROS) for future work as they require a different proof approach.

Figure 7: Mean number of jobs with redundancy combined with PS, FCFS, and ROS. a) and b) for the W model with respect to $p_{\{1,2\}}$ and exponentially distributed service times, with $p_{\{1\}} = 0.35$ and $p_{\{2\}} = 1 - p_{\{1\}} - p_{\{1,2\}}$. a) $\lambda = 1$, b) $\lambda = 2$. c) For the W model under exponentially and bounded Pareto ($\alpha = 0.5$, $\tilde{q} = 15$) distributed service times, and with respect to $\mu_2$, for $\lambda = 1.5$, $\vec{p} = (0.35, 0.4, 0.25)$ and $\mu_1 = 2$.

9 Conclusion

With exponentially distributed jobs and i.i.d. copies, it has been shown that redundancy does not reduce the stability region of a system, and that it improves the performance. This happens in spite of the fact that redundancy necessarily implies a waste of computation resources in servers that work on copies that are canceled before being fully served. The modeling assumptions thus play a crucial role, and as argued in several papers, e.g. [12], the i.i.d.
assumption might lead to insights that are qualitatively wrong.

In the present work, we consider the more realistic situation in which copies are identical, and the service times are generally distributed. We have shown that redundancy can help improve the performance in case the servers' capacities are sufficiently heterogeneous. To the best of our knowledge, this is the first positive result on redundancy with identical copies, and it illustrates that the negative result proven in [3] critically depends on the fact that the capacities were homogeneous.

We thus believe that our work opens the avenue for further research to understand when redundancy is beneficial in other settings. For instance, it would be interesting to investigate what happens in case servers implement other scheduling policies. It is also important to consider other cross-correlation structures for the copies, in particular the S&X model recently proposed in the literature. Another interesting situation is when the capacities of the servers fluctuate over time. Another possible extension is to consider the cancel-on-start variant of redundancy, in which as soon as one copy enters service, all the others are removed. For conciseness, in this paper we have restricted ourselves to what we considered one of the most basic, yet interesting and relevant settings.

Acknowledgments

The PhD project of E. Anton is funded by the French "Agence Nationale de la Recherche (ANR)" [Project ANR-15-CE25-0004 (ANR JCJC RACON)]. This work (in particular, research visits of E. Anton and M. Jonckheere) was partially funded by a STIC AMSUD GENE project. U. Ayesta received funding from the Department of Education of the Basque Government through the Consolidated Research Group MATHMODE (IT1294-19).

References

[1] Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica. 2013. Effective Straggler Mitigation: Attack of the Clones. In NSDI, Vol. 13. 185–198.
[2] Ganesh Ananthanarayanan, Srikanth Kandula, Albert G Greenberg, Ion Stoica, Yi Lu, Bikas Saha, and Edward Harris. 2010. Reining in the Outliers in Map-Reduce Clusters using Mantri. In OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation. 265–278.
[3] Elene Anton, Urtzi Ayesta, Matthieu Jonckheere, and Ina Maria Verloop. 2020. On the stability of redundancy models. To appear in Operations Research (2020).
[4] Soeren Asmussen. 2002. Applied Probability and Queues. Springer.
[5] Thomas Bonald and Céline Comte. 2017. Balanced fair resource sharing in computer clusters. Performance Evaluation 116 (2017), 70–83.
[6] Maury Bramson. 2008. Stability of Queueing Networks. Springer.
[7] James Cruise, Matthieu Jonckheere, and Seva Shneer. 2020. Stability of JSQ in queues with general server-job class compatibilities. Queueing Systems 95 (2020), 271–279.
[8] Jim G. Dai. 1996. A fluid limit model criterion for instability of multiclass queueing networks. The Annals of Applied Probability 6 (1996), 751–757.
[9] Jeffrey Dean and Luiz André Barroso. 2013. The tail at scale. Commun. ACM 56, 2 (2013), 74–80.
[10] Regina Egorova. 2009. Sojourn time tails in processor-sharing systems. Ph.D. Dissertation. Technische Universiteit Eindhoven.
[11] Sergey Foss, Dmitry Korshunov, and Stan Zachary. 2013. An introduction to heavy-tailed and subexponential distributions (2nd ed.). Springer.
[12] Kristen Gardner, Mor Harchol-Balter, Alan Scheller-Wolf, and Benny van Houdt. 2017. A Better Model for Job Redundancy: Decoupling Server Slowdown and Job Size. IEEE/ACM Transactions on Networking 25, 6 (2017), 3353–3367.
[13] Kristen Gardner, Esa Hyytiä, and Rhonda Righter. 2019. A Little Redundancy Goes a Long Way: Convexity in Redundancy Systems. Performance Evaluation (2019).
[14] Kristen Gardner, Samuel Zbarsky, Sherwin Doroudi, Mor Harchol-Balter, Esa Hyytiä, and Alan Scheller-Wolf. 2016.
Queueing with redundant requests: exact analysis. Queueing Systems 83, 3-4 (2016), 227–259.
[15] H. Christian Gromoll, Philippe Robert, and Bert Zwart. 2008. Fluid Limits for Processor Sharing Queues with Impatience. Math. Oper. Res. 33 (05 2008), 375–402.
[16] Mor Harchol-Balter. 2013. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press.
[17] Tim Hellemans, Tejas Bodas, and Benny van Houdt. 2019. Performance Analysis of Workload Dependent Load Balancing Policies. POMACS 3, 2 (2019), 35:1–35:35.
[18] Tim Hellemans and Benny van Houdt. 2018. Analysis of redundancy(d) with identical Replicas. Performance Evaluation Review 46, 3 (2018), 1–6.
[19] Gauri Joshi, Emina Soljanin, and Gregory Wornell. 2015. Queues with redundancy: Latency-cost analysis. ACM SIGMETRICS Performance Evaluation Review 43, 2 (2015), 54–56.
[20] Ger Koole and Rhonda Righter. 2007. Resource allocation in grid computing. Journal of Scheduling (2007).
[21] Kristen Gardner, Esa Hyytiä, and Rhonda Righter. 2018. A little redundancy goes a long way: convexity in redundancy systems. Preprint submitted to Elsevier (2018).
[22] Kangwook Lee, Ramtin Pedarsani, and Kannan Ramchandran. 2017. On scheduling redundant requests with cancellation overheads. IEEE/ACM Transactions on Networking (TON) 25, 2 (2017), 1279–1290.
[23] Kangwook Lee, Nihar B. Shah, Longbo Huang, and Kannan Ramchandran. 2017. The MDS queue: Analysing the latency performance of erasure codes. IEEE Transactions on Information Theory 63, 5 (2017), 2822–2842.
[24] Nam H. Lee. 2008. A sufficient condition for stochastic stability of an Internet congestion control model in terms of fluid model stability. Ph.D. Dissertation. UC San Diego.
[25] Sean Meyn and Richard Tweedie. 1993. Generalized resolvents and Harris recurrence of Markov processes. Contemp. Math. 149 (1993), 227–250.
[26] Fernando Paganini, Ao Tang, Andrés Ferragut, and Lachlan Andrew. 2012.
Network Stability under Alpha Fair Bandwidth Allocation with General File Size Distribution. IEEE Transactions on Automatic Control 57, 3 (2012), 579–591.
[27] Youri Raaijmakers, Sem Borst, and Onno Boxma. 2019. Redundancy scheduling with scaled Bernoulli service requirements. Queueing Systems 93, 1-2 (2019).
[28] Youri Raaijmakers, Sem Borst, and Onno Boxma. 2020. Stability of Redundancy Systems with Processor Sharing. In Proceedings of the 13th EAI International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS '20). Association for Computing Machinery, New York, NY, USA, 120–127.
[29] Nihar B. Shah, Kangwook Lee, and Kannan Ramchandran. 2016. When do redundant requests reduce latency? IEEE Transactions on Communications 64, 2 (2016), 715–722.
[30] Ashish Vulimiri, Philip Brighten Godfrey, Radhika Mittal, Justine Sherry, Sylvia Ratnasamy, and Scott Shenker. 2013. Low latency via redundancy. In Proceedings of the ACM conference on Emerging networking experiments and technologies. ACM, 283–294.

APPENDIX

A Proofs of Section 5

Proof of Corollary 3. Let us consider $s \in R$. Let $i$ be such that $s \in L_i$, which is unique since $\{L_i\}_{i=1}^{i^*}$ is a partition of $R$. We will show that for this $s$ and $i$, it holds that $CAR_i = \mu_s / \sum_{c : s \in R(c)} p_c$. Hence, together with Corollary 2 this concludes the result.

First, note that $CAR_i = \mu_s / \sum_{c \in C_i(s)} p_c$. Hence, we need to prove that $\sum_{c : s \in R(c)} p_c = \sum_{c \in C_i(s)} p_c$, or equivalently, $\{c : s \in R(c)\} = C_i(s)$.

For any $c \in C(s)$, $R(c) = L_l(c)$ with $l \leq i$. We note that $C_i(s) = C(s) \setminus \{c \in C(s) : R(c) = L_l(c) \text{ with } l < i\}$. Therefore, for $s \in L_i$, $C_i(s) = \{c \in C : s \in c,\ c \in C_i,\ s \in L_i(c)\} = \{c \in C : s \in R(c)\}$. The last equality holds by definition of $R(c)$. $\square$

Proof of Corollary 4. The stability condition of such a system is given by Corollary 2. We note that each server $s \in S$ receives $|C(s)| = \binom{K-1}{d-1}$ different job types, that is, by fixing a copy in server $s$, all possible combinations of $d - 1$ servers out of $K - 1$.
Thus, $L_1 = \arg\max_{s \in S} \left\{ \mu_s \binom{K}{d} / \binom{K-1}{d-1} \right\} = K$, $S_2 = S \setminus \{K\}$, and the first condition is $\lambda \binom{K-1}{d-1} / \binom{K}{d} < \mu_K$.

We note that each server $s \in S_i$ receives $\binom{|S_i| - 1}{d-1}$ different job types, for $i = 1, \dots, i^*$, and thus the maximum capacity-to-fraction-of-arrivals ratio in the subsystem with servers $S_i$ only depends on the capacities of servers in $S_i$, that is $L_i = \arg\max_{s \in S_i} \{\mu_s\}$. Additionally, since $\mu_1 < \dots < \mu_K$, one obtains that $L_i = K - i + 1$, for $i = 1, \dots, K - d + 1$. The associated conditions are $\lambda \binom{K-i}{d-1} / \binom{K}{d} < \mu_{K-i+1}$ for $i = 1, \dots, K - d + 1$. This set of conditions is equivalent to that in Corollary 4. $\square$

B Proofs of Section 7

We first introduce some notation. We denote by $E_c(t) = \max\{j : U_{cj} < t\}$ the number of type-$c$ jobs that arrived during the time interval $(0, t)$ and by $U_{cj}$ the instant of time at which the $j$th type-$c$ job arrived to the system. We recall that $b_{cj}$ denotes its service realization. We denote by $b'_{cms}$ the residual job size of the $m$th eldest type-$c$ job in server $s$ that is already in service at time 0.

Sufficient stability condition

Proof of Proposition 10. We now prove the stability of the UB system. For that, we first describe the dynamics of the number of type-$c$ jobs in the UB system, denoted by $N_c^{UB}(t)$. We recall that a type-$c$ job departs only when all the copies in the set of servers $R(c)$ are completely served. We let $\eta^{\min}_{R(c)}(v, t) = \min_{\tilde{s} \in R(c)} \{\eta_{\tilde{s}}(v, t)\}$ be the minimum cumulative amount of capacity received by a copy in one of its servers $R(c)$ during the interval $(v, t)$. Therefore,

$$N_c^{UB}(t) = \sum_{m=1}^{N_c^{UB}(0)} 1\left\{\exists\, \tilde{s} \in R(c) : b'_{cm\tilde{s}} > \eta_{\tilde{s}}(0, t)\right\} + \sum_{j=1}^{E_c(t)} 1\left( b_{cj} > \eta^{\min}_{R(c)}(U_{cj}, t) \right).$$

We denote the number of type-$c$ copies in server $s$ by $M^{UB}_{s,c}(t)$. We note that for a type-$c$ job in server $s$ there are two possibilities:

• If $s \in R(c)$, the copy of the type-$c$ job leaves the server as soon as it is completely served. The cumulative amount of capacity that the copy receives during $(v, t)$ is $\eta_s(v, t)$.
• If $s \notin R(c)$, the copy of the type-$c$ job in server $s$ leaves the system either if it is completely served or if all copies of this type-$c$ job in the servers $R(c)$ are served. We note that for any $\tilde{s} \in R(c)$, $\tilde{s} \in L_l$, with $l < i$.

Hence, the number of type-$c$ jobs in server $s \in L_i$ is given by the following expression. If $s \in R(c)$,

$$M^{UB}_{s,c}(t) = \sum_{m=1}^{M^{UB}_{s,c}(0)} 1\left( b'_{cms} > \eta_s(0, t) \right) + \sum_{j=1}^{E_c(t)} 1\left( b_{cj} > \eta_s(U_{cj}, t) \right),$$

and if $s \notin R(c)$,

$$M^{UB}_{s,c}(t) = \sum_{m=1}^{M^{UB}_{s,c}(0)} 1\left( \left\{\exists\, \tilde{s} \in R(c) : b'_{cm\tilde{s}} > \eta_{\tilde{s}}(0, t)\right\} \cap \left\{ b'_{cms} > \eta_s(0, t) \right\} \right) + \sum_{j=1}^{E_c(t)} 1\left( b_{cj} > \eta_{R(c),s}(U_{cj}, t) \right),$$

where $\eta_{R(c),s}(v, t) = \max\{\eta_s(v, t), \eta^{\min}_{R(c)}(v, t)\}$. The first terms in both equations correspond to the type-$c$ jobs that were already in the system by time $t = 0$; the second terms correspond to the type-$c$ jobs that arrived during the time interval $(0, t)$.

In the following we obtain the number of copies per server. Before doing so, we need to introduce some additional notation. Let $D_l(s) = \{c \in C(s) : R(c) \subseteq L_l(c)\}$ be the set of types in server $s$ for which the set of servers where these types receive maximum capacity-to-fraction-of-arrivals ratio is $R(c) \subseteq L_l(c)$. If $s \in L_i$, then, by definition, $D_l(s) \neq \emptyset$ if $l \leq i$ and $\{D_l(s)\}_{l=1}^{i}$ forms a partition of $C(s)$. Furthermore, $D_i(s) = C_i(s)$, for all $s \in L_i$. Therefore, for a server $s \in L_i$, the number of copies in the server is given by the following expression:

$$M^{UB}_s(t) = \sum_{c \in C(s)} M^{UB}_{s,c}(t) = \sum_{l=1}^{i-1} \sum_{c \in D_l(s)} M^{UB}_{s,c}(t) + \sum_{c \in C_i(s)} M^{UB}_{s,c}(t).$$

The first term of the RHS of the equation corresponds to the type-$c$ jobs in server $s$ that have $R(c) \subseteq L_l(c)$, with $l < i$. The second term of the RHS corresponds to type-$c$ jobs in server $s$ that have $R(c) \subseteq L_i(c)$. In particular, we note that in the UB system, $M^{UB}_s(t) \leq \sum_{c \in C(s)} N_c^{UB}(t)$, since copies might have left while the job is still present.

In order to prove the stability condition, we investigate the fluid-scaled system.
The fluid scaling consists in studying the rescaled sequence of systems indexed by the parameter $r$. For $r > 0$, denote by $M^{UB,r}_{s,c}(t)$ the system where the initial state satisfies $M^{UB}_{s,c}(0) = r\, m^{UB}_{s,c}(0)$, for all $c \in C$ and $s \in S$. We define
\[
M^{UB,r}_{s,c}(t) = \frac{M^{UB}_{s,c}(rt)}{r}, \quad \text{and} \quad M^{UB,r}_s(t) = \frac{M^{UB}_s(rt)}{r}.
\]
In the following, we give the characterization of the fluid model.

Definition 2. Non-negative continuous functions $m^{UB}_s(\cdot)$ are a fluid model solution if they satisfy the functional equations
\[
\begin{aligned}
m^{UB}_s(t) = {} & \sum_{l=1}^{i-1} \sum_{c \in D_l(s)} \left[ m^{UB}_{s,c}(0) \left(1 - G\left(\bar{\eta}_{R(c),s}(0, t)\right)\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F\left(\bar{\eta}_{R(c),s}(x, t)\right)\right) dx \right] \\
& + \sum_{c \in C_i(s)} \left[ m^{UB}_{s,c}(0) \left(1 - G\left(\bar{\eta}_s(0, t)\right)\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F\left(\bar{\eta}_s(x, t)\right)\right) dx \right],
\end{aligned}
\tag{7}
\]
for $s \in L_i$ and $i = 1, \ldots, i^*$, where $G(\cdot)$ is the distribution of the remaining service requirements, $F(\cdot)$ the service time distribution of arriving jobs, and
\[
\begin{aligned}
\bar{\eta}_s(v, t) &= \int_{x=v}^{t} \phi_s\left(\vec{m}^{UB}(x)\right) dx, \\
\bar{\eta}^{\min}_{R(c)}(v, t) &= \min_{\tilde{s} \in R(c)} \{\bar{\eta}_{\tilde{s}}(v, t)\}, \\
\bar{\eta}_{R(c),s}(v, t) &= \max\{\bar{\eta}_s(v, t), \bar{\eta}^{\min}_{R(c)}(v, t)\}.
\end{aligned}
\]
The existence and convergence of the fluid limit to the fluid model can now be proved.

Proposition 16. The limit point of any convergent subsequence of $(\vec{M}^{UB,r}(t); t \geq 0)$ is almost surely a solution of the fluid model (7).

Proof of Proposition 16. The proof is identical to the proof of Theorem 5.2.1 in [10] (which is itself based on Lemma 5 in [15]). We only need to ensure that $\bar{\eta}_{R(c),s}(v, t)$ and $\bar{\eta}^{\min}_{R(c)}(v, t)$ are decreasing in $v$ and continuous on $v \in [\tau_s(t) + \epsilon, t]$, where $\tau_s(t) = \sup\{v \in [0, t] : m_s(v) = 0\}$.

Let us verify that $\bar{\eta}_{R(c),s}(v, t)$ and $\bar{\eta}^{\min}_{R(c)}(v, t)$ are decreasing and continuous in $v$. We note that the function $\eta_s(\cdot, t)$, which gives the cumulative service that a copy in server $s$ received during the time interval $(\cdot, t)$, is a Lipschitz continuous function, increasing for $t < \tau_s$ and non-decreasing for $t > \tau_s$, where $\tau_s = \inf\{t > 0 : M_s(t) = 0\}$.
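If $\bar{\eta}_{R(c),s}(v, t) = \bar{\eta}_{s_1}(v, t)$ and $\bar{\eta}^{\min}_{R(c)}(v, t) = \bar{\eta}_{s_2}(v, t)$ for all $v \in [0, t)$ and some $s_1, s_2 \in S$, then both $\bar{\eta}_{R(c),s}(v, t)$ and $\bar{\eta}^{\min}_{R(c)}(v, t)$ are decreasing and continuous in $v$, since by definition $\bar{\eta}_s(v, t)$ is decreasing and continuous in $v$ for all $s \in S$.

Let us assume that $v_0 \in [0, t)$ is such that $\bar{\eta}^{\min}_{R(c)}(v, t) = \bar{\eta}_{\tilde{s}^1}(v, t)$ for $v \leq v_0$ and $\bar{\eta}^{\min}_{R(c)}(v, t) = \bar{\eta}_{\tilde{s}^2}(v, t)$ for $v > v_0$, for some $\tilde{s}^1, \tilde{s}^2 \in R(c)$. We first verify that $\bar{\eta}^{\min}_{R(c)}(v, t)$ is continuous at $v = v_0$. Since $\bar{\eta}_{\tilde{s}^1}(v, t)$ and $\bar{\eta}_{\tilde{s}^2}(v, t)$ are continuous at $v = v_0$, then
\[
\lim_{x \to v_0^-} \bar{\eta}^{\min}_{R(c)}(x, t) = \bar{\eta}_{\tilde{s}^1}(v_0, t) = \bar{\eta}_{\tilde{s}^2}(v_0, t) = \lim_{x \to v_0^+} \bar{\eta}^{\min}_{R(c)}(x, t).
\]
Therefore, we conclude that $\bar{\eta}^{\min}_{R(c)}(v, t)$ is continuous on $v \in [0, t)$. Analogously, one can verify that $\bar{\eta}_{R(c),s}(v, t)$ is continuous on $v \in [0, t)$.

We now verify that $\bar{\eta}^{\min}_{R(c)}(v, t)$ is decreasing on $v \in [0, t)$. Let us consider $0 < t_1 < v_0 < t_2 < t$. Then, for $\bar{\eta}^{\min}_{R(c)}(v, t)$,
\[
\bar{\eta}^{\min}_{R(c)}(t_1, t) = \bar{\eta}_{\tilde{s}^1}(t_1, t) \geq \bar{\eta}_{\tilde{s}^1}(t_2, t) \geq \bar{\eta}_{\tilde{s}^2}(t_2, t) = \bar{\eta}^{\min}_{R(c)}(t_2, t),
\]
where the first inequality holds since $\bar{\eta}_{\tilde{s}^1}(v, t)$ is decreasing in $v$, and the second since $\tilde{s}^2$ attains the minimum at $t_2$. We conclude that $\bar{\eta}^{\min}_{R(c)}(v, t)$ is decreasing in $v$.

Let us verify that $\bar{\eta}_{R(c),s}(v, t)$ is decreasing in $v$. W.l.o.g. we assume that there exists $v_0 \in [0, t)$ such that $\bar{\eta}_{R(c),s}(v, t) = \bar{\eta}^{\min}_{R(c)}(v, t)$ for $v < v_0$ and $\bar{\eta}_{R(c),s}(v, t) = \bar{\eta}_s(v, t)$ for $t > v > v_0$. Then, for $0 < t_1 < v_0 < t_2 < t$,
\[
\bar{\eta}_{R(c),s}(t_1, t) = \bar{\eta}^{\min}_{R(c)}(t_1, t) \geq \bar{\eta}_s(t_1, t) \geq \bar{\eta}_s(t_2, t) = \bar{\eta}_{R(c),s}(t_2, t),
\]
where the first inequality holds since $\bar{\eta}_{R(c),s}$ is defined as a maximum, and the second since $\bar{\eta}_s(v, t)$ is decreasing in $v$. We conclude that $\bar{\eta}_{R(c),s}(v, t)$ is decreasing in $v$. □

We now give a further characterization of the fluid model (7).

Proposition 17. Let $i \leq i^*$ and assume $\lambda \sum_{c \in C_l(s)} p_c < \mu_s$ for all $l \leq i - 1$ and $s \in L_l$.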
Then, there is a time $T \geq 0$, such that for $t \geq T$ and for $s \in \cup_{l=1}^{i-1} L_l$, $m^{UB}_s(t) = 0$, and for $s \in L_i$,
\[
m^{UB}_s(t) = \sum_{c \in C_i(s)} \left[ m^{UB}_{s,c}(0) \left(1 - G\left(\bar{\eta}_{s,i}(0, t)\right)\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F\left(\bar{\eta}_{s,i}(x, t)\right)\right) dx \right],
\tag{8}
\]
with
\[
\bar{\eta}_{s,i}(v, t) := \int_{x=v}^{t} \phi_{s,i}\left(\vec{m}(x)\right) dx, \quad \text{and} \quad \phi_{s,i}\left(\vec{m}(x)\right) := \frac{\mu_s}{\sum_{c \in C_i(s)} m_{s,c}(x)}.
\]

Proof of Proposition 17. For simplicity in notation, we remove the superscript UB throughout the proof. First assume $s \in L_1$. Since for $i = 1$ the sum over the sets $D_l(s)$ in Equation (7) is empty, we directly obtain
\[
m_s(t) = \sum_{c \in C_1(s)} \left[ m_{s,c}(0) \left(1 - G\left(\bar{\eta}_s(0, t)\right)\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F\left(\bar{\eta}_s(x, t)\right)\right) dx \right], \quad \forall t > 0.
\]
This expression coincides with the fluid limit of an M/G/1 PS queue with arrival rate $\lambda \sum_{c \in C_1(s)} p_c$ and server speed $\mu_s$. Since $\lambda \sum_{c \in C_1(s)} p_c < \mu_s$, we know that there exists a $\tau_s$ such that $m_s(t) = 0$, for all $t \geq \tau_s$.

The remainder of the proof is by induction. Consider now a server $s \in L_l$ and assume there exists a time $\tilde{T}$ such that $m_s(t) = 0$, for all $t \geq \tilde{T}$ and $s \in \cup_{j=1}^{l-1} L_j$. Thus, for $t \geq \tilde{T}$, also $m_{s,c}(t) = 0$ for all $s \in \cup_{j=1}^{l-1} L_j$, $c \in D_j(s)$, $j = 1, \ldots, l - 1$. We consider server $s \in L_l$. From (7), its drift is then given by
\[
\begin{aligned}
m_s(t) &= \sum_{j=1}^{l-1} \sum_{c \in D_j(s)} m_{s,c}(t) + \sum_{c \in C_l(s)} m_{s,c}(t) \\
&= \sum_{c \in C_l(s)} \left[ m_{s,c}(0) \left(1 - G\left(\bar{\eta}_s(0, t)\right)\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F\left(\bar{\eta}_s(x, t)\right)\right) dx \right],
\end{aligned}
\]
for all $t \geq \tilde{T}$. Now note that
\[
\phi_s\left(\vec{m}(t)\right) = \frac{\mu_s}{m_s(t)} = \frac{\mu_s}{\sum_{c \in C_l(s)} m_{s,c}(t)} = \phi_{s,l}\left(\vec{m}(t)\right),
\]
where the second equality follows from the fact that $m_{s,c}(t) = 0$ for all $s \in \cup_{j=1}^{l-1} L_j$, $c \in D_j(s)$, $j = 1, \ldots, l - 1$.

To finish the proof, (8) coincides with the fluid limit of an M/G/1 system with PS, arrival rate $\lambda \sum_{c \in C_l(s)} p_c$ and server speed $\mu_s$. Hence, if $l < i$, the standard PS queue is stable, and its fluid limit reaches zero in finite time and remains there. □

Below we prove that the UB system is Harris recurrent. Note that the concept of Harris recurrence is needed here since the state space is not countable (we need to keep track of residual service times).
We first establish fluid stability, that is, that the fluid model is 0 in finite time. This is useful because the results of [24] establish that, under some suitable conditions, fluid stability implies Harris recurrence; see the lemma below.

Lemma 18. If the fluid limit is fluid stable, then the stochastic system is Harris recurrent.

Proof of Lemma 18. In [24], the authors consider bandwidth sharing networks (with processor sharing policies), and show that under mild conditions, the stability of the fluid model (describing the Markov process of the number of per-class customers with their residual job sizes) is sufficient for stability (positive Harris recurrence). Our system, though slightly different from theirs, satisfies the same assumptions, and as a consequence their results are directly applicable to our model. More precisely, given the assumptions on the service time distribution, our model satisfies the assumptions given in [24, Section 2.2] for inter-arrival times and job sizes. (In particular, exponential inter-arrival times satisfy the conditions given in [24, Assumption 2.2.2].) □

Equation (8) coincides with the fluid limit of an M/G/1 PS system with arrival rate $\lambda \sum_{c \in C_i(s)} p_c$ and server speed $\mu_s$. If $\lambda < CAR_l$, or equivalently $\lambda \sum_{c \in C_l(s)} p_c < \mu_s$, for all $l = 1, \ldots, i$, the solution of Equation (8) equals zero in finite time. Hence, from Lemma 18 we conclude that for servers $s \in L_i$, the associated stochastic number of copies in server $s$ is Harris recurrent, as stated in the corollary below. □

Proof of Proposition 11. We assume that both systems are coupled as follows: at time $t = 0$, both systems start at the same initial state, $N_c(0) = N^{UB}_c(0)$ and $a_{cjs}(0) = a^{UB}_{cjs}(0)$ for all $c, j, s$. Arrivals and service times are also coupled.
For simplicity in notation, we assume that when in the original system a type-$c$ copy reaches its service requirement $b$, the attained service of its $d - 1$ additional copies is fixed to $b$ and the job remains in the system until the copy of that same job in the UB system is fully served at all servers in $R(c)$.

We prove this result by induction on $t$. It holds at time $t = 0$. We assume that for $u \leq t$ it holds that $N_c(u) \leq N^{UB}_c(u)$ and $a_{cjs}(u) \geq a^{UB}_{cjs}(u)$ for all $c, j, s$. We show that these inequalities hold for $t^+$.

We first assume that at time $t$, it holds that $N_c(t) = N^{UB}_c(t)$ for some $c \in C$. The inequality is violated only if there is a job for which the copy in the UB system is fully served at all servers $R(c)$, but none of the copies in the original system is completed. That means there exists a $j$ such that $a_{cjs}(t) < a^{UB}_{cjR(c)}(t) = b_{cj}$ for all $s \in c$. However, this cannot happen, since by hypothesis $a_{cjs}(t) \geq a^{UB}_{cjs}(t)$ for all $s \in c$.

We now assume that at time $t$, $a_{cjs}(t) = a^{UB}_{cjs}(t)$ for some $c, j, s$. There are now two cases. If this copy (and job) has already left in the original system, then $a_{cjs}(t) = a_{cjs}(t^+) = b_{cj}$ and hence $a_{cjs}(t^+) \geq a^{UB}_{cjs}(t^+)$. If instead the copy has not left in the original system, then by hypothesis it holds that $N_c(t) \leq N^{UB}_c(t)$ and thus $M_s(t) \leq M^{UB}_s(t)$ and $\mu_s / M_s(t) \geq \mu_s / M^{UB}_s(t)$. That means that the copy in the original system has a service rate at time $t$ at least as high as that of the same copy in the UB system. Hence, $a_{cjs}(t^+) \geq a^{UB}_{cjs}(t^+)$. □

Necessary stability condition

Proof of Proposition 13. In order to show that the LB system is unstable, we investigate the fluid-scaled system. For $r > 0$, denote by $N^{LB,r}_c(t)$ the system where the initial state satisfies $N^{LB}_c(0) = r\, n^{LB}_c(0)$, for all $c \in C$. We write for the fluid-scaled number of jobs per type
\[
N^{LB,r}_c(t) = \frac{N^{LB}_c(rt)}{r}.
\]
In the following we give the characterization of the fluid model.

Definition 3.
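Non-negative continuous functions $n^{LB}_c(\cdot)$ are a fluid model solution if they satisfy the functional equations
\[
\begin{aligned}
n^{LB}_c(t) &= 0, && c \in C \setminus C_\iota, \\
n^{LB}_c(t) &= n^{LB}_c(0) \left(1 - G\left(\bar{\eta}^{LB}_c(0, t)\right)\right) + \lambda p_c \int_{x=0}^{t} \left(1 - F\left(\bar{\eta}^{LB}_c(x, t)\right)\right) dx, && c \in C_\iota,
\end{aligned}
\]
where $G(\cdot)$ is the distribution of the remaining service requirements of initial jobs, $F(\cdot)$ the service time distribution of arriving jobs, and
\[
\bar{\eta}^{LB}_c(v, t) = \int_{x=v}^{t} \phi^{LB}_c\left(\vec{n}^{LB}(x)\right) dx, \quad \text{with } c \in C_\iota.
\]
The existence and convergence of the fluid-scaled number of jobs $N^{LB,r}_c(t)$ to the fluid model $\vec{n}^{LB}(t)$ can be proved as before. The statement of Proposition 16 indeed directly translates to the process $N^{LB,r}(t)$, since $\bar{\eta}^{LB}_c(v, t)$ is both decreasing and continuous in $v$. Therefore, it is left out.

Next, we characterize the fluid model solution $\vec{n}^{LB}(t)$ in terms of $m^{LB}_s(t) = \sum_{c \in C(s)} n^{LB}_c(t)$. We show that if the initial condition for all servers is such that $m^{LB}_s(0) / \mu^{LB}_s = \alpha(0)$ for all $s \in S_\iota$, then $m^{LB}_s(t) / \mu^{LB}_s = \alpha(t)$ for all $s \in S_\iota$, where $\alpha(t)$ is given below.

Lemma 19. Let us assume that the initial condition is such that $n^{LB}_c(0) = 0$ for all $c \in C \setminus C_\iota$ and, for $c \in C_\iota$, $n^{LB}_c(0)$ are such that $m^{LB}_s(0) / \mu^{LB}_s = \alpha(0)$ for all $s \in S_\iota$. Let
\[
\alpha(t) = \alpha(0) \left(1 - G\left(\bar{\eta}^{LB}(0, t)\right)\right) + \frac{\lambda}{CAR_\iota} \int_{x=0}^{t} \left(1 - F\left(\bar{\eta}^{LB}(x, t)\right)\right) dx,
\tag{9}
\]
where $\bar{\eta}^{LB}(v, t) = \int_{x=v}^{t} \phi^{LB}(\alpha(x))\, dx$, with $\phi^{LB}(\alpha(t)) = 1 / \alpha(t)$. Then, $n^{LB}_c(t) = 0$ for all $t \geq 0$ and $c \in C \setminus C_\iota$, and
\[
m^{LB}_s(t) / \mu^{LB}_s = \alpha(t),
\]
for all $t \geq 0$ and $s \in S_\iota$.

Proof of Lemma 19. From Definition 3, we obtain that for each server $s \in S_\iota$,
\[
\begin{aligned}
\frac{m^{LB}_s(t)}{\mu^{LB}_s} &= \frac{1}{\mu^{LB}_s} \sum_{c \in C_\iota(s)} n^{LB}_c(t) \\
&= \sum_{c \in C_\iota(s)} \left[ \frac{n^{LB}_c(0)}{\mu^{LB}_s} \left(1 - G\left(\bar{\eta}^{LB}_c(0, t)\right)\right) + \frac{\lambda p_c}{\mu^{LB}_s} \int_{x=0}^{t} \left(1 - F\left(\bar{\eta}^{LB}_c(x, t)\right)\right) dx \right].
\end{aligned}
\]
We recall that $\alpha(t)$ is defined as
\[
\alpha(t) = \alpha(0) \left(1 - G\left(\bar{\eta}^{LB}(0, t)\right)\right) + \frac{\lambda}{CAR_\iota} \int_{x=0}^{t} \left(1 - F\left(\bar{\eta}^{LB}(x, t)\right)\right) dx.
\]
We let the initial condition be such that $m^{LB}_s(0) / \mu^{LB}_s = \alpha(0)$ for all $s \in S_\iota$ and we will prove by contradiction that for all $t > 0$,
\[
\frac{m^{LB}_s(t)}{\mu^{LB}_s} = \alpha(t), \quad \text{for all } s \in S_\iota.
\]
Let us assume that $t_0$ is the first time such that there exists $\tilde{s} \in S_\iota$ such that $\alpha(t_0) \neq m^{LB}_{\tilde{s}}(t_0) / \mu^{LB}_{\tilde{s}}$.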
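Since $\sum_{c \in C_\iota(s)} n^{LB}_c(0) / \mu^{LB}_s = m^{LB}_s(0) / \mu^{LB}_s = \alpha(0)$ and $\sum_{c \in C_\iota(s)} p_c / \mu^{LB}_s = 1 / CAR_\iota$, this implies that there exist $\tilde{c} \in C_\iota$ and $t_1$, $0 \leq v \leq t_1 < t_0$, such that $\bar{\eta}^{LB}_{\tilde{c}}(v, t_1) \neq \bar{\eta}^{LB}(v, t_1)$. However, since $\alpha(t) = m^{LB}_s(t) / \mu^{LB}_s$ for all $t < t_0$, this implies that $\phi^{LB}_c(\vec{n}(t)) = 1 / \alpha(t)$ for all $t < t_0$ and $c \in C(s)$, and hence $\bar{\eta}^{LB}_{\tilde{c}}(v, t) = \bar{\eta}^{LB}(v, t)$, for all $t < t_0$. We have hence reached a contradiction, which concludes the proof. □

We note that Equation (9) corresponds to the fluid limit of an M/G/1 system with PS, arrival rate $\lambda / CAR_\iota$ and server speed 1. Assuming $\lambda > CAR_\iota$, it follows that the fluid limit $\alpha(t)$, and hence $m^{LB}_s(t)$, $s \in S_\iota$, diverges. Now, by using similar arguments as in Dai [8], the fact that the limit diverges implies that the corresponding stochastic process cannot be tight, and hence cannot be stable. □

Proof of Proposition 14. The number of type-$c$ jobs in the LB system is given by $N^{LB}_c(t) = 0$, for $c \in C \setminus C_\iota$, and for $c \in C_\iota$,
\[
N^{LB}_c(t) = \sum_{m=1}^{N^{LB}_c(0)} 1\left(b'_{cms} > \eta^{LB}_c(0, t)\right) + \sum_{j=1}^{E_c(t)} 1\left(b_{cj} > \eta^{LB}_c(U_{cj}, t)\right).
\]
We note that for all $c \in C \setminus C_\iota$ the result is direct since $p^{LB}_c = 0$ for all $c \in C \setminus C_\iota$. Then, let us consider $c \in C_\iota$. For any $\vec{N}$ and $\vec{N}^{LB}$ such that $\vec{N} \geq \vec{N}^{LB}$, the following inequalities hold:
\[
\begin{aligned}
\phi_s(\vec{N}) = \frac{\mu_s}{M_s}
&= \frac{\mu_s \left(\sum_{c \in C_\iota(s)} p_c\right)}{\left(\sum_{c \in C_\iota(s)} p_c\right) M_s}
= \frac{\left(\sum_{c \in C_\iota(s)} p_c\right) \mu_s / \left(\sum_{c \in C_\iota(s)} p_c\right)}{\sum_{c \in C(s) \setminus C_\iota(s)} N_c + \sum_{c \in C_\iota(s)} N_c} \\
&\leq \frac{\left(\sum_{c \in C_\iota(s)} p_c\right) \mu_s / \left(\sum_{c \in C_\iota(s)} p_c\right)}{\sum_{c \in C_\iota(s)} N_c}
\leq \frac{CAR_\iota \left(\sum_{c \in C_\iota(s)} p_c\right)}{\sum_{c \in C_\iota(s)} N^{LB}_c}
\leq \max_{\tilde{s} \in c} \frac{\mu^{LB}_{\tilde{s}}}{M^{LB}_{\tilde{s}}}
= \phi^{LB}_c(\vec{N}^{LB}).
\end{aligned}
\]
The second-to-last inequality holds since $CAR_\iota \geq \mu_s / \left(\sum_{c \in C_\iota(s)} p_c\right)$ for all $s \in S_\iota$ and $N_c \geq N^{LB}_c$ for all $c \in C_\iota$. We note that $\sum_{c \in C_\iota(s)} N^{LB}_c = M^{LB}_s(t)$. It follows from straightforward sample-path arguments that $N_c(t) \geq N^{LB}_c(t)$ for all $t \geq 0$ and $c \in C_\iota$. □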
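As an illustration of the dichotomy used throughout this appendix (the fluid limit of an M/G/1 PS queue drains in finite time when the load is below one and diverges when it exceeds one), the following minimal sketch treats the special case of exponential, unit-mean service requirements, which is an assumption made here and not a restriction of the results above. Writing $\alpha(t)$ for the fluid level in Equation (9) and $\rho$ for the arrival rate divided by the server speed, memorylessness gives $1 - G(y) = 1 - F(y) = e^{-y}$, and differentiating Equation (9) yields $\alpha'(t) = \rho - 1$ while $\alpha(t) > 0$. The function names `fluid_level`, `drain_time` and `rhs_equation_9` are hypothetical helpers introduced only for this illustration.

```python
import math


def fluid_level(alpha0: float, load: float, t: float) -> float:
    """Fluid level at time t of a unit-speed PS queue.

    Assumes exponential unit-mean job sizes, so the fluid equation
    reduces to d(alpha)/dt = load - 1 while alpha > 0.
    """
    return max(alpha0 + (load - 1.0) * t, 0.0)


def drain_time(alpha0: float, load: float) -> float:
    """First time the fluid level hits zero (infinite if load >= 1)."""
    if load >= 1.0:
        return math.inf
    return alpha0 / (1.0 - load)


def rhs_equation_9(alpha0: float, load: float, t: float, steps: int = 20000) -> float:
    """Midpoint-rule evaluation of the right-hand side of Equation (9).

    Restricted to the overloaded case (load > 1) with alpha0 > 0, where
    eta(v, t) = integral of 1/alpha over (v, t) has the closed form
    k * log(alpha(t) / alpha(v)) with k = 1 / (load - 1).
    """
    if load <= 1.0 or alpha0 <= 0.0:
        raise ValueError("sketch only covers load > 1 and alpha0 > 0")
    k = 1.0 / (load - 1.0)
    level_t = fluid_level(alpha0, load, t)
    # Initial jobs still present: alpha(0) * exp(-eta(0, t)).
    initial = alpha0 * (alpha0 / level_t) ** k
    # Arrivals still present: load * integral of exp(-eta(x, t)) dx.
    h = t / steps
    integral = h * sum(
        (fluid_level(alpha0, load, (i + 0.5) * h) / level_t) ** k
        for i in range(steps)
    )
    return initial + load * integral


if __name__ == "__main__":
    # Underloaded: the fluid level reaches zero and stays there.
    print(drain_time(2.0, 0.5), fluid_level(2.0, 0.5, 10.0))
    # Overloaded: the linear solution satisfies Equation (9) and diverges.
    print(fluid_level(1.0, 1.5, 4.0), rhs_equation_9(1.0, 1.5, 4.0))
```

In the overloaded case the linearly growing level is checked against the integral form of Equation (9) rather than taken for granted; for general service distributions the same qualitative behaviour holds, but no closed form of this kind is claimed here.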

Journal

Mathematics, arXiv (Cornell University)

Published: Mar 3, 2020

References