Open-Loop and Closed-Loop Markov Models

The essential differences between open-loop and closed-loop Markov modelling methods can be illustrated by a simple 3-state model {S_{0}, S_{1}, S_{2}} where S_{0} is the full-up condition, S_{1} is the partially failed condition, and S_{2} is a complete failure (a "shutdown"). Failure transitions S_{0} → S_{1} and S_{1} → S_{2} are exponentially distributed with rates "a" and "b" respectively. Periodically (every T hours) all partially failed systems are repaired. If a system experiences a shutdown it is repaired immediately and returned to service. Our objective is to determine the mean time between shutdowns as a function of the inspection/repair interval T.

A closed-loop Markov model for this system is shown below.



The repair transition S_{1} → S_{0} is treated as exponential with rate c = 1/MTTR, where MTTR is the mean time to repair of partial faults. If T is sufficiently small the value of MTTR is approximately T/2, but to cover the whole range of possible T values we need to use



Strictly speaking the transition S_{2} → S_{0} is also treated as exponential (with rate "d"), but since the rate of entering S_{2} is independent of the rate of leaving S_{2}, this rate always drops out of the final result.

The steady-state solution of the closed-loop model is found by solving the equilibrium equations
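Returning to the repair rate c: the expression for MTTR is not reproduced in this text. One candidate that reduces to T/2 for small T can be sketched as follows, assuming the system is in S_{0} at the start of an inspection interval and that MTTR means the average wait from the onset of a partial fault until the next scheduled inspection. This is an assumption offered for illustration, not necessarily the expression originally used.

```latex
\mathrm{MTTR}
  = \frac{\int_0^T (T - t)\, a e^{-at}\, dt}{1 - e^{-aT}}
  = \frac{T}{1 - e^{-aT}} - \frac{1}{a}
  \approx \frac{T}{2} + \frac{aT^2}{12} \quad (aT \ll 1)
```

For large T this expression grows like T - 1/a, so c = 1/MTTR tends to zero, which corresponds to partial faults effectively never being repaired.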



which give



From this we can compute the rate of entry into S_{2} as follows 



(The symbol (dP2/dt)+ signifies the positive component of dP2/dt, because we want the rate of entry into S_{2}.) Setting c = 1/MTTR we have the result 



Notice that as T increases to infinity the rate approaches ab/(a+b), which we recognize as the correct rate assuming partial failures are never repaired. 
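The closed-loop calculation can be checked numerically. The following is a minimal sketch, assuming the shutdown rate is normalized per operating hour, i.e. b·P1 divided by P0 + P1 (this normalization is an assumption, chosen because it makes the rate d drop out as described above). The function names are hypothetical.

```python
def closed_loop_steady_state(a, b, c, d):
    """Steady-state probabilities (P0, P1, P2) of the closed-loop model."""
    # Balance equations: a*P0 = (b + c)*P1 and b*P1 = d*P2,
    # together with the normalization P0 + P1 + P2 = 1.
    w0 = 1.0
    w1 = a / (b + c)
    w2 = b * w1 / d
    total = w0 + w1 + w2
    return w0 / total, w1 / total, w2 / total


def closed_loop_shutdown_rate(a, b, c, d=1.0):
    """Rate of entry into S2 per unit of operating time (assumed normalization)."""
    p0, p1, _ = closed_loop_steady_state(a, b, c, d)
    return b * p1 / (p0 + p1)


if __name__ == "__main__":
    a, b, c = 0.002, 0.001, 0.004  # c is an arbitrary illustrative repair rate
    for d in (0.01, 1.0, 100.0):
        # The printed rate should not depend on d.
        print(d, closed_loop_shutdown_rate(a, b, c, d))
    print("ab/(a+b+c) =", a * b / (a + b + c))
```

Under this normalization the rate equals ab/(a+b+c) for any d, and setting c = 0 recovers the no-repair limit ab/(a+b).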

The open-loop model for this system is shown below.



The idea of this approach is to simulate the actual periodic repair process (rather than treating the transition S_{1} → S_{0} as exponential) by running the model without repairs for an interval of length T, and then inferring the effective shutdown rate from the probability of the shutdown state at time T.

Assuming P1(0) = 0 (i.e., the partial failure state is empty at t = 0) the dynamic solution of the open-loop model (with d = 0) is



A simple method of approximating the MTBF of this system would be to set P0(0) = 1 and compute the value of P2(T). Then if we assumed this probability accumulated in S_{2} at a constant rate λ, we could solve the equation P2(T) = 1 - e^{-λT} for λ, which gives



This value of λ_{open-loop} is plotted along with λ_{closed-loop} in the figure below. (For illustration purposes we have taken a = 0.002 and b = 0.001.)
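The simple open-loop estimate can be sketched in a few lines. The code below assumes the standard no-repair solution of the three-state chain starting from P0(0) = 1 with a ≠ b, namely 1 - P2(t) = (b e^{-at} - a e^{-bt})/(b - a), and solves 1 - e^{-λT} = P2(T) for λ. The function names are hypothetical.

```python
import math


def open_loop_survival(a, b, t):
    """1 - P2(t) with no repairs, starting from P0(0) = 1 (requires a != b)."""
    return (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)


def simple_open_loop_rate(a, b, T):
    """Constant rate that would accumulate the same P2(T) over an interval T."""
    # Work with the survival probability directly so that large T
    # does not lose precision in 1 - P2(T).
    return -math.log(open_loop_survival(a, b, T)) / T


if __name__ == "__main__":
    a, b = 0.002, 0.001
    for T in (1.0, 100.0, 1000.0, 10000.0):
        print(T, simple_open_loop_rate(a, b, T))
```

With these formulas the estimate tends to the smaller of a and b as T grows, rather than to ab/(a+b), which is one way to see the discrepancy discussed next.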



As can be seen, the two values agree for small T, but they differ as T increases. We know the closed-loop model gives the correct asymptotic rate as T increases (which can be verified using numerical simulation), so there is evidently something wrong with the simple open-loop approach just described. The main problem is the assumption that the probability accumulates in S_{2} at a constant rate, which is not generally the case. The initial shutdown rate is quite low (because S_{1} starts out empty) and then increases. In general, two different probability density functions that give equal values of P2(T) can have different MTBFs, because of how the failures are distributed during the interval T. This means we cannot actually infer the true MTBF from the open-loop value of P2(T) alone. For non-constant failure rates the only way to determine the actual mean time for a system to fail is by integrating t times the failure density function d(t) using the formula



Then the "effective failure rate" can be defined as 1/MTBF. Unfortunately the density function for a system with periodic repairs is somewhat complicated. The probability of State 2 as a function of time during a single inspection interval is 



where t = 0 at the start of this interval. Letting P_{j}[k] denote the value of P_{j}(0) at the start of the kth period, we have P2[k] = 1 - P0[k] (because the partial failure state is empty at the start of each interval). Also, we have P0[0] = 1, and the above equation implies that



for all k > 0, where Tj is the duration of the jth period. Fortunately, all the T_{j} except possibly T0 are of equal duration, which we will call simply T. Therefore, if we define 



we have 


for all k>0. For convenience let q0 denote q(T0) and q denote q(T). Then the probability density function for the initial period T0 is 



and for all subsequent periods the density function is 



The MTBF of the system is then given by 



Evaluating the integrals and summing the resulting geometric series we finally arrive at the result 



Taking a strict open-loop approach (neglecting the repair transition shown as a dotted line in the open-loop schematic), the system always starts at t = 0 at the beginning of the initial inspection/repair interval, and so we have T0 = T. The resulting effective shutdown rate is shown in the figure below along with the closed-loop rate.
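The MTBF with periodic repairs can be sketched numerically. The code below is a reconstruction under stated assumptions, not the closed-form result referred to above: it takes q(T) = (b e^{-aT} - a e^{-bT})/(b - a) as the probability of surviving one interval of length T from a clean start (a ≠ b), and it uses the integral of the survival probability, which is equivalent to integrating t times the failure density. Summing the geometric series over the periods gives MTBF = I(T0) + q(T0) I(T)/(1 - q(T)), where I(T) is the integral of q from 0 to T. All function names are hypothetical.

```python
import math


def q(a, b, t):
    """Probability of no shutdown within time t of a clean start (a != b)."""
    return (b * math.exp(-a * t) - a * math.exp(-b * t)) / (b - a)


def q_integral(a, b, t):
    """Integral of q(a, b, s) for s from 0 to t."""
    return ((b / a) * (1.0 - math.exp(-a * t))
            - (a / b) * (1.0 - math.exp(-b * t))) / (b - a)


def mtbf_periodic(a, b, T, T0=None):
    """Mean time to shutdown with inspections after T0, then every T hours."""
    if T0 is None:
        T0 = T  # strict open-loop case: the clock restarts at each shutdown
    # First period contributes I(T0); each later period contributes I(T),
    # weighted by the probability q(T0) * q(T)**(k-1) of reaching it.
    return q_integral(a, b, T0) + q(a, b, T0) * q_integral(a, b, T) / (1.0 - q(a, b, T))


if __name__ == "__main__":
    a, b = 0.002, 0.001
    for T in (100.0, 500.0, 2000.0, 10000.0):
        print(T, 1.0 / mtbf_periodic(a, b, T), 1.0 / mtbf_periodic(a, b, T, T0=T / 2))
```

The second printed column is the effective rate for T0 = T and the third is for T0 = T/2. Both tend to ab/(a+b) for large T under these assumptions.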



The match is a little better than the previous open-loop method; at least now they both have the correct asymptotic rate as T goes to infinity. However, the open-loop method approaches the asymptote much more quickly than the closed-loop method. Which method is correct?

The answer is that both methods are correct, but they represent different things. The open-loop method (without the dotted repair transition) always begins the first inspection/repair interval at t = 0. The MTBF of the system in those circumstances is as given by the open-loop model with T0 = T. However, in practice we don't resynchronize our inspection/repair cycle each time we have a shutdown. For example, if a system fails halfway through our inspection interval we fix it and return it to service, so its initial inspection interval (on the way to its next failure) is only half of the normal interval. Since a shutdown can occur anywhere during the interval, the actual value of T0 can be anything from 0 to T, with a mean value of T/2.

The figure below shows the open-loop rate, taking T0 = T/2, along with the closed-loop rate. As can be seen, the results of the open-loop method with T0 = T/2 are almost identical to the closed-loop prediction. This result is confirmed by numerical simulation of the system with periodic inspections every T hours, assuming the inspection cycle is not resynchronized with each shutdown.
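A numerical simulation of this kind can be sketched as follows. This is a minimal Monte Carlo illustration, not the simulation used for the figure: it assumes inspections fall at fixed multiples of T on a clock that is never reset, that an inspection instantly returns a partially failed system to S_{0}, and that a shutdown is repaired instantly. The function name and parameters are hypothetical.

```python
import math
import random


def simulate_mtbf(a, b, T, n_shutdowns=20000, seed=1):
    """Estimate the mean time between shutdowns with unsynchronized inspections."""
    rng = random.Random(seed)
    t = 0.0          # global clock; the system is in S0 at time t
    shutdowns = 0
    while shutdowns < n_shutdowns:
        # Time of the next partial failure (S0 -> S1).
        t_partial = t + rng.expovariate(a)
        # First inspection on the fixed schedule after the partial failure.
        next_inspection = (math.floor(t_partial / T) + 1) * T
        # Time at which the partial failure would progress to shutdown (S1 -> S2).
        t_shutdown = t_partial + rng.expovariate(b)
        if t_shutdown < next_inspection:
            shutdowns += 1
            t = t_shutdown       # immediate repair; inspection clock is not reset
        else:
            t = next_inspection  # partial failure repaired at the inspection
    return t / shutdowns


if __name__ == "__main__":
    a, b = 0.002, 0.001
    for T in (500.0, 2000.0):
        print(T, 1.0 / simulate_mtbf(a, b, T, n_shutdowns=5000))
```

Because the exponential transitions are memoryless, restarting the draw from S_{0} after each repair does not bias the estimate. The printed values are sample estimates and vary with the seed.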



Thus we can achieve fairly consistent results using the open-loop model with strictly periodic (as opposed to exponential) repairs. This shows that, for the purpose of computing the shutdown rate, there is not much difference between periodic and exponential repair transitions. Of course, if we're willing to represent the repair transition from S_{1} to S_{0} as exponential with constant rate c = 1/MTTR, as we did with the closed-loop model, then we can get exactly the same answer as we got using the closed-loop model. On this basis the open-loop model is as shown below.



Since P_{2}(t) = 1 - P_{0}(t) - P_{1}(t) the two governing equations are simply



The characteristic roots of this system are 



With the initial condition P_{0}(0) = 1 the solution is



Consequently the cumulative probability function for state 2 is 



and the probability density function for entering state 2 is the derivative of this, i.e., 



The mean time to reach state 2 from state 0 is given by the integral 



Evaluating the integral gives



which is exactly equal to the reciprocal of the closed-loop failure rate.

In summary, it is usually possible to represent periodic repairs as exponential transitions, and it is possible to achieve consistent results using either the closed-loop or the open-loop approach. However, the open-loop approach generally requires much more effort, and for complicated systems it quickly becomes impractical.
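The chain of steps above can be sketched explicitly. The following is a reconstruction from the stated model (failure rates a and b, exponential repair rate c, initial conditions P_{0}(0) = 1 and P_{1}(0) = 0), offered as a consistency check. The labels r_1 and r_2 for the characteristic roots are introduced here for illustration.

```latex
\begin{aligned}
\frac{dP_0}{dt} &= -a P_0 + c P_1, \qquad
\frac{dP_1}{dt} = a P_0 - (b + c) P_1 \\[4pt]
r_{1,2} &= \frac{-(a+b+c) \pm \sqrt{(a+b+c)^2 - 4ab}}{2},
\qquad r_1 + r_2 = -(a+b+c), \quad r_1 r_2 = ab \\[4pt]
P_2(t) &= 1 - \frac{r_2 e^{r_1 t} - r_1 e^{r_2 t}}{r_2 - r_1} \\[4pt]
\frac{dP_2}{dt} &= \frac{ab \left( e^{r_2 t} - e^{r_1 t} \right)}{r_2 - r_1} \\[4pt]
\int_0^\infty t \, \frac{dP_2}{dt} \, dt
  &= \frac{ab}{r_2 - r_1} \left( \frac{1}{r_2^2} - \frac{1}{r_1^2} \right)
   = -\frac{r_1 + r_2}{r_1 r_2}
   = \frac{a + b + c}{ab}
\end{aligned}
```

The last expression is the reciprocal of ab/(a+b+c), and with c = 0 it reduces to (a+b)/(ab), the mean time to shutdown when partial failures are never repaired.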
