Reliability With Mixed Periodic Repairs 

Consider a dual redundant system consisting of two components, each of which has a failure rate of λ. One of the components is checked and (if necessary) repaired periodically once every τ hours, and the other component is checked and repaired once every mτ hours, where m is some positive integer. The overall system is considered failed if both components are failed simultaneously, at which point the system is immediately repaired or replaced with a completely healthy system.

One simple way of evaluating the long-term system failure rate is to approximate the repair transition for each state as continuous with constant rate equal to the reciprocal of the mean time to repair for that state. If the rate of entering the state is much less than the repair rate, then the system will enter the state at times that are almost uniformly distributed over the inspection interval, so the mean time to repair will be about half of the inspection interval. However, if the rate of entering the state is large compared with the repair rate, the mean time to repair will approach the full inspection interval. For a failure state j that is entered directly from the full-up state with a rate λ_{j}, the initial probability density for entering the failure state is roughly

$$ \rho_j(t) \approx \lambda_j \, e^{-(\Sigma\lambda)\,t} $$

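where Σλ denotes the sum of the transition rates exiting the full-up state. Hence we can approximate the mean time of entry for systems that enter state j over an interval τ as

$$ \bar{t}_j = \frac{\int_0^{\tau} t \, e^{-(\Sigma\lambda)t}\, dt}{\int_0^{\tau} e^{-(\Sigma\lambda)t}\, dt} = \frac{1}{\Sigma\lambda} - \frac{\tau}{e^{(\Sigma\lambda)\tau} - 1} $$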

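and so the mean time to repair for these systems is approximately

$$ \mathrm{MTTR}_j = \tau - \frac{1}{\Sigma\lambda} + \frac{\tau}{e^{(\Sigma\lambda)\tau} - 1} = \frac{\tau}{1 - e^{-(\Sigma\lambda)\tau}} - \frac{1}{\Sigma\lambda} $$

where τ is the inspection (and repair) interval. Naturally this is approximately τ/2 for sufficiently small values of τ, and it approaches τ when (Σλ)τ is large. Using this formula, we can determine suitable values for the repair rates μ_{1} and μ_{2} of our dual redundant system, and the system can then be represented by the simple Markov model shown below.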
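As a rough numerical illustration of the two limits just described, the sketch below evaluates the mean time to repair in the form τ/(1 − e^{−(Σλ)τ}) − 1/Σλ. The function name `mean_time_to_repair`, the cutoff `1e-6`, and the sample values are assumptions of this sketch, not part of the original note.

```python
import math

def mean_time_to_repair(tau, sum_rates):
    """Estimated mean time to repair for an inspection interval tau.

    sum_rates is the total transition rate out of the full-up state
    (assumed positive).
    """
    x = sum_rates * tau
    if x < 1e-6:
        # Two-term series tau/2 + x*tau/12, used to avoid the
        # cancellation between two large terms when x is tiny.
        return tau * (0.5 + x / 12.0)
    # -expm1(-x) equals 1 - exp(-x), computed accurately.
    return tau / (-math.expm1(-x)) - 1.0 / sum_rates

# Slow entry rate: close to half the interval.
print(mean_time_to_repair(100.0, 2e-6))
# Fast entry rate: close to the full interval.
print(mean_time_to_repair(100.0, 1.0))
```

The reciprocal of this value is what the continuous-repair model uses as the repair transition rate for the corresponding state.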


The steady-state system equations are

$$ \lambda P_0 = (\lambda + \mu_1)\, P_1, \qquad \lambda P_0 = (\lambda + \mu_2)\, P_2 $$

Solving these for P_{1} and P_{2}, and inserting into the conservation equation P_{0} + P_{1} + P_{2} = 1, we can then solve for P_{0}. From this we can express the system failure rate as 

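$$ \lambda_{\mathrm{sys}} = \lambda\,(P_1 + P_2) = \frac{\lambda^2\,(2\lambda + \mu_1 + \mu_2)}{(\lambda + \mu_1)(\lambda + \mu_2) + \lambda\,(2\lambda + \mu_1 + \mu_2)} $$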
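The steady-state solution can be sketched in a few lines. This assumes the model described in the text: each partially failed state is entered from the full-up state at rate λ and left at rate λ + μ_j (repair, or a second failure followed by immediate replacement). The name `markov_failure_rate` and the sample rates are illustrative only.

```python
def markov_failure_rate(lam, mu1, mu2):
    """System failure rate lam * (P1 + P2) of the continuous-repair model."""
    r1 = lam / (lam + mu1)          # P1 / P0 from the balance equation for State 1
    r2 = lam / (lam + mu2)          # P2 / P0 from the balance equation for State 2
    p0 = 1.0 / (1.0 + r1 + r2)      # conservation: P0 + P1 + P2 = 1
    return lam * p0 * (r1 + r2)

# Example with equal repair rates; this case reduces to
# 2 * lam**2 / (3 * lam + mu).
print(markov_failure_rate(1e-4, 0.02, 0.02))
```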

Does this simple approach accurately reflect the system response? We have substituted exponentially distributed continuous repairs for periodic discrete repairs, so it would be of interest to evaluate the original system precisely, with no approximations, to allow us to compare the exact result with this approximate formula. The exact system is as shown in the figure below.


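To solve for the exact long-term system failure rate, we first consider the time-dependent response of the original system, leaving out the discrete repairs. The system equations are

$$ \frac{dP_0}{dt} = -2\lambda P_0 + \lambda P_1 + \lambda P_2, \qquad \frac{dP_1}{dt} = \lambda P_0 - \lambda P_1, \qquad \frac{dP_2}{dt} = \lambda P_0 - \lambda P_2 $$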

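In terms of the sum S(t) = P_{1}(t) + P_{2}(t) and the difference D(t) = P_{1}(t) − P_{2}(t), and using P_{0} = 1 − S to eliminate P_{0}, these equations can be separated and written in the equivalent form

$$ \frac{dS}{dt} + 3\lambda S = 2\lambda, \qquad \frac{dD}{dt} + \lambda D = 0 $$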

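These are the governing equations in between discrete repairs. Thus during the period following the jth repair of State 1 (with t measured from that repair) we have

$$ S_j(t) = \frac{2}{3} + \left( S_j(0) - \frac{2}{3} \right) e^{-3\lambda t}, \qquad D_j(t) = D_j(0)\, e^{-\lambda t} $$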

Our strategy is to begin with S_{0}(0) = D_{0}(0) = 0, and evaluate the results for m consecutive intervals of duration τ, and between each interval we will move all the accumulated probability from State 1 to State 0. The system failure rate is λS, and the average of the average rates for all m intervals represents our overall system failure rate.

Noting that P_{1} = (S + D)/2 and P_{2} = (S − D)/2, we can say that the values of S and D at the end of one interval are related to the values at the start of the next (after P_{1} has been set to zero while P_{2} is held constant) according to the formulas

$$ S_{j+1}(0) + D_{j+1}(0) = 0, \qquad S_{j+1}(0) - D_{j+1}(0) = S_j(\tau) - D_j(\tau) $$

The left hand relation signifies that S_{j}(0) = −D_{j}(0) for all j. Making this substitution in the right hand relation, and also replacing D_{j}(τ) with D_{j}(0)e^{−λτ}, we get

$$ S_{j+1}(0) = \frac{1}{2} \left[ S_j(\tau) + S_j(0)\, e^{-\lambda\tau} \right] $$

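Now we can substitute for S_{j}(τ) from the previous expression for S_{j}(t) at t = τ, to give the recurrence relation for the values of S at the start of each interval

$$ S_{j+1}(0) = \frac{e^{-\lambda\tau} + e^{-3\lambda\tau}}{2}\, S_j(0) + \frac{1 - e^{-3\lambda\tau}}{3} $$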

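In general, given a linear recurrence of the form s_{j+1} = As_{j} + B with the initial condition s_{0} = 0, it’s easy to show that s_{j} = B(1 − A^{j})/(1 − A). Therefore, we can express the value of S at the start of the jth interval in closed form as

$$ S_j(0) = \frac{1 - e^{-3\lambda\tau}}{3} \cdot \frac{1 - A^j}{1 - A}, \qquad A = \frac{e^{-\lambda\tau} + e^{-3\lambda\tau}}{2} $$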
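The closed form can be checked against a direct iteration of the interval-by-interval bookkeeping. The sketch below assumes the between-repair solutions S(t) = 2/3 + (S(0) − 2/3)e^{−3λt} and D(t) = D(0)e^{−λt}, with State 1 emptied at the end of every interval; the function names and sample values are hypothetical.

```python
import math

def s_start_closed_form(j, lam, tau):
    """S at the start of interval j from s_j = B (1 - A^j) / (1 - A)."""
    a = 0.5 * (math.exp(-lam * tau) + math.exp(-3 * lam * tau))
    b = (1 - math.exp(-3 * lam * tau)) / 3
    return b * (1 - a**j) / (1 - a)

def s_start_iterated(j, lam, tau):
    """Same quantity obtained by stepping through j intervals one at a time."""
    s = 0.0                                   # S_0(0) = 0
    for _ in range(j):
        s_end = 2/3 + (s - 2/3) * math.exp(-3 * lam * tau)
        d_end = -s * math.exp(-lam * tau)     # uses D_j(0) = -S_j(0)
        # P2 = (S - D)/2 is carried over; P1 is reset to zero, so S = P2.
        s = 0.5 * (s_end - d_end)
    return s

for j in range(4):
    print(j, s_start_closed_form(j, 1e-3, 100.0), s_start_iterated(j, 1e-3, 100.0))
```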

Furthermore, we know the average rate over the jth interval is

$$ \lambda_j = \frac{1}{\tau} \int_0^{\tau} \lambda\, S_j(t)\, dt $$

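Carrying out the integration and inserting the previous expression for S_{j}(0), we get an explicit expression for the average system failure rate during the jth interval

$$ \lambda_j = \frac{2\lambda}{3} - \frac{2\left(1 - e^{-3\lambda\tau}\right)}{9\tau} + \frac{\left(1 - e^{-3\lambda\tau}\right)^2}{9\tau} \cdot \frac{1 - A^j}{1 - A}, \qquad A = \frac{e^{-\lambda\tau} + e^{-3\lambda\tau}}{2} $$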

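The average system failure rate over all m intervals (j = 0, 1, ..., m − 1) is 1/m times the sum of the individual average rates. Only one of the terms in the expression for λ_{j} involves j, so the other terms all appear unchanged in the overall failure rate. For the remaining term, we can sum the geometric series and divide by m to give the final result

$$ \lambda_{\mathrm{sys}} = \frac{2\lambda}{3} - \frac{2\left(1 - e^{-3\lambda\tau}\right)}{9\tau} + \frac{\left(1 - e^{-3\lambda\tau}\right)^2}{9\tau\,(1 - A)} \left[ 1 - \frac{1 - A^m}{m\,(1 - A)} \right], \qquad A = \frac{e^{-\lambda\tau} + e^{-3\lambda\tau}}{2} $$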
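One way to gain confidence in a closed-form result of this kind is to compare it with a brute-force integration of the state equations over one full cycle of m intervals. The sketch below does that with a simple midpoint scheme. The formula coded in `exact_failure_rate` is this sketch's reading of the derivation above, and all names, step counts, and sample values are assumptions.

```python
import math

def exact_failure_rate(lam, tau, m):
    """Closed-form average failure rate over one cycle of m intervals."""
    e3 = math.exp(-3 * lam * tau)
    a = 0.5 * (math.exp(-lam * tau) + e3)
    b = (1 - e3) / 3
    series = 1 - (1 - a**m) / (m * (1 - a))   # mean of (1 - a^j), j = 0..m-1
    return 2 * lam / 3 - 2 * b / (3 * tau) + b * b / (tau * (1 - a)) * series

def simulated_failure_rate(lam, tau, m, steps=2000):
    """Numerical average of lam * (P1 + P2) over one cycle, starting full-up."""
    p1 = p2 = 0.0
    dt = tau / steps
    integral = 0.0                            # integral of S = P1 + P2
    for _ in range(m):
        for _ in range(steps):
            # Midpoint (second-order Runge-Kutta) step, with P0 = 1 - P1 - P2.
            h1 = p1 + 0.5 * dt * lam * (1 - 2 * p1 - p2)
            h2 = p2 + 0.5 * dt * lam * (1 - p1 - 2 * p2)
            integral += (h1 + h2) * dt
            p1 += dt * lam * (1 - 2 * h1 - h2)
            p2 += dt * lam * (1 - h1 - 2 * h2)
        p1 = 0.0                              # component 1 repaired every tau
    return lam * integral / (m * tau)

print(exact_failure_rate(1e-3, 100.0, 10), simulated_failure_rate(1e-3, 100.0, 10))
```

Only one cycle is simulated because both components are repaired together at the end of the mth interval, which returns the system to its starting condition.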

We can compare this with the ordinary Markov model approximation with continuous repair transitions discussed previously, making use of the estimated mean times to repair for determining the repair transition rates. The figure below shows the results for the case m = 10, i.e., the inspection interval for state 2 is ten times the inspection interval τ for state 1. The left hand figure shows the system failure rates for the two methods, and the right hand figure shows the difference between them.


Clearly the agreement is very close for this case, but interestingly the agreement is not nearly as good if we consider the symmetrical case, i.e., with m = 1, signifying that both components have the same inspection interval. A comparison of the two methods in this case is shown below.
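The contrast between the two cases can be reproduced numerically with a short sketch. It assumes the continuous-repair rates are the reciprocals of the estimated mean times to repair with the sum of exit rates taken as 2λ, and it uses the closed-form periodic result derived above. The values of λ and τ are illustrative only and are not taken from the original figures.

```python
import math

def repair_rate(interval, sum_rates):
    # Reciprocal of the estimated mean time to repair for this interval.
    return 1.0 / (interval / (-math.expm1(-sum_rates * interval)) - 1.0 / sum_rates)

def markov_rate(lam, tau, m, sum_rates):
    # Continuous-repair approximation: lam * (P1 + P2) at steady state.
    r1 = lam / (lam + repair_rate(tau, sum_rates))       # P1 / P0
    r2 = lam / (lam + repair_rate(m * tau, sum_rates))   # P2 / P0
    return lam * (r1 + r2) / (1.0 + r1 + r2)

def periodic_rate(lam, tau, m):
    # Exact average over one cycle of m inspection intervals.
    e3 = math.exp(-3 * lam * tau)
    a = 0.5 * (math.exp(-lam * tau) + e3)
    b = (1 - e3) / 3
    return (2 * lam / 3 - 2 * b / (3 * tau)
            + b * b / (tau * (1 - a)) * (1 - (1 - a**m) / (m * (1 - a))))

def relative_gap(lam, tau, m):
    exact = periodic_rate(lam, tau, m)
    return abs(markov_rate(lam, tau, m, 2 * lam) - exact) / exact

lam, tau = 1e-3, 100.0   # hypothetical sample values
for m in (10, 1):
    print(m, periodic_rate(lam, tau, m), markov_rate(lam, tau, m, 2 * lam),
          relative_gap(lam, tau, m))
```

For these sample values the relative gap for m = 10 comes out far smaller than for m = 1, in line with the behavior described in the text.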


This shows that the overall system failure rate is appreciably greater with periodic repairs than with continuous repairs with (approximately) the same mean times to repair. Recall that the continuous repair model uses, for the jth state, a repair transition rate equal to the reciprocal of the estimated mean time to repair for a system entering that state, i.e.,

$$ \mu_j = \frac{1}{\tau - \dfrac{1}{\Sigma\lambda} + \dfrac{\tau}{e^{(\Sigma\lambda)\tau} - 1}} $$

where τ is the duration of the inspection interval for the jth state and Σλ is the sum of the transition rates exiting the upstream state. In our case we have Σλ = 2λ, since there are two transitions exiting the full-up state, each with rate λ. Also, in the symmetrical case m = 1, the inspection intervals are τ for both states. Therefore, inserting the rate

$$ \mu = \frac{1}{\tau - \dfrac{1}{2\lambda} + \dfrac{\tau}{e^{2\lambda\tau} - 1}} $$

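into the steady-state solution for the symmetrical Markov model with continuous repairs, we get the approximation

$$ \lambda_{\mathrm{sys}} \approx \frac{2\lambda^2}{3\lambda + \mu} = \frac{2\lambda \left( 2\lambda\tau - 1 + e^{-2\lambda\tau} \right)}{6\lambda\tau - 1 + e^{-2\lambda\tau}} $$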

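We can compare this with the exact periodic solution, which in the symmetrical case m = 1 reduces to

$$ \lambda_{\mathrm{sys}} = \frac{2\lambda}{3} \left[ 1 - \frac{1 - e^{-3\lambda\tau}}{3\lambda\tau} \right] $$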

This result can also be derived by direct integration (as shown in another note), and we see that it does indeed differ from the prediction for continuous repairs. However, if we took Σλ equal to 3λ instead of 2λ, the Markov model with continuous repairs would give exactly the same system failure rate as the system with periodic repairs. This corresponds to increasing slightly the mean time spent by the system in each of the partial failure states. Nevertheless, setting Σλ equal to 2λ definitely gives a better fit in the asymmetric cases, so no single value is optimum for all cases. Presumably the system failure rate with periodic repairs is greater than the rate with continuous repairs (with the same mean repair times) due to the lack of statistical independence between the two component failures with synchronized inspections. If the periodic inspections were staggered, we would expect the results to more closely match the continuous repair model with Σλ = 2λ. Still, it’s remarkable that, in the symmetrical case, this non-independence effect in the periodic model is exactly reproduced by setting Σλ = 3λ.
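The coincidence in the symmetrical case is easy to confirm numerically. The sketch below assumes the symmetrical continuous-repair rate 2λ²/(3λ + μ), with μ the reciprocal of the estimated mean time to repair, and the periodic result (2λ/3)[1 − (1 − e^{−3λτ})/(3λτ)]. Function names and sample values are hypothetical.

```python
import math

def repair_rate(tau, sum_rates):
    # Reciprocal of tau / (1 - exp(-sum_rates * tau)) - 1 / sum_rates.
    return 1.0 / (tau / (-math.expm1(-sum_rates * tau)) - 1.0 / sum_rates)

def markov_symmetric(lam, tau, sum_rates):
    # Continuous-repair model with the same repair rate for both states.
    return 2 * lam**2 / (3 * lam + repair_rate(tau, sum_rates))

def periodic_symmetric(lam, tau):
    # Exact periodic-repair result for m = 1.
    x = 3 * lam * tau
    return (2 * lam / 3) * (1 + math.expm1(-x) / x)

lam, tau = 1e-3, 500.0   # illustrative values only
print("periodic      :", periodic_symmetric(lam, tau))
print("Markov, 2*lam :", markov_symmetric(lam, tau, 2 * lam))
print("Markov, 3*lam :", markov_symmetric(lam, tau, 3 * lam))
```

Algebraically, with the sum of exit rates set to 3λ the quantity 3λ + μ collapses to 9λ²τ/(3λτ − 1 + e^{−3λτ}), which is why the two expressions agree identically rather than only approximately.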
