## Markov Models Of Dual-Redundant Systems

Many practical systems must perform numerous sub-functions in order to support the overall function, and they are designed with dual redundancy for each sub-function. Failure of the first component of any sub-function is annunciated, and the component is repaired quickly enough that losing the second component before the repair is complete is unlikely. In addition, a single failure from the "full-up" condition may lead directly to overall functional failure. A system of this type with $n$ sub-functions is often represented by a Markov model as shown below.

This model neglects the possibility that more than one sub-function is degraded at the same time. In other words, if one component of a certain sub-function fails, the model assumes that the next transition is either a repair back to the full-up state or a failure of the other component of that same sub-function, resulting in total system failure. Strictly speaking, a component of some other sub-function might fail before either of those two transitions occurs, and this would place the system in a state with two sub-functions partially failed. However, this affects the overall system failure rate only if both components of the second sub-function fail within the repair interval of the first sub-function, and only if both of those failures occur before the second component of the first sub-function fails.

Thus, for example, the transition rate $\lambda_{1,n+1}$ from state 1 to state $n+1$ should actually be augmented by the rate of failing both components of sub-function 2, which at the end of the repair interval $T$ would be just $(\lambda_{2,n+1})^2 T$. Similar terms would be needed for the other sub-functions. In many practical situations, however, the repair intervals are short enough that these second-order terms are negligible. This is especially true because $T$ is generally chosen so that the probability of even one fault occurring during that interval is quite low, which makes the probability of two faults extremely low. This is why the simplified model shown above is often a useful representation of dual-redundant system reliability.
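To get a feel for the size of the neglected term, the short Python sketch below compares $(\lambda_{2,n+1})^2 T$ with $\lambda_{1,n+1}$. The rates and the repair interval are hypothetical values, not taken from the text. When the two rates are equal, the ratio reduces to $\lambda_{2,n+1}T$, the expected number of such faults in one repair interval.

```python
# Hypothetical rates (per hour) and repair interval (hours); not from the text.
lam_1_fail = 1e-4   # lambda_{1,n+1}: loss of the second component of sub-function 1
lam_2_fail = 1e-4   # lambda_{2,n+1}: loss of a component of sub-function 2
T = 10.0            # repair interval of sub-function 1

correction = lam_2_fail ** 2 * T      # neglected second-order term (lambda_{2,n+1})^2 T
relative = correction / lam_1_fail    # size of the correction relative to lambda_{1,n+1}
print(f"relative size of the neglected term: {relative:.1e}")
```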

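If we designate the “full-up” state by the number 0, the $n$ degraded states by the numbers 1 through $n$, and the final failure state by the number $n+1$, then the equations of the system can be written as

$$
\begin{aligned}
\frac{dP_0}{dt} &= -\left(\sum_{j=1}^{n+1}\lambda_{0,j}\right)P_0 + \sum_{j=1}^{n}\lambda_{j,0}P_j + \lambda_{n+1,0}P_{n+1}\\
\frac{dP_j}{dt} &= \lambda_{0,j}P_0 - \left(\lambda_{j,0}+\lambda_{j,n+1}\right)P_j, \qquad j = 1, \ldots, n\\
\frac{dP_{n+1}}{dt} &= \lambda_{0,n+1}P_0 + \sum_{j=1}^{n}\lambda_{j,n+1}P_j - \lambda_{n+1,0}P_{n+1}
\end{aligned}
$$
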
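The state equations can also be solved numerically as a cross-check. The minimal Python sketch below assumes hypothetical rates for $n = 2$ and uses a hypothetical helper `steady_state`. It sets every $dP_k/dt$ to zero, replaces one redundant balance equation with the normalization condition, and solves the system exactly with `fractions.Fraction` and Gauss-Jordan elimination.

```python
from fractions import Fraction as F


def steady_state(rates, size):
    """Solve the steady-state balance equations of a continuous-time Markov chain.

    rates maps (i, j) -> transition rate from state i to state j (i != j).
    Uses exact Fraction arithmetic and Gauss-Jordan elimination.
    """
    # Row k holds dP_k/dt = (inflow to k) - (outflow from k), as coefficients of P.
    A = [[F(0)] * size for _ in range(size)]
    for (i, j), rate in rates.items():
        A[j][i] += rate   # flow from state i into state j
        A[i][i] -= rate   # the same flow leaving state i
    b = [F(0)] * size
    # The balance rows are linearly dependent, so replace the last with sum(P) = 1.
    A[-1] = [F(1)] * size
    b[-1] = F(1)
    for c in range(size):
        p = next(r for r in range(c, size) if A[r][c] != 0)   # pivot row
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(size):
            if r != c and A[r][c] != 0:
                factor = A[r][c] / A[c][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
                b[r] -= factor * b[c]
    return [b[k] / A[k][k] for k in range(size)]


# Hypothetical example with n = 2 sub-functions; state 3 is total failure.
n = 2
FAIL = n + 1
rates = {
    (0, 1): F(2, 10**4), (0, 2): F(3, 10**4), (0, FAIL): F(1, 10**6),
    (1, 0): F(1, 10), (1, FAIL): F(1, 10**4),
    (2, 0): F(1, 20), (2, FAIL): F(2, 10**4),
    (FAIL, 0): F(1, 5),
}
P = steady_state(rates, n + 2)
print([float(p) for p in P])
```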
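For the steady-state condition we have $dP_j/dt = 0$ for all $j$, so we can solve the $n$ central equations to give the values of $P_1$ through $P_n$ as functions of $P_0$:

$$
P_j = \frac{\lambda_{0,j}}{\lambda_{j,0}+\lambda_{j,n+1}}\,P_0, \qquad j = 1, \ldots, n
$$
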
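where $\lambda_{i,j}$ signifies the rate of transition from state $i$ to state $j$. In the steady state, the last state equation gives

$$
P_{n+1} = \frac{1}{\lambda_{n+1,0}}\left(\lambda_{0,n+1}P_0 + \sum_{j=1}^{n}\lambda_{j,n+1}P_j\right)
$$

and since the sum of all the state probabilities from $P_0$ to $P_{n+1}$ is 1, we have

$$
P_0 = \left[\,1 + \sum_{j=1}^{n}\frac{\lambda_{0,j}}{\lambda_{j,0}+\lambda_{j,n+1}} + \frac{1}{\lambda_{n+1,0}}\left(\lambda_{0,n+1} + \sum_{j=1}^{n}\frac{\lambda_{0,j}\,\lambda_{j,n+1}}{\lambda_{j,0}+\lambda_{j,n+1}}\right)\right]^{-1}
$$
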
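The closed-form solution can be checked by substituting it back into the state equations. The sketch below uses hypothetical function names (`closed_form_probs`, `residuals`) and hypothetical rates. It computes $P_0, \ldots, P_{n+1}$ from the formulas for $P_j$, $P_{n+1}$ and $P_0$, then evaluates every $dP_k/dt$ exactly with `Fraction`. At a true steady state, all of these values must be zero.

```python
from fractions import Fraction as F


def closed_form_probs(a, r, f, f0, rho):
    """Closed-form steady-state probabilities [P_0, P_1, ..., P_n, P_{n+1}].

    a[j-1] = lambda_{0,j}   (full-up -> degraded state j)
    r[j-1] = lambda_{j,0}   (repair of degraded state j)
    f[j-1] = lambda_{j,n+1} (degraded state j -> total failure)
    f0     = lambda_{0,n+1} (full-up -> total failure)
    rho    = lambda_{n+1,0} (restoration from total failure)
    """
    ratios = [aj / (rj + fj) for aj, rj, fj in zip(a, r, f)]   # P_j / P_0
    inflow = f0 + sum(q * fj for q, fj in zip(ratios, f))      # flow into n+1 per unit P_0
    P0 = 1 / (1 + sum(ratios) + inflow / rho)
    return [P0] + [q * P0 for q in ratios] + [inflow * P0 / rho]


def residuals(P, a, r, f, f0, rho):
    """Right-hand sides dP_k/dt of the state equations; all zero at steady state."""
    n = len(a)
    mid = P[1:n + 1]
    d0 = -(sum(a) + f0) * P[0] + sum(rj * p for rj, p in zip(r, mid)) + rho * P[n + 1]
    dj = [a[j] * P[0] - (r[j] + f[j]) * mid[j] for j in range(n)]
    dfail = f0 * P[0] + sum(fj * p for fj, p in zip(f, mid)) - rho * P[n + 1]
    return [d0] + dj + [dfail]


# Hypothetical rates (per hour) for n = 2.
a, r, f = [F(2, 10**4), F(3, 10**4)], [F(1, 10), F(1, 20)], [F(1, 10**4), F(2, 10**4)]
f0, rho = F(1, 10**6), F(1, 5)
P = closed_form_probs(a, r, f, f0, rho)
print(residuals(P, a, r, f, f0, rho))
```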
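The exact failure rate for entering state $n+1$ is the steady-state flow into that state divided by the probability $1 - P_{n+1}$ of not already being in it, which is therefore

$$
\lambda_{\text{tot}} = \frac{\lambda_{0,n+1}P_0 + \sum_{j=1}^{n}\lambda_{j,n+1}P_j}{1 - P_{n+1}} = \frac{\lambda_{0,n+1} + \sum_{j=1}^{n}\frac{\lambda_{0,j}\,\lambda_{j,n+1}}{\lambda_{j,0}+\lambda_{j,n+1}}}{1 + \sum_{j=1}^{n}\frac{\lambda_{0,j}}{\lambda_{j,0}+\lambda_{j,n+1}}}
$$
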
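As a numerical check, the sketch below compares the closed-form rate with the ratio "flow into state $n+1$ divided by $1 - P_{n+1}$", computed from the full steady-state solution. It does this for two very different values of the restoration rate $\lambda_{n+1,0}$ (called `rho` here). The function names and rates are hypothetical. Exact `Fraction` arithmetic makes the agreement an equality rather than an approximation.

```python
from fractions import Fraction as F


def failure_rate(a, r, f, f0):
    """Exact steady-state rate of entering state n+1; note it takes no lambda_{n+1,0}."""
    ratios = [aj / (rj + fj) for aj, rj, fj in zip(a, r, f)]
    return (f0 + sum(q * fj for q, fj in zip(ratios, f))) / (1 + sum(ratios))


def rate_from_full_solution(a, r, f, f0, rho):
    """(Flow into n+1) / (1 - P_{n+1}) from the full steady state with restoration rate rho."""
    ratios = [aj / (rj + fj) for aj, rj, fj in zip(a, r, f)]
    inflow = f0 + sum(q * fj for q, fj in zip(ratios, f))
    P0 = 1 / (1 + sum(ratios) + inflow / rho)
    P_fail = inflow * P0 / rho
    return inflow * P0 / (1 - P_fail)


# Hypothetical rates (per hour) for n = 2.
a, r, f, f0 = [F(2, 10**4), F(3, 10**4)], [F(1, 10), F(1, 20)], [F(1, 10**4), F(2, 10**4)], F(1, 10**6)
lam_tot = failure_rate(a, r, f, f0)
for rho in (F(1, 5), F(1, 500)):   # very different restoration rates give the same result
    assert rate_from_full_solution(a, r, f, f0, rho) == lam_tot
print(float(lam_tot))
```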
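Naturally this rate is independent of $\lambda_{n+1,0}$, because the rate is, by definition, a measure of the propensity to enter a particular state for entities that are not presently in that state, which is clearly independent of the rate of leaving that state. Interestingly, if we define $\lambda_{j,j}$ to be infinite for each $j$, none of the state equations is affected, because the infinite "self-transition" flow $P_j\lambda_{j,j}$ is both added to and subtracted from the $j$th equation. This convention does, however, let us write the failure rate in the more unified form

$$
\lambda_{\text{tot}} = \frac{\displaystyle\sum_{j=0}^{n}\frac{\lambda_{0,j}\,\lambda_{j,n+1}}{\lambda_{j,0}+\lambda_{j,n+1}}}{\displaystyle\sum_{j=0}^{n}\frac{\lambda_{0,j}}{\lambda_{j,0}+\lambda_{j,n+1}}}
$$

since, in the limit $\lambda_{0,0} \to \infty$, the $j = 0$ terms reduce to $\lambda_{0,n+1}$ in the numerator and 1 in the denominator.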
This form emphasizes that the overall rate of entering state $n+1$ is simply the weighted average of the individual transition rates $\lambda_{j,n+1}$ from each of the states $0, 1, 2, \ldots, n$, with each rate weighted in proportion to the steady-state probability $P_j$ of the respective state.
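The weighted-average reading can be sketched directly. In the Python code below, `weighted_average_rate` uses the weights $P_j/P_0$ (equal to 1 for state 0). The hypothetical helper `unified_sum` evaluates both sums from $j = 0$ with a large but finite stand-in for the infinite $\lambda_{0,0}$. The rates are hypothetical. The finite stand-in is an assumption made so the code can run. It shows the $j = 0$ terms approaching $\lambda_{0,n+1}$ and 1 as that value grows.

```python
from fractions import Fraction as F


def weighted_average_rate(a, r, f, f0):
    """Average of lambda_{j,n+1} over states j = 0..n, weighted in proportion to P_j."""
    weights = [F(1)] + [aj / (rj + fj) for aj, rj, fj in zip(a, r, f)]   # P_j / P_0
    exit_rates = [f0] + list(f)                                         # lambda_{j,n+1}
    return sum(w * x for w, x in zip(weights, exit_rates)) / sum(weights)


def unified_sum(a, r, f, f0, self_rate):
    """Both sums run over j = 0..n; a large finite self_rate stands in for lambda_{0,0}."""
    A = [self_rate] + list(a)   # lambda_{0,j}, including j = 0
    R = [self_rate] + list(r)   # lambda_{j,0}, including j = 0
    X = [f0] + list(f)          # lambda_{j,n+1}, including j = 0
    num = sum(aj * xj / (rj + xj) for aj, rj, xj in zip(A, R, X))
    den = sum(aj / (rj + xj) for aj, rj, xj in zip(A, R, X))
    return num / den


# Hypothetical rates (per hour) for n = 2.
a, r, f, f0 = [F(2, 10**4), F(3, 10**4)], [F(1, 10), F(1, 20)], [F(1, 10**4), F(2, 10**4)], F(1, 10**6)
exact = weighted_average_rate(a, r, f, f0)
approx = unified_sum(a, r, f, f0, self_rate=F(10**12))
print(float(abs(approx - exact) / exact))   # shrinks as self_rate grows
```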

Since the time spent in the $(n+1)$th state is irrelevant to our result (because the reliability of the operational fleet does not depend on how long inoperative systems are absent from the fleet while being repaired), we could simplify the analysis by deleting that state from the model and pointing the total-failure transitions directly to the full-up state. This is illustrated for a simple system with just two partial-failure states in the figure below.

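Let $\lambda_j$ denote the rate of entering partial-failure state $j$ from the full-up state, $\mu_j$ the repair rate of state $j$, $\sigma_j$ the rate of total failure from state $j$ (now directed back to state 0), and $\lambda_0$ the rate of total failure directly from the full-up state. Since the $\lambda_0$ transition now leads from state 0 back to itself, it does not appear in the state equations. From the equations

$$
\begin{aligned}
\frac{dP_0}{dt} &= -(\lambda_1+\lambda_2)P_0 + (\mu_1+\sigma_1)P_1 + (\mu_2+\sigma_2)P_2\\
\frac{dP_1}{dt} &= \lambda_1P_0 - (\mu_1+\sigma_1)P_1\\
\frac{dP_2}{dt} &= \lambda_2P_0 - (\mu_2+\sigma_2)P_2
\end{aligned}
$$
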
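we have the steady-state relations

$$
P_1 = \frac{\lambda_1}{\mu_1+\sigma_1}\,P_0, \qquad P_2 = \frac{\lambda_2}{\mu_2+\sigma_2}\,P_0
$$
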
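Substituting into the conservation equation $P_0 + P_1 + P_2 = 1$ allows us to easily solve for the steady-state probabilities

$$
P_0 = \frac{1}{D}, \qquad P_1 = \frac{\lambda_1}{(\mu_1+\sigma_1)D}, \qquad P_2 = \frac{\lambda_2}{(\mu_2+\sigma_2)D}, \qquad D = 1 + \frac{\lambda_1}{\mu_1+\sigma_1} + \frac{\lambda_2}{\mu_2+\sigma_2}
$$
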
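Since these probabilities sum to 1, the total failure rate is just the steady-state flow through the three failure transitions:

$$
\lambda_{\text{tot}} = \lambda_0P_0 + \sigma_1P_1 + \sigma_2P_2 = \frac{\lambda_0 + \frac{\lambda_1\sigma_1}{\mu_1+\sigma_1} + \frac{\lambda_2\sigma_2}{\mu_2+\sigma_2}}{1 + \frac{\lambda_1}{\mu_1+\sigma_1} + \frac{\lambda_2}{\mu_2+\sigma_2}}
$$
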
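To confirm that deleting the failure state does not change the answer, the sketch below computes the rate in two ways for the same hypothetical rates. First, it uses the reduced three-state model. Second, it uses the original model with an explicit failure state, restored at an arbitrary rate `rho`, taking the rate as flow divided by $1 - P_{\text{fail}}$. Exact `Fraction` arithmetic lets the two results be compared for equality.

```python
from fractions import Fraction as F

# Hypothetical rates (per hour) for the example with two partial-failure states.
lam0 = F(1, 10**6)                   # lambda_0: total failure directly from the full-up state
lam = [F(2, 10**4), F(3, 10**4)]     # lambda_1, lambda_2: entry into partial-failure states
mu = [F(1, 10), F(1, 20)]            # mu_1, mu_2: repair rates
sigma = [F(1, 10**4), F(2, 10**4)]   # sigma_1, sigma_2: total failure from partial states

# Reduced model: total failures return straight to state 0, so only P_0, P_1, P_2 exist.
x = [l / (m + s) for l, m, s in zip(lam, mu, sigma)]   # P_j / P_0 from the steady-state relations
P0 = 1 / (1 + sum(x))
P = [P0] + [xj * P0 for xj in x]
rate_reduced = lam0 * P[0] + sigma[0] * P[1] + sigma[1] * P[2]

# Original model with an explicit failure state, restored at an arbitrary rate rho.
rho = F(1, 7)
inflow = lam0 + sum(xj * s for xj, s in zip(x, sigma))   # flow into the failure state per unit P_0
Q0 = 1 / (1 + sum(x) + inflow / rho)                     # P_0 in the original model
rate_full = inflow * Q0 / (1 - inflow * Q0 / rho)        # flow / (1 - P_fail)

print(rate_reduced == rate_full)   # True
```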
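This immediately generalizes to give the formula for $N$ partial-failure states, in agreement with the result derived previously:

$$
\lambda_{\text{tot}} = \frac{\lambda_0 + \sum_{j=1}^{N}\frac{\lambda_j\sigma_j}{\mu_j+\sigma_j}}{1 + \sum_{j=1}^{N}\frac{\lambda_j}{\mu_j+\sigma_j}}
$$
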
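The general formula is simple enough to evaluate directly. In the following sketch, the function name `total_failure_rate` and the layout of the arguments as parallel lists are illustrative assumptions. The example rates are hypothetical.

```python
def total_failure_rate(lam0, lam, mu, sigma):
    """Overall failure rate for N partial-failure states (simplified model).

    lam0     : direct total-failure rate from the full-up state
    lam[j]   : rate of entering partial-failure state j
    mu[j]    : repair rate of state j
    sigma[j] : rate of total failure from state j
    """
    if not (len(lam) == len(mu) == len(sigma)):
        raise ValueError("lam, mu and sigma must have the same length")
    num = lam0 + sum(l * s / (m + s) for l, m, s in zip(lam, mu, sigma))
    den = 1 + sum(l / (m + s) for l, m, s in zip(lam, mu, sigma))
    return num / den


# Hypothetical rates (per hour) for N = 3 partial-failure states.
print(total_failure_rate(1e-6, [2e-4, 3e-4, 1e-4], [0.1, 0.05, 0.2], [1e-4, 2e-4, 5e-5]))
```

With empty lists the result reduces to $\lambda_0$, as expected for a system with no partial-failure states.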
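Suppose there are only two distinct repair rates, denoted by $\mu_a = 1/T_a$ and $\mu_b = 1/T_b$, and we wish to express the overall rate as an explicit function of the repair times $T_a$ and $T_b$. Let the indices 1 through $n$ signify the states with repair rate $\mu_a$, and the indices $n+1$ through $N$ signify the states with repair rate $\mu_b$. Using $1/(\mu + \sigma_j) = T/(1 + \sigma_jT)$, we can then rewrite the above equation in the form

$$
\lambda_{\text{tot}} = \frac{\lambda_0 + \sum_{j=1}^{n}\frac{\lambda_j\sigma_jT_a}{1+\sigma_jT_a} + \sum_{j=n+1}^{N}\frac{\lambda_j\sigma_jT_b}{1+\sigma_jT_b}}{1 + \sum_{j=1}^{n}\frac{\lambda_jT_a}{1+\sigma_jT_a} + \sum_{j=n+1}^{N}\frac{\lambda_jT_b}{1+\sigma_jT_b}}
$$
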
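The repair-time form can be sketched as below. The hypothetical function `rate_two_repair_times` splits the partial-failure states into two groups, repaired in times `Ta` and `Tb`, and applies $1/(\mu + \sigma) = T/(1 + \sigma T)$ to each group. The rates and times in the example are hypothetical.

```python
def rate_two_repair_times(lam0, lam_a, sig_a, Ta, lam_b, sig_b, Tb):
    """Overall failure rate when each partial-failure state has repair time Ta or Tb.

    lam_a, sig_a: entry and total-failure rates of the states repaired at mu_a = 1/Ta
    lam_b, sig_b: the same for the states repaired at mu_b = 1/Tb
    """
    def group_sums(lams, sigs, T):
        # With mu = 1/T, 1/(mu + sigma) = T / (1 + sigma * T).
        num = sum(l * s * T / (1 + s * T) for l, s in zip(lams, sigs))
        den = sum(l * T / (1 + s * T) for l, s in zip(lams, sigs))
        return num, den

    num_a, den_a = group_sums(lam_a, sig_a, Ta)
    num_b, den_b = group_sums(lam_b, sig_b, Tb)
    return (lam0 + num_a + num_b) / (1 + den_a + den_b)


# Hypothetical rates (per hour) and repair times (hours).
rate = rate_two_repair_times(1e-6, [2e-4, 1e-4], [1e-4, 5e-5], 10.0, [3e-4], [2e-4], 20.0)
print(rate)
```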
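Assuming the failure rates are small compared with the repair rates, we can expand the fractions in powers of $\sigma_j/\mu = \sigma_jT$. Keeping terms to first order, $1/(1+\sigma_jT) \approx 1 - \sigma_jT$, so we get

$$
\lambda_{\text{tot}} \approx \frac{\lambda_0 + \sum_{j=1}^{n}\lambda_j\sigma_jT_a(1-\sigma_jT_a) + \sum_{j=n+1}^{N}\lambda_j\sigma_jT_b(1-\sigma_jT_b)}{1 + \sum_{j=1}^{n}\lambda_jT_a(1-\sigma_jT_a) + \sum_{j=n+1}^{N}\lambda_jT_b(1-\sigma_jT_b)}
$$
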
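Collecting terms in $T_a$ and $T_b$, we arrive at

$$
\lambda_{\text{tot}} \approx \frac{A_0 + A_1T_a + A_2T_b - A_3T_a^2 - A_4T_b^2}{1 + B_1T_a + B_2T_b - A_1T_a^2 - A_2T_b^2}
$$

where the $A_i$ and $B_i$ coefficients are given by the summations

$$
A_0 = \lambda_0, \quad A_1 = \sum_{j=1}^{n}\lambda_j\sigma_j, \quad A_2 = \sum_{j=n+1}^{N}\lambda_j\sigma_j, \quad A_3 = \sum_{j=1}^{n}\lambda_j\sigma_j^2, \quad A_4 = \sum_{j=n+1}^{N}\lambda_j\sigma_j^2
$$

$$
B_1 = \sum_{j=1}^{n}\lambda_j, \qquad B_2 = \sum_{j=n+1}^{N}\lambda_j
$$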
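As a rough check on the accuracy of the expansion, the sketch below compares the exact two-repair-time rate with a first-order approximation. The approximation assumes $1/(1+\sigma T) \approx 1 - \sigma T$, and the coefficient names `A0` through `A4`, `B1` and `B2` are defined in the code itself. The function names and all numerical values are hypothetical. For $\sigma T$ of order $10^{-3}$, the relative error should be of order $(\sigma T)^2$.

```python
def exact_rate(lam0, lam_a, sig_a, Ta, lam_b, sig_b, Tb):
    """Exact overall rate with repair rates mu_a = 1/Ta and mu_b = 1/Tb."""
    groups = [(lam_a, sig_a, Ta), (lam_b, sig_b, Tb)]
    num = lam0 + sum(l * s * T / (1 + s * T) for ls, ss, T in groups for l, s in zip(ls, ss))
    den = 1 + sum(l * T / (1 + s * T) for ls, ss, T in groups for l, s in zip(ls, ss))
    return num / den


def first_order_rate(lam0, lam_a, sig_a, Ta, lam_b, sig_b, Tb):
    """First-order approximation using 1/(1 + sigma*T) ~ 1 - sigma*T."""
    A0 = lam0
    A1 = sum(l * s for l, s in zip(lam_a, sig_a))
    A2 = sum(l * s for l, s in zip(lam_b, sig_b))
    A3 = sum(l * s * s for l, s in zip(lam_a, sig_a))
    A4 = sum(l * s * s for l, s in zip(lam_b, sig_b))
    B1, B2 = sum(lam_a), sum(lam_b)
    num = A0 + A1 * Ta + A2 * Tb - A3 * Ta**2 - A4 * Tb**2
    den = 1 + B1 * Ta + B2 * Tb - A1 * Ta**2 - A2 * Tb**2
    return num / den


args = (1e-6, [2e-4, 1e-4], [1e-4, 5e-5], 10.0, [3e-4], [2e-4], 20.0)   # hypothetical
exact, approx = exact_rate(*args), first_order_rate(*args)
print(f"relative error of first-order form: {abs(approx - exact) / exact:.1e}")
```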