Markov Modeling for Reliability

1.1  Introduction

The reliability of a complex system is typically analyzed in terms of a set of basic events, usually representing failures of independent components of the system. Each basic component generally has a specified failure rate and exposure time, from which the probability of that component being failed at the end of its exposure time can easily be computed. However, computing the probability of a combination of basic component failures can be more complicated, for several reasons. First, the reliability criteria may be expressed in terms of the long-term average probability, which involves integrating the product of the individual sawtooth probability functions, each with its own period. Second, the overall system failure may depend on the order (sequence) of component failures. Third, and most importantly, there may be specified inspection and repair intervals for combinations of components, so it’s necessary to account for the exposure time of each combination of failures, not just of the basic component failures.
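
As a rough illustration of the first complication, the following Python sketch assumes each component has a constant failure rate (so its probability of being failed after an exposure time t is 1 - exp(-rate*t)), that components are independent, and that each one is restored to "as new" at every multiple of its own period. The function names and the numerical values are hypothetical, chosen only for illustration.

```python
import math

def failure_probability(rate, exposure_time):
    """Probability of being failed after exposure_time (constant-rate assumption)."""
    return 1.0 - math.exp(-rate * exposure_time)

def sawtooth_probability(rate, period, t):
    """Probability of being failed at time t when restored at every multiple of period."""
    return failure_probability(rate, t % period)

def average_joint_probability(rates, periods, horizon, steps=10_000):
    """Midpoint-rule time average, over [0, horizon], of the product of the
    individual sawtooth probabilities (independence assumed)."""
    dt = horizon / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        product = 1.0
        for rate, period in zip(rates, periods):
            product *= sawtooth_probability(rate, period, t)
        total += product
    return total / steps

# Illustrative values only: two components, each with 1e-4 failures per hour,
# checked every 1000 hours and every 250 hours respectively.
print(average_joint_probability([1e-4, 1e-4], [1000.0, 250.0], horizon=1000.0))
```

The horizon should span a whole number of cycles of every component (for example a common multiple of the periods) for the result to represent the long-term average.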

Markov models provide a mathematical method of evaluating the reliability of systems in such a way that all of these complications are naturally and efficiently taken into account. Markov models (named after the mathematician Andrey Markov) have the defining property that the transition rates of a system depend only on its current state, not on its past history. This “memoryless” characteristic is often called the “Markov property”, and it applies to systems in which the transition rates from one state to another are either constant or else depend only on global time or mission phase, but not on how or when the system arrived at its current state. Many real-world physical systems can be accurately represented by suitably-defined Markov models.
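
To make the Markov property concrete, here is a minimal sketch in Python of a three-state model with constant transition rates. It assumes a hypothetical system of two identical, independent, redundant components: state 0 has both working, state 1 has one failed, and state 2 has both failed. The flow out of each state depends only on the probability currently in that state, never on how that probability got there. The forward-Euler stepping and all names and values are illustrative choices, not a prescribed method.

```python
import math

def step_probabilities(probs, rates, dt):
    """One forward-Euler step of the state equations.

    rates maps (source_state, destination_state) to a constant transition rate.
    Each flow depends only on the current probability of the source state.
    """
    new_probs = list(probs)
    for (src, dst), rate in rates.items():
        flow = probs[src] * rate * dt
        new_probs[src] -= flow
        new_probs[dst] += flow
    return new_probs

def solve(probs, rates, total_time, steps):
    """Advance the state probabilities from time 0 to total_time."""
    dt = total_time / steps
    for _ in range(steps):
        probs = step_probabilities(probs, rates, dt)
    return probs

# Illustrative values only: each component fails at 1e-4 per hour.
lam = 1e-4
rates = {(0, 1): 2 * lam,  # either of the two working components fails
         (1, 2): lam}      # the remaining component fails
final = solve([1.0, 0.0, 0.0], rates, total_time=1000.0, steps=10_000)

# For this simple case the exact answer is known and serves as a check.
exact = (1.0 - math.exp(-lam * 1000.0)) ** 2
print(final[2], exact)
```

For models this small a closed-form solution exists, but the same stepping scheme applies unchanged to larger sets of states and transitions.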

One notable strength of Markov models for reliability analysis is that (unlike fault trees) they can easily account for a variety of inspection and repair strategies. This makes Markov models particularly useful for assessing the reliability of one or more devices with established maintenance intervals. With the advent of high-integrity “fault-tolerant” systems, the ability to account for repairs of partially failed (but still operational) systems has become increasingly important. Markov modeling is well-suited to the task of determining inspection and repair intervals needed to achieve a desired level of reliability.
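
The effect of an inspection and repair interval can be sketched with a simple closed-form case. The Python below assumes a hypothetical pair of identical, independent components with a constant failure rate, where the system fails only when both are failed, a periodic inspection restores a single failed component to "as new", and a dual failure is not repaired. Under those assumptions the probability of a dual failure within one inspection interval is q*q with q = 1 - exp(-rate*interval), and successive intervals compound. The function name and values are illustrative.

```python
import math

def dual_failure_probability(rate, mission_time, inspection_interval=None):
    """Probability that both components have failed by mission_time.

    With no inspection, single failures stay latent for the whole mission.
    With periodic inspection, a single failure is repaired at the end of each
    interval; mission_time is assumed to be a whole number of intervals.
    """
    if inspection_interval is None:
        return (1.0 - math.exp(-rate * mission_time)) ** 2
    n_intervals = int(round(mission_time / inspection_interval))
    q = 1.0 - math.exp(-rate * inspection_interval)
    # Chance of surviving one interval is 1 - q*q; intervals are independent
    # because every surviving system starts each interval fully restored.
    return 1.0 - (1.0 - q * q) ** n_intervals

# Illustrative values only: 1e-4 failures per hour over a 1000-hour period.
print(dual_failure_probability(1e-4, 1000.0))         # no inspection
print(dual_failure_probability(1e-4, 1000.0, 100.0))  # inspect every 100 hours
```

Shortening the inspection interval reduces the exposure time of the single-failure state, which is why the second figure is smaller than the first.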

A prominent example of the use of numerical probability analysis is in the field of aviation safety. Section 7 presents a detailed account of the regulatory requirements and how compliance with those requirements can be shown by means of suitably-defined Markov models.
