Markov Models with Aging Components 

The original defining property of Markov models was that the state transitions were memoryless, which is to say, the transition times are all exponentially distributed with constant rate parameters. The time variable enters into the governing equations only in the differential sense, insofar as the derivatives of the state probabilities with respect to time appear. The equations are independent of the absolute value of time, so any arbitrary value can be taken as the time origin. For this reason, such models are said to be homogeneous in time. Moreover a single time parameter suffices for the coordination of the entire model, so such models possesses temporal coherence. 

More general models, in which the transition rate parameters are allowed to vary as functions of the time, are said to be nonhomogeneous in time (viz, they behave differently depending on the absolute value of the time). However, such models may still be based on a single coherent time parameter, so they are still not suitable for modeling the behavior of systems for which the state transitions vary as functions of the ages of the components. In a system with aging components, renewals of those components will generally result in a range of multiple distinct ages for the components of the system in any given state. Such a system could be regarded as homogeneous in time, because the behavior does not depend on the absolute value of time, but it does depend on the ages of the components relative to their renewal times. Of course, the “age” of each component can be regarded as a time variable, so the model is not timehomogeneous in that sense, but more significantly, no single time variable suffices to coordinate such a system, because each component must carry its own age – and in fact must carry a distribution of possible ages – as the overall system evolves. Therefore, even though the system is still (in the sense indicated) homogeneous in time, it does not possess temporal coherence. 

As an aside, the concept of multiple time parameters, one for each component (particle) of a system, has been discussed in the context of quantum mechanics. Recall that, in nonrelativistic quantum mechanics each coordinate of the position (and momentum) of each particle must be treated as a separate dimension, and the laws governing the behavior of the system operate within this higherdimensional “phase space”. In a sense, the spatial coordinates of the various components are allowed to be decoherent, but a single coherent time parameter is retained. It has often been suggested that, to do justice to a relativistic version of quantum mechanics, in which the time and space coordinates play comparable roles, we ought to allow each entity to have its own time dimension, so the phase space of n entities would have 4n instead of just 3n positional dimensions. Although this is a very compelling idea, no one has ever succeeded in developing it into a full theory. Roger Penrose commented that “a basic difficulty with allowing each particle its own separate time parameter is that then each particle seems to go off on its merry way off into a separate time dimension”. 

This same “basic difficulty” is encountered in Markov models used in reliability analysis, when the failure rates of the components are allowed to be functions of the ages of those components, so each one effectively has its own time coordinate. However, aside from Monte Carlo simulations, the usual approach is to treat these systems as inhomogeneous yet coherent in time, i.e., to assume that a single time parameter suffices to coordinate the system evolution. As a result, it can be applied only to system in which each component “goes off on its merry way”, with no interaction between them. In other words, the model must be “factorable” into the individual component histories, each of which can be separately evaluated, so the combined model is really just a formal aggregation of independently evolving components. To treat analytically the general renewal problem for systems with rates that depend on the ages of the components, the most important feature is not inhomogeneity, but rather decoherence. 

To illustrate, consider a simple system consisting of two redundant components, each of which may be in one of two states, healthy or failed, and suppose we allow for repairs of failed components at constant rates. If the failure rates of the components are constant, and if we stipulate that both components are repaired immediately if they are both failed, then the Markov model for the system is as shown below. 


The failures of the second components are shown returning to the fully renewed state, because the fully failed state is repaired immediately. The two state equations for this homogeneous model are 

_{} 

along with the gauge condition P_{0} + P_{1} + P_{2} = 1. This system of equations is easily solved for the transient behavior for any given initial condition. Also, by setting the derivatives to zero, we can solve for the steadystate probabilities 

_{} 

where 
_{} 

Now suppose that instead of constant failure rate parameters l_{1} and l_{2}, each of the components has a failure rate parameter that is a function of the age of that component since it was last renewed. We will let t_{1} and t_{2} denote these ages. For clarity, we will begin with discrete functions, and let the failure rate parameters be specified for four discrete ages, 0, Dt, 2Dt, and 3Dt. For each macrostate with n healthy components we must account for 4^{n} microstates, representing all possible combinations of ages for the n operational components. (The age of a failed component is irrelevant.) Hence the complete Markov model representing this system is as shown below. 


The blue arrows in this figure represent the aging transitions, each with a rate parameter of 1/Dt. Only a few of the failure and repair transitions are shown, but all the others follow the same pattern of the ones shown. The shaded yellow microstate represents the fully functional state in which the ages of the two operational components are t_{1} = 2Dt and t_{2} = Dt respectively. Therefore, the failure transition to the macrostate with the first component failed has a rate parameter of l_{1}(2Dt), and since the age of the second component doesn’t change at the instant of failure of the first, this transition flows into the microstate (shaded green in this drawing) with t_{2} = Dt. There are only two ways of exiting the green microstate (in addition to the aging transition). First, there could be a failure of the second component, in which case the system is totally renewed, so it flows back to the fullup state with ages t_{1} = t_{2} = 0, as shown by the red path. Second, the first component could be repaired, which would renew the first components but would not affect the age of the second. Hence this transition would flow into the fullup state with t_{1} = 0, t_{2} = Dt, as shown by the green path. The transitions from the yellow fullup state to the upper macrostate, and back, are defined similarly. 

This system consists of 24 microstates in all. We can evaluate the timedependent response of such a model by treating it as a Cauchy problem. In other words, we specify the density functions of all the states at some initial time, and then we have equations giving the rates of change of those density functions as a function of time. However, in the remainder of this note we will focus on determining the steadystate solution of such a model. It should be noted, however, that the time required for decoherent model to reach steadystate may be very great, so in practical applications (such as reliability analysis) consideration should be given to the timedependent solution. 

To solve for the steadystate probabilities and flow rates, we set the derivative of each state probability to zero, so the flow into each state equals the flow out of that state. This gives us 24 linear equations in the 24 unknown microstate probabilities which, together with the gauge equation (sum of all probabilities is unity) can readily be solved. It’s convenient to designate the microstates by subscripts denoting the ages (in increments of Dt) of the operational components, with asterisks (*) for the ages of the failed components. With this notation, the steadystate equation for the yellow microstate in the above diagram is 

_{} 

where q is the aging rate parameter, i.e., q = 1/Dt. There are nine equations of this type. The steadystate equation for the green microstate, taking into account that it is fed by all the fullup microstates with t_{2} = Dt, is 

_{} 

There are eight equations of this type, except that the boundary states lack one of the “q” aging transitions. The equation for the microstate immediately below the yellow microstate is similar to that for the yellow microstate, except that it receives flow from the purple state in the drawing above, and it receives no “aging input”, because it is in the zeroth row. Thus we have 

_{} 

There are six equations of this type. Finally, the equation for the single completely renewed fullup state is 

_{} 

The gauge condition for this model is 

_{} 

This can be used in place of the equation for the fullyrenewed state. 

Notice that equation (3) for the green microstate can be expressed in the form 

_{} 

More generally, for any of the microstates in the macrostate with the first component failed, we have 

_{} 

for any k greater than zero. (Boundary states are exceptions, because they lack one or the other of the aging transitions, but those will be treated appropriately when we sum these microstates to determine the probabilities of the macrostates.) 

Even with constant failure rates, this approach can be used to determine the steadystate distributions of ages within each of the states of the model. Let’s consider the same model again, but this time with constant rates l_{1}, l_{2} and a larger number of time increments. Letting a_{ij} , b_{j}, and c_{j} denote the probabilities of macrostates 0, 1, and 2 respectively, the steadystate system equations are 

_{} 

_{} 

_{} 

_{} 

_{} 

To check the validity of these equations, first notice that the sum of equations (10) and the sum of equations (11) give, respectively, the relations 

_{} 

which are just the steadystate system equations (m_{1} + l_{2})P_{1} = l_{1}P_{0} and (m_{2} + l_{1})P_{2} = l_{2}P_{0}. Also, summing all of equations (7) and (8) together, we get 

_{} 

Now, the sum of equations (8) is 

_{} 

and from this it follows that the sum of the “perimeter” probabilities is 

_{} 

Substituting this into (12) gives 

_{} 

which is just the sum of the basic system equations. 

Before algebraically solve these equations, it will be useful to solve the discrete system numerically by integrating the timedependent equations to give the age distributions for a system with constant failure rate parameters of l_{1} = (50)10^{6} per hour and l_{2} = (20)10^{6} per hour, and repair rate parameters of m_{1} = 1/100 and m_{2} = 2/1000. The age increments are approximately 10000 hours so, for example, the probability of the system being in the fullup state with the age of the first component in the range 2000030000 hours, and the age of the second component in the range 4000050000 hours, is 0.00594. 

State 0: 
0 1 2 3 4 5 6 7 8 9 10 
0 .08711 .04025 .03355 .02796 .02330 .01941 .01618 .01348 .01123 .00936 .00780 
1 .02563 .05124 .02368 .01973 .01644 .01370 .01142 .00952 .00793 .00661 .00551 
2 .01717 .01508 .03014 .01393 .01161 .00967 .00806 .00672 .00560 .00467 .00389 
3 .01145 .01010 .00887 .01773 .00819 .00683 .00569 .00474 .00395 .00329 .00274 
4 .00763 .00673 .00594 .00522 .01043 .00482 .00402 .00335 .00279 .00232 .00194 
5 .00509 .00449 .00396 .00349 .00307 .00614 .00283 .00236 .00197 .00164 .00137 
6 .00339 .00299 .00264 .00233 .00206 .00181 .00361 .00167 .00139 .00116 .00097 
7 .00226 .00200 .00176 .00155 .00137 .00121 .00106 .00212 .00098 .00082 .00068 
8 .00151 .00133 .00117 .00104 .00091 .00081 .00071 .00062 .00125 .00058 .00048 
9 .00100 .00089 .00078 .00069 .00061 .00054 .00047 .00042 .00037 .00073 .00034 
10 .00067 .00059 .00052 .00046 .00041 .00036 .00032 .00028 .00025 .00022 .00043 

State 1: 
0 1 2 3 4 5 6 7 8 9 10 
.00081 .00068 .00057 .00048 .00040 .00033 .00028 .00023 .00019 .00016 .00013 

State 2: 
0 1 2 3 4 5 6 7 8 9 10 
.00306 .00218 .00146 .00097 .00065 .00043 .00029 .00019 .00013 .00009 .00006 

Note that this solution was based on a finite model extending only out to 400000 hours, and we have printed only out to the first 100000 hours. Such systems can always be solved numerically, either by integrating the dynamic equations until they reach steady state, or by explicitly solving the steadystate system equations. It appears that the age distributions have an exponential form, as predicted, but we can also see that the boundary microstates are discontinuous from the rest of the states. We also see that the effect of this discontinuity appears to propagate along the diagonal of State 0, although the nondiagonal microstates seem to conform (at least roughly) to the expected joint exponential distribution. 

Notice that along each diagonal the ratio of successive terms is very closely equal to 1.70. This can be understood from the basic governing equations (9), which give 

_{} 

This is exact for the discrete case. In the following we will let w denote the reciprocal of this ratio, i.e., we define the parameter 

_{} 

To explicitly solve the entire set of equations (7)(11) we first observe that the sum of the equations is zero, so they are not independent. Also, the equations are homogeneous, so we can omit (7), and divide all the other variables by a_{00} to give the new set of normalized variables 

_{} 

We can solve the system equations (absent (7)) for these variables, and then determine the absolute values of a_{00} and the original variables by invoking the gauge equation, i.e., the requirement that the sum of all the probabilities is unity. 

For convenience we define the following parameters to denote certain frequently occurring quantities: 

_{} 

Now begin by using equations (8) and (10) to determine a recurrence relation for the values of b_{n}. We can write the third of equations (10) as 

_{} 

Making use of the ratio w, the quantity in parentheses on the right side can be written as 

_{} 

The sum on the right side of this equation appears in the second of equations (10), so we can make that replacement to give 

_{} 

Now we can substitute for a_{02} from equation (8), rearrange terms, and generalize the indices on the values of b_{n}, (noting that the same relation applies to all), to give the secondorder homogeneous recurrence 

_{} 

This is valid for all n = 0, 1, 2, … The characteristic roots of this recurrence are 

_{} 

Likewise the sequence of g_{n} values satisfies the recurrence 

_{} 

and the characteristic roots of this recurrence are 

_{} 

In terms of these roots, the values of b_{n} and g_{n} can be written explicitly as 

_{} 

Hence, given b_{0}, b_{1}, g_{0}, and g_{1}, we can compute all the b_{n} and g_{n} values, from which we can compute the values of a_{0n} and a_{n0} using equations (8), and finally all the remaining a_{mn} values using equations (9). It only remains to determine the four values b_{0}, b_{1}, g_{0}, and g_{1}. 

We haven’t yet made use of the first members of the equation sets (10) and (11), so we can use them (after normalization) to impose conditions on the four principal unknowns. Substituting for the a_{mn} values in the first normalized equations of sets (10) and (11), we have 

_{} 

The sums of the b_{n} and g_{n} values can be expressed in terms of geometric series using the preceding formulas. For example, the sum of all the b_{n} values is 

_{} 

and similarly for the g_{n} values. From this we get the following expressions for sums beginning from the index n = 1: 

_{} 

Inserting these sums into the previous equations, we get two equations in the four principal unknowns: 

_{} 

Repeating this for the second members of the normalized sets (10) and (11), and making use of (8) to eliminate the extra a values, gives two more equations in the four unknowns 

_{} 

These two equations can be simplified by using the previous two equations to eliminate the expressions for the summations. When this is done, the equations reduce to 

_{} 

Recalling the coefficients of the characteristic polynomials, we see that these equations can be written as 

_{} 

Making these substitutions into (16) and (17), we get two equations in the two unknowns 

_{} 

Solving these equations gives the result 

_{} 

where 
_{} 

For the numerical example whose solution was tabulated above, these formulas give 

_{} 

With these values, equations (14) and (15) can now be used to compute all of the b_{n} and g_{n} values, and then equations (8) and (9) give all the a_{mn} values, noting that a_{00} = 1. It only remains to determine the absolute value of a_{00}, which we do by imposing the gauge condition, i.e., setting the sum of all the probabilities to 1. This implies that the sum of all the normalized variables is 1/a_{00}. We sum all the variables as follows 

_{} 

Evaluating these summations, we get 

_{} 

Inserting the numerical values from our tabulated example, we get the value a_{00} = 0.08711…, in agreement with the table. Also, multiplying the b_{n}, g_{n}, and a_{mn} values by a_{00} gives values of b_{n}, c_{n}, and a_{mn} in agreement with the table. 

The results of this note are extended to cover continuous models (i.e., in the limit as Dt goes to zero) in the article entitled Age Distributions in Continuous Markov Models. 
