Probabilities with Variable Failure Rates 

Success is just failure that hasn't happened yet. 
Catrell Sprewell 

The probability density function δ(t) for the occurrence of a specific event at the time t is defined as δ(t) = dP(t)/dt, where P(t) is the cumulative probability, i.e., the probability that the event has occurred by the time t. Letting t_{E} denote the time when event E occurs, the probability that the event will occur during the time interval from t_{1} to t_{2} is given by the integral

$$P\{t_1 < t_E \le t_2\} = \int_{t_1}^{t_2} \delta(t)\,dt$$

The density function for a specific event generally decreases to zero as time passes, because the event will almost certainly have already occurred after a sufficient amount of time. However, the density function for the event to occur given that it has not already occurred may be fairly constant. For example, if we repeatedly toss a six-sided die, the probability of the first occurrence of ‘4’ is 1/6 for the first toss, but it drops for subsequent tosses, because the first ‘4’ is likely to have already occurred. Thus it is a priori quite unlikely that the first ‘4’ will occur on (say) the 20th toss. However, given that the first ‘4’ has not already occurred, the probability of the first ‘4’ occurring on the 20th toss (or any toss) is always 1/6.
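The die example can be checked with exact arithmetic. The following is a minimal sketch; the function names are illustrative and not part of the original discussion. It computes the a priori probability that the first ‘4’ occurs on the 20th toss, and the same probability conditioned on no ‘4’ having occurred in the first 19 tosses.

```python
from fractions import Fraction

def first_occurrence_probability(p, n):
    """A priori probability that the first success occurs on trial n."""
    return (1 - p) ** (n - 1) * p

def conditional_probability(p, n):
    """Probability of the first success on trial n, given none in trials 1..n-1."""
    not_yet = (1 - p) ** (n - 1)  # probability of no success so far
    return first_occurrence_probability(p, n) / not_yet

p = Fraction(1, 6)                       # chance of a '4' on any single toss
a_priori = first_occurrence_probability(p, 20)
given_not_yet = conditional_probability(p, 20)

print(float(a_priori))   # small: the first '4' has usually occurred earlier
print(given_not_yet)     # the constant per-toss rate
```

The a priori value works out to roughly half a percent, whereas the conditional value is exactly 1/6 for every toss, which is the discrete analogue of a constant rate.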

This leads us to define the rate of occurrence, denoted by λ(t) for time t, as the probability density in the next increment of time after time t, given that the event has not already occurred by the time t. Thus letting dt denote the increment of time beginning at time t, and stipulating that P(0) = 0, we can express the definitions of the density function δ(t) and the rate function λ(t) as follows

$$\delta(t)\,dt = P\{t < t_E \le t + dt\}, \qquad \lambda(t)\,dt = P\{t < t_E \le t + dt \mid t_E > t\}$$

where the vertical line signifies “given”. Here the expression t_{E} > t signifies that the event E has not occurred by the time t. Recall that a basic property of probability is that, for any two events X and Y, we have

$$P\{X \mid Y\} = \frac{P\{X \text{ and } Y\}}{P\{Y\}}$$

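where P{X|Y} signifies the conditional probability of X given Y. Making use of this identity, and noting that the event t < t_{E} ≤ t + dt already implies t_{E} > t, the expression for λ(t) can be written as

$$\lambda(t)\,dt = \frac{P\{t < t_E \le t + dt\}}{P\{t_E > t\}}$$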

The probability that event E has not occurred by the time t is simply 1 − P(t), so this shows that the rate of occurrence is related to the probability density function by

$$\lambda(t) = \frac{\delta(t)}{1 - P(t)}$$

In other words, the rate is simply the probability density normalized by the probability that the event has not yet occurred. This is often a useful parameter because, as mentioned above, many naturally occurring events have a constant (or nearly constant) rate. Remembering that δ(t) = dP/dt, the relation can be written as a linear first-order differential equation

$$\frac{dP}{dt} + \lambda(t)\,P(t) = \lambda(t)$$

If λ(t) is constant, this has the homogeneous solution P(t) = Ae^{−λt} for some constant A, and the particular solution P(t) = 1. The general solution is the sum of these two solutions. Setting A = −[1 − P(0)] to match the initial value, we have

$$P(t) = 1 - [1 - P(0)]\,e^{-\lambda t}$$

If P(0) = 0 this reduces to the familiar formula P(t) = 1 − e^{−λt} for the probability based on the exponential density function δ(t) = λe^{−λt}. The above equation can also be written in the form

$$\frac{1 - P(t)}{1 - P(0)} = e^{-\lambda t}$$

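If the rate function λ(t) is not constant, consider a sequence of N incremental time intervals, each of duration Δt, during which the rate takes the nearly constant values λ_{0}, λ_{1}, λ_{2}, ..., λ_{N−1} respectively. Applying the constant-rate solution to each increment in turn, we have the sequence of solutions

$$\frac{1 - P(\Delta t)}{1 - P(0)} = e^{-\lambda_0 \Delta t}, \qquad \frac{1 - P(2\Delta t)}{1 - P(\Delta t)} = e^{-\lambda_1 \Delta t}, \qquad \ldots, \qquad \frac{1 - P(N\Delta t)}{1 - P((N-1)\Delta t)} = e^{-\lambda_{N-1} \Delta t}$$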

Multiplying these together, the probability at the end of these increments is given by

$$\frac{1 - P(N\Delta t)}{1 - P(0)} = \exp\!\left(-\sum_{j=0}^{N-1} \lambda_j\,\Delta t\right)$$

Thus in the limit as Δt goes to zero and NΔt goes to T, we have

$$\frac{1 - P(T)}{1 - P(0)} = \exp\!\left(-\int_0^T \lambda(t)\,dt\right)$$

Consequently we have the result

$$P(T) = 1 - [1 - P(0)]\,\exp\!\left(-\int_0^T \lambda(t)\,dt\right) \qquad (3)$$

If the probability P(0) at the beginning of the flight is zero (as is typically the case), and if we make use of the approximation e^{x} ≈ 1 + x for small x, this equation simply says that P(T) is essentially just the integral of λ(t) from t = 0 to T. This is as expected, because λ(t) is nearly identical to the probability density function δ(t) under these conditions. In other words, the above equation is just a convoluted way of expressing the tautological fact that the probability for the flight is the integral of the probability density for the flight.
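To make the comparison concrete, the sketch below evaluates P(T) = 1 − [1 − P(0)] exp(−∫λ(t)dt) numerically and compares it with the plain integral of λ(t). The rate function `example_rate`, its coefficients, and the four-hour duration are hypothetical values chosen only for illustration; the trapezoidal integrator is a simple stand-in for any quadrature routine.

```python
import math

def integrate(f, a, b, steps=10_000):
    """Trapezoidal-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h

def probability_by_time(rate, T, p0=0.0):
    """P(T) = 1 - [1 - P(0)] * exp(-integral of rate from 0 to T)."""
    return 1.0 - (1.0 - p0) * math.exp(-integrate(rate, 0.0, T))

def example_rate(t):
    """Hypothetical rate (per hour) that grows linearly during the flight."""
    return 1e-6 * (1.0 + 0.5 * t)

T = 4.0                                        # hypothetical flight duration, hours
exact = probability_by_time(example_rate, T)   # exponential formula with P(0) = 0
approx = integrate(example_rate, 0.0, T)       # first-order approximation

print(exact, approx)
```

For rates this small the two values agree to many significant digits, with the exact value slightly below the integral, since 1 − e^{−x} < x for x > 0.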

Regulatory guidance for safety calculations sometimes splits up the total time interval T into several smaller intervals (also called phases) of duration T_{1}, T_{2}, ..., T_{n}, where T_{j} extends from t_{j−1} to t_{j}, and considers separate rate functions λ_{1}(t), λ_{2}(t), ..., λ_{n}(t) during these intervals respectively. Then by the above derivation we immediately have the two equivalent expressions

$$P(T) = 1 - [1 - P(0)] \prod_{i=1}^{n} \exp\!\left(-\int_{t_{i-1}}^{t_i} \lambda_i(t)\,dt\right) = 1 - [1 - P(0)]\,\exp\!\left(-\sum_{i=1}^{n} \int_{t_{i-1}}^{t_i} \lambda_i(t)\,dt\right)$$

However, these equations make little sense, because the rate functions are each defined and evaluated only over disjoint time intervals. In other words, the rate function λ_{1}(t) covers only t = t_{0} to t_{1}, and the rate function λ_{2}(t) covers only t = t_{1} to t_{2}, and so on. Thus these functions really comprise just a single overall rate function λ(t) covering the entire duration from t = 0 to T, and the probability is given by (3). There is no benefit in splitting the function in these terms. (If we wished to define individual rate functions beginning from time zero for each phase, then we could sum the integrals of λ_{i} for t = 0 to T_{i}.) 
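The equivalence of the phase-by-phase product and the single overall integral is easy to confirm numerically. The sketch below assumes, purely for illustration, a constant rate within each phase; the three phase durations and rates are hypothetical and do not come from any guidance document.

```python
import math

# Hypothetical phases: (duration in hours, constant rate per hour).
phases = [(0.1, 5e-6), (2.5, 1e-6), (0.2, 8e-6)]

def probability_product_form(phases, p0=0.0):
    """1 - [1 - P(0)] times the product of one exponential per phase."""
    survival = 1.0 - p0
    for duration, rate in phases:
        survival *= math.exp(-rate * duration)
    return 1.0 - survival

def probability_single_integral(phases, p0=0.0):
    """1 - [1 - P(0)] * exp(-integral of the overall rate over all phases)."""
    total = sum(rate * duration for duration, rate in phases)
    return 1.0 - (1.0 - p0) * math.exp(-total)

by_product = probability_product_form(phases)
by_integral = probability_single_integral(phases)
print(by_product, by_integral)
```

Both functions return the same value up to rounding, which illustrates that the phases simply partition a single rate function over the interval from 0 to T.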

Another odd feature of the treatment of these calculations in the regulatory guidance is that it presents “two cases”, writing the above equation (with the product of exponentials) twice, once with P(0) = 0 and once with P(0) ≠ 0. Furthermore, in the second “case”, the guidance rewrites the above equation as

$$P(T) = P(0) + [1 - P(0)] \left[ 1 - \prod_{i=1}^{n} \exp\!\left(-\int_{t_{i-1}}^{t_i} \lambda_i(t)\,dt\right) \right]$$

Admittedly this is algebraically equivalent to (3), and may have been intended to emphasize that it includes the initial probability, but it’s rather convoluted and unnecessarily elaborate. By the way, the guidance contains a misstatement, saying that the above equation with P(0) ≠ 0 gives “the probability that the element fails during one certain mission”, whereas it actually represents the probability of the fault being present during that mission, including the probability that it failed prior to the start of this mission. Since the failure condition under consideration is generally a catastrophic (or at least hazardous) condition, the relevance of the case P(0) ≠ 0 is dubious. It would be applicable to individual component failures that potentially contribute to the top event and that could be already failed at the beginning of the mission, but the above equation is stated to be for the probability of the total failure condition, not for component failures. (The probabilities of component failures would contribute to the applicable λ(t) functions in the above equations.) 

The published regulatory guidance states that the probability per flight should always be calculated for the average flight length, but it also states that the probability for an average flight may vary, because one or more failed elements in the system can persist for multiple flights (latent, dormant, or hidden failures). The analysis must consider the relevant exposure times, e.g., the time intervals between maintenance and operational checks or inspections. In such cases the probability of the failure condition per flight increases with the number of flights during the latency period. To account for this, the guidance states that the probabilities per flight (each assumed to be of average duration) should be summed over either the entire life of the aircraft or the least common multiple of the exposure times (latency periods), and then divided by the number of missions. If the system is verified to be fully healthy at the beginning of each flight, then the latency period is just one flight, so every flight of average duration has the same probability, and this step in the calculation is superfluous. In that case the average value of “probability per average flight” is simply P(T_{ave}), i.e., the probability of failure in a flight of average duration. The last step in the guidance is to divide this by the average flight duration, giving P(T_{ave})/T_{ave}, and this is the value that is compared with the numerical threshold (e.g., 10^{−9}/hr for catastrophic failures) to determine compliance.
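The averaging step can be illustrated with a deliberately simplified model. It is an assumption made for this sketch, not something taken from the guidance. Suppose the failure condition requires a latent element (checked every `flights_between_checks` flights) to have failed and an active element to fail during the flight, both with constant rates, and ignore the order of the two failures within a flight. All names and numerical values below are hypothetical.

```python
import math

def per_flight_probabilities(latent_rate, active_rate, flight_hours,
                             flights_between_checks):
    """Per-flight probability for each flight since the last latent check.

    Simplified model: P_k = P(latent element failed by end of flight k)
                            * P(active element fails during one flight).
    """
    p_active = 1.0 - math.exp(-active_rate * flight_hours)
    probs = []
    for k in range(1, flights_between_checks + 1):
        exposure = k * flight_hours          # hours since the last check
        p_latent = 1.0 - math.exp(-latent_rate * exposure)
        probs.append(p_latent * p_active)
    return probs

flight_hours = 2.0                           # hypothetical average flight length
probs = per_flight_probabilities(1e-5, 1e-5, flight_hours, 100)

average_per_flight = sum(probs) / len(probs)            # sum, divide by flights
average_per_hour = average_per_flight / flight_hours    # compared with threshold

print(probs[0], probs[-1], average_per_flight, average_per_hour)
```

In this model the per-flight probability grows with each flight since the last check, so the average lies between the first and last values. With a check before every flight the list has a single entry and the averaging step changes nothing.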

The guidance does not mention (at least not explicitly) that, even if the system is verified to be fully healthy at the start of each flight, the overall probability of the failure condition occurring during the life of the airplane can also be affected by variations in flight length from the average. The only (tacit) acknowledgement of this is in the stated caveat that if P(T_{ave})/T_{ave} “is likely to be significantly different from the predicted average rate of occurrence of that failure condition during the entire operational life of all airplanes of that type, then a risk model that better reflects the failure condition should be used”. Under that assumption, P(T_{ave})/T_{ave} can differ from the predicted long-term rate of occurrence only through variations in flight length, and only if the probability per flight depends strongly on flight length. Recall that some failure modes contribute the same risk per mission, regardless of the duration of the mission, whereas the risk contributed by other failure modes varies in proportion to the mission duration. If we were limited to just these two kinds of failures, the probability for the jth mission with duration T_{j} could be written as P_{j} = P_{c} + kT_{j}, where P_{c} is the constant contribution and k is the proportionality constant of the duration-dependent contribution. If we add up the probabilities for N missions and then divide by N (as directed in the guidance), the result is P_{ave} = P_{c} + kT_{ave}, which is exactly equal to the probability for a mission of the average duration. In view of this, one might think that variations in flight length have no effect on the long-term average. However, if the probability per flight varies as the square or cube or, in general, the nth power of the flight length, then the variation does have an effect on the long-term probability of occurrence in the life of the airplane. In such cases, the actual long-term average probability of occurrence is increased by the factor (T^{n})_{ave}/T_{ave}^{n} as explained in the note on Regulating Risk. In most realistic cases this factor is close to 1, so it is often neglected, but it can be significant in extreme cases.
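The factor (T^{n})_{ave}/T_{ave}^{n} is straightforward to evaluate for any set of flight lengths. The following sketch uses a small hypothetical list of durations (in hours) and an illustrative function name; it also checks the linear case P_{j} = P_{c} + kT_{j}, with made-up values of P_{c} and k.

```python
def flight_length_factor(durations, n):
    """Ratio of the average of T^n to the nth power of the average T."""
    mean_t = sum(durations) / len(durations)
    mean_t_n = sum(t ** n for t in durations) / len(durations)
    return mean_t_n / mean_t ** n

durations = [1.0, 2.0, 3.0, 6.0]   # hypothetical flight lengths, hours

linear_factor = flight_length_factor(durations, 1)      # no effect when n = 1
quadratic_factor = flight_length_factor(durations, 2)   # exceeds 1 when lengths vary

# Linear case: averaging P_c + k*T_j over flights equals P_c + k*T_ave.
p_c, k = 1e-9, 2e-9                # hypothetical constant and proportional terms
t_ave = sum(durations) / len(durations)
average_of_probs = sum(p_c + k * t for t in durations) / len(durations)
prob_of_average = p_c + k * t_ave

print(linear_factor, quadratic_factor, average_of_probs, prob_of_average)
```

With these durations the quadratic factor is 12.5/9, or about 1.39, which is larger than is typical because the hypothetical flight lengths vary widely. When all flights have the same length the factor is exactly 1 for every n.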
