Model Based Reasoning for Fault Detection and Fault Diagnosis
This page examines model-based reasoning approaches as part of A Guide to Fault Detection and Diagnosis.
One of the major distinctions in approaches to fault detection & diagnosis is whether or not explicit models are used, and what type of models are used. When models of the observed system are used as a basis for fault detection and diagnosis, this is often referred to as "model based reasoning". Defining the models then becomes a significant part of the application development effort. An “engine” combines the model knowledge with observed data to derive conclusions at run time. An Ishikawa “Fishbone” diagram used in Statistical Quality Control (SQC) is an example of a qualitative model. Although the diagnosis in that case is often done manually as the diagram is being built, those diagrams could (and should) be captured and subsequently used online in an automated system as well.
There are many advantages to using models, such as:
 Models can be used to predict the impacts of faults as well as to diagnose them, because the model is independent of the application.
 Application development is formalized, so applications are often easier to review and to reuse, with fewer errors, for multiple instances of the same equipment.
 Assumptions and limitations are likely to be clearer.
 Especially for first principles models, they are likely to reflect physical laws rather than observed coincidences that might only be true under certain conditions.
There are many variations on model based reasoning. The models might represent normal operations or abnormal operations. They might be quantitative (based on numbers and equations) or qualitative (for instance, based on cause/effect models), causal or noncausal, “compiled” vs. “first principles”, probabilistic vs. deterministic, and so on. These variations are outlined next.
Models of abnormal vs. normal operation
For systems characterized by numerical variables, engineering models such as algebraic equations or differential equations can be used as models of normal operation. Neural nets, when trained as function approximators, can also be used as models of normal operation. Other forms of models defining normal operation include state transition diagrams. Fault detection then involves checking that these models are being followed in the observed sensor data. For instance, the equations in a mathematical model should be satisfied within some tolerance, or certain sequences should be followed when the model is a state transition diagram.
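As a minimal sketch of checking that an equation is satisfied within some tolerance, consider a hypothetical steady-state material balance around a mixing point (two inflows, one outflow). The function names, sensor values, and tolerance below are assumptions made for illustration only.

```python
def material_balance_residual(flow_in_1, flow_in_2, flow_out):
    """Residual of the normal-operation model: inflows minus outflow."""
    return flow_in_1 + flow_in_2 - flow_out


def model_violated(residual, tolerance):
    """Flag a possible fault when the residual exceeds the tolerance."""
    return abs(residual) > tolerance


# Hypothetical sensor readings (same flow units) and an assumed tolerance.
TOLERANCE = 0.5
normal = material_balance_residual(10.0, 5.0, 14.8)   # about 0.2
leaking = material_balance_residual(10.0, 5.0, 13.0)  # 2.0
print(model_violated(normal, TOLERANCE))   # False
print(model_violated(leaking, TOLERANCE))  # True
```

In practice the tolerance would have to reflect sensor noise and model accuracy; a fixed value is used here only to keep the sketch short.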
Models of normal operation are well suited for fault detection. Faults mean that assumptions used in the models no longer apply, so deviations from the model (“model residuals”) can be sensitive detectors of problems. If the model is fairly comprehensive, most significant faults will generate some noticeable deviation. The possible faults don’t even need to be defined or anticipated for this detection to work. One problem area, especially for quantitative models, is that the models of normal operation may only apply to a certain range of operating conditions and operating modes. If operating conditions change, the models become less accurate, so model residuals increase and may indicate problems that do not actually exist (a “false positive”).
In some applications, fault detection is all that is needed. This is the case when the model of normal operation directly corresponds to equipment that must be replaced when it has failed. Fault isolation doesn’t need to go into any more detail than needed to identify the replaceable part. Another example is when one operator is managing a very large system, and the main problem is just getting the operator to notice the problem (because they know what to do once they’ve noticed it).
However, in most applications, further analysis is needed. Since the relationships between the faults and the model residuals are not part of the normal operation model, additional knowledge must be used for fault isolation, to pinpoint the root causes of problems. This is often done through some form of pattern matching or causal model of abnormal behavior.
In many cases, engineering models of normal behavior are complex and costly to develop and maintain. An alternative is to develop models of abnormal behavior. These are generally qualitative, and only need to capture the more extreme changes in behavior resulting from failures, rather than small, normal variations. For instance, fault propagation models describe the effects of faults by modeling cause and effect. Each effect may propagate further to additional effects. These causal models ultimately link root cause problems to observable symptoms. Thus, predictions can be made on the effects of a root cause fault. Conversely, diagnosis can invert this model, looking at the symptoms to then determine the possible causes.
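The two directions of use described above (prediction from cause to effects, and diagnosis from symptoms back to causes) can be sketched with a small cause/effect graph. The equipment, fault names, and symptoms are hypothetical, and the diagnosis step assumes a single root cause and that every listed symptom has actually been observed.

```python
# Hypothetical cause -> direct effects graph for a pumped cooling loop.
EFFECTS = {
    "pump failure": ["low coolant flow"],
    "valve stuck closed": ["low coolant flow"],
    "low coolant flow": ["high reactor temperature"],
    "heat exchanger fouling": ["high reactor temperature"],
    "high reactor temperature": ["high temperature alarm"],
}
ROOT_CAUSES = {"pump failure", "valve stuck closed", "heat exchanger fouling"}


def predict(cause):
    """Prediction: every effect reachable downstream of a cause."""
    reached, stack = set(), [cause]
    while stack:
        for effect in EFFECTS.get(stack.pop(), []):
            if effect not in reached:
                reached.add(effect)
                stack.append(effect)
    return reached


def diagnose(symptoms):
    """Diagnosis: root causes whose predicted effects include all symptoms."""
    return {c for c in ROOT_CAUSES if set(symptoms) <= predict(c)}


print(sorted(predict("pump failure")))
# ['high reactor temperature', 'high temperature alarm', 'low coolant flow']
print(sorted(diagnose(["high temperature alarm"])))
# ['heat exchanger fouling', 'pump failure', 'valve stuck closed']
print(sorted(diagnose(["low coolant flow", "high temperature alarm"])))
# ['pump failure', 'valve stuck closed']
```

Adding the second symptom narrows the candidate list, which is the basic mechanism by which a fault propagation model isolates root causes.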
Sometimes the effects of certain faults can be built into models of normal operation, hence combining aspects of both normal and abnormal operation models. This is natural in the case of state transition diagrams, for instance.
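One way this might look for a state transition diagram is sketched below, using a hypothetical motorized valve. The states, events, and the "stuck" fault state are assumptions for illustration. A timeout while the valve is moving leads to a modeled fault state, while any event the diagram does not allow is reported as a detected but unidentified problem.

```python
# Allowed transitions: (current state, event) -> next state.
# The valve states, events, and the "stuck" fault state are hypothetical.
TRANSITIONS = {
    ("closed", "open_cmd"): "opening",
    ("opening", "open_limit"): "open",
    ("opening", "timeout"): "stuck",      # fault behavior built into the model
    ("open", "close_cmd"): "closing",
    ("closing", "closed_limit"): "closed",
    ("closing", "timeout"): "stuck",      # fault behavior built into the model
}
FAULT_STATES = {"stuck"}


def run(events, state="closed"):
    """Follow observed events; return (final state, status).

    status is "normal", "modeled fault", or "unexpected event" (a sequence
    the model does not allow).
    """
    for event in events:
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            return state, "unexpected event"
        state = next_state
        if state in FAULT_STATES:
            return state, "modeled fault"
    return state, "normal"


print(run(["open_cmd", "open_limit", "close_cmd", "closed_limit"]))
# ('closed', 'normal')
print(run(["open_cmd", "timeout"]))       # ('stuck', 'modeled fault')
print(run(["open_cmd", "closed_limit"]))  # ('opening', 'unexpected event')
```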
For quantitative models, faults often have an extent that can be estimated. Examples include sensor bias, leaks, heat exchanger fouling as indicated by heat transfer coefficients, and so on. In general, estimating the extent of faults within the model is not recommended. It can introduce too many degrees of freedom, that is, too many places to assign “blame” for deviations of observed data from the normal model. There usually isn’t enough information available in the combination of model and sensor input to estimate the extent of very many faults. There are usually far too many possible faults, and modeling their effects at the level of accuracy of normal operations models is often difficult. Also, this approach is sensitive to unmodeled faults, since the system would explain discrepancies between model and observed values by assigning them to the modeled faults. An example of this approach using a Kalman filter with a very limited model is in Estimation of Flows and Temperatures in Process Networks.
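The degrees-of-freedom problem can be illustrated with a small worked example, under assumptions that are not from the original text: a single steady-state balance with true flows F_1 + F_2 = F_3, and measurements m_i that each carry an unknown sensor bias b_i.

```latex
% Assumed measurement model: each reading is the true flow plus a bias.
m_i = F_i + b_i, \qquad i = 1, 2, 3
% Residual of the balance evaluated on the measurements:
r = m_1 + m_2 - m_3 = (F_1 + F_2 - F_3) + b_1 + b_2 - b_3 = b_1 + b_2 - b_3
```

One residual gives one equation in three unknown biases, so the observed value of r cannot be split uniquely among them. A leak term would add a fourth unknown to the same single equation.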
One good hybrid approach is to use residuals from quantitative models of normal operation as inputs to a qualitative model of abnormal operation. For instance, a normal quantitative material balance indicating loss of material or presence of an unexpected component is an input for a qualitative fault model where a leak is an identified fault. Similarly, lower heat transfer than expected by a heat exchanger model can be an input to a fault model where fouling (buildup of insulating material impeding heat transfer) is a defined fault.
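A minimal sketch of this hybrid, assuming two hypothetical residuals (a material balance and a heat transfer residual) and made-up thresholds: each numeric residual is converted to a qualitative value, and the qualitative fault model is reduced to a table of expected symptom patterns.

```python
def to_symptom(residual, threshold):
    """Convert a numeric residual into a qualitative value."""
    if residual > threshold:
        return "high"
    if residual < -threshold:
        return "low"
    return "normal"


# Qualitative fault model: fault -> expected symptom pattern (hypothetical).
FAULT_SIGNATURES = {
    "leak": {"material_balance": "high"},   # more material in than out
    "fouling": {"heat_transfer": "low"},    # less heat transferred than modeled
}


def candidate_faults(residuals, thresholds):
    """Return the faults whose expected pattern matches the symptoms."""
    symptoms = {k: to_symptom(residuals[k], thresholds[k]) for k in residuals}
    return sorted(
        fault for fault, pattern in FAULT_SIGNATURES.items()
        if all(symptoms.get(name) == value for name, value in pattern.items())
    )


# Material balance residual = inflow minus outflow; heat transfer residual =
# observed minus model-predicted. Values and thresholds are made up.
residuals = {"material_balance": 1.8, "heat_transfer": -0.1}
thresholds = {"material_balance": 0.5, "heat_transfer": 0.3}
print(candidate_faults(residuals, thresholds))  # ['leak']
```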
A related approach is to use estimation techniques such as data reconciliation for algebraic models, or Kalman filters for dynamic models. The discrepancies between the estimated measurement values and observed measurement values (the “innovation” in Kalman filtering terms) then serve as inputs for pattern matching, with training based on data acquired during known faults or through simulation. The difference from directly looking at model residuals is that the estimation step must be carried out. Examples of these approaches in the case of algebraic models are given in “Neural Nets for Fault Diagnosis Based on Model Errors or Data Reconciliation”.
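To make the innovation concrete, here is a minimal sketch of a scalar Kalman filter, assuming a random-walk state model and made-up noise variances. It only produces the normalized innovation sequence; the pattern matching step that would consume those values is not shown.

```python
def kalman_innovations(measurements, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state; returns innovations.

    q: process noise variance, r: measurement noise variance (assumed values).
    Each innovation is normalized by its predicted standard deviation.
    """
    x, p = x0, p0
    normalized = []
    for z in measurements:
        p = p + q                    # predict: state unchanged, variance grows
        innovation = z - x           # observed minus predicted measurement
        s = p + r                    # innovation variance
        normalized.append(innovation / s ** 0.5)
        k = p / s                    # Kalman gain
        x = x + k * innovation       # update state estimate
        p = (1.0 - k) * p            # update estimate variance
    return normalized


# Hypothetical data: a steady reading followed by an abrupt jump.
readings = [5.0] * 20 + [8.0]
scores = kalman_innovations(readings, x0=5.0)
print([i for i, v in enumerate(scores) if abs(v) > 3.0])  # [20]
```

Normalizing by the predicted standard deviation lets a single threshold (3 here, an arbitrary choice) be applied regardless of how uncertain the filter currently is.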
Static vs. dynamic models
Dynamic models explicitly model behavior over time, while static models do not. In the case of numerical models, this is the difference between algebraic models vs. models based on differential equations or difference equations. Even qualitative models can incorporate dynamics. For instance, cause/effect models can include time delays. As another example, a state transition diagram or Petri net is a dynamic model, because it models changes in state that occur over time when triggered by events. Frequency response analysis is not common in fault diagnosis except for event generation, although it provides a way to describe some dynamic behavior through algebraic manipulations.
In general there can be timing errors when the system is not based on a full dynamic model, or even if a dynamic model is not exactly correct (which it never is). Synchronization of inputs, as discussed in the section on causal models, can help account for time delays with static models (whether qualitative or quantitative). That is one way to extend static models into dynamic models. Part of a dynamic model is included in the delays and lags built into the synchronization, even if the dynamics are just empirical approximations. This is analogous to the common practice in process control to use steady state models to determine process gains, and then extend the model by inserting time delays and lags.
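The effect of synchronizing inputs can be sketched with a pure time delay placed in front of a static model. The example assumes a hypothetical pipe in which a downstream temperature should equal the upstream temperature three samples earlier; the delay length and the data are made up, and lags are omitted for brevity.

```python
from collections import deque


class Delay:
    """Pure time delay of a fixed number of samples (n_samples >= 1)."""

    def __init__(self, n_samples, initial):
        self.buffer = deque([initial] * n_samples, maxlen=n_samples)

    def step(self, value):
        delayed = self.buffer[0]     # oldest stored sample
        self.buffer.append(value)    # a full deque drops its oldest entry
        return delayed


# Hypothetical readings: the same step change appears 3 samples later downstream.
upstream = [20, 20, 30, 30, 30, 30, 30, 30]
downstream = [20, 20, 20, 20, 20, 30, 30, 30]

# Static model "downstream == upstream", evaluated without synchronization.
unsynced = [d - u for u, d in zip(upstream, downstream)]

# Same static model, with the upstream input delayed by 3 samples first.
delay = Delay(3, initial=20)
synced = [d - delay.step(u) for u, d in zip(upstream, downstream)]

print(unsynced)  # [0, 0, -10, -10, -10, 0, 0, 0]
print(synced)    # [0, 0, 0, 0, 0, 0, 0, 0]
```

Without the delay, the transient produces a temporary residual that could be mistaken for a fault; with it, the static model stays satisfied throughout the change.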
Quantitative vs. qualitative models
Quantitative models are numerical models such as algebraic equations and differential or difference equations. For example, the “gross error” detection and diagnosis methods associated with traditional Data Reconciliation are based on quantitative, static models: algebraic equations (and inequality constraints).
Qualitative models generally do not include information on the magnitude of faults or their effects. Instead, they use terminology such as “high temperature”, often using variables that are binary or have just a few discrete values. A state transition diagram is another example of a qualitative model. The Seagate NerveCenter product, popular for network management, was based on state transition diagrams. Fault propagation models (cause/effect models of abnormal behavior) are common, and are discussed in the section on causal models.
Some techniques offer a blend of qualitative and quantitative modeling. For instance, those based on fuzzy logic are extensions of logic models, but they capture magnitude information in the form of “membership functions” for a discrete number of values such as “very low”, “low”, “normal”, “high”, and so on.
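As a small illustration of membership functions, the sketch below uses triangular functions for three hypothetical temperature sets. The shapes and breakpoints are assumptions; real applications often use shoulder-shaped functions for the outermost sets.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at left and right, 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for a temperature in degrees C: (left, peak, right).
SETS = {
    "low": (0.0, 20.0, 40.0),
    "normal": (20.0, 40.0, 60.0),
    "high": (40.0, 60.0, 80.0),
}


def memberships(temperature):
    """Degree of membership of a temperature in each fuzzy set."""
    return {name: triangular(temperature, *abc) for name, abc in SETS.items()}


print(memberships(50.0))  # {'low': 0.0, 'normal': 0.5, 'high': 0.5}
```

A reading of 50 is thus partly “normal” and partly “high”, rather than being forced into a single discrete value.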
Causal models
Causal models capture cause/effect information. For a review of these important model types, please go to the page:
Causal models
Compiled vs. first principles models
“First principles” models are often engineering design models. “Compiled” models are based mainly on data, or derived from more fundamental models or simulations. Please go to the following page:
Compiled vs. first principles models
Probabilistic vs. deterministic models - modeling uncertainty
Diagnostic systems inherently make assumptions on uncertainty. The only question is whether this uncertainty is explicit, or is hidden inside of “black box” techniques, or is just part of engineering judgment during tuning. Please go to the following page:
Probabilistic vs. deterministic models - modeling uncertainty
Bayesian models
Bayesian models are models of conditional probability and independence, inspired by Bayes’ rule. Please go to the following page:
Bayesian models
Copyright 2010 - 2020, Greg Stanley
