Bayesian Models
This page examines Bayesian models, part of the section on Model Based Reasoning in the white paper A Guide to Fault Detection and Diagnosis.
Bayesian models are models of conditional probability and independence: the probability that some variable Y is true, given that variable X is true. Each probabilistic variable is a node in a graph, where the lack of an arc between two nodes implies conditional independence. These models are inspired by Bayes' rule, which inverts a probability model to determine the probability of each possible root cause, given the observed symptoms. This approach has the advantage of resting on the well-developed theory of probability.
A Bayesian Belief Network (BBN) represents variables as nodes linked in a directed graph, as in a cause/effect model. Conditional probabilities are specified for every node. Root causes just have an “a priori” probability. Run-time calculation generates probability estimates for every node, and updates them whenever any node receives a new observed value. Thus, Bayesian networks perform both prediction and diagnosis, like methods based on simpler causal models.
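The prediction/diagnosis duality can be seen in a minimal two-node sketch with one root cause and one symptom. All probability values below are made-up illustrative numbers, not from any real application:

```python
# Minimal two-node belief network: Fault -> Alarm.
# All probabilities are made-up illustrative values.
p_fault = 0.05                                   # a priori root cause probability
p_alarm_given_fault = {True: 0.95, False: 0.02}  # CPT for Alarm given Fault

# Prediction: the alarm probability before any observation,
# obtained by marginalizing over the root cause.
p_alarm = (p_alarm_given_fault[True] * p_fault
           + p_alarm_given_fault[False] * (1 - p_fault))

# Diagnosis: once the alarm is observed, Bayes' rule updates the
# root cause probability.
p_fault_given_alarm = p_alarm_given_fault[True] * p_fault / p_alarm
```

With these numbers, observing the alarm raises the estimated fault probability from 5% to about 71%; the same network, run forward, predicts how likely the alarm is before any data arrives.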
These are not necessarily cause/effect models. From a practical standpoint, people often start constructing the models by thinking in terms of cause/effect models. But the models used at run time must be directed, acyclic graphs (acyclic even ignoring the directions of the arcs). So, before run-time use, the model must be converted to this form. There are standard approaches for this, and it can be done automatically. But the resulting network is not necessarily the same intuitive causal graph. In a sense, some of the causality has been stripped out. The run-time algorithms make no use of any causality assumption. As a result, time delays are harder to account for, other than through time synchronization approaches as discussed for other static models. (Another approach is to duplicate state variables at each point in time, with causal links in the direction of time - similar to the way that the discrete-time Kalman filter can be derived.)
Bayesian systems start with prior estimates of failure probabilities for the root causes. Because of the built-in prediction that comes with Bayes models, the probability of any variable in the diagram always has a value based on the a priori root cause fault probabilities. This is a benefit that helps in ranking possible fault isolation conclusions when measurement data is inadequate to distinguish between several different faults. So, for instance, if one root cause is twice as likely to occur as another, in the absence of adequate data, that root cause is estimated to be twice as likely.
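The ranking effect of the priors can be seen in a small sketch with hypothetical numbers: if a symptom is equally likely under two candidate root causes, the data cannot distinguish them, and the posterior ranking simply reproduces the prior ratio.

```python
# Two mutually exclusive candidate root causes with made-up priors;
# "seal_leak" is a priori twice as likely as "sensor_drift".
priors = {"seal_leak": 0.02, "sensor_drift": 0.01}

# The observed symptom is equally likely under either cause,
# so the measurement data cannot distinguish them.
likelihood = {"seal_leak": 0.5, "sensor_drift": 0.5}

unnormalized = {c: likelihood[c] * priors[c] for c in priors}
total = sum(unnormalized.values())
posterior = {c: round(v / total, 3) for c, v in unnormalized.items()}
```

The posterior ranks the seal leak at about 2/3 and the sensor drift at about 1/3 - exactly the 2:1 prior ratio, since the evidence was uninformative.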
This basic idea of starting with “a priori” estimates and then updating them based on data is common in estimation theory. The Kalman filter for estimating states of continuous variables in dynamic systems is an example of this general “Bayesian” approach. However, its resulting computation of state estimates (optimal when assuming Gaussian noise in the measurements, process, and initial conditions) is quite different from the techniques for probability estimation of discrete variables associated with Bayesian networks.
The original Bayes' rule makes the single-fault assumption. So even in the presence of multiple faults, a system based on a simple application of Bayes' rule produces a single-fault diagnosis: evidence in favor of one fault reduces the estimated probability of the other faults. That is a benefit when there really is only one fault (and the model is complete, with all faults accounted for), because it forces a single answer to explain all the observed symptoms. But when multiple faults are actually present, it is counterintuitive that positive evidence of one fault should have any impact on the probability of other, independent faults, especially when their symptoms do not overlap.
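This effect is easy to see in a sketch of a single-fault Bayes' rule calculation (the fault names and numbers are hypothetical): the candidate faults are treated as mutually exclusive, so the posteriors must sum to one, and strong evidence for one fault necessarily pushes the others down.

```python
# Single-fault Bayes' rule: candidate faults are treated as mutually
# exclusive, so the posterior is renormalized to sum to one.
priors = {"A": 1/3, "B": 1/3, "C": 1/3}      # made-up equal priors
likelihood = {"A": 0.9, "B": 0.1, "C": 0.1}  # P(observed symptom | fault)

unnormalized = {f: likelihood[f] * priors[f] for f in priors}
total = sum(unnormalized.values())
posterior = {f: v / total for f, v in unnormalized.items()}

# The symptom points at fault A, and normalization pushes fault C well
# below its prior of 1/3 - even if C's own symptoms do not overlap A's.
```

Fault C drops from a prior of 1/3 to a posterior of about 0.09, purely because fault A became more likely - the behavior criticized above for true multiple-fault situations.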
There are variations in the forms and run-time processing of BBNs. It is possible to build BBNs that handle multiple faults.
Many of the BBN variations are intended to reduce computational complexity. Exact inference in the original Bayesian network formulation is “NP-hard”, so in the worst case the amount of computation grows exponentially with problem size. This did not scale well for many problems of even moderate size. Some simplifications reduce the amount of computation through approximate solutions and manipulations of the networks.
One disadvantage of Bayes networks is that many people are not comfortable specifying the conditional probabilities needed for each node. There is a lot of data to specify from personal experience or to determine empirically. The extra effort might discourage the definition of intermediate conclusions that would otherwise be considered good model development practice. Some simplifications reduce the amount of data to be specified for the conditional probabilities (as in “noisy OR gates”).
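A noisy-OR gate is one such simplification: instead of a full conditional probability table with 2^n entries for n parent causes, each parent gets a single “causal strength” number, plus an optional leak term for unmodeled causes. A minimal sketch (the function name and numbers are illustrative, not from any particular BBN tool):

```python
def noisy_or(strengths, leak=0.0):
    """Probability that the effect occurs, given the causal strengths of
    the parent causes that are currently active. Each strength is the
    probability that that cause alone produces the effect; the leak term
    covers unmodeled causes. The effect is absent only if every active
    cause (and the leak) independently fails to produce it."""
    p_absent = 1.0 - leak
    for p in strengths:
        p_absent *= 1.0 - p
    return 1.0 - p_absent

# Two active causes with made-up strengths 0.8 and 0.5, plus a 5% leak:
p_effect = noisy_or([0.8, 0.5], leak=0.05)
```

For a node with, say, 10 parent causes, this replaces 1024 table entries with 11 numbers, at the cost of assuming the causes act independently.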
In practice, the run-time results are said not to be overly sensitive to those conditional probabilities. The diagnostic results, as a ranked list of root causes, depend more on the relative probabilities of node inputs compared to each other than on the absolute values of the conditional probabilities. A similar comment applies to the a priori probabilities of the root causes: the values relative to each other matter more than the absolute values.
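The insensitivity to absolute prior values can be checked directly in a small sketch (made-up numbers): scaling all the a priori probabilities by a common factor leaves the normalized posterior, and hence the ranking, unchanged.

```python
def posterior(priors, likelihood):
    """Normalized posterior over mutually exclusive root causes."""
    unnorm = {c: likelihood[c] * priors[c] for c in priors}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}

priors = {"A": 0.02, "B": 0.01}
scaled = {c: 10 * p for c, p in priors.items()}  # same ratios, larger absolutes
likelihood = {"A": 0.3, "B": 0.9}

p1 = posterior(priors, likelihood)
p2 = posterior(scaled, likelihood)
# p1 and p2 agree (up to floating-point rounding): only the ratios matter.
```

The same cancellation applies to a common scaling of the likelihoods, which is why the relative values dominate the diagnostic ranking.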
Strengths and weaknesses
In summary, Bayesian methods offer some advantages:
(1) Accounting for a priori probability estimates (knowledge that some things are more likely to fail than others, reflected in the final results).
(2) Evidence combination following the laws of probability. In particular, conflicting data can be resolved more gracefully, and as positive evidence accumulates, belief in an outcome really does increase.
(3) Producing a ranked list of the potential causes of problems.
(4) Having a theoretical basis.
But there are downsides:
(1) People have to understand and specify a lot of conditional probabilities, and understand just what things like “noisy OR gates” really mean - not an easy sell.
(2) Scalability concerns.
(3) Naive Bayesian methods implicitly make a single-fault assumption. That is not realistic for many applications where there may be many ongoing faults, partly because repairs can be delayed for a long time. That happens in large-scale applications like process plants, as well as in autonomous vehicles.
Comparisons with other techniques
As noted, quite a bit of extra probability information is required, compared to simpler causal models. But, this provides benefits in making use of a priori failure information for better diagnostic guidance for applications where data is limited. BBNs also make use of probabilities in handling conflicting data effectively.
BBNs share the usual benefits of model-based approaches vs. pure procedural techniques, such as ease of changing the model.
When using BBNs, if multiple failures are a consideration, be sure to check that the BBN can account for multiple faults.
The model type is a static one, not directly applicable to dynamic systems.
Like most other techniques, Bayesian methods assume the model is complete, so an unmodeled fault will instead be diagnosed as one of the modeled faults.
Commercial products and applications
The Danish company Dezide (www.dezide.com) sells products based on Bayesian Belief Networks, focusing on “guided diagnosis and troubleshooting”, which means diagnosis through manual questions and answers. This includes use in call centers, and for self-help before customers reach a person in the call center.
External Links
Eugene Charniak, “Bayesian Networks without Tears”, AI Magazine, Winter 1991, pp. 50-63
Copyright 2010 - 2020, Greg Stanley
(Return to Model Based Reasoning)