Rule-Based Approaches and Implementation
This page examines rule-based approaches and implementation of diagnostic systems, as part of the white paper A Guide to Fault Detection and Diagnosis.
Rule-based systems in most cases implement other approaches discussed earlier, more as a program control mechanism than a separate diagnostic technique. The term “expert system” has often been used as a synonym for “rule-based system”, also called a “production rule system”. This isn’t really true -- practical expert systems have almost always included other tools and paradigms in addition to rule processing. But there is a long historical linkage between rule-based expert systems and diagnosis, so rule-based approaches are reviewed here.
Rules are written in an if/then format. For example:
“if A and B, then C”
“if C or D, then E”
The “IF” portion is also called the antecedent, and the “THEN” portion is also called the consequent.
Rules are entered in any order during application development. Then, at run time, a “rule engine” (“inference engine”) schedules the running of the rules to draw conclusions, seek data or respond to data input, or ask questions. The rules are processed at run time using one of two techniques: forward chaining or backward chaining.
Forward chaining starts with the data in the "IF" condition in the rules and works "forward" through the rules to draw conclusions. In this case, for diagnosis, data is scanned or manually entered (such as A, B, and D in the above example). Any rules using that data in the “IF” condition are then “fired”, to see if conclusions can be drawn. If any of those conclusions can be drawn, that fires further rules that use those conclusions in their “IF” condition. In the above example, C is an intermediate conclusion, and E might be a final diagnosis. Or, E might represent a prediction rather than a diagnostic conclusion.
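The forward chaining pass described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular rule engine is built: the tuple encoding of rules and the name forward_chain are assumptions made for this example, and the loop naively rescans every rule after each new conclusion.

```python
# Hypothetical encoding: (operator, antecedents in the IF portion, consequent).
RULES = [
    ("and", ("A", "B"), "C"),   # if A and B, then C
    ("or", ("C", "D"), "E"),    # if C or D, then E
]

def forward_chain(facts, rules=RULES):
    """Fire rules repeatedly until no new conclusion can be drawn."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for op, antecedents, consequent in rules:
            test = all if op == "and" else any
            if consequent not in known and test(a in known for a in antecedents):
                known.add(consequent)   # a new conclusion may enable further rules
                changed = True
    return known

conclusions_ab = forward_chain({"A", "B"})  # C is concluded first, which then gives E
conclusions_d = forward_chain({"D"})        # E follows directly from D
conclusions_a = forward_chain({"A"})        # A alone supports no conclusion
```

With inputs A and B, the intermediate conclusion C is what allows the second rule to conclude E.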
Backward chaining starts with variables in the “THEN” portion, and works backwards to determine possible causes. In the rule examples above, E might represent a high temperature indicating a loss-of-cooling problem. Backward chaining finds all rules with E in the conclusion. In this example, backward chaining will see from the second rule that the possible causes are C or D. It will then look further upstream at additional data (or ask questions) to determine if the root cause is A and B, or D.
Rules provide an alternate representation and engine for reasoning using binary-valued causal models. As seen in the above example, the results of forward chaining are the same as using a causal model for prediction. Backward chaining provides another implementation of diagnosis when the diagnostic conclusions are in the “IF” portion of the rules.
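As a rough sketch of the backward search just described, the Python below enumerates the alternative sets of root inputs that could explain a goal. The rule encoding and the name possible_causes are assumptions for illustration. A real engine would interleave this search with data requests or questions to the user, and the sketch assumes the rules contain no cycles.

```python
# Hypothetical encoding: (operator, antecedents in the IF portion, consequent).
RULES = [
    ("and", ("A", "B"), "C"),   # if A and B, then C
    ("or", ("C", "D"), "E"),    # if C or D, then E
]

def possible_causes(goal, rules=RULES):
    """Return alternative sets of root inputs that could explain `goal`."""
    producing = [r for r in rules if r[2] == goal]
    if not producing:
        return [frozenset([goal])]      # no rule concludes it: treat it as root data
    alternatives = []
    for op, antecedents, _ in producing:
        if op == "or":
            # Any one antecedent is enough, so each is a separate alternative.
            for a in antecedents:
                alternatives.extend(possible_causes(a, rules))
        else:
            # "and": every antecedent is needed, so merge one alternative from each.
            combos = [frozenset()]
            for a in antecedents:
                combos = [c | alt for c in combos
                          for alt in possible_causes(a, rules)]
            alternatives.extend(combos)
    return alternatives

causes_of_e = possible_causes("E")      # candidates: {A, B} together, or {D}
# If further data shows that D is false, only the {A, B} candidate remains.
remaining = [c for c in causes_of_e if "D" not in c]
```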
Forward chaining can be used to implement procedural approaches such as a decision tree when the diagnostic conclusions are in the “THEN” portion of the rule.
Rules can also directly implement pattern matching, looking at fault signatures. For pattern matching in one step, each diagnostic conclusion corresponds to the THEN portion of one rule. In the “IF” portion of the rule, all of the required symptoms are combined with “AND” conditions. Each rule is an “AND” gate.
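One-step pattern matching of this kind might look like the following sketch. The fault names and symptoms are invented for illustration. Each dictionary entry plays the role of one rule whose “IF” portion is the AND of all required symptoms.

```python
# Hypothetical fault signatures: fault -> set of symptoms that must all be present.
SIGNATURES = {
    "loss of cooling": {"high temperature", "low coolant flow"},
    "temperature sensor failure": {"high temperature", "normal coolant flow"},
}

def diagnose(observed_symptoms):
    """Return every fault whose full signature is contained in the observations."""
    observed = set(observed_symptoms)
    return sorted(fault for fault, required in SIGNATURES.items()
                  if required <= observed)   # subset test acts as the AND gate

match = diagnose({"high temperature", "low coolant flow"})
no_match = diagnose({"high temperature"})    # incomplete signature: no rule fires
```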
As long as the rule representation allows cycles and the use of variables, other discrete model types used for diagnosis can be implemented. An example is a state transition diagram - the conditions or events defining each possible state transition are included in the “IF” portion of the rules, and the conclusion changes the current state.
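A state transition diagram expressed as rules could be sketched as below. The valve states and event names are hypothetical. Each tuple is one rule: the current state and the triggering event form the “IF” portion, and the new state is the conclusion.

```python
# Hypothetical transition rules: (current state, event, new state).
TRANSITIONS = [
    ("closed", "open command", "opening"),
    ("opening", "open limit switch", "open"),
    ("opening", "travel timeout", "stuck"),
    ("open", "close command", "closed"),
]

def step(state, event):
    """Apply the first rule whose IF portion matches; otherwise keep the state."""
    for current, trigger, new_state in TRANSITIONS:
        if state == current and event == trigger:
            return new_state
    return state

def run(state, events):
    """Feed a sequence of events through the transition rules."""
    for event in events:
        state = step(state, event)
    return state

fault_state = run("closed", ["open command", "travel timeout"])
normal_state = run("closed", ["open command", "open limit switch"])
```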
Just as in the case of causal models or other techniques, data-driven quantitative models can be run in parallel, and their outputs used in the “IF” portion of the rules.
So, rule engines provide an alternative for implementing a variety of techniques. Rule engines may offer many additional capabilities that could be useful, for instance existential quantification (“if there exists X such that...”) and universal quantification (“if X is true for every Y such that...”). These can be especially useful either in formulating generic libraries of rules associated with objects of a particular class, or in event-oriented reasoning. The most effective tools based directly on other approaches should have some equivalent way to implement these kinds of query rules.
Products using the terminology “business rules” are not necessarily full rule-based systems. “Business rules” can be something as simple as validation checking for entries into a database. This does not require the full power of forward or backward chaining using a rule engine. These simple checks would not be considered a full rule-based expert system.
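To make the two kinds of quantification concrete, here is a small Python sketch over a hypothetical TemperatureSensor class. The class, the sensor names, and the limit are assumptions for illustration, and rule languages express the same idea in their own syntax.

```python
from dataclasses import dataclass

@dataclass
class TemperatureSensor:
    name: str
    reading: float

def any_reading_above(sensors, limit):
    """Existential: there exists a sensor whose reading exceeds the limit."""
    return any(s.reading > limit for s in sensors)

def all_readings_above(sensors, limit):
    """Universal: every sensor's reading exceeds the limit."""
    return all(s.reading > limit for s in sensors)

sensors = [TemperatureSensor("TI-1", 120.0), TemperatureSensor("TI-2", 80.0)]
some_high = any_reading_above(sensors, 100.0)
all_high = all_readings_above(sensors, 100.0)
```

Note that a universal test over an empty collection is true in Python, so a generic rule of this form may need an explicit check that at least one object exists.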
Rule-based system issues and solutions
Any system based purely on backward chaining suffers some of the same problems as limited (and failed) goal-seeking languages like Prolog - application developers had to outsmart its relentless goal-seeking to accomplish algorithms and other procedures that didn't fit the paradigm, making the code obscure. But commercial tools generally offer forward and backward chaining, and additional development tools for integrating procedural code.
The earliest expert system rule engines got a bad performance reputation because in their simplest, primitive form, the computational burden grew exponentially with the size of the system. Each time a new conclusion was drawn, the entire rule set was searched to see what else should be concluded. But this was overcome by using pointers internally in various forms to eliminate the need for constantly searching through the entire collection of rules. One example is the RETE algorithm that caches information and pointers to working memory elements associated with the rules. The rule engine that is a part of the G2 expert system took an approach of passing all conclusions and rule antecedents through variables. Links are established internally between rules and variables. In both cases, this eliminates ever having to search through inapplicable rules. So, performance is not necessarily a real issue.
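The idea of linking variables to the rules that use them can be illustrated with a simple index. This sketch is neither the RETE algorithm nor the G2 implementation. It only shows, under an assumed tuple encoding of rules, how following links avoids examining rules that cannot apply.

```python
from collections import defaultdict, deque

# Hypothetical encoding: (operator, antecedents in the IF portion, consequent).
RULES = [
    ("and", ("A", "B"), "C"),
    ("or", ("C", "D"), "E"),
    ("and", ("X", "Y"), "Z"),   # unrelated rule, never examined for inputs A and B
]

# Link each variable to the rules that mention it in their IF portion.
INDEX = defaultdict(list)
for rule in RULES:
    for variable in rule[1]:
        INDEX[variable].append(rule)

def forward_chain_indexed(facts):
    """Forward chain by following variable-to-rule links only."""
    known = set(facts)
    queue = deque(known)
    examined = 0                      # counts rule evaluations, for illustration
    while queue:
        fact = queue.popleft()
        for op, antecedents, consequent in INDEX.get(fact, ()):
            examined += 1
            test = all if op == "and" else any
            if consequent not in known and test(a in known for a in antecedents):
                known.add(consequent)
                queue.append(consequent)
    return known, examined

known, examined = forward_chain_indexed({"A", "B"})
```

Only rules reachable from a newly known variable are evaluated, so the cost of each new conclusion depends on how many rules reference it rather than on the size of the whole rule set.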
There is a problem of detecting knowledge base inconsistencies and duplications, as well as completeness. (This can happen in many approaches.) To ameliorate this problem in many event-oriented diagnostic systems, when defining potential events like symptom events (e.g., T > 100) or fault events (e.g., valve stuck), we've associated them with specific domain objects or domain object classes (e.g., equipment like valves and sensors, or containers like buildings, or their respective classes), and given them a "category" - a unique name, like "high temperature". Then, by browsing for event types by either category, domain object, or domain object class, you can more easily find existing ones and recognize duplicate ones. This formalism helps not only during application development, but makes querying the event history very simple. A similar situation applies to continuous variables like temperature - if represented, they can be created as attributes or otherwise associated with domain objects. It's not foolproof, but it helps in organizing and understanding, especially for applications with hundreds or thousands of domain objects. Domain representation is best done outside of rules. Rules should refer to it. While in principle one could just write rules to contain this information, it's easier to maintain systems supporting other structures like domain representation.
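One way to picture this organization is a small catalog of event types keyed by category and domain object class. The EventType and EventCatalog names, and the choice of key, are hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventType:
    category: str       # unique name, e.g. "high temperature"
    object_class: str   # domain object class, e.g. "reactor"
    condition: str      # descriptive text, e.g. "T > 100"

class EventCatalog:
    def __init__(self):
        self._by_key = {}

    def register(self, event_type):
        """Add an event type, or return the existing one if it is a duplicate."""
        key = (event_type.category, event_type.object_class)
        return self._by_key.setdefault(key, event_type)

    def browse(self, category=None, object_class=None):
        """List event types, optionally filtered by category or object class."""
        return [e for e in self._by_key.values()
                if (category is None or e.category == category)
                and (object_class is None or e.object_class == object_class)]

catalog = EventCatalog()
first = catalog.register(EventType("high temperature", "reactor", "T > 100"))
# A second definition with the same category and class is recognized as a duplicate.
second = catalog.register(EventType("high temperature", "reactor", "T > 105"))
reactor_events = catalog.browse(object_class="reactor")
```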
Real time expert systems
Many systems such as data networks and industrial processes change their state and even their structure over time. This is best handled by an object-oriented, real time expert system.
Keeping up with changing structure is best handled in an object oriented system, keeping track of objects and the relationship between them. If the rules can be stated in terms of instances of the class of the object and the related objects (e.g., related via containment, pipes, network connections, wires, signal flows, etc.), then rules and the objects representing them can be updated separately, so that changing structure can be accommodated.
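A generic rule stated over classes and relationships might look like the sketch below. The Tank and Pump classes and the "feeds" relation are invented for illustration. The point is that the rule refers only to the class and the relation, so objects can be added or reconnected without editing it.

```python
from dataclasses import dataclass, field

@dataclass
class Pump:
    name: str
    running: bool = True

@dataclass
class Tank:
    name: str
    level_low: bool = False
    feeds: list = field(default_factory=list)   # pumps connected downstream

def pumps_at_risk(tanks):
    """Generic rule: for any tank with low level, any running pump it feeds is at risk."""
    return sorted(p.name for t in tanks if t.level_low
                  for p in t.feeds if p.running)

tank = Tank("T-1", level_low=True, feeds=[Pump("P-1")])
before = pumps_at_risk([tank])
tank.feeds.append(Pump("P-2"))    # the plant structure changes; the rule does not
after = pumps_at_risk([tank])
```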
Even within a fixed structure, real time applications need to monitor changes over time, and change their conclusions as variables in the monitored system change. The applications must respond to constantly changing input data that arrives automatically from other systems such as process control systems or network management systems. They also need to respond to unsolicited manual input, as well as responses to questions generated by the application.
Data grows stale over time -- the older it is, the more uncertainty you have about the true value. So, real time expert systems must revise their belief in data and conclusions over time. Approaches to this can include models of continuously growing uncertainty, or the simpler validity interval that simply lets data and conclusions expire after a specified time period, falling to "unknown" until refreshed. A classic example of the need for this is attributed to real time expert system pioneer Robert L. Moore: a hypothetical expert system that drives a car. Based on video input, there might be an intermediate conclusion that a stop light is green and a final conclusion that it is OK to drive through the intersection. The input and the conclusions have a very limited lifetime. You would not want to be a passenger in a car driven based on data or conclusions that were, say, an hour old!
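The validity interval idea can be sketched as follows, using the stop light example. The TimedValue class, the two-second validity, and the explicit "now" argument are assumptions chosen to keep the example deterministic. A real system would read a clock and would also time-stamp its conclusions.

```python
from dataclasses import dataclass

UNKNOWN = "unknown"

@dataclass
class TimedValue:
    value: object
    timestamp: float   # time the value was obtained, in seconds
    validity: float    # how long the value may be trusted, in seconds

    def current(self, now):
        """Return the value while it is still valid, otherwise UNKNOWN."""
        return self.value if now - self.timestamp <= self.validity else UNKNOWN

def ok_to_proceed(light, now):
    """Conclude only from unexpired data; stale input gives an unknown conclusion."""
    state = light.current(now)
    if state == UNKNOWN:
        return UNKNOWN
    return state == "green"

light = TimedValue("green", timestamp=0.0, validity=2.0)
fresh = ok_to_proceed(light, now=1.0)       # one second old: still trusted
stale = ok_to_proceed(light, now=3600.0)    # an hour old: expired
```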
With each state change, you actually have to purposely forget some old information, realizing it no longer applies. Unlike the static expert system case, you can't just keep accumulating data and making conclusions that get better over time, because the state of the system changes from minute to minute. There can be long-term accumulation of knowledge about relatively static things, but care must be taken in assuming something is truly static. Consider an expert system for a security robot, for instance. You'd think that the buildings and such would be relatively invariant. But a robot can't really assume anything is truly fixed. At a mall with a security robot, walls are moved, storefronts open and close and may provide no access while un-rented, rooms merge or are split, clothing racks move, doors close, elevators fail, and so on.
The inference engine in a real time expert system keeps a time stamp for each data input and each conclusion. It propagates new information as it arrives. It ensures that all conclusions are still current before using them. Data acquisition tasks such as periodically scanning data, or acquiring data on demand, become important parts of the system. Applications generally also need to filter out noise. The ability to use the best estimate available within a fixed deadline is important in hard real-time systems.
Comparisons with other techniques
As noted, it is possible to implement many of the model-based or procedural approaches in a rule-based system. Those applications will share the strengths and weaknesses of those techniques. However, there will usually be some extra overhead. So, for these alternate techniques, run-time performance will probably be slower when using a general-purpose expert system shell than when using an engine written directly in an underlying computer language.
Dealing with uncertainty and conflicting data is supported in various ways for either rule engines or other implementations. These variations depend on the specific tools used, so we can’t make general statements that rule engines are necessarily better or worse than the alternatives. But be sure to understand how conflicting data is handled. There might be some form of evidence combination, or the most recent input may “win”.
Rule antecedents based on comparing measurements versus threshold values are sensitive to measurement errors when operating near the threshold. For instance, if a rule is based on IF (x < 50) .... , consider what happens when the measured value of x is close to 50. Depending on the standard deviation of the measurement of x, there is a significant probability that x is less than 50, and a significant probability that it is greater than 50. In reality, that variable shouldn’t be a major factor in the diagnosis when it is measured near 50 - both the “yes” and “no” possibilities are nearly equally likely, and both should be considered. But simple rule-based systems that don’t allow for evidence combination, fuzzy values, or probability calculations might ignore this. This problem of forcing a crisp decision from approximate measurements near a threshold arises for other techniques as well; for instance, decision trees.
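The sensitivity near the threshold can be quantified with a short calculation. The sketch below assumes the measurement error is Gaussian with a known standard deviation and that there is no other prior information about x. The function name and the numbers are illustrative only.

```python
import math

def prob_below(measured, threshold, sigma):
    """Probability that the true value is below the threshold, assuming
    Gaussian measurement error with standard deviation sigma."""
    z = (threshold - measured) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF

at_threshold = prob_below(50.0, 50.0, sigma=2.0)    # measured exactly at the threshold
near_threshold = prob_below(49.0, 50.0, sigma=2.0)  # half a standard deviation below
far_below = prob_below(40.0, 50.0, sigma=2.0)       # five standard deviations below
```

A measurement exactly at the threshold gives a probability of one half, and a measurement half a standard deviation below it gives only about 0.69, so a crisp rule that treats x < 50 as simply true overstates the evidence in both cases.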
One major objection to rule representation is that it is difficult to see the big picture, so that maintenance and changes over time become difficult. For instance, practically every rule needs to be reexamined when adding new faults or symptoms, to be sure that each diagnosis is still unique. Tools based on logic diagrams may be easier to work with, if care is taken to represent each variable in only one place. Causal models emphasize the links between every variable and conclusion, so they can be easier to work with, depending on the quality of the browsers and visualization for models and rules.
With the exception of tools like G2 designed specifically for real time, dealing with behavior over time can be difficult. The original rule engines were designed mainly for static systems, for instance, for medical diagnosis. But as the system state changes over time, various new values will arrive, and new conclusions will be reached depending on time delays in the monitored system, sampling rate for each variable, threshold selections and filtering. The results during transitions look like the inconsistent data problem and sensitivity to measurement errors near thresholds already discussed.
Available tools and applications
The real-time expert system shell and language G2 is available from Gensym at www.gensym.com. (Gensym was acquired by Versata in 2007.) The object-oriented G2 supported forward and backward chaining, with strong real-time features. The early version of the product was described in Process Control Using a Real Time Expert System. Various products were built on top of this, and they may still be available. Those tools included GDA, Integrity/Optegrity, SymCure/CDG, NeurOn-Line, ReThink, and others. However, the add-on products all implemented the core of their own engines mainly in the G2 language (Java-like, but more wordy, and with extensions such as real-time support and graphical language support), rather than in the rule engine portion of G2. Many early applications were covered in an IFAC Paper, and some more recent applications can be found by following the links listed above.
Public domain or open source rule engines based on forward chaining with the RETE algorithm include CLIPS, OPS5, and JBoss Drools.
IBM purchased ILOG in 2009, making ILOG Rules part of the WebSphere product line. The marketing focuses on “business rules management”. The original product contains an implementation of the forward chaining RETE algorithm (as well as an alternate mechanism that just applies rules sequentially).
Many of these tools might be in a minimal “maintenance only” mode, without significant development or support.
Copyright 2010 - 2020, Greg Stanley