Neural Nets for Fault Detection and Diagnosis
This page examines Neural Networks, as part of the white paper A Guide to Fault Detection and Diagnosis.
Neural networks are nonlinear, multivariable models built from a set of input/output data. They can be used as event detectors, recognizing events and trends. They can also be used as diagnostic models in model-based reasoning, or used directly as classifiers for recognizing fault signatures. Since the original "neural net" is a brain, these are sometimes referred to as Artificial Neural Nets (ANNs).
Overview of Neural Nets
General structure
Neural networks are represented as a set of nodes and connections between them. The connections have weights associated with them, representing the “strength” of those connections. The nodes are loosely inspired by biological neurons because they “fire” by responding to inputs and transmitting a result to nearby nodes through the connections. ANNs are organized into layers. At runtime, information is fed to the “input layer”, the set of input nodes for the ANN. Each input variable feeds directly into one node in the input layer. Those nodes output to multiple nodes in the next layer. Data is propagated through successive layers, with the final result available at the “output layer”. There is one output node for each output variable. The nodes between the input and output layers are called “hidden nodes”. A few neural network architectures feed signals from outputs back to inputs.
Each node does a computation based on its inputs. The output is based on a weighted sum of the inputs, using the weights specified in the connections. The output calculation also includes a nonlinear function. The nonlinear function is often a sigmoid, an “S-shaped” curve. The nonlinearity is a key part of the system: without it, the whole network would be equivalent to a linear function that could be written directly as a matrix calculation.
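The node computation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the bias term is an assumption added here because it is common practice, although the text above does not mention it.

```python
import math

def sigmoid(x):
    """S-shaped squashing function mapping any real number into (0, 1)."""
    if x < -700.0:          # guard: math.exp would overflow for very negative x
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, followed by the sigmoid nonlinearity.

    The bias is an assumed extra parameter, not part of the text above.
    """
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

def feedforward(inputs, layers):
    """Propagate a signal through successive layers.

    `layers` is a list of (weight_rows, biases) pairs; each row of
    weight_rows holds the connection weights into one node of that layer.
    """
    signal = inputs
    for weight_rows, biases in layers:
        signal = [node_output(signal, w, b) for w, b in zip(weight_rows, biases)]
    return signal
```

For example, `feedforward([0.0, 0.0], layers)` with one hidden layer of two nodes and one output node returns a one-element list. Removing the `sigmoid` call would collapse the layers into a single linear mapping.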
Training phase and runtime phase
There is a "training phase" for developing the ANN model, and a runtime phase for using the model. In the training phase, a neural network "learns" the model from the data, given pairs of input & output data arrays (the "training set"). The result is a nonlinear "black box" model. This is analogous to building a regression model from data. In the runtime phase, we use the model with a new input data array to predict a new output array. This is analogous to using the regression model with new inputs.
Roles as function approximators or classifiers
Neural networks can be used in two major roles: function approximation and classification. In function approximation, we develop an approximation to the real mapping of input data to output data. Think of it as multivariable interpolation. It provides replacements for models of all types, used for control, simulation, and interpolation, as well as model-based diagnosis. Nonlinear modeling used this way, in the absence of first principles models, is a special strength of neural nets.
The second major role for neural networks is in classification (pattern matching). For diagnosis, this means selecting which fault signature is the most likely. Input "features" (variables) are selected and collected into a vector. Each possible feature pattern belongs to exactly one of n "classes". For fault detection, a class corresponds to a symptom of a fault. For diagnosis, a class corresponds to a fault. The neural net has n outputs, one for each of the n possible classes. In training, a target of 1 is applied to the output for the correct class, and 0 is applied to the other outputs. At runtime, a new input is presented, and an output near 1 indicates membership in that class.
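The target encoding and the runtime decision described above might look like the following sketch. The function names, the example fault names, and the 0.5 threshold for "near 1" are all assumptions made for illustration; the text does not specify how close to 1 an output must be.

```python
def one_hot(class_index, n_classes):
    """Training target: 1 at the correct class output, 0 at all others."""
    return [1.0 if i == class_index else 0.0 for i in range(n_classes)]

def classify(outputs, class_names, threshold=0.5):
    """Pick the class whose output is largest.

    Returns None when no output is reasonably close to 1; the threshold
    value is an assumed tuning parameter.
    """
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    if outputs[best] < threshold:
        return None
    return class_names[best]

# Hypothetical fault classes for a diagnosis application.
faults = ["sensor drift", "valve stuck", "pump failure"]
print(one_hot(1, len(faults)))
print(classify([0.1, 0.9, 0.2], faults))
```

Returning None for weak outputs is one possible design choice: it lets the surrounding system treat "no class matched" differently from a confident diagnosis.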
Types of Neural Networks
There are many variations of neural net architectures. The "backpropagation network (BPN)" is a widely used "standard" network, named after its original technique for training. It has a layered structure of nodes and connections. Usually there are 3 layers: input, hidden, and output. Information is transmitted via connections with weights. It runs in a "feedforward" configuration: runtime data propagates from input through to the output with no feedback. Each layer contains nodes ("neurons") that take a weighted sum of their inputs and apply a function to introduce nonlinearity. For BPNs, that nonlinear function is generally "sigmoidal" (S-shaped). A variation is an "autoassociative" net for nonlinear principal components analysis. It uses a total of 5 layers.
Another useful variation is a "Radial Basis Function Network" (RBFN), or the more general ellipsoidal basis function network. With these approaches, the data points are grouped into clusters during the training phase. The RBFN has 3 layers. It is a feedforward network. The middle layer nodes represent multivariable Gaussian functions, each with a mean at a cluster center. The outputs are based on combining values associated with the cluster centers. It can be used for classification and for function approximation. It has the advantage that it contains its own built-in error analysis to avoid extrapolation errors: if a data point is far from any of the cluster centers, it is immediately obvious.
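A rough sketch of the RBFN runtime calculation, including the extrapolation check, is shown below. It assumes spherical Gaussians with one width per cluster and a normalized weighted average of the cluster values; this is one common formulation, not necessarily the one used in any particular product. The names and the `min_activation` cutoff are hypothetical.

```python
import math

def gaussian_activation(x, center, width):
    """Gaussian response of one hidden node; equals 1.0 at the cluster center."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbfn_predict(x, centers, widths, values, min_activation=0.05):
    """Return (prediction, extrapolating).

    The prediction is the activation-weighted average of the values
    associated with the cluster centers. `extrapolating` is True when
    even the nearest cluster responds weakly, meaning x lies far from
    all training clusters. The cutoff is an assumed tuning parameter.
    """
    activations = [gaussian_activation(x, c, w) for c, w in zip(centers, widths)]
    total = sum(activations)
    extrapolating = max(activations) < min_activation
    if total == 0.0:        # so far away that every activation underflowed
        return None, True
    prediction = sum(a * v for a, v in zip(activations, values)) / total
    return prediction, extrapolating
```

The extrapolation flag comes for free from the hidden-layer activations, which is the built-in check the paragraph above refers to.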
Building applications with Neural Networks
The general approach to applying and using neural networks is:
(1) Choose inputs & outputs
(2) Acquire input/output training data
(3) (Optionally) Preprocess data, including steps such as normalization
(4) Train the network
(5) Validate the network
(6) Apply the network
(7) Periodically retrain for adaptation
When selecting inputs and outputs, there should be a functional relationship between inputs and outputs.
When acquiring data, the data should cover the space of interest. Neural nets, like other empirical models, can extrapolate poorly. Extrapolation may be uncovered during the validation phase. Radial Basis Function nets can detect extrapolation at run time, but backpropagation nets cannot. The data can be "live" data, or it might be acquired by running a complex simulation. In the simulation case, adding a little noise can be useful to force generalization and avoid numerical problems.
Optional preprocessing of data
Neural net software may include facilities for various kinds of preprocessing of the data, including normalizing each variable to fit within a fixed numerical range about an average value.
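As an illustration of this kind of normalization, the sketch below centers each variable on its average and scales it so that the training data falls within [-1, 1]. The choice of range and the function names are assumptions; other packages may scale by standard deviation or to a different interval.

```python
def fit_scaler(rows):
    """Compute per-variable mean and largest absolute deviation from training rows."""
    n_vars = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_vars)]
    spans = []
    for j in range(n_vars):
        span = max(abs(r[j] - means[j]) for r in rows)
        spans.append(span if span > 0 else 1.0)   # constant variable: avoid divide by zero
    return means, spans

def apply_scaler(row, means, spans):
    """Map one data row into the normalized range used during training."""
    return [(x - m) / s for x, m, s in zip(row, means, spans)]

# Two variables with very different magnitudes (made-up values).
training_rows = [[0.0, 100.0], [10.0, 300.0], [20.0, 200.0]]
means, spans = fit_scaler(training_rows)
print(apply_scaler([10.0, 300.0], means, spans))
```

The same `means` and `spans` fitted on the training data should be reused at runtime, so that new inputs are scaled consistently with the data the network was trained on.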
Linear techniques such as Principal Components Analysis may be used to preprocess the data. By using these techniques to build a linear model, the training of the neural net can focus on training just for the nonlinearities. Other preprocessing includes any needed filtering to reduce the noise when time-varying data is used.
Training
After acquiring the data, training involves using existing data to estimate weights in the connections. Training also includes experimenting with the number of nodes, number of layers, and so on. The quality & quantity of data determine the quality of the results. Large data sets reduce variance of predictions if a functional relationship exists.
The training is a nonlinear parameter estimation problem. Generally, the goal is some form of least-squares fit to the training set (minimizing the sum of squares of prediction errors over the data). The traditional BPN had a specific training method accomplishing a gradient-based search. However, standard optimization methods are better both for robustness and performance. They can take advantage of the years of experience in fine-tuning for solving nonlinear least squares problems. RBFNs use a different technique.
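The least-squares goal can be written out explicitly. The notation here is assumed for illustration and does not come from the original text: N training pairs (x_k, y_k), network output f(x; w) for weight vector w, and a step size η. The second expression is plain gradient descent, shown only as one simple form of gradient-based search.

```latex
E(\mathbf{w}) = \sum_{k=1}^{N} \left\lVert \mathbf{y}_k - f(\mathbf{x}_k; \mathbf{w}) \right\rVert^2,
\qquad
\mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla_{\mathbf{w}} E(\mathbf{w})
```

General-purpose nonlinear least-squares solvers minimize the same E(w) but typically choose the search direction and step length more carefully than a fixed η.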
Cross-validation
Validation techniques are used during training to help quantify network performance.
The number of parameters to be estimated by the training technique is related to the number of layers, nodes and connections. The number of adjustable parameters (weights) must be chosen by the user or by an automated technique. This is analogous to choosing the model order in control systems design, or the order of a polynomial in curve fitting.
If there are too many parameters, this is called "overfitting". There is little "generalization" (nothing is "learned"). In the extreme case, at runtime, the model will exactly match the output data when fed the same input data as in the training data. But the results for any other data will be unpredictable. This extreme case is analogous to fitting a quadratic polynomial to 3 points.
If there are too few parameters, this is called "underfitting". In this case, too much information is lost. This is analogous to using a linear curve fit when a quadratic is really needed. One goal of training is to achieve the right level of generalization. Cross-validation accomplishes this. Cross-validation techniques separate the data into "testing" data and "training" data to choose the architecture. The general process for cross-validation is to:
(1) Pick an architecture (e.g., number of layers, number of hidden nodes)
(2) Evaluate the architecture:
• Split data randomly into training and testing subsets
• Train the network using only the training data subset (training minimizes the training error)
• Evaluate network prediction quality over the testing subset only (the "testing error")
• Repeat multiple times with different random splits of data, and average the results of the testing error. (Similar approaches exist to split the data n ways)
(3) Repeat steps (1) and (2) for other architectures, and choose the architecture with the lowest testing error (typically, error is at a minimum somewhere between underfitting and overfitting)
(4) Train with the final architecture, using all the data
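The cross-validation steps above can be sketched generically. In this sketch, `train` and `testing_error` are hypothetical callbacks supplied by the user (they stand in for whatever neural net package is used), and the defaults for the number of repeats and the test fraction are assumptions.

```python
import random

def average_testing_error(data, architecture, train, testing_error,
                          n_repeats=5, test_fraction=0.3, seed=0):
    """Step (2): average the testing error over several random splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test_set, train_set = shuffled[:n_test], shuffled[n_test:]
        model = train(architecture, train_set)          # uses training subset only
        errors.append(testing_error(model, test_set))   # uses testing subset only
    return sum(errors) / len(errors)

def choose_architecture(data, candidates, train, testing_error, **kwargs):
    """Steps (1), (3), (4): score each candidate, keep the best, retrain on all data.

    Candidates must be hashable (e.g. a hidden-node count or a tuple).
    """
    scores = {arch: average_testing_error(data, arch, train, testing_error, **kwargs)
              for arch in candidates}
    best = min(scores, key=scores.get)
    final_model = train(best, data)
    return best, final_model, scores
```

A call such as `choose_architecture(data, [2, 4, 8], train, testing_error)` would compare three hidden-layer sizes, assuming `train(n_hidden, rows)` returns a fitted model.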
Once the structure is fixed and the weights are determined, the net is available for runtime use.
Adaptation
When planning for adaptive applications that lead to retraining, facilities are needed to maintain the data sets. This includes recognizing and adding new, novel cases; and forgetting old cases when newer ones are better. It also includes basic filtering or other signal preprocessing and rejecting outliers.
Adaptation can allow modeling a system that can change over time. However, there is the danger that slow degradation will be “learned” and thought to be normal, so that real problems will not be detected.
Comparisons with other techniques
Good application areas
Applications especially suitable for neural networks include those where a functional representation is known to exist between inputs and outputs, but it is difficult or time-consuming to formulate “first principles” models. The models should include significant nonlinearities; otherwise linear modeling techniques could be used. A significant amount of data must be available for building the models.
Neural nets can model static or dynamic systems. For behavior over time, delayed inputs and outputs can be used.
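One way to set up delayed inputs and outputs is to turn the time series into ordinary input/target rows, so that a static network can be trained on them. The sketch below assumes a single input series `u` and a single output series `y` of equal length; the function name and the default delay counts are hypothetical.

```python
def lagged_rows(u, y, n_u=2, n_y=2):
    """Build (features, target) pairs from time series.

    Features at time t are the previous n_u inputs and previous n_y
    outputs (most recent first); the target is the output y[t].
    """
    if len(u) != len(y):
        raise ValueError("u and y must have the same length")
    start = max(n_u, n_y)
    rows = []
    for t in range(start, len(y)):
        past_u = [u[t - k] for k in range(1, n_u + 1)]
        past_y = [y[t - k] for k in range(1, n_y + 1)]
        rows.append((past_u + past_y, y[t]))
    return rows

# Made-up series: two delayed inputs and one delayed output per row.
print(lagged_rows([1, 2, 3, 4], [10, 20, 30, 40], n_u=2, n_y=1))
```

The number of delays plays a role similar to model order, so it would normally be chosen with the same validation approach used for the rest of the architecture.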
Limitations in using neural networks
Why not just always use a neural network for quantitative models or classification? There are multiple reasons:
• Neural networks by themselves don’t take advantage of known models. As a result, the network has to learn more, and may generalize improperly. A model or partial model with a wider range of validity might be easily available, or generated.
• There is a danger of extrapolation outside of training data.
• Training can be time-consuming, and involve a lot of heuristic rules on selecting variables, choosing the number of nodes, and so on.
• It may be difficult or time-consuming to "cover" the possible operating regimes.
• A lot of testing may be required to build confidence in the network.
• Minor plant changes or operating regime changes may require extensive retraining and retesting.
• Many operating statuses change, leading to a large number of inputs to the net besides sensors, e.g., controller statuses or parallel equipment statuses.
• Lack of ability to explain the results. This is common to all “black box” techniques, but less of a problem for systems based on approaches using rules, causal models, or logic diagrams.
• In systems where the neural net is used to build a model of normal operation, the neural net can provide sensitive detection of faults by detecting non-normal operation. However, alternative technologies are then needed for fault isolation.
Neural networks complement other techniques, and may be used together
Neural networks complement traditional modeling, rule-based systems, optimization, regression, interpolation, and control. The focus is on nonlinear systems, vs. traditional linear techniques that may be more efficient for linear systems. This is useful because few systems are truly linear, especially under fault or other extreme conditions. Linearized models used in traditional methods often apply only within small operating regions.
Neural nets may be combined with engineering models to exploit the strengths of both. First principles (simulation) models can be incorporated by preprocessing or postprocessing. For instance, neural nets can learn the deviations from a model, or learn the deviations from estimators such as data reconciliation. Examples of these hybrid approaches are covered in Neural Nets for Fault Diagnosis Based on Model Errors or Data Reconciliation. Another approach is to use simulation models to generate training data for data-driven techniques like neural nets. An example of using both simulation models and RBFN neural nets for HVAC diagnosis is described in NIST Final Report: A New Methodology for Fault Detection Observers in VAV Systems.
As noted already, in cases where neural nets are used to build a model of normal operation, these models can provide sensitive detection of faults as deviation from normal behavior. However, in this case, alternative technologies are then needed for fault isolation. For example, the fault isolation might be handled by a system based on causal fault propagation models.
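Detection as deviation from a model of normal operation can be illustrated with a simple residual test. This sketch assumes the residuals (measured minus predicted) from normal operation are roughly zero-mean, and it sets the alarm limit at k times their root-mean-square value. The factor k = 3 and the function names are assumptions, not something taken from the text.

```python
import math

def residual_threshold(normal_residuals, k=3.0):
    """Alarm limit: k times the RMS residual seen during normal operation."""
    n = len(normal_residuals)
    rms = math.sqrt(sum(r * r for r in normal_residuals) / n)
    return k * rms

def detect_fault(measured, predicted, threshold):
    """Flag a fault when the measurement deviates too far from the model of normal behavior."""
    return abs(measured - predicted) > threshold

# Made-up residuals collected while the plant was known to be healthy.
limit = residual_threshold([1.0, -1.0, 1.0, -1.0])
print(detect_fault(10.0, 8.0, limit), detect_fault(10.0, 5.0, limit))
```

The result only says that something is abnormal. Working out which fault caused the deviation is the fault isolation step that the paragraph above assigns to other technologies.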
Competitive technologies
The products from Smartsignal include a core technology that could be compared to RBFNs. Although the technology is quite different, it could be used in similar application areas.
Commercial products and applications
NeuCo, at www.neuco.com, sells products focusing on the electrical power generation industry that include embedded neural networks. The MaintenanceOpt product uses neural nets for early fault detection for equipment health monitoring and diagnosis. Deviation from a neural network model of normal operation triggers events for fault isolation using rules. The neural network is part of an overall system, not currently sold separately.
Gensym had a product called NeurOnLine. A brochure is available. It provided for general real-time use, including BPN, RBFN, and autoassociative networks, with extensive support for preprocessing available.
Copyright 2010 - 2013, Greg Stanley
