NEURAL NETWORKS FOR FAULT DIAGNOSIS BASED ON MODEL ERRORS OR DATA RECONCILIATION

Greg M. Stanley - Gensym Corporation

CENTRAL PROBLEM ADDRESSED

Explore several possible mechanisms for fault detection using combinations of several technologies:

- Neural networks

- Traditional models, often based on first principles

*Detect faults based on deviations from models*

- Data reconciliation

*Additional technique built upon traditional models*

Each technology has its own strengths and limitations

General analysis and case study with hydraulic systems

NEURAL NETWORKS FOR FAULT DIAGNOSIS BASED ON MODEL ERRORS OR DATA RECONCILIATION

I. Neural networks background

II. Model residual analysis

III. Data Reconciliation

IV. System model

V. Results

I. NEURAL NETWORKS BACKGROUND

Neural Networks for nonlinear modeling

Neural networks are nonlinear, multivariable models built from a set of input/output data

- Training phase - "learn" model from the data, given pairs of input & output data arrays ("training set")

Analogy: building regression model from data

- Run-time phase - use the model with new input array to predict the output array

Analogy: using the regression model with new inputs

Result is a nonlinear "black box" model

Analogy: linear models for regression, DMC, typical controller design methods are all "black box"

Basic Neural Net elements

Neural Networks roles

Functional approximation

- Approximate any mapping of input to output data

Think of it as multivariable interpolation

- Used for interpolation, control, simulation, etc., in place of other types of models

- Nonlinear modeling in the absence of first-principles models is a special strength of neural nets

Classification (pattern matching)

Neural networks for classification, pattern matching, fault detection

Input "features" are selected and collected into a vector

*examples: temperatures, qualities, statuses*

Each possible feature pattern belongs to exactly one of n "classes"

example: fault detection, where "classes" are normal, fault x, fault y, ...

There is a NN output for each of the n possible classes

In training, 1 is applied to the correct output for the input class, 0 to the other outputs

At runtime, a new input is presented, and an output near 1 indicates membership in that class

A special strength of neural nets
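The target encoding and run-time decision described above can be sketched as follows. This is a minimal illustration, not NeurOn-Line code: the class names, the helper functions, and the 0.5 decision threshold are all assumptions made for the example.

```python
# Hypothetical class names for a fault-detection problem.
CLASSES = ["normal", "fault x", "fault y"]

def one_hot_target(class_name, classes=CLASSES):
    """Training target: 1 for the correct class output, 0 for the others."""
    return [1.0 if name == class_name else 0.0 for name in classes]

def classify(outputs, classes=CLASSES, threshold=0.5):
    """Run-time decision: pick the class whose output is largest.

    Returns None when even the largest output is below the (assumed)
    threshold, which can be read as "pattern not recognized".
    """
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return classes[best] if outputs[best] >= threshold else None

print(one_hot_target("fault x"))       # [0.0, 1.0, 0.0]
print(classify([0.08, 0.91, 0.03]))    # fault x
print(classify([0.2, 0.3, 0.1]))       # None
```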

Neural Net for classification at run time

Neural Networks vs. other techniques

Complements traditional modeling, rule-based systems, optimization, regression, interpolation, and control

Focus is on nonlinear systems, vs. traditional linear techniques which may be more efficient for linear systems

- very few systems are truly linear, especially under fault or other extreme conditions

- linearization for traditional methods often applies only within small operating regions

First principles (simulation) models can be worked in by pre-processing or post-processing

e.g., model the differences from first-principles models with a neural net

Neural nets can model static or dynamic systems

*e.g., feed delayed inputs as well as current inputs*
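One way to feed delayed inputs to a static network is to stack the current value and a few past values into each input vector. The sketch below assumes a single, evenly sampled signal; the function name and the number of delays are illustrative choices.

```python
def delayed_inputs(series, n_delays):
    """Build input rows [u(t), u(t-1), ..., u(t-n_delays)] from a time series."""
    rows = []
    for t in range(n_delays, len(series)):
        rows.append([series[t - k] for k in range(n_delays + 1)])
    return rows

u = [1.0, 2.0, 3.0, 4.0, 5.0]
print(delayed_inputs(u, n_delays=2))
# [[3.0, 2.0, 1.0], [4.0, 3.0, 2.0], [5.0, 4.0, 3.0]]
```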

Applications areas for neural nets

Dynamic and static process modeling

Quality prediction & control

Nonlinear and adaptive control

Inferential "soft" sensing

Fault detection and diagnosis

Multivariable pattern recognition

Data validation and rectification

Time series prediction

Process optimization

Automated decision-making

"Backpropagation Network (BPN)"

The "standard" network, widely-used

One of 4 available in NeurOn-Line

Named after a particular training technique

*Somewhat of a misnomer, but in common use*

Implies layered structure of nodes and connections

usually 3 layers (input, hidden, output)

"feedforward" - runtime data propagates from input through output with no feedback

Information transmitted via connections with weights

Each node takes weighted sum of inputs, then may apply a function to introduce nonlinearity

Usually the nonlinear function is sigmoidal (S-shaped)

Any NeurOn-Line layer can apply linear or sigmoidal functions
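The run-time calculation of a 3-layer feedforward net can be sketched in a few lines. This is a generic illustration rather than the NeurOn-Line implementation. The bias term on each node, the logistic form of the sigmoid, the linear output layer, and the weight values are assumptions for the example.

```python
import math

def sigmoid(s):
    """Logistic (S-shaped) function."""
    return 1.0 / (1.0 + math.exp(-s))

def linear(s):
    return s

def layer(inputs, weights, biases, activation):
    """Each node: weighted sum of its inputs plus a bias, then the activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feedforward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> sigmoidal hidden layer -> linear output layer, no feedback."""
    hidden = layer(x, hidden_w, hidden_b, sigmoid)
    return layer(hidden, out_w, out_b, linear)

# Hypothetical weights: 2 inputs, 2 hidden nodes, 1 output.
y = feedforward([1.0, 2.0],
                hidden_w=[[0.5, -0.3], [0.1, 0.8]], hidden_b=[0.0, -1.0],
                out_w=[[1.2, -0.7]], out_b=[0.1])
print(y)
```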

Training and applying a neural net

(1) Choose inputs & outputs

(2) Acquire input/output training data

(3) Train the network

(4) Validate the network

(5) Apply the network

(6) Periodic retraining for adaptation

Training & applying (1): Choose inputs & outputs

Avoid irrelevant inputs if possible

Functional relationship between inputs & outputs should exist

Inputs can be calculated, model residuals, etc.

Training & applying (2): Acquire input/output training data

Data should "cover" space of interest

Neural nets, like other empirical models, extrapolate poorly

Extrapolation may be uncovered during validation

Radial Basis Function nets can warn about extrapolation at run time; backpropagation nets can't

Quality & quantity of data determine quality of result

Signal to noise ratio important

Large data sets reduce variance of predictions if a functional relationship exists

Validation techniques in NeurOn-Line can quantify network performance

Training & applying (3): Train the network

Nonlinear parameter estimation

Generally least-squares fit to training set (sum of squares of prediction errors over the data)

NeurOn-Line uses standard optimization methods, rather than earlier backpropagation techniques - faster

NeurOn-Line has shortcut methods for Radial Basis Function methods

Training & applying (4): Validate the network

Number of parameters to be estimated by the training technique is related to the number of layers, nodes & connections

The number of adjustable parameters (weights) must be chosen by the user or by an automated technique

like choosing the model order in control, or the order of a polynomial in curve fitting

Too many parameters: overfitting, no "generalization"

like fitting quadratic polynomial to 3 points

Too few parameters: underfitting, too much information lost

like using a linear curve fit when quadratic is really needed

Want to achieve the right level of generalization

Cross-validation techniques separate "testing" data and "training" data to choose architecture

Cross-Validation

Pick an architecture (typically, # of hidden nodes)

Evaluate the architecture

Split data randomly into training and testing subsets

Train the network using only the training data subset - training minimizes the training error

Evaluate network prediction quality over the testing subset only - "testing error"

Repeat multiple times with different random splits of the data, and average the testing errors

Similar approaches exist to split the data n ways

Repeat, choose the architecture with the lowest testing error

Typically at a minimum between underfitting & overfitting

Train with the final architecture, using all the data

Cross validation - high-level view

Some details of cross-validation

Training & applying (5): Apply the network & retrain as needed

Weights, architecture fixed while running

Cases requiring extrapolation should be flagged

Further data acquisition & periodic retraining & adaptation

NeurOn-Line provides support for maintaining data set

- Adding new, novel cases

- Forgetting old cases when newer ones are better

- Rejecting outliers

- Filtering or other signal pre-processing
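A simple way to flag cases that require extrapolation is to compare each new input with the stored training inputs. The sketch below is a generic nearest-neighbor check and not the built-in Radial Basis Function error analysis. The distance threshold is an assumed tuning parameter, and inputs are assumed to be scaled to comparable ranges.

```python
import math

def nearest_distance(x, training_inputs):
    """Euclidean distance from x to the closest training input."""
    return min(math.dist(x, t) for t in training_inputs)

def needs_extrapolation(x, training_inputs, max_distance):
    """Flag x when it lies farther than max_distance from every training case."""
    return nearest_distance(x, training_inputs) > max_distance

training_inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(needs_extrapolation([0.5, 0.5], training_inputs, max_distance=1.0))  # False
print(needs_extrapolation([5.0, 5.0], training_inputs, max_distance=1.0))  # True
```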

Recognizing NeurOn-Line applications

Difficult-to-formulate models needed for system improvement

- Poorly-understood systems

- Lack of experts

- Nonlinearities

Data available

Functional relationship exists between inputs & outputs

NeurOn-Line current limitations

- Data collection, network evaluation 1 second

- No hard limits on size; best performance for input dimension < 100, number of examples < 1000

NeurOn-Line

G2-based neural network package for online applications

Support for maintaining set of training data and building an adaptive neural network model, recognizing novelty

Real-time pre-processing of data (filtering, feature calculations)

Support for run-time use of the NN model

Various network types supported

- standard feedforward, sigmoidal

- radial basis functions, ellipsoidal basis functions

- Principal Component Analysis preprocessing option

- Autoassociative nets for nonlinear principal components analysis

- rho nets

Training via optimization methods

Cross-validation for testing against "overfitting"

Graphical language for development

- GDA-based for signal processing, responding to events, sequential control

NeurOn-Line architecture

G2 is the overall developer & end user environment

Integrated with G2 and GDA (Gensym Diagnostic Assistant)

Numerically-intensive training done in external C program

Communication via remote procedure calls and file transfer

Why not just use a neural network?

Doesn't take advantage of process knowledge

- Network has to learn more, may generalize improperly

- Danger of extrapolation outside of training data

- May be difficult/time consuming to "cover" the possible operating regimes

- A lot of testing may be required to build confidence in the network

- Minor plant changes or operating regime changes may require extensive retraining and retesting

- Many operating statuses change, leading to a large number of inputs to the net besides sensors

*e.g., controller statuses, parallel equipment statuses*

A model or partial model with a wide range of validity may be easily available, or generated

- Validity may go well beyond available training data

e.g., material, energy & pressure balances, valve & pump curves, controller models

II. MODEL RESIDUAL ANALYSIS

Model residuals form patterns for input to the NN

Residuals : Fault class

( 0,  0, 0, 0, 0) : normal operation

(-b,  b, 0, 0, 0) : flow 2 biased high by amount b

( 0, -b, b, 0, 0) : flow 3 biased high by amount b

( 0, -b, 0, 0, 0) : leak, magnitude b, between flow 2 & flow 3
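The sensor-bias rows of this pattern table can be reproduced with a short calculation. The sketch assumes six flow meters in series (five residuals imply six flows) and the sign convention r[i] = flow[i] - flow[i+1], chosen here because it matches the bias rows. The flow value and bias magnitude are arbitrary, and noise is omitted.

```python
def residuals(flows):
    """Balance residuals between adjacent flow measurements in a series line.

    r[i] = flows[i] - flows[i + 1]; all zero when the measurements agree.
    """
    return [a - b for a, b in zip(flows, flows[1:])]

def with_bias(flows, sensor_index, bias):
    """Copy of the measurements with one sensor (0-based index) biased."""
    biased = list(flows)
    biased[sensor_index] += bias
    return biased

true_flows = [10.0] * 6     # six meters in series, no noise
b = 2.0
print(residuals(true_flows))                    # [0.0, 0.0, 0.0, 0.0, 0.0]
print(residuals(with_bias(true_flows, 1, b)))   # flow 2 high: [-2.0, 2.0, 0.0, 0.0, 0.0]
print(residuals(with_bias(true_flows, 2, b)))   # flow 3 high: [0.0, -2.0, 2.0, 0.0, 0.0]
```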

Advantages of residuals

Simple to compute, no iteration required, no convergence problems

Models can be "partial", incomplete models - they are just information in the form of constraints, not a complete causal model

*Same true for data reconciliation*

Unmodelled faults will still generate residuals highlighting a fault, even though the NN will be unable to correctly classify the cause of the non-normal operation

*Same true for data reconciliation*

Why not just use model residuals as NN inputs?

Residuals are all "local" to one equation

Residuals arbitrarily depend on which balances are chosen

*In the above example, the first flow is never compared to the last flow, yet that is a perfectly valid comparison/balance. Only adjacent flows are compared.*

Network has to learn all the "global" interactions

- Network may not generalize properly

Data reconciliation fully accounts for the interactions, using ALL of the model equations, instead of just comparing adjacent sensors

Data reconciliation allows you to specify measurement noise standard deviations, so the network doesn't have to learn them

III. DATA RECONCILIATION

Data Reconciliation

Want best estimates of variables in a system with measurements, consistent with some algebraic models

- Combining measurement information, measurement noise properties (variances), and model information

*Analogy to Kalman Filter in dynamic systems, although usually no "process noise" is modelled, just "measurement noise"*

Traditionally associated mainly with mass & energy balances

Associated "gross error detection" based on tests of model residuals or measurement adjustments - these should be random when no gross error is present

Data Reconciliation mainly reduces effect of instrument biases

Uses algebraic models: steady state assumption, with a few tricks

- Change in tank levels treated as equivalent to flow measurement

- Other dynamic extensions exist

Plant measurements must be averaged for time period consistent with steady-state assumption

- Typically 4 hours to 1 day

- High frequency noise filtered out

- Leaves only steady state error (bias) or very low frequencies

Data Reconciliation is least-squares error minimization

Minimize "adjustments" to raw data - a sum of squares of adjustments, each weighted by the inverse of its assumed measurement variance

Minimization subject to constraint that the balances are satisfied exactly

Nonlinear if the algebraic constraints are nonlinear
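For the simplest linear case the minimization has a closed form. Assume several meters in series all measure the same flow, so the balances require every estimate to be equal. The weighted least-squares solution is then the inverse-variance weighted mean of the measurements. The sketch below uses made-up numbers, and the sign convention adjustment = estimate - measurement is an assumption.

```python
def reconcile_series_flows(z, std_devs):
    """Reconcile measurements z of one flow seen by several meters in series.

    Minimizes sum(((z[i] - x[i]) / std_devs[i])**2) subject to all x[i]
    being equal. The solution is the inverse-variance weighted mean.
    Returns (estimate, adjustments) with adjustments[i] = estimate - z[i].
    """
    weights = [1.0 / s ** 2 for s in std_devs]
    estimate = sum(w * zi for w, zi in zip(weights, z)) / sum(weights)
    adjustments = [estimate - zi for zi in z]
    return estimate, adjustments

z = [10.0, 13.0, 10.0, 10.0, 10.0, 10.0]     # flow 2 biased high by 3
estimate, adj = reconcile_series_flows(z, [1.0] * 6)
print(estimate)   # 10.5
print(adj)        # [0.5, -2.5, 0.5, 0.5, 0.5, 0.5]
```

Every adjustment depends on all six measurements, and the biased meter receives by far the largest adjustment. This is a pattern across the whole system rather than a comparison of adjacent sensors only.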

Data Reconciliation - mathematical formulation

The system

measurements: z = h(x) + v

constraints: g(x) = 0

v is the measurement noise, with covariance matrix R

R is usually diagonal

Diagonal elements are measurement variances (square of std. dev.)

The least-squares problem

Find best estimate x as solution to the problem:

minimize over x: $(z - h(x))^T R^{-1} (z - h(x))$

subject to: g(x) = 0

Special case solutions exist for linear constraints and measurements
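As an illustration of one such special case, assume every variable is measured directly, so that $h(x) = x$, and the constraints are linear, $g(x) = A x = 0$, with $A$ of full row rank. These assumptions are made here for the example and are not stated in the source. Applying Lagrange multipliers to the least-squares problem then gives the estimate in closed form:

$$\hat{x} = z - R A^T \left( A R A^T \right)^{-1} A z$$

The vector $A z$ is the set of model residuals evaluated at the raw measurements. The measurement adjustments $\hat{x} - z$ are therefore a linear combination of all the residuals, weighted by the measurement variances in $R$.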

IV. THE SYSTEM

Overall process of building fault diagnosis system

Build a configurable simulator

Select features to be used for input to the neural network

- Sensors, valve positions

- Model equation residuals

- Other calculations

- Data Reconciliation measurement adjustments

- Filtering, averaging, other signal processing as needed

Use the simulator to generate cases - a training data set

- Include sensor bias cases as faults

- Add random noise to sensors

- Randomly vary the inputs

Train & validate the network (classification problem)

Run-time use - use same features on real data
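The case-generation step can be sketched with a toy simulator. The example below is not the water grid model. It assumes a hypothetical line of flow meters in series, uses adjacent-flow balance residuals as features, and draws noise uniformly within ±3 standard deviations, which is one reading of the noise description in the case study. The flow range, bias size, and noise level are arbitrary.

```python
import random

def residuals(flows):
    """Balance residuals between adjacent flow measurements."""
    return [a - b for a, b in zip(flows, flows[1:])]

def generate_cases(n_sensors, n_cases_per_class, bias, noise_std, seed=0):
    """Return (features, labels) for normal operation and high/low sensor biases.

    Class 0 is normal; class 2*i + 1 is "sensor i biased high";
    class 2*i + 2 is "sensor i biased low" (0-based sensor index i).
    """
    rng = random.Random(seed)
    faults = [None] + [(i, sign) for i in range(n_sensors) for sign in (+1, -1)]
    features, labels = [], []
    for label, fault in enumerate(faults):
        for _ in range(n_cases_per_class):
            true_flow = rng.uniform(5.0, 15.0)       # randomly vary the inputs
            # Add random noise to every sensor (assumed uniform within +/- 3 std. dev.).
            z = [true_flow + rng.uniform(-3 * noise_std, 3 * noise_std)
                 for _ in range(n_sensors)]
            if fault is not None:                    # sensor bias treated as a fault
                i, sign = fault
                z[i] += sign * bias
            features.append(residuals(z))
            labels.append(label)
    return features, labels

features, labels = generate_cases(n_sensors=8, n_cases_per_class=20,
                                  bias=2.0, noise_std=0.1)
print(len(set(labels)))                  # 17
print(len(features), len(features[0]))   # 340 7
```

With 8 sensors this yields 16 bias classes plus 1 normal class. The same feature calculation would then be applied to real data at run time.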

Overview of the water grid model

Graphically-configured hydraulic network, as in municipal water grid

Generation of model equations from schematic

- Fixed pressures at sources or sinks

- Pressure/flow models of pumps, valves, orifice meters, pipes, junctions

- Conservation of mass

- Analogous to Kirchhoff voltage & current laws, with device equations

- Generate matrices for linearization when desired

Algebraic equations only

- Tanks not considered, although this is a straightforward extension

The system

G2-based schematic analyzer generates linear or nonlinear equations, sets up linear or nonlinear data reconciliation

Equations solved by IMSL/IDL (Wave Advantage) nonlinear equation-solver

Nonlinear data reconciliation solved by IMSL/IDL optimizers

Case generation for NeurOn-Line (neural network)

- G2 generates cases of various sensor failures, simulating them using the above models

- G2 outputs patterns of model residuals or data reconciliation adjustments to file for training

NeurOn-Line does training, runs networks

IMSL/IDL (Wave Advantage) interface to G2

Wave Advantage = IMSL/IDL, similar to MATLAB

G2 sends commands to Wave Advantage command line interpreter as ASCII text strings - G2 looks like a user to Wave Advantage

Optionally, G2 can generate files for compilation by Wave Advantage, triggered by command line input to Wave Advantage

Results come back from IMSL/IDL in files

Software roles

I. G2

Coordination of entire system

Overall developer and user interface

Model representation

Schematic analyzer to generate equations from schematic

Case generation

Running NeurOn-Line

*Calls separate C program for training (transparent to user)*

II. IMSL/IDL (now PV-WAVE)

Solution of model equations (linear & nonlinear equation solver)

Solution of data reconciliation optimization problem

Specialized 3D plots for visualization

V. RESULTS

Case studies

"Raw" features were 8 measurements, 3 valve positions

Failures simulated were high & low biases for sensors

Thus, 16 failure modes plus 1 normal mode - 17 classes

Sample pressures & valve positions automatically generated

Random measurement noise - uniform within 3 std. dev.

Conclusions

Noise is useful to force generalization, avoid numerical problems, and avoid having to use a small number of nodes

Too much noise harmful - need too many cases

Cross validation would be essential in any NN application

Scaling data important (scaling block does this automatically)

A large number of outliers reduces classification accuracy, but a few only lead to excess, useless nodes

Remember that some simulators can fail to converge sometimes, leading to outliers

During case generation, check for outliers with equation residuals (outliers not obvious with reconciled data due to smearing, without more elaborate multivariate statistical tests)

Data reconciliation step adds complexity, computing time

Radial Basis Function nets (RBFN) train faster

RBFN have their own built-in error analysis to avoid extrapolation

With the residual or data reconciliation approach, the models themselves handle extrapolation that the NN couldn't be trusted to handle

Hard to train RBFN with reconciled data and small biases (vs. noise), probably due to overlap of classes in clustering step

When the sensor noise is small vs. biases:

- Reconciled data worked better

- Numerical problems occurred more with non-reconciled cases

Either model-based technique has the major advantage of extrapolating beyond training data, and better results for a given number of cases
