NEURAL NETWORKS FOR FAULT DIAGNOSIS BASED ON MODEL ERRORS OR DATA RECONCILIATION

 

Greg M. Stanley - Gensym Corporation

CENTRAL PROBLEM ADDRESSED

• Explore several possible mechanisms for fault detection using combinations of several technologies:

- Neural networks

- Traditional models, often based on first principles

Detect faults based on deviations from models

- Data reconciliation

Additional technique built upon traditional models

• Each technology has something to offer, and limitations

• General analysis and case study with hydraulic systems

 

NEURAL NETWORKS FOR FAULT DIAGNOSIS BASED ON MODEL ERRORS OR DATA RECONCILIATION

I. Neural networks background

II. Model residual analysis

III. Data Reconciliation

IV. System model

V. Results

I. NEURAL NETWORKS BACKGROUND

Neural Networks for nonlinear modeling

• Neural networks are nonlinear, multivariable models built from a set of input/output data

- Training phase - "learn" model from the data, given pairs of input & output data arrays ("training set")

Analogy: building regression model from data

- Run-time phase - use the model with new input array to predict the output array

Analogy: using the regression model with new inputs

• Result is a nonlinear "black box" model

Analogy: linear models for regression, DMC, typical controller design methods are all "black box"

Basic Neural Net elements

 

Neural Networks roles

• Functional approximation

- Approximate any mapping of input to output data

Think of it as multivariable interpolation

- Used for interpolation, control, simulation, etc., in place of other types of models

- Nonlinear modeling in the absence of first-principles models is a special strength of neural nets

• Classification (pattern matching)

Neural networks for classification, pattern matching, fault detection

• Input "features" are selected and collected into a vector

examples: temperatures, qualities, statuses

• Each possible feature pattern belongs to exactly one of n "classes"

example: fault detection, where "classes" are normal, fault x, fault y, ...

• There is a NN output for each of the n possible classes

• In training, 1 is applied to the correct output for the input class, 0 to the other outputs

• At runtime, a new input is presented, and an output near 1 indicates membership in that class

• A special strength of neural nets
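The 0/1 output encoding described above can be sketched as follows. The class names and threshold are illustrative placeholders, not from the original system:

```python
import numpy as np

CLASSES = ["normal", "fault-x", "fault-y"]   # hypothetical class names

def target_vector(class_name):
    """Training target: 1 at the correct class's output, 0 at the others."""
    t = np.zeros(len(CLASSES))
    t[CLASSES.index(class_name)] = 1.0
    return t

def classify(outputs, threshold=0.5):
    """Run time: the output nearest 1 (above a threshold) names the class."""
    i = int(np.argmax(outputs))
    return CLASSES[i] if outputs[i] >= threshold else "unknown"
```

A run-time output pattern like (0.1, 0.9, 0.2) would then be read as membership in the second class.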

Neural Net for classification at run time

 

Neural Networks vs. other techniques

• Complements traditional modeling, rule-based systems, optimization, regression, interpolation, and control

• Focus is on nonlinear systems, vs. traditional linear techniques which may be more efficient for linear systems

- very few systems are truly linear, especially under fault or other extreme conditions

- linearization for traditional methods often applies only within small operating regions

• First principles (simulation) models can be worked in by pre-processing or post-processing

e.g., model the differences from first-principles models with a neural net

• Neural nets can model static or dynamic systems

e.g., feed delayed inputs as well as current inputs
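A minimal sketch of feeding delayed inputs (a tapped delay line); the function name is illustrative, not part of any package:

```python
def delayed_inputs(series, n_delays):
    """Build input vectors for a dynamic model: the current value plus
    the n_delays previous values of a measured time series."""
    return [series[t - n_delays : t + 1][::-1]       # newest value first
            for t in range(n_delays, len(series))]
```

Each resulting vector can be presented to a static network, letting it approximate dynamic behavior.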

Applications areas for neural nets

• Dynamic and static process modeling

• Quality prediction & control

• Nonlinear and adaptive control

• Inferential "soft" sensing

• Fault detection and diagnosis

• Multivariable pattern recognition

• Data validation and rectification

• Time series prediction

• Process optimization

• Automated decision-making

"Backpropagation Network (BPN)"

• The "standard" network, widely-used

One of 4 available in NeurOn-Line

• Named after a particular training technique

Somewhat of a misnomer, but in common use

• Implies layered structure of nodes and connections

• usually 3 layers (input, hidden, output)

• "feedforward" - runtime data propagates from input through output with no feedback

• Information transmitted via connections with weights

• Each node takes weighted sum of inputs, then may apply a function to introduce nonlinearity

• Usually the nonlinear function is sigmoidal (S-shaped)

Any NeurOn-Line layer can apply linear or sigmoidal functions
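A numeric sketch of one such feedforward pass, with a sigmoidal hidden layer and a linear output layer. Layer sizes and weight values here are arbitrary, chosen only for illustration:

```python
import numpy as np

def sigmoid(a):
    """The S-shaped nonlinearity applied at each node."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    """One run-time pass: input -> sigmoidal hidden layer -> linear output.
    Each node takes a weighted sum of its inputs, then (for the hidden
    layer) applies the sigmoid to introduce nonlinearity."""
    hidden = sigmoid(W1 @ x + b1)
    return W2 @ hidden + b2

# Tiny example: 2 inputs, 3 hidden nodes, 1 output, random weights
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)
y = forward(np.array([0.5, -1.0]), W1, b1, W2, b2)
```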

 

Training and applying a neural net

(1) Choose inputs & outputs

(2) Acquire input/output training data

(3) Train the network

(4) Validate the network

(5) Apply the network

(6) Periodic retraining for adaptation

Training & applying (1) : Choose inputs & outputs

• Avoid irrelevant inputs if possible

• Functional relationship between inputs & outputs should exist

• Inputs can be calculated, model residuals, etc.

Training & applying (2): Acquire input/output training data

• Data should "cover" space of interest

• Neural nets, like other empirical models, extrapolate poorly

• Extrapolation may be uncovered during validation

• Radial Basis Function nets can warn about extrapolation at run time; backpropagation nets can't

• Quality & quantity of data determine quality of result

• Signal to noise ratio important

• Large data sets reduce variance of predictions if a functional relationship exists

• Validation techniques in NeurOn-Line can quantify network performance

Training & applying (3): Train the network

• Nonlinear parameter estimation

• Generally least-squares fit to training set (sum of squares of prediction errors over the data)

• NeurOn-Line uses standard optimization methods, rather than earlier backpropagation techniques - faster

• NeurOn-Line has shortcut methods for Radial Basis Function methods
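The least-squares training objective above can be sketched generically. The toy "network" here is just a line, standing in for any parameterized model; training means minimizing this function over the weights with an optimizer:

```python
import numpy as np

def training_error(weights, predict, inputs, targets):
    """Least-squares training objective: sum of squared prediction
    errors over the training set, as a function of the weights."""
    return sum(float(np.sum((np.asarray(predict(x, weights)) - t) ** 2))
               for x, t in zip(inputs, targets))

# Toy check with a "network" that is just a line, w[0]*x + w[1]:
line = lambda x, w: w[0] * x + w[1]
err = training_error([2.0, 1.0], line, [0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

With the exact slope and intercept, the training error is zero; an optimizer searches the weight space for that minimum.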

 

 

Training & applying (4): Validate the network

• Number of parameters to be estimated by the training technique is related to the number of layers, nodes & connections

• The number of adjustable parameters (weights) must be chosen by the user or by an automated technique

like choosing the model order in control, or the order of a polynomial in curve fitting

• Too many parameters: overfitting, no "generalization"

like fitting quadratic polynomial to 3 points

• Too few parameters: underfitting, too much information lost

like using a linear curve fit when quadratic is really needed

• Want to achieve the right level of generalization

• Cross-validation techniques separate "testing" data and "training" data to choose architecture

Cross-Validation

• Pick an architecture (typically, # of hidden nodes)

• Evaluate the architecture

• Split data randomly into training and testing subsets

• Train the network using only the training data subset - training minimizes the training error

• Evaluate network prediction quality over the testing subset only - "testing error"

• Repeat multiple times with different random splits of the data, and average the testing errors

Similar approaches exist to split the data n ways

• Repeat, choose the architecture with the lowest testing error

Typically at a minimum between underfitting & overfitting

• Train with the final architecture, using all the data
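The loop above can be sketched using the slide's own analogy of choosing a polynomial order; `np.polyfit` stands in for network training, and polynomial degree stands in for the number of hidden nodes:

```python
import random
import numpy as np

def cv_testing_error(x, y, degree, n_splits=10, seed=0):
    """Average held-out (testing) error of a polynomial model of the
    given degree, over repeated random train/test splits."""
    rng = random.Random(seed)
    n, errs = len(x), []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = n // 3                                  # 1/3 held out
        test, train = idx[:cut], idx[cut:]
        coef = np.polyfit(x[train], y[train], degree)   # "training"
        pred = np.polyval(coef, x[test])
        errs.append(float(np.mean((pred - y[test]) ** 2)))
    return sum(errs) / len(errs)

# Quadratic data with a little noise: degree 1 clearly underfits
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 60)
y = 2 * x**2 - x + 0.5 + 0.02 * rng.standard_normal(60)
errors = {d: cv_testing_error(x, y, d) for d in (1, 2, 8)}
best = min(errors, key=errors.get)
```

The architecture with the lowest averaged testing error is then retrained on all of the data.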

Cross validation - high-level view

 

Some details of cross-validation

Training & applying (5): Apply the network & retrain as needed

• Weights, architecture fixed while running

• Cases requiring extrapolation should be flagged

• Further data acquisition & periodic retraining & adaptation

• NeurOn-Line provides support for maintaining data set

- Adding new, novel cases

- Forgetting old cases when newer ones are better

- Rejecting outliers

- Filtering or other signal pre-processing

Recognizing NeurOn-Line applications

• Difficult-to-formulate models needed for system improvement

- Poorly-understood systems

- Lack of experts

- Nonlinearities

• Data available

• Functional relationship exists between inputs & outputs

• NeurOn-Line current limitations

- Data collection, network evaluation: 1 second

- No hard limits on size, best performance for input dimension < 100, number of examples < 1000

NeurOn-Line

• G2-based neural network package for online applications

• Support for maintaining set of training data and building an adaptive neural network model, recognizing novelty

• Real-time pre-processing of data (filtering, feature calculations)

• Support for run-time use of the NN model

• Various network types supported

- standard feedforward, sigmoidal

- radial basis functions, ellipsoidal basis functions

- Principal Component Analysis preprocessing option

- Autoassociative nets for nonlinear principal components analysis

- rho nets

• Training via optimization methods

• Cross-validation for testing against "overfitting"

• Graphical language for development

- GDA-based for signal processing, responding to events, sequential control

NeurOn-Line architecture

• G2 is the overall developer & end user environment

• Integrated with G2 and GDA (Gensym Diagnostic Assistant)

• Numerically-intensive training done in external C program

• Communication via remote procedure calls and file transfer

Why not just use a neural network?

• Doesn't take advantage of process knowledge

- Network has to learn more, may generalize improperly

- Danger of extrapolation outside of training data

- May be difficult/time consuming to "cover" the possible operating regimes

- A lot of testing may be required to build confidence in the network

- Minor plant changes or operating regime changes may require extensive retraining and retesting

- Many operating statuses change, leading to a large number of inputs to the net besides sensors

e.g., controller statuses, parallel equipment statuses

• A model or partial model with a wide range of validity may be easily available, or generated

- Validity may go well beyond available training data

e.g., material, energy & pressure balances, valve & pump curves, controller models

II. MODEL RESIDUAL ANALYSIS

 

 

 

 

 

 

 

Model residuals form patterns for input to the NN

Residuals           Fault class

( 0,  0, 0, 0, 0) : Normal operation

(-b,  b, 0, 0, 0) : flow 2 biased high by amount b

( 0, -b, b, 0, 0) : flow 3 biased high by amount b

( 0, -b, 0, 0, 0) : leak, magnitude b, between flow 2 & flow 3
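A sketch of how the sensor-bias patterns above arise for a line of flow meters in series, using the adjacent-balance residual f[i] - f[i+1] (flow values and the bias magnitude are illustrative):

```python
def flow_residuals(measured):
    """Residual of each adjacent flow balance: f[i] - f[i+1].
    All residuals are zero in normal (steady, no-fault) operation."""
    return [measured[i] - measured[i + 1] for i in range(len(measured) - 1)]

q, b = 10.0, 2.0                       # steady flow and a bias magnitude
normal   = [q] * 6                     # six meters, no fault
flow2_hi = [q, q + b, q, q, q, q]      # sensor 2 reads high by b
```

`flow_residuals(flow2_hi)` reproduces the (-b, b, 0, 0, 0) pattern in the table: the bias appears only in the two balances that contain sensor 2.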

Advantages of residuals

• Simple to compute, no iteration required, no convergence problems

• Models can be "partial", incomplete models - they are just information in the form of constraints, not a complete causal model

Same true for data reconciliation

• Unmodelled faults will still generate residuals highlighting a fault, even though the NN will be unable to correctly classify the cause of the non-normal operation

Same true for data reconciliation

Why not just use model residuals as NN inputs?

• Residuals are all "local" to one equation

• Residuals arbitrarily depend on which balances are chosen

In the above example, the first flow is never compared to the last flow, yet that is a perfectly valid comparison/balance. Only adjacent flows are compared.

• Network has to learn all the "global" interactions

- Network may not generalize properly

• Data reconciliation fully accounts for the interactions, using ALL of the model equations, instead of just comparing adjacent sensors

• Data reconciliation allows you to specify measurement noise standard deviations, so network doesn't have to learn it

III. DATA RECONCILIATION

Data Reconciliation

• Want best estimates of variables in a system with measurements, consistent with some algebraic models

- Combining measurement information, measurement noise properties (variances), and model information

Analogy to Kalman Filter in dynamic systems, although usually no "process noise" is modelled, just "measurement noise"

• Traditionally associated mainly with mass & energy balances

• Associated "gross error detection" based on tests of model residuals or measurement adjustments - should be random

Data Reconciliation mainly reduces effect of instrument biases

• Uses algebraic models: steady state assumption, with a few tricks

- Change in tank levels treated as equivalent to flow measurement

- Other dynamic extensions exist

• Plant measurements must be averaged for time period consistent with steady-state assumption

- Typical 4 hours - 1 day

- High frequency noise filtered out

- Leaves only steady state error (bias) or very low frequencies

Data Reconciliation is least-squares error minimization

• Minimize "adjustments" to raw data based on their assumed variances - sum of squares of adjustments

• Minimization subject to constraint that the balances are satisfied exactly

• Nonlinear if the algebraic constraints are nonlinear

Data Reconciliation - mathematical formulation

The system

measurements: z = h(x) + v

constraints: g(x) = 0

v is the measurement noise, with covariance matrix R

R is usually diagonal

Diagonal elements are measurement variances (square of std. dev.)

The least-squares problem

Find best estimate x as solution to the problem:

minimize over x: (z - h(x))^T R^-1 (z - h(x))

subject to: g(x) = 0

 

Special case solutions exist for linear constraints and measurements

 

 

 

IV. THE SYSTEM

Overall process of building fault diagnosis system

• Build a configurable simulator

• Select features to be used for input to the neural network

- Sensors, valve positions

- Model equation residuals

- Other calculations

- Data Reconciliation measurement adjustments

- Filtering, averaging, other signal processing as needed

• Use the simulator to generate cases - a training data set

- Include sensor bias cases as faults

- Add random noise to sensors

- Randomly vary the inputs

• Train & validate the network (classification problem)

• Run-time use - use same features on real data
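The case-generation step above might be sketched as follows. All names and parameters are hypothetical; `simulate` stands in for the configurable simulator, and the label 0 denotes normal operation:

```python
import random

def make_case(simulate, n_sensors, bias=2.0, noise_std=0.1, rng=random):
    """Generate one labeled training case: pick a fault class at random
    (normal, or one sensor biased high/low), simulate, add sensor noise."""
    fault = rng.randrange(2 * n_sensors + 1)        # 0 = normal operation
    readings = list(simulate())                     # fault-free sensor values
    if fault > 0:
        sensor = (fault - 1) // 2
        sign = 1 if fault % 2 else -1               # odd = high, even = low
        readings[sensor] += sign * bias             # inject the sensor bias
    readings = [r + rng.gauss(0.0, noise_std) for r in readings]
    return readings, fault
```

Repeating this loop, with the process inputs also randomly varied, yields the labeled training set for the classification network.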

Overview of the water grid model

• Graphically-configured hydraulic network, as in municipal water grid

• Generation of model equations from schematic

- Fixed pressures at sources or sinks

- Pressure/flow models of pumps, valves, orifice meters, pipes, junctions

- Conservation of mass

- Analogous to Kirchhoff voltage & current laws, with device equations

- Generate matrices for linearization when desired

• Algebraic equations only

- Tanks not considered, although this is a straightforward extension

The system

• G2-based schematic analyzer generates linear or nonlinear equations, sets up linear or nonlinear data reconciliation

• Equations solved by IMSL/IDL (Wave Advantage) nonlinear equation-solver

• Nonlinear data reconciliation solved by IMSL/IDL optimizers

• Case generation for NeurOn-Line (neural network)

- G2 Generates cases of various sensor failures, simulating using above models

- G2 outputs patterns of model residuals or data reconciliation adjustments to file for training

• NeurOn-Line does training, runs networks

IMSL/IDL (Wave Advantage) interface to G2

• Wave Advantage = IMSL/IDL, similar to MATLAB

• G2 sends commands to Wave Advantage command line interpreter as ASCII text strings - G2 looks like a user to Wave Advantage

• Optionally, G2 can generate files for compilation by Wave Advantage, triggered by command line input to Wave Advantage

• Results come back from IMSL/IDL in files

Software roles

I. G2

• Coordination of entire system

• Overall developer and user interface

• Model representation

• Schematic analyzer to generate equations from schematic

• Case generation

• Running NeurOn-Line

Calls separate C program for training (transparent to user)

II. IMSL/IDL (now PV-WAVE)

• Solution of model equations (linear & nonlinear equation solver)

• Solution of data reconciliation optimization problem

• Specialized 3D plots for visualization

V. RESULTS

Case studies

• "Raw" features were 8 measurements, 3 valve positions

• Failures simulated were high & low biases for sensors

• Thus, 16 failure modes plus 1 normal mode - 17 classes

• Sample pressures & valve positions automatically generated

• Random measurement noise - uniform within 3 std. dev.

Conclusions

• Noise useful to force generalization, avoid numerical problems, avoid having to use small # nodes

• Too much noise harmful - need too many cases

• Cross validation would be essential in any NN application

• Scaling data important (scaling block does this automatically)

• A large number of outliers reduces classification accuracy, but a few only lead to excess, useless nodes

• Remember that some simulators can fail to converge sometimes, leading to outliers

• During case generation, check for outliers with equation residuals (outliers not obvious with reconciled data due to smearing, without more elaborate multivariate statistical tests)

• Data reconciliation step adds complexity, computing time

• Radial Basis Function nets (RBFN) train faster

• RBFN have their own built-in error analysis to avoid extrapolation

• Models themselves handle extrapolation which NN couldn't be trusted to handle - (residual or Data Rec. approach)

• Hard to train RBFN with reconciled data and small biases (vs. noise), probably due to overlap of classes in clustering step

• When the sensor noise is small vs. biases:

- Reconciled data worked better

- numerical problems occurred more with non-reconciled cases

• Either model-based technique has the major advantage of extrapolating beyond training data, and better results for a given number of cases
