Greg M. Stanley - Gensym Corporation


• Explore several possible mechanisms for fault detection using combinations of several technologies:

- Neural networks

- Traditional models, often based on first principles

Detect faults based on deviations from models

- Data reconciliation

Additional technique built upon traditional models

• Each technology has something to offer, as well as limitations

• General analysis and case study with hydraulic systems



I. Neural networks background

II. Model residual analysis

III. Data Reconciliation

IV. System model

V. Results


Neural Networks for nonlinear modeling

• Neural networks are nonlinear, multivariable models built from a set of input/output data

- Training phase - "learn" model from the data, given pairs of input & output data arrays ("training set")

Analogy: building regression model from data

- Run-time phase - use the model with new input array to predict the output array

Analogy: using the regression model with new inputs

• Result is a nonlinear "black box" model

Analogy: linear models for regression, DMC, typical controller design methods are all "black box"

Basic Neural Net elements


Neural Networks roles

• Functional approximation

- Approximate any mapping of input to output data

Think of it as multivariable interpolation

- Used for interpolation, control, simulation, etc., in place of other types of models

- Nonlinear modeling in the absence of first-principles models is a special strength of neural nets

• Classification (pattern matching)

Neural networks for classification, pattern matching, fault detection

• Input "features" are selected and collected into a vector

examples: temperatures, qualities, statuses

• Each possible feature pattern belongs to exactly one of n "classes"

example: fault detection, where "classes" are normal, fault x, fault y, ...

• There is a NN output for each of the n possible classes

• In training, 1 is applied to the correct output for the input class, 0 to the other outputs

• At runtime, a new input is presented, and an output near 1 indicates membership in that class

• A special strength of neural nets

Neural Net for classification at run time
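The 1-of-n output encoding described above can be sketched in a few lines (a minimal illustration; the class names and the 0.5 membership threshold are assumptions, not from the original system):

```python
# Sketch of the 1-of-n class encoding for NN fault classification.
# Class names are illustrative only.
CLASSES = ["normal", "fault_x", "fault_y"]

def target_vector(class_name):
    """Training target: 1 at the correct class output, 0 at the others."""
    return [1.0 if c == class_name else 0.0 for c in CLASSES]

def classify(outputs, threshold=0.5):
    """Run time: an output near 1 indicates membership in that class.
    Returns None when no output is confidently near 1."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return CLASSES[best] if outputs[best] >= threshold else None

print(target_vector("fault_x"))    # [0.0, 1.0, 0.0]
print(classify([0.1, 0.9, 0.2]))   # fault_x
```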


Neural Networks vs. other techniques

• Complements traditional modeling, rule-based systems, optimization, regression, interpolation, and control

• Focus is on nonlinear systems, vs. traditional linear techniques which may be more efficient for linear systems

- very few systems are truly linear, especially under fault or other extreme conditions

- linearization for traditional methods often applies only within small operating regions

• First principles (simulation) models can be worked in by pre-processing or post-processing

e.g., model the differences from first-principles models with a neural net

• Neural nets can model static or dynamic systems

e.g., feed delayed inputs as well as current inputs
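The delayed-input idea amounts to a tapped delay line: the network input vector is assembled from the current sample plus a few past samples (a minimal sketch, not the NeurOn-Line mechanism itself):

```python
def tapped_delay_input(series, t, n_delays):
    """Build a network input vector from the current sample and
    n_delays past samples: [u(t), u(t-1), ..., u(t-n_delays)]."""
    return [series[t - k] for k in range(n_delays + 1)]

u = [1.0, 2.0, 3.0, 4.0, 5.0]
print(tapped_delay_input(u, t=4, n_delays=2))  # [5.0, 4.0, 3.0]
```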

Applications areas for neural nets

• Dynamic and static process modeling

• Quality prediction & control

• Nonlinear and adaptive control

• Inferential "soft" sensing

• Fault detection and diagnosis

• Multivariable pattern recognition

• Data validation and rectification

• Time series prediction

• Process optimization

• Automated decision-making

"Backpropagation Network (BPN)"

• The "standard" network, widely used

One of 4 available in NeurOn-Line

• Named after a particular training technique

Somewhat of a misnomer, but in common use

• Implies layered structure of nodes and connections

• Usually 3 layers (input, hidden, output)

• "feedforward" - runtime data propagates from input through output with no feedback

• Information transmitted via connections with weights

• Each node takes weighted sum of inputs, then may apply a function to introduce nonlinearity

• Usually the nonlinear function is sigmoidal (S-shaped)

Any NeurOn-Line layer can apply linear or sigmoidal functions
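The per-node computation above - weighted sum of inputs, then an optional sigmoid - can be sketched as follows (a minimal illustration; the weights and layer sizes are arbitrary, and no training is shown):

```python
import math

def sigmoid(x):
    """The S-shaped nonlinearity usually applied at each node."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias, nonlinear=True):
    """One node: weighted sum of inputs, optionally passed through
    the sigmoid (a linear node just returns the sum)."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(s) if nonlinear else s

def layer(inputs, weight_rows, biases, nonlinear=True):
    """A feedforward layer is just one node_output per node."""
    return [node_output(inputs, w, b, nonlinear)
            for w, b in zip(weight_rows, biases)]

# Zero weights and bias give the sigmoid's midpoint, 0.5:
print(node_output([1.0, 2.0], [0.0, 0.0], 0.0))  # 0.5
```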


Training and applying a neural net

(1) Choose inputs & outputs

(2) Acquire input/output training data

(3) Train the network

(4) Validate the network

(5) Apply the network

(6) Periodic retraining for adaptation

Training & applying (1) : Choose inputs & outputs

• Avoid irrelevant inputs if possible

• Functional relationship between inputs & outputs should exist

• Inputs can be calculated, model residuals, etc.

Training & applying (2): Acquire input/output training data

• Data should "cover" space of interest

• Neural nets, like other empirical models, extrapolate poorly

• Extrapolation may be uncovered during validation

• Radial Basis Function nets can warn about extrapolation at run time; backpropagation nets can't

• Quality & quantity of data determine quality of result

• Signal to noise ratio important

• Large data sets reduce variance of predictions if a functional relationship exists

• Validation techniques in NeurOn-Line can quantify network performance

Training & applying (3): Train the network

• Nonlinear parameter estimation

• Generally least-squares fit to training set (sum of squares of prediction errors over the data)

• NeurOn-Line uses standard optimization methods, rather than earlier backpropagation techniques - faster

• NeurOn-Line has shortcut methods for Radial Basis Function methods



Training & applying (4): Validate the network

• Number of parameters to be estimated by the training technique is related to the number of layers, nodes & connections

• The number of adjustable parameters (weights) must be chosen by the user or by an automated technique

like choosing the model order in control, or the order of a polynomial in curve fitting

• Too many parameters: overfitting, no "generalization"

like fitting quadratic polynomial to 3 points

• Too few parameters: underfitting, too much information lost

like using a linear curve fit when quadratic is really needed

• Want to achieve the right level of generalization

• Cross-validation techniques separate "testing" data and "training" data to choose architecture


• Pick an architecture (typically, # of hidden nodes)

• Evaluate the architecture

• Split data randomly into training and testing subsets

• Train the network using only the training data subset - training minimizes the training error

• Evaluate network prediction quality over the testing subset only - the "testing error"

• Repeat multiple times with different random splits of the data, and average the testing errors

Similar approaches exist to split the data n ways

• Repeat, choose the architecture with the lowest testing error

Typically at a minimum between underfitting & overfitting

• Train with the final architecture, using all the data

Cross validation - high-level view


Some details of cross-validation
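The cross-validation loop above can be sketched with the slide's own analogy of choosing a polynomial order in curve fitting (an illustrative stand-in for choosing the number of hidden nodes; the data, noise level, and repeat counts are all assumptions):

```python
import numpy as np

def cv_testing_error(x, y, order, repeats=20, test_frac=0.3, seed=0):
    """Average testing error over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n, errs = len(x), []
    for _ in range(repeats):
        idx = rng.permutation(n)
        n_test = int(test_frac * n)
        test, train = idx[:n_test], idx[n_test:]
        coefs = np.polyfit(x[train], y[train], order)  # "training"
        pred = np.polyval(coefs, x[test])              # "testing"
        errs.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errs))

# Noisy quadratic data: a first-order fit should underfit badly.
rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 60)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0, 0.2, x.size)

errors = {k: cv_testing_error(x, y, k) for k in (1, 2, 3, 4)}
best = min(errors, key=errors.get)  # lowest average testing error wins
```

The final step, as on the slide, would be to refit the chosen order using all of the data.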

Training & applying (5): Apply the network & retrain as needed

• Weights, architecture fixed while running

• Cases requiring extrapolation should be flagged

• Further data acquisition & periodic retraining & adaptation

• NeurOn-Line provides support for maintaining data set

- Adding new, novel cases

- Forgetting old cases when newer ones are better

- Rejecting outliers

- Filtering or other signal pre-processing

Recognizing NeurOn-Line applications

• Difficult-to-formulate models needed for system improvement

- Poorly-understood systems

- Lack of experts

- Nonlinearities

• Data available

• Functional relationship exists between inputs & outputs

• NeurOn-Line current limitations

- Data collection, network evaluation: 1 second

- No hard limits on size, best performance for input dimension < 100, number of examples < 1000


• G2-based neural network package for online applications

• Support for maintaining set of training data and building an adaptive neural network model, recognizing novelty

• Real-time pre-processing of data (filtering, feature calculations)

• Support for run-time use of the NN model

• Various network types supported

- standard feedforward, sigmoidal

- radial basis functions, ellipsoidal basis functions

- Principal Component Analysis preprocessing option

- Autoassociative nets for nonlinear principal components analysis

- rho nets

• Training via optimization methods

• Cross-validation for testing against "overfitting"

• Graphical language for development

- GDA-based for signal processing, responding to events, sequential control

NeurOn-Line architecture

• G2 is the overall developer & end user environment

• Integrated with G2 and GDA (Gensym Diagnostic Assistant)

• Numerically-intensive training done in external C program

• Communication via remote procedure calls and file transfer

Why not just use a neural network?

• Doesn't take advantage of process knowledge

- Network has to learn more, may generalize improperly

- Danger of extrapolation outside of training data

- May be difficult/time consuming to "cover" the possible operating regimes

- A lot of testing may be required to build confidence in the network

- Minor plant changes or operating regime changes may require extensive retraining and retesting

- Many operating statuses change, leading to a large number of inputs to the net besides sensors

e.g., controller statuses, parallel equipment statuses

• A model or partial model with a wide range of validity may be easily available, or generated

- Validity may go well beyond available training data

e.g., material, energy & pressure balances, valve & pump curves, controller models









Model residuals form patterns for input to the NN

Residuals : Fault class

( 0, 0, 0, 0 , 0) : Normal operation

(-b, b, 0, 0, 0) : flow 2 biased high by amount b

(0, -b, b, 0, 0) : flow 3 biased high by amount b

(0, -b, 0, 0, 0) : leak, magnitude b, between flow 2 & flow 3
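The bias patterns in the table can be reproduced with adjacent-flow balance residuals, taking r_i = flow_i - flow_{i+1} (a minimal sketch; the flow values are illustrative):

```python
# Residuals for a line of flow sensors: r_i = flow_i - flow_{i+1}.
# This sign convention reproduces the bias rows in the table above.
def residuals(flows):
    return [flows[i] - flows[i + 1] for i in range(len(flows) - 1)]

true_flow, b = 10.0, 2.0
flows = [true_flow] * 6
print(residuals(flows))   # [0.0, 0.0, 0.0, 0.0, 0.0] : normal operation

flows[1] += b             # flow 2 biased high by b
print(residuals(flows))   # [-2.0, 2.0, 0.0, 0.0, 0.0]
```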

Advantages of residuals

• Simple to compute, no iteration required, no convergence problems

• Models can be "partial", incomplete models - they are just information in the form of constraints, not a complete causal model

Same true for data reconciliation

• Unmodelled faults will still generate residuals highlighting a fault, even though the NN will be unable to correctly classify the cause of the non-normal operation

Same true for data reconciliation

Why not just use model residuals as NN inputs?

• Residuals are all "local" to one equation

• Residuals arbitrarily depend on which balances are chosen

In the above example, the first flow is never compared to the last flow, yet that is a perfectly valid comparison/balance. Only adjacent flows are compared.

• Network has to learn all the "global" interactions

- Network may not generalize properly

• Data reconciliation fully accounts for the interactions, using ALL of the model equations, instead of just comparing adjacent sensors

• Data reconciliation allows you to specify measurement noise standard deviations, so network doesn't have to learn it


Data Reconciliation

• Want best estimates of variables in a system with measurements, consistent with some algebraic models

- Combining measurement information, measurement noise properties (variances), and model information

Analogy to Kalman Filter in dynamic systems, although usually no "process noise" is modelled, just "measurement noise"

• Traditionally associated mainly with mass & energy balances

• Associated "gross error detection" based on tests of model residuals or measurement adjustments - should be random

Data Reconciliation mainly reduces effect of instrument biases

• Uses algebraic models: steady state assumption, with a few tricks

- Change in tank levels treated as equivalent to flow measurement

- Other dynamic extensions exist

• Plant measurements must be averaged for time period consistent with steady-state assumption

- Typical 4 hours - 1 day

- High frequency noise filtered out

- Leaves only steady state error (bias) or very low frequencies
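The effect of the averaging step can be shown with a toy signal: high-frequency noise cancels over the window, while a steady bias survives (the values and the alternating "noise" are assumptions chosen so the cancellation is exact):

```python
def time_average(samples):
    """Average over the steady-state window: high-frequency noise
    cancels, leaving the true value plus any steady sensor bias."""
    return sum(samples) / len(samples)

true_value, bias = 50.0, 1.5
# Alternating +/-0.5 "high-frequency" noise cancels exactly over an
# even number of samples:
samples = [true_value + bias + (0.5 if i % 2 == 0 else -0.5)
           for i in range(240)]
print(time_average(samples))   # 51.5 - the bias survives the averaging
```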

Data Reconciliation is least-squares error minimization

• Minimize "adjustments" to raw data based on their assumed variances - sum of squares of adjustments

• Minimization subject to constraint that the balances are satisfied exactly

• Nonlinear if the algebraic constraints are nonlinear

Data Reconciliation - mathematical formulation

The system

measurements: z = h(x) + v

constraints: g(x) = 0

v is the measurement noise, with covariance matrix R

R is usually diagonal

Diagonal elements are measurement variances (square of std. dev.)

The least-squares problem

Find best estimate x as solution to the problem:

minimize over x: (z - h(x))^T R^(-1) (z - h(x))

subject to: g(x) = 0


Special case solutions exist for linear constraints and measurements
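For the linear special case (h(x) = x, g(x) = Ax = 0), Lagrange multipliers applied to the least-squares problem above give the closed-form estimate x_hat = z - R A^T (A R A^T)^(-1) A z. A minimal sketch for three flow sensors in series (the measurement values and unit variances are assumptions):

```python
import numpy as np

def reconcile_linear(z, A, R):
    """Linear data reconciliation: minimize (z-x)^T R^-1 (z-x)
    subject to A x = 0, via the closed-form Lagrange solution."""
    S = A @ R @ A.T
    return z - R @ A.T @ np.linalg.solve(S, A @ z)

# Three flow sensors in series: flow1 = flow2 = flow3.
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
R = np.eye(3)                     # equal measurement variances
z = np.array([10.0, 10.5, 9.8])   # raw (noisy) measurements

x_hat = reconcile_linear(z, A, R)
print(x_hat)   # with equal variances, all three estimates are the mean, 10.1
```

Note that the reconciled estimates satisfy the balances exactly (A x_hat = 0), and the adjustments z - x_hat are the quantities fed to the neural network in the approach described later.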





Overall process of building fault diagnosis system

• Build a configurable simulator

• Select features to be used for input to the neural network

- Sensors, valve positions

- Model equation residuals

- Other calculations

- Data Reconciliation measurement adjustments

- Filtering, averaging, other signal processing as needed

• Use the simulator to generate cases - a training data set

- Include sensor bias cases as faults

- Add random noise to sensors

- Randomly vary the inputs

• Train & validate the network (classification problem)

• Run-time use - use same features on real data

Overview of the water grid model

• Graphically-configured hydraulic network, as in municipal water grid

• Generation of model equations from schematic

- Fixed pressures at sources or sinks

- Pressure/flow models of pumps, valves, orifice meters, pipes, junctions

- Conservation of mass

- Analogous to Kirchhoff voltage & current laws, with device equations

- Generate matrices for linearization when desired

• Algebraic equations only

- Tanks not considered, although this is a straightforward extension

The system

• G2-based schematic analyzer generates linear or nonlinear equations, sets up linear or nonlinear data reconciliation

• Equations solved by IMSL/IDL (Wave Advantage) nonlinear equation-solver

• Nonlinear data reconciliation solved by IMSL/IDL optimizers

• Case generation for NeurOn-Line (neural network)

- G2 Generates cases of various sensor failures, simulating using above models

- G2 outputs patterns of model residuals or data reconciliation adjustments to file for training

• NeurOn-Line does training, runs networks

IMSL/IDL (Wave Advantage) interface to G2

• Wave Advantage = IMSL/IDL, similar to MATLAB

• G2 sends commands to Wave Advantage command line interpreter as ASCII text strings - G2 looks like a user to Wave Advantage

• Optionally, G2 can generate files for compilation by Wave Advantage, triggered by command line input to Wave Advantage

• Results come back from IMSL/IDL in files

Software roles

I. G2

• Coordination of entire system

• Overall developer and user interface

• Model representation

• Schematic analyzer to generate equations from schematic

• Case generation

• Running NeurOn-Line

Calls separate C program for training (transparent to user)


• Solution of model equations (linear & nonlinear equation solver)

• Solution of data reconciliation optimization problem

• Specialized 3D plots for visualization


Case studies

• "Raw" features were 8 measurements, 3 valve positions

• Failures simulated were high & low biases for sensors

• Thus, 16 failure modes plus 1 normal mode - 17 classes

• Sample pressures & valve positions automatically generated

• Random measurement noise - uniform within 3 std. dev.


• Noise useful to force generalization, avoid numerical problems, avoid having to use small # nodes

• Too much noise harmful - need too many cases

• Cross validation would be essential in any NN application

• Scaling data important (scaling block does this automatically)

• A large number of outliers reduces classification accuracy, but a few only lead to excess, useless nodes

• Remember that some simulators occasionally fail to converge, leading to outliers

• During case generation, check for outliers with equation residuals (outliers not obvious with reconciled data due to smearing, without more elaborate multivariate statistical tests)

• Data reconciliation step adds complexity, computing time

• Radial Basis Function nets (RBFN) train faster

• RBFN have their own built-in error analysis to avoid extrapolation

• Models themselves handle extrapolation which NN couldn't be trusted to handle - (residual or Data Rec. approach)

• Hard to train RBFN with reconciled data and small biases (vs. noise), probably due to overlap of classes in clustering step

• When the sensor noise is small vs. biases:

- Reconciled data worked better

- numerical problems occurred more with non-reconciled cases

• Either model-based technique has the major advantage of extrapolating beyond training data, and better results for a given number of cases
