The published technical papers presented next (based on Ph.D. thesis work and later experience at Exxon and Gensym) formalized the framework for understanding data reconciliation and several related topics: gross error detection; observability and redundancy; partitioning a system based on observability and redundancy; extensions for dynamic data reconciliation; and the use of data reconciliation to generate signatures for gross error detection, to be analyzed by techniques such as neural networks.
Data Reconciliation in steady state systems
The technical paper by Mah, Stanley, and Downing: Reconciliation and Rectification of Process Flow and Inventory Data, formalized and popularized data reconciliation in flow networks. It also introduced tests and an algorithm for detecting gross errors in flow networks (measurement errors and leaks) by analyzing nodal imbalances. It applied graph theory, significantly simplified the analysis and decomposition of problems, showed a practical application, and introduced a variety of concepts such as the “environment node” in the process graph to eliminate the distinction between “internal” and “external” flows. This paper provided the first formal use of graph theory both for analyzing flow reconciliation, and for diagnosing gross errors. Although the formal solution is for steady state systems, the paper points out that inventory changes as measured by tank level changes can be easily accounted for by treating the inventory changes as equivalent to an additional flow. The abstract for the paper is:
This paper shows how information inherent in the process constraints and measurement statistics can be used to enhance flow and inventory data. Two important graph-theoretic results are derived and used to simplify the reconciliation of conflicting data and the estimation of unmeasured process streams. The scheme was implemented and evaluated on a CDC6400 computer. For a 32-node 61-stream problem, the results indicate a 42 to 60 % reduction in total absolute errors, for the three cases in which the number of measured streams were 36, 50, and 61 respectively. A gross error detection criterion based on nodal imbalances is proposed. This criterion can be evaluated prior to any reconciliation calculations and appeared to be effective for errors of 20 % or more for the simulation cases studied. A logically consistent scheme for identifying the error sources was developed using this criterion. Such a scheme could be used as a diagnostic aid in process analysis.
That paper emphasized the analytical solutions for linear systems, and was mostly dedicated specifically to flow networks.
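For a single linear balance with independent measurement errors, the analytical least-squares solution reduces to a few lines. The following is a minimal sketch under those assumptions, not the algorithm from the paper; the function name `reconcile_single_balance`, the three-stream node, and all numbers are hypothetical.

```python
def reconcile_single_balance(y, variances, a):
    """Weighted least-squares reconciliation for one balance sum(a[i] * x[i]) = 0.

    Minimizes sum((x[i] - y[i])**2 / variances[i]) subject to the balance,
    assuming independent measurement errors (diagonal covariance).
    """
    imbalance = sum(ai * yi for ai, yi in zip(a, y))
    scale = sum(ai * ai * vi for ai, vi in zip(a, variances))
    # Each measurement is adjusted in proportion to its own variance.
    return [yi - vi * ai * imbalance / scale
            for yi, vi, ai in zip(y, variances, a)]


# Hypothetical node: streams 1 and 2 enter (+1), stream 3 leaves (-1).
a = [1.0, 1.0, -1.0]
y = [10.2, 5.1, 14.7]          # raw measurements; they do not balance
variances = [1.0, 1.0, 1.0]    # assumed equal measurement variances

x_hat = reconcile_single_balance(y, variances, a)
residual = sum(ai * xi for ai, xi in zip(a, x_hat))
print(x_hat, residual)
```

With equal variances the imbalance is spread evenly over the three streams. A less trusted sensor (larger variance) would absorb a larger share of the adjustment.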
Extensions for nonlinear systems and for dynamics
Numerous enhancements in algorithms have been made since the early work. For steady state systems, the emphasis shifted from analytical solutions for the linear problems to numerical solutions to nonlinear problems, using nonlinear optimization. The optimization approach can also account for inequalities such as the physical constraints that flows are nonnegative in most normal operations, and must be nonnegative even during abnormal situations if check valves are present.
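The effect of a nonnegativity constraint can be illustrated on the smallest possible case. The sketch below assumes one balance and independent errors, for which the optimality conditions reduce to a scalar search on the Lagrange multiplier; a realistic nonlinear plant model would need a general constrained optimizer instead. The function name `reconcile_nonnegative` and the data are hypothetical.

```python
def reconcile_nonnegative(y, variances, a, iterations=200):
    """Reconcile one balance sum(a[i] * x[i]) = 0 with bounds x[i] >= 0.

    For diagonal covariance the optimality conditions give
    x[i] = max(0, y[i] - lam * variances[i] * a[i]), so only the scalar
    multiplier lam has to be found. Assumes a has both inflows and outflows.
    """
    def estimate(lam):
        return [max(0.0, yi - lam * vi * ai)
                for yi, vi, ai in zip(y, variances, a)]

    def balance(lam):
        return sum(ai * xi for ai, xi in zip(a, estimate(lam)))

    # balance(lam) never increases with lam, so bracket the root and bisect.
    lo, hi = -1.0, 1.0
    while balance(lo) < 0.0:
        lo *= 2.0
    while balance(hi) > 0.0:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if balance(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return estimate(0.5 * (lo + hi))


# Hypothetical node: stream 2 is nearly shut off and its meter reads slightly negative.
a = [1.0, 1.0, -1.0]
y = [5.0, -0.3, 5.3]
variances = [1.0, 1.0, 1.0]
x_bounded = reconcile_nonnegative(y, variances, a)
print(x_bounded)
```

An unconstrained least-squares fit of the same data would leave stream 2 slightly negative. With the bound active, stream 2 is held at zero and the remaining imbalance is shared between the other two streams.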
Numerous extensions for dynamic systems have also been developed. The first paper to address dynamics along with steady state constraints was Estimation of Flows and Temperatures in Process Networks. This paper by Stanley and Mah was the first to introduce the combination of Kalman Filtering and Data Reconciliation, by estimating biases and other slowly changing variables such as heat transfer coefficients. These systems were defined as “Quasi Steady State” (QSS). The paper introduced the terminology “spatial redundancy” (redundancy due to the algebraic equations over one time period), and “temporal redundancy” (extra information available from sampling at multiple time intervals). It also introduced the algorithms for taking advantage of both forms of redundancy.
That paper also addressed estimation in nonlinear systems (e.g., including temperatures and energy flows as well as material flows), by using an Extended Kalman Filter approach. The abstract for the paper is:
It is shown that temperatures and flows in a process network can be estimated from a quasi steady state model and a discrete Kalman filter. The data needed for such an application are readily available in many operating plants, and the computational requirements are within the capabilities of available process computers.
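Temporal redundancy can be illustrated with a scalar discrete Kalman filter that tracks one slowly changing quantity, such as a sensor bias. This is only a sketch: the random-walk model, the noise variances `q` and `r`, and the function name `kalman_bias_filter` are assumptions, and the filter in the paper also handles the algebraic constraints.

```python
def kalman_bias_filter(residuals, q=0.01, r=1.0, b0=0.0, p0=1.0):
    """Scalar discrete Kalman filter for a slowly drifting parameter.

    Assumed model: b[k+1] = b[k] + w, var(w) = q   (random walk)
                   z[k]   = b[k] + v, var(v) = r   (noisy observation)
    Returns the filtered estimate after each sample.
    """
    b, p = b0, p0
    history = []
    for z in residuals:
        p_pred = p + q                 # time update: uncertainty grows
        gain = p_pred / (p_pred + r)   # Kalman gain
        b = b + gain * (z - b)         # measurement update
        p = (1.0 - gain) * p_pred      # uncertainty after the update
        history.append(b)
    return history


# Hypothetical case: a constant bias of 2.0 observed over 200 sampling intervals.
estimates = kalman_bias_filter([2.0] * 200)
print(estimates[0], estimates[-1])
```

A small `q` relative to `r` tells the filter that the parameter changes slowly, so each new sample moves the estimate only a little and many samples are averaged over time.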
Observability and redundancy in process data estimation
The technical paper Observability and redundancy in process data estimation by Stanley and Mah addressed questions that remained unanswered in earlier work on data reconciliation. First of all, when will data reconciliation or QSS filtering perform adequately? Are there situations in which it will fail? What is the effect of measurement placement on estimator performance? Redundancy had already been shown to be useful, but how does one determine if a measurement is redundant? These questions are clearly of importance in selecting a measurement strategy.
The paper answered these questions with a general theory of observability and redundancy. Originally, observability was defined by Kalman for dynamic systems. But the fundamental issue is the same in steady state and dynamic systems: a system is observable if a given set of measurements can be used to uniquely determine the state of the system. In this paper observability was defined as a property of a steady state system defined by set membership constraints such as those described by equations and inequalities. Redundancy has a simple definition: a measurement is redundant if its removal causes no loss of observability. So, a redundant measurement could be estimated using other measurements and constraints, even if its measured values were missing.
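For the special case of linear constraints A x = 0 with some variables measured directly, these definitions can be checked with a rank test. The sketch below is an illustration under that assumption, not the paper's general set-based theory; the helper names and the two-node example are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a small matrix by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_count = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank_count, len(rows))
                      if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for i in range(rank_count + 1, len(rows)):
            factor = rows[i][col] / rows[rank_count][col]
            rows[i] = [x - factor * p for x, p in zip(rows[i], rows[rank_count])]
        rank_count += 1
    return rank_count


def is_observable(A, measured):
    """True if the unmeasured variables are uniquely fixed by A x = 0."""
    unmeasured = [j for j in range(len(A[0])) if j not in measured]
    if not unmeasured:
        return True
    sub = [[row[j] for j in unmeasured] for row in A]
    return rank(sub) == len(unmeasured)


def redundant_measurements(A, measured):
    """Measurements whose removal leaves the system observable."""
    return [j for j in sorted(measured) if is_observable(A, measured - {j})]


# Hypothetical network: node A splits x0 into x1 and x2; node B passes x1 on as x3.
A = [[1, -1, -1, 0],
     [0, 1, 0, -1]]
measured = {0, 1, 3}
print(is_observable(A, measured))           # True
print(redundant_measurements(A, measured))  # [1, 3]
```

Here x1 and x3 measure the same flow, so either can be dropped without losing observability. The measurement of x0 is not redundant, because it is the only way to calculate the unmeasured x2.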
The paper provided the first rigorous definitions of observability and redundancy for steady state and quasi-steady state systems, whether linear or described by nonlinear equations and set constraints such as inequalities. It provided the first practical tests for observability and redundancy for steady state systems, and first fully explored the implications for estimator performance and problem decomposition. For nonlinear systems, observability and redundancy can be global (independent of specific values) or local (tied to specific sets of values). Simple examples of blending nodes, heat exchangers, and flow meters with different ranges illustrated the point.
The paper demonstrated the importance of these concepts in predicting qualitative estimator performance, not only for a QSS filter, but also for any constrained least-squares estimator like data reconciliation and others. When estimates approach points that are unobservable, the estimator breaks down. When estimates approach points where redundancy is lost, raw sensor data cannot be improved, and are used directly (and hence estimates are the most sensitive to any errors, including gross errors).
Observability was defined in a very general way using topological properties of sets. Results to classify observability, predict estimator performance, and decompose problems based on observability and redundancy were then made more and more specific as additional assumptions were made, such as the existence of derivatives or second derivatives for nonlinear equations, and the extreme but important case of linear constraints and measurements. The abstract for the paper is:
By analogy to the development for dynamic systems, concepts of observability and redundancy may be developed with respect to a steady state system. These concepts differ from their counterparts for dynamic systems in that they can be used to characterize individual variables and local behavior as well as system and global behavior. Relations between local observability, global observability, calculability and redundancy are established and explored in this paper. It is shown that these concepts are useful in characterizing the performance of process data estimators with regard to bias and uniqueness of an estimate, convergence of estimation procedures and the feasibility and implications of problem decomposition.
Observability and redundancy classification in process networks
The paper Observability and redundancy classification in process networks by Stanley and Mah specialized the analysis of observability and redundancy to process networks, that is, systems defined by material and energy balances. This typically meant estimating mass flows, temperatures and energy flows, with additional relationships between temperature and enthalpy built into the measurement equations. Given the special structure of process networks, it was possible to use graph theory to predict observability and redundancy. For instance, this paper was the first to point out that for mass flow constraints, lack of observability is associated with cycles of flow arcs with zero measurements (including the “environment node”). Similarly, lack of redundancy is associated with cycles of flow arcs with exactly one measurement. Forms of cycle criteria also apply when energy balances are considered. Because of nonlinearities with energy balances, local observability and global observability are addressed. Based on the previous paper, these criteria could then be used to predict the performance of data reconciliation, in terms of ability to estimate the system state, improve estimates, and sensitivity to errors such as gross errors. The abstract for the paper is:
The utility of observability and redundancy in characterizing the performance of process data estimators was established in previous studies. In this paper two classification algorithms for determining local and global observability and redundancy for individual variables and measurements are presented. The concepts of biconnected components, perturbation subgraphs and feasible unmeasurable perturbations are introduced, and their properties are developed and used to effect classification, simplification and dimensional reduction. Step-by-step application of these algorithms is illustrated by examples.
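The cycle criteria for mass flows described above can be sketched directly on the process graph. This brute-force version is not one of the classification algorithms from the paper (which use biconnected components and cover energy balances); it only checks, arc by arc, whether a cycle exists whose other arcs are all unmeasured. The function names and the example network are hypothetical.

```python
def connected(n_nodes, edges, u, v):
    """True if u and v are joined by the given undirected edges (union-find)."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for s, t in edges:
        parent[find(s)] = find(t)
    return find(u) == find(v)


def classify_flows(n_nodes, arcs, measured):
    """Classify arcs of a mass-flow network using the cycle criteria.

    arcs: list of (from_node, to_node); node 0 is the environment node.
    measured: set of arc indices that carry a flow measurement.
    """
    unmeasured = [k for k in range(len(arcs)) if k not in measured]
    result = {}
    for k, (u, v) in enumerate(arcs):
        # Other unmeasured arcs that could close a cycle through arc k.
        others = [arcs[j] for j in unmeasured if j != k]
        on_cycle = connected(n_nodes, others, u, v)
        if k in measured:
            result[k] = "nonredundant" if on_cycle else "redundant"
        else:
            result[k] = "unobservable" if on_cycle else "observable"
    return result


# Hypothetical network: environment (0) feeds node 1; node 1 sends one stream
# to node 2 and one back to the environment; node 2 discharges to the environment.
arcs = [(0, 1), (1, 2), (1, 0), (2, 0)]
labels = classify_flows(3, arcs, measured={0, 1, 3})
print(labels)
```

In this example the feed measurement is nonredundant because it forms a cycle through the environment node with the single unmeasured arc, while the two measurements in series on the other path back each other up.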
Online data reconciliation for process control
The technical paper Online data reconciliation for process control by Stanley documents theory and applications of data reconciliation for process control applications used online in a chemical plant at Exxon. It introduced an approach for “dynamic reconciliation” that accounted for process dynamics separately from the algebraic constraints. This included some techniques for accomplishing dynamic data reconciliation such as cascade estimation. It also provided insights from a frequency response viewpoint, such as a key role of data reconciliation in estimating slowly-changing biases such as those introduced by sensors. (This was the first published data reconciliation paper pointing out that high frequency noise is eliminated by simple filtering of the raw data with exponential filters or moving averages; what is left is estimation of bias errors and elimination of gross errors.) The paper emphasized not just getting better estimates, but providing robust estimates of process variables used in closed loop control schemes: estimates that provided bumpless transfer when gross errors were detected and redundant sensors were removed from an estimator feeding a control scheme. Despite the “steady state” orientation of data reconciliation, it is possible to exploit its use of “spatial redundancy” and use it in certain circumstances with closed loop control, as outlined in the paper. The abstract for the paper is:
Combined data reconciliation with estimation of slowly changing parameters has been implemented for closed-loop control in a Chemical Plant. Goals include streamlining use of redundant measurements for backing up failed instruments, filtering noise, and, in some cases, reducing steady state estimation errors. Special considerations include bumpless transfer from failed instruments and automatic equipment up/down classification. Parameters are calculated and filtered, then held fixed during each data reconciliation.
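The simple filters mentioned above for removing high frequency noise from raw data can be sketched as follows. The function names, the filter constant `alpha`, and the window length are assumptions; the paper does not prescribe specific values.

```python
def exponential_filter(samples, alpha):
    """First-order exponential filter: f[k] = alpha*x[k] + (1 - alpha)*f[k-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    filtered = []
    state = None
    for x in samples:
        # The first sample initializes the filter state.
        state = x if state is None else alpha * x + (1.0 - alpha) * state
        filtered.append(state)
    return filtered


def moving_average(samples, window):
    """Trailing moving average over up to `window` most recent samples."""
    out = []
    for k in range(len(samples)):
        recent = samples[max(0, k - window + 1):k + 1]
        out.append(sum(recent) / len(recent))
    return out


# Hypothetical raw flow readings with high frequency noise around 10.
raw = [10.0, 10.4, 9.6, 10.4, 9.6, 10.4, 9.6]
smooth = exponential_filter(raw, alpha=0.5)
print(smooth)
print(moving_average(raw, window=2))
```

Neither filter can remove a constant sensor bias, since a bias passes through any averaging unchanged. That is why bias estimation and gross error elimination remain as the tasks for reconciliation.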
Gross error detection / fault diagnosis
“Gross errors” are unexpectedly large errors, due to instrument problems or unmodeled problems such as leaks. Initial tests for these were already mentioned in the papers cited above. Much additional work has been done on this, described in the books cited below.
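A test on nodal imbalances, of the kind introduced in the Mah, Stanley, and Downing paper, can be sketched for a single node. The sketch assumes independent, zero-mean Gaussian measurement errors with known variances; the threshold of 1.96 (a two-sided 5 % level) and the function names are assumptions rather than values taken from the paper.

```python
import math


def nodal_imbalance_statistic(y, variances, a):
    """Normalized imbalance for one node balance sum(a[i] * x[i]) = 0.

    With independent zero-mean Gaussian errors and no gross error,
    the statistic follows a standard normal distribution.
    """
    imbalance = sum(ai * yi for ai, yi in zip(a, y))
    std = math.sqrt(sum(ai * ai * vi for ai, vi in zip(a, variances)))
    return imbalance / std


def has_suspect_imbalance(y, variances, a, threshold=1.96):
    """Flag the node when |z| exceeds the assumed threshold."""
    return abs(nodal_imbalance_statistic(y, variances, a)) > threshold


# Hypothetical node: two inlets (+1), one outlet (-1), sensor std dev 0.2 each.
a = [1.0, 1.0, -1.0]
variances = [0.04, 0.04, 0.04]
print(has_suspect_imbalance([10.1, 5.0, 14.9], variances, a))  # ordinary noise
print(has_suspect_imbalance([10.1, 5.0, 13.0], variances, a))  # leak or bad outlet meter
```

The statistic uses only raw measurements, so it can be evaluated before any reconciliation calculation. It flags the node, not the individual stream; locating the source requires combining results from several nodes.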
Gross error detection can be considered one part of the more general problem of fault detection and diagnosis, which may be more effective when considering additional models and heuristics, and a larger number of sensors, controller modes, and valve positions not involved in just the steady state balance equations. This includes the analysis of measurement noise, and sudden jumps in value that can reveal problems or instrument calibration procedures that would be masked in the averages used as inputs to data reconciliation.
An example is detecting stuck measurements for sensors normally involved in closed loop control. This can be detected outside of data reconciliation because a stuck measurement will lead to the calculation of near-zero standard deviation in the raw (unfiltered, unreconciled) values sampled at a shorter time interval than the reconciliation interval. That evidence might be combined with observing the controller output swinging to either the minimum or maximum value as long as there is some integral action in the controller. (Similar controller behavior but with normal sensor noise could indicate a stuck valve: a process problem rather than a sensor problem.) When the operator notices the problem, they will put the controller into manual while the sensor or valve is being fixed, which is also a heuristic indication of a possible failure.
This technical paper by Stanley shows an approach to model-based diagnostics using either model errors or data reconciliation, combined with a pattern analyzer such as a neural net: Neural nets for fault diagnosis based on model errors or data reconciliation. The abstract is:
Instrument faults and equipment problems can be detected by pattern analysis tools such as neural networks. While pattern recognition alone may be used to detect problems, accuracy may be improved by "building in" knowledge of the process. When models are known, accuracy, sensitivity, training, and robustness for interpolation and extrapolation should be improved by building in process knowledge. This can be done by analyzing the patterns of model errors, or the patterns of measurement adjustments in a data reconciliation procedure. Using a simulation model, faults are hypothesized, during "training", for later matching at run time. Each fault generates specific model deviations. When measurement standard deviations can be assumed, data reconciliation can be applied, and the measurement adjustments can be analyzed using a neural network. This approach is tested with simulation of flows and pressures in a liquid flow network. A generic, graphically-configured simulator & case-generating mechanism simplified case generation.
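The stuck-sensor heuristic can be sketched as a small rule over one window of raw samples. All thresholds, the output range of 0 to 100, the returned labels, and the function name `classify_loop` are assumptions chosen for illustration.

```python
import statistics


def classify_loop(raw_pv, controller_output, out_min=0.0, out_max=100.0,
                  stuck_std=1e-6, saturation_band=0.5):
    """Heuristic check of one control loop over a window of raw samples.

    raw_pv: unfiltered measurement samples within the window.
    controller_output: controller output samples over the same window.
    """
    pv_std = statistics.pstdev(raw_pv)
    last_out = controller_output[-1]
    saturated = (last_out <= out_min + saturation_band or
                 last_out >= out_max - saturation_band)
    if pv_std < stuck_std and saturated:
        return "suspect stuck sensor"
    if pv_std < stuck_std:
        return "possible stuck sensor (output not saturated)"
    if saturated:
        return "suspect stuck valve or other process problem"
    return "normal"


# Hypothetical windows of data.
print(classify_loop([50.0] * 10, [60.0, 80.0, 100.0]))
print(classify_loop([50.1, 49.8, 50.3, 49.9], [60.0, 80.0, 100.0]))
print(classify_loop([50.1, 49.8, 50.3, 49.9], [48.0, 52.0, 50.0]))
```

The controller mode (automatic or manual) could be added as a further input in the same style.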
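The idea of matching measurement adjustments against simulated fault patterns can be illustrated without a neural network. In the sketch below a nearest-signature match by cosine similarity is swapped in for the neural net, and the "process" is a single flow measured by three meters in series with equal variances, so that reconciliation is just the average. Both choices, and all names and numbers, are simplifying assumptions and not the setup used in the paper.

```python
import math


def adjustments(y):
    """Reconciliation adjustments for one flow measured by several sensors.

    Assumes equal variances, so the reconciled value is the plain average.
    """
    mean = sum(y) / len(y)
    return [mean - yi for yi in y]


def cosine(u, v):
    """Cosine similarity between two adjustment patterns."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm = math.sqrt(sum(ui * ui for ui in u)) * math.sqrt(sum(vi * vi for vi in v))
    return dot / norm if norm > 0.0 else 0.0


def build_signatures(n_sensors, true_flow=100.0, bias=5.0):
    """'Training': simulate a bias on each sensor and store its adjustment pattern."""
    signatures = {}
    for k in range(n_sensors):
        y = [true_flow] * n_sensors
        y[k] += bias
        signatures[f"bias on sensor {k}"] = adjustments(y)
    return signatures


def diagnose(y, signatures):
    """Run time: return the fault whose signature best matches the adjustments.

    The absolute value makes the match independent of the sign of the bias.
    """
    pattern = adjustments(y)
    return max(signatures, key=lambda name: abs(cosine(pattern, signatures[name])))


signatures = build_signatures(3)
print(diagnose([100.3, 106.1, 99.8], signatures))  # bias on sensor 1
```

This sketch always returns some fault, so a practical version would first apply a detection test and would include a "no fault" class. A trained neural network would take the place of `diagnose`.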
The concept paper Pipeline Diagnosis Emphasizing Leak Detection: An Approach And Demonstration outlines an approach to pipeline leak detection that combines causal models of abnormal behavior with both static (algebraic) models and dynamic models, making use of data reconciliation.
Copyright 2010-2013, Greg Stanley
External links
Books
S. Narasimhan and C. Jordache, Data Reconciliation and Gross Error Detection: An Intelligent Use of Process Data, Gulf Publishing Company, Houston, 2000.
J. Romagnoli and M. Sanchez, Data Processing and Reconciliation for Chemical Process Operations, Volume 2 (Process Systems Engineering), Academic Press, San Diego, 2000.
Tutorials
Introduction to data reconciliation and gross error diagnosis (Narasimhan)
Data Reconciliation - Validation Intro (Heyen)
General
LinkedIn Data Reconciliation group
