Greg Stanley and Associates Performity LLC

CDG:  Causal Directed Graph (Causal Digraph) Technology for Diagnostics

Technical Resources:


CDG: A technology for event correlation, root cause fault diagnosis, impact prediction, test management, and automated recovery

The general approach behind Causal Directed Graph (CDG) technology is described in the causal model section of the Guide to Fault Detection and Diagnosis.  This page is a guide to online literature on details and applications of this technology for the CDG/SymCure products.

The basic idea of using cause and effect models for diagnosis has existed for some time.  The specific CDG model-based reasoning technology was invented by Greg Stanley, and developed by Greg Stanley and co-workers Ramesh Vaidhyanathan, Ravi Kapadia, and others while at Gensym.  Its first application was Iridium fault management: monitoring that worldwide satellite-based communication system.  Although first used for satellite communications diagnosis, it was developed for general applications, including network, systems, application, service, and IT management, as well as process control applications (Abnormal Condition Management).

Gensym's SymCure product is based on this technology, and it was originally named for the CDG technology it contained.  SymCure is part of the overall Integrity/Optegrity product line, formerly called Operations Expert.  CDG technology supports event correlation, root-cause fault diagnosis, impact prediction, test management, and automated recovery.  A CDG software architecture diagram follows:


[Figure: SymCure/CDG Architecture]


The remainder of this page summarizes the published CDG/SymCure technical papers, and provides links to them.

A Generic Fault Propagation Modeling Approach to On-Line Diagnosis and Event Correlation

The first published CDG technical paper, by Stanley and Vaidhyanathan, A Generic Fault Propagation Modeling Approach to On-Line Diagnosis and Event Correlation (pdf), was originally presented at the 3rd IFAC Workshop on On-Line Fault Detection and Supervision in the Chemical Process Industries.  (IFAC is the International Federation of Automatic Control.)  That paper was expanded into a tutorial in the Integrity Reasoner White Paper (pdf).  The abstract of the original paper is:

CDG (Causal Directed Graph) provides a methodology and framework for real-time fault management in large-scale systems, addressing the full life cycle of problem identification based on symptoms, diagnostic testing, and fault isolation, through recovery, as well as protecting the operator from “alarm flooding”.  It is based on generic fault propagation models, tied to an object-oriented domain representation and scalable algorithms.  CDG combines the generality of FMEA models with on-line, asynchronous event correlation and diagnosis. The architecture of CDG is described and the modeling approach is discussed with examples. Event correlation and interactive diagnosis using CDG is illustrated through a nitric acid cooling system example.
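To make the idea of a fault propagation model concrete, here is a minimal sketch in Python.  It is not the CDG/SymCure implementation or its modeling language.  It only illustrates, under a single-fault assumption, how a causal directed graph can be traversed upstream from observed symptoms to find candidate root causes, and downstream from a fault to predict impacts.  The class name, method names, and the cooling-loop node names are all hypothetical and do not come from the paper.

```python
from collections import deque


class CausalGraph:
    """Toy causal digraph: an edge cause -> effect means 'cause can produce effect'."""

    def __init__(self):
        self.effects = {}  # node -> set of direct effects
        self.causes = {}   # node -> set of direct causes

    def add_edge(self, cause, effect):
        self.effects.setdefault(cause, set()).add(effect)
        self.causes.setdefault(effect, set()).add(cause)

    @staticmethod
    def _reach(start, neighbors):
        """Breadth-first search; returns every node reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def candidate_root_causes(self, symptoms):
        """Nodes with no causes of their own that can explain ALL symptoms.

        Assumption (ours, for illustration): a single fault is present.
        """
        candidates = None
        for symptom in symptoms:
            upstream = self._reach(symptom, self.causes) | {symptom}
            roots = {n for n in upstream if not self.causes.get(n)}
            candidates = roots if candidates is None else candidates & roots
        return candidates or set()

    def predicted_impacts(self, fault):
        """Everything downstream of a fault, i.e. its possible consequences."""
        return self._reach(fault, self.effects)


# Hypothetical cooling-loop model (node names invented for this sketch).
g = CausalGraph()
g.add_edge("pump_failure", "low_coolant_flow")
g.add_edge("valve_stuck_closed", "low_coolant_flow")
g.add_edge("fouled_exchanger", "high_outlet_temperature")
g.add_edge("low_coolant_flow", "high_outlet_temperature")
g.add_edge("low_coolant_flow", "low_flow_alarm")
g.add_edge("high_outlet_temperature", "high_temp_alarm")

# Two alarms are correlated to the faults that could explain both of them.
suspects = g.candidate_root_causes({"low_flow_alarm", "high_temp_alarm"})
print(sorted(suspects))

# Impact prediction: what could a pump failure lead to?
print(sorted(g.predicted_impacts("pump_failure")))
```

In this sketch the two alarms correlate to the pump or the valve but not the fouled exchanger, because the exchanger cannot explain the low-flow alarm.  A real system would then need a diagnostic test to separate the two remaining suspects, which is the kind of test management the abstract refers to.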

Applications for Abnormal Condition Management (ACM)

A white paper by Mark Allen of Gensym, Optegrity for Abnormal Condition Management (pdf), describes the application of the technology in process control for Abnormal Condition Management.

A technical paper by Noureldin and Roveta, Using Expert System and Object Technology for Abnormal Condition Management (pdf), was presented at the BIAS conference in Milano, Italy. The abstract for the paper is:

This paper discusses the problem of Abnormal Condition Management (ACM), defines the requirements for addressing this problem, and presents an application, developed using Gensym’s Optegrity platform, which provides generic objects for managing abnormalities on heaters. The goal of this application is to sustain operational performance and maintain continuous availability by detecting and resolving abnormal process conditions early – before they impact operations. The heater models developed for the first application can be easily reused and adapted to other heating devices by customizing the objects with graphical tools.  The first application of these “generic heaters” has been installed in a refinery in the Middle East, and it is currently in the process of being deployed at other sites. A total of 80 preconfigured faults have been included for identifying the root cause of various heater problems. The application includes almost 240 messages that can be presented to operators for assisting with the diagnosis of problems and for providing guidance to quickly return to normal operation.  As part of the justification of this project, a return on investment analysis was completed. The payback period was estimated to be in the range of 3 to 8 months, depending on the type of heater, the application of the heater and the existing operating conditions.

A white paper by the ARC Advisory Group, Abnormal Condition Management (pdf), summarizes the incentives for using tools like Optegrity for Abnormal Condition Management (ACM).

Applications at BMC for systems management

White papers on the application of this technology to systems management (Microsoft Exchange Servers and Windows 2000 Servers) can be found on the BMC white paper page.

Real world model-based fault management

The technical paper Real World Model-based Fault Management (pdf), by Kapadia, Stanley and Walker, is from the 18th International Workshop on the Principles of Diagnosis, Nashville, TN (2007).  It summarizes more recent advances and experience with applications of the SymCure product.  The abstract is:

Real world fault management applications encompass a number of diagnostic activities such as symptom monitoring, root cause analysis, impact prediction, testing, and recovery.  They motivate powerful knowledge representation schemes to capture domain expertise and the development of intelligent algorithms that can exploit this knowledge. There are vast opportunities for the application of state-of-the-art fault management in commercial settings and, with billions of dollars at stake, industries are eager to embrace intelligent knowledge based solutions. Over the past decade, we have developed an object-oriented model-based domain-independent methodology for real world fault management, called SymCure. In this paper, we use this experience to generalize a set of requirements for real world fault management. We present an overview of the architecture and the modeling language of SymCure. We review a sample of projects where we have applied this approach, and share the motivations, challenges, successes and failures that have been our companions along this memorable journey.
