AT&T Internet Service Network Management
Background
This work started in late 1993 as part of Gensym consulting. AT&T had an HP OpenView-based network management system for several of its data networks. Through proxies, the system monitored server applications as well as network elements. It grew to include the WorldNet internet service, monitoring the network itself and especially edge elements such as modem banks. The network and its applications generate several million alarm events per day.
Business Problem
AT&T's operators faced too many alarm events. Because of this information overload, problems went undetected and undiagnosed for too long. In addition, modem management was a major headache. Service factors, customer satisfaction, and operations staffing levels (and therefore cost) were all major concerns.
Solution
The installed solution was based on Integrity (called Operations Expert at the time). It intercepts every event on its way from the SNMP agents to HP OpenView and performs filtering, correlation, and diagnosis. In some cases it takes automated corrective actions, especially in the critical area of modem management. Please see the Integrity white papers on Network Management, which include an AT&T white paper that specifically uses the AT&T system as an example.
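The filter-then-correlate flow described above can be illustrated with a small Python sketch. This is not the Integrity or OPAC implementation, which is not shown here; the Event fields, the severity cutoff, the grouping by an upstream "parent" element, and the threshold of three affected elements are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical alarm record; the field names are illustrative only."""
    source: str    # element that raised the alarm, e.g. a modem
    parent: str    # assumed upstream element, e.g. a modem bank
    kind: str      # alarm type
    severity: int  # higher means more serious


def filter_events(events, min_severity=2):
    """Drop low-severity events and exact duplicates, preserving order."""
    seen = set()
    kept = []
    for event in events:
        if event.severity < min_severity or event in seen:
            continue
        seen.add(event)
        kept.append(event)
    return kept


def correlate(events):
    """Group events by their shared upstream element."""
    groups = defaultdict(list)
    for event in events:
        groups[event.parent].append(event)
    return dict(groups)


def summarize(groups, threshold=3):
    """Emit one message per suspected common cause, else one per event."""
    messages = []
    for parent, group in groups.items():
        sources = sorted({event.source for event in group})
        if len(sources) >= threshold:
            # Many distinct symptoms under one parent: report the parent once.
            messages.append(
                f"{parent}: probable common fault, {len(sources)} elements affected"
            )
        else:
            messages.extend(f"{event.source}: {event.kind}" for event in group)
    return messages
```

Calling summarize(correlate(filter_events(events))) on a burst of modem alarms from the same bank would yield a single summary line for that bank rather than one line per modem, which is the kind of reduction the operators needed.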
Results
The number of events sent on to operators was reduced from 2 million to under 1000 per day. These messages were concise summaries of problems, and of groups of symptoms related to a common root cause. As a result, staffing did not need to increase much even though the network grew dramatically. Service levels are also very high, generally rated the best or among the best in the industry.
Gensym’s Integrity product line was productized by us as a result of this collaboration with AT&T. It is built around the OPAC graphical language for scalable, event-oriented processing.