Are you passionate about making your Production Support team better? Join me as we explore topics in Production Support of Mission Critical applications.
Monday, August 12, 2013
The Jenga Stack: Application Monitoring
One of the most critical responsibilities of an application support team is Application Health Management (see my previous entry, The 6 Managements of Prod Support), also known as Monitoring and Alerting. From experience, I believe monitoring application health has three layers, plus a few additional areas that must also be monitored, lest availability be impacted.
These three layers work much like a Jenga stack. As individual components, or blocks, start coming out of service, system stability degrades until eventually the stack comes crashing down. Application Support should know when individual components have been impacted and should be able to take proactive steps to restore them to 100% service.
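To make the Jenga analogy concrete, here is a minimal Python sketch of how a stack's overall state could be derived from the state of its blocks. The names (Status, stack_status, out_of_service) and the min_in_service threshold are assumptions made for illustration; they do not come from any particular monitoring tool.

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    HEALTHY = "healthy"    # every block is in place
    DEGRADED = "degraded"  # some blocks are out, the stack still stands
    DOWN = "down"          # too many blocks are out, the stack has fallen


def stack_status(components: Dict[str, bool], min_in_service: int) -> Status:
    """Summarise a stack of components.

    components maps a (hypothetical) component name to True when it is in
    service. min_in_service is an assumed threshold: the fewest components
    that must be up for the application to keep serving users.
    """
    in_service = sum(1 for up in components.values() if up)
    if in_service == len(components):
        return Status.HEALTHY
    if in_service >= min_in_service:
        return Status.DEGRADED
    return Status.DOWN


def out_of_service(components: Dict[str, bool]) -> List[str]:
    """Return the names of the blocks that need to be put back, sorted."""
    return sorted(name for name, up in components.items() if not up)
```

The useful state for a support team is DEGRADED: the application is still up, but out_of_service says which blocks to put back before the whole stack comes down.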
The three layers I’m talking about are:
1) Machine Health: Total CPU, Total Memory, Total Disk, Total Swap, Network interfaces, etc.
2) Basic Application monitoring: Processes up and running, process memory utilization, basic error and exception checking, smoke tests, etc.
3) Business Process monitoring: Transactions occurring correctly, Transactions occurring within performance SLA, Transaction acknowledgements, Transaction persistence in the database, etc. For financial applications: market data feeds, user sessions, pricing, etc.
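As a rough illustration of the three layers, the Python sketch below implements one simple check per layer using only the standard library. The function names, the thresholds (90% disk, zero tolerated error lines, an SLA in milliseconds) and the "ERROR"/"Exception" log markers are all assumptions chosen for the example; a real team would take these from its own tooling and SLAs.

```python
import shutil
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CheckResult:
    layer: str   # "machine", "application" or "business"
    name: str
    ok: bool
    detail: str


def check_disk(path: str = "/", max_used_pct: float = 90.0) -> CheckResult:
    """Layer 1, machine health: is the disk holding `path` too full?"""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    return CheckResult("machine", "disk", used_pct <= max_used_pct,
                       f"{used_pct:.1f}% used on {path}")


def check_log_errors(log_lines: Sequence[str], max_errors: int = 0) -> CheckResult:
    """Layer 2, basic application monitoring: count error/exception lines.

    The "ERROR" and "Exception" markers are assumed log conventions.
    """
    errors = [line for line in log_lines
              if "ERROR" in line or "Exception" in line]
    return CheckResult("application", "log_errors", len(errors) <= max_errors,
                       f"{len(errors)} error line(s)")


def check_transaction_sla(latencies_ms: Sequence[float], sla_ms: float) -> CheckResult:
    """Layer 3, business process monitoring: did every transaction meet the SLA?"""
    breaches = [ms for ms in latencies_ms if ms > sla_ms]
    return CheckResult("business", "transaction_sla", not breaches,
                       f"{len(breaches)} of {len(latencies_ms)} over {sla_ms} ms")


def failing(results: Sequence[CheckResult]) -> List[CheckResult]:
    """Return only the checks that should raise an alert."""
    return [r for r in results if not r.ok]
```

A scheduler could run all three checks every minute and alert on whatever failing returns. Tagging each result with its layer makes it easy to see whether a problem started at the machine, in the application, or in the business process itself.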
The other pieces that I didn’t include in these three layers are middleware components. The reason I don’t bundle them with the application is twofold. First, middleware components such as application servers, databases and queuing systems are not really part of the application itself. Second, in most medium and large organizations, the health of shared infrastructure is monitored by a separate team.