Are you passionate about making your Production Support team better? Join me as we explore topics in Production Support of Mission Critical applications.
Tuesday, August 6, 2013
The 6 Managements of Prod Support
There are many processes that support teams must follow. But if you can do these 6 things correctly, you'll have a solid base to build on as a Prod Support group:
Application Health Management: This is also known as Monitoring and Alerting. At any given point, the support staff needs to know how the application is doing and take proactive measures to ensure continued availability. At the low end of the spectrum, this can be done by sending e-mail alerts when something isn't right in the environment. A better solution is to develop graphical dashboards that provide a Red/Amber/Green status for the various components of the application.
Capacity Management: This is also known as Capacity Planning. Capacity Planning goes hand-in-hand with Application Health Management. By correlating Key Performance Indicators in the hardware and software with transactional volume, a Prod Support team can anticipate how stable an application will be during periods of high volume.
Change Management: In my experience, many, if not most, of the problems arising in Production can be traced to changes in the environment. Whether it's a botched release or someone fat-fingering an IP address in a configuration file, changes expose applications to instability. Managing who can make changes and how those changes are made can significantly help application availability.
Incident Management: Things will go wrong. Otherwise, there would be no Production Support teams to worry about availability. When things do go wrong, a speedy recovery focused on service restoration is critical to mitigate the impact on business users.
Problem Management: This is how stability improves. Incident Management should logically segue into Problem Management. If Incident Management answers the question of how service will be restored, Problem Management answers the question of how the outage will be prevented from recurring.
Customer Relationship Management: Production Support teams should be intimately aware of who their stakeholders are and how to communicate with them. A solid documentation set describing how system outages affect stakeholders and how to contact them enables Production Support teams to prioritize recovery steps. It also helps the team reach business users when an issue has occurred, so that those users can execute workaround steps if needed.
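To make the Red/Amber/Green idea from Application Health Management a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the component names, the utilization readings, and the 75%/90% thresholds are hypothetical, and a real dashboard would pull its readings from whatever monitoring tool you already use.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the right values depend on the application.
AMBER_THRESHOLD = 0.75
RED_THRESHOLD = 0.90


@dataclass
class ComponentReading:
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def rag_status(reading):
    """Map one utilization reading to a Red/Amber/Green status."""
    if reading.utilization >= RED_THRESHOLD:
        return "RED"
    if reading.utilization >= AMBER_THRESHOLD:
        return "AMBER"
    return "GREEN"


def dashboard(readings):
    """Return a name -> status mapping for every component."""
    return {r.name: rag_status(r) for r in readings}


def overall_status(statuses):
    """Treat the application as only as healthy as its worst component."""
    for level in ("RED", "AMBER"):
        if level in statuses.values():
            return level
    return "GREEN"


if __name__ == "__main__":
    # Made-up sample readings for three made-up components.
    readings = [
        ComponentReading("database", 0.62),
        ComponentReading("message_queue", 0.81),
        ComponentReading("web_tier", 0.93),
    ]
    statuses = dashboard(readings)
    for name, status in statuses.items():
        print(f"{name}: {status}")
    print("overall:", overall_status(statuses))
```

Rolling the components up into a single worst-case status is one design choice among several; it keeps the top-level view simple, at the cost of hiding how many components are actually unhealthy.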
Underlying all these processes is Staff Training as well as Metrics and Reporting.
Why training? You can have all the processes in the world, but unless the support staff knows the application and how to apply those processes, the processes won't be effective.
Metrics and Reporting, on the other hand, is how you track that each of these processes is being followed. With metrics you identify improvement areas for the entire team and the suite of applications they support.
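As one example of what such a metric might look like, the sketch below computes the average time to restore service per application. The choice of metric, the application names, and the incident records are all hypothetical; in practice the data would come from your incident tracking system.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (application, outage start, service restored).
INCIDENTS = [
    ("billing", datetime(2013, 7, 1, 9, 0), datetime(2013, 7, 1, 9, 45)),
    ("billing", datetime(2013, 7, 15, 14, 0), datetime(2013, 7, 15, 14, 15)),
    ("orders", datetime(2013, 7, 20, 2, 30), datetime(2013, 7, 20, 4, 30)),
]


def mean_minutes_to_restore(incidents):
    """Average outage duration in minutes, grouped by application."""
    durations = defaultdict(list)
    for app, started, restored in incidents:
        durations[app].append((restored - started).total_seconds() / 60)
    return {app: mean(values) for app, values in durations.items()}


if __name__ == "__main__":
    for app, minutes in mean_minutes_to_restore(INCIDENTS).items():
        print(f"{app}: {minutes:.0f} minutes on average to restore service")
```

Tracked month over month, a number like this shows whether Incident Management is getting faster and which applications deserve attention first.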