Production Support Blog: The Info You Need When You Need it Most: Runbooks

Friday, September 13, 2013

The Info You Need When You Need it Most: Runbooks

For this post, I'm going to continue to focus on the knowledge aspect of an application. In particular, I'll talk about Runbooks.

Runbooks should be the first point of reference for anything related to an application. Each and every application you support should have one. Otherwise, it would be like flying an airplane without a manual (for those who didn't catch the reference: every pilot has to consult the airplane's manual when starting it up, no matter how familiar they are with the model).

Runbooks should contain some key information about an application.

The most important section a runbook should contain is a Business Context section, which gives readers an idea of the business processes the application supports, their criticality and their potential financial impact. Most runbooks I've seen don't contain this section, but I like to have it in place. It should help drive home to a group of techies that they don't support some technology or application, but a business.

Runbooks should inform the analyst about the Architecture of an application. The Architecture section should provide an overview of the servers and databases the application communicates with, as well as a network context for the application. It should also depict any middleware being used and give an idea of the other upstream and downstream dependencies.

Another key section for the Runbook is an Administration section. This section should tell the user how to restart processes and should document scheduled jobs, breakglass procedures and start/end-of-day checks.

Likely, the most critical section in a runbook, when it comes to incidents, is a Monitoring and Alerting section. This section of the runbook should provide a list of common alerts and how to resolve them. This section might also contain information about the eyes-on-glass procedures for monitoring the application.
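
To make that list of common alerts quick to search during an incident, it could be kept as structured data. Here is a minimal Python sketch of the idea. The alert name, its resolution steps and the lookup helper are all made up for illustration; a real runbook would use the application's own alerts.

```python
from dataclasses import dataclass, field


@dataclass
class AlertEntry:
    """One row of a runbook's Monitoring and Alerting section."""
    name: str
    meaning: str
    resolution_steps: list = field(default_factory=list)


# Hypothetical catalogue: the alert and its steps are placeholders,
# not taken from any real application.
ALERTS = {
    "QUEUE_DEPTH_HIGH": AlertEntry(
        name="QUEUE_DEPTH_HIGH",
        meaning="Messages are arriving faster than they are consumed.",
        resolution_steps=[
            "Check that the consumer process is running.",
            "Restart the consumer as described in the Administration section.",
        ],
    ),
}


def lookup(alert_name, catalogue=ALERTS):
    """Return the documented resolution steps for an alert.

    An unknown alert yields a single step pointing the analyst
    to the Escalation section instead of raising an error.
    """
    entry = catalogue.get(alert_name)
    if entry is None:
        return [f"No runbook entry for {alert_name}; see the Escalation section."]
    return entry.resolution_steps


if __name__ == "__main__":
    for step in lookup("QUEUE_DEPTH_HIGH"):
        print("-", step)
```

Returning a fallback step for unknown alerts, rather than failing, keeps the analyst moving when an alert has not been documented yet.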

Next in criticality after the Monitoring and Alerting section is the Escalation section. The contact details for Development and key Business users should be documented there, along with contact information for key Infrastructure teams and Upstream/Downstream teams.

A section which provides more detail about how the application works would be an Application Deployment section. This section should contain information like which locations an application is deployed in and what dependencies it has.

The Monitoring and Alerting section should be supplemented with a Troubleshooting section which captures the most common issues, known bugs and limitations.

A Tools section listing the common tools the team uses for troubleshooting is worth documenting as well. New team members would certainly appreciate a handy list of the tools their teammates use, ideally with links for downloading and installing them.
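
The sections described above can double as a checklist for reviewing an existing runbook. The following is a rough Python sketch under one big assumption: that the runbook is available as a dictionary mapping section titles to their text. That representation and the function name are my own invention for the example.

```python
# Section names as used in this post, in the order they were introduced.
REQUIRED_SECTIONS = [
    "Business Context",
    "Architecture",
    "Administration",
    "Monitoring and Alerting",
    "Escalation",
    "Application Deployment",
    "Troubleshooting",
    "Tools",
]


def missing_sections(runbook):
    """Return the required sections that are absent or contain only whitespace.

    `runbook` is assumed to be a dict of {section title: section text}.
    The result keeps the order of REQUIRED_SECTIONS.
    """
    return [
        name
        for name in REQUIRED_SECTIONS
        if not runbook.get(name, "").strip()
    ]


if __name__ == "__main__":
    # Hypothetical draft with one real section and one empty placeholder.
    draft = {
        "Architecture": "Two app servers behind a load balancer, one database.",
        "Tools": "   ",
    }
    for name in missing_sections(draft):
        print("Missing:", name)
```

A section that exists but is blank is reported as missing, since an empty heading helps nobody during an incident.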

A final word about Runbooks. Do you want to assess your team's proficiency when it comes to application knowledge? Make a bulleted list of the sections in your runbook, pick some topics from each section and turn them into a little quiz. You'll now have a quick-and-dirty way to find out the team's proficiency level.
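
If you'd rather not pick the quiz topics by hand, the selection can be scripted. This is only a sketch: the topic lists, the question wording and the function name are placeholders I made up, and you would fill in topics from your own runbook.

```python
import random


def build_quiz(topics_by_section, per_section=1, seed=None):
    """Pick up to `per_section` random topics from each runbook section.

    `topics_by_section` maps a section title to a list of topic strings.
    Passing a `seed` makes the selection repeatable.
    """
    rng = random.Random(seed)
    quiz = []
    for section, topics in topics_by_section.items():
        # Never ask for more topics than the section actually has.
        count = min(per_section, len(topics))
        for topic in rng.sample(topics, count):
            quiz.append(f"[{section}] Explain: {topic}")
    return quiz


if __name__ == "__main__":
    # Hypothetical topics; replace them with entries from a real runbook.
    topics = {
        "Business Context": ["financial impact of an outage", "critical business processes"],
        "Administration": ["restarting processes", "start of day checks"],
        "Escalation": ["who to call in Development"],
    }
    for question in build_quiz(topics, per_section=1, seed=2013):
        print(question)
```

Drawing the same number of topics from every section keeps the quiz from leaning on whichever section happens to be longest.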

3 comments:

Jim Hirschauer, September 17, 2013 at 1:35 PM
Very nice write-up on runbooks. Documentation is an area that usually is lacking within IT organizations. What is your opinion on Application Aware Runbook Automation and how it fits into what you've written above? Here is a link for your reference... http://www.appdynamics.com/blog/2013/03/14/application-runbook-automation-detailed-walk-through/

Full Disclosure: I work for AppDynamics but I am genuinely interested in your opinion.

Rajan VA, September 17, 2013 at 6:04 PM
Nice... I agree with your comment on including some business context. It will help the analyst do a quick and meaningful impact assessment.

Overuse or underuse of a runbook can point to different areas to focus on: application stability, whether the runbook is being kept updated, the quality of the runbook, etc.