Thursday, August 15, 2013

Capacity Management Shouldn't Be An Afterthought

Capacity management, or capacity planning, is an often-forgotten process that can make a huge difference to application availability. When applications first come online, both the hardware and the software are sized to handle the business volumes at that point in time. However, transaction volumes never stay the same and typically grow faster than anticipated. The reasons vary, but generally a new application creates efficiencies that help the business grow. It's also very hard to anticipate business growth and external factors such as regulatory changes. All of these changes can increase the demand for resources in the system, and Production Support teams need to be aware of how this growth in volume can affect application performance.


Capacity Management tries to avoid reactive approaches to sizing systems (usually at the point of impact) and instead looks to cost-effectively prevent system degradation caused by volume increases.

At a minimum, Capacity Management programs need to track basic system Key Performance Indicators (KPIs) such as CPU, Memory, Disk, Swap and Network utilization. The real power of capacity planning comes when transaction volumes can be correlated with the individual system KPIs. This enables teams to answer critical questions such as: If volume increases by x%, how well will my system perform? If your capacity management program doesn't answer that question with a good degree of certainty, then you don't have much of a program at all.

Beyond the basic KPIs, Production Support teams need to identify system-specific metrics that can give them insight into the performance of the tools they manage. For example, I used to run a group in charge of supporting FX platforms. A key indicator for FX is how quickly a price can be generated once markets tick. So pricing latency was a critical KPI for that system.

Capacity metrics need to be tracked and reported on at least monthly. A thorough analysis needs to be done to determine how transaction volumes are affecting KPIs, and a forecast should be produced to determine how upcoming volumes will impact performance. This can be done with simple regression analysis, where you look at how your transaction volumes correlate with your KPI numbers. You can find many articles on the Internet about how to perform regression analysis with tools like Excel. Some capacity planning tools do the forecasting part as a built-in feature.

Regression analysis will provide two values (amongst others) which are critical for Prod Support teams:
  1. R-squared: This tells you how much of the variation in a KPI is explained by your transaction volume, on a scale from 0 to 1. Because a KPI such as CPU percentage is driven by more than transaction volume alone, you shouldn't expect a perfect fit. The higher this value, though, the better.
  2. Regression coefficient (slope): This is the key value. It tells you how much a particular KPI will change for a given increase in transaction volume, for example, what happens to CPU utilization if your volumes go up by 1%.
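
For readers who would rather script this than use Excel, here is a minimal sketch of a simple least-squares fit in plain Python. The sample numbers, the `fit_line` helper and the 250k forecast point are all made up for illustration; a real program would pull volumes and KPI readings from its own monitoring data.

```python
from statistics import mean


def fit_line(volumes, kpi_values):
    """Ordinary least-squares fit of kpi = intercept + slope * volume.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(volumes), mean(kpi_values)
    sxx = sum((x - x_bar) ** 2 for x in volumes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(volumes, kpi_values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R-squared: share of the KPI's variation explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(volumes, kpi_values))
    ss_tot = sum((y - y_bar) ** 2 for y in kpi_values)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical monthly samples: daily transactions (thousands) and peak CPU %.
volumes = [100, 120, 140, 160, 180, 200]
cpu_pct = [31, 35, 41, 44, 50, 54]

slope, intercept, r_squared = fit_line(volumes, cpu_pct)

# Forecast the KPI at a volume we have not seen yet (250k transactions/day).
forecast_cpu = intercept + slope * 250

print(f"slope={slope:.3f} CPU points per 1k transactions, R-squared={r_squared:.3f}")
print(f"forecast CPU at 250k transactions: {forecast_cpu:.1f}%")
```

The slope is expressed per unit of volume, so to answer a "what if volume grows by x%" question, convert the percentage into a volume first and then apply the fitted line.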

The KPIs also need to be compared against predetermined Service Level Agreements (SLAs). To continue with the pricing example above, knowing how quickly we should be generating a price is critical. Let's say the target is 5 milliseconds and it currently takes us 1 millisecond to generate a price. Our forecast shows that increasing volumes by 50% will degrade pricing performance to 4 milliseconds. We know, then, that it would be OK to increase volume by 50%, as we'd still stay within the SLA.
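
The same numbers can be turned into a rough headroom estimate. The sketch below assumes latency keeps rising in a straight line between the current and forecast points, which is my simplification and not something the forecast guarantees; systems often degrade faster as they approach saturation, so treat the result as an upper bound. The function name `max_growth_within_sla` is hypothetical.

```python
def max_growth_within_sla(current_ms, forecast_ms, forecast_growth, sla_ms):
    """Largest fractional volume growth that keeps latency at or under the SLA.

    Assumes latency rises linearly with volume growth, using the current
    reading and one forecast point to set the slope.
    """
    ms_per_unit_growth = (forecast_ms - current_ms) / forecast_growth
    if ms_per_unit_growth <= 0:
        # Latency does not degrade with volume in this model.
        return float("inf")
    return max(0.0, (sla_ms - current_ms) / ms_per_unit_growth)


# Pricing example: 1 ms today, 4 ms forecast at +50% volume, 5 ms SLA target.
headroom = max_growth_within_sla(
    current_ms=1.0, forecast_ms=4.0, forecast_growth=0.5, sla_ms=5.0
)
print(f"Volume can grow by about {headroom:.0%} before breaching the SLA")
```

Under that linear assumption the SLA would be reached at roughly two-thirds more volume, which is consistent with a 50% increase still being safe.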

My capacity planning tool of choice is TeamQuest, which is actually quite nice. I'm keen to hear from other folks which other tools they've used and what their experience with them is.
