
Friday, September 27, 2013

How to Hire Support Staff

Hiring the right personnel is one of the most critical things you can do as a Support manager. In an era of small budgets, you need all the people you can get, and you need them firing on all cylinders. Otherwise, you'll spend the next six months trying to teach your new hire about your application, your processes, your business and so on, only to find out they "don't get it."

So how do you ensure you hire people who "get it"? There are several things you can do to ensure hiring success:
  1. A good phone screen. I like to split phone screens into four parts. First, I explain the role and what I'm looking for in a candidate. Second, I have the candidate take me through their resume. I leave this open-ended so I can also assess their communication skills. I also look for them to explain how their past experience makes them a good fit. Third, I ask them a series of short-answer technical questions, e.g. "What command would you use to find a string in a log file?" Any keyword they put on their resume is fair game. For example, if you say you know C, you'd better know what the static keyword means in C (hint: it's not the same as in C++). Finally, I give them the opportunity to ask me questions.
  2. Fair questions in face-to-face interviews. I generally ask three technical questions when I interview face-to-face. These are all written/on-the-board exercises. I ask a SQL question, usually how to perform an implicit join. I ask a scripting question related to the OS relevant to the role. I also ask them to write a short program in any compiled or scripting language they're comfortable with, for example, "How would you implement a function that reverses a string?" The idea is not to "stump" people or to show off your own knowledge, but to assess what they know.
  3. Answer your own question: "Am I able to work with this person?" This might be the most important piece of the puzzle. Even if a person has all the skills the role requires, they'll also need the right personality to get hired. You know your group's personality and culture. Never hire someone who won't fit in. You'll only end up losing them as soon as you're done training them.
The above pieces of information about a candidate should give you enough insight into their skills and their personality. You can ask more, or you can ask less. But for me, what I outlined above has become a winning formula.
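The post doesn't give model answers to the face-to-face exercises, so here is one possible answer to two of them, sketched in Python. The table names and rows are hypothetical, and SQLite (through Python's built-in sqlite3 module) stands in for whatever database your shop uses. An implicit join lists the tables in the FROM clause and puts the join condition in the WHERE clause instead of using an explicit JOIN ... ON.

```python
import sqlite3


def reverse_string(s: str) -> str:
    """Reverse s by swapping characters from both ends toward the middle."""
    chars = list(s)
    left, right = 0, len(chars) - 1
    while left < right:
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1
    return "".join(chars)


def implicit_join_demo() -> list:
    """Return (app name, incident summary) rows using an implicit join."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables and rows, for illustration only.
    conn.execute("CREATE TABLE apps (app_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE incidents (incident_id INTEGER PRIMARY KEY, app_id INTEGER, summary TEXT)"
    )
    conn.executemany("INSERT INTO apps VALUES (?, ?)", [(1, "Pricing"), (2, "Reports")])
    conn.executemany(
        "INSERT INTO incidents VALUES (?, ?, ?)",
        [(10, 1, "Login failures"), (11, 2, "Report timeout"), (12, 1, "Slow quotes")],
    )
    # Implicit join: both tables in FROM, join condition in WHERE.
    rows = conn.execute(
        "SELECT a.name, i.summary "
        "FROM apps a, incidents i "
        "WHERE a.app_id = i.app_id "
        "ORDER BY i.incident_id"
    ).fetchall()
    conn.close()
    return rows


print(reverse_string("support"))  # troppus
print(implicit_join_demo())
```

In Python, s[::-1] reverses a string in one step; the explicit loop is closer to what a whiteboard answer usually shows, because it exposes the candidate's reasoning.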

Wednesday, September 25, 2013

Whose Issue Is This?

One of the things I challenge my teams on is following through on issues and user inquiries. Many times issues come our way, only for us to find out that they're really within another team's scope to correct or address. This can happen for several reasons. For example, a business user may have learned from experience that your team provides the best turnaround time on issues. It could also be that the documentation on whom to contact is unclear, and the business user goes knocking on the first door she finds.

In many cases, I see teams simply forward the e-mail or ticket to another group. Often, the e-mail or ticket won't have full documentation on timeline, impact, history, etc. The team receiving it might not react with the right level of urgency. In fact, I've seen issues go on for days like this, passed from one team to another. In the meantime, the business user just waits in frustration.

A better way to prevent long-running e-mail threads that lead nowhere is for the receiving team to follow through on the issue as if it were their own. In my opinion, if a business user sends an issue your way, you should own it to completion. Instead of forwarding the e-mail or ticket, get the right team(s) on a call and ask the right questions. Communicate the right level of urgency on that call as well. Just choosing the right forum (phone call versus e-mail) can significantly cut down on turnaround time.

Once you have an answer, personally deliver it to the business user. Don't expect other teams to do it. They might not have the same finesse and level of service that you have. Remember, you always want to keep those business users delighted.

Many teams will complain that they don't have enough staff to personally handle each of these issues or inquiries. I would challenge that assumption. Many times, we spend more time fighting fires and explaining bad results than it would have taken to just manage the issue to completion. Guess who the business user will complain about if the issue doesn't get addressed on time?

So skip that coffee or tea break if you have to. Challenge yourself to provide your users the best service possible. They'll thank you for it, and your organization's growth will reflect it (so will your bottom line).

Tuesday, September 24, 2013

Are You Sure About Your Monitoring?

Today we had an embarrassing issue. It started at 1:00 AM, and we didn't catch the problem until 10:00 AM, when business users reported they were missing some data. So, basically, we went nine hours without knowing something was wrong. As it turns out, we had a monitoring gap.

A log monitor that captures certain strings in the file did not capture one of the strings it was configured for. Here's the timeline of events explaining why it missed the error:
  1. The monitor was set up to tail the log file every 5 minutes, capturing everything written to the log since the last time the monitor ran. This is by design in the vended application we use for monitoring.
  2. The monitor ran at 12:59 AM and didn't find any errors.
  3. The error came in at 1:00:59 AM with the string "LOG EXCEPTION".
  4. The log file rolled because it has a size limit.
  5. The monitor ran at 1:04 AM and tailed the file again, but the error was now in the rolled file.
  6. At 10:00 AM, the business user reported the problem.

Gotcha! Clearly we missed this in our thinking when we set up the monitor. We've now configured our monitoring to always look at the last two log files. Since the files don't grow very quickly, that should suffice given the 5-minute interval.
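Here is a minimal Python sketch of the idea behind the fix: when the monitor notices the log has rolled, it finishes reading the rolled file from where it left off before scanning the new file. It assumes, hypothetically, that a roll renames app.log to app.log.1 and that a shrinking file size signals a roll. This is not how SiteScope works internally; it only illustrates the gap and one way to close it.

```python
import os
from dataclasses import dataclass


@dataclass
class MonitorState:
    offset: int = 0  # bytes of the current log file already scanned


def read_from(path: str, offset: int) -> bytes:
    """Return the bytes of path from offset onward (empty if the file is missing)."""
    if not os.path.exists(path):
        return b""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read()


def scan_log(log_path: str, pattern: str, state: MonitorState) -> list:
    """One monitor run: return lines containing pattern written since the previous run.

    Hypothetical assumptions: a roll renames log_path to log_path + ".1" and
    starts a new, smaller log_path; and (as in the post) the log can't roll
    and regrow past the old offset within a single monitoring interval.
    """
    lines = []
    current_size = os.path.getsize(log_path) if os.path.exists(log_path) else 0
    if current_size < state.offset:
        # The file shrank, so it rolled: finish the rolled file from where we stopped.
        rolled = read_from(log_path + ".1", state.offset)
        lines += rolled.decode("utf-8", errors="replace").splitlines()
        state.offset = 0
    fresh = read_from(log_path, state.offset)
    state.offset += len(fresh)
    lines += fresh.decode("utf-8", errors="replace").splitlines()
    return [line for line in lines if pattern in line]
```

Replaying the timeline above against this sketch (error written, log rolled, monitor runs) returns the "LOG EXCEPTION" line on the run after the roll instead of missing it. Tracking the byte offset, rather than rescanning both files in full, also avoids re-alerting on errors that were already reported.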

So if you use SiteScope, keep this in mind. Don't get caught with your pants down. I'm sure by now I've lost everyone who uses SiteScope for monitoring (they're off checking that they don't have similar gaps). :-)

Cheers!

Friday, September 20, 2013

A Tough Interview Question and How to Ace It

I was asked the following question in a forum, and I thought it was worth sharing as a blog post:

Question (edited):

During a job interview, I was given the following scenario to test my ability to handle difficult situations as a Production Support analyst.
You receive calls from two business users, and:
  1. You are alone covering the shift.
  2. The issues are not documented in the run book.
  3. Both stakeholders are insisting their issue is critical.
  4. The severity level of both issues is the same.

Issue 1: The business user is saying they are unable to log into the application. This is happening to multiple business users and you know this is occurring during peak hours.

Issue 2: After the first call, you get a call from a different business user saying they are unable to generate reports to validate the data for another application.

Which issue should be given priority? How would you handle this situation?

Answer:

Several things are going on here:
  1. For the first reported issue, you don't have a clear indication of impact. If people can't trade, that might be more important than producing a report (the second issue), despite what you heard on the phone from either partner. However, it sounds like the report is needed for reconciliations, which some groups depend on for trading. In this case, you need clarification on the issues. Get the business users on the phone again and find out more. One of the things you have to get really good at in Support is making sure you really understand the problem. Sometimes, calling the users back and getting clarity is the only way to accomplish this.
  2. If you're alone on a shift and need help, call and get it. It's better to take a few minutes to escalate than to try going at it alone. Remaining calm and really thinking about the best approach is a sign of maturity in a Support associate. Wake someone up if you have to. I always tell my guys it's better to wake someone up than to let things fall apart and cause financial loss.
  3. Keep in mind that no two issues are ever exactly the same in terms of urgency; one is usually more urgent than the other. But suppose they were the same and you couldn't get help. In that case, all you can do is work them on a first-come, first-served basis. Being the sole person on a shift without enough bandwidth to handle multiple issues is more a sign of bad coverage (and ineffective management) than anything. Of course, you wouldn't say that in an interview ;)

I hope you find this answer helpful and that it will help you ace your next Production Support job interview!

Wednesday, September 18, 2013

Examples on Increasing Productivity: Part 2 (For Managers)

A reader asked me a good question: "Would you provide concrete examples on how to increase the productivity of my Production Support team?" Although I'll answer with points that would be important for any team, not just Support, I'll provide examples that apply more directly to Production Support groups.


This is Part 2 of this article. Part 1 covers what associates can do.

The first thing you can do as a manager is set high expectations for yourself and your team. This means two things: set expectations, and make sure they're challenging enough. One thing that has worked great for me is doing strategic planning at the beginning of every year with all my directs. We keep it simple. We identify things we'd like to improve about the applications we support (the Challenges). Then we come up with Action Items. Action Items define three things: where we are, where we want to be, and what we're going to do to get there. Incidentally, there's a technical name for talking about the challenges without coming up with what you're going to do to fix them... It's called complaining.

Make sure the action items are achievable (yes, you can use SMART goals), but also make sure they'll challenge your team to do their best. Having clear guidelines and defined projects has worked wonders for the amount of work my groups achieve. People don't come into work wondering what they're going to do. If the BAU work (incidents, service requests, etc.) is low, then it's time to pull out the plan and work on those strategic objectives. I review plan progress weekly and provide quarterly updates to senior management to ensure my directs' work is highlighted and they're getting the right level of visibility.

Keep a constant eye out for ways to maximize your team's productivity. Just because you have a plan doesn't mean you can't add important items as they come up. Also, if something previously identified as important no longer is, remove it from the plan. Work only on the things that add value to your group.

As I've said in prior posts, take time to reward and reinforce productive behaviors. Production Support teams go through a lot of stress and team members need to know that their work is not going unnoticed.

Finally, use your metrics to determine areas for improvement and to track how productive your teams are. If all the work you've planned isn't having a positive impact on your Availability metrics, Support effort, time tracking, etc., then you're not focusing on the right things. Keep the Purpose of Production Support in mind in everything you do.
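Availability is commonly reported as the percentage of a measurement window during which the application was up. The Python sketch below assumes that definition, with outages given as (start, end) timestamps; your organization may measure only business hours or weight partial outages differently. Overlapping outages are merged so the same downtime isn't counted twice.

```python
from datetime import datetime, timedelta


def availability_pct(window_start, window_end, outages):
    """Percent of [window_start, window_end) the application was up.

    outages is a list of (start, end) datetimes; they are clipped to the window
    and merged so overlapping outages aren't double counted.
    """
    total = (window_end - window_start).total_seconds()
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in outages
        if end > window_start and start < window_end
    )
    down = 0.0
    run_start = run_end = None
    for start, end in clipped:
        if run_end is None or start > run_end:
            # Close the previous merged outage and start a new one.
            if run_end is not None:
                down += (run_end - run_start).total_seconds()
            run_start, run_end = start, end
        else:
            # Overlapping or touching outage: extend the current one.
            run_end = max(run_end, end)
    if run_end is not None:
        down += (run_end - run_start).total_seconds()
    return 100.0 * (total - down) / total


# Hypothetical 1,000-minute window with a single 1-minute outage.
start = datetime(2013, 9, 1)
end = start + timedelta(minutes=1000)
outage = (start + timedelta(minutes=10), start + timedelta(minutes=11))
print(availability_pct(start, end, [outage]))  # 99.9
```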

Examples on Increasing Productivity: Part 1 (For Associates)



So, let's start with what you can do as an associate to improve your team's productivity. There's an implication here: productivity increases are not just the responsibility of managers (though we'll talk about things managers can do). First of all, be open: ask your manager, "What can we do to be more productive?" Many times, as Support groups, we get bogged down in day-to-day tactical activities and don't spend enough time thinking strategically. A question like this one, asked during a team meeting, might spark a conversation with your entire group about the things that can be put in place. Collect those ideas and come up with approaches to implement them.

Another thing you can do is look for ways to increase internal and external client satisfaction. Let me provide an example of each:
  • Internal: Just last night, I had a conversation with one of my directs. A user had asked a question, and it was taking longer than normal to resolve. The gist of it was that a different Support group should have been handling the query, but somehow it landed in my team's lap. My team had forwarded the e-mail to the other Support group, and there had been no response. My challenge to the team was that we should take more ownership of issues. It would have been better to call the user to clarify the problem. Instead of sending an e-mail, it would have been better to engage the other Support group directly, over the phone, so that a richer conversation could happen. This would have been a great opportunity to transfer accountability, reassign tickets, convey urgency, etc.
  • External: At a prior gig, we had many institutional banking customers who connected to our systems to receive prices on financial instruments. If a client was not connected to us, they were also not dealing with us, which meant lost revenue. We went through the logs and found the approximate times that customers normally connected (we didn't have documented SLAs, a problem we inherited). We set up monitoring for each customer with a threshold, so that if a customer hadn't connected within a set period after their usual time, an alert showed up in our dashboards. This prompted us to call the client and ask them to connect. Many times they didn't know they weren't connected. This small effort increased revenues for the bank, and customers really appreciated being notified.
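The connection check described above can be sketched in Python as follows. The client names, the 30-minute grace period, and the use of the median historical connect time as the "usual" time are all assumptions for illustration; the post doesn't name the monitoring tool that was actually used.

```python
from datetime import datetime, time, timedelta
from statistics import median


def usual_connect_time(history):
    """Median time of day at which a client connected, from past connection timestamps."""
    minutes = int(median(t.hour * 60 + t.minute for t in history))
    return time(minutes // 60, minutes % 60)


def missing_clients(usual, connected_today, now, grace=timedelta(minutes=30)):
    """Clients past their usual connect time plus grace who haven't connected today."""
    alerts = []
    for client, usual_time in sorted(usual.items()):
        deadline = datetime.combine(now.date(), usual_time) + grace
        if now >= deadline and client not in connected_today:
            alerts.append(client)
    return alerts


# Hypothetical clients: BankA's usual time is derived from its logs, BankB's is given.
usual = {
    "BankA": usual_connect_time(
        [datetime(2013, 9, 2, 7, 55), datetime(2013, 9, 3, 8, 5), datetime(2013, 9, 4, 8, 0)]
    ),
    "BankB": time(9, 0),
}
print(missing_clients(usual, {"BankB"}, datetime(2013, 9, 5, 8, 31)))  # ['BankA']
```

Each name returned would become a dashboard alert prompting a call to that client.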

Even associates can help when it comes to expense reduction. I was at a company where we used a monitoring tool that cost over $1MM in licensing annually. It was quite feature-rich, had a great graphical interface, etc. But as it turned out, we needed something more basic: a simple dashboard that displayed alerts was all we needed. Most of the Support people in my group had Development backgrounds, so we took on a project to build our own monitoring tool. A few months later, we delivered the tool and were able to replace the vended software, saving that $1MM in expense.

Increasing productivity can also mean stopping low-value tasks and doing more productive ones. I was in a Support group where the monitoring was quite noisy. There were tons of alerts, and people would clear them out every day. Day in, day out: clear the alert, repeat. That is low-value work. Instead, we cleaned up the monitoring. We put together a list of noisy false positives and embarked on a project to eliminate them: configuring the tool to ignore some, reclassifying the severity of others, removing some alerts from the code altogether, etc. The now-quiet monitoring tool enabled us to focus on more value-added tasks, like building automation and putting together tools to help the Support effort.
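One way to build that list of noisy alerts is to count how often each alert fired and whether anyone ever had to act on it. The Python sketch below assumes a hypothetical export of alert events as (alert_name, was_actioned) pairs and an arbitrary threshold of five firings; alerts that fire often and never lead to action are the cleanup candidates.

```python
from collections import Counter


def noisy_alerts(events, min_count=5):
    """Return (alert, times_fired) for alerts that fired at least min_count times
    and were never actioned, noisiest first."""
    fired = Counter(name for name, _ in events)
    actioned = {name for name, was_actioned in events if was_actioned}
    candidates = [
        (name, count)
        for name, count in fired.items()
        if count >= min_count and name not in actioned
    ]
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Hypothetical week of alert history: (alert name, did anyone have to act on it?)
events = (
    [("disk_80pct", False)] * 7
    + [("heartbeat_late", False)] * 5
    + [("db_down", True)] * 6
    + [("cpu_spike", False)] * 2
)
print(noisy_alerts(events))  # [('disk_80pct', 7), ('heartbeat_late', 5)]
```

From there, each candidate gets one of the treatments above: ignore it, reclassify its severity, or remove it from the code.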

If you are a manager, see Part 2 of this article.

Monday, September 16, 2013

A Lesson Learned

We had reached a critical point in the meeting. For several weeks now, we'd been focused on defining a laundry list of projects we were about to embark on. The goal was to standardize Support processes across the organization to achieve greater efficiency in tracking metrics, managing incidents, monitoring and alerting, etc. You name it, we had it covered. All of the Support processes were to be the same across the organization. Things were going to be much easier for everyone.


Right then, one of the managers declared that she had no interest in doing the work. So we asked why. Was it a bandwidth concern? Was it a funding problem? Did she not find value in performing the work? The answer to all of these questions was no. So what was it, we asked. Her response: "My manager simply doesn't care whether I do the work or not. She hasn't asked me to do it, so I don't think I really need to."

Someone else chimed in and said something quite similar.

There are several things we can learn from this story (true story, by the way). The first is for us Support managers out there:

Show your teams you care about their work.


Support teams go through a lot of stressful situations. It can be a thankless job. But if you as a manager don't take the time to acknowledge your group's efforts, who will? Nothing kills momentum and initiative more than managers who don't recognize their groups' efforts. For Support teams, not having engaged managers who recognize the importance of the Support effort can be deadly: people get burnt out. Due diligence in monitoring goes away. They snooze alerts instead of reacting aggressively. Or they simply feel too disengaged to work on the initiatives that can really make things better.

As managers, we need to take time to celebrate our teams' accomplishments. Send that thank-you note or two. Gather the troops and recognize the person who went the extra mile. A small cheer or clap might be all it takes to re-energize a team member who used to be great but has fizzled out a bit.

For team members there's something to learn from this story as well:

Do the right thing.


Never stop doing the right thing just because you don't think your manager cares. There's value in Availability metrics (a report on how well your apps are doing). Be relentless with Problem Management; this is what makes your applications more stable. Take on the projects that will make things better all around. Never give up. Your efforts will be recognized. Who knows, perhaps one day you'll lead the group and can be a different kind of manager.

In the end, we do what we do not because our manager cares, but to enable a business. It takes a special person to wake up in the morning, know you're going to do Support, and still come into the office with a smile on your face. In many respects, the terms Application Support and Production Support don't really do justice to what we do.

So, keep the goal in mind and keep driving towards it. Your business will thank you for it and you'll feel much better about those daily achievements that come from Production Support.

Friday, September 13, 2013

The Info You Need When You Need it Most: Runbooks

For this post, I'm going to continue to focus on the knowledge aspect of an application. In particular, I'll talk about Runbooks.


Runbooks should be the first point of reference for anything related to an application. Every application you support should have a runbook. Otherwise, it's like flying an airplane without a manual (for those who didn't catch the reference, every pilot has to use the airplane manual when starting it, no matter how familiar they are with the model).

Runbooks should contain some key information about an application.

The most important section a runbook should contain is a Business Context section, which gives readers some idea of the business processes, their criticality and their potential financial impact. Most runbooks I've seen don't contain this section, but I like to have it in place. It helps drive home to a group of techies that they don't support a piece of technology or an application; they support a business.

A runbook should inform the analyst about the Architecture of the application. It should provide an overview of the servers and databases the application communicates with. The Architecture section should also provide a network context for the application, depict any middleware being used, and give an idea of upstream and downstream dependencies.

Another key section for the runbook is an Administration section. It should give the reader information on things like how to restart processes, scheduled jobs, break-glass procedures and start/end-of-day checks.

Likely the most critical section in a runbook, when it comes to incidents, is the Monitoring and Alerting section. It should provide a list of common alerts and how to resolve them. It might also contain information about the eyes-on-glass procedures for monitoring the application.

Next in criticality after the Monitoring and Alerting section is the Escalation section. Contact details for Development and key business users should be documented there, along with contact information for key Infrastructure teams and upstream/downstream teams.

An Application Deployment section provides more detail about how the application is laid out. It should contain information such as the locations the application is deployed in and what dependencies it has.

The Monitoring and Alerting section should be supplemented with a Troubleshooting section which captures the most common issues, known bugs and limitations.

A Tools section listing the common tools the team uses for troubleshooting is worth documenting as well. New team members will certainly appreciate a handy list of the tools their teammates use, ideally with links for downloading and installing them.
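The sections above can double as a checklist when reviewing existing runbooks. This Python sketch assumes, hypothetically, that each runbook is a plain-text or Markdown file with one heading line per section, and reports which of the sections described in this post are missing.

```python
RUNBOOK_SECTIONS = (
    "Business Context",
    "Architecture",
    "Administration",
    "Monitoring and Alerting",
    "Escalation",
    "Application Deployment",
    "Troubleshooting",
    "Tools",
)


def missing_sections(runbook_text, required=RUNBOOK_SECTIONS):
    """Return the required section titles that don't appear as a line of their own.

    Leading '#' characters are stripped so Markdown headings also match;
    the comparison is case-insensitive.
    """
    lines = {
        line.strip().lstrip("#").strip().lower()
        for line in runbook_text.splitlines()
    }
    return [section for section in required if section.lower() not in lines]


# Hypothetical, incomplete runbook.
sample = "# Business Context\nPricing for FX desks.\n## Architecture\n## Escalation\n"
print(missing_sections(sample))
```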

A final word about Runbooks. Do you want to assess your team's proficiency when it comes to application knowledge? Make a bulleted list with each section of your runbook. Pick some topics from each section and make a little quiz. You'll now have a quick and dirty way to find out their proficiency level.

Monday, September 9, 2013

Four Key Areas your Support Team Needs Training In

Everyone knows their staff needs training, but have we given any thought to what that training should entail? Let's start by answering WHAT training should entail.
There are four key areas your Production Support Staff needs to be trained on:
  1. The Application: You cannot be successful unless your Production Support staff knows the application they're supporting. You can have all the processes in the world, but if your Support guys don't know the application, they won't be able to support it. I've seen very few Production Support teams that have staff training plans, especially for new hires. Given the way Production Support teams are budgeted, this is a mistake. Typically, the reason you're hiring someone is that you have an urgent need: someone left, or you're taking on Support for a new tool. However, if no training plan is in place, it'll take six months to a year, depending on the complexity of your application, before that new joiner is truly productive.
  2. Your Processes: Your team needs to know what they must do to ensure your processes are followed. A good portion of the posts on this blog so far have focused on the necessary Support processes. For example, unless your team members know how to enter a ticket (and why it's necessary), they won't do it correctly. This could throw off your metric tracking, as you may not have a record of key issues or of all Support effort. Not following the correct change management process could mean a botched release or, even worse, some really uncomfortable meetings with auditors.
  3. Your Releases: Just because someone knows the application doesn't mean they know about all the new features pumped into it with every release. Think of initial application training as a snapshot of the application, and keep in mind that the application will continue to evolve.
  4. Your Stakeholders: Again, we Support a business. Support staff need to know who their business is, who the key players are and how it's organized. What things are urgent to the business should also be covered.

Now, let me give you some ideas on HOW you can train your staff.

In the simplest case, put together a list of topics starting with an architectural overview of the application, moving into the key areas of the app and concluding with who the business is (and how they're organized). Use that as a template to put together a PowerPoint deck that will cover key highlights of each topic and have one of your staff members provide an overview to any new staff arriving.

To make the process more efficient, present the slides in a tool like Webex and record the session. You'll then be able to distribute the recording to new staff without needing a presenter.

For your processes, you can do scenario-based training. For example, you could use some of your previous incidents to build scenarios for training people in your Incident Management process. Mimic the situation and have the person being trained explain what they would do to resolve the issue. It's important that this be detailed enough to identify gaps and provide suggestions for improvement.

Finally, for releases, it's important that your team has a forum with the Development team for at least an hour, if not more. At a basic level, you can go through the release notes and hold a Q&A session. Ideally, however, your Development staff will come up with more polished training decks to cover with you for every release. Get buy-in from your Development team that if there is no training, they will have to help you support any issues related to new features. At the very least, have them understand that you will call them (in the middle of the night, if needed) if they don't train you.

In any case, never underestimate the need to train your staff. There is nothing more discouraging to a Support team than having someone who cannot contribute to the Support effort. There's also nothing more frustrating for a Support analyst than not being able to help. So, you have to give all your team members a fighting chance at success.

Friday, September 6, 2013

Be a Part of the Solution: Be a Hero

Remember the purpose of Production Support? There are several things that we should consider doing, as individuals, in order to accomplish our purpose as a Production Support team:
  1. Being flexible. I once knew a Production Support manager whose contention was that Production Support teams are "gatekeepers" of Production. This is a true statement. What wasn't true was his further contention that any release going into Production had to follow the Production Support procedures to a tee, and that if the Development team didn't follow them, the Prod Support team should push back and delay the release. Does this meet the goal? The answer is no. Though we all want a perfect world, sometimes we need to be flexible in order to get to the goal. If the business needs a new feature urgently, to take advantage of a market opportunity, is it reasonable to expect them not to make money because Production Support didn't have all the documentation checked off? I don't think so. As with many things, finding the right balance between due diligence and meeting business objectives is an art.
  2. Raising our hands. In all companies there are two types of people: those who do and those who don't. You've seen the ones who don't. They have an opinion about everything in meetings. They're experts at telling you what shouldn't be done, not what should be done. They're great at telling you that your approach won't work, despite evidence to the contrary. But when it comes time to do the actual work, they vanish. I always encourage Production Support team members to be the doers. We should be willing to take on the important work that no one else wants to do. If our aim is to meet our purpose as a Production Support team, then do that break-fix even though your stated SLA says you only do Level 2 Support, manage that special project that really should be done by a different group, and answer the call when someone in a meeting asks who will do the work and you hear silence. Don't argue over who's going to do it, as that sometimes takes more time, energy and money than actually doing the work.
  3. Doing the right thing. Many times it's difficult to do the right thing, especially when the number of issues has been high and you feel mentally and physically fatigued. At times, some of the most rewarding work can come when you push yourself a little and you do those things everyone hates doing. For example: clean up that monitoring system and find a way to remove that false-positive alert instead of just clearing it; create that ticket in the system even though the issue is already fixed; run that report and evaluate trends to make sure that the system won't run into issues. The more disciplined you are at doing those little things everyone hates doing, the easier the work will become for you and your teammates.
So be daring. Be a part of the solution, not the problem. Enable your business. In short, don't be afraid to be a hero.

Wednesday, September 4, 2013

Follow The Sun Success: Handovers

According to Wikipedia, Follow-the-Sun is a type of global workflow in which tasks are passed around daily between work sites that are many time zones apart. The idea behind Follow-the-Sun is that work never stops.

Though some evidence suggests that Follow-the-Sun Software Development doesn't work, this is not the case for Production Support. Follow-the-Sun can and does work, if Prod Support teams are willing to implement a few best practices to enable collaboration.

In the shops where I've worked, the most common setup is to cover 12 hours out of AMRS and 12 hours out of India. Typically, each region has an early (8:00 AM - 5:00 PM) and a late (11:00 AM - 8:00 PM) shift. The shifts are adjusted for Daylight Saving Time. Another common setup is to cover 8 hours out of AMRS, 8 hours out of APAC and 8 hours out of EMEA (with some overlap between the shifts).
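Whatever the setup, it's worth checking that the regional shifts actually cover all 24 hours once they're converted to a single time zone, especially around Daylight Saving Time changes. This Python sketch assumes shift times have already been converted to minutes after midnight UTC; the example shifts are hypothetical, not the actual schedules from the shops described above.

```python
MINUTES_PER_DAY = 24 * 60


def coverage_gaps(shifts):
    """Return uncovered (start, end) minute ranges in a UTC day.

    Each shift is (start, end) in minutes after midnight UTC; a shift whose
    end is earlier than its start wraps past midnight, and start == end is
    treated as an empty shift. A gap spanning midnight is reported as two
    ranges, one at each end of the day.
    """
    covered = [False] * MINUTES_PER_DAY
    for start, end in shifts:
        minute = start
        while minute != end:
            covered[minute] = True
            minute = (minute + 1) % MINUTES_PER_DAY
    gaps, gap_start = [], None
    for minute, is_covered in enumerate(covered):
        if not is_covered and gap_start is None:
            gap_start = minute
        elif is_covered and gap_start is not None:
            gaps.append((gap_start, minute))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, MINUTES_PER_DAY))
    return gaps


# Hypothetical regional coverage already converted to UTC:
# region 1 covers 02:30-14:30, region 2 covers 12:00-00:00.
print(coverage_gaps([(150, 870), (720, 0)]))  # [(0, 150)]
```

Here the sketch flags 00:00 to 02:30 UTC as unstaffed, which is exactly the kind of hole that can open up when one region shifts its clocks and the other doesn't.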

The most important process teams need to implement is likely the Handover process. During the handover, accountability for the work transitions from one region to the next. In my experience, two items make handovers successful:
  1. Handover E-mails: Handover e-mails should contain information about open Incidents and Service Requests. They should also recap significant issues that occurred during the shift, for example Major Incidents or preventive restarts of running processes. Handover e-mails capture a snapshot of the work being handed off, which provides better insight into accountability.
  2. Handover Calls: Handover calls should be short (about 30 minutes) and used to cover the open tickets being handed off. They are a forum for clarifying what needs to be done, NOT a forum for working on the tickets. Handover calls should start and end on time and should have enough representation for each ticket being handed off. During the call, tickets should be reassigned to people in the upcoming shift. At no point should a ticket remain assigned to people in the shift handing off, as accountability is lost and work on it won't continue until the following day.
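The reassignment rule from the handover call can be sketched in Python. The ticket fields, the names, and the round-robin assignment are assumptions for illustration; in practice your ticketing tool does the reassignment, and the handover e-mail would also recap major incidents and restarts, which this sketch leaves out.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Ticket:
    ticket_id: str
    kind: str  # e.g. "Incident" or "Service Request"
    summary: str
    assignee: str
    is_open: bool = True


def hand_over(tickets, outgoing, incoming):
    """Reassign every open ticket held by the outgoing shift to the incoming shift.

    Tickets are spread round-robin across the incoming team. Returns the ticket
    portion of a handover e-mail listing each reassignment.
    """
    if not incoming:
        raise ValueError("the incoming shift must have at least one person")
    receivers = cycle(incoming)
    lines = ["Open items handed over:"]
    for ticket in tickets:
        if ticket.is_open and ticket.assignee in outgoing:
            previous = ticket.assignee
            ticket.assignee = next(receivers)
            lines.append(
                f"- [{ticket.kind}] {ticket.ticket_id}: {ticket.summary} "
                f"({previous} -> {ticket.assignee})"
            )
    return "\n".join(lines)


# Hypothetical tickets and people.
tickets = [
    Ticket("INC1", "Incident", "Login failures", "amy"),
    Ticket("SR7", "Service Request", "Add new user", "bob"),
    Ticket("INC2", "Incident", "Report timeout", "amy", is_open=False),
]
print(hand_over(tickets, outgoing={"amy", "bob"}, incoming=["raj", "priya"]))
```

After the call, no open ticket is left with anyone from the outgoing shift; closed tickets keep their original assignee.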

Another best practice that enables Follow-the-Sun success for Prod Support is implementing Start and End of Day Healthchecks. At the beginning of the week, a start-of-day check should be carried out to ensure systems are ready to perform business transactions. Then, at the end of each shift, the team receiving the system should carry out healthchecks to ensure that everything will run smoothly during their time. I find this works better than having the team handing over do them, for two reasons: 1) the team just starting is freshly rested (and isn't about to run out the door), and 2) the team just starting will be accountable for any issues (so they are more invested in things not going wrong).

Start and End of Day Healthchecks should be documented. A summary of the checks performed and the results (perhaps with a Red/Amber/Green status) should be sent out to interested stakeholders to ensure that everyone is aware the system is ready. It's worth noting that healthchecks can be automated. If they are, the team receiving the handover would be responsible for correcting any anomalies the healthchecks might reveal.
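A minimal Python sketch of an automated healthcheck run with a Red/Amber/Green summary might look like the following. Each check is a hypothetical function returning "GREEN", "AMBER" or "RED"; a check that raises an exception or returns anything else is treated as RED, and the overall status is the worst individual result.

```python
RAG_RANK = {"GREEN": 0, "AMBER": 1, "RED": 2}


def run_healthchecks(checks):
    """Run each named check and return (overall_status, per-check results).

    A check that raises, or returns anything other than GREEN/AMBER/RED,
    counts as RED. The overall status is the worst individual result.
    """
    results = {}
    for name, check in checks.items():
        try:
            status = check()
        except Exception:
            status = "RED"
        valid = isinstance(status, str) and status in RAG_RANK
        results[name] = status if valid else "RED"
    overall = max(results.values(), key=RAG_RANK.__getitem__, default="GREEN")
    return overall, results


def format_summary(overall, results):
    """Plain-text summary suitable for the stakeholder e-mail."""
    lines = [f"Start-of-day healthcheck: {overall}"]
    lines += [f"  {name}: {status}" for name, status in results.items()]
    return "\n".join(lines)


# Hypothetical checks standing in for real probes (DB ping, feed lag, ...).
checks = {
    "database_ping": lambda: "GREEN",
    "price_feed_lag": lambda: "AMBER",
    "batch_job_status": lambda: 1 / 0,  # simulates a check that crashes
}
print(format_summary(*run_healthchecks(checks)))
```

Treating a crashed check as RED is a deliberate choice: a healthcheck that can't run tells the receiving team nothing about whether the system is ready.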
Following these simple guidelines is easy and will work wonders in ensuring your Follow-the-Sun success!