Run to a Runbook

Giving users the ability to define their own searches, data segmentation, and processes creates a lot of value for a SaaS product.  The user-defined parts of the codebase are also always going to contain the most “interesting” performance and scaling problems, as users assemble the pieces in beautiful, powerful, and mind-boggling ways.

It’s not a bug, it’s performance

Performance bugs aren’t traditional bugs.  The code does come up with the right answer, eventually.  But when your clients think your system is slow, they don’t care why.  Whether the code does too much work, can’t run in parallel, or lets customers shoot themselves in the foot, it’s all a bug to your clients.

You need to care about why, because you’re the one who gets to make things better.

Run to a Performance Runbook

A performance runbook can be nothing more than a list of tips and tricks for dealing with issues in the user-defined parts of your system.  Because the problems aren’t bugs, they won’t leave obvious errors in the logs.  Diagnosing them requires specialized techniques, tools, and pattern matching.

Writing your debugging techniques down in a runbook will help you diagnose problems faster.

Reduce Everyone’s Mental Load

Performance issues manifest everywhere in a tech stack.  The issues a client notices are often far removed from the bottleneck.

Having a central place to document triage steps reduces the mental load on everyone in your organization.  Where do we start looking?  What’s that query?  A runbook helps you with those common first steps.

Support gets help with common trouble areas and basic solutions.  Listening to a client explain an issue and not being able to do anything but escalate is demoralizing for everyone involved.  Every issue support can fix improves the experience for both the client and support.  Even something as simple as improving the questions support asks the client will pay off in time saved.

When senior support and developers are called in, they know that all the common solutions have been tried.  The basic questions have been asked and the data gathered.  They can skip the basics and move on to the more powerful tools and queries, saving everyone’s time.  New diagnoses and solutions go into the runbook, making support more powerful.

The common questions and common solutions become automation targets.  You can proactively tell a client that they’re using the system “wrong”, and send them help and training materials.  The best support happens when you reach out to the client before they even realize they have a problem.

6 Questions to Start a Runbook

Common solutions to common problems?  Training?  Proactive alerting?  Sounds great, but daunting.

Runbooks are living documents.  The days when they were printed and bound into manuals ended decades ago.

Start small.

Talk to the developer who fixed the last issue:

  1. What did they look for in the logs?  
  2. What queries did they run?  
  3. What did they find? 
  4. How did they resolve the issue?

Write down the answers.  Repeat every time there’s a performance issue.

After a few incidents, patterns should emerge.
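If you keep your notes in a structured form, spotting those patterns gets easier.  The sketch below is a minimal, hypothetical illustration that assumes you record each incident in Python; the `IncidentNote` fields simply mirror the four questions above, and the `recurring_findings` helper is invented for this example rather than taken from any tool.  It counts how often the same finding shows up across incidents.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IncidentNote:
    """Hypothetical runbook entry: one field per question asked of the developer."""
    logs_checked: str                                   # 1. What did they look for in the logs?
    queries_run: list = field(default_factory=list)     # 2. What queries did they run?
    finding: str = ""                                   # 3. What did they find?
    resolution: str = ""                                # 4. How did they resolve the issue?


def recurring_findings(notes, min_count=2):
    """Return (finding, count) pairs seen at least min_count times, most common first."""
    counts = Counter(note.finding for note in notes)
    return [(finding, n) for finding, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    notes = [
        IncidentNote("slow search requests", ["search timing query"], "unfiltered custom segment", "added a filter"),
        IncidentNote("timeouts on reports", ["report timing query"], "unfiltered custom segment", "narrowed the segment"),
        IncidentNote("slow dashboard", [], "too many widgets", "split the dashboard"),
    ]
    for finding, n in recurring_findings(notes):
        print(f"{n}x {finding}")
```

This only works if findings are written consistently, so it helps to agree on names for common trouble areas as they come up.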

Bring what you’ve got to your support managers and ask:

  1. Could support have done any of the investigative work?
  2. If support had the answer, could they have resolved the issue? 

Train support on what they can do, and create tools for the useful things support can’t do on its own.

Every time a problem gets escalated, that’s a chance to iterate and improve.

Conclusion – Runbooks Help Everyone

Building a performance runbook can sound a lot like accepting performance problems and settling for mitigation.

It isn’t.  A runbook is about surfacing performance problems faster, finding the commonalities, and fixing the underlying system.

Along the way, the runbook improves the client experience, empowers support, and reduces the support load on developers.

Everyone wins when you run to a runbook!
