The Opposite of Iterative Delivery

Iterative Delivery is a uniquely powerful method for adding value to a SaaS.  Other than iterative, there are no respectably named ways to deliver features, reskins, updates and bug fixes.  Big bangs, waterfalls, and quarterly releases fill your customers’ hearts with dread.

Look at 5 antonyms for Iterative Delivery:

  • Erratic Delivery
  • Infrequent Delivery
  • Irregular Delivery
  • Overwhelming Delivery
  • Sporadic Delivery

If your customers used these terms to describe updates to your SaaS, would they be wrong?

Iterative Delivery is about delivering small pieces of value to your customers so often that they know you’re improving the Service, but so small that they barely notice the changes.

Don’t be overwhelming, erratic or infrequent – be iterative and delight your customers.

The Chestburster Antipattern

The Chestburster is an antipattern that occurs when transitioning from a monolith to services. 

The team sees an opportunity to extract a small piece of functionality from the monolith into a new service, but the monolith is the only place that handles security, permissions and composition.

Because the new service can’t face clients directly, the Chestburster hides behind the monolith, hoping to burst through at some later point.

The Chestburster begins as the inverse of the Strangler pattern, with the monolith delegating to the new service instead of the new service delegating to the monolith.
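
In code terms, the shape looks something like this minimal sketch (all of the names are hypothetical, and the in-process call stands in for what would really be an HTTP or RPC request): the monolith keeps handling security and composition, and quietly delegates the extracted piece to a service the client never sees.

  # Hypothetical sketch of the Chestburster shape: the client still talks
  # only to the monolith, which forwards one extracted piece to the hidden
  # new service.

  def new_pricing_service(payload):
      # The extracted, possibly rewritten functionality, invisible to clients.
      return {"quote": payload["quantity"] * 9.99}

  def monolith_handle_request(user, payload):
      if user != "admin":                      # security stays in the monolith
          raise PermissionError("forbidden")
      return new_pricing_service(payload)      # the monolith delegates outward

  print(monolith_handle_request("admin", {"quantity": 3}))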

Why it’s appealing

The Chestburster’s appeal is that it gets the New Service up and running quickly.  This looks like progress!  The legacy code is extracted, possibly rewritten, and maybe better.

Why it fails

There is no business case for building the functionality the new service needs to burst through the monolith.  The functionality has been rewritten.  It’s been rewritten into a new service.  How do you go back now and ask for time to address security and the other missing pieces?  Worse, the missing pieces are usually outside of the team’s control; security is one area you want to leave to the experts.

Even if you get past all the problems on your side, you’ve created new composition complexities for the client.  Now the client has to create a new connection to the Chestburster and handle routing themselves.  Can you make your clients update?  Should you?

Remember The Strangler

If you want to break apart a monolith, it’s always a good idea to start with a Strangler.  If you can’t set up a strangler on your existing monolith, you aren’t ready to start breaking it apart.

That doesn’t mean you’re stuck with the current functionality!

If you have the time and resources to extract the code into a new service, you have the time and resources to decouple the code inside of the monolith.  When the time comes to decompose into services, you’ll be ready.
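
When that time comes, the strangler itself can start small.  As a rough sketch (the upstream names here are made up), it is little more than a routing rule in front of the monolith:

  # Minimal strangler facade: send one slice of traffic to the new service,
  # pass everything else through to the monolith untouched.

  MONOLITH_URL = "http://monolith.internal"     # hypothetical upstreams
  REPORTS_URL = "http://reports.internal"

  def route(path):
      if path.startswith("/reports"):
          return REPORTS_URL + path             # the strangled slice
      return MONOLITH_URL + path                # everything else: the monolith

  print(route("/reports/monthly"))   # http://reports.internal/reports/monthly
  print(route("/orders/42"))         # http://monolith.internal/orders/42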

Conclusion

The Chestburster gives the illusion of quick progress, but it quickly stalls as the team runs into problems they can’t control.  Overcoming the technical hurdles doesn’t guarantee that clients will ever update their integration.

Success in legacy system replacement comes from integrating first and moving functionality second.  With the Chestburster you move functionality first and probably never burst through.

Run to a Runbook

Giving users the ability to define their own searches, data segmentation and processes creates a lot of value for a SaaS.  The User Defined Parts of the codebase are also always going to contain the most “interesting” performance and scaling problems as users assemble the pieces in beautiful, powerful, and mind-boggling ways.

It’s not a bug, it’s performance

Performance bugs aren’t traditional bugs.  The code does come up with the right answer, eventually.  But when your clients think your system is slow, they don’t care why.  Whether the code does too much work, can’t run in parallel, or lets the customer shoot themselves in the foot, it’s all bugs to your clients.

You need to care about why because you get to do something to make things better.

Run to a Performance Runbook

A performance runbook can be nothing more than a list of tips and tricks for dealing with issues in User Defined land.  Because the problems aren’t bugs, they won’t leave obvious errors in the logs.  Diagnosing them requires specialized techniques, tools, and pattern matching.

Writing those debugging techniques down in a runbook helps you diagnose problems faster.
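
A single entry doesn’t need to be fancy.  Something as plain as this invented example is enough to start:

  Symptom: one client’s dashboards time out while other clients are fine.
  First checks: that client’s saved segment definitions; the slow query log for the last hour.
  Common cause: a user-defined segment that joins across all historical data.
  Quick fix: add a date bound to the segment and send the client the segmentation guide.
  Escalate with: the segment ID and the captured slow query.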

Reduce Everyone’s Mental Load

Performance issues manifest everywhere in a tech stack.  The issues that a client is noticing are often far removed from the bottleneck.  

Having a centralized place to document issue triaging reduces the mental load on everyone in your organization.  Where do we start looking?  What’s that query?  A runbook helps you with those first common steps.

Support gets help with common trouble areas and basic solutions.  Listening to a client explain an issue and not being able to do anything but escalate is demoralizing for everyone involved.  Every issue support can fix improves the experience for the client and support.  Even something as simple as improving the questions support asks the client will pay off in time saved.

When senior support and developers are called in, they know that all the common solutions have been tried.  The basic questions have been asked and the data gathered.  They can skip the basics and move on to the more powerful tools and queries, saving everyone’s time.  New diagnoses and solutions go into the runbook, making support more powerful.

The common questions and common solutions become automation targets.  You can proactively tell a client that they’re using the system “wrong”, and send them help and training materials.  The best support experiences come when you reach out to the client before they even realize they have a problem.

6 Questions To Start A Runbook

Common solutions to common problems?  Training?  Proactive alerting?  Sounds great, but daunting.

Runbooks are living documents.  The days when they were printed and bound into manuals ended decades ago.

Start small.

Talk to the developer who fixed the last issue:

  1. What did they look for in the logs?  
  2. What queries did they run?  
  3. What did they find? 
  4. How did they resolve the issue?

Write down the answers.  Repeat every time there’s a performance issue.

After a few incidents, patterns should emerge.

Bring what you’ve got to your support managers and ask:

  1. Could support have done any of the investigative work?
  2. If support had the answer, could they have resolved the issue? 

Train support on what they can do, and create tools for the useful things support can’t do on their own.

Every time a problem gets escalated, that’s a chance to iterate and improve.

Conclusion – Runbooks Help Everyone

Building a performance runbook sounds a lot like accepting performance problems and working on mitigation.

Instead, it is about surfacing the performance problems faster, finding the commonalities, and fixing the underlying system.

Along the way the runbook improves the client experience, empowers support, and reduces the support load on developers.

Everyone wins when you run to a runbook!

Building Your Way Out of a Monolith – Create a Seam

Why Build Outside The Monolith

When you have a creaky monolith, the obvious first step is to build new functionality outside the monolith.  Working on a greenfield, free of the monolith’s constraining design, bugs, and even programming language, is highly appealing.

There is a tendency to wander those verdant green fields for months on end and forget that you need to connect that new functionality back to the monolith’s muddy brown field.

Eventually, management loses patience with the project and pushes the team to wrap up.  Integration at this point can take months!  Worse, because the new project wasn’t talking to the monolith, most of the work tends to be a duplication of what’s in the monolith.  Written much better, to be sure!  But without value to the client.

Integration is where greenfield projects die.  You have to bring together two systems: the monolith, which is difficult to work with, and the greenfield, which is intentionally unlike the monolith.  And you have to do it under pressure, while delivering value.

Questions to Ask

When I start working with a team building outside their monolith, integration is the number one issue on my mind.

I push the team to deliver new functionality for the client as early as possible.  Here are 3 starting questions I typically ask:

  1. What new functionality are you building?  Not what functionality do you need to build; which parts of it are new for the client?
  2. How are you going to integrate the new feature into the monolith’s existing workflows?
  3. What features do you need to duplicate from the monolith?  Can you change the monolith instead?  You have to work in the monolith sooner or later.

First Create the Seam

I don’t look for the smallest or easiest feature.  I look for the smallest seam in the monolith.

For the feature to get used, the monolith must use it.  The biggest blocker, the most important thing, is creating a seam in the monolith for the new feature!

A seam is where your feature will be inserted into the workflow.  It might be a new function call in a procedural straightaway, an adapter in your OO code, or even a strangler at your load balancer.

The important part is knowing where and how your feature will fit into the seam. 
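
As a minimal sketch (the workflow and names are hypothetical), a seam in a procedural monolith can be a single, well-named call at the point where the feature belongs:

  # The seam: one call in the existing workflow marks where the new feature
  # will plug in.  Today it preserves current behaviour; later it can
  # delegate to the external service.

  def check_fraud(order):
      return True                         # placeholder until the feature exists

  def process_order(order):
      if not check_fraud(order):          # the seam in the monolith's workflow
          raise ValueError("order rejected")
      return {"status": "accepted", "id": order["id"]}

  print(process_order({"id": 42, "total": 99.0}))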

Second Change The Monolith

Once you have a seam, you have a place to start modifying the monolith to support the feature.  This is critical to prevent spending time recreating existing functionality.

Instead of recreating functionality, refactor the seam to provide it to your new service.
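
Continuing the hypothetical seam sketch above, that refactoring can be as simple as having the monolith hand its existing data across the seam, so the new service never has to rebuild it:

  # The refactored seam: the monolith gathers what the service needs from its
  # existing code, then delegates (in practice the delegate call would be an
  # HTTP or RPC request to the new service).

  def customer_risk_profile(order):
      # Existing monolith functionality, reused instead of rewritten.
      return {"customer_id": order["customer_id"], "chargebacks": 0}

  def fraud_service_check(order, risk_profile):
      return risk_profile["chargebacks"] == 0   # stand-in for the external service

  def check_fraud(order):
      return fraud_service_check(order, customer_risk_profile(order))

  print(check_fraud({"customer_id": 7, "total": 150.0}))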

Finally Build Outside the Monolith

Now that the monolith has a spot for your feature in its workflow, and it can support the external service, building the feature is easy.  Drop it right in!

Now, the moment your external service can say “Hello World!”, it is talking to the monolith.  It is in production, and even if you don’t finish it 100%, the parts you do finish will still be adding value.  Odds are, since your team is delivering, management will be happy to let you go right on adding features and delivering value.

Conclusion

Starting with a seam lets you develop outside the monolith while still being in production with the first release.  No working in a silo for months at a time.  No recreating functionality.

It delivers faster, partially by doing less work, partially by enabling iterations.

2 Developers, A Mathematician and a Scrum Master Walk Into a Bar

And come up with “The Worst Coding Problem Ever” dun dun dun!

Imagine getting this whopper in an interview or a take-home test:

The United States has been conducting a census once a decade for over 200 years.

Imagine you can iterate over the data at the family level, with the family data being whatever format/object is easiest for you.

Find the family with the longest Fibonacci sequence of children.

The most fundamental issue is that it’s not clear what the answer looks like.  In fact, the 4 of us had 3 different interpretations of what the answer would look like.

Is the question looking for children’s ages going forward?

That would be an age sequence of 0, 1, 1, 2, 3, 5, etc.

Or a newborn, a pair of 1-year-old twins, a 2-year-old, a 3-year-old, a 5-year-old, etc.

Or is it looking for children born in the sequence?  (This is the inverse of the first answer)

A 6-year-old, 5-year-old twins, a 3-year-old, and a newborn.

Or is it asking about the age gap between children?

In that case you’d be hunting for twins (a gap of 0), a gap of 1 year, a second gap of 1 year, a gap of 2 years, etc.

There are so many ways to be the family Fibonacci.
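
For the sake of argument, here is a rough sketch of just one of those readings (the age-gap one, with made-up family data); the code would look completely different under any of the other interpretations:

  # One possible reading: count how far the gaps between consecutive
  # children's ages (sorted youngest to oldest) follow 0, 1, 1, 2, 3, ...

  def fibonacci_gap_run(ages):
      ages = sorted(ages)
      gaps = [b - a for a, b in zip(ages, ages[1:])]
      fib = [0, 1]
      while len(fib) < len(gaps):
          fib.append(fib[-1] + fib[-2])
      run = 0
      while run < len(gaps) and gaps[run] == fib[run]:
          run += 1
      return run

  families = {"A": [12, 9, 7, 6, 5, 5], "B": [10, 4]}
  print(max(families, key=lambda name: fibonacci_gap_run(families[name])))  # A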

Many Technical Problems are like this

Fairly straightforward computer problems with meaningless mathematics sprinkled on top, asked by people who won’t know the implications of any of the 3 answers.

But what’s the answer?

If you are presented with this question in an interview, the correct answer is to thank the interviewer for their time, wish them the best of luck in their search, and end the interview.

Scaling is Legacy System Rescue That Pays 4x

In my last article, You Won’t Pay Me to Rescue Your Legacy System, I talked about my original attempt at specializing, and why it didn’t work.  I bumbled along until I lucked into a client that helped me understand when Legacy System Rescue becomes an Expensive Problem.

Rather than Legacy System Rescue, I was hired to do “keep the lights on” work.  The company had a 3-developer team building a next-generation system; all I had to do was keep things running as smoothly as possible until they delivered.

The legacy system was buckling under the weight of their current customers.  Potential customers were waiting in line to give them money and had to be turned away.  Active customers were churning because the platform couldn’t keep up.

That’s when I realized – Legacy System Rescue may grudgingly get a single developer, but Scaling gets three developers to scale and one to keep the lights on.  Scaling is an Expensive Problem because failing to scale churns existing customers and turns away new ones.

Over 10 months I iteratively rescued the legacy system by fixing bugs and removing choke points.  Meanwhile, after more than 50 developer-months of investment, the next-generation system was completely scrapped.

The Lesson – Companies won’t pay to rescue a legacy system, but they’ll gladly pay 4x to scale up and meet demand.

You Won’t Pay Me to Rescue Your Legacy System

When I first started consulting, I tried to specialize in Legacy System Rescue.  I quickly learned that this is terrible positioning because Legacy System Rescue isn’t an Expensive Problem.  Jonathan Stark defines an Expensive Problem as one that someone will spend a lot of money, right now, to solve.

Legacy System Rescue is certainly a Big Problem.  Everyone agrees that a buggy system that makes development slow and painful is bad.  Errors in production are bad.  Spending time and resources to mitigate production outages is bad.  But there is no immediacy.  There’s no reason to spend a lot of money right now instead of waiting until the next feature ships, or the next quarter.  Letting things go just a little bit longer is usually why the system needs a rescue.

Hiring someone like me to come in, analyze the codebase, and find a way to untangle the mess is a lot of work.  Fixing bugs and making it easy to add new features is a low-leverage situation, and it takes a lot of time from highly skilled developers.  Highly skilled developers in low-leverage situations make Legacy System Rescue an Expensive Solution.  It will probably pay off for the company, but no one department is going to get enough value from fixing the legacy system to cover the costs.  The ROI gets worse when you factor in the resentment of the developers.  Bringing in an outsider to judge their work and dictate fundamental changes doesn’t fill people with joy.

Combine the two and you have a Tragedy of the Commons – a Big Problem that requires an Expensive Solution.  What you don’t have is a business case to spend a lot of money, right now, to fix things.

You won’t pay me to rescue your legacy system because paying a lot, right now, for the solution is worth less to you than living with the problem.

Tenancy Model Roundup

Over the past few months I have been ruminating on SaaS Tenancy Models and how they drive architectural decisions.  I hope you’ve enjoyed the series as I’ve scratched my itch.

Here is a roundup of the 7 articles, in case you missed any of the parts or need a handy index to what I’m sure is the most in-depth discussion of SaaS Tenancy Models ever written.

Part 1 – An introduction to SaaS Tenancy Models

Part 2 – An addendum to the introduction

Part 3 – How growth and scale drive tenancy model changes

Part 4 – Regaining Effective Single Tenancy through Cell Isolation

Part 5 – Why your job service should be Multi-Tenant even if your model is Single Tenant

Part 6 – Whose data is it anyway, why you need to separate your SaaS’s data from your clients’

Part 7 – 3 Signs your resource allocation model is working against you