Acceptable Beer Bellies in your codebase

How do Beer Bellies begin?

The panel is fully functional: when you push the button, a light turns on and the elevator comes.  It is also obviously wrong - the top button is flush with the mount, and the bottom button sticks out.

I found this sad Beer Belly Elevator Panel at a high-end resort and wondered how it happened.

Certainly whoever installed the mismatched button knew it was wrong.  Did the tech not care?  Was using the wrong button the only way to get the panel repaired?  Was the plan to come back and fix it when the right parts came in?

The hotel maintenance staff had to sign off on it.  Did they care about the quality of the repair?  Were they only able to give a binary assessment of “working” or “not working”?

Did the hotel manager not care?  Were they told to keep costs down?  It isn’t broken now; it would be a waste to fix something that isn’t broken.

Quality vs Letting Your Gut Hang Out

Employees at the hotel see the mismatched panel every day.  It is a constant reminder that letting things slide, just a little, is acceptable at this hotel.

When you let consistency and quality slide because something works, you’re creating beer bellies in your codebase.

One small button at a time until everyone sees that this is acceptable here.

So long as a light turns on when you hit the button does it matter if the light is green, red or blue?  Does it matter if the light is in the center or on the edge?

But I’m running a SaaS, not a Hotel

Your SaaS may not maintain elevator panels, but your codebase is probably full of beer bellies.

“It works, we’ll clean it up on the next release” bellies.

“This is a hack” bellies.

“This is the legacy version, we’re migrating off of it” bellies.

When you let sad little beer bellies into your codebase, your employees see exactly what you find acceptable.

Is This A Quality Panel?

I recently stepped into an elevator and saw this panel:

The panel was clean, made of high-quality materials, and everything worked.

Quality is about more than functionality: does this look like a quality panel?

Everything works!

Push a button and it lights up!

Sure, it might light up red, green or blue. 
And the light might be around the edge or in the center. 
And some of the buttons are flush with the mount, while others stick out - but that doesn’t affect whether the light turns on.

There are 12 possible button implementations (3 light colors × 2 light positions × 2 button heights), and 5 of them appear randomly.

But when you push the button, the light turns on!

What does that have to do with SaaS Scaling?

No matter how excellent any individual endpoint implementation is, having an API with endpoints that work differently decreases the overall quality of your product.

Having a UI with mismatched widgets and styles increases the user’s cognitive load and decreases quality, even when the differences don’t change any functionality.
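
To make that concrete, here is a minimal sketch - the endpoints and field names are invented - of two responses from the same imaginary API.  Both work, both light up, and a client has to learn two dialects:

    def get_user(user_id):
        # Team A's endpoint: snake_case fields, payload wrapped in "data".
        return {"data": {"user_id": user_id, "created_at": "2024-01-05"}}

    def get_order(order_id):
        # Team B's endpoint, same API: camelCase, no wrapper, a different
        # date format.  The light still turns on.
        return {"orderId": order_id, "createdAt": "01/05/2024"}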

Consistency during the scaleup period can be difficult as multiple new teams spin up, but it’s critically important if you want a quality product.

Do You Punish Customers For Loyalty?

Does your Customer’s experience with your service get better over time?

Does it get worse?

SaaS software often punishes long term clients in subtle and frustrating ways.

Do your CRM customer screens show a decade of buying history?

How many emails can a contact open before you can’t open the contact?

Do marketing campaigns, contact lists, and tags accumulate over the years?

Do database inserts slow down as you write the 10 millionth row into a log table?

There are countless ways to punish customers for staying with you for years.  It’s not a startup problem; it sneaks in as you become a scaleup.  The flood of new customers blinds you to the slow leak as your most loyal customers churn.

When your oldest customers complain about performance more than your largest, chances are your software is punishing them for being loyal.
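
Here is a minimal sketch of how the punishment creeps in - the table and column names are hypothetical.  The unbounded query is instant for a one-year-old customer and crawls for a ten-year-old one; the bounded version treats both the same:

    # Loads every event a contact has ever generated.  Fast on day one,
    # slower every year the customer stays loyal.
    FULL_HISTORY = """
        SELECT * FROM email_events
        WHERE contact_id = %s
        ORDER BY occurred_at DESC
    """

    # Loads the same 50 rows whether the contact has one month
    # or one decade of history.
    RECENT_HISTORY = """
        SELECT * FROM email_events
        WHERE contact_id = %s
        ORDER BY occurred_at DESC
        LIMIT 50
    """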

Getting Started With Iterative Delivery

The last 4 posts have been trying to convince you that iterative, baby-step delivery is better for your clients than moonshot, giant-step delivery.

But how do you get started?  How do you shorten your stride from shooting the moon to one small step?

The next series of posts is going to lay out my scaling iterative delivery framework.  This site is about scaling SaaS software, and this framework works best if you want an order of magnitude more of what you already offer your clients.  This isn’t a general framework, and it certainly isn’t the only way to get started with iterative delivery.

Work your way through these steps:

  1. Pick a goal - 1 sentence, highly aspirational and self-explanatory.
  2. Define the characteristics of your goal - What measurable characteristics does your system need in order to achieve your goal?
  3. What are the implications? - What technical things would have to be true in order for your system to have all the characteristics you need?
  4. What are the blockers? - What is stopping you from making the implications true?
  5. What can you do to weaken the blockers? - Set aside the goal, characteristics and implications; what can you do to weaken the blockers?

Weakening the blockers is where you start delivering iteratively.  As the blockers disappear, your system becomes better for your clients, and the technical changes you need become easier to implement.
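
To show the shape of the framework, here is one hypothetical pass through the five steps - the goal, numbers, and blockers are all invented for illustration:

    # One pass through the framework for an imaginary SaaS.
    plan = {
        "goal": "Serve 10x our current job volume.",
        "characteristics": [
            "p99 job start latency under 60 seconds at 10x load",
            "no client can stall another client's jobs",
        ],
        "implications": [
            "job workers must scale horizontally",
            "job state must live outside the per-client databases",
        ],
        "blockers": [
            "job status rows are written to each client's database",
            "the scheduler assumes a single worker process",
        ],
        "weakening_steps": [
            "hide job status writes behind an interface",
            "add a second worker for read-only jobs",
        ],
    }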

We will explore each step in depth in the following posts.

The Opposite of Iterative Delivery

Iterative Delivery is a uniquely powerful method for adding value to a SaaS.  Other than iterative, there are no respectably named ways to deliver features, reskins, updates and bug fixes.  Big bangs, waterfalls, and quarterly releases fill your customers’ hearts with dread.

Look at 5 antonyms for Iterative Delivery:

  • Erratic Delivery
  • Infrequent Delivery
  • Irregular Delivery
  • Overwhelming Delivery
  • Sporadic Delivery

If your customers used these terms to describe updates to your SaaS, would they be wrong?

Iterative Delivery is about delivering small pieces of value to your customers so often that they know you’re improving the Service, but so small that they barely notice the changes.

Don’t be overwhelming, erratic or infrequent - be iterative and delight your customers.

The Chestburster Antipattern

The Chestburster is an antipattern that occurs when transitioning from a monolith to services. 

The team sees an opportunity to extract a small piece of functionality from the monolith into a new service, but the monolith is the only place that handles security, permissions and composition.

Because the new service can’t face clients directly, the Chestburster hides behind the monolith, hoping to burst through at some later point.

The Chestburster begins as the inverse of the Strangler pattern, with the monolith delegating to the new service instead of the new service delegating to the monolith.
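
In code, the arrangement looks something like this sketch - the service name is invented, and authenticate, check_permissions, and compose_page are stubs standing in for the monolith’s existing plumbing:

    import requests  # assuming an HTTP client is available

    def authenticate(request): ...             # monolith-only concern
    def check_permissions(user, resource): ... # monolith-only concern
    def compose_page(user, data): ...          # monolith-only concern

    # The monolith keeps the only public entry point and quietly
    # forwards the extracted functionality to a hidden service.
    def handle_invoice_request(request):
        user = authenticate(request)
        check_permissions(user, "invoices")
        resp = requests.get(
            "http://invoice-service.internal/invoices",
            params={"user_id": user.id},
            timeout=5,
        )
        return compose_page(user, resp.json())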

Why it’s appealing

The Chestburster’s appeal is that it gets the new service up and running quickly.  This looks like progress!  The legacy code is extracted, possibly rewritten, and maybe better.

Why it fails

There is no business case for building the functionality the new service needs to burst through the monolith.  The functionality has been rewritten.  It's been rewritten into a new service.  How do you go back now and ask for time to address security and the other missing pieces?  Worse, the missing pieces are usually outside of the team’s control; security is one area you want to leave to the experts.

Even if you get past all the problems on your side, you’ve created new composition complexities for the client.  Now the client has to create a new connection to the Chestburster and handle routing themselves.  Can you make your clients update?  Should you?

Remember The Strangler

If you want to break apart a monolith, it’s always a good idea to start with a Strangler. If you can’t set up a strangle on your existing monolith, you aren’t ready to start breaking it apart.

That doesn’t mean you’re stuck with the current functionality!

If you have the time and resources to extract the code into a new service, you have the time and resources to decouple the code inside of the monolith.  When the time comes to decompose into services, you’ll be ready.
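
As a minimal sketch of what decoupling inside the monolith can look like - the domain and names are invented - wrap the functionality in an interface now, and swap in a remote implementation once a strangler is in place:

    class TaxCalculator:
        # Callers inside the monolith depend on this interface,
        # not on the legacy code.
        def calculate(self, order):
            raise NotImplementedError

    class LocalTaxCalculator(TaxCalculator):
        def calculate(self, order):
            # Placeholder for the existing monolith logic, unchanged.
            return order.subtotal * 0.08

    # Later, a RemoteTaxCalculator with the same interface can replace
    # this class without touching a single call site.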

Conclusion

The chestburster gives the illusion of quick progress, but it quickly stalls as the team runs into problems they can’t control.  Overcoming the technical hurdles doesn’t guarantee that clients will ever update their integration.

Success in legacy system replacement comes from integrating first and moving functionality second.  With the chestburster, you move functionality first and probably never burst through.

Building Your Way Out Of A Monolith – Create A Seam

Why Build Outside The Monolith

When you have a creaky monolith, the obvious first step is to build new functionality outside the monolith.  Working on a greenfield, free of the monolith’s constraining design, bugs, and even programming language, is highly appealing.

There is a tendency to wander those verdant green fields for months on end and forget that you need to connect that new functionality back to the monolith’s muddy brown field.

Eventually, management loses patience with the project and pushes the team to wrap up.  Integration at this point can take months!  Worse, because the new project wasn’t talking to the monolith, most of the work tends to be a duplication of what’s in the monolith.  Written much better, to be sure!  But without value to the client.

Integration is where greenfield projects die.  You have to bring two systems together: the monolith, which is difficult to work with, and the greenfield, which is intentionally unlike the monolith.  Now you have to join them, under pressure, and deliver value.

Questions to Ask

When I start working with a team building outside their monolith, integration is the number one issue on my mind.

I push the team to deliver new functionality for the client as early as possible.  Here are 3 starting questions I typically ask:

  1. What new functionality are you building?  Not what functionality you need to build, but which parts of it are new for the client.
  2. How are you going to integrate the new feature into the monolith’s existing workflows?
  3. What features do you need to duplicate from the monolith?  Can you change the monolith instead?  You have to work in the monolith sooner or later.

First Create the Seam

I don’t look for the smallest or easiest feature.  I look for the smallest seam in the monolith.

For the feature to get used, the monolith must use it.  The biggest blocker, the most important thing, is creating a seam in the monolith for the new feature!

A seam is where your feature will be inserted into the workflow.  It might be a new function in a procedural straightaway, an adapter in your OOP, or even a strangler at your load balancer.

The important part is knowing where and how your feature will fit into the seam. 
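
Here is one minimal sketch of a seam - score_lead, legacy_score, new_scoring_service, and save are all invented names.  The workflow gains exactly one call site, and everything defaults to today’s behavior:

    def score_lead(lead, use_new_scoring=False):
        # The seam: the one point where the new feature plugs in.
        if use_new_scoring:
            return new_scoring_service.score(lead)  # the future external call
        return legacy_score(lead)  # today's behavior, unchanged

    def process_signup(lead):
        lead.score = score_lead(lead)  # the only change to the workflow
        save(lead)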

Second Change The Monolith

Once you have a seam, you have a place to start modifying the monolith to support the feature.  This is critical to prevent spending time recreating existing functionality.

Instead of recreating functionality, refactor the seam to provide it to your new service.
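
That refactor can be as small as this sketch - shown in Flask purely for illustration, with calculate_discounts standing in for the monolith’s existing logic.  The new service calls this internal endpoint instead of cloning the rules:

    from flask import Flask, jsonify

    app = Flask(__name__)  # stands in for the monolith's web layer

    def calculate_discounts(customer_id):
        ...  # the monolith's existing, battle-tested rules

    @app.route("/internal/customers/<customer_id>/discounts")
    def customer_discounts(customer_id):
        # Internal-only route: the new service consumes this instead
        # of re-implementing discounts.
        return jsonify(calculate_discounts(customer_id))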

Finally Build Outside The Monolith

Now that the monolith has a spot for your feature in its workflow, and it can support the external service, building the feature is easy.  Drop it right in!

Now, the moment your external service can say “Hello World!”, it is talking to the monolith.  It is in production, and even if you don’t finish it 100%, the parts you do finish will still be adding value.  Odds are, since your team is delivering, management will be happy to let you go right on adding features and delivering value.

Conclusion

Starting with a seam lets you develop outside the monolith while still being in production with the first release.  No working in a silo for months at a time.  No recreating functionality.

It delivers faster, partially by doing less work, partially by enabling iterations.

Tenancy Model Roundup

Over the past few months I have been ruminating on SaaS Tenancy Models and how they drive architectural decisions.  I hope you’ve enjoyed the series as I’ve scratched my itch.

Here is a roundup of the 7 articles, in case you missed any of the parts or need a handy index to what I’m sure is the most in-depth discussion of SaaS Tenancy Models ever written.

Part 1 - An introduction to SaaS Tenancy Models

Part 2 - An addendum to the introduction

Part 3 - How growth and scale drive tenancy model changes

Part 4 - Regaining Effective Single Tenancy through Cell Isolation

Part 5 - Why your job service should be Multi-Tenant even if your model is Single Tenant

Part 6 - Whose data is it anyway, why you need to separate your SaaS’s data from your clients

Part 7 - 3 Signs your resource allocation model is working against you

3 Signs Your Resource Allocation Model Is Working Against You

After 6 posts on SaaS Tenancy Models, I want to bring it back to some concrete examples.  When your SaaS has a Single Tenant model, clients expect to allocate all the resources they need, whenever they want.  When every client is entitled to the entire resource pool, no client gets a great customer experience.

Here are 3 signs your Resource Allocation Model is working against you:

  1. Large clients cause small clients’ work to stall
  2. You have to rebalance the mix of clients in a cell for stability
  3. Run your job at night for best performance

Large clients cause small clients’ work to stall

This is a classic “noisy neighbor” problem.  Each client tries to claim all the shared resources needed to do their work.  This isn’t much of a problem when none of the clients needs a significant percentage of the pool.  When a large client comes along, it drains the pool and leaves your small clients flopping like fish out of water.
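
One common mitigation, sketched minimally here with invented numbers and names, is to cap how much of the shared pool any single tenant can claim:

    import threading

    POOL = threading.BoundedSemaphore(100)  # the shared worker slots
    PER_TENANT_CAP = 20                     # no tenant can drain the pool
    tenant_slots = {}
    registry_lock = threading.Lock()

    def acquire_slot(tenant_id):
        with registry_lock:
            limiter = tenant_slots.setdefault(
                tenant_id, threading.BoundedSemaphore(PER_TENANT_CAP)
            )
        limiter.acquire()  # blocks if this tenant is at its cap
        POOL.acquire()     # blocks if the whole pool is exhausted

    def release_slot(tenant_id):
        POOL.release()
        tenant_slots[tenant_id].release()

A large client now queues behind its own cap instead of draining the pool for everyone else.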

You have to rebalance the mix of clients in a cell for stability

When having multiple large clients in a cell affects stability, the short term solution is to migrate some clients to another cell.  Large clients can impact performance, but they should not be able to impact stability.  Moving clients around buys you time, but it also forces you to focus on smaller, less profitable clients.

Run your job at night for best performance

This is advice that often pops up on SaaS message boards.  Don’t try to run your job during the day; schedule it to run in the evening so it is ready for the morning.  When clients start posting workarounds to your problems, it’s a clear sign of frustration.  Your clients are noticing that performance varies by the time of day.  They are building mental models of your platform and deciding you have load and scale issues.  By being helpful to each other, your clients are advertising your problems.

Conclusion

These 3 issues have the same root cause: your SaaS’s operational data is mixed in with client data.  If you have any of these three problems, the time has come to separate your data from the clients’.

Fixing these problems won’t be easy or quick!  

The good news is that you can separate the data and change your resource allocation model in an iterative fashion.  Start by pushing your job service across the tenancy line.

Get value and regain control one incremental step at a time, and never do a rewrite!

Whose Data Is It Anyway?

The tenancy line is a useful construct for separating SaaS data from client data.  When you have few clients, separating the data may not be worth the effort of maintaining multiple data stores.  As your system grows, keeping client data separate from SaaS data becomes as critical as keeping clients’ data separate from each other.

Company data is everything operational.  Was an action successful?  Does it need to be retried?  How many jobs are running right now?  This data is extremely important to make sure that client jobs are run efficiently, but it’s not relevant to clients.  Clients care about how long a job takes to complete, not about your concurrency, load shaping, or retry rate.

While the data is nearly meaningless to your clients, it is useful to you.  It becomes more useful in aggregate.  It has synergy.  A random failure for one client becomes a pattern when you can see across all clients.  When operational data is stored in logically separated per-client databases, you quickly lose the ability to look across it.  This is when it becomes important to separate operational data from client data.

Pull the operational data from a single client into a multi-tenant repository for the SaaS, and suddenly you can see what’s happening system wide.  Instead of only seeing what’s happening to a single client, you see the system.

Once you can see the system, you can shape it.  See this article for a discussion on how.

Other considerations

Even if visibility isn’t reason enough, extracting operational data is usually its own reward.

Operational data is usually high velocity - tracking a job’s progress involves updating the status with every state change.  If your operational store is the same as the client store, tracking progress conflicts with the actual work.
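
A minimal sketch of the separation - connect and the schema are hypothetical stand-ins for your actual data layer.  High-velocity status writes land in a multi-tenant operational store, and system-wide questions become a single query:

    client_db = connect("client-data")  # the client's records, per tenant
    ops_db = connect("operations")      # multi-tenant, SaaS-owned

    def update_job_status(tenant_id, job_id, status):
        # High-velocity writes never contend with the client's real work.
        ops_db.execute(
            "UPDATE jobs SET status = %s WHERE tenant_id = %s AND id = %s",
            (status, tenant_id, job_id),
        )

    def jobs_running_system_wide():
        # Visibility across every tenant: one query, not N databases.
        return ops_db.query(
            "SELECT tenant_id, COUNT(*) FROM jobs"
            " WHERE status = 'running' GROUP BY tenant_id"
        )

The write path gets cheaper for your clients, and the read path gets richer for you.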
