Is This A Quality Panel?

I recently stepped into an elevator and saw this panel:

The panel was clean, full of high quality materials, and everything worked.

Quality is about more than functionality. Does this look like a quality panel?

Everything works!

Push a button and it lights up!

Sure, it might light up red, green or blue. 
And the light might be around the edge or in the center. 
And some of the buttons are flush with the mount while others extend out, but that doesn't impact the light turning on.

There are 12 possible button implementations, and 5 of them appear randomly.

But when you push the button, the light turns on!

What does that have to do with SaaS Scaling?

No matter how excellent any individual endpoint implementation is, having an API with endpoints that work differently decreases the overall quality of your product.
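As a purely hypothetical illustration (neither endpoint is from a real product), here is what that inconsistency looks like in practice: two list endpoints that both "work", yet differ in shape, naming, and paging for no functional reason.

```python
# Hypothetical endpoints written by two different teams; both "work",
# but every difference below is extra cognitive load for API consumers.

def list_users():
    # Team A: wrapped response, camelCase fields, 1-based pages
    return {"data": [{"userId": 1, "userName": "ada"}], "page": 1}

def list_invoices():
    # Team B: bare array, snake_case fields, no paging envelope at all
    return [{"invoice_id": 7, "invoice_total": "19.99"}]
```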

Having a UI with mismatched widgets and styles increases the user’s cognitive load and decreases quality, even when the differences don’t change any functionality.

Consistency during the scaleup period can be difficult as multiple new teams spin up, but it’s critically important if you want a quality product.

Picking an Iterative Goal at a Scaleup

Note: This is part of my series on Iterative Delivery

When you are in Scaleup mode, picking an iterative goal should be straightforward.

What can’t you deliver?

Are you attracting larger clients and discovering your software can’t handle their size?

Do you have a swarm of small clients overwhelming the backend?

Does throwing money at your problems keep the software running smoothly, but unprofitably?

Your goal should be a single, short, aspirational sentence.  

If you get stuck, try the “We should be able to ___”  template:

We should be able to support clients of any size!

We should be able to support any number of clients!

We should be able to support clients profitably!

You don’t need to have any idea how to achieve your goal; it might not even be achievable.

The important thing is that you can clearly state your goal and explain it to others.

Getting Started With Iterative Delivery

The last 4 posts have been trying to convince you that iterative, baby-step delivery is better for your clients than moonshot, giant-step delivery.

But how do you get started?  How do you shorten your stride from shooting the moon to one small step?

The next series of posts is going to lay out my scaling iterative delivery framework.  This site is about scaling SaaS software, and this framework works best if you want an order of magnitude more of what you already offer your clients.  This isn’t a general framework, and it certainly isn’t the only way to get started with iterative delivery.

Work your way through these steps:

  1. Pick a goal - 1 sentence, highly aspirational and self-explanatory.
  2. Define the characteristics of your goal - What measurable characteristics does your system need in order to achieve your goal?
  3. What are the implications? - What technical things would have to be true in order for your system to have all the characteristics you need?
  4. What are the blockers? - What is stopping you from making the implications true?
  5. What can you do to weaken the blockers? - Set aside the goal, characteristics and implications; what can you do to weaken the blockers?

Weakening the blockers is where you start delivering iteratively.  As the blockers disappear, your system becomes better for your clients and easier for you to implement your technical needs.

We will explore each step in depth in the following posts.

When Your Customers Struggle

When your customers struggle with a non-intuitive interface, would they rather:

Wait 6 months while you work on an amazing redesign and brand refresh, or get weekly tweaks that make using your product easier and more intuitive?

Branding and color schemes are about you, improving the UI is about your customers.

Don’t put off small steps that help your customers while working on a moonshot like a branding refresh.

Moonshots vs Baby Steps at A Scaleup

You’re A Scaleup!  What’s a Scaleup?

You become a Scaleup when your SaaS’s service offering becomes compelling and you start attracting exponentially more clients.

All at once you have a lot more clients, clients with a lot more data.

Solutions that support 1,000 clients buckle as you pass 5,000.  Suddenly, 25,000 clients is only months away.

Services that support hundreds of thousands of transactions a day fall hopelessly behind as you onboard clients with millions of transactions.

You finally know what customers want.  You quickly find the edges of your system.  Money is rolling in from customers and VCs.  You can throw money at the problems to literally buy time to find a solution.

But you’re faced with a looming question - moonshots or baby steps?

Moonshots Are About You, Baby Steps Are About Your Clients

It’s not about you or your SaaS, it’s about your clients’ outcomes.

Moonshots are appealing because they take you directly to where you need to be.  Your system needs to scale 10x today and 100x next year; why not go straight for 100x?

Baby steps feel like aiming low because the impact on you is small.  But it’s not about you!  Think about the impact on your clients.

From a technology perspective, sending emails 1% faster is ::yawn::

But for your clients, faster emails mean more engagement, which means more sales.

Would your clients rather have more sales this week, compounding every week for the next year, or flat sales for a year while you build a moonshot?

Clients who churn, or go out of business, won’t get value from the moonshot.  Even if you deliver greater value eventually, your clients are better off getting some value now.

Are you delivering value to your SaaS or your clients?

The Chestburster Antipattern

The Chestburster is an antipattern that occurs when transitioning from a monolith to services. 

The team sees an opportunity to extract a small piece of functionality from the monolith into a new service, but the monolith is the only place that handles security, permissions and composition.

Because the new service can’t face clients directly, the Chestburster hides behind the monolith, hoping to burst through at some later point.

The Chestburster begins as the inverse of the Strangler pattern, with the monolith delegating to the new service instead of the new service delegating to the monolith.
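A minimal sketch of the difference in request flow, using hypothetical service names; the point is only who faces the client and who owns security.

```python
# Hypothetical stand-ins for the monolith and the newly extracted service.
class Monolith:
    def authenticate(self, request): ...   # security, permissions, composition live here
    def handle(self, request): ...         # all legacy behaviour

class ReportsService:                      # the small piece pulled out of the monolith
    def handle(self, request, user=None): ...

monolith, reports = Monolith(), ReportsService()

# Strangler: a client-facing facade delegates anything it can't
# handle yet back to the monolith, and takes over route by route.
def strangler_facade(request):
    if request["path"].startswith("/reports"):
        return reports.handle(request)
    return monolith.handle(request)

# Chestburster: the monolith stays client-facing because only it can
# authenticate, and it quietly delegates inward to the hidden new service.
def chestburster_endpoint(request):
    user = monolith.authenticate(request)
    return reports.handle(request, user)
```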

Why it’s appealing

The Chestburster’s appeal is that it gets the New Service up and running quickly.  This looks like progress!  The legacy code is extracted, possibly rewritten, and maybe better.

Why it fails

There is no business case for building the functionality the new service needs to burst through the monolith.  The functionality has been rewritten.  It's been rewritten into a new service.  How do you go back now and ask for time to address security and the other missing pieces?  Worse, the missing pieces are usually outside of the team’s control; security is one area you want to leave to the experts.

Even if you get past all the problems on your side, you’ve created new composition complexities for the client.  Now the client has to create a new connection to the Chestburster and handle routing themselves.  Can you make your clients update?  Should you?

Remember The Strangler

If you want to break apart a monolith, it’s always a good idea to start with a Strangler. If you can’t set up a Strangler on your existing monolith, you aren’t ready to start breaking it apart.

That doesn’t mean you’re stuck with the current functionality!

If you have the time and resources to extract the code into a new service, you have the time and resources to decouple the code inside of the monolith.  When the time comes to decompose into services, you’ll be ready.

Conclusion

The Chestburster gives the illusion of quick progress, but it quickly stalls as the team runs into problems they can’t control.  Overcoming the technical hurdles doesn’t guarantee that clients will ever update their integration.

Success in legacy system replacement comes from integrating first and moving functionality second.  With the Chestburster you move functionality first and probably never burst through.

Run to A Runbook

Giving users the ability to define their own searches, data segmentation and processes creates a lot of value for a SaaS.  The User Defined Parts of the codebase are also always going to contain the most “interesting” performance and scaling problems as users assemble the pieces in beautiful, powerful and mind-boggling ways.

It’s not a bug, it’s performance

Performance bugs aren’t traditional bugs.  The code does come up with the right answer, eventually.  But when your clients think your system is slow, they don’t care why.  Whether the code does too much work, can’t be run in parallel, or lets the customer shoot themselves in the foot, it’s all bugs to your clients.

You need to care about why, because knowing why is what lets you make things better.

Run to a Performance Runbook

A performance runbook can be nothing more than a list of tips and tricks for dealing with issues in User Defined land.  Because the problems aren’t bugs, they won’t leave obvious errors in the logs.  Diagnosing them requires developing specialized techniques, tools, and pattern matching.

By writing down your debugging techniques, a runbook will help you diagnose problems faster.

Reduce Everyone’s Mental Load

Performance issues manifest everywhere in a tech stack.  The issues that a client is noticing are often far removed from the bottleneck.  

Having a centralized place to document issue triaging reduces the mental load on everyone in your organization.  Where do we start looking?  What’s that query?  A runbook helps you with those first common steps.

Support gets help with common trouble areas and basic solutions.  Listening to a client explain an issue and not being able to do anything but escalate is demoralizing for everyone involved.  Every issue support can fix improves the experience for the client and support.  Even something as simple as improving the questions support asks the client will pay off in time saved.

When senior support and developers are called in, they know that all the common solutions have been tried.  The basic questions have been asked and the data gathered.  They can skip the basics and move on to the more powerful tools and queries, saving everyone’s time.  New diagnosis and solutions go into the runbook making support more powerful.

The common questions and common solutions become automation targets.  You can proactively tell a client that they’re using the system “wrong”, and send them help and training materials.  The best support solutions are when you reach out to the client before they even realize they have a problem.

6 Questions To Start A Runbook

Common solutions to common problems?  Training?  Proactive alerting?  Sounds great, but daunting.

Runbooks are living documents.  The days when they were printed and bound into manuals ended decades ago.

Start small.

Talk to the developer who fixed the last issue:

  1. What did they look for in the logs?  
  2. What queries did they run?  
  3. What did they find? 
  4. How did they resolve the issue?

Write down the answers.  Repeat every time there’s a performance issue.
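One way to capture those answers is a small structured entry; the fields below are only a suggestion, not a prescribed format.

```python
# A minimal runbook entry; the fields mirror the four questions above.
from dataclasses import dataclass

@dataclass
class RunbookEntry:
    symptom: str        # what the client reported
    log_clues: str      # what the developer looked for in the logs
    queries: str        # what queries were run and what they found
    resolution: str     # how the issue was resolved
```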

After a few incidents, patterns should emerge.

Bring what you’ve got to your support managers and ask:

  1. Could support have done any of the investigative work?
  2. If support had the answer, could they have resolved the issue? 

Help train support on what they can do, and create tools for the useful things support can’t do on their own.

Every time a problem gets escalated, that’s a chance to iterate and improve.

Conclusion - Runbooks Help Everyone

Building a performance runbook sounds a lot like accepting performance problems and working on mitigation.

Instead, it is about surfacing the performance problems faster, finding the commonalities, and fixing the underlying system.

Along the way the runbook improves the client experience, empowers support, and reduces the support load on developers.

Everyone wins when you run to a runbook!

Scaling is Legacy System Rescue That Pays 4x

In my last article, You Won’t Pay Me to Rescue Your Legacy System, I talked about my original attempt at specializing, and why it didn’t work.  I bumbled along until I lucked into a client that helped me understand when Legacy System Rescue becomes an Expensive Problem.

Rather than Legacy System Rescue, I was hired to do “keep the lights on” work.  The company had a 3-developer team working on a next generation system; all I had to do was keep things running as smoothly as possible until they delivered.

The legacy system was buckling under the weight of their current customers.  Potential customers were waiting in line to give them money, and had to be turned away.  Active customers were churning because the platform couldn’t keep up.

That’s when I realized - Legacy System Rescue may grudgingly get a single developer, but Scaling gets three developers to scale and one to keep the lights on.  Scaling is an expensive problem because it involves churning existing customers and turning away new ones.

Over 10 months I iteratively rescued the legacy system by fixing bugs and removing choke points.  The next generation system, after an investment of over 50 developer-months, was completely scrapped.

The Lesson - Companies won’t pay to rescue a legacy system, but they'll gladly pay 4x to scale up and meet demand.

Tenancy Model Roundup

Over the past few months I have been ruminating on SaaS Tenancy Models and how they drive architectural decisions.  I hope you’ve enjoyed the series as I’ve scratched my itch.

Here is a roundup of the 7 articles, in case you missed any of the parts or need a handy index to what I’m sure is the most in-depth discussion of SaaS Tenancy Models ever written.

Part 1 - An introduction to SaaS Tenancy Models

Part 2 - An addendum to the introduction

Part 3 - How growth and scale drive tenancy model changes

Part 4 - Regaining Effective Single Tenancy through Cell Isolation

Part 5 - Why your job service should be Multi-Tenant even if your model is Single Tenant

Part 6 - Whose data is it anyway, why you need to separate your SaaS’s data from your clients

Part 7 - 3 Signs your resource allocation model is working against you

Infrastructure Consolidation Drives Early Tenancy Migrations

For a SaaS with a pure Single Tenant model, infrastructure consolidation usually drives the first two, nearly simultaneous, steps towards a Multi-Tenant model: converting the front end servers to Multi-Tenant and switching the client databases from physical to logical isolation.  These steps usually happen as the SaaS grows beyond a handful of clients, infrastructure costs skyrocket, and things become unmanageable.

Diagram of a single tenant architecture becoming multi-tenant

Considering the 5 factors laid out in the introduction and addendum - complexity, security, scalability, consistent performance, and synergy - this move greatly increases scalability at the cost of increased complexity, decreased security, and opening the door to consistent performance problems.  Synergy is not immediately impacted, but these changes make adding Synergy at a later date much easier.

Why is this such an early move when it has 3 negative factors and only 1 positive?  Because pure Single Tenant designs have nearly insurmountable scalability problems, and these two changes are the fastest, most obvious and most cost effective solution.

Complexity 

Shifting from Single Tenant servers and databases to Multi-Tenant slightly increases software complexity in exchange for massively decreasing platform complexity.

The web servers need to be able to understand which client a request is for, usually through subdomains like client.mySaaS.com, and use that knowledge to validate the user and connect to the correct database to retrieve data.

Increased complexity from consolidation
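A minimal sketch of that request-to-database routing, with hypothetical tenant names and connection strings:

```python
# Hypothetical tenant-to-database map; in practice this would live in
# configuration or a control-plane service, not in code.
TENANT_DATABASES = {
    "acme":   "postgresql://web_acme@db-pool/acme",
    "globex": "postgresql://web_globex@db-pool/globex",
}

def resolve_tenant(host: str) -> str:
    # "acme.mySaaS.com" -> "acme"
    return host.split(".")[0].lower()

def connection_string_for(host: str) -> str:
    tenant = resolve_tenant(host)
    if tenant not in TENANT_DATABASES:
        raise ValueError(f"unknown tenant: {tenant!r}")
    return TENANT_DATABASES[tenant]
```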

The difficult and risky part here is making sure that valid sessions stay associated with the correct account.  

Database server consolidation tends to be less tricky.  Most database servers support multiple schemas with their own credentials and logical isolation.  Logical separation provides unique connection settings for the web servers.  Individual client logins are restricted to the client’s schema and the SaaS developers do not need to treat logical and physical separation any differently.
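As an illustration, logical isolation in a Postgres-style database might look roughly like this; the schema and role names are hypothetical.

```python
# Illustrative Postgres-flavoured statements for one client's logical isolation.
def provision_client_schema(client: str) -> list[str]:
    return [
        f"CREATE SCHEMA {client};",
        f"CREATE ROLE web_{client} LOGIN;",
        # The client's login can only touch its own schema...
        f"GRANT USAGE ON SCHEMA {client} TO web_{client};",
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {client} TO web_{client};",
        # ...and is explicitly kept away from shared objects.
        f"REVOKE ALL ON SCHEMA public FROM web_{client};",
    ]
```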

Migrations and Versioning Become Expensive

The biggest database problems with a many-to-many design crop up during migrations.  Inevitably, web and database changes will be incompatible between versions.  Some SaaS models require all clients to be on the same version, which limits compatibility issues to the release window (which itself can take days), while other models allow clients to be on different versions for years.

Versioning and Migration Diagram

The general solution to the problem of long lived versions is to stand up a pool of web and database servers on the new version, migrate clients to the new pool, and update request routing.
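A sketch of how that per-client routing might be represented, with hypothetical client names and pool addresses:

```python
# Which version each client is on; updated as clients are migrated.
CLIENT_VERSIONS = {"acme": "v2", "globex": "v1"}

# One pool of web and database servers per live version.
POOLS = {
    "v1": {"web": "https://web-v1.internal", "db": "db-pool-v1"},
    "v2": {"web": "https://web-v2.internal", "db": "db-pool-v2"},
}

def route(client: str) -> dict:
    version = CLIENT_VERSIONS.get(client, "v1")  # unmigrated clients stay on the old pool
    return POOLS[version]
```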

Security

The biggest risk around these changes is database secret handling: every server can now connect to every database.  Compromising a single server becomes a vector for exposing data from multiple clients.  This risk can be limited by proxy layers that keep database connections away from public facing web servers.  Still, a compromised server is now a risk to multiple clients.

Changing from physical to logical database separation is less risky.  Each client will still be logically separated with their own schema, and permissioning should make it impossible to do queries across multiple clients.

Scalability

Scalability is the goal of Multi-Tenant Infrastructure Consolidation.

In addition to helping the SaaS, the consolidation also helps clients.  Shared server pools increase stability and uptime by giving every client access to a much larger group of active servers, and the extra slack makes it much easier for the SaaS to absorb bursts in client activity.

Likewise, running multiple clients on larger database clusters generally increases uptime and provides slack for bursts and spikes.

These changes only impact response times when the single tenant setup would have been overwhelmed.  The minimum response times don’t change, but the maximum response times get lower and occur less frequently.

Consistent Performance

The flip side to the tenancy change is the introduction of the Noisy Neighbor problem.  This mostly impacts the database layer and occurs when large clients overwhelm the database servers and drown out resources for smaller clients.

This can be especially frustrating to clients because it can happen at any time, last for an unknown period, and there’s no warning or notification.  Things “get slow” and there are no guarantees about how often clients are impacted, notice, or complain.

Synergy

There is no direct Synergy impact from changing the web and database servers.

A SaaS starting from a pure Single Tenant model is not pursuing Synergy, otherwise the initial model would have been Multi-Tenant.

Placing distinct client schemas onto a single server does open the door to future Synergy work.  Working with data in SQL across different schemas on the same server is much easier than working across physical servers.  The work would still require changing the security model and writing quite a bit of code.  There is now a doorway if the SaaS has a reason to walk through.
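For example (hypothetical schema names), cross-client work collapses into a single cross-schema query instead of a cross-server export:

```python
# Hypothetical cross-schema query; only possible once client schemas share a
# server, and only with a deliberately widened security model.
CROSS_SCHEMA_REPORT = """
    SELECT 'acme'   AS client, count(*) AS orders FROM acme.orders
    UNION ALL
    SELECT 'globex' AS client, count(*) AS orders FROM globex.orders;
"""
```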

Conclusion

As discussed in the introduction, a SaaS may begin with a purely Single Tenant model for several reasons.  High infrastructure bills and poor resource utilization will quickly drive an Infrastructure Consolidation to Multi-Tenant servers and logically separated databases.

The exceptions to this rule are SaaS that have a few very large clients, or clients with high security requirements.  These SaaS will have to price and market themselves accordingly.

Infrastructure Consolidation is an early driver away from a pure Single Tenant model to Multi-Tenancy. The change is mostly positive for clients, but does add additional security and client satisfaction risks.

If you are enjoying this series, please subscribe to my mailing list so that you don’t miss an installment!
