The Chestburster Antipattern

The Chestburster is an antipattern that occurs when transitioning from a monolith to services. 

The team sees an opportunity to extract a small piece of functionality from the monolith into a new service, but the monolith is the only place that handles security, permissions and composition.

Because the new service can’t face clients directly, the Chestburster hides behind the monolith, hoping to burst through at some later point.

The Chestburster begins as the inverse of the Strangler pattern, with the monolith delegating to the new service instead of the new service delegating to the monolith.

Why it’s appealing

The Chestburster’s appeal is that it gets the New Service up and running quickly.  This looks like progress!  The legacy code is extracted, possibly rewritten, and maybe better.

Why it fails

There is no business case for building the functionality the new service needs to burst through the monolith.  The functionality has been rewritten.  It’s been rewritten into a new service.  How do you go back now and ask for time to address security and the other missing pieces?  Worse, the missing pieces are usually outside of the team’s control; security is one area you want to leave to the experts.

Even if you get past all the problems on your side, you’ve created new composition complexities for the client.  Now the client has to create a new connection to the Chestburster and handle routing themselves.  Can you make your clients update?  Should you?

Remember The Strangler

If you want to break apart a monolith, it’s always a good idea to start with a Strangler. If you can’t set up a strangle on your existing monolith, you aren’t ready to start breaking it apart.
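
As a rough illustration of what "setting up a strangle" can mean, here is a minimal sketch of a routing facade that sits in front of the monolith. Every name in it (StranglerFacade, claim, the /invoices prefix, the two stand-in handlers) is hypothetical and chosen only for the example; a real strangler is more likely to live in a load balancer or API gateway.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def monolith_handler(path: str) -> str:
    """Stand-in for the existing monolith (hypothetical)."""
    return f"monolith handled {path}"


def invoices_service_handler(path: str) -> str:
    """Stand-in for a newly extracted service (hypothetical)."""
    return f"invoices service handled {path}"


class StranglerFacade:
    """Routes each request by path prefix; unclaimed paths fall through to the monolith."""

    def __init__(self, default: Handler) -> None:
        self._default = default
        self._routes: Dict[str, Handler] = {}

    def claim(self, prefix: str, handler: Handler) -> None:
        """Hand one slice of traffic to a new service."""
        self._routes[prefix] = handler

    def handle(self, path: str) -> str:
        # Check longer prefixes first so the most specific claim wins.
        for prefix in sorted(self._routes, key=len, reverse=True):
            if path.startswith(prefix):
                return self._routes[prefix](path)
        return self._default(path)


facade = StranglerFacade(default=monolith_handler)
facade.claim("/invoices", invoices_service_handler)
```

Clients keep talking to one entry point, so moving a route to a new service does not require them to update their integration.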

That doesn’t mean you’re stuck with the current functionality!

If you have the time and resources to extract the code into a new service, you have the time and resources to decouple the code inside of the monolith.  When the time comes to decompose into services, you’ll be ready.

Conclusion

The Chestburster gives the illusion of quick progress, but quickly stalls as the team runs into problems they can’t control.  Overcoming the technical hurdles doesn’t guarantee that clients will ever update their integration.

Success in legacy system replacement comes by integrating first, and moving functionality second.  With the Chestburster you move functionality first and probably never burst through.

Building Your Way Out of a Monolith – Create a Seam

Why Build Outside The Monolith

When you have a creaky monolith the obvious first step is to build new functionality outside the monolith.  Working on a greenfield, without the monolith’s constraining design, bugs, and even programming language, is highly appealing.

There is a tendency to wander those verdant green fields for months on end and forget that you need to connect that new functionality back to the monolith’s muddy brown field.

Eventually, management loses patience with the project and pushes the team to wrap up.  Integration at this point can take months!  Worse, because the new project wasn’t talking to the monolith, most of the work tends to be a duplication of what’s in the monolith.  Written much better to be sure!  But, without value to the client.

Integration is where greenfield projects die.  You have to bring two systems together, the monolith which is difficult to work with, and the greenfield, which is intentionally unlike the monolith.  Now you need to bring them together, under pressure, and deliver value.

Questions to Ask

When I start working with a team building outside their monolith, integration is the number one issue on my mind.

I push the team to deliver new functionality for the client as early as possible.  Here are 3 starting questions I typically ask:

  1. What new functionality are you building?  Not what functionality do you need to build; which parts of it are new for the client?
  2. How are you going to integrate the new feature into the monolith’s existing workflows?
  3. What features do you need to duplicate from the monolith?  Can you change the monolith instead?  You have to work in the monolith sooner or later.

First Create the Seam

I don’t look for the smallest or easiest feature.  I look for the smallest seam in the monolith.

For the feature to get used, the monolith must use it.  The biggest blocker, the most important thing, is creating a seam in the monolith for the new feature!

A seam is where your feature will be inserted into the workflow.  It might be a new function call in straight-line procedural code, an adapter in your OOP, or even a strangler at your load balancer.

The important part is knowing where and how your feature will fit into the seam. 
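
To make the adapter flavor of a seam concrete, here is a minimal sketch. The tax example, the class names, and the flat 10% rate are all invented for illustration; the point is only that the monolith's workflow depends on the seam, not on whoever implements it.

```python
from typing import Callable, Protocol


class TaxCalculator(Protocol):
    """The seam: the one place the workflow asks for a tax amount."""

    def tax_for(self, amount_cents: int) -> int: ...


class LegacyTaxCalculator:
    """Wraps the monolith's existing behavior (a flat 10%, purely illustrative)."""

    def tax_for(self, amount_cents: int) -> int:
        return amount_cents // 10


class ExternalTaxCalculator:
    """Placeholder for a client of the new service; `fetch` stands in for the remote call."""

    def __init__(self, fetch: Callable[[int], int]) -> None:
        self._fetch = fetch

    def tax_for(self, amount_cents: int) -> int:
        return self._fetch(amount_cents)


def checkout_total(amount_cents: int, calculator: TaxCalculator) -> int:
    """Monolith workflow: it only knows about the seam."""
    return amount_cents + calculator.tax_for(amount_cents)
```

With the seam in place, swapping LegacyTaxCalculator for ExternalTaxCalculator is a one-line change in the monolith, and it can be reverted just as easily.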

Second Change The Monolith

Once you have a seam, you have a place to start modifying the monolith to support the feature.  This is critical to prevent spending time recreating existing functionality.

Instead of recreating functionality, refactor the seam to provide it to your new service.

Finally Build Outside the Monolith

Now that the monolith has a spot for your feature in its workflow, and it can support the external service, building the feature is easy.  Drop it right in!

Now, the moment your external service can say “Hello World!”, it is talking to the monolith.  It is in production, and even if you don’t finish it 100%, the parts you do finish will still be adding value.  Odds are, since your team is delivering, management will be happy to let you go right on adding features and delivering value.

Conclusion

Starting with a seam lets you develop outside the monolith while still being in production with the first release.  No working in a silo for months at a time.  No recreating functionality.

It delivers faster, partially by doing less work, partially by enabling iterations.

2 Developers, A Mathematician and a Scrum Master Walk Into a Bar

And come up with “The Worst Coding Problem Ever” dun dun dun!

Imagine getting this whopper in an interview or a take home test:

The United States has been conducting a census once a decade for over 200 years.

Imagine you can iterate the data at a family level, with the family data being whatever format/object is easiest for you. 

Find the family with the longest Fibonacci sequence of children.

The most fundamental issue is that it’s not clear what the answer looks like.  In fact, the 4 of us had 3 different interpretations of what the answer would look like.

Is the question looking for children’s ages going forward?

That would be an age sequence of 0, 1, 1, 2, 3, 5, etc

Or a newborn, a pair of 1 year old twins, a 2 year old, 3 year old, 5 year old, etc

Or is it looking for children born in the sequence?  (This is the inverse of the first answer)

A 6 year old, 5 year old twins, a 3 year old and a newborn

Or is it asking about the age gap between children?

In that case you’d be hunting for Twins (gap of 0), a gap of 1 year, a second gap of 1 year, a gap of 2 years, etc.
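
The ambiguity is easy to show in code. The sketch below scores a family under two of the readings, ages and age gaps. It assumes a family is just a list of integer ages and that gaps are read from youngest to oldest; neither assumption comes from the problem statement, which is rather the point.

```python
from typing import List


def fib_prefix_length(values: List[int]) -> int:
    """Count how many leading values match 0, 1, 1, 2, 3, 5, ..."""
    a, b = 0, 1
    count = 0
    for value in values:
        if value != a:
            break
        count += 1
        a, b = b, a + b
    return count


def score_by_ages(ages: List[int]) -> int:
    """Reading 1: the children's ages, youngest first, follow the sequence."""
    return fib_prefix_length(sorted(ages))


def score_by_gaps(ages: List[int]) -> int:
    """Reading 3: the gaps between consecutive children follow the sequence."""
    ordered = sorted(ages)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return fib_prefix_length(gaps)
```

A family with ages 0, 1, 1, 2, 3, 5 scores 6 by ages and 0 by gaps, while a family with ages 4, 4, 5, 6, 8 scores 0 by ages and 4 by gaps. Which family "wins" depends entirely on the reading.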

There are so many ways to be the family Fibonacci.

Many Technical Problems are like this

Fairly straightforward computer problems with meaningless mathematics sprinkled on top.  Being asked by people who won’t know the implications of any of the 3 answers. 

But what’s the answer?

If you are presented with this question in an interview, the correct answer is to thank the interviewer for their time, wish them the best of luck in their search, and end the interview.

Whose Data Is It Anyway?

The tenancy line is a useful construct for separating SaaS data from client data.  When you have few clients, separating the data may not be worth the effort of having multiple data stores.  As your system grows, ensuring that client data is separated from the SaaS data becomes as critical as ensuring the clients’ data remains separate from each other.

Company data is everything operational.  Was an action successful?  Does it need to be retried?  How many jobs are running right now?  This data is extremely important to make sure that client jobs are run efficiently, but it’s not relevant to clients.  Clients care about how long a job takes to complete, not about your concurrency, load shaping, or retry rate.

While the data is nearly meaningless to your clients, it is useful to you.  It becomes more useful in aggregate.  It has synergy.  A random failure for one client becomes a pattern when you can see across all clients.  When operational data is stored in logically separated databases you quickly lose the ability to check the data across clients.  This is when it becomes important to separate operational data from client data.

Pull the operational data from a single client into a multi-tenant repository for the SaaS, and suddenly you can see what’s happening system wide.  Instead of only seeing what’s happening to a single client, you see the system.
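
Here is a minimal sketch of the kind of question a multi-tenant operational store makes easy to ask. The JobEvent shape, the field names, and the threshold of two clients are assumptions made for the example, not a description of any particular system.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class JobEvent(NamedTuple):
    client_id: str
    job_type: str
    succeeded: bool


def failing_job_types(events: Iterable[JobEvent], min_clients: int = 2) -> List[str]:
    """Return job types that failed for at least `min_clients` distinct clients."""
    clients_by_type: Dict[str, Set[str]] = {}
    for event in events:
        if not event.succeeded:
            clients_by_type.setdefault(event.job_type, set()).add(event.client_id)
    return sorted(
        job_type
        for job_type, clients in clients_by_type.items()
        if len(clients) >= min_clients
    )


events = [
    JobEvent("acme", "export", False),
    JobEvent("acme", "import", True),
    JobEvent("globex", "export", False),
    JobEvent("initech", "import", False),
]
```

Here failing_job_types(events) returns ["export"]: each failure looks random inside a single client's data, but the pattern shows up once the events sit side by side.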

Once you can see the system, you can shape it.  See this article for a discussion on how.

Other considerations

If visibility isn’t enough, extracting operational data is usually its own reward.

Operational data is usually high velocity – tracking a job’s progress involves updating the status with every state change.  If your operational store is the same as the client store, tracking progress conflicts with the actual work.

DropDooms! How Binding an unbounded reference table can kill your UI’s performance

This post has been a long time coming – I wrote the idea down in my first list of potential posts, and I wrote a draft way back in 2019!

It is also the first time I can say that the content has been approved by my employer since they published it on their website.

It’s a great read, and I hope you enjoy it:

Dropdooms! How Binding an Unbounded Reference Table Can Kill Your UI’s Performance

Cell Based Single Tenancy

This is part 4 in a series on SaaS Tenancy Models.  Parts 1, 2, and 3.

SaaS companies are often approached by potential clients who want their instance to be completely separate from any other client.  Sometimes the request is driven by legal requirements (primarily healthcare and defense), sometimes it is a desire for enhanced security.

Often, running a Multi-Tenant service with a single client will satisfy the client’s needs.  Clients are often willing to pay for the privilege of having their account run Single Tenant, making it a potentially lucrative option for a SaaS.

What is a Cell?

A Cell is an independent instance of a SaaS’ software setup.  This is different from having software running in multiple datacenters or even multiple continents.  If the services talk to each other, they are in the same cell regardless of physical location.

Cells can differ with the number and power of servers and databases.  Cells can even have entirely different caching options depending on need.

The 3 most common Cell setups are Production, Staging (or Test), and Local.

Cell Properties

Cell architecture comes with a few distinct properties:

  • Cell structures allow SaaS to grow internationally and offer clients low latency and localized data policies (think GDPR).  Latency from the US to Europe, Asia and South America is noticeable and degrades the client experience.
  • Clients exist in 1 cell at a time.  They can migrate, but they can’t exist in multiple cells.
  • Generally speaking, Cells cannot be part of a disaster recovery plan.  Switching clients between Cells usually involves copying the database, and can’t be done if the client’s original Cell is down.
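
The "one cell at a time" property can be sketched as a small registry. CellRegistry and the cell names are hypothetical, and the database copy that a real migration needs is deliberately left out.

```python
from typing import Dict


class CellRegistry:
    """Maps each client to exactly one cell."""

    def __init__(self) -> None:
        self._cell_by_client: Dict[str, str] = {}

    def assign(self, client_id: str, cell: str) -> None:
        """Place a new client; a client that already has a cell must migrate instead."""
        if client_id in self._cell_by_client:
            raise ValueError(f"{client_id} already lives in a cell")
        self._cell_by_client[client_id] = cell

    def migrate(self, client_id: str, new_cell: str) -> None:
        """Move an existing client; the old placement is replaced, never kept alongside."""
        if client_id not in self._cell_by_client:
            raise KeyError(client_id)
        self._cell_by_client[client_id] = new_cell

    def cell_for(self, client_id: str) -> str:
        return self._cell_by_client[client_id]
```

Because the mapping holds a single value per client, a Single Tenant Cell is just a cell that only one client is ever assigned to.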

Cell Isolation as a Single Tenant Option

In part 3 I covered the difficulties in operating in a true Single Tenant model at scale.  A Cell with a single client effectively recreates the Single Tenancy experience.

Few clients want this level of isolation, but those that need it are prepared to pay for the extra infrastructure costs of an additional Cell.

Conclusion

For SaaS without global services, a Cell model enables a mix of clients on logically separated Multi-Tenant infrastructure and clients with effectively Single Tenant infrastructure.  This allows the company to pursue clients with Single Tenant needs, and the higher price point they offer.

The catch is that Single Tenant Cells can’t exist in an architecture with global services.  If there is a single service that must have access to all client data, Single Tenant Cells are out.


If you are enjoying this series, consider subscribing to my mailing list (https://shermanonsoftware.com/subscribe/) so that you don’t miss an installment!

Infrastructure Consolidation Drives Early Tenancy Migrations

For SaaS with a pure Single Tenant model, infrastructure consolidation usually drives the first two steps towards a Multi-Tenant model.  The two steps convert the front end servers to be Multi-Tenant and switch the client databases from physical to logical isolation.  They are usually done nearly simultaneously, once a SaaS grows beyond a handful of clients, infrastructure costs skyrocket, and things become unmanageable.

Diagram of a single tenant architecture becoming multi-tenant

Considering the 5 factors laid out in the introduction and addendum (complexity, security, scalability, consistent performance, and synergy), this move greatly increases scalability, at the cost of increased complexity, decreased security, and opening the door to consistent performance problems.  Synergy is not immediately impacted, but these changes make adding Synergy at a later date much easier.

Why is this such an early move when it has 3 negative factors and only 1 positive?  Because pure Single Tenant designs have nearly insurmountable scalability problems, and these two changes are the fastest, most obvious and most cost effective solution.

Complexity 

Shifting from Single Tenant servers and databases to Multi-Tenant slightly increases software complexity in exchange for massively decreasing platform complexity.

The web servers need to be able to understand which client a request is for, usually through sub domains like client.mySaaS.com, and use that knowledge to validate the user and connect to the correct database to retrieve data.
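
A minimal sketch of subdomain-based tenant resolution follows. The base domain, the tenant-to-schema table, and the function names are assumptions for illustration only; validating the user's session against the resolved tenant is not shown.

```python
from typing import Dict, Optional

BASE_DOMAIN = "mysaas.com"  # hypothetical, matching the client.mySaaS.com example

# Hypothetical lookup from tenant to its logically isolated schema.
TENANT_SCHEMAS: Dict[str, str] = {
    "acme": "acme_schema",
    "globex": "globex_schema",
}


def tenant_from_host(host: str) -> Optional[str]:
    """Extract the tenant subdomain from a Host header value, or None if it does not fit."""
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    subdomain = hostname[: -len(suffix)]
    # Reject empty or nested subdomains rather than guessing.
    if not subdomain or "." in subdomain:
        return None
    return subdomain


def schema_for_host(host: str) -> Optional[str]:
    """Resolve the schema a request should use; unknown tenants get nothing."""
    tenant = tenant_from_host(host)
    if tenant is None:
        return None
    return TENANT_SCHEMAS.get(tenant)
```

Returning None for anything unexpected keeps the failure mode closed: a request that cannot be tied to a known client never gets a database connection.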

Increased complexity from consolidation

The difficult and risky part here is making sure that valid sessions stay associated with the correct account.  

Database server consolidation tends to be less tricky.  Most database servers support multiple schemas with their own credentials and logical isolation.  Logical separation provides unique connection settings for the web servers.  Individual client logins are restricted to the client’s schema and the SaaS developers do not need to treat logical and physical separation any differently.

Migrations and Versioning Become Expensive

The biggest database problems with a many-to-many design crop up during migrations.  Inevitably, web and database changes will be incompatible between versions.  Some SaaS models require all clients on the same version, which limits compatibility issues to the release window (which itself can take days), while other models allow clients to be on different versions for years.

Versioning and Migration Diagram

The general solution to the problem of long lived versions is to stand up a pool of web and database servers on the new version, migrate clients to the new pool, and update request routing.
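
The routing half of that solution can be sketched as follows. VersionRouter, the version labels, and the pool addresses are hypothetical, and the actual data migration between pools is out of scope for the example.

```python
from typing import Dict


class VersionRouter:
    """Tracks which version each client is on and which pool serves that version."""

    def __init__(self, pools: Dict[str, str]) -> None:
        self._pools = dict(pools)  # version -> pool address
        self._client_version: Dict[str, str] = {}

    def place(self, client_id: str, version: str) -> None:
        """Put a client on a version; calling it again moves the client to a new pool."""
        if version not in self._pools:
            raise KeyError(f"no pool for version {version}")
        self._client_version[client_id] = version

    def pool_for(self, client_id: str) -> str:
        """Return the pool that should receive this client's requests."""
        return self._pools[self._client_version[client_id]]
```

Updating request routing is then a single call to place once the client's data has been migrated, and clients left on the old version are unaffected.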

Security

The biggest risk around these changes is database secret handling; every server can now connect to every database.  Compromising a single server becomes a vector for exposing data from multiple clients.  This risk can be limited by proxy layers that keep database connections away from public facing web servers.  Still, a compromised server is now a risk to multiple clients.

Changing from physical to logical database separation is less risky.  Each client will still be logically separated with their own schema, and permissioning should make it impossible to do queries across multiple clients.

Scalability

Scalability is the goal of Multi-Tenant Infrastructure Consolidation.

In addition to helping the SaaS, the consolidation will also help clients.  Shared server pools will increase stability and uptime by providing access to a much larger group of active servers.  The client also benefits from having more servers and more slack, making it much easier for the SaaS to absorb bursts in client activity.

Likewise, running multiple clients on larger database clusters generally increases uptime and provides slack for bursts and spikes.

These changes only impact response times when the single tenant setup would have been overwhelmed.  The minimum response times don’t change, but the maximum response times get lower and occur less frequently.

Consistent Performance

The flip side to the tenancy change is the introduction of the Noisy Neighbor problem.  This mostly impacts the database layer and occurs when large clients overwhelm the database servers and drown out resources for smaller clients.

This can be especially frustrating to clients because it can happen at any time, last for an unknown period, and there’s no warning or notification.  Things “get slow” and there are no guarantees about how often clients are impacted, notice, or complain.

Synergy

There is no direct Synergy impact from changing the web and database servers.

A SaaS starting from a pure Single Tenant model is not pursuing Synergy; otherwise the initial model would have been Multi-Tenant.

Placing distinct client schemas onto a single server does open the door to future Synergy work.  Working with data in SQL across different schemas on the same server is much easier than working across physical servers.  The work would still require changing the security model and writing quite a bit of code.  There is now a doorway if the SaaS has a reason to walk through.

Conclusion

As discussed in the introduction, a SaaS may begin with a purely Single Tenant model for several reasons.  High infrastructure bills and poor resource utilization will quickly drive an Infrastructure Consolidation to Multi-Tenant servers and logically separated databases.

The exceptions to this rule are SaaS that have a few very large clients or clients with high security requirements.  These SaaS will have to price and market themselves accordingly.

Infrastructure Consolidation is an early driver away from a pure Single Tenant model to Multi-Tenancy. The change is mostly positive for clients, but does add security and client satisfaction risks.

If you are enjoying this series, please subscribe to my mailing list so that you don’t miss an installment!

Tenancy Models – Intro Addendum

In the first post on SaaS Tenancy Models, I introduced the two idealized models – Single and Multi-Tenant.  Many SaaS companies start off as Single Tenant by default, rather than strategy, and migrate towards increasingly multi-tenant models under the influence of 4 main factors – complexity, security, scalability, and consistent performance.

After publishing, I realized that I left out an important fifth factor, synergy.

Synergy

In the context of this series, synergy is the increased value to the client as a result of mixing the client’s data with other clients.  A SaaS may even become a platform if the synergies become more valuable to the clients than the original service.  

Another aspect of synergy is that the clients only gain the extra value so long as they remain customers of the SaaS.  When clients churn, the SaaS usually retains the extra value, even after deleting the client’s data.  This organically strengthens client lock in and increases the SaaS value over time.  The existing data set becomes ever more valuable, making it increasingly difficult for clients to leave.

Some types of businesses, like retargeting ad buyers, create a lot of value for their clients by mixing client data.  Ad buyers increase effectiveness of their ad purchases by building larger consumer profiles.  This makes the ad purchases more effective for all clients.

On the other hand, a traditional CRM, or a codeless service like Zapier, would be very hard pressed to increase client value by mixing client data.  Having the same physical person in multiple client instances in a CRM doesn’t open a lot of avenues; what could you offer – track which clients a contact responds to?  No code services may mix client data as part of bulk operations, but that doesn’t add value to the clients.

Sometimes there might be potential synergy, like in Healthcare and Education, but it would be unethical and illegal to mix the data.

Not All Factors Are Client Facing

Two of the factors, complexity and scalability, are generally invisible to clients.  When complexity and scalability are noticed, it is negative:

  • Why do new features take so long to develop?  
  • Why are bugs so difficult to resolve?  
  • Why does the client experience get worse as usage grows?

A SaaS never wants a client asking these questions.

Security, Consistent Performance and Synergy are discussion points with clients.

Many SaaS companies can adjust Security concerns and Consistent Performance through configuration isolation.

Synergy is a highly marketable service differentiator and generally not negotiable.

Simplified Drawings

As much as possible I’m going to treat and draw things as 2-tier systems rather than N-tier.  As long as the principles are similar, I’ll default to simplified 2-tier diagrams over N-tier or microservice diagrams.

Next Time

Coming up I’ll be breaking down single to multi-tenant transformations.

Why a SaaS would want the transformation, what are the tradeoffs, and what are the potential pitfalls.

Please subscribe to my mailing list to make sure you don’t miss out!

Questions To Ask Before Automating

The prospect of automating manual tasks emits a siren song to most developers.  Like a siren, the call often leads you straight into disaster. Best intentions often end up leaving companies with code that’s more expensive to maintain and less useful than human labor.  Reports and tasks become a leaky faucet for productivity.

Here are six questions to ask yourself, or a developer, before dancing to the automation music:

How often is the task likely to change?

Weekly Business Intelligence reports change monthly, monthly ones change every quarter, and quarterly ones change every year.  They are never stable enough to be worth automating by an outside developer. This is why BI tools that let non-technical users semi-automate reports are a 5 billion dollar industry.

On the other hand, regulatory and compliance reports are likely to be stable for years and make great targets.

If a task won’t be executed at least 10 times between changes, it probably won’t be worth automating.

How long is the task likely to continue?

Some tasks are likely to continue “forever”.  Archiving old client data, scrubbing databases and other client onboarding/offloading tasks fall into this category.

Some tasks are never going to come up again.

If a task won’t be executed at least 5 more times, it probably won’t be worth automating.

How much human effort does the task consume, and for whom?

You can automate turning on the office lights in the morning with a motion detector, but it won’t pay off in terms of time saved from flipping a switch.

How much of an interruption is doing the task?  Turning on the lights on your way in the door isn’t an interruption for anyone.  Phone support manually resetting a user password isn’t an interruption, but having the CFO process refunds for clients is a giant interruption.

Even if the reset and refund are both a single button click that takes 15 seconds, pulling the CFO away is a much bigger deal.  Also, the context switch for the CFO will be measured in minutes because she’s not processing refunds all day long.

Use a sliding scale based on time and title.  For entry level, don’t automate until the task takes more than an hour per person per day.  For the C-Suite anything over 5 minutes per day is a good target.

How much lag can automation save your clients?

Clients don’t care how long the task takes, they care about the lag between asking and receiving.  It doesn’t matter that processing a refund only takes 5 minutes if your team only processes refunds once a week.

If the client lag is more than a day, consider automating.

Is the Task a real process, or are you cleaning up the effects of a bug?

Software bugs can do all sorts of terrible things to your data and process, but after the first couple of times, the damage becomes predictable and you’ll get better at fixing the damage.  

Automating the fix is one way of fixing the bug.  That’s how bugs become features.

If you don’t want to make the bug an official part of your software, don’t automate the fix.

How common and expensive are mistakes?

Mistakes are inevitable when humans are manually performing routine tasks.  

Mistakes are inevitable when developers first automate a routine task.  Assume that developer mistakes will equal one instance of a manual mistake.  

For an automation to save money you have to expect to prevent at least 2 manual errors.

As an equation:

[Cost to Automate] + [Cost of a mistake] < [Cost of a mistake] * [Frequency of mistakes]

Because the cost of mistakes is relatively easy to quantify, tasks with expensive mistakes are usually automated early on.
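
The inequality above translates directly into code. In this sketch the function and parameter names are mine, "frequency of mistakes" is read as the number of manual mistakes you expect over the task's remaining life, and all costs are assumed to be in the same unit.

```python
def automation_pays_off(
    cost_to_automate: float,
    cost_per_mistake: float,
    expected_manual_mistakes: float,
) -> bool:
    """[Cost to Automate] + [Cost of a mistake] < [Cost of a mistake] * [Frequency of mistakes]"""
    return cost_to_automate + cost_per_mistake < cost_per_mistake * expected_manual_mistakes


def break_even_mistakes(cost_to_automate: float, cost_per_mistake: float) -> float:
    """Number of expected manual mistakes at which both sides of the inequality are equal."""
    return cost_to_automate / cost_per_mistake + 1
```

For example, with an automation cost of 500 and a mistake cost of 1000, automation_pays_off(500, 1000, 2) is True and automation_pays_off(500, 1000, 1) is False. The extra "+ 1" is the one mistake the developers are assumed to make while automating.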

Conclusion

Developers always want to automate things.  Sometimes it pays off; sometimes it’s a mistake.

If you ask these six questions before automating you’re much more likely to make the right choice:

  1. How often is the task likely to change?
  2. How long is the task likely to continue?
  3. How much human effort does the task consume, and for whom?
  4. How much lag can automation save your clients?
  5. Is the Task a real process, or are you cleaning up the effects of a bug?
  6. How common and expensive are mistakes?
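
The numeric rules of thumb from the questions above can be collected into a small checklist. This is only a sketch: the Task fields and the wording of the notes are invented here, the effort and mistake-cost questions are left out because they need judgment rather than a threshold, and the function lists signals instead of pretending to make the decision.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    runs_between_changes: int      # how many times it runs before the task changes
    expected_remaining_runs: int   # how many more times it is likely to run at all
    client_lag_days: float         # delay between a client asking and receiving
    is_bug_cleanup: bool           # True if the task only repairs a bug's damage


def automation_notes(task: Task) -> List[str]:
    """List the signals for and against automating, using the thresholds from the text."""
    notes: List[str] = []
    if task.runs_between_changes < 10:
        notes.append("against: fewer than 10 runs between changes")
    if task.expected_remaining_runs < 5:
        notes.append("against: fewer than 5 runs remaining")
    if task.client_lag_days > 1:
        notes.append("for: client lag is more than a day")
    if task.is_bug_cleanup:
        notes.append("against: automating the fix makes the bug a feature")
    return notes
```

A weekly refund run that changes rarely and will continue indefinitely, such as Task(52, 500, 7, False), yields only the note in favor.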