I cohost a podcast devoted to the idea that starting over and rewriting your system is a mistake that will lead to failure. But I have struggled to explain the alternative: iterative replacement.
One commenter summed it up as: Don’t rewrite, instead rewrite.
I’m inventing a new term, TheeSeeShip, to highlight the difference. Based on the Ship of Theseus, when you TheeSeeShip, you are iteratively replacing parts of the current system that are broken, don’t scale, or just aren’t useful anymore.
A rewrite creates a second system, with the hope of one day becoming the sole system. Until that day, you have the system you use and an ever-growing mass of untested work in progress.
When you TheeSeeShip, there is only ever one system and everything will be in production the whole time. Over time you’ll replace every line as you add and remove services, features, and scaling patterns. Everything changes, but the system remains.
The opposite of a Rewrite is to TheeSeeShip. TheeSeeShipping is lower risk, provides more value to your customers, and boosts morale. I’ll dig into why in the posts ahead.
Unlike the Big Bang rewrite of Yankee Stadium, Madison Square Garden is an iterative success story. MSG has been iteratively upgraded over the years, resulting in an ever more intimate and popular venue.
Suites have been added and removed. The original 200 seating level was merged with the 100 level, and the 300 and 400 levels were merged. Floors were raised to move fans closer. MSG has tweaked food options and added overlooks without needing to start over from scratch!
Developers still want to build a new arena down the street and knock Madison Square Garden down, but not because they want to build a better Madison Square Garden. Instead, the calls for a new arena are secondary to freeing up the land for new condos, office towers, and an improved Penn Station.
The developers may end up building a new Madison Square Garden. If they do, their pitch will be about freeing the land to build something new, not “The Same, But Better”.
Hat tip to my friend Zach for pointing out that MSG is an example of iteration. Details for this post came from the Madison Square Garden article on Wikipedia.
My core development tenet is Never Rewrite. You will be better off iteratively fixing and improving anything that is providing value to your customers. It is common for developers and managers to say that a rewrite will fix all problems and make new development easier. These claims usually come with the added benefit of some exciting new technology.
The odds of a rewrite being better for your business and your customers are vanishingly small. Sometimes though, a team manages to deliver on a big bang rewrite. Are the results better for the business and customers?
The Yankees, yes, the baseball team, executed a real-life big bang switchover. They built a new stadium on a literal green field, across the street from the original stadium. The team moved over, destroyed the original, and replaced the green field.
I can’t imagine a more perfectly executed big bang cutover.
But, were the results better for the business and customers?
The Yankees Organization completed a remodel of Yankee Stadium in the late 1970s and immediately decided that their baseball stadium was too difficult to upgrade. The team rejected multiple modernization attempts in the 1980s and 1990s; only a new stadium would do! Eventually, in 2006, the Yankees were allowed to build a new stadium on parkland across the street from the original stadium.
Baseball fans, the customers, went without serious upgrades for over 25 years. The local community lost their park for 5 years while the big bang was underway.
Surely though, with 25 years of planning and an unlimited budget, the new stadium was a marvel for the fans! Nope, the seating was further from the field, the bleachers had obstructed views, and the best seats were so expensive that they went empty, even in playoff games.
But it made for great baseball, right? Again, no: the wall slope encouraged home runs, the spread-out seating killed the noise of the crowd, and the empty premium seats also kept fans from getting autographs.
What did they get right? Better food, more elevators, more bathrooms, and bigger seats. Oh! And everyone has a cupholder now.
Was it worth making the fans wait 25 years for cupholders?
Would iterating on seat design, restaurants, bathrooms and elevators over 25 years have been better for everyone?
Even when a big bang is successful, the results won't be better for the business and customers.
My last three posts laid out a brief history of User Defined Field patterns: past, present, and future. This post offers a framework for migrating. If your CRM is using the Relational or NoSQL pattern and you’re ready to move to the more efficient, cheaper, and simpler future, this is the post for you!
Migration Philosophy
Before going into how to migrate, a reminder of my philosophy:
Minimize risk by taking small incremental steps
Focus on providing value to the customer
There are many ways to migrate from one pattern to another. This strategy will minimize risk and maximize customer value.
Step 1 - Extend Your Relational Database
The Relational and NoSQL patterns make use of a relational database.
Step 1 is to add a JSON column to your existing contacts table.
Your new schema should look like one of these two models
That’s it - deploy an additive schema update to your database. Since there’s no code to access the new columns, there’s no coordinated deployment. Just the regular, minimal risk of updating your database.
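To make the additive change concrete, here is a minimal sketch using Python’s standard-library sqlite3 module. The table name Contacts and the column name custom_fields are assumptions for illustration, and because SQLite has no dedicated JSON type, the JSON is stored as TEXT; other databases offer native JSON column types. The point to notice is that an existing INSERT that knows nothing about the new column keeps working.

```python
import sqlite3


def extend_schema(conn: sqlite3.Connection) -> None:
    """Step 1: add a nullable JSON column to the existing Contacts table.

    Hypothetical names. SQLite has no JSON type, so the JSON text lives in TEXT.
    """
    # Nullable with no default: existing rows and existing INSERTs are unaffected.
    conn.execute("ALTER TABLE Contacts ADD COLUMN custom_fields TEXT")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Stand-in for the existing production table.
    conn.execute("CREATE TABLE Contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO Contacts (name) VALUES ('Ada')")
    extend_schema(conn)
    # Old-style INSERT that doesn't mention the new column still succeeds.
    conn.execute("INSERT INTO Contacts (name) VALUES ('Grace')")
    print(conn.execute("SELECT id, name, custom_fields FROM Contacts ORDER BY id").fetchall())
    # [(1, 'Ada', None), (2, 'Grace', None)]
```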
Step 2 - Query The New Schema
Now that the new schema is in production, it is time to extend your query code.
Add a new code path that checks to see if any data is present in the new schema. If there is data available, execute the query using the new JSON column. When there’s no data, use the original query method.
You will need to develop this code hand-in-hand with the code for migrating the data from your original system to the new schema. The important piece is that you should always deploy with the READER code on and the WRITER code off.
When you deploy this code, there won’t be any data in the JSON column. The new code will be available, but unused.
Since the new code won’t be used, this step is also extremely low risk.
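A minimal sketch of the Step 2 reader, again in Python with sqlite3, assuming the three-table layout from the classic pattern (Fields, Contacts, ContactFields) plus a hypothetical custom_fields JSON column from Step 1. It shows a single-contact read rather than a full search, but a search would branch the same way: use the JSON column when it holds data, otherwise run the original query untouched.

```python
import json
import sqlite3


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in: the legacy tables plus the Step 1 JSON column (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Fields (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE Contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL, custom_fields TEXT);
        CREATE TABLE ContactFields (
            contact_id INTEGER NOT NULL, field_id INTEGER NOT NULL, value TEXT,
            PRIMARY KEY (contact_id, field_id)
        );
        INSERT INTO Fields (id, name) VALUES (1, 'birthday');
        INSERT INTO Contacts (id, name) VALUES (1, 'Ada');
        INSERT INTO ContactFields VALUES (1, 1, '1990-01-01');
        """
    )
    return conn


def read_custom_fields(conn: sqlite3.Connection, contact_id: int) -> dict:
    """Step 2 reader: prefer the JSON column, fall back to the original tables."""
    row = conn.execute(
        "SELECT custom_fields FROM Contacts WHERE id = ?", (contact_id,)
    ).fetchone()
    if row is not None and row[0] is not None:
        # New path: only taken once a writer has populated the column.
        return json.loads(row[0])
    # Original path, unchanged: assemble the fields from the relational tables.
    rows = conn.execute(
        """SELECT f.name, cf.value
           FROM ContactFields cf JOIN Fields f ON f.id = cf.field_id
           WHERE cf.contact_id = ?""",
        (contact_id,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = demo_db()
    print(read_custom_fields(conn, 1))  # {'birthday': '1990-01-01'} via the original path
```

Deployed with no writer, every call takes the original path, which is exactly why this step carries so little risk.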
Step 3 - Double Write
At this point your system will use the new schema whenever data is present.
This gives you a single switch to flip - to use the new system, start writing to the new column IN ADDITION to the original method.
Mistakes at this step are the most likely to cause customer impact! It is also the most expensive in time and resources because you are writing the data twice.
However, this also gives you a very quick fallback path. The original writing process is untouched!
If there’s a problem, turn off the double write and delete the data in the new column. Thanks to the work in Step 2, you’ll instantly fall back with NO DATA LOSS.
Migrations are hard! Preventing data loss minimizes the risk.
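Here is one way the Step 3 writer could look, as a sketch under the same assumptions (sqlite3, hypothetical table and column names). The double_write argument stands in for whatever feature flag you use as the single switch, and fall_back plays the role of the data deleter. One design choice worth calling out: the sketch copies the contact’s entire field set into the JSON column rather than just the field being saved, because the Step 2 reader switches to the JSON column as soon as it holds anything, and a partial document there would hide the contact’s other fields.

```python
import json
import sqlite3


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in: legacy tables plus the Step 1 JSON column (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Fields (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE Contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL, custom_fields TEXT);
        CREATE TABLE ContactFields (
            contact_id INTEGER NOT NULL, field_id INTEGER NOT NULL, value TEXT,
            PRIMARY KEY (contact_id, field_id)
        );
        INSERT INTO Fields (id, name) VALUES (1, 'birthday'), (2, 'interest');
        INSERT INTO Contacts (id, name) VALUES (1, 'Ada');
        INSERT INTO ContactFields VALUES (1, 1, '1990-01-01');
        """
    )
    return conn


def legacy_fields(conn: sqlite3.Connection, contact_id: int) -> dict:
    rows = conn.execute(
        """SELECT f.name, cf.value
           FROM ContactFields cf JOIN Fields f ON f.id = cf.field_id
           WHERE cf.contact_id = ?""",
        (contact_id,),
    ).fetchall()
    return dict(rows)


def save_custom_field(conn, contact_id, field_name, value, double_write: bool) -> None:
    """Step 3 writer. `double_write` is a hypothetical feature flag: the single switch."""
    # Assumes the field is already defined in Fields.
    (field_id,) = conn.execute(
        "SELECT id FROM Fields WHERE name = ?", (field_name,)
    ).fetchone()
    # Original write path, untouched.
    conn.execute(
        "INSERT OR REPLACE INTO ContactFields (contact_id, field_id, value) VALUES (?, ?, ?)",
        (contact_id, field_id, value),
    )
    if double_write:
        # Copy the contact's whole field set so the reader never sees a partial document.
        conn.execute(
            "UPDATE Contacts SET custom_fields = ? WHERE id = ?",
            (json.dumps(legacy_fields(conn, contact_id)), contact_id),
        )
    conn.commit()


def fall_back(conn: sqlite3.Connection) -> None:
    """The data deleter: empty the JSON column so every read takes the original path."""
    conn.execute("UPDATE Contacts SET custom_fields = NULL")
    conn.commit()
```

Because the original tables are written on every call, fall_back loses nothing: the legacy rows already hold every value.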
Step 4 - Only Hybrid Write
The final step is to stop writing to the original data store. This ends your ability to fall back, so make sure to confiscate copies of the data deleter from Step 3!
Ending the double write should be low risk because you were only doing it as a fallback at this point. You should see an immediate bump in performance and drop in costs. This trend will continue as the data migrates from the old system to the new.
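A sketch of the Step 4 writer, under the same assumptions as before (sqlite3, hypothetical names). The original tables are no longer written. A contact whose JSON column is still empty gets its legacy values carried forward on its first write, which is one way (assumed here, not prescribed by the steps above) for data to trickle from the old system to the new as customers use it.

```python
import json
import sqlite3


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in: legacy tables plus the Step 1 JSON column (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Fields (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE Contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL, custom_fields TEXT);
        CREATE TABLE ContactFields (
            contact_id INTEGER NOT NULL, field_id INTEGER NOT NULL, value TEXT,
            PRIMARY KEY (contact_id, field_id)
        );
        INSERT INTO Fields (id, name) VALUES (1, 'birthday');
        INSERT INTO Contacts (id, name) VALUES (1, 'Ada');
        INSERT INTO ContactFields VALUES (1, 1, '1990-01-01');
        """
    )
    return conn


def legacy_fields(conn: sqlite3.Connection, contact_id: int) -> dict:
    rows = conn.execute(
        """SELECT f.name, cf.value
           FROM ContactFields cf JOIN Fields f ON f.id = cf.field_id
           WHERE cf.contact_id = ?""",
        (contact_id,),
    ).fetchall()
    return dict(rows)


def save_custom_field(conn, contact_id, field_name, value) -> None:
    """Step 4 writer: the JSON column is the only write target. Assumes the contact exists."""
    (current,) = conn.execute(
        "SELECT custom_fields FROM Contacts WHERE id = ?", (contact_id,)
    ).fetchone()
    if current is not None:
        fields = json.loads(current)
    else:
        # Not migrated yet: carry this contact's legacy values forward on first write.
        fields = legacy_fields(conn, contact_id)
    fields[field_name] = value
    conn.execute(
        "UPDATE Contacts SET custom_fields = ? WHERE id = ?",
        (json.dumps(fields), contact_id),
    )
    conn.commit()
```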
Step 5 - Clean Up
At some point you’ll be ready to shut down the old system.
The last step is to decide what to do with the unmigrated data. Depending on how long you’ve waited, you’re looking at customer data that hasn’t been accessed in months. Look at your retention promises; maybe you don’t have to migrate the data at all.
Either way, clean it up and shut down the old system at your leisure.
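If you do decide to migrate the stragglers, the clean-up can start with two small pieces, sketched here with sqlite3 and the same hypothetical names: a query that finds contacts with legacy data but an empty JSON column, and an optional backfill that copies them forward. Dropping the old tables afterwards is deliberately left out of the sketch.

```python
import json
import sqlite3


def demo_db() -> sqlite3.Connection:
    """Contact 1 was migrated by normal use; contact 2 was never touched (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Fields (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE Contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL, custom_fields TEXT);
        CREATE TABLE ContactFields (
            contact_id INTEGER NOT NULL, field_id INTEGER NOT NULL, value TEXT,
            PRIMARY KEY (contact_id, field_id)
        );
        INSERT INTO Fields (id, name) VALUES (1, 'birthday');
        INSERT INTO Contacts (id, name, custom_fields) VALUES (1, 'Ada', '{"birthday": "1990-01-01"}');
        INSERT INTO Contacts (id, name) VALUES (2, 'Grace');
        INSERT INTO ContactFields VALUES (1, 1, '1990-01-01'), (2, 1, '1985-06-30');
        """
    )
    return conn


def legacy_fields(conn: sqlite3.Connection, contact_id: int) -> dict:
    rows = conn.execute(
        """SELECT f.name, cf.value
           FROM ContactFields cf JOIN Fields f ON f.id = cf.field_id
           WHERE cf.contact_id = ?""",
        (contact_id,),
    ).fetchall()
    return dict(rows)


def unmigrated_contact_ids(conn: sqlite3.Connection) -> list:
    """Contacts that still have legacy data but an empty JSON column."""
    rows = conn.execute(
        """SELECT DISTINCT cf.contact_id
           FROM ContactFields cf JOIN Contacts c ON c.id = cf.contact_id
           WHERE c.custom_fields IS NULL
           ORDER BY cf.contact_id"""
    )
    return [contact_id for (contact_id,) in rows]


def backfill(conn: sqlite3.Connection) -> None:
    """Optional: copy the remaining legacy data forward before retiring the old tables."""
    for contact_id in unmigrated_contact_ids(conn):  # list is built before any UPDATE runs
        conn.execute(
            "UPDATE Contacts SET custom_fields = ? WHERE id = ?",
            (json.dumps(legacy_fields(conn, contact_id)), contact_id),
        )
    conn.commit()
```

Running unmigrated_contact_ids on its own is also a cheap way to size the problem before deciding whether a backfill is worth it.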
Conclusion
You can migrate User Defined Field code to the latest pattern with very little risk by using the 5 step strategy laid out in this article.
The Hybrid Solution offers excellent scalability and performance for reasonable costs. If your CRM is using one of the earlier patterns, it is time to start migrating.
Take control of the process with small, low risk, steps and never rewrite!
In part 1, I covered the classic solution for User Defined Fields: simple but unscalable.
NoSQL emerged in the late 2000s as a solution to the scaling problems of relational User Defined Fields. Instead of having a meta table defining fields in a relational database, the User Defined data would live in NoSQL.
The structure would look like this:
This model eliminates the metaprogramming and the repeated joins of the same table against itself. The major new headache this model creates is difficulty maintaining the integrity of the field data.
Pros:
No complicated metaprogramming. Instead, you write a filter/match function to run against the data in the Collection Of Fields.
No more repeated matching against the same table. Adding additional search criteria has minimal cost.
Open-ended, internet-level scaling. For a CRM or SaaS, the limiting factor will be the cost of storing data, not a hard limit of the technology.
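The filter/match function can be as small as the sketch below. The document shape (a contact_id plus a dictionary of fields) is an assumption for illustration, and the documents live in a plain Python list rather than a real NoSQL store, where the same filter would usually be expressed in that database’s own query language. Adding a search criterion is just one more dictionary entry to check.

```python
# Hypothetical document shape: one document per contact in the NoSQL store,
# holding the contact's relational id and its Collection Of Fields.
documents = [
    {"contact_id": 1, "fields": {"birthday": "1990-01-01", "interest": "fencing"}},
    {"contact_id": 2, "fields": {"interest": "baseball"}},
    {"contact_id": 3, "fields": {"birthday": "1985-06-30", "interest": "fencing"}},
]


def matches(doc: dict, criteria: dict) -> bool:
    """True when every requested field is present with the requested value."""
    fields = doc.get("fields", {})
    return all(name in fields and fields[name] == wanted for name, wanted in criteria.items())


def search(docs: list, criteria: dict) -> list:
    # One more criterion means one more dict lookup per document, not one more join.
    return [doc["contact_id"] for doc in docs if matches(doc, criteria)]


if __name__ == "__main__":
    print(search(documents, {"interest": "fencing"}))  # [1, 3]
```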
Cons:
Much more complicated to set up and maintain. Even with managed services, supporting two database technologies doubles the difficulty of CRUD. Multiple inserts, multiple deletes, tons of ways for things to go wrong.
Without a relational database enforcing the data structure, poisoned or unreadable data is common. Being able to store arbitrary data collections means you’ll invariably store buggy data. You’ll miss some records during upgrades and have to support multiple deserializers. You will lose customer data in the name of expediency and cost control.
It’s more expensive. You’ll pay for your relational database, NoSQL database, and software to map between the two.
Conclusion
NoSQL systems solve the scaling problems of setting up User Defined Fields in a relational database. The scaling comes at a high price in complexity, fragility, and cost.
Reducing the complexity, fragility, and costs leads to the upcoming 3rd shift, covered in part 3.
This series covers a brief history of the 2 historic patterns for implementing User Defined Fields in a CRM, the upcoming hybrid solution that provides the best of both worlds, and how to evolve your existing CRM to the latest pattern. If you care about CRM performance, scaling, or cost, this series is for you!
What are User Defined Field Patterns?
Every CRM provides a basic set of fields for defining a customer. Each CRM’s basic field set is different, depending on the CRM’s focus. So every user of a CRM needs to expand the basic definition in some way. Birthdays, purchase history, and interests are three very common additions.
The trick is allowing users to define their own fields in ways that don’t break your CRM.
The Three Patterns
At a high level, there have been three major architectures for implementing Custom Fields. Most of the design is driven by the strengths and weaknesses of the underlying database architecture.
Pattern 1, generalized columns in a database, spanned from the dawn of time until the rise of NoSQL around 2010.
Pattern 2, NoSQL, began around 2010 and continues to today.
Pattern 3, JSON in a relational database, began in the late 2010s and combines the best of the two approaches.
Pattern 1 - All in a Relational Database
Before the rise of NoSQL there was pretty much one way to build generic user defined fields.
The setup is simple, just 3 tables. A table of field definitions, a table for contacts, and a relational table with the 2 ids and the value for that contact’s custom field.
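The three tables can be sketched with Python’s sqlite3 module. The ContactFields table name comes from this post; the other table and column names, and the helper functions, are hypothetical. The helpers show why basic CRUD is easy: each operation is a single short statement.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Fields (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- a user-defined field, e.g. 'birthday'
);
CREATE TABLE Contacts (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE ContactFields (
    contact_id INTEGER NOT NULL REFERENCES Contacts(id),
    field_id   INTEGER NOT NULL REFERENCES Fields(id),
    value      TEXT,                   -- every value stored as text, whatever its real type
    PRIMARY KEY (contact_id, field_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def define_field(conn: sqlite3.Connection, name: str) -> int:
    return conn.execute("INSERT INTO Fields (name) VALUES (?)", (name,)).lastrowid


def set_field(conn: sqlite3.Connection, contact_id: int, field_id: int, value: str) -> None:
    # Insert or overwrite this contact's value for one field.
    conn.execute(
        "INSERT OR REPLACE INTO ContactFields (contact_id, field_id, value) VALUES (?, ?, ?)",
        (contact_id, field_id, value),
    )


def get_fields(conn: sqlite3.Connection, contact_id: int) -> dict:
    rows = conn.execute(
        """SELECT f.name, cf.value
           FROM ContactFields cf JOIN Fields f ON f.id = cf.field_id
           WHERE cf.contact_id = ?""",
        (contact_id,),
    ).fetchall()
    return dict(rows)
```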
The Pros
This design is extremely simple and can be implemented by a single developer very quickly.
Basic CRUD operations are easy and efficient.
The Cons
Building search queries requires complicated techniques like metaprogramming.
Every search criterion results in another join against the ContactFields table. This leads to an exponential explosion in query times.
The lack of defined table columns handicaps the database’s query optimization strategies.
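To see where the metaprogramming comes in, here is a sketch of a search builder for the three-table layout (sqlite3, hypothetical names apart from ContactFields). Because every value sits in the same ContactFields table, each criterion needs its own aliased copy of ContactFields and Fields, so the SQL text has to be generated in a loop. Field names and values are passed as bound parameters; only the alias numbers are formatted into the string.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Fields (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE Contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE ContactFields (
            contact_id INTEGER NOT NULL, field_id INTEGER NOT NULL, value TEXT,
            PRIMARY KEY (contact_id, field_id)
        );
        INSERT INTO Fields (id, name) VALUES (1, 'birthday'), (2, 'interest');
        INSERT INTO Contacts (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO ContactFields VALUES
            (1, 1, '1990-01-01'), (1, 2, 'fencing'),
            (2, 2, 'baseball'),
            (3, 1, '1985-06-30'), (3, 2, 'fencing');
        """
    )
    return conn


def build_search(criteria: dict) -> tuple:
    """Generate one query with a ContactFields/Fields join pair per criterion."""
    parts = ["SELECT c.id FROM Contacts c"]
    params = []
    for i, (field_name, value) in enumerate(criteria.items()):
        # Aliases come from the loop index; names and values stay bound parameters.
        parts.append(
            f"JOIN ContactFields cf{i} ON cf{i}.contact_id = c.id AND cf{i}.value = ? "
            f"JOIN Fields f{i} ON f{i}.id = cf{i}.field_id AND f{i}.name = ?"
        )
        params += [value, field_name]  # same order as the placeholders above
    parts.append("ORDER BY c.id")
    return " ".join(parts), params


def search(conn: sqlite3.Connection, criteria: dict) -> list:
    sql, params = build_search(criteria)
    return [contact_id for (contact_id,) in conn.execute(sql, params)]
```

Three criteria already mean six joins, and the query planner has no real columns to reason about.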
Conclusion
The classic relational database pattern is easy to set up, but has terrible scaling. This super simple example would bog down by 1,000 contacts and 50 fields.
There are lots of ways to redesign for scale, but this is a SHORT history. Suffice it to say that it takes extremely complex and finicky systems to scale past 100,000 contacts and 1,000 fields.
The solutions to the classic pattern’s scaling led to the NoSQL revolution, covered in part 2.
This weekend I cheerfully went to lose at a fencing tournament.
That’s not negativity; I’ve been fencing off and on for 30 years, and I just started up again after a multi-year hiatus. I fenced terribly, which I already knew from practice. What I didn’t know, what practice couldn’t tell me, is where I would break down when I fenced for real.
I was competent enough at the things I knew to watch for that my opponents destroyed me in ways I hadn’t even thought to practice.
What does this have to do with software and my mantra of Never Rewrite?
After years of not doing maintenance, I am a legacy system. Practice is literally a testing environment. If I had spent months working on my known problems, when I finally got to production, I would still have been destroyed by my blind spots.
You can choose to spend months rewriting only to have your system fail when you finally make it to production.
Or, you can choose to cheerfully go to production with what you’ve got so that you can see what breaks and iterate.
The panel is fully functional: when you push the button, a light turns on and the elevator comes. It is also obviously wrong - the top button is flush with the mount, and the bottom button sticks out.
I found this sad Beer Belly Elevator Panel at a high-end resort and wondered how it happened.
Certainly whoever installed the mismatched button knew it was wrong. Did the tech not care? Was using the wrong button the only way to get the panel repaired? Was the plan to come back and fix it when the right parts came in?
The hotel maintenance staff had to sign off on it. Did they care about the quality of the repair? Were they only able to give a binary assessment of “working” or “not working”?
Did the hotel manager not care? Were they told to keep costs down? It isn’t broken now, it would be a waste to fix something that wasn’t broken.
Quality vs Letting Your Gut Hang Out
Employees at the hotel see the mismatched panel every day. It is a constant reminder that letting things slide, just a little, is acceptable at this hotel.
When you let consistency and quality slide because something works, you’re creating beer bellies in your codebase.
One small button at a time until everyone sees that this is acceptable here.
So long as a light turns on when you hit the button does it matter if the light is green, red or blue? Does it matter if the light is in the center or on the edge?
But I’m running a SaaS, not a Hotel
Your SaaS may not maintain elevator panels, but your codebase is probably full of beer bellies.
“It works, we’ll clean it up on the next release” bellies.
“This is a hack” bellies.
“This is the legacy version, we’re migrating off of it” bellies.
When you let sad little beer bellies into your codebase, your employees see exactly what you find acceptable.