Scaffolding is a temporary structure used by workers to build or repair a permanent structure. Scaffolding goes up, the building gets built, the scaffolding comes down.
In software development, scaffolding is the code and tools that you create to help you as you develop, and that are discarded during the development process. Incremental refactoring steps are scaffolding. So are no-op interface implementations, dirty shell scripts used to glue things together, and ideas that fail.
They are ephemeral ways to build software. They may live forever in your source control, but you’ll only use them for a short time.
This makes scaffolds different from tests. Tests not only help you build software; they increase in value as they age. Tests are support beams inside your application.
Throw away work is software that works correctly but never gets used. Features that get built and never get used are throw away work. Throw away work doesn’t create new opportunities; it often exists as a software crutch for uncomfortable decisions. It rarely serves a purpose and never provides value.
The biggest difference between scaffolding and throw away work is how it makes you feel. Scaffolding should be energizing and open new horizons. Tests give you confidence. Throw away work leads you to a dead end.
When in doubt ask yourself how long the code is supposed to last and what it would take for it to be a success.
Onward with SmileAndDial CRM, and their quest to remove phone numbers as a fundamental constraint! Part 2 introduced the concept of TheeSeeShipping, a low risk, iterative way to make large sweeping changes. At the end of part 2, the database looked like this:
New Id columns have been added, but they are unused and null.
Now SmileAndDial’s developers can TheeSeeShip small, incremental, low risk changes to production. These changes are invisible to users and can be worked in with day-to-day work.
TheeSeeShipping works like a ratchet; once a change is complete, it stays complete. Whether the work is done quickly or slowly, change is inexorable and one way.
The Second Series of Releases - Propagating ContactId
The second series of releases is to insert the new Contact.Id value into other tables’ ContactId value when rows are inserted.
This requires lots of small changes: getting the new Id after inserting into Contact and adding it to the insert statements for the other tables. But because the fields are nullable, have no foreign keys, and are not used by any query logic, these changes can be done incrementally.
Fix one or two inserts every release. These are small, easily testable changes. Grab the Id, add it to the insert, confirm that the row was written with the correct ContactId.
TheeSeeShip it in low risk, two line changes until all new rows have a ContactId.
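Here is a minimal sketch of one such two line change, using Python's built-in sqlite3 module and an in-memory database. The schema is an assumption made for illustration (only the Contact and Calls tables and the Phone, Date, Id, and ContactId columns come from this series), and `create_contact_with_first_call` is a hypothetical function name.

```python
import sqlite3

# Hypothetical, simplified schema: ContactId is the new, still-nullable column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Phone INTEGER UNIQUE NOT NULL, Name TEXT);
    CREATE TABLE Calls (Phone INTEGER NOT NULL, Date TEXT, ContactId INTEGER);
""")

def create_contact_with_first_call(conn, phone, name, call_date):
    """Insert a contact and a call, propagating the new Contact.Id."""
    cur = conn.execute(
        "INSERT INTO Contact (Phone, Name) VALUES (?, ?)", (phone, name)
    )
    contact_id = cur.lastrowid  # change 1: grab the Id
    conn.execute(
        # change 2: add ContactId to the existing insert
        "INSERT INTO Calls (Phone, Date, ContactId) VALUES (?, ?, ?)",
        (phone, call_date, contact_id),
    )
    conn.commit()
    return contact_id

new_id = create_contact_with_first_call(conn, 5551230000, "Pat", "2024-01-15")

# Confirm that the row was written with the correct ContactId.
row = conn.execute("SELECT Phone, ContactId FROM Calls").fetchone()
```

Nothing reads ContactId yet, so a mistake here only leaves a null or wrong value in an unused column.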
The next step in this series is to backfill the existing data. The Contact table has an Id for all rows. Run an update query for each table to backfill the data.
Guess what, since the data still isn’t used in the system this is also a low risk query.
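A backfill along these lines might look like the sketch below. It uses Python's sqlite3 module with an assumed, simplified schema; `backfill_contact_id` is a hypothetical helper, and the exact UPDATE syntax will differ between database engines.

```python
import sqlite3

# Hypothetical data: two Sales rows predate the ContactId column, one already has it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Phone INTEGER UNIQUE NOT NULL);
    CREATE TABLE Sales (Phone INTEGER NOT NULL, Price REAL, ContactId INTEGER);
    INSERT INTO Contact (Id, Phone) VALUES (1, 5551230000), (2, 5551230001);
    INSERT INTO Sales (Phone, Price, ContactId) VALUES
        (5551230000, 10.0, NULL),
        (5551230001, 25.0, NULL),
        (5551230001, 5.0, 2);
""")

def backfill_contact_id(conn, table):
    """Fill in ContactId for rows that do not have one yet.

    The table name is interpolated into the SQL, so it must come from a
    trusted, hard-coded list and never from user input.
    """
    cur = conn.execute(
        f"UPDATE {table} "
        f"SET ContactId = (SELECT Id FROM Contact WHERE Contact.Phone = {table}.Phone) "
        f"WHERE ContactId IS NULL"
    )
    conn.commit()
    return cur.rowcount  # number of rows that were backfilled

updated = backfill_contact_id(conn, "Sales")
remaining = conn.execute(
    "SELECT COUNT(*) FROM Sales WHERE ContactId IS NULL"
).fetchone()[0]
```

The `WHERE ContactId IS NULL` clause makes the query safe to rerun: rows that already have a value are left alone.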
Now every row in the database has a value for ContactId, and every new row will get a value on insert. The time has come to remove the nullability and add in the foreign keys.
The Third Series of Releases - Updating Selects
With ContactId propagated to every table in the database, the time has come to update the Select queries. Instead of selecting from every table by phone number, SmileAndDial needs to select by ContactId, or join to Contact when a query must use Phone.
Each of these changes can be made incrementally in a separate release. They are small, easy to test, and independent of each other. TheeSeeShip it; the data model is different, but to customers it is still SmileAndDial.
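As a rough illustration, here is the "how much have we sold to a contact?" query in its original form and in the two new forms. The schema and function names are assumptions for this sketch, which runs against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical schema after the backfill: every Sales row has a ContactId.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Phone INTEGER UNIQUE NOT NULL);
    CREATE TABLE Sales (Phone INTEGER NOT NULL, Price REAL, ContactId INTEGER NOT NULL);
    INSERT INTO Contact VALUES (1, 5551230000), (2, 5551230001);
    INSERT INTO Sales VALUES
        (5551230000, 10.0, 1),
        (5551230001, 25.0, 2),
        (5551230001, 5.0, 2);
""")

def total_sales_by_phone_old(conn, phone):
    # Original query: reads Sales.Phone directly, no join.
    return conn.execute(
        "SELECT SUM(Price) FROM Sales WHERE Phone = ?", (phone,)
    ).fetchone()[0]

def total_sales_by_contact_id(conn, contact_id):
    # New query when the caller already knows the contact.
    return conn.execute(
        "SELECT SUM(Price) FROM Sales WHERE ContactId = ?", (contact_id,)
    ).fetchone()[0]

def total_sales_by_phone_new(conn, phone):
    # New query when the user searched by phone: join Contact to use Phone.
    return conn.execute(
        "SELECT SUM(s.Price) FROM Sales s "
        "JOIN Contact c ON c.Id = s.ContactId "
        "WHERE c.Phone = ?",
        (phone,),
    ).fetchone()[0]

old_total = total_sales_by_phone_old(conn, 5551230001)
new_total = total_sales_by_phone_new(conn, 5551230001)
id_total = total_sales_by_contact_id(conn, 2)
```

Running the old and new queries side by side on the same input is a cheap way to test each converted Select before it ships.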
After any bugs or missing ContactIds get fixed, it is time to remove the foreign key constraint on the Phone field. Note that this can be done at any time that is convenient to the DB team. There’s no rush and no impact if a bug with ContactId does turn up.
The Fourth Series of Releases - Removing the Fundamental Constraint
Everything in the first three series of releases was incremental, minimal, and non-breaking.
Even at the end of phase 3, the discovery of an unmodified table would have no impact on the system. The table and code would continue to work as it always had. Any mistakes were easy to roll back.
Everything had changed, and everything was still the same.
In the fourth series of iterations, Phone will be removed as a fundamental constraint in the system. This, finally, is a breaking change that is not trivial to undo.
But, it is also a change that is easy to do in an incremental, low risk way.
Incrementally remove the Phone column from all tables except for Contact. At this point there are no foreign relations or Select statements that use Phone.
Finally, when Phone is gone everywhere else, make Contact.Phone nullable. Update the service and UI logic to allow the creation of a contact without a phone number.
Phone Is No Longer Fundamental
There are a lot of incremental changes and releases in the four series. The key is that each change was very small and very low risk. After each release customers had a fully functional, up to date, version of SmileAndDial.
Unreleased work in progress was low and work could have halted for months at a time without any of the completed work suffering from code rot.
TheeSeeShipping changed the fundamental nature of SmileAndDial without ever taking SmileAndDial out of the hands of its customers.
Welcome back to SmileAndDial CRM’s journey to remove phone numbers as a fundamental constraint. In the introduction I described SmileAndDial CRM, a loathsome service for boiler rooms, pump and dumps, extended car warranties, and other horrid outbound call center needs. SmileAndDial has a problem: every contact needs a phone number, and the company needs to move away from phone numbers so that its customers can send spam.
How fundamental are phone numbers? Let’s begin with the data model and how it reinforces the constraint.
Phone Numbers As Primary Keys
As the data model leaves no doubt, to SmileAndDial, you are nothing but a number. Phone number is the primary key used throughout the system. The developers made an interesting trade off - in exchange for a bigint primary key, which doesn’t have great indexing, they got a system with no joins.
That’s right! Because the phone number is the primary key, users can enter a phone number into any search field and the system can use it directly without having to join against the Contact table.
When was the last time I called a contact? “Select max(Date) from Calls where phone = [phone]”
How much have we sold to a contact? “Select sum(price) from Sales where phone = [phone]”
Few operations need to join back to the Contact table, which reduces code complexity.
What Happens If There’s No Phone?
Now we get to the crux of the constraint - nothing works without a phone number.
Simply updating the Contact creation logic to allow phoneless contacts won’t work. The new contact can’t have notes or sales without a phone number.
In part 2, we will start TheeSeeShipping our way out of this mess. By delivering incremental, iterative changes, we will reduce risk and increase customer value.
Data assumptions are baked into your CRM’s makeup and can seem impossible to change. Email marketing requires contacts to have an email, because you can’t do email marketing without one. Call center software requires prospects to have phone numbers so that agents can do outbound sales.
But what happens when your business needs to change and your fundamental constraints are no longer fundamental? How do you change your core data model assumptions without starting over or freezing development?
SmileAndDial CRM has spent years positioning themselves as the go to CRM for boiler rooms, pump and dumps, extended car warranties, and other outbound call centers on the strength of their dialer integrations. They’ve done well, making a quality product for horrible people. But the FTC is cracking down on junk calls and putting their customers out of business. They need to expand into email spamming and help support their horrid customers. After all, keeping your customers out of jail massively extends Customer Lifetime Value.
This series will follow SmileAndDial on their journey to remove phone numbers as a fundamental constraint in their software.
After a few weeks, a customer’s first page of contacts will become static. Same with tags, lists, and every other object that your CRM supports. When the data on the first page becomes static, users stop seeing it entirely.
Instead the first page becomes muscle memory on the way to your user’s real actions.
How long do you make your customers wait to load a page of data objects that won’t even register in their minds? How many extra hoops do they have to jump through to get to the actions they want to take? How much slower is the process for your biggest customers?
Customers log in to take actions, not to look at objects. Don’t waste their time showing data objects until you know enough context to show meaningful data.
Data scales, actions and attention don’t. You can wage a constant fight to scale your UI, or you can choose Actions over Objects, and avoid the issue entirely.
There’s no upper limit on the number of Contacts that customers will want to add to your CRM. Until you can support billions of contacts, there’s always an argument that you should support more.
On the other hand, Relationship and Management options are constrained by the choices you present to your users. You can’t support everything and need to consider which options make sense for your CRM. Every option adds complexity and cost to you; and cognitive load on your users. Adding features often decreases value.
When it comes to contacts, think about what scales. For everything else, remember that people have limits.
My core development tenet is Never Rewrite. You will be better off iteratively fixing and improving anything that is providing value to your customers. It is common for developers and managers to say that a rewrite will fix all problems and make new development easier. These claims usually come with the added benefit of some exciting new technology.
The odds of a rewrite being better for your business and your customers are vanishingly small. Sometimes though, a team manages to deliver on a big bang rewrite. Are the results better for the business and customers?
The Yankees, yes the baseball team, executed a real life big bang switchover. They built a new stadium on a literal green field, across the street from the original stadium. The team moved over, destroyed the original, and replaced the green field.
I can’t imagine a more perfectly executed big bang cutover.
But, were the results better for the business and customers?
The Yankees Organization completed a remodel of Yankee Stadium in the late 1970s and immediately decided that their baseball stadium was too difficult to upgrade. The team rejected multiple modernization attempts in the 1980s and 1990s; only a new stadium would do! Eventually, in 2006, the Yankees were allowed to build a new stadium on parkland across the street from the original stadium.
Baseball fans, the customers, went without serious upgrades for over 25 years. The local community lost their park for 5 years while the big bang was underway.
Surely though, with 25 years of planning and an unlimited budget, the new stadium was a marvel for the fans! Nope, the seating was further from the field, the bleachers had obstructed views, and the best seats were so expensive that they went empty, even in playoff games.
But it made for great baseball right? No again, the wall slope encouraged home runs, the spread out seating killed the noise of the crowd, and the empty premium seats also blocked the fans from getting autographs.
What did they get right? Better food, more elevators, more bathrooms, and bigger seats. Oh! And everyone has a cupholder now.
Was it worth making the fans wait 25 years for cup holders?
Would iterating on seat design, restaurants, bathrooms and elevators over 25 years have been better for everyone?
Even when a big bang is successful, the results won't be better for the business and customers.
Before going into how to migrate, a reminder of my philosophy:
Minimize risk by taking small incremental steps
Focus on providing value to the customer
There are many ways to migrate from one pattern to another. This strategy will minimize risk and maximize customer value.
Step 1 - Extend Your Relational Database
Both the Relational and NoSQL patterns make use of a relational database.
Step 1 is to add a JSON column to your existing contacts table.
Your new schema should look like one of these two models
That’s it - deploy an additive schema update to your database. Since there’s no code to access the new columns, there’s no coordinated deployment. Just the regular, minimal, risk of updating your database.
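A minimal sketch of such an additive update, assuming a Contact table and a hypothetical column name `CustomFields`. SQLite stores JSON as text; other engines have dedicated JSON column types, so the exact DDL will differ.

```python
import sqlite3

# Hypothetical existing table with one row already in production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("INSERT INTO Contact (Name) VALUES ('Pat')")
conn.commit()

def add_json_column(conn):
    """Additive schema change: a new, nullable column that no code reads yet."""
    conn.execute("ALTER TABLE Contact ADD COLUMN CustomFields TEXT")
    conn.commit()

add_json_column(conn)

# Existing rows are untouched apart from a NULL in the new column.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Contact)")]
existing = conn.execute("SELECT Id, Name, CustomFields FROM Contact").fetchall()
```

Because the column is nullable and has no default to compute, existing inserts and selects keep working unchanged.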
Step 2 - Query The New Schema
Now that the new schema is in production, it is time to extend your query code.
Add a new code path that checks to see if any data is present in the new schema. If there is data available, execute the query using the new JSON column. When there’s no data, use the original query method.
You will need to develop this code hand-in-hand with the code for migrating the data from your original system to the new schema. The important piece is that you should always be deploying with the READER code on, WRITER code off.
When you deploy this code, there won’t be any data in the JSON column. The new code will be available, but unused.
Since the new code won’t be used, this step is also extremely low risk.
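The reader code path described in this step could be sketched as follows. The original storage is assumed here to be a classic field table called `ContactField`; that table, the `CustomFields` column, and both function names are hypothetical stand-ins for whatever your system uses.

```python
import json
import sqlite3

# Hypothetical state: contact 1 has only original data, contact 2 has JSON data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Name TEXT, CustomFields TEXT);
    CREATE TABLE ContactField (ContactId INTEGER, FieldName TEXT, FieldValue TEXT);
    INSERT INTO Contact (Id, Name, CustomFields) VALUES
        (1, 'Pat', NULL),
        (2, 'Sam', '{"Region": "West"}');
    INSERT INTO ContactField VALUES (1, 'Region', 'East');
""")

def read_fields_original(conn, contact_id):
    """Original query method, left exactly as it was."""
    rows = conn.execute(
        "SELECT FieldName, FieldValue FROM ContactField WHERE ContactId = ?",
        (contact_id,),
    ).fetchall()
    return dict(rows)

def read_fields(conn, contact_id):
    """New code path: prefer the JSON column, fall back when it is empty."""
    row = conn.execute(
        "SELECT CustomFields FROM Contact WHERE Id = ?", (contact_id,)
    ).fetchone()
    if row is not None and row[0] is not None:
        return json.loads(row[0])
    return read_fields_original(conn, contact_id)

pat_fields = read_fields(conn, 1)  # no JSON yet, served by the original method
sam_fields = read_fields(conn, 2)  # JSON present, served by the new column
```

On the day this ships, every contact looks like contact 1, so the new branch is never taken.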
Step 3 - Double Write
At this point your system will use the new schema whenever data is present.
This gives you a single switch to flip - to use the new system, start writing to the new column IN ADDITION to the original method.
Mistakes at this step are the most likely to cause customer impact! It is also the most expensive in time and resources because you are writing the data twice.
However, this also gives you a very quick fallback path. The original writing process is untouched!
If there’s a problem, turn off the double write and delete the data in the new column. Thanks to the work in Step 2 you’ll instantly fall back with NO DATA loss.
Migrations are hard! Preventing data loss minimizes the risk.
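A rough sketch of the double write and its fallback, under the same assumptions as before: a hypothetical `ContactField` table for the original method, a `CustomFields` JSON column, and a `DOUBLE_WRITE_ENABLED` flag standing in for whatever feature switch you use.

```python
import json
import sqlite3

DOUBLE_WRITE_ENABLED = True  # hypothetical switch; flip to False to stop double writing

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Name TEXT, CustomFields TEXT);
    CREATE TABLE ContactField (ContactId INTEGER, FieldName TEXT, FieldValue TEXT);
    INSERT INTO Contact (Id, Name) VALUES (1, 'Pat');
""")

def write_fields(conn, contact_id, fields, double_write=DOUBLE_WRITE_ENABLED):
    # Original write path: untouched, always runs.
    conn.execute("DELETE FROM ContactField WHERE ContactId = ?", (contact_id,))
    conn.executemany(
        "INSERT INTO ContactField VALUES (?, ?, ?)",
        [(contact_id, name, value) for name, value in fields.items()],
    )
    # New write path: IN ADDITION to the original method.
    if double_write:
        conn.execute(
            "UPDATE Contact SET CustomFields = ? WHERE Id = ?",
            (json.dumps(fields), contact_id),
        )
    conn.commit()

def fall_back(conn):
    """The data deleter: clear the new column so readers use the original method."""
    conn.execute("UPDATE Contact SET CustomFields = NULL")
    conn.commit()

write_fields(conn, 1, {"Region": "East"})
written = conn.execute("SELECT CustomFields FROM Contact WHERE Id = 1").fetchone()[0]

fall_back(conn)
cleared = conn.execute("SELECT CustomFields FROM Contact WHERE Id = 1").fetchone()[0]
original = conn.execute(
    "SELECT FieldName, FieldValue FROM ContactField WHERE ContactId = 1"
).fetchall()
```

After `fall_back` runs, the original rows are still intact, which is what makes the fallback lossless.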
Step 4 - Only Hybrid Write
The final step is to stop writing to the original data store. This ends your ability to fall back so make sure to confiscate copies of the data deleter from Step 3!
Ending the double write should be low risk because you were only doing it as a fallback at this point. You should see an immediate bump in performance and drop in costs. This trend will continue as the data migrates from the old system to the new.
Step 5 - Clean Up
At some point you’ll be ready to shut down the old system.
The last step is to decide what to do with the unmigrated data. Depending on how long you’ve waited you’re looking at customer data that hasn’t been accessed in months. Look at your retention promises; maybe you don’t have to migrate the data at all.
Either way, clean it up and shut down the old system at your leisure.
You can migrate User Defined Field code to the latest pattern with very little risk by using the 5 step strategy laid out in this article.
The Hybrid Solution offers excellent scalability and performance for reasonable costs. If your CRM is using one of the earlier patterns, it is time to start migrating.
Take control of the process with small, low risk, steps and never rewrite!
Part 2 covers how NoSQL emerged as an improvement over the classic relational database solution for User Defined Fields. NoSQL delivers speed and scalability at the cost of being expensive and fragile. In part 3 I’m going to cover the emerging Hybrid Database solution for User Defined Fields.
Hybrid Databases allow you to combine the best aspects of the relational and NoSQL models, while avoiding most of the downsides.
A hybrid implementation looks like this:
The hybrid model brings the data back to a single server, but without the Contact->Field relation. Instead the field data is stored as a JSON object in the Contact table itself.
No meta programming and no filters, everything is back to SQL. Hybrid databases allow you to directly query JSON fields as if they were regular columnar fields.
You can create indexes on the JSON data. This is an improvement over both the classic and NoSQL models. It can significantly improve performance by allowing the database engine to optimize queries based on usage.
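To make this concrete, here is a small sketch using SQLite, assuming its JSON functions are available (they are in most current builds). The `CustomFields` column, the field names, and the index name are hypothetical, and other hybrid databases use different function names for the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contact (Id INTEGER PRIMARY KEY, Name TEXT, CustomFields TEXT);
    INSERT INTO Contact VALUES
        (1, 'Pat', '{"Region": "East", "Tier": "Gold"}'),
        (2, 'Sam', '{"Region": "West", "Tier": "Gold"}'),
        (3, 'Lee', '{"Region": "West"}');
    -- An index on a value inside the JSON data.
    CREATE INDEX idx_contact_region
        ON Contact (json_extract(CustomFields, '$.Region'));
""")

def contacts_in_region(conn, region):
    """Query a JSON field with plain SQL, as if it were a regular column."""
    rows = conn.execute(
        "SELECT Name FROM Contact "
        "WHERE json_extract(CustomFields, '$.Region') = ? "
        "ORDER BY Name",
        (region,),
    ).fetchall()
    return [name for (name,) in rows]

west = contacts_in_region(conn, "West")
```

The query uses the same expression as the index, which is what lets the engine consider the index for it.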
Having a single system makes things simple to set up and easier to maintain.
The database will enforce valid JSON structures, which makes it difficult to poison your data.
There’s no enforced relationship between the JSON data and your User Defined Fields. This means that data can get lost because your system no longer knows to display or delete it.
While Hybrid Databases should scale far beyond the needs of your SaaS, the scaling isn’t quite as open ended as the NoSQL model. If you out-scale the Hybrid model, congratulations, your company’s services are in high demand!
If your SaaS is implementing User Defined Fields from scratch today, go with the Hybrid model. If you already have the classic or NoSQL pattern in place, it’s a good time to start thinking about how to evolve towards a hybrid solution.
I’ll cover how to evolve your existing solution in Part 4.
In part 1, I covered the classic solution for User Defined Fields; simple but unscalable.
NoSQL emerged as a solution to relational fields in the late 2000s. Instead of having a meta table defining fields in a relational database, the User Defined data would live in NoSQL.
The structure would look like this:
This model eliminates the meta programming and joining the same table against itself. The major new headache that this model creates is difficulty in maintaining the integrity of the field data.
No complicated meta programming. Instead you write a filter/match function to run against the data in the Collection Of Fields.
No more repeated matching against the same table. Adding additional search criteria has minimal cost.
Open ended/internet level scaling. For a CRM or SaaS, the limiting factor will be the cost of storing data, not a hard limit of the technology.
Much more complicated to set up and maintain. Even with managed services, supporting two database technologies doubles the difficulty of CRUD. Multiple inserts, multiple deletes, tons of ways for things to go wrong.
Without a relational database enforcing the data structure, poisoned or unreadable data is common. Being able to store arbitrary data collections means you’ll invariably store buggy data. You’ll miss some records during upgrades and have to support multiple deserializers. You will lose customer data in the name of expediency and cost control.
It’s more expensive. You’ll pay for your relational database, NoSQL database, and software to map between the two.
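The filter/match function mentioned above can be sketched in a few lines. This example keeps the Collection Of Fields as plain Python dictionaries rather than calling a real NoSQL client, and the document shape (`contact_id`, `fields`) is an assumption for illustration.

```python
def matches(fields, criteria):
    """True when every criterion equals the stored value for that field."""
    return all(fields.get(name) == expected for name, expected in criteria.items())

def find_contacts(collection, criteria):
    """Run the filter against each document's Collection Of Fields."""
    return [
        doc["contact_id"]
        for doc in collection
        if matches(doc.get("fields", {}), criteria)
    ]

# Hypothetical documents; note that contact 3 has no "Tier" field at all.
collection = [
    {"contact_id": 1, "fields": {"Region": "East", "Tier": "Gold"}},
    {"contact_id": 2, "fields": {"Region": "West", "Tier": "Gold"}},
    {"contact_id": 3, "fields": {"Region": "West"}},
]

gold_west = find_contacts(collection, {"Region": "West", "Tier": "Gold"})
west_only = find_contacts(collection, {"Region": "West"})
```

Adding a search criterion is one more key in `criteria`, not another join. The same flexibility is the weakness: nothing stops a document from arriving with a missing or misspelled field.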
NoSQL systems solve the scaling problems with setting up User Defined Fields in a relational database. The scaling comes at a high price in complexity, fragility, and cost.
Reducing the complexity, fragility, and costs leads to the upcoming 3rd shift, covered in part 3.