This is a true story of how an inappropriate software design put a SaaS company’s reputation on the hook for their client’s bugs. All names have been changed.
Once upon a time I consulted for Chicago Freight Brokers (CFB), a freight brokerage that was struggling with the performance of its in-house systems. CFB had an internal service, Express, that talked to five different transportation SaaS companies, added “special sauce” business value, and helped the brokers match trucks with loads. Express was an MVP that was being crushed under the weight of its own success.
One of the SaaS companies Express integrated with was ChicagoTMS, a Transportation Management Service that built loads of freight and spoke to CFB’s accounting and insurance systems. When a broker hit save on ChicagoTMS’s website, ChicagoTMS would push the data into Express using a Webhook. One day I got asked to jump on a call: “ChicagoTMS is unbearably slow, it takes 30 seconds to save a load! And they say it’s our fault! Can you talk to them?”
I jumped onto the call expecting that something had been lost in translation. After all, how could CFB’s scaling issues have any impact on ChicagoTMS’s web performance?
An hour later, I understood. ChicagoTMS’s save operation looked like this:
The Webhook was called synchronously as part of the save process, meaning users had to wait for the Webhook to complete successfully before they could continue their work. CFB’s scaling problems were causing ChicagoTMS’s performance problems!
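To make the problem concrete, here is a minimal sketch of a synchronous save, assuming a simple in-memory store. The names save_load_synchronously and slow_client_webhook are hypothetical and are not taken from ChicagoTMS's actual code; the sleep merely stands in for a slow client endpoint.

```python
import time


def save_load_synchronously(load, store, send_webhook):
    """Save a load, then block until the client's Webhook returns."""
    store.append(load)     # 1. persist the load
    send_webhook(load)     # 2. call the client and wait for the reply
    return "saved"         # 3. only now does the user get control back


def slow_client_webhook(load, delay_seconds=0.2):
    # Hypothetical stand-in for a client endpoint that is slow to respond
    # (for example, because of a missing database index on the client side).
    time.sleep(delay_seconds)
```

Whatever time the client spends inside send_webhook is added directly to the time the user waits for the save to finish.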
Fortunately, the culprit was a missing database index on CFB’s side, and once it was added the save time quickly dropped from 30 seconds to 2 seconds. Unfortunately for ChicagoTMS, their design made them responsible for their client’s performance issues. ChicagoTMS’s reputation took a hit from CFB’s simple mistake.
A Better Design
Webhooks are convenient, but they are not a reliable or guaranteed delivery mechanism.
A better design would have been for ChicagoTMS to make their Webhooks asynchronous. This would have kept ChicagoTMS’s services fast and responsive, regardless of whether CFB made mistakes, did a lot of processing of the message, or was overwhelmed by a spike in load.
Asynchronous Webhooks would have given the users a much better experience and saved ChicagoTMS a lot of reputational damage with their clients.
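One way to picture the asynchronous design is a save that only enqueues the Webhook, with a background worker doing the delivery. The sketch below is an illustration under assumptions, not ChicagoTMS's implementation: save_load, webhook_worker, and webhook_queue are hypothetical names, and an in-process queue is used only to keep the example self-contained.

```python
import queue
import threading
import time

# Hypothetical in-memory queue of loads waiting to be delivered.
webhook_queue = queue.Queue()


def save_load(load, store):
    """Persist the load and enqueue the Webhook; never wait on the client."""
    store.append(load)
    webhook_queue.put(load)
    return "saved"


def webhook_worker(send_webhook):
    """Deliver queued Webhooks in the background until a None sentinel arrives."""
    while True:
        load = webhook_queue.get()
        if load is None:
            webhook_queue.task_done()
            break
        try:
            send_webhook(load)
        except Exception:
            # Assumption: a real system would log the failure and retry later.
            pass
        finally:
            webhook_queue.task_done()
```

A worker could be started with threading.Thread(target=webhook_worker, args=(send_webhook,), daemon=True).start(), where send_webhook is whatever function posts to the client. In this sketch a slow or failing client only delays the worker, not the user's save. An in-memory queue loses pending messages if the process dies, so a production system would more likely use a durable queue or a database-backed outbox with retries.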
Take Your Reputation Off the Hook!
Webhook performance will always be outside your control. When you call a client synchronously, you put your reputation in the client’s hands. Your users won’t understand or care who exactly is responsible. If it’s your website, you get the blame.
Go asynchronous and take your reputation off the hook!