In an earlier post I suggested Asynchronous Processing as a way to buy time to handle scaling bugs. Remembering my friend and his comment “assume I have a hammer, a screwdriver, and a database”, today’s post will explain Synchronous versus Asynchronous processing and discuss how asynchronous processing will help your software scale.
Processing: Synchronous versus Asynchronous
Synchronous processing means that each step starts, does some action, and then starts the next step. Eventually the last action completes and returns, and so on back.
A basic synchronous web request looks like this:
A user clicks save and the browser tells the server to save the data. The server tells the database. The database returns OK, then the server returns OK, and the browser shows a Save Successful message.
Simple to understand, but when you are having scaling problems, sometimes that save time can go from 100ms to 10s. It’s a horrible user experience and unnecessary wait!
Asynchronous Processing gives a superior user experience by returning to the browser immediately. The actual save will be processed later. This makes things more complex because the request has been decoupled from the processing.
The user is now insulated from scaling issues. It doesn’t matter if the save takes 100ms or 10s, the user gets a consistent experience.
In an asynchronous model, the user doesn’t get notified that the save was successful. For most cases this is fine, the user shouldn’t be worried about whether their actions are succeeding, the client should be able to assume success.
The client being able to assume success does not mean your system can assume success! Your system still needs to handle failures, exceptions and retries! You just don’t need to drag the user into it. Since you no longer have a direct path from request through processing, asynchronous operations can be harder to reason about and debug.
For instances where “blind” asynchronous isn’t acceptable you need a polling mechanism so that the user can check on the status.
How Asynchronous Processing Helps Systems to Scale
With synchronous processing your system must process all of the incoming activity and events as they occur, or your clients will experience random, intermittent, failures.
Synchronous scaling results in numerous business problems:
- It runs up infrastructure costs. The only way to protect service level agreements is by greatly over provisioning your system so that there is significant excess capacity.
- It creates repetitional problems. Clients can easily impact each other with cyclical behavior. Morning email blasts, hourly advertising spending rates, and Black Friday are some examples.
- You never know how much improvement you’ll get out of the next fix. As your system scales you will always be rate-limited by a single bottleneck. If your system is limited to 100 events/s because your database can only handle 100 events/s, doubling the hardware might get you to 200 events/s, or you might discover that your servers can only handle 120 events/s.
- You don’t have control over your system’s load. The processing rate is set by your clients instead of your architecture. There is no way to relieve pressure on your system without a failure.
Asynchronous processing gives you options:
- You can protect your service level agreements by pushing incoming events onto queues and acknowledging the event instantly. Whether it takes 100ms, 1s, or 10 minutes to complete processing, your system is living up to its service level agreements.
- After quickly acknowledging the event, you can control the rate at which the queued events are processed at a client level. This makes it difficult for your large clients to starve out the smalls ones.
- Asynchronous architecture forces you to loosely couple your system’s components. Each piece becomes easy to load test in isolation, giving you’ll have a pretty good idea about how much a fix will actually help. It also makes small iterations much more effective. Instead of spending 2x to double your databases when your servers can only support another 20%, you can increase spending 20% to match your server’s max capacity. Loosely coupled components can also be worked on by different teams at the same time, making it much easier to scale your system.
- You regain control over system load. Instead of everything, all at once, you can set expectations. If clients want faster processing guarantees, you can now not only provide them, but charge accordingly.
Shifting from synchronous to asynchronous processing will require some refactoring of your current system, but it’s one of the most effective ways to overcome scaling problems. You can be highly tactical with your implementation efforts and apply asynchronous techniques at your current bottlenecks to rapidly give your system breathing room.
If your developers are ready to give up on your current system, propose one or two spots to make asynchronous. You will get your clients some relief while rebuilding your team’s confidence and ability to iterate. It’s your best alternative to a total rewrite!