Blog Posts

Reducing Latency Won’t Increase Throughput Of Streaming Systems

A counterintuitive property of streaming systems is that latency has no long-term impact on throughput.  Increasing or decreasing latency will cause a short-term change, but once the system settles into its steady state, the throughput will be the same as before.

How can latency and throughput, two important performance metrics, be unrelated?

Let’s define some terms

Latency is the amount of time between when a message is sent and when it is fully processed.  This includes the time spent getting the message onto the stream, the time spent waiting in the queue, and the processing time itself.

Throughput is the number of completions in a time period.  It could be 1 million messages a second, 5 per hour, or anything else.  Throughput doesn’t include processing time; that’s part of latency.  The million messages/s could have taken 10ms or 10 minutes each to process; so long as 1 million of them finish every second, the throughput is 1 million/s.

Steady State is when the system is fully warmed up and taking on its full load.  For a streaming system, this means that it is consuming the full stream, it is producing its maximum output, and the work in progress is being added to as rapidly as it is finished.

Example

Imagine two systems that each receive 1 million events per second.  In the first system, every event takes 5s from arrival to completion; in the second, every event takes 2s.  Once both reach steady state, each one finishes 1 million events every second.

The latency is different, the throughput is the same!
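The example can be sketched as a small simulation.  This is a minimal model, not a real streaming system: time moves in one-second ticks, every event finishes exactly `latency_s` seconds after it arrives, and the function name `completions_per_second` is made up for illustration.

```python
from collections import deque


def completions_per_second(arrival_rate, latency_s, duration_s):
    """Return how many events finish during each one-second tick.

    Assumption: every event takes exactly latency_s seconds, and the
    system has enough capacity to accept the full arrival rate.
    """
    # One slot per second of latency; the system starts empty.
    pipeline = deque([0] * latency_s)
    completed = []
    for _ in range(duration_s):
        pipeline.append(arrival_rate)          # this second's arrivals enter
        completed.append(pipeline.popleft())   # arrivals from latency_s ago finish
    return completed


slow = completions_per_second(1_000_000, latency_s=5, duration_s=10)
fast = completions_per_second(1_000_000, latency_s=2, duration_s=10)

# After warm-up, both systems complete events at the arrival rate.
print(slow[-1], fast[-1])
```

The slow system emits nothing for its first five ticks and the fast one for its first two; after that, both lists contain the arrival rate on every tick.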

Implications beyond Latency and Throughput

Besides latency and throughput, there are 3 other notable differences between the two systems.

  1. Higher latency means more events in flight.  When it gets to steady state, the first system will be working on 5 million events at a time, the second system will only be working on 2 million.  This usually means that the first system will require more resources - bigger queues, more workers, a higher degree of parallelism, etc.
  2. Higher latency means slower startup.  It takes 5 seconds for events to start emerging from the first system, but only 2 seconds for the second system.
  3. Higher latency means slower shutdown.  At the other end of the lifecycle, systems with higher latency take longer to drain and safely shut down than systems with lower latency.
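The in-flight numbers in the first point follow from a relationship commonly known as Little's Law, which the post does not name but which matches its arithmetic.  The symbols are chosen here for illustration: L is the number of events in flight, λ is the steady-state throughput, and W is the latency.

```latex
\begin{align*}
L &= \lambda W \\
L_{\text{first}}  &= 10^{6}\ \text{events/s} \times 5\ \text{s} = 5 \times 10^{6}\ \text{events} \\
L_{\text{second}} &= 10^{6}\ \text{events/s} \times 2\ \text{s} = 2 \times 10^{6}\ \text{events}
\end{align*}
```

With λ fixed by the input stream, the only way to shrink L, and the resources needed to hold it, is to shrink W.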

Summary

Why doesn’t latency matter?  Because streaming systems have constrained inputs.  So long as the system has enough capacity to handle 100% of the inputs, then latency doesn’t impact throughput.

Latency still controls the system requirements; slow is expensive!

The Never Rewrite Podcast, Episode One Hundred Nine: Conway’s Law and Software Quality

Does Conway's Law apply to software quality? In this episode, Isaac, Dustin, and I explore how company culture and structure shape software.

If you've ever wondered about the forces that shape your code base, this is the episode for you!

Watch on YouTube or listen to it at Spotify, Apple Podcasts, or your favorite podcast app, and let us know if you have ever been involved in a rewrite. We would love to have you on the show to discuss your experience!

The Never Rewrite Podcast, Episode One Hundred Eight: Consolidating Tech Stacks – Is It Worth It?

How can you determine the merits of consolidating or diversifying your tech stack? In this episode we discuss how consolidation and diversification impact the business, engineering efficiency, and cross-team dynamics.

If you've been wondering how to go about debating your tech stack, this is the episode for you!

Watch on YouTube or listen to it at Spotify, Apple Podcasts, or your favorite podcast app, and let us know if you have ever been involved in a rewrite. We would love to have you on the show to discuss your experience!

The Never Rewrite Podcast, Episode One Hundred Seven: Rebuilding vs. Rewriting vs. Refactoring?

This week, Isaac and I dive deep into an Allen Holub suggestion that developers should 'rebuild' instead of 'rewrite' software. Are we all saying the same thing? Is there some nuance between rebuilding, rewriting, and refactoring?

If you've been wondering if you should even bother updating your legacy system, this is the episode for you!

Watch on YouTube or listen to it at Spotify, Apple Podcasts, or your favorite podcast app, and let us know if you have ever been involved in a rewrite. We would love to have you on the show to discuss your experience!

The Never Rewrite Podcast, Episode One Hundred Six: How to Stop a Rewrite in Progress

It is all well and good to say "Never Rewrite", but what do you do if you find yourself part of one?
In this episode Isaac and I discuss the steps and thinking that will help you stop a rewrite faster and more safely than waiting for it to fail.

If you're working on a rewrite and don't know what to do, this is the episode for you!

Watch on YouTube or listen to it at Spotify, Apple Podcasts, or your favorite podcast app, and let us know if you have ever been involved in a rewrite. We would love to have you on the show to discuss your experience!

The Never Rewrite Podcast, Episode One Hundred Five: A Core Engine Rewrite with Nick Gerace

Guest Nick Gerace discusses how he backed into a rewrite of the core engine at System Initiative. Nick walks us through how and why his work to add plugins and package management ended with a new core engine that still lacks package management.

If you want to hear about the philosophy and tradeoffs behind a successful rewrite, this episode is for you!

Watch on YouTube or listen to it at Spotify, Apple Podcasts, or your favorite podcast app, and let us know if you have ever been involved in a rewrite. We would love to have you on the show to discuss your experience!

The Never Rewrite Podcast, Episode One Hundred Four: Iteratively Replacing Logging Infrastructure with Guest Paul Stack

Paul Stack shares his experiences transforming his company from individual server-based logs to a unified log stream searchable with Grafana. Paul walks us through the stepwise iterations: going from single-machine logs to aggregated logs, how aggregating the logs overwhelmed the service so they brought in Kafka, how Kafka made it difficult to restart, and so on. This story is pre-cloud and years before the concept of OpenTelemetry; Paul's deep dive sheds light on some of the very difficult problems that modern observability stacks make easy.

If you've ever wondered about how aggregated logging systems evolved, this is the episode for you!

Watch on YouTube or listen to it at Spotify, Apple Podcasts, or your favorite podcast app, and let us know if you have ever been involved in a rewrite. We would love to have you on the show to discuss your experience!

Context Required For Static Vs System Optimizations

A static analyzer can go through code and find big-O type problems.  A developer can go through and refactor the code to make it run more efficiently.

Neither of these tasks requires much context about the larger system.

For example, a function can take a list of contacts and load each one from the database.  After loading the contact, it can then load each activity that the contact has done, one at a time.  This is a standard nested for loop that makes O(n^2) database calls.

With no further context, you could modify the code to load all of the contacts in a single query.  You could then load all of the activity with a second query.

From the perspective of querying the database, the function would go from O(n^2) to O(1).  The only context you would need to know is if N could be large enough to exhaust your software’s memory.  (There would still be O(n^2) work in memory to assemble the data objects, but that is negligible against the cost of DB calls)
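The two versions can be sketched side by side.  This is an illustration only: the `contacts` and `activities` tables, their columns, the sample rows, and the function names are all hypothetical, and an in-memory SQLite database stands in for whatever database the real code would use.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE activities (
            id INTEGER PRIMARY KEY, contact_id INTEGER, action TEXT);
    """)
    conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                     [(1, "Ada"), (2, "Grace")])
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?)",
                     [(1, 1, "email"), (2, 1, "call"), (3, 2, "demo")])
    return conn


def load_one_at_a_time(conn, contact_ids):
    """Nested loops: one query per contact, then one per activity."""
    contacts = {}
    for cid in contact_ids:
        (name,) = conn.execute(
            "SELECT name FROM contacts WHERE id = ?", (cid,)).fetchone()
        activity_ids = [row[0] for row in conn.execute(
            "SELECT id FROM activities WHERE contact_id = ? ORDER BY id",
            (cid,))]
        actions = []
        for aid in activity_ids:
            (action,) = conn.execute(
                "SELECT action FROM activities WHERE id = ?",
                (aid,)).fetchone()
            actions.append(action)
        contacts[cid] = {"name": name, "activities": actions}
    return contacts


def load_batched(conn, contact_ids):
    """Two queries total, no matter how many contacts are requested."""
    if not contact_ids:
        return {}
    marks = ",".join("?" * len(contact_ids))  # one placeholder per id
    contacts = {
        cid: {"name": name, "activities": []}
        for cid, name in conn.execute(
            f"SELECT id, name FROM contacts WHERE id IN ({marks})",
            contact_ids)
    }
    # Assembling the objects is in-memory work, not extra database calls.
    for cid, action in conn.execute(
            f"SELECT contact_id, action FROM activities "
            f"WHERE contact_id IN ({marks}) ORDER BY id", contact_ids):
        contacts[cid]["activities"].append(action)
    return contacts


conn = make_db()
print(load_batched(conn, [1, 2]) == load_one_at_a_time(conn, [1, 2]))
```

Both functions return the same structure; the batched version trades database round trips for holding every requested row in memory at once, which is the one piece of context the refactor depends on.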

But if you had deeper knowledge you might realize that you don’t need the data in the first place.  Making no database calls is faster than making two, and deleting the code is a whole lot easier than refactoring it.

AI and other tools make static type efficiency optimizations much easier; but they can’t ask “Should the code even be doing this?”  System context is where developers still shine.

The Never Rewrite Podcast, Episode One Hundred Three: Recognizing When a Rewrite is Failing

What are the signs of a failing rewrite? This week Isaac and I discuss the signs to watch for when you're on a team doing a rewrite.

If you're on a rewrite and wondering if the project is in trouble, this is the episode for you!

Watch on YouTube or listen to it at Spotify, Apple Podcasts, or your favorite podcast app, and let us know if you have ever been involved in a rewrite. We would love to have you on the show to discuss your experience!

You Can’t Change Your Answers

Without showing your work. 

When you have customer reports, you will eventually want to change how and what you measure.  That’s fine, but you have to explain the differences.

Especially if the change is because the old system’s measurements were wrong.

Your customers calibrated their business decisions around the old system. Changing it, even for the better, throws off their calculations. 

Always improve your systems, and when calculations change, you need to overshare.

Remember the golden rule of SaaS: Do unto others as you would have AWS do unto you.
