In Part 1 of Making Link Tracking Scale I showed how switching event recording from synchronous to asynchronous processing creates a superior, faster and more consistent user experience. In Part 2, I will discuss how Link Tracking scaling issues are governed by Long Tails, and how to overcome the initial burst using edge caching and tools like Memcache and Redis.

The Long Tails of Link Tracking

When your client sends an email campaign, publishes new content your link tracker will experience a giant burst of activity, which will quickly decay like this:

To illustrate with some numbers, imagine an email blast that results in 100,000 link tracking events. 80% of those will occur in the first hour.

In our original design from Part 1, that would 22 URL lookups, and 22 inserts per second:

For simplicity, pretend that inserts and selects produce similar db load. Your system would need to support 44 events/s to avoid slowdowns and frustrating your clients.

The asynchronous model:

Reduces the load to 22 URL lookups, and a controllable number of inserts. Again for simplicity let’s go with 8 inserts/s, for a total of 30 events/s. That’s a 1/3 reduction in load!

But, your system is still looking up the Original URL 22 times/s. That’s a lot of unnecessary db load.

Edge Caching The Original URL

The Original URL is static data that can be cached on the web server instead of loaded from the database for each event. Instead, each server would retrieve the Original URL from the db once, store it in memory, and reuse it as needed.

This effectively drops the lookup rate from 22 events/s to 0 events/s, reducing the db load to 8 events/s, a 55% drop! Combined with the asynchronous processing improvements from Part 1, that’s an 80% reduction in max database load.

Edge Caching on the servers works for a while, but as your clients expand the number of URLs you’ll need to keep track of won’t fit in server memory. At that point you’ll need to add in tools like Memcached or Redis. Like web servers, these tools are a lot cheaper than scaling your database.

Consistent Load on the Database

The great thing about this design is that you can keep the db load consistent, regardless of the incoming traffic. Whether the load is 44 events/s or 100 events/s you control the rate of asynchronous processing. So long as you have room on your servers for an internal queue, or if you use an external queue like RabbitMQ or SQS you can delay processing the events.

Scaling questions become discussions about cost and how quickly your clients need to see results.

Conclusion

Caching static data is a great way to reduce database load. You can use prebuilt libraries like Guava for Java, cacheout for Python, or dozens of others. You can also leverage distributed cache systems like Memcached and Redis. While there’s no such thing as a free lunch, web servers and distributed caches are much much cheaper to scale than databases.

You’ll save money and deliver a superior experience to your clients and their users!

Sherman On Software.

Making Link Tracking Scale – Part 2 Edge Caching

The Long Tails of Link Tracking

Edge Caching The Original URL

Consistent Load on the Database

Conclusion

Like this:

jeffpsherman

It’s Not Sabotage, They’re Drowning

I wouldn’t have done it this way.

Related Posts:

Dead Database Pixel Tracking

In A Remote First World, Being Local Is The Only Leverage A Developer Has

ShermanOnSoftware 2023 Year In Review

Leave a ReplyCancel reply

The Never Rewrite Podcast, Episode One Hundred Fifty-Nine: Vibe Coding

The Never Rewrite Podcast, Episode One Hundred Fifty-Eight: Token-Based Billing Is Nuts

The Never Rewrite Podcast, Episode One Hundred Fifty-Seven: Is AI Making You The New John Henry?

The Never Rewrite Podcast, Episode One Hundred Fifty-Six: Modernization: Where AI Doesn’t Save You Any Steps

The Never Rewrite Podcast, Episode One Hundred Fifty-Five: Agentic Development Is Turning Engineers Into Product Owners

Making Link Tracking Scale – Part 2 Edge Caching

The Long Tails of Link Tracking

Edge Caching The Original URL

Consistent Load on the Database

Conclusion

Share this:

Like this:

Post Navigation

Related Posts:

Leave a ReplyCancel reply

Discover more from Sherman On Software