At scale, processes that can be retried outperform processes that only allow for success or failure.
Five 9s (99.999%) means you can expect one failure for every 100,000 processes. If you can’t retry, those failures will lead to angry customers and churn.
On the other hand, a 1⁄3 success rate with retries means you’ll have to try 300,000 times for every 100,000 processes. Your compute costs will add up, but failures only count when you can’t retry.
Compute is cheap compared to churn. Compute is super cheap compared to your reputation.
Can your SaaS restart processes?