What is a Queue Anyway?

Several readers wrote in response to my article Your Database is not a Queue to tell me that they don’t know what a queue is, or why they would use one.  As one reader put it, “assume I have 3 tools, a hammer, a screwdriver and a database”. Fair enough, this week I want to talk about what a message queue is, what features it offers, and why it is a superior solution for batch processing.

To keep things as concrete as possible, I’ll use a real world example, Nightly Report Generation, and AWS technologies.

Your customers want to know how things are going.  If your SaaS integrates with a shopping cart, how many sales did you do?  If you do marketing, how many potential customers did you reach? How many leads were generated?  Emails sent? Whatever your service, you should be letting your customers know that you’re killing it for them.

Most SaaS companies have some form of RESTful API for customers, and initially you can ask clients to help themselves and generate reports on demand.  But as you grow, on demand reporting becomes to slow. Code that worked for a client with 200 customer shopping carts a day may be to slow at 2,000 or 20,000 carts.  These are great problems to have!

To meet your client’s needs, you need a system to generate reports overnight.  It’s not client facing, so it doesn’t need to be RESTful, and it’s not driven by client activity, so it won’t run itself.

Enter queues!

For this article we will use AWS’s SQS, which stands for Simple Queue Service.  There are lots of other wonderful open source solutions like Rabbit MQ, but you are probably already using AWS, and SQS is cheap and fully managed.

What does a queue do?

  • A queue gives you a list of messages that can only be accessed in order.
  • A queue guarantees that one, and only one, consumer can see a message at a time.
  • A queue guarantees that messages do not get lost, if the consumer takes a message from the queue, it has a certain amount of time to tell the queue that the message is done, or the queue will assume something has gone wrong and make the message available again.
  • A queue offers a Dead Letter Queue, if something goes wrong to many times the message is removed and placed on the Dead Letter Queue so that the rest of the list can be processed

As a practical matter all that boils down to:

  • You can run multiple instances of the report generator without worrying about missing a client, or sending the same report twice.  You get to skip the early concurrency and scaling problems you’d encounter if you wrote your own code, or tried to use a database.
  • When your code has a bug in an edge case, you’ll still be able to generate all of the reports that don’t hit the edge case.
  • You can alert on failures, see which reports failed, and after you fix the bug, you can click of a button put them back into the queue to run again.
  • Because SQS is managed by AWS you get monitoring and alerts out of the box
  • Because SQS is managed by AWS you don’t have to worry about the queue’s scale or performance.
  • You can add API endpoints to generate “on demand” messages and put them on the queue.  This means that the “standard” report messages, and client generated “on demand” report requests follow the same process and you don’t need to build and maintain separate reporting pipelines for internal and external requests.

Everything is tradeoffs, and queues don’t solve all problems well.  What kinds of problems are you going to have with SQS?

  • Priority/Re-sorting existing messages – You have 10,000 messages on the queue and an important client needs a report NOW!  Prioritized messages, or changing the order of messages on the queue is not something SQS supports. You can have a “fast lane” queue that gets high priority messages, but it’s clunky.
  • Random Access – You have 10,000 messages and want to know where in the queue a specific client’s report is?  Queues let you add at the end and take from the front, that’s it. If you need to know what’s in the middle you’ll need to maintain that information in a separate system.
  • Random Deleting – Probably not important for reports, but for cases like bulk importing, if a client changes her mind and wants to cancel a job, you can’t reach into the middle of the queue and remove the message.
  • Process order is not guaranteed – If you have a 3 part task, and you cannot start Task 2 until Task 1 finishes, you cannot add all 3 tasks onto the queue at once.  It is highly likely that a second worker will come along and start Task 2 while Task 1 is still in process. Instead you will need to have Task 1 add Task 2 onto the queue when it finishes.

None of these problems will crop up in your early iterations, and they are great problems to have!  They are signs that your SaaS is growing to meet your client’s needs, you and your clients are thriving!

To bring it full circle, what if you already have a home grown system for running reports?  How can you get on the path to queues? See my article on about common ways into the DB as a Queue mess, and some suggestions on how to get out.

What’s the first step towards fixing a terrible system?

Imagine you’ve just been hired to fix a horrible legacy system.

You’ve just been handed a giant monolith that talks to customers on the internet, accesses 4 different internal databases, handles employee and customer on-boarding, sends marketing emails, contains a proprietary task scheduling system, and has concurrency issues.  Which problem do you tackle first?

Separation of concerns?  Maybe the massive number of exceptions in the logs?  Probably none of these.

The first thing you really need to learn is why you been hired to fix the system in the first place.  Ignore the technical problems, what are the business problems with the current system?  What does the business need done so badly that they’ve hired you?  Your fellow programmers care about how difficult the current system is, but the business cares about how hard it is to get what they need.  Don’t confuse tech problems with business problems.

Learn the business problem that justifies your salary.  Find a way to provide that value.  You can fix or replace the legacy system over time.