The Software Engineer And The Mechanical Engineer: A Parable

Once upon a time a Software Engineer and a Mechanical Engineer needed to lift the leg of a table and slide a carpet underneath.  The Software Engineer, being younger, offered to lift the table with his hands.  This would create enough room for the Mechanical Engineer to slide the carpet under.  They would repeat the process for each leg.

The Software Engineer lifted with all of his might, but could not raise the table.  “The worker is not powerful enough,” he declared.  “I will call in some more people.  By scaling out the number of workers, we will reduce the amount of power each worker needs.  When we have sufficient parallel workers, the table leg will rise.”

The Mechanical Engineer replied, “Get me a 2x4 and a brick.”

And so the edge of the table was lifted.  “I estimate that the lever gives me about 20:1 leverage,” the Mechanical Engineer said, holding the table up with one hand.

Real Life Testing In Production – Hotel Opening Disaster

This past week I stayed at a brand new resort hotel during opening week.  Opening week was a major disaster for the hotel, culminating with them giving away free alcoholic drinks to all guests.  For a week.

This post isn’t about all of the problems; it is about testing in production, and how much worse things could have been.  You see, the resort will eventually be 9 buildings, and they opened with 3.  As a firm believer in iteration, I commend them, because their opening would have been so much worse if they had waited.

Eventually, the resort will be nine 4-story buildings in a U shape around a water park.  Three buildings were complete and receiving guests, the exteriors of three more were complete, and the last three were still under construction.  It was too late to apply any architectural lessons from the early guests, but not too late to learn a lot about the hotel’s operations.

First up - you need maintenance staff on site, because our first room had no hot water.  Maybe we were the first guests, maybe the system was installed wrong and broke.  Whatever the reason, there was no hot water, and no one who could diagnose and fix the issue.

You could have someone walk through with a checklist to test for things like hot water.  Or you could let your customers do it!  You could have maintenance staff on site, or you could literally file a ticket and wait for someone to show up and fix the problem.

The hotel wasn’t full, so we were moved to another room.  Another new room!

This one had no cold water in the bathroom sink.  Fortunately I could debug this issue!  The shutoff valve was closed; I opened it.  Yes, I flipped a switch to turn on a feature.

Another opening week oops - the hotel had purchased cone coffee makers and basket coffee filters.

But the ultimate config issue wasn't the missing hot water, or the missing cold water, or the wrong shaped coffee filters.  The ultimate config issue was no liquor license.  At a resort hotel.  For over a week.  They had a fully stocked bar, 2 bartenders, and 4 wait staff.  But no liquor license means no alcohol sales.  They resolved the “alcohol permission issue” by giving it away.

All of these issues would have been so much worse with 3 times as many guests.  They could have caught the quality issues with better testing.  They could have avoided giving away thousands in alcohol by waiting until they were ready.  There were many, many ways the hotel could have opened better.

But at least they iterated!  By opening with ⅓ of their rooms available, they were able to limit the fallout from testing in production.

Multiple Queues Vs Prioritized Queues at the Airport

Multiple Queues Vs Prioritized Queues For SaaS Background Workers was a dense discussion of queues, prioritization, trade offs, and outcomes.

This post is a much less dense discussion of the same topic with examples from airports.  Airports use a multiple queue system at Security, and a priority queue at Boarding.

Security Has Multiple Queues

Image from https://www.wanderingearl.com/the-benefits-of-tsa-precheck/

Most airports in the US have 2 or 3 different queues to get through the security checkpoint: Clear, TSA Pre, and regular.  Agents help filter passengers into the different lines.  Each line represents a different priority and has a different number of agents conducting security screenings.  Once you are in a line, it operates as a FIFO (first in, first out) queue.  There’s no additional sorting.

This is a human driven Multiple Queue system, and it makes sense:

  1. The workload is highly variable.  There are peak times and slow times.  Times that favor high priority people, and times that favor regular people.  It is impractical to constantly shuffle the security checkpoint layout, so the system must accommodate all workloads.
  2. You need to prevent resource starvation, i.e. you need to keep the regular line moving no matter how many people show up at TSA Pre.
  3. You want to minimize worker waste, i.e. when the TSA Pre line is empty, the agent starts screening people from the regular security line.

Image from https://www.inquirer.com/things-to-do/travel/tsa-precheck-clear-plus-global-entry-phl.html

Security checkpoints are slow and frustrating.  They are also well balanced to provide a simple, understandable system that supports multiple priorities and minimizes agent idle time.
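
The three points above can be sketched in a few lines of Python.  This is only an illustration, not how any real checkpoint or job system is built: the `Checkpoint` class, its method names, and the lane names are all made up for this example, and the rule that an idle agent pulls from the longest line is my own assumption.

```python
from collections import deque


class Checkpoint:
    """Hypothetical model: one FIFO line per lane, agents assigned to lanes."""

    def __init__(self, lanes):
        # One first-in, first-out line per lane name.
        self.lines = {name: deque() for name in lanes}

    def join(self, lane, passenger):
        # Passengers are filtered into a lane once; no sorting after that.
        self.lines[lane].append(passenger)

    def next_for(self, agent_lane):
        """Return the next passenger for an agent assigned to agent_lane."""
        own = self.lines[agent_lane]
        if own:
            # Each lane keeps its own agent, so no lane is starved.
            return own.popleft()
        # Own line is empty: avoid idling by pulling from the longest line
        # (assumption; a real checkpoint may choose differently).
        longest = max(self.lines.values(), key=len)
        return longest.popleft() if longest else None


checkpoint = Checkpoint(["tsa_pre", "regular"])
checkpoint.join("regular", "r1")
checkpoint.join("regular", "r2")
checkpoint.join("tsa_pre", "p1")

first = checkpoint.next_for("tsa_pre")   # serves the TSA Pre line
second = checkpoint.next_for("tsa_pre")  # Pre is empty, so helps regular
```

The regular agent never looks at the TSA Pre line while regular passengers are waiting, which is what keeps the regular line moving.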

Boarding Gates Are Priority Queues

Boarding gates, where passengers wait to get on the airplane, are Priority Queues.  

The gates operate under different constraints from the security checkpoint:

  1. Nearly all passengers are at the gate when boarding begins
  2. There are a set number of passengers
  3. All of the high priority passengers should board before any of the regular priority passengers board.  Unlike the security checkpoint, resource starvation is desirable.
  4. The resources cannot be scaled.  There’s one plane, one door, and one person through at a time.

The queues take multiple forms.  They can be simple, like United’s:

Image From https://www.tripadvisor.com/LocationPhotoDirectLink-g1-d8729177-i375422300-United_Airlines-World.html

Or complex, like Southwest’s:

Image from https://www.quora.com/On-Southwest-Airlines-have-you-been-asked-to-switch-seats-after-the-open-seat-boarding-process

The Priority Queues have a common structure.  Passengers sort themselves, guided by signs and instructions.  The ticket agent acts as a final filter, either accepting or rejecting people.  The ticket agent (the worker) always runs at full capacity, while the queue itself is extremely inefficient and keeps people waiting a long time.

Since the plane only has one entrance, a Priority Queue is the only way to ensure that the high priority passengers get on first.
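
Here is a rough Python sketch of the boarding gate as a Priority Queue, using the standard library's `heapq`.  The `BoardingGate` class and the numbered boarding groups are invented for illustration, and I am assuming that a lower group number boards first and that passengers in the same group board in arrival order.

```python
import heapq
from itertools import count


class BoardingGate:
    """Hypothetical model: one door, one worker, strict priority order."""

    def __init__(self):
        self._heap = []
        # Arrival counter breaks ties so each group stays first in, first out.
        self._arrival = count()

    def arrive(self, group, passenger):
        # Lower group number means higher priority (assumption).
        heapq.heappush(self._heap, (group, next(self._arrival), passenger))

    def board_next(self):
        """Send the highest priority waiting passenger through the door."""
        if not self._heap:
            return None
        _, _, passenger = heapq.heappop(self._heap)
        return passenger


gate = BoardingGate()
gate.arrive(3, "c")
gate.arrive(1, "a")
gate.arrive(3, "d")
gate.arrive(1, "b")

boarding_order = [gate.board_next() for _ in range(4)]
```

Group 3 waits until group 1 is completely through the door.  That is the starvation the gate wants, and it is exactly what the security checkpoint has to avoid.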

Reminder - We’re Really Talking About Scaling

Airports are designed to scale.  They use Multiple Queues at the security checkpoint, because it fits the problem.  They use Prioritized Queues at the boarding gate because it fits the problem.

How should your Background Worker system be designed?

These are the considerations:

  1. Resource Starvation aka job latency
  2. Workload and priority variation
  3. Worker waste
  4. Scalability and configurability - aka how hard is it to add workers, or shift them around

If you get stuck, let me know and I’ll help you out in a future post!
