Cheerfully Going To Fail In Production

This weekend I cheerfully went to lose at a fencing tournament.

That’s not negativity.  I’ve been fencing off and on for 30 years, and I just started up again after a multi-year hiatus.  I fenced terribly, which I already knew from practice.  What I didn’t know, what practice couldn’t tell me, is where I would break down when I fenced for real.

I was just competent enough at the things I knew to watch for that my opponents destroyed me in ways I hadn’t even remembered to practice against.

What does this have to do with software and my mantra of Never Rewrite?

After years of not doing maintenance, I am a legacy system.  Practice is literally a testing environment.  If I had spent months working on my known problems, when I finally got to production, I would still have been destroyed by my blind spots.

You can choose to spend months rewriting only to have your system fail when you finally make it to production.

Or, you can choose to cheerfully go to production with what you’ve got so that you can see what breaks and iterate.

Acceptable Beer Bellies in your codebase

How do Beer Bellies begin?

The panel is fully functional: when you push the button, a light turns on and the elevator comes.  It is also obviously wrong: the top button is flush with the mount, and the bottom button sticks out.

I found this sad Beer Belly Elevator Panel at a high-end resort and wondered how it happened.

Certainly whoever installed the mismatched button knew it was wrong.  Did the tech not care?  Was using the wrong button the only way to get the panel repaired?  Was the plan to come back and fix it when the right parts came in?

The hotel maintenance staff had to sign off on it.  Did they care about the quality of the repair?  Were they only able to give a binary assessment of “working” or “not working”?

Did the hotel manager not care?  Were they told to keep costs down?  It isn’t broken now; it would be a waste to fix something that isn’t broken.

Quality vs Letting Your Gut Hang Out

Employees at the hotel see the mismatched panel every day.  It is a constant reminder that letting things slide, just a little, is acceptable at this hotel.

When you let consistency and quality slide because something works, you’re creating beer bellies in your codebase.

One small button at a time until everyone sees that this is acceptable here.

So long as a light turns on when you hit the button, does it matter if the light is green, red or blue?  Does it matter if the light is in the center or on the edge?

But I’m running a SaaS, not a Hotel

Your SaaS may not maintain elevator panels, but your codebase is probably full of beer bellies.

“It works, we’ll clean it up on the next release” bellies.

“This is a hack” bellies.

“This is the legacy version, we’re migrating off of it” bellies.

When you let sad little beer bellies into your codebase, your employees see exactly what you find acceptable.

Chipping Away

You have a goal, you know what it means, and what it implies.

You also know what’s blocking your progress.

It’s time to iterate!

Iterating Against the Blockers

Ask yourself:

  • What’s the first step?
  • What if all I needed was a tool?
  • How will I know if it’s working?

These three primary questions will help you chip away at the Blockers.

You want progress, not victory.  When you keep iterating away at the Blockers, eventually achieving your goals becomes easy.

Besides the questions, the main constraint to keep in mind is that each iteration needs to leave the system in a state where you can walk away from the project.

Unused functionality?  Totally fine.  

Refactored code with no new usage?  Great!

Tools that work but don’t do anything useful?  One day.

Nothing that requires keeping two pieces of code in sync.

Nothing that would prevent other developers from evolving existing code.

Whatever you do, it must go into the system with each iteration.

Example: Continuing from the Async Processing Blockers

I want to make my API asynchronous so that client size doesn’t impact the API’s responsiveness.  But I can’t make the API asynchronous because:

  • I don’t have a system to queue the requests.
  • I don’t have a system to process the requests off of a queue.
  • I don’t have a way to make the processing visible to the client.

Attempting to do all of this work in one giant step is a recipe for a project that gets delivered 6 months to a year late.

I’m going to hand wave and declare that we are using SQS, AWS’s native queuing system.  This makes setting up the queue trivial and reliable.

I don’t have a system to queue the requests.

What’s the first step?

Write a data model and serializer.  What am I even going to write onto this queue?
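A first draft can be very small.  Here is a minimal sketch in Python; the class name and every field are my assumptions for illustration, and yours will differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QueuedRequest:
    """Hypothetical first-draft message. Expect these fields to change."""

    client_id: str
    endpoint: str
    payload: dict
    schema_version: int = 1  # lets a later iteration tell old drafts apart


def serialize(request: QueuedRequest) -> str:
    """Turn the request into the JSON string that would go onto the queue."""
    return json.dumps(asdict(request), sort_keys=True)
```

Carrying a version number from day one is a cheap hedge against the data model being wrong, which it will be.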

What if all I needed was a tool?

Instead of worrying about a system, create a command line tool in your existing codebase to push data to SQS.  It won’t be reliable, it won’t have logging, and it won’t have visibility.

But you’re the only one using it, so that’s fine.
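Such a tool could be as small as the sketch below.  It assumes Python with the boto3 library and AWS credentials already configured; the function names and the command line arguments are made up for illustration.

```python
import argparse
import json


def push(client, queue_url: str, body: str) -> str:
    """Send one message and return the MessageId that SQS assigns to it."""
    json.loads(body)  # fail early if the body is not valid JSON
    response = client.send_message(QueueUrl=queue_url, MessageBody=body)
    return response["MessageId"]


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Push one JSON message to SQS")
    parser.add_argument("queue_url")
    parser.add_argument("json_file", help="path to a file holding one JSON message")
    args = parser.parse_args(argv)

    # Imported here so push() can be tried with a stand-in client.
    import boto3

    with open(args.json_file) as handle:
        body = handle.read()
    print(push(boto3.client("sqs"), args.queue_url, body))
    return 0
```

Call main() from whatever entry point your codebase already uses for scripts.  There are no retries and no logging, on purpose.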

How will I know if it’s working?

Manually!  AWS has great observability.  You don’t need to build anything; the SQS console lets you poll the queue and view the messages that were sent.

Combine your first step (a data model), your tool, and AWS observability, and you’ll be able to push data onto a queue and view what got sent.

The data model will be wrong and the tool will not be production ready!

That’s ok because no existing functionality is blocked or broken.  Getting interrupted doesn’t create risk, which means you can work even if you only have a little time.

I don’t have a system to process the requests off of a queue.

What’s the first step?  

Write a data model and deserializer.  What data do I need to be on the queue in order to recreate the event I need to process?

What if all I needed was a tool?

Create a tool to pull the message off the queue and deserialize it.  Send the result to a data validator.  (You’re accepting customer requests from an API; you’d better have a data validator.)

How will I know if it’s working?
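A rough sketch of that tool, again assuming Python and a boto3 SQS client.  The validate function is a hypothetical stand-in for the validator your API already has, and the field names are assumptions.

```python
import json


def validate(event: dict) -> list:
    """Hypothetical stand-in for your existing validator; returns problems found."""
    problems = []
    for field in ("client_id", "endpoint", "payload"):
        if field not in event:
            problems.append(f"missing field: {field}")
    return problems


def pull_one(client, queue_url: str):
    """Receive at most one message, deserialize it, and validate it.

    Returns None when the queue is empty, else (event, problems).
    The message is deleted only when validation passes, so a bad
    message reappears after the visibility timeout for inspection.
    """
    response = client.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=5
    )
    messages = response.get("Messages", [])
    if not messages:
        return None
    message = messages[0]
    event = json.loads(message["Body"])
    problems = validate(event)
    if not problems:
        client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
    return event, problems
```

Leaving invalid messages on the queue is one possible choice, not the only one; for a manual tool it keeps the evidence around while you fix the data model.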

Manually!  AWS has great observability.  You don’t need to do anything.

Combining the three gets you the ability to manually pull data off the queue, deserialize it, and validate it.

You can do this before, after or during your work to get the data onto a queue.  It’s not production ready, but it also doesn’t create risk.

I don’t have a way to make the processing visible to the client.

What’s the first step? 

What does visibility look like to the client?  Where does the data go in your UI?  What would you want to know?

What if all I needed was a tool?

Make an endpoint that calls AWS and returns the data you think you need.
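The body of that endpoint might look like the sketch below, assuming Python and a boto3 SQS client.  The function name and the keys in the response are my guesses at "the data you think you need"; wiring it into your web framework is left out.

```python
def queue_status(client, queue_url: str) -> dict:
    """Ask SQS for approximate queue counts and reshape them for the client."""
    response = client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )
    attributes = response["Attributes"]  # SQS returns the counts as strings
    return {
        "waiting": int(attributes["ApproximateNumberOfMessages"]),
        "in_progress": int(attributes["ApproximateNumberOfMessagesNotVisible"]),
    }
```

The counts are approximate by SQS's own definition, so treat small mismatches with the console as expected.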

How will I know if it’s working?

Manually!  Compare what your endpoint tells you with what AWS tells you.  Don’t start until you have tools for adding and removing events from the queue.

Combining the three gets you an endpoint that tells you about the queue.

The endpoint should be safe to deploy to production.  For now, the queue is always empty because nothing in production writes to it yet.

Conclusion

Iterating allows you to chip away at your blockers until there’s nothing stopping you.

Apply the three questions:

  • What’s the first step?
  • What if all I needed was a tool?
  • How will I know if it’s working?

Always keep the system in a state where you can walk away from the project.

Keep iterating against your blockers, and you’ll be amazed at how soon you’ll achieve your goals!

The Blockers

Defining the Implications, what has to be true for your goal to succeed, is the hardest step.

Step 4, The Blockers, is relatively easy.

What’s Stopping You?

What is stopping you and your team from making your implications true?

“I’m not working on it” and “there aren’t enough hours in the day” aren’t valid answers.

The Implications are too big, too complex and too scary to tackle head on.  If they weren’t, you wouldn’t be stuck.

Defining the blockers is about naming the steps you can’t take; the chasms you can’t cross.

Continuing with the Async Processing Example

Let’s continue with the example of using asynchronous processing to make API response time consistent for customers of any size.

Why can’t I make the API asynchronous? 

Three big reasons:

  1. I don’t have a system to queue the requests.
  2. I don’t have a system to process the requests off of a queue.
  3. I don’t have a way to make the processing visible to the client.

The reasons create a giant requirement loop.  I can’t build a job system until I have a queue to read off of, and I can’t queue the work until I can process it.  I also need endpoints and UI to keep the customer informed of how the processing is going, if anything got rejected, and a way to remediate failures.

That’s a lot of work!

Even dropping customer observability, adding queueing and processing is a big step.

Big steps are scary, and not iterative.

Next Steps

Before going on to the next step, pick your 2-3 most important Implications, and write down 3 Blockers for each.

I recommend you don’t write down more than 9 Blockers before moving on to Step 5 - Weakening The Blockers.

The Implications Of Your Characteristics

Part Three of my series on iterative delivery 

You have a goal, you have characteristics, now it’s time to ponder the implications.

The implications are things that would have to be true in order for your characteristic to be achieved.  They are levers you can pull in order to make progress.

Let’s work through an example

In part 2 I suggested that a Characteristic of a system that can support clients of any size is that API endpoints respond to requests in the same amount of time for clients of any size.

What would need to happen to an existing SaaS in order to make that true?

  • The API couldn’t do direct processing in a synchronous manner.  It would have to queue the client request for async processing.  Adding a message to a queue should be consistent regardless of how large the client, or the queue, becomes.
  • For data requests, endpoints would need to be well indexed and return constrained results.
  • Offer async requests for large, open-ended data requests.  An async request works by quickly returning a token, which can then be used to poll for the results.
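To make the token-and-poll idea concrete, here is a minimal in-memory sketch in Python.  The class and method names are hypothetical, and a real system would keep jobs in a database or cache rather than a dict.

```python
import uuid


class AsyncRequests:
    """In-memory stand-in for a real job store; illustration only."""

    def __init__(self):
        self._jobs = {}

    def submit(self, query: dict) -> str:
        """Record the request and return a token right away, doing no work."""
        token = uuid.uuid4().hex
        self._jobs[token] = {"status": "pending", "query": query, "result": None}
        return token

    def complete(self, token: str, result) -> None:
        """Called by whatever background worker eventually does the work."""
        job = self._jobs[token]
        job["status"] = "done"
        job["result"] = result

    def poll(self, token: str) -> dict:
        """What the client sees each time it checks on its token."""
        job = self._jobs.get(token)
        if job is None:
            return {"status": "unknown"}
        return {"status": job["status"], "result": job["result"]}
```

The point is that submit costs the same no matter how large the query is; the slow part happens somewhere the client isn't waiting.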

Implications are measurable

How much processing can be done asynchronously today?  How much could be done last month?

How many data endpoints allow requests that will perform slowly?  Is that number going up or down?

How robust is the async data request system?  Does it exist at all?

Implications are Levers

Progress on implications pushes your service towards its goal.  Sometimes a little progress will result in lots of movement, sometimes a lot of progress will barely be noticeable.

Speaking to Implications

It is important that you can speak to how the implications drive progress towards your goal.

Asynchronous processing lets your service remain responsive.  It doesn’t mean you can process the data in a timely manner yet.  It sets the stage for parallel processing and other methods of handling large loads.

Next Steps

Before continuing on, try to come up with 3 implications for your most important characteristics.

You’ll want a good selection of implications for the next part - blockers.

We will explore what’s preventing you from moving your system in the direction it needs to go.

Defining Iterative Characteristics

Part 2 of my series on Iterative Delivery

Congratulations, you have a goal!  Now what?

Now, it is time to write down what your goal means to you.  What measurable things will be true about your system by the time you achieve your goal?  What measurable things will be false?

Picking measurable characteristics allows you to iterate towards the goal.

Each one should be 1-2 sentences; short enough that you can still explain them quickly and long enough to remove most ambiguity.

For example, if you are attracting clients that are 10x the size you are used to supporting, a good goal would be something like: We should be able to support clients of any size!

Your characteristics would be something like:

  • Pages in the UI render the same for any size client.  The site should never be ponderous or slow.
  • API endpoints respond to requests in the same amount of time for clients of any size.
  • Backend processes always complete fast enough that the customer doesn’t notice them.

Like goal setting, you don’t need to know how to achieve your characteristics.  They still don’t need to be achievable!

You need a general idea of how you will measure the characteristics, but you don’t have to measure or set up measurement tools before proceeding:

Pages in the UI render the same for any size client.  The site should never be ponderous or slow.

Page rendering time can be measured with tools like Chrome’s Lighthouse and network logs.

You should come up with 3-10 characteristics before moving on to the next step, implications.

Picking an Iterative Goal at a Scaleup

Note: This is part of my series on Iterative Delivery

When you are in Scaleup mode, picking a goal to iterate on should be straightforward.

What can’t you deliver?

Are you attracting larger clients and discovering your software can’t handle their size?

Do you have a swarm of small clients overwhelming the backend?

Does throwing money at your problems keep the software running smoothly, but unprofitably?

Your goal should be a single, short, aspirational sentence.  

If you get stuck, try the “We should be able to ___”  template:

We should be able to support clients of any size!

We should be able to support any number of clients!

We should be able to support clients profitably!

You don’t need to have any idea how to achieve your goal; your goal might not even be achievable.

The important thing is that you can clearly state your goal and explain it to others.

Getting Started With Iterative Delivery

The last 4 posts have been trying to convince you that iterative, baby-step delivery is better for your clients than moonshot, giant-step delivery.

But how do you get started?  How do you shorten your stride from shooting the moon to one small step?

The next series of posts is going to lay out my scaling iterative delivery framework.  This site is about scaling SaaS software, and this framework works best if you want an order of magnitude more of what you already offer your clients.  This isn’t a general framework, and it certainly isn’t the only way to get started with iterative delivery.

Work your way through these steps:

  1. Pick a goal - 1 sentence, highly aspirational and self-explanatory.
  2. Define the characteristics of your goal - What measurable characteristics does your system need in order to achieve your goal?
  3. What are the implications? - What technical things would have to be true in order for your system to have all the characteristics you need?
  4. What are the blockers? - What is stopping you from making the implications true?
  5. What can you do to weaken the blockers? - Set aside the goal, characteristics and implications; what can you do to weaken the blockers?

Weakening the blockers is where you start delivering iteratively.  As the blockers disappear, your system becomes better for your clients and easier for you to implement your technical needs.

We will explore each step in depth in the following posts.
