Scaling bugs don’t really exist; you will never find “unable to scale” in your logs. Scaling bugs are timing, concurrency, and reliability bugs that emerge as your system scales. Today I’m going to show you 4 signs that your system is being plagued by scaling bugs, and 4 things you can do to buy time and minimize your clients’ pain.
Scaling bugs boil down to “Something that used to be reliable is no longer reliable and your code doesn’t handle the failure gracefully”. This means they are going to appear in the oldest parts of your codebase, show up inconsistently and in bursts, and hit your most valuable clients the hardest.
Scaling Bugs appear in older, stable parts of your codebase
The oldest parts of your codebase are typically the most stable; that’s how they managed to get old. But the code was also written with lower performance needs and higher reliability expectations.
Reliability bugs can lie dormant for years, emerging where you least expect them. I once spent an entire week finding a bug deep in code that hadn’t changed in 10 years. As long as there were no problems, everything was fine, but a database connection hiccup in one specific function would cause a cascading failure across a distributed task being processed on over 30 servers.
Database connectivity is ridiculously stable these days; you can have hundreds of servers and go weeks without an issue, unless your databases are overloaded. And that’s exactly when the bug struck.
Scaling Bugs Are Inconsistent
Sometimes the system has trouble, sometimes things are fine. Even more perplexing, scaling bugs occur regardless of whether your code is multi-threaded or stateful.
This makes scaling bugs difficult to find, since you’ll never be able to reproduce them locally. They won’t appear in a single test execution, only when hundreds or thousands of events are happening simultaneously.
Even if your code is single-threaded and stateless, your system is multi-process and has state. A serverless design still has scaling bottlenecks at the persistence layer.
Scaling Bugs Are Bursty
Bursty means that the bugs appear in clusters, usually in ever-increasing numbers at ever-shorter intervals. Initially the error crops up once every few weeks and does minimal damage, so it gets documented as low priority and never worked on. As your platform scales, though, the error starts popping up 5 at a time every few days, then dozens of times a day. Eventually the low-priority, low-impact bug becomes an extremely expensive support problem.
Scaling Bugs Hit Your Most Valuable Clients Hardest
Which are the clients with the most contacts in a CRM? Which are the ones with the most emails? The most traffic and activity?
The same ones paying the most for the privilege of pushing your platform to the limit.
The impact of scaling bugs falls mostly on your most valuable clients, which makes the potential damage high in dollar terms.
Four ways to buy time
These tactics aren’t solutions; they are ways to buy time to transform your system into one that operates at scale. I’ll cover some scaling tactics in a future post!
Throw money at the problem
There’s never a better time to throw money at a problem than the early stages of scaling problems! More clients + larger clients = more dollars available.
Increase the number of servers, upgrade the databases, and increase your network throughput. If you have a multi-tenant setup, add shards and decrease the number of customers running on the same hardware.
If throwing money at the problem helps, then you know you have scaling problems, and you also get a rough estimate of your time-for-money runway. If the improved infrastructure doesn’t help, you can downgrade everything and stop spending the extra money.
Keep your Error Rate Low
The first time you notice a scaling bug is often when it causes a cascading system failure, but that is rarely the first time the bug manifested itself. Resolving those rare, low-priority bugs is key to keeping catastrophic scaling bugs at bay.
I once worked on a system that ran at over 1 million events per second (100 billion/day). We had a saying: The nice thing about this system is that something that’s 1 in a million happens 60 times a minute. The only known error we let stand: Servers would always fail to process the first event after a restart.
Retries
As load and scale increase, transient errors become more common. Take a design cue from RESTful systems and add retry logic. Most modern databases support upsert operations, which go a long way toward making it safe to retry inserts, as in the sketch below.
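Here’s a minimal retry-with-backoff sketch in Python. It assumes a Postgres-style database reached through the psycopg2 driver; the email_events table, the DSN, and the record_event helper are hypothetical stand-ins for whatever your system actually writes.

```python
import time

import psycopg2  # assumed driver; any DB-API driver with upsert support works similarly

# Hypothetical connection string and table -- replace with your own.
DSN = "dbname=app user=app"
UPSERT_SQL = """
    INSERT INTO email_events (event_id, client_id, payload)
    VALUES (%s, %s, %s)
    ON CONFLICT (event_id) DO UPDATE SET payload = EXCLUDED.payload
"""

def record_event(event_id, client_id, payload, attempts=5):
    """Write an event, retrying transient failures with exponential backoff.

    The upsert (ON CONFLICT on a unique event_id) makes the write idempotent,
    so retrying after an ambiguous failure can't create duplicate rows.
    """
    delay = 0.1
    for attempt in range(1, attempts + 1):
        conn = None
        try:
            conn = psycopg2.connect(DSN)
            with conn.cursor() as cur:
                cur.execute(UPSERT_SQL, (event_id, client_id, payload))
            conn.commit()
            return
        except psycopg2.OperationalError:
            # The transient connectivity hiccup that only shows up under load.
            if attempt == attempts:
                raise  # out of attempts: surface the error rather than swallow it
            time.sleep(delay)
            delay *= 2  # back off so retries don't pile onto an overloaded database
        finally:
            if conn is not None:
                conn.close()
```

The important property isn’t the specific driver; it’s that the write is idempotent, so a retry can never make things worse.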
Asynchronous Processing
Most actions don’t need to be processed synchronously. Switching to asynchronous processing makes many scaling bugs disappear for a while, because the response time the client sees improves dramatically. You still have to do the processing work, and the overall latency of your system may increase. Slowly and reliably processing everything successfully is greatly preferable to praying that everything processes quickly.
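The shape of the change looks something like this minimal, in-process Python sketch. It uses only the standard library’s queue and threading modules so it runs on its own; in a real system the queue would be a durable broker (SQS, RabbitMQ, Kafka, etc.) and process_event would be the retry-wrapped work from the previous section.

```python
import queue
import threading
import time

# Stand-in for a durable message broker; in-memory only so the sketch is self-contained.
work_queue = queue.Queue()

def handle_request(event):
    """Request path: enqueue the work and answer immediately."""
    work_queue.put(event)
    return {"status": "accepted"}  # the client gets a fast acknowledgement

def process_event(event):
    """The slow, failure-prone work, moved out of the request path."""
    time.sleep(0.5)  # stand-in for real processing (database writes, API calls, ...)
    print(f"processed {event}")

def worker():
    """Background worker: drains the queue at whatever pace the system can sustain."""
    while True:
        event = work_queue.get()
        try:
            process_event(event)
        finally:
            work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

print(handle_request({"event_id": 42}))  # returns right away
work_queue.join()  # wait for the background work to drain before the demo exits
```

Note the trade-off called out above: the caller’s response is fast, but the end-to-end time for the work itself may actually get longer.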
Congratulations! You Have Scaling Problems!
Scaling bugs only hit systems that get used. Take solace in the fact that you have something people want to use.
The techniques in this article will help you buy enough time to work up a plan to scale your system. Analyze your scaling pain points to gain insight into which parts of your system are most useful to your clients and prioritize your refactoring accordingly.
Remember that there are always ways to scale your current system without resorting to a total rewrite!