Sherman On Software

Case Study – A Failure To Iterate Became A Full Rewrite

This is a story of how a project that could have been done in a highly iterative fashion became a big bang rewrite.

We chose to do fewer, larger steps in order to get some features out the door faster.  Ultimately, we failed to deliver the most important feature sooner, and probably delivered it later than if we had taken the more iterative approach.

Along the way we also encountered all of the expected problems of doing a full rewrite instead of iterating.  These included more impactful bugs, multiple rollbacks, and a longer project timeline.  As a result we decreased customer satisfaction for no benefit.

The Problem: A Highly Visible Legacy Dashboard

We had a dashboard page with multiple visible problems:

  1. The page had 4s latency to first paint (that means you stare at a blank screen for 4s before you start to see anything).
  2. The widgets on the page had additional load time on top of the page load.
  3. The widgets hadn’t been rethought in ~8 years and weren’t well designed to show customers the most important data.
  4. The dashboard and the widgets were loaded via server-side rendering, while the rest of the app had migrated to React and APIs.

For these reasons the product group wanted to replace the existing page with an API-driven React page.

Lesson: There were also multiple issues that weren’t visible and that later became problems.  We did not do any research into what other business processes were running through the page.

Tech’s proposal: iterate.

The developers proposed an iterative approach:

  1. First replace each widget with an API + React component.  The new widgets would replace the existing widgets on the dashboard page.  Work would be done one widget at a time, and each widget would go into production as soon as it was finished.  (This would have no impact on the 4s page load time.)
    1. This would keep work in progress minimal.
    2. It would allow us to change widgets one at a time, and roll back one at a time if there was a problem.
    3. Customers would see the widgets changing one at a time.
    4. Once a widget had been converted to React, the frontend devs could also begin iterating on the widget’s design in parallel.
  2. Once all the widgets were converted, replace the dashboard framework.  This would get rid of the 4s load time.
    1. The complete dashboard framework was estimated at half of the total work (as much as all of the widgets combined).
    2. Customers would be offered a “try the new version” link which would allow them to opt-in to the faster version before all the features were complete.  This would allow the work to go into prod immediately, but without all of the features.
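
The proposal above amounts to two independent switches: a per-widget decision about which implementation to render, and a per-user opt-in for the new framework.  A minimal sketch of that idea follows.  Every name in it (`RolloutState`, `chooseRenderer`, `shouldShowNewFramework`, the widget ids) is hypothetical and chosen for illustration; none of it comes from the actual project.

```typescript
// Hypothetical sketch: all names and widget ids here are illustrative only.
type WidgetId = string;
type Renderer = "legacy-ssr" | "react";

interface RolloutState {
  // Widgets whose React replacement has shipped to production.
  migratedWidgets: Set<WidgetId>;
  // Widgets rolled back after a problem; these fall back to the legacy version.
  rolledBack: Set<WidgetId>;
}

// Phase 1: decide per widget, so each one ships and rolls back on its own.
function chooseRenderer(widget: WidgetId, state: RolloutState): Renderer {
  if (state.rolledBack.has(widget)) return "legacy-ssr";
  return state.migratedWidgets.has(widget) ? "react" : "legacy-ssr";
}

// Phase 2: the new framework is shown only to users who clicked
// "try the new version", and only once the framework itself is deployed.
function shouldShowNewFramework(userOptedIn: boolean, frameworkDeployed: boolean): boolean {
  return frameworkDeployed && userOptedIn;
}

const state: RolloutState = {
  migratedWidgets: new Set(["revenue", "signups"]),
  rolledBack: new Set(["signups"]),
};

const plan = ["revenue", "signups", "churn"].map(
  (w) => `${w}:${chooseRenderer(w, state)}`
);
console.log(plan.join(", "));
// revenue:react, signups:legacy-ssr, churn:legacy-ssr
```

The point of the sketch is that a bad widget release only flips one entry in `rolledBack`, while everything else stays live.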

Product’s Counter: The only thing that matters is the 4s load time.

The product group told tech that the CEO was personally embarrassed by the 4s load time.  He didn’t care about the widgets themselves.  Therefore, the dashboard framework must go first.

Back and forth, attempts to iterate:

  1. Devs: If we go with the framework first, none of the widgets will be available.  What’s the minimum number of widgets we can use to go live?
    Product: We can do a release with half (4/8) of the widgets, feature-flagged off so that only the CEO and testers can see it.
  2. Devs: Once we have the framework and some widgets done, we want to put up a link letting users try/switch to the new dashboard.
    Product: No, no reveal until we have all the widgets.
  3. Devs: Do we need feature parity on the framework before showing to customers?
    Product: No, we can remove things like customized layout for the first release.
  4. Devs: There are a bunch of features that have terrible performance or don’t make sense.  Can we fix them as part of the rewrite?  (NOTE: This was a mistake on our part)
    Product: Yes.  We approve some of the changes on this list.

The table was set: we had agreed to do a 50-75% rewrite because that was the minimum for the feature that our stakeholder, the CEO, wanted.

Lesson: “The CEO wants the framework first” was an unverified claim.  The CEO (the stakeholder) was never presented with the choice between widgets first and framework first.  The project manager seized on the difference as a way to get the main feature out faster.

Lesson: Having no opt-in link was a personal preference of the project manager.  No evidence was presented.  An opt-in link would have mitigated many of the issues by reducing WIP and surprise around the change.

The Project Went Off The Rails

First Derailment: 50% of the widgets is not actually ok

The first project derailment came when the CEO and CPO looked at the dashboard and declared that it was not ok to go out without all of the widgets.  

“Removing features for customers is not ok”, said the CPO.

We were able to put the project back on track by agreeing to only release it to NEW customers.  They would not be losing functionality and would see new features appear every week.

Lesson: At this point it was clear that the people representing the customer had gotten the project setup wrong.  We should have had a meeting with the customer (the CEO) and confirmed or rejected the rest of the project plan.  This would have revealed other issues.

New customers were the first to work with the MVP set of rewritten widgets, and they found bugs that had been missed.  Because the MVP represented a lot of built-up work in progress, customers encountered all of the bugs at once.  The iterative approach would not have resulted in any fewer bugs, but they would have emerged once a week instead of all at once.  Iteration would have made the bugs less noticeable and allowed developers to fix them faster.

Lesson: Releasing the MVP to new customers was a mistake.  It put our new customers in a situation where they were more likely to have a poor experience.

Second Derailment: We didn’t know what the framework did

The second project derailment came when we discovered that the framework ALSO contained a billing workflow.  Because we didn’t know it was there, it wasn’t ported, and accounts with free trials were unable to convert.

Because we were only targeting new customers, this meant that the issue hit almost everyone.  We rolled the feature back.

We got the project back on track by adding the billing logic.

Lesson: Neither tech nor product did research to understand what ELSE the framework might be doing before beginning the work.  We knew the code was over 10 years old; assuming that only the visible parts were meaningful was foolish.

Third Derailment: We must respect settings

The third derailment came after we had ported all of the widgets.  At that time the CPO declared that we must also respect customizations (widget ordering and removal) before going live to any existing customers.

Dev pushed back by asking how many customers actually used the feature.  It turned out we did not know.  Only at this point did we run a search, which showed that ~11% of users had customized the dashboard.
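
That usage number is cheap to compute before any rewrite starts.  As a rough sketch, assuming each user's saved layout can be exported as an ordered list of widget ids (the post does not say how settings were actually stored, so `DashboardSettings`, `DEFAULT_ORDER`, and the sample data are all hypothetical), the share of customized dashboards is a simple comparison against the default layout:

```typescript
// Hypothetical sketch: the real settings storage is not described in the post.
interface DashboardSettings {
  userId: string;
  // Widget ids in display order; a removed widget is simply absent.
  widgetOrder: string[];
}

// Assumed default layout, for illustration only.
const DEFAULT_ORDER = ["revenue", "signups", "churn"];

// A dashboard counts as customized if any widget was reordered or removed.
function isCustomized(s: DashboardSettings, defaultOrder: string[]): boolean {
  return (
    s.widgetOrder.length !== defaultOrder.length ||
    s.widgetOrder.some((id, i) => id !== defaultOrder[i])
  );
}

// Fraction of users (0..1) whose layout differs from the default.
function customizedShare(all: DashboardSettings[], defaultOrder: string[]): number {
  if (all.length === 0) return 0;
  const customized = all.filter((s) => isCustomized(s, defaultOrder)).length;
  return customized / all.length;
}

// Made-up sample data: one of four users reordered their widgets.
const sample: DashboardSettings[] = [
  { userId: "u1", widgetOrder: ["revenue", "signups", "churn"] },
  { userId: "u2", widgetOrder: ["churn", "revenue", "signups"] },
  { userId: "u3", widgetOrder: ["revenue", "signups", "churn"] },
  { userId: "u4", widgetOrder: ["revenue", "signups", "churn"] },
];

const share = customizedShare(sample, DEFAULT_ORDER);
console.log(`${(share * 100).toFixed(0)}% of users customized the dashboard`);
// 25% of users customized the dashboard
```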

The project was blocked until we had drag and drop ordering, widget add/remove, and a way to migrate the settings.

At this point we were at full parity with the original dashboard.  The 4s delay was gone, and the widgets had been optimized for performance and cleaned up to make the visualizations easier to understand.

Lesson: At this point we had done 100% of the original project as a full rewrite.  We did not achieve our original goal of getting rid of the 4s page load first.  We did not keep WIP low.  We did not release incrementally and gather feedback.  Existing customers got a big bang – we did not hit any intermediate release goals for existing customers.

Fourth Derailment: The features we cut were important to a vocal minority

It turned out that one of the features we cut due to bad performance and clutter was important to a vocal minority of customers.

Dev tried to argue that the features were available, in a better form, elsewhere, but the decision was made to roll the rewrite back.

Lesson: “I hate the new design” is a common phenomenon when an existing UI is re-skinned.  Iterative releases would have defused the issue.

Lesson: We did not do any customer research to understand how customers used the original page before making performance optimizations.

Lesson: There was a meaningful feature that was only available in the dashboard, and we cut it.  We had to restore the feature and create alternative ways to access the data.

Conclusion: The Worst Of All Successes

We managed to achieve success in the worst possible way:

The lessons were clear:

Above all, iterate!
