User Defined Field Patterns 2 – NoSQL Relations

In part 1, I covered the classic solution for User Defined Fields; simple but unscalable.

NoSQL emerged in the late 2000s, in part as a solution to the scaling problems of relational databases.  Instead of a meta table defining fields in a relational database, the User Defined data would live in NoSQL.

The structure would look like this:
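A minimal sketch of the split, with illustrative names (none of this is from any particular product): the core contact record stays in the relational database, while each contact's custom fields live in a NoSQL document keyed by the contact's id.

  # Relational side: Contacts(id, name, email, ...) holds the core record.
  # NoSQL side: one document per contact in a "contact_fields" collection.
  contact_fields_document = {
      "contact_id": 42,  # foreign key back to Contacts.id
      "fields": {
          "birthday": "1985-03-14",
          "favorite_color": "green",
          "lifetime_purchases": 17,
      },
  }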

This model eliminates the metaprogramming and the repeated self-joins.  The major new headache it creates is the difficulty of maintaining the integrity of the field data.

Pros:

  • No complicated metaprogramming.  Instead you write a filter/match function to run against the data in the Collection Of Fields (see the sketch after this list).
  • No more repeated matching against the same table.  Adding additional search criteria has minimal cost.
  • Open-ended, internet-level scaling.  For a CRM or SaaS, the limiting factor will be the cost of storing data, not a hard limit of the technology.
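As a rough illustration of that filter/match function, assuming documents shaped like the sketch above (everything here is hypothetical):

  docs = [
      {"contact_id": 42, "fields": {"favorite_color": "green", "lifetime_purchases": 17}},
      {"contact_id": 43, "fields": {"favorite_color": "blue"}},
  ]

  def matches(document, criteria):
      # True when every requested field matches the document's fields.
      fields = document.get("fields", {})
      return all(fields.get(name) == value for name, value in criteria.items())

  # Adding another search criterion is just another dict entry, not another join.
  hits = [d for d in docs if matches(d, {"favorite_color": "green"})]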

Cons:

  • Much more complicated to set up and maintain.  Even with managed services, supporting two database technologies doubles the difficulty of CRUD.  Multiple inserts, multiple deletes, tons of ways for things to go wrong.
  • Without a relational database enforcing the data structure, poisoned or unreadable data is common.  Being able to store arbitrary data collections means you’ll invariably store buggy data.  You’ll miss some records during upgrades and have to support multiple deserializers.  You will lose customer data in the name of expediency and cost control.
  • It’s more expensive.  You’ll pay for your relational database, NoSQL database, and software to map between the two.  

Conclusion

NoSQL systems solve the scaling problems of implementing User Defined Fields in a relational database.  That scaling comes at a high price in complexity, fragility, and cost.

Reducing that complexity, fragility, and cost leads to the upcoming third pattern, covered in part 3.

User Defined Field Implementations For CRMs

This series covers a brief history of the 2 historic patterns for implementing User Defined Fields in a CRM, the upcoming hybrid solution that provides the best of both worlds, and how to evolve your existing CRM to the latest pattern.  If you care about CRM performance, scaling, or cost, this series is for you!

What are User Defined Field Patterns?

Every CRM provides a basic set of fields for defining a customer.  Every CRM's basic field set is different depending on the CRM's focus.  So, every user of a CRM needs to expand the basic definition in some way.  Birthdays, purchase history, and interests are three very common additions.

The trick is allowing users to define their own fields in ways that don’t break your CRM.

The Three Patterns

At a high level, there have been three major architectures for implementing Custom Fields.  Most of the design is driven by the strengths and weaknesses of the underlying database architecture.

Pattern 1, generalized columns in a database, spanned from the dawn of time until the rise of NoSQL around 2010.

Pattern 2, NoSQL, began around 2010 and continues to today.

Pattern 3, JSON in a relational database, began in the late 2010s and combines the best of the two approaches.

Pattern 1 - All in a Relational Database

Before the rise of NoSQL, there was pretty much one way to build generic user defined fields.

The setup is simple, just three tables: a table of field definitions, a table for contacts, and a join table with the two ids and the value for that contact's custom field.
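Here is a minimal sketch of that setup in SQLite (table and column names are illustrative):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE Fields (
          id   INTEGER PRIMARY KEY,
          name TEXT NOT NULL        -- e.g. 'birthday', 'favorite_color'
      );
      CREATE TABLE Contacts (
          id    INTEGER PRIMARY KEY,
          name  TEXT NOT NULL,
          email TEXT
      );
      -- One row per (contact, field) pair, holding that contact's value.
      CREATE TABLE ContactFields (
          contact_id INTEGER NOT NULL REFERENCES Contacts(id),
          field_id   INTEGER NOT NULL REFERENCES Fields(id),
          value      TEXT,
          PRIMARY KEY (contact_id, field_id)
      );
  """)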

The Pros

  • This design is extremely simple and can be implemented by a single developer very quickly.
  • Basic CRUD operations are easy and efficient.

The Cons

  • Building search queries requires complicated techniques like metaprogramming (see the sketch after this list).
  • Every search criterion results in another join against the ContactFields table, and query times explode as criteria are added.
  • The lack of defined table columns handicaps the database’s query optimization strategies.
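A rough sketch of both problems at once, using the illustrative schema above: the query has to be generated on the fly, and every criterion adds another join.

  def build_search(criteria):
      # Generate a search query: one ContactFields join per criterion.
      joins, where, params = [], [], []
      for i, (field_name, value) in enumerate(criteria.items()):
          joins.append(
              f"JOIN ContactFields cf{i} ON cf{i}.contact_id = c.id "
              f"JOIN Fields f{i} ON f{i}.id = cf{i}.field_id"
          )
          where.append(f"f{i}.name = ? AND cf{i}.value = ?")
          params += [field_name, value]
      sql = "SELECT c.* FROM Contacts c " + " ".join(joins) + " WHERE " + " AND ".join(where)
      return sql, params

  # Two criteria already cost two extra joins; ten criteria cost ten.
  sql, params = build_search({"favorite_color": "green", "birthday": "1985-03-14"})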

Conclusion

The classic relational database pattern is easy to set up, but has terrible scaling.  This super simple example would bog down by 1,000 contacts and 50 fields.  

There are lots of ways to redesign for scale, but this is a SHORT history.  Suffice it to say that it takes extremely complex and finicky systems to scale past 100,000 contacts and 1,000 fields.

The solutions to the classic pattern's scaling problems led to the NoSQL revolution, covered in part 2.

Cheerfully Going To Fail In Production

This weekend I cheerfully went to lose at a fencing tournament.

That’s not negativity, I’ve been fencing off and on for 30 years and I just started up again after a multi-year hiatus.  I fenced terribly, which I already knew from practice.  What I didn’t know, what practice couldn’t tell me, is where I would break down when I fenced for real.

I was just competent enough at the things I knew to watch for that my opponents destroyed me in ways I didn’t even remember to practice.

What does this have to do with software and my mantra of Never Rewrite?

After years of not doing maintenance, I am a legacy system.  Practice is literally a testing environment.  If I had spent months working on my known problems, when I finally got to production, I would still have been destroyed by my blind spots.

You can choose to spend months rewriting only to have your system fail when you finally make it to production.

Or, you can choose to cheerfully go to production with what you’ve got so that you can see what breaks and iterate.

Acceptable Beer Bellies in your codebase

How do Beer Bellies begin?

The panel is fully functional: when you push the button, a light turns on and the elevator comes.  It is also obviously wrong - the top button is flush with the mount, and the bottom button sticks out.

I found this sad Beer Belly Elevator Panel at a high-end resort and wondered how it happened.

Certainly whoever installed the mismatched button knew it was wrong.  Did the tech not care?  Was using the wrong button the only way to get the panel repaired?  Was the plan to come back and fix it when the right parts came in?

The hotel maintenance staff had to sign off on it.  Did they care about the quality of the repair?  Were they only able to give a binary assessment of “working” or “not working”?

Did the hotel manager not care?  Were they told to keep costs down?  It isn’t broken now, it would be a waste to fix something that wasn’t broken.

Quality vs Letting Your Gut Hang Out

Employees at the hotel see the mismatched panel every day.  It is a constant reminder that letting things slide, just a little, is acceptable at this hotel.

When you let consistency and quality slide because something works, you’re creating beer bellies in your codebase.

One small button at a time until everyone sees that this is acceptable here.

So long as a light turns on when you hit the button, does it matter if the light is green, red, or blue?  Does it matter if the light is in the center or on the edge?

But I’m running a SaaS, not a Hotel

Your SaaS may not maintain elevator panels, but your codebase is probably full of beer bellies.

“It works, we’ll clean it up on the next release” bellies.

“This is a hack” bellies.

“This is the legacy version, we’re migrating off of it” bellies.

When you let sad little beer bellies into your codebase, your employees see exactly what you find acceptable.

Chipping Away

You have a goal, you know what it means, and what it implies.

You also know what’s blocking your progress.

It’s time to iterate!

Iterating Against the Blockers

Ask yourself:

  • What’s the first step?
  • What if all I needed was a tool?
  • How will I know if it’s working?

These three primary questions will help you chip away at the Blockers.

You want progress, not victory.  When you keep iterating away at the Blockers, eventually achieving your goals becomes easy.

Besides the questions, the main constraint to keep in mind is that each iteration needs to leave the system in a state where you can walk away from the project.

Unused functionality?  Totally fine.  

Refactored code with no new usage?  Great!

Tools that work but don’t do anything useful yet?  One day they will.

Nothing that requires keeping two pieces of code in sync.

Nothing that would prevent other developers from evolving existing code.

Whatever you do, it must go into the system with each iteration.

Example: Continuing from the Async Processing Blockers

I want to make my API asynchronous so that client size doesn’t impact the API’s responsiveness.  But, I can’t make the API asynchronous because:

  • I don’t have a system to queue the requests.
  • I don’t have a system to process the requests off of a queue.
  • I don’t have a way to make the processing visible to the client.

Attempting to do all of this work in one giant step is a recipe for a project that gets delivered 6 months to a year late.

I’m going to hand wave and declare that we are using SQS, AWS’s native queuing system.  This makes setting up the queue trivial and reliable.

I don’t have a system to queue the requests.

What’s the first step?

Write a data model and serializer.  What am I even going to write onto this queue?
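A first pass might be as small as this (the event fields are made up for the example):

  import json
  from dataclasses import dataclass, asdict

  @dataclass
  class ApiRequestEvent:
      # Hypothetical fields: whatever you need to replay the request later.
      request_id: str
      client_id: str
      action: str
      payload: dict

  def serialize(event):
      # SQS message bodies are strings, so serialize to JSON.
      return json.dumps(asdict(event))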

What if all I needed was a tool?

Instead of worrying about a system, create a command line tool in your existing codebase to push data to SQS.  It won’t be reliable, it won’t have logging, and it won’t have visibility.

But you’re the only one using it, so that’s fine.
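A sketch of that throwaway tool, assuming boto3 and an already-created queue (the queue URL is a placeholder):

  # push_to_queue.py: quick-and-dirty CLI, not production code.
  import sys
  import boto3

  QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

  def main():
      body = sys.argv[1]  # the serialized event, passed as the first argument
      sqs = boto3.client("sqs")
      response = sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)
      print("sent message:", response["MessageId"])

  if __name__ == "__main__":
      main()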

How will I know it’s working?

Manually!  AWS has great observability.  You don’t need to do anything.

Combining your first step (a data model), your tool, and AWS observability, you’ll be able to push data onto a queue and view what got sent.

The data model will be wrong and the tool will not be production ready!

That’s ok because no existing functionality is blocked or broken.  Getting interrupted doesn’t create risk, which means you can work even if you only have a little time.

I don’t have a system to process the requests off of a queue.

What’s the first step?  

Write a data model and deserializer.  What data do I need to be on the queue in order to recreate the event I need to process?

What if all I needed was a tool?

Create a tool to pull the message off the queue and deserialize it.  Send the result to a data validator.  (You’re accepting customer requests from an API; you’d better have a data validator.)
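Again as a sketch, assuming boto3 and the JSON event model from the push tool; validate() stands in for whatever validator you already have:

  # pull_from_queue.py: manual inspection tool, not a production consumer.
  import json
  import boto3

  QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

  def validate(event):
      # Stand-in for your real data validator.
      for key in ("request_id", "client_id", "action", "payload"):
          if key not in event:
              raise ValueError(f"missing field: {key}")

  def main():
      sqs = boto3.client("sqs")
      resp = sqs.receive_message(
          QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=5
      )
      for msg in resp.get("Messages", []):
          event = json.loads(msg["Body"])  # deserialize
          validate(event)
          print("valid event:", event)
          # Delete only after the message has been handled.
          sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

  if __name__ == "__main__":
      main()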

How will I know it’s working?

Manually!  AWS has great observability.  You don’t need to do anything.

Combining the three gets you the ability to manually pull data off the queue, deserialize, and validate.

You can do this before, after or during your work to get the data onto a queue.  It’s not production ready, but it also doesn’t create risk.

I don’t have a way to make the processing visible to the client.

What’s the first step? 

What does visibility look like to the client?  Where does the data go in your UI?  What would you want to know?

What if all I needed was a tool?

Make an endpoint that calls AWS and returns the data you think you need.
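One possible sketch, using SQS’s approximate queue counters; the endpoint name and the framework wiring are up to you:

  import boto3

  QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

  def queue_status():
      # Handler body for a hypothetical GET /queue-status endpoint.
      sqs = boto3.client("sqs")
      attrs = sqs.get_queue_attributes(
          QueueUrl=QUEUE_URL,
          AttributeNames=[
              "ApproximateNumberOfMessages",           # waiting to be processed
              "ApproximateNumberOfMessagesNotVisible", # currently being processed
          ],
      )["Attributes"]
      return {
          "pending": int(attrs["ApproximateNumberOfMessages"]),
          "in_flight": int(attrs["ApproximateNumberOfMessagesNotVisible"]),
      }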

How will I know it’s working?

Manually!  Compare what your endpoint tells you with what AWS tells you.  Don’t start until you have tools for adding and removing events from the queue.

Combining the three gets you an endpoint that tells you about the queue.

The endpoint should be safe to deploy to production.  The queue is always empty, because nothing writes to it in production yet.

Conclusion

Iterating allows you to chip away at your blockers until there’s nothing stopping you.

Apply the three questions:

  • What’s the first step?
  • What if all I needed was a tool?
  • How will I know if it’s working?

Always keep the system in a state where you can walk away from the project.

Keep iterating against your blockers, and you’ll be amazed at how soon you’ll achieve your goals!

The Blockers

Implications, defining what has to be true for your goal to succeed, are the hardest step.

Step 4, The Blockers, is relatively easy.

What’s Stopping You?

What is stopping you and your team from making your implications true?

“I’m not working on it” and “there aren’t enough hours in the day” aren’t valid answers.

The Implications are too big, too complex and too scary to tackle head on.  If they weren’t, you wouldn’t be stuck.

Defining the blockers is about naming the steps you can’t take; the chasms you can’t cross.

Continuing with the Async Processing Example

Let’s continue with the example of using asynchronous processing to make API response times consistent for customers of any size.

Why can’t I make the API asynchronous? 

Three big reasons:

  1. I don’t have a system to queue the requests.
  2. I don’t have a system to process the requests off of a queue.
  3. I don’t have a way to make the processing visible to the client.

The reasons create a giant requirement loop.  I can’t build a job system until I have a queue to read off of, and I can’t queue the work until I can process it.  I also need endpoints and UI to keep the customer informed of how the processing is going, if anything got rejected, and a way to remediate failures.

That’s a lot of work!

Even dropping customer observability, adding queueing and processing is a big step.

Big steps are scary, and not iterative.

Next Steps

Before going on to the next step, pick your 2-3 most important Implications, and write down 3 Blockers for each.

I recommend you don’t write down more than 9 Blockers before moving on to Step 5 - Weakening The Blockers.

The Implications Of Your Characteristics

Part Three of my series on iterative delivery 

You have a goal, you have characteristics, now it’s time to ponder the implications.

The implications are things that would have to be true in order for your characteristic to be achieved.  They are levers you can pull in order to make progress.

Let’s work through an example

In part 2 I suggested that a Characteristic of a system that can support clients of any size is that API endpoints respond to requests in the same amount of time for clients of any size.

What would need to happen to an existing SaaS in order to make that true?

  • The API couldn’t do direct processing in a synchronous manner.  It would have to queue the client request for async processing.  Adding a message to a queue should take a consistent amount of time regardless of how large the client, or the queue, becomes.
  • For data requests, endpoints would need to be well indexed and have constrained results.
  • Offer async requests for large, open-ended data.  An async request works by quickly returning a token, which can then be used to poll for the results (see the sketch after this list).
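A rough sketch of that token-and-poll shape (in-memory and single-process, purely to show the flow; every name here is made up):

  import uuid
  from concurrent.futures import ThreadPoolExecutor

  _executor = ThreadPoolExecutor(max_workers=4)
  _jobs = {}  # token -> Future; a real system would use durable storage

  def run_big_query(query):
      # Stand-in for the expensive, open-ended data request.
      return []

  def submit(query):
      # Start the slow work and immediately return a token.
      token = str(uuid.uuid4())
      _jobs[token] = _executor.submit(run_big_query, query)
      return token

  def poll(token):
      # Clients call this with their token until the status is "done".
      future = _jobs[token]
      if future.done():
          return {"status": "done", "result": future.result()}
      return {"status": "pending"}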

Implications are measurable

How much processing can be done asynchronously today?  How much could be done last month?

How many data endpoints allow requests that will perform slowly?  Is that number going up or down?

How robust is the async data request system?  Does it exist at all?

Implications are Levers

Progress on implications pushes your service towards its goal.  Sometimes a little progress will result in lots of movement, sometimes a lot of progress will barely be noticeable.

Speaking to Implications

It is important that you can speak to how the implications drive progress towards your goal.

Asynchronous processing lets your service remain responsive.  It doesn’t mean you can process the data in a timely manner yet.  It sets the stage for parallel processing and other methods of handling large loads.

Next Steps

Before continuing on, try to come up with 3 implications for your most important characteristics.

You’ll want a good selection of implications for the next part - blockers.

We will explore what’s preventing you from moving your system in the direction it needs to go.

Defining Iterative Characteristics

Part 2 of my series on Iterative Delivery

Congratulations, you have a goal!  Now what?

Now, it is time to write down what your goal means to you.  What measurable things will be true about your system by the time you achieve your goal?  What measurable things will be false?

Picking measurable characteristics allows you to iterate towards the goal.

Each one should be 1-2 sentences; short enough that you can explain them quickly, and long enough to remove most ambiguity.

For example, if you are attracting clients that are 10x the size you are used to supporting, a good goal would be something like: We should be able to support clients of any size!

Your characteristics would be something like:

  • Pages in the UI render the same for any size client.  The site should never be ponderous or slow.
  • API endpoints respond to requests in the same amount of time for clients of any size.
  • Backend processes always complete fast enough that the customer doesn’t notice them.

As with goal setting, you don’t need to know how to achieve your characteristics.  They still don’t need to be achievable!

You need a general idea of how you will measure the characteristics, but you don’t have to measure or set up measurement tools before proceeding:

Pages in the UI render the same for any size client.  The site should never be ponderous or slow.

Page rendering time can be measured with tools like Chrome’s Lighthouse and network logs.

You should come up with 3-10 characteristics before moving on to the next step, implications.

Picking an Iterative Goal at a Scaleup

Note: This is part of my series on Iterative Delivery

When you are in Scaleup mode, picking a goal to iterate on should be straightforward.

What can’t you deliver?

Are you attracting larger clients and discovering your software can’t handle their size?

Do you have a swarm of small clients overwhelming the backend?

Does throwing money at your problems keep the software running smoothly, but unprofitably?

Your goal should be a single, short, aspirational sentence.  

If you get stuck, try the “We should be able to ___”  template:

We should be able to support clients of any size!

We should be able to support any number of clients!

We should be able to support clients profitably!

You don’t need to have any idea how to achieve your goal; your goal might not even be achievable.

The important thing is that you can clearly state your goal and explain it to others.
