Whose Data Is It Anyway?

The tenancy line is a useful construct for separating SaaS data from client data.  When you have few clients, separating the data may not be worth the effort of having multiple data stores.  As your system grows, ensuring that client data is separated from the SaaS data becomes as critical as ensuring the clients’ data remains separate from each other.

Company data is everything operational.  Was an action successful?  Does it need to be retried?  How many jobs are running right now?  This data is extremely important to make sure that client jobs are run efficiently, but it’s not relevant to clients.  Clients care about how long a job takes to complete, not about your concurrency, load shaping, or retry rate.

While the data is nearly meaningless to your clients, it is useful to you.  It becomes more useful in aggregate.  It has synergy.  A random failure for one client becomes a pattern when you can see across all clients.  When operational data is stored in logically separated databases you quickly lose the ability to check the data.  This is when it becomes important to separate operational data from clients.  

Pull the operational data from a single client into a multi-tenant repository for the SaaS, and suddenly you can see what’s happening system wide.  Instead of only seeing what’s happening to a single client, you see the system.

Once you can see the system, you can shape it.  See this article for a discussion on how.

Other considerations

If visibility isn’t enough, extracting operational data is usually its own reward.

Operational data is usually high velocity – tracking a job’s progress involves updating the status with every state change.  If your operational store is the same as the client store, tracking progress conflicts with the actual work.

One thought on “Whose Data Is It Anyway?

Leave a Reply