⏱ 4 min read
Defining a schema for a given entity we tend to define what attributes this entity has, what types or invariants each of those attributes have and what attributes are required or optional.
In this post I am exposing that the shape of the data (attributes and their types) need to be discussed separately from their optionality. The optionality does not belong to the entity but to the operation being executed. If we discuss and define both concepts together as part of the Entity we are complecting two concepts that should we separate them, the simplicity, clarity and reusability of our system will be greatly improved.
Apart from my own experience some of the concepts described in this post are also described by well-known engineers like Rich Hickey in Maybe Not available in my recommended resources
I will use a Blog Post as the example for the model but the concepts in this post can be applied to any entity and discussion we can think of.
We need to define the data model of a Post that will be used in a blog (any blog). After some meetings we agree that we need the following fields:
- ID: uuid that will identiy the post uniquely
- Title: short text to be shown in links and use as slug
- Body: text where the post will be written to
- Tags: list of keywords to find similar posts
- PublishedAt: datetime when the post was published
Once we agree on the shape of the Post entity, we start a new discussion:
What fields are required and what are optional?
It seems like all are required since we would need to show all of them in the blog. So we move on and decide that we will make all columns NOT NULL in the DB.
After some time discussing, a junior developer asks:
What about drafts?
Another discussion starts about "What is a draft?". Should a draft have the same shape as the Post but with "nullable" fields? what are the required fields in this case?
After some time discussing the team decided to have 2 very similar tables one with drafts where columns (except id) are NULLABLE and posts where all columns are required.
There are two entities that somehow are too closely related. This lead to some problems:
⛔ Entity proliferation. We have all been in projects with too many entities that seem to describe almost the same thing.
⛔ Change management: New fields in Posts will require new fields in Drafts almost always.
⛔ Event ambiguity: If we use EDA or similar, are Drafts / Posts events sent to the same topic? are they really different? how can we see the history of a Post if it starts as a Draft?
✅ DB leverage: we can use DB schemas to control whether the data being stored is correct or not.
Defining the shape of the entity is something we cannot avoid (or if we can we should not). Having a clear description of what an Entity is, and what parts form it is desirable specially if the names and types of the fields are defined by the Domain experts.
The problem started when we discussed about the optionality of the fields. When we started discussion:
What fields are required and what are optional?
The answer to this question should have been another question:
When?
(as engineers we should askwhen
more often than we do)
What are the operations we can do for/on Posts? Let's say that we can do 3 operations with Posts:
Save the post
Publish the post
Unpublish the post
Delete the post
Rules are:
We cannot edit a post if it is published
We cannot delete a post if it is published
Optionality has to do with the operation we want to perform on/with a given Envity. For example, asking:
What are the required fields when publishing a post?
The conclusion would be very similar to the previous discusion.
But if we discussed about what are the required field when saving a Post then the answer will be something around: None.
We can let users save the posts in whatever state they are (missing tags, missing body, etc)
There is only one entity that goes through different states until it is published. The benefits are:
✅ Entity proliferation: so far under control.
✅ Change management: there is only one entity we need to think about.
✅ Event unambiguity: all events belong to the same entity so we have one topic and one history line.
⛔ DB leverage: DB Schema cannot be leveraged to control required fields/columns. The control is in the application layer.
If we changed the rules and we allowed to edit a post while it is published when discussing the optionality during the save
operation
we would need to ask When
again.
When the post is already published the requirements will be different than when the post is not published yet (or was unpublished).