Subentity
After cleaning the tables, we create source specific entities containing the data we need from that data source at the desired level of granularity. Hence, this is the stage where any arrays from the cleaned step are flattened.
This is the beginning of true transformation of the data, whereas the previous steps were more about data cleansing. Multiple subentities can take data from the same cleaned table, but be on different levels of granularity. For instance, a transaction lines cleaned table may have a subentity that is still on the transaction line level, as well as another subentity that is aggregated to the transaction level.
Another example for the orders stream of data, a Shopify orders subentity will contain orders just from Shopify at the granularity of orders, so no order line item level of data is included here. Crossing granularities will primarily exist on the domain layer.
Below is an example of what a subentity may look like in a SQL-based transform:
If there exists arrays from the cleaned step, then they may need to be flattened in order to achieve the data nested inside. To do this in a Snowflake data warehouse, lateral flatten
can be used:
lateral flatten ()
- Link to Snowflake documentation.
- Adding
lateral
in front of theflatten
command will essentially treat the flattened data like a left join to the main query.
- Adding
- Link to Snowflake documentation.
All subentities should follow this naming convention for schemas and table names:
Schema: <source_name>_subentity
Table name: <source_endpoint>_subentity
Hence for example, shopify_subentity.shopify_orders_subentity.
Subentities are source specific entities. The next step would be to begin the master by moving data across sources, thereby removing the idea of "Shopify orders" or "HubSpot sales" for the concept of an entity containing just "orders".
Updated about 1 year ago