After cleaning the tables, we create source specific entities containing the data we need from that data source at the desired level of granularity. Hence, this is the stage where any arrays from the cleaned step are flattened.

This is the beginning of true transformation of the data, whereas the previous steps were more about data cleansing. Multiple subentities can take data from the same cleaned table, but be on different levels of granularity. For instance, a transaction lines cleaned table may have a subentity that is still on the transaction line level, as well as another subentity that is aggregated to the transaction level.

Another example for the orders stream of data, a Shopify orders subentity will contain orders just from Shopify at the granularity of orders, so no order line item level of data is included here. Crossing granularities will primarily exist on the domain layer.

Below is an example of what a subentity may look like in a SQL-based transform:


A subentity for the orders endpoint coming from a Shopify data source. Lines 12-85 have been omitted since they are just pointing at the same fields that exist in the cleaned table, hence no real transformation is operating on those lines.

If there exists arrays from the cleaned step, then they may need to be flattened in order to achieve the data nested inside. To do this in a Snowflake data warehouse, lateral flatten can be used:

  • lateral flatten ()
    • Link to Snowflake documentation.
      • Adding lateral in front of the flatten command will essentially treat the flattened data like a left join to the main query.

All subentities should follow this naming convention for schemas and table names:
Schema: <source_name>_subentity
Table name: <source_endpoint>_subentity

Hence for example, shopify_subentity.shopify_orders_subentity.

Subentities are source specific entities. The next step would be to begin the master by moving data across sources, thereby removing the idea of "Shopify orders" or "HubSpot sales" for the concept of an entity containing just "orders".

What’s Next