Kleene Object Naming Convention
Databases
If your Snowflake Data Warehouse was set up by Kleene you will have one database called PROD
. If your Redshift Data Warehouse was set up by Kleene you will have one database called DEV
. This will be your default database. It is only necessary to reference your database when specifying schemas and tables if you want to use a different database. Our convention is to use a single database.
Schemas
You can create schemas according to your own naming. Our convention at Kleene is to split each connector and each stage of the transformation process as follows:
- <source_name>_raw (landing raw tables directly from our connectors, e.g. salesforce_raw, hubspot_raw etc.)
- <source_name>_cleaned (cleaning indivudal raw source specific tables, e.g. salesforce_cleaned, hubspot_cleaned etc.)
- <source_name>_subentity (houses all subentities related to a given source, e.g. salesforce_subentity, hubspot_subentity etc. Contains surrogate key definition)
- mastering (cross-source entity resolution via surrogate key definitions provided at the subentity, e.g. mastering.customer_mastering)
- entity (cross-source entity modelling and attribute resolution using the mastering table as a base, e.g. entity.customer_entity)
- dw (domain and sub-domain modelling, e.g. dw.customer_domain)
- reporting (for specific pieces of analysis, e.g. out of the box cohort analysis, e.g. reporting.customer_cohort_analysis)
- utility (for test_log, billing and other utility tables, e.g. utility.unit_test_log)
Tables
- Ingested raw tables should always go into the <source_name>_raw schema and and should be suffixed with _raw
- Cleaned tables should always go into the <source_name>_cleaned schema and should be suffixed with _cleaned, this is where any source table level transformation or calculations are done.
- Subentity tables should always go into the <source_name>_subentity schema and should be suffixed with _subentity.
- Mastering tables should always go into the mastering schema and should be suffixed with _mastering, mastering tables are an essential where multiple sources need to be mastered.
- Entity tables should always go into the entity schema and should be suffixed with _entity, this relates to joining between cross-source subentity tables
- Domain tables should always go into the dw schema and should be suffixed with _domain, this relates to joining between entities
- Subdomain tables should always go into the dw schema and should be suffixed with _subdomain, this relates to grouping or pivoting on domains for ease of ingestion into a BI tool
- Reporting tables should always go into the reporting schema and should be named after what style of analysis they are (e.g. cohort_analysis).
Updated about 1 year ago