Kleene Object Naming Convention

Databases

If your Snowflake Data Warehouse was set up by Kleene you will have one database called PROD. If your Redshift Data Warehouse was set up by Kleene you will have one database called DEV. This will be your default database. It is only necessary to reference your database when specifying schemas and tables if you want to use a different database. Our convention is to use a single database.

Schemas

You can create schemas according to your own naming. Our convention at Kleene is to split each connector and each stage of the transformation process as follows:

  • <source_name>_raw (landing raw tables directly from our connectors, e.g. salesforce_raw, hubspot_raw etc.)
  • <source_name>_cleaned (cleaning indivudal raw source specific tables, e.g. salesforce_cleaned, hubspot_cleaned etc.)
  • <source_name>_subentity (houses all subentities related to a given source, e.g. salesforce_subentity, hubspot_subentity etc. Contains surrogate key definition)
  • mastering (cross-source entity resolution via surrogate key definitions provided at the subentity, e.g. mastering.customer_mastering)
  • entity (cross-source entity modelling and attribute resolution using the mastering table as a base, e.g. entity.customer_entity)
  • dw (domain and sub-domain modelling, e.g. dw.customer_domain)
  • reporting (for specific pieces of analysis, e.g. out of the box cohort analysis, e.g. reporting.customer_cohort_analysis)
  • utility (for test_log, billing  and other utility tables, e.g. utility.unit_test_log)

Tables

  • Ingested raw tables should always go into the <source_name>_raw schema and and should be suffixed with _raw
  • Cleaned tables should always go into the <source_name>_cleaned schema and should be suffixed with _cleaned, this is where any source table level transformation or calculations are done.
  • Subentity tables should always go into the <source_name>_subentity schema and should be suffixed with _subentity.
  • Mastering tables should always go into the mastering schema and should be suffixed with _mastering, mastering tables are an essential where multiple sources need to be mastered.
  • Entity tables should always go into the entity schema and should be suffixed with _entity, this relates to joining between cross-source subentity tables
  • Domain tables should always go into the dw schema and should be suffixed with _domain, this relates to joining between entities
  • Subdomain tables should always go into the dw schema and should be suffixed with _subdomain, this relates to grouping or pivoting on domains for ease of ingestion into a BI tool
  • Reporting tables should always go into the reporting schema and should be named after what style of analysis they are (e.g. cohort_analysis).