Create Data Unit Tests
Introduction
Data Unit Tests are blocks of SQL code that run at some point in a Transform (usually at the end), creating records in another table, which indicates if the test passed or failed. Lots of types of test are available, Kleene currently has three types built in - Null in field, Uniqueness and Freshness.
They are useful to reliably monitor the health of data in the pipeline.
The Add unit tests option only appears above a Transform that creates or replaces a table once it has been saved.
Create a Data Unit Test
Click the Add unit tests in the SQL Console menu to open the window, where three tests are available to add to the current transform.
Create the Unit Test log table
Before you create any tests, you'll need to have somewhere to store the results. You can do this the first time the UNIT TEST button is used, by selecting CREATE REQUIRED PROCEDURES, this will generate the stored procedures that are used as unit tests and a table in your warehouse called UNIT_TEST_LOG to store the result. This table can then be queried to check the results of the tests.
If you get an error saying the object already exists but you do not have access please ask the procedure creator or ACCOUNTADMINs/superusers of your warehouse to grant usage permissions to your warehouse role:
Snowflake:
GRANT USAGE PRIVILEGES ON ALL PROCEDURES IN SCHEMA PROD.KLEENE TO ROLE <warehouse role name>;
Redshift:
GRANT EXECUTE PRIVILEGES ON ALL PROCEDURES IN SCHEMA KLEENE TO ROLE <warehouse role name>;
Null in field
Checks none of the entires in a specified column have a NULL entry.
- Select the table to test, then
- Select the column to check and
- Click ADD UNIT TEST.
Repeat this process for as many columns as needed.
Uniqueness
Checks the distinct number of entries in a specified field equals the table row count
- Select the table to test, then
- Select the column to check and
- Click ADD UNIT TEST.
Repeat this process for as many columns as needed.
Freshness
Checks the maximum entry of a specified DATETIME or TIMESTAMP field is within a specified time interval
- Select the table to test, then
- Select the column to check (should be DATE/TIME), then
- Select the period and time frame eg 3 hours
- Click ADD UNIT TEST.
Repeat this process for as many columns or time periods as needed.
SQL Code
When you close the window, you'll notice one or more lines of SQL have been added to the end of the transform.
If you need to delete a test, simple remove the relevant line of SQL.
Updated 9 months ago