Shift left
Transform logic early, avoid duplication: The power of "shifting left" in data pipelines
When building data pipelines, you'll often need to transform raw data into more useful formats - like converting timestamps to dates, splitting combined fields, or calculating derived values. The concept of "shifting left" encourages performing these transformations as early as possible in your data flow.
The term "shift left" comes from visualizing data pipelines flowing from left to right (like in Kleene's pipelines view). Raw data enters on the left, transformations happen in the middle, and final consumption happens on the right. "Shifting left" means moving transformation logic earlier (leftward) in this flow.
For software developers, this connects directly to the principle of DRY (Don't Repeat Yourself). Just as you wouldn't want to duplicate the same validation logic across multiple functions in your code, you shouldn't repeat the same data transformations across multiple queries.
Consider a typical scenario: You have a table of raw sales data, and multiple teams need different views of this data.
Without shifting left, you might write something like:
-- Marketing dashboard
SELECT
DATE(timestamp) as sale_date,
UPPER(product_name) as product,
amount * 0.1 as commission
FROM raw_sales
-- Finance report
SELECT
DATE(timestamp) as sale_date,
UPPER(product_name) as product,
amount * 0.1 as commission
FROM raw_sales;
Notice how we're repeating the same transformations in multiple places? This is where shifting left helps. Instead of transforming data at the point of use, we transform it once, early in the pipeline:
-- Create a transformed view immediately after ingestion
CREATE VIEW cleaned_sales AS
SELECT
DATE(timestamp) as sale_date,
UPPER(product_name) as product,
amount * 0.1 as commission
FROM raw_sales;
Now downstream users just need:
SELECT * FROM cleaned_sales;
The benefits of shifting left include:
- Consistency: When transformations live in one place, there's no risk of different teams implementing the same logic differently.
- Maintenance: Need to change how commission is calculated? Update it in one place instead of hunting through every query.
- Performance: The database can optimize and cache these transformed values once, rather than recalculating them for every query.
- Discoverability: New team members can easily find and understand core business logic in central transformation layers rather than having to hunt through scattered queries.
Remember: The goal isn't to transform everything immediately - some transformations might be specific to particular use cases. The key is to identify common, reusable transformations and shift those left in your pipeline.
Think of it like mise en place in cooking - doing your prep work up front makes everything that follows smoother and more reliable.
Updated about 1 month ago