Start Where You Are

In the previous labs, we defined our strategy and identified the necessary tables. Now, we move to the "engine room" of business analytics: Extract, Transform, and Load (ETL). Raw data is rarely ready for a dashboard because it is optimized for recording transactions, not for high-level analysis.

To make this data useful, we must filter out superfluous information—technical clutter like last_update timestamps or internal address_id keys—and focus on the "signal" that drives business decisions.

The Medallion Architecture

Modern analytics often uses a "Medallion" framework to refine data through three distinct stages:

Technical Spotlight: Choosing Your Join

When moving from Bronze to Silver, the type of "Join" you choose determines which rows are included in your final analysis. Think of these as different ways to handle missing or unmatched data.

Evolution of a "Row Volumes"

A key part of ETL is understanding how the "grain" (what one row represents) changes as it moves toward the Gold layer. I like to always emphasize the importance of being absolutely clear about what each row of your represents from each table you are reading it from.

<aside> 💡

Pro-Tip: The Doctrine of Volumetric Integrity

As you build your ETL, you must track your row counts at every single step. This practice, which I call the Doctrine of Volumetric Integrity, is your primary defense against "garbage in, garbage out."

Imagine you have 100 films, 500 customers, and 4,000 rentals.

If you enter a meeting and your numbers differ from a colleague’s, your ability to demonstrate Volumetric Integrity—showing exactly how and why your row counts changed at each stage—is what proves your analysis is the "source of truth."

</aside>

1. The Customer History Table