"Data! Data! Data! I can't make bricks without clay."

— Arthur Conan Doyle, The Adventure of the Copper Beeches

In the previous chapter, we selected our initiative. We have a goal, a sponsor, and a plan. Now, we need the raw material.

Just as a chef cannot cook without ingredients, an analytics team cannot model without data. But rarely is this data sitting on a silver platter, ready to be served. It is scattered across the organization in legacy systems, hidden in external APIs, or trapped in the heads of sales representatives.

This chapter is about the "Hunt." We will explore where data lives, why the systems that run your business are often terrible for analyzing your business, and how to sniff out "spoiled" ingredients before they ruin the meal.

Where is the Data?

When a manager asks, "Do we have data on X?", the answer is almost always "Yes, but..."

Data lives in two primary ecosystems: Internal and External. Not all internal data is the same, however.

Internal Data: The Operational Backbone

This is data you own, generated by the heartbeat of your daily operations. The systems that produce it are known as source systems (or data-generating systems). They are designed to capture transactions, not to answer questions, yet they are the bedrock of your analytics strategy.

Internal Data: Parked and Ready to Go?

While source systems generate the data, they are rarely where you should analyze it. You cannot run a complex trend report on your live billing system without risking a slowdown for every customer trying to pay an invoice.

Therefore, we strive to move data into a state we call "Parked and Ready to Go." This state exists on a spectrum of maturity, ranging from simple static archives to fully engineered warehouses.

The "Cold" Park: Static Data Dumps and Archives

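The most basic form of "parking" data is the Static Data Dump: a point-in-time copy of records exported out of the live system. Operational systems often produce these automatically, moving or exporting older records so that the live database stays fast for day-to-day work.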
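To make the idea of parking concrete, here is a minimal Python sketch of a static dump. It assumes a source system reachable through SQL (simulated here with an in-memory SQLite database) and a hypothetical `invoices` table. The function `dump_table_to_csv` and its date-stamped file naming are illustrative choices, not a standard tool. The point is the pattern: read the live system once, write a frozen copy, and run the trend analysis against the copy.

```python
import csv
import sqlite3
import tempfile
from datetime import date
from pathlib import Path


def dump_table_to_csv(conn: sqlite3.Connection, table: str,
                      out_dir: Path, as_of: date) -> Path:
    """Park a point-in-time copy of `table` as a date-stamped CSV file.

    The live system is read once; all later analysis runs against the file.
    """
    # Hypothetical naming convention: <table>_<YYYY-MM-DD>.csv
    out_path = out_dir / f"{table}_{as_of.isoformat()}.csv"
    # The table name is interpolated into SQL, so it must come from a trusted list.
    cursor = conn.execute(f"SELECT * FROM {table}")
    headers = [col[0] for col in cursor.description]
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cursor)
    return out_path


if __name__ == "__main__":
    # Stand-in for a live billing system (hypothetical schema and values).
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE invoices (id INTEGER, month TEXT, amount REAL)")
    live.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [(1, "2024-01", 120.0), (2, "2024-01", 80.0), (3, "2024-02", 200.0)],
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = dump_table_to_csv(live, "invoices", Path(tmp), date(2024, 3, 1))
        # The monthly trend is computed from the parked file, not the live system.
        totals = {}
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                month = row["month"]
                totals[month] = totals.get(month, 0.0) + float(row["amount"])
        print(path.name, totals)
        # prints: invoices_2024-03-01.csv {'2024-01': 200.0, '2024-02': 200.0}
```

A CSV file is the "coldest" possible parking spot: easy to produce and safe to query, but it goes stale the moment it is written, and every new dump is a separate, disconnected snapshot.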