"Data! Data! Data! I can't make bricks without clay."
— Arthur Conan Doyle, The Adventure of the Copper Beeches
In the previous chapter, we selected our initiative. We have a goal, a sponsor, and a plan. Now, we need the raw material.
Just as a chef cannot cook without ingredients, an analytics team cannot model without data. But rarely is this data sitting on a silver platter, ready to be served. It is scattered across the organization in legacy systems, hidden in external APIs, or trapped in the heads of sales representatives.
This chapter is about the "Hunt." We will explore where data lives, why the systems that run your business are often terrible for analyzing your business, and how to sniff out "spoiled" ingredients before they ruin the meal.
Where is the Data?
When a manager asks, "Do we have data on X?", the answer is almost always "Yes, but..."
Data exists in two primary ecosystems: internal and external. Not all internal data is the same, however.
Internal Data: The Operational Backbone
This is data you own, generated by the heartbeat of your daily operations in what are known as source systems. These systems are designed to capture transactions, not to answer questions, but they are the bedrock of your analytics strategy.
- CRM Systems (e.g., Salesforce, HubSpot): When a new lead clicks the "Learn More" or "Sign up for a free trial" button on your website, this is where their data lands.
  - Purpose: Tracking customer interactions, sales funnels, and support tickets.
- Billing & ERP Systems (e.g., SAP, QuickBooks): When a customer makes a purchase, whether by issuing a purchase order they eventually pay or by swiping a credit card in a physical store, that financial reality moves through these systems.
  - Purpose: Recording the "financial truth": invoices sent, payments received, and inventory levels.
- HR Systems (e.g., Workday, BambooHR): When a candidate submits an application, accepts an offer letter, and later goes through performance reviews as an employee, their digital footprint is stored here.
  - Purpose: Storing organizational structures, tenure, and compensation data.
- Web & Audit Logs: When a user navigates your portal, or when an administrator logs into a secure part of the ERP to update sensitive records (like Social Security numbers), the system creates a granular audit trail.
  - Purpose: Tracking every click, scroll, hover, and access event on your digital properties.
- Specialized Operational Data: Depending on your specific industry, this could include fleet management telemetry, sensor data from manufacturing facilities, air quality readings, or temperature logs.
Internal Data: Parked and Ready to Go?
While source systems generate the data, they are rarely where you should analyze it. You cannot run a complex trend report on your live billing system without risking a slowdown for every customer trying to pay an invoice, because your heavy query competes with their payments for the same database resources.
Therefore, we strive to move data into a state we call "Parked and Ready to Go." This state exists on a spectrum of maturity, ranging from simple static archives to fully engineered warehouses.
The "Cold" Park: Static Data Dumps and Archives
The most basic form of "parking" data is the Static Data Dump. This often happens automatically within operational systems to preserve performance.
- The Mechanism: To keep the live CRM or ERP snappy, IT administrators will often "archive" older records. They might take all financial transactions older than three years, export them into a massive CSV file or SQL dump, and store them on a cheap server or in cloud object storage (such as Amazon S3 or Azure Blob Storage).
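To make the archiving mechanism concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not how any particular CRM or ERP actually archives its records: it assumes a hypothetical `transactions` table with `id`, `txn_date` (ISO date text), and `amount` columns, uses an in-memory SQLite database as a stand-in for the live system, and writes the archive to a local CSV file. The final upload to a bucket such as Amazon S3 is deliberately left out.

```python
import csv
import os
import sqlite3
import tempfile
from datetime import date, timedelta


def archive_old_transactions(conn: sqlite3.Connection, cutoff: str, out_path: str) -> int:
    """Export rows with txn_date before `cutoff` (ISO date string) to a CSV file,
    then delete exactly those rows from the live table. Returns the row count.
    Assumes the hypothetical schema transactions(id, txn_date, amount)."""
    cur = conn.execute(
        "SELECT id, txn_date, amount FROM transactions WHERE txn_date < ? ORDER BY id",
        (cutoff,),
    )
    rows = cur.fetchall()
    header = [col[0] for col in cur.description]

    # Write the archive file first, so a failed export never prunes live data.
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

    # Delete only the IDs we exported; `with conn` commits on success, rolls back on error.
    with conn:
        conn.executemany("DELETE FROM transactions WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)


# Demo: an in-memory stand-in for the live billing database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, txn_date TEXT, amount REAL)")
today = date.today()
conn.executemany(
    "INSERT INTO transactions (txn_date, amount) VALUES (?, ?)",
    [
        ((today - timedelta(days=5 * 365)).isoformat(), 120.0),
        ((today - timedelta(days=4 * 365)).isoformat(), 75.5),
        ((today - timedelta(days=10)).isoformat(), 300.0),
    ],
)
conn.commit()

cutoff = (today - timedelta(days=3 * 365)).isoformat()  # roughly three years ago
archive_path = os.path.join(tempfile.gettempdir(), "transactions_archive.csv")
archived = archive_old_transactions(conn, cutoff, archive_path)
remaining = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
print(f"archived {archived} rows to {archive_path}; {remaining} row(s) left in the live table")
```

Writing the file before deleting, and deleting by the exported IDs rather than re-running the date filter, is a deliberate choice in this sketch: if the export fails, the live table is untouched, and any rows that arrive while the job runs are not silently removed.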