Stream and Real-Time Processing: Making Decisions at Speed

A World in Motion: The Need for Real-Time Insights

In our journey so far, we have largely dealt with data at rest—files stored in cloud buckets, records organized in databases, and large datasets optimized for analysis. This is the world of batch processing, where we typically store data first and process it later. But what about the data that’s being created right now? Every time you see a 'live' sports score update on your phone, get a fraud alert from your bank, or see your social media feed refresh, you are interacting with an event stream processing system.

Consider the complex, real-time ecosystem of a ride-hailing service. At any given moment, thousands of drivers and riders are interacting with the app, generating a constant flow of data: location updates, ride requests, traffic conditions, and payment transactions. To successfully match a rider with a nearby driver within seconds, the system can't wait for a nightly report. It must ingest, analyze, and act on this information as it happens. Decisions about surge pricing, rerouting drivers to high-demand areas, or simply notifying a user that "Your driver is arriving now" must be made in near real-time.

This is the domain of event stream processing. It’s a fundamental shift from asking questions about what happened to asking questions about what is happening. This chapter explores the concepts and architectures that allow businesses to harness the power of live data streams, enabling them to make faster, more intelligent decisions.

What is an Event? The Building Block of Real-Time Data

At its core, an event is simply an observable occurrence at a particular point in time. It’s a record of something that happened. The definition is intentionally broad because events are everywhere in the digital and physical world.

Businesses run on events. Each instance of a recurring activity—every purchase, every website visit, every hotel check-in—is a distinct event. The data generated from these events is incredibly valuable, as it provides a granular, up-to-the-minute view of business operations.
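To make the idea concrete, an event can be modeled as a small, immutable record: what kind of thing happened, when it happened, and any details that describe it. The following is a minimal sketch in Python; the class name `Event`, its field names, and the sample order values are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Mapping


@dataclass(frozen=True)  # frozen: the fields cannot be reassigned after creation
class Event:
    """A record of something that happened at a particular point in time."""
    event_type: str        # what happened, e.g. "purchase" (hypothetical label)
    occurred_at: datetime  # when it happened
    payload: Mapping[str, Any] = field(default_factory=dict)  # descriptive details


# A single purchase event; the order id and amount are made-up sample values.
purchase = Event(
    event_type="purchase",
    occurred_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    payload={"order_id": "A-1001", "amount": 42.50},
)

print(purchase.event_type, purchase.occurred_at.isoformat())
```

Marking the record as frozen reflects the idea that an event describes something that has already happened: it can be recorded and read, but not revised.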

The Continuous Event Stream

Events rarely happen in isolation. They occur over time, forming a continuous sequence known as an event stream. Imagine a timeline where each new event is appended to the end. The first event is the oldest, and new ones are constantly added as they happen. This stream is a time-ordered, immutable record of everything that has occurred.
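The timeline described above can be sketched as an append-only list. The example below is a simplified, in-memory illustration; the `EventStream` class and its methods are hypothetical names, and the sketch assumes events arrive already in time order, which real systems often cannot guarantee.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, List


@dataclass(frozen=True)
class Event:
    event_type: str
    occurred_at: datetime


class EventStream:
    """Append-only, time-ordered sequence of events (illustrative sketch)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        """Add an event to the end of the stream and return its position."""
        # Assumption for this sketch: reject events older than the latest one.
        if self._events and event.occurred_at < self._events[-1].occurred_at:
            raise ValueError("events must be appended in time order")
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, position: int = 0) -> Iterator[Event]:
        """Yield events from the given position onward, oldest first."""
        yield from self._events[position:]

    def __len__(self) -> int:
        return len(self._events)


stream = EventStream()
stream.append(Event("ride_requested", datetime(2024, 1, 15, 9, 0, 0, tzinfo=timezone.utc)))
stream.append(Event("driver_assigned", datetime(2024, 1, 15, 9, 0, 5, tzinfo=timezone.utc)))

event_types = [event.event_type for event in stream.read_from(0)]
print(event_types)
```

The class deliberately offers no way to update or delete an entry. New facts are only ever added to the end, which is what keeps the stream a faithful history of what occurred.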

This concept becomes more complex when events of the same type originate from different sources. For example, a user might visit your website through a Google search, a direct link, or a marketing email. While all of these are "website visit" events, understanding their origin is crucial for analysis. Similarly, a retail business might have events from website traffic, social media interactions, and physical in-store foot traffic. How can an organization make sense of all these disparate streams?
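One simple way to preserve origin information is to carry it on each event. The sketch below assumes a hypothetical `WebsiteVisit` record with a `source` field and made-up source labels; it only illustrates how events of the same type can still be told apart and counted by origin.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WebsiteVisit:
    visitor_id: str
    source: str  # hypothetical labels: "google_search", "direct_link", "marketing_email"


# Sample data invented for illustration only.
visits = [
    WebsiteVisit("u1", "google_search"),
    WebsiteVisit("u2", "marketing_email"),
    WebsiteVisit("u3", "google_search"),
    WebsiteVisit("u4", "direct_link"),
]

# Every record is a "website visit", but the source field keeps its origin.
visits_by_source = Counter(visit.source for visit in visits)
print(visits_by_source)
```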

Taming the Chaos: The Unified Log