Post 1: The Myth of i.i.d. — Why Time Series Break the Rules
Why your favorite ML assumptions don’t hold for time series data
Series: Forecasting Reality — Machine Learning in a Non-Stationary World (Part 1)
Machine learning has a dirty little secret.
Most of the models we know and love—from random forests to neural networks—are built on a foundational assumption that doesn’t hold in one of the most important domains of real-world data: time series forecasting.
That assumption?
The data is independent and identically distributed (i.i.d.).
In image classification, natural language processing, and even basic regression tasks, the i.i.d. assumption often holds reasonably well. Each training example is treated as a standalone event drawn from the same underlying distribution.
But time series data laughs in the face of that idea.
📉 The Time Series Problem
Time series data points are not independent.
They are inherently dependent—today’s value depends on yesterday’s, which depends on the day before that, and so on.
They’re often not identically distributed either.
The underlying behavior of the system can change over time—a concept known as non-stationarity. This includes trends, seasonal effects, structural breaks, and sudden shocks.
For example:
Retail sales spike during holidays.
Stock prices respond to unpredictable macroeconomic events.
Sensor data drifts as hardware ages or environments change.
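You can see the dependence part directly with a few lines of NumPy. Here's a minimal sketch (the AR(1) process and the `lag1_autocorr` helper are illustrative choices, not anything standard): an autoregressive series has strong correlation between consecutive values, and shuffling the same numbers destroys it.

```python
import numpy as np

rng = np.random.default_rng(0)

# A simple AR(1) process: today's value is 0.9 times yesterday's, plus noise.
n = 5000
phi = 0.9
x = np.zeros(n)
noise = rng.normal(size=n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

def lag1_autocorr(series):
    """Correlation between the series and itself shifted by one step."""
    return np.corrcoef(series[:-1], series[1:])[0, 1]

print(f"AR(1) lag-1 autocorrelation:    {lag1_autocorr(x):.2f}")  # close to 0.9
shuffled = rng.permutation(x)  # same values, temporal order destroyed
print(f"Shuffled lag-1 autocorrelation: {lag1_autocorr(shuffled):.2f}")  # close to 0
```

Same marginal distribution, same values, wildly different dependence structure. That's exactly the information an i.i.d.-style model throws away when it treats each row as independent.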
This means that the bedrock on which most statistical learning theory is built—especially Vapnik’s framework of empirical risk minimization—starts to crumble.
🧠 Why i.i.d. Matters (And Fails)
In Vapnik’s statistical learning theory, we’re promised certain things:
That our model will generalize from training to test data
That the empirical risk (training error) will converge to the true risk (expected error)
But all these guarantees depend on one major condition:
Training samples must be drawn independently from the same distribution.
Time series forecasting violates both:
Dependence: Observations are temporally linked.
Non-Stationarity: The data-generating process may evolve or drift.
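To make the "empirical risk converges to true risk" promise concrete, here's a small simulation (the setup and the `risk_gap` helper are illustrative, not from any library): fit the simplest possible model, a constant predictor under squared loss, on the first half of the data, then compare its training error to its held-out error. With i.i.d. data the two match; with a drifting mean they don't.

```python
import numpy as np

rng = np.random.default_rng(42)

# i.i.d. case: every sample comes from one fixed distribution.
iid = rng.normal(loc=2.0, scale=1.0, size=10_000)

# Drifting case: the mean of the process slides from 0 to 4 over time.
drift = rng.normal(loc=np.linspace(0.0, 4.0, 10_000), scale=1.0)

def risk_gap(data):
    """True risk minus empirical risk for a mean predictor fit on the first half."""
    train, test = data[:5_000], data[5_000:]
    pred = train.mean()                      # empirical risk minimizer for squared loss
    emp_risk = ((train - pred) ** 2).mean()  # training error
    true_risk = ((test - pred) ** 2).mean()  # error on later, unseen data
    return true_risk - emp_risk

print(f"i.i.d. risk gap:   {risk_gap(iid):+.2f}")    # near zero
print(f"drifting risk gap: {risk_gap(drift):+.2f}")  # large and positive
```

In the drifting case the training error is an honest summary of the past and a terrible estimate of the future, which is precisely the failure mode the theory warns about once the "identically distributed" condition goes away.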
So what happens when we still want to use machine learning for forecasting? Can we bend the rules—or break them—and still get results?
🔍 A Preview of What’s Coming
In this series, we’ll dive into how different classes of machine learning models approach the messy reality of time series data.
Here’s a quick preview of what’s ahead:
Tree-based models (like Random Forest and XGBoost): great at pattern recognition, but they treat samples as independent and can't extrapolate beyond the range of target values seen in training.
RNNs and LSTMs: designed for sequential data, they capture temporal patterns but still struggle with regime shifts.
Transformers: the latest deep learning darlings, using attention mechanisms to model long-range dependencies—though they, too, face challenges when the distribution shifts over time.
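The tree-extrapolation limitation is easy to demonstrate. A sketch, assuming scikit-learn is available (the toy linear trend and the 80/20 split are illustrative choices): a random forest trained on the first 80 steps of a growing series can never predict a value above the largest target it has seen, because tree leaves output averages of training targets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# A clean upward trend: y = 2t for t = 0..99.
t = np.arange(100).reshape(-1, 1)
y = 2.0 * t.ravel()

# Fit on the first 80 steps, then "forecast" the last 20.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(t[:80], y[:80])
preds = model.predict(t[80:])

print(f"largest training target: {y[:80].max()}")  # 158.0
print(f"largest true future value: {y[80:].max()}")  # 198.0
print(f"largest forecast: {preds.max():.1f}")  # capped at or below 158
```

This is why tree-based forecasting pipelines usually predict differenced or detrended targets rather than raw levels: the transformation moves the problem back inside the range the trees have actually seen.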
And most importantly, we’ll see how researchers and practitioners deal with the fact that there are no theoretical free lunches in forecasting. We violate i.i.d. assumptions every time we model time series, so we must rely on smart design choices, clever tricks, and robust evaluation.
🧭 Why This Matters
If you work with time series—sales forecasts, energy demand, stock prices, web traffic—you’re navigating a landscape where the past may not look like the future. Your models need to be flexible, resilient, and aware of their limitations.
Understanding how and why the i.i.d. assumption fails in time series is the first step. Because once we drop that crutch, we can start building forecasting models that actually work in the real world.
Next up in Part 2:
We’ll explore how learning theory has tried to catch up—by relaxing i.i.d. assumptions, dealing with dependent data, and studying “concept drift.” The math gets messier, but the insights are worth it.
Enjoying the series? Subscribe for the rest of the journey as we demystify forecasting in the wild world of non-stationary data.