📉 Why Z-Score Normalization Is the Unsung Hero of Time Series Transformers

ICML 2025 quietly confirms what practitioners already knew: normalization can make or break your model.

Valeriy Manokhin
Jul 25, 2025

In the rush to design ever more complex Transformer architectures for time series forecasting, most of the effort has gone into clever attention mechanisms, hierarchical tokenization schemes, and hybrid modules.

But the ICML 2025 paper “A Closer Look at Transformers for Time Series Forecasting” quietly reminds us of something far simpler — and far more important:

Z-score normalization is one of the most critical components for making Transformers work in time series forecasting.
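
For reference, z-score normalization standardizes a series using its mean μ and standard deviation σ (in time series Transformers this is typically done per input window, i.e., instance normalization):

z_t = (x_t − μ) / σ

so the model always sees inputs on a comparable scale regardless of the level or amplitude of the underlying series, and (μ, σ) are kept so the forecasts can be mapped back to the original units.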

The authors show that removing z-score normalization hurts model performance more than removing positional encodings, skip connections, or even full attention blocks in some settings.

Let’s unpack why this happens — and how z-score normalization fits into the Transformer pipeline for time series tasks.
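
As a concrete illustration, here is a minimal NumPy sketch of how per-window z-score normalization typically wraps a forecasting model. The baseline "model" and the batch/horizon shapes are placeholders for illustration, not the paper's implementation:

```python
import numpy as np

def zscore_normalize(x, eps=1e-8):
    # x: (batch, seq_len) look-back windows. Statistics are computed
    # per window, so every series enters the model on a comparable scale.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps), mu, sigma

def zscore_denormalize(y_hat_norm, mu, sigma, eps=1e-8):
    # Undo the transform so forecasts come back in the original units.
    return y_hat_norm * (sigma + eps) + mu

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 4 hypothetical series with a level of ~100 and 96 look-back steps.
    x = rng.normal(loc=100.0, scale=5.0, size=(4, 96))
    x_norm, mu, sigma = zscore_normalize(x)

    # Stand-in for any Transformer forecaster: a naive baseline that
    # repeats the last normalized value over a 24-step horizon.
    y_hat_norm = np.repeat(x_norm[:, -1:], 24, axis=-1)

    y_hat = zscore_denormalize(y_hat_norm, mu, sigma)
    print(y_hat.shape)  # (4, 24) -- forecasts back on the original scale
```

Because each window carries its own statistics, the model never has to learn the raw level or variance of any particular series, which is one intuition for why ablating this step hurts so much.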
