Byte-Pair Encoding Is Compression in Disguise
Tokenization is usually introduced as a preprocessing trick.
In reality, it is one of the strongest inductive biases we impose on language models.
Byte-Pair Encoding (BPE) is a particularly revealing case.
What looks like a simple greedy algorithm turns out to sit at the intersection of information theory, Kolmogorov complexity, and Minimum Description Length.
This post explains why BPE works: not operationally, but conceptually.
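Before the conceptual argument, it helps to have the greedy procedure concretely in mind. The following is a minimal sketch of the classic BPE training loop (the toy corpus, function names, and merge count are illustrative, not from any particular library):

```python
from collections import Counter

def most_frequent_pair(corpus):
    """Count adjacent symbol pairs across all words; return the commonest."""
    pairs = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = pair[0] + pair[1]
    new_corpus = []
    for word in corpus:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(word[i])
                i += 1
        new_corpus.append(out)
    return new_corpus

def bpe(corpus, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent pair."""
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(corpus)
        if pair is None:
            break
        corpus = merge_pair(corpus, pair)
        merges.append(pair)
    return corpus, merges

# Toy corpus: each word is a list of single-character symbols.
corpus = [list("low"), list("lower"), list("lowest")]
corpus, merges = bpe(corpus, 2)
```

Each iteration replaces the most frequent adjacent pair with a new symbol, which is exactly the substitution step of a dictionary compressor; the rest of the post unpacks that connection.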
