CatBoost vs XGBoost: Busting the Kaggle Fairytale
For years, the lore of Kaggle leaderboards has painted XGBoost as the untouchable champion of tabular data. LightGBM was the nimble challenger. CatBoost? Barely a footnote in the leaderboard fairy tales.
But real, peer-reviewed science tells a different story. And it’s a story with a decisive plot twist: CatBoost isn’t just catching up—it’s pulling ahead.
📊 The Science, Not the Story
Forget anecdotal “it worked for me” claims. These are large-scale, methodologically rigorous benchmarks. The results are hard to ignore.
1. TabArena — The Living Benchmark for Tabular ML
A continuously updated, rigorously designed benchmark spanning diverse real-world datasets.
Key Finding:
CatBoost outperformed XGBoost by over 20% in direct, like-for-like comparisons—ranking consistently at the top for accuracy, AUC, and F1.
The gains were most pronounced on mixed-type and categorical-heavy datasets, where CatBoost’s native categorical handling and ordered boosting architecture delivered wins that XGBoost simply couldn’t match.
2. A Closer Look at Deep Learning on Tabular Data (300+ Datasets)
One of the largest empirical evaluations ever run for tabular ML: over 300 datasets, 32 algorithms compared.
CatBoost repeatedly ranked in the top tier, especially on heterogeneous and categorical-rich data.
XGBoost? Often trailing, sometimes narrowly, sometimes decisively, and rarely matching CatBoost’s balance of speed, accuracy, and robustness.
3. When Do Neural Nets Outperform Boosted Trees? (2023)
An extensive study across diverse tabular benchmarks found:
CatBoost outperformed XGBoost by ~6% in accuracy on average, even when both were tuned.
That’s not a rounding error—that’s the kind of gap that decides production wins and Kaggle gold medals.
4. A Comprehensive Benchmark of Machine & Deep Learning Across Tabular Data (2024)
The verdict was clear: CatBoost dominated across both classification and regression tasks, topping the charts not just for accuracy but also for stability across dataset types. XGBoost couldn’t match the consistency.
⚡ Why CatBoost Wins in the Real World
Native Categorical Handling: Eliminates tedious preprocessing, preserves signal.
Ordered Boosting: Prevents target leakage, reduces overfitting.
Oblivious Trees: Super-fast inference (2×–15× faster in many cases).
Consistency Across Data Types: Robust results on messy, mixed-type datasets.
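To make that concrete, here is a minimal sketch of what those features look like in code. It uses the standard catboost Python package; the dataset, column names, and parameter values are purely illustrative stand-ins for a categorical-heavy workload, not a recommendation.

```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier, Pool

# Toy, purely synthetic data standing in for a categorical-heavy workload.
rng = np.random.default_rng(42)
n = 1_000
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "city": rng.choice(["berlin", "paris", "madrid"], n),   # raw strings, no encoding
    "plan": rng.choice(["free", "pro", "enterprise"], n),   # raw strings, no encoding
})
y = ((df["plan"] == "free") & (df["age"] < 35)).astype(int)  # toy target

# Just name the categorical columns; CatBoost encodes them internally.
train_pool = Pool(df, y, cat_features=["city", "plan"])

model = CatBoostClassifier(
    iterations=300,
    learning_rate=0.05,
    depth=6,                   # oblivious (symmetric) trees are CatBoost's default
    boosting_type="Ordered",   # ordered boosting to curb target leakage/overfitting
    verbose=False,
)
model.fit(train_pool)
print(model.predict(df.head()))
```

No one-hot or target encoding step appears anywhere in that pipeline; the string columns go straight into the model.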
🏁 Bottom Line
The age of XGBoost dominance was always more leaderboard legend than scientific fact.
Today, the numbers are in: CatBoost consistently outperforms XGBoost—sometimes by over 20%—and it does so with cleaner pipelines, faster inference, and better handling of the categorical reality of most business data.
If you’ve been building with XGBoost out of habit, it’s time to test CatBoost on your own workloads. You might just find your new default.
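Here is a rough sketch of what that head-to-head test could look like, assuming catboost, xgboost, and scikit-learn are installed. The synthetic data below is only a placeholder for your own workload; swap in your real features, target, and categorical columns.

```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier
from xgboost import XGBClassifier
from sklearn.compose import make_column_transformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Stand-in for "your own workload": replace X, y, cat_cols with your data.
rng = np.random.default_rng(0)
n = 2_000
X = pd.DataFrame({
    "tenure": rng.integers(1, 60, n),
    "segment": rng.choice(["smb", "mid", "enterprise"], n),
    "region": rng.choice(["emea", "amer", "apac"], n),
})
signal = (X["segment"] == "smb") & (X["tenure"] < 12)
y = (signal ^ (rng.random(n) < 0.1)).astype(int)   # toy target with 10% label noise
cat_cols = ["segment", "region"]

# XGBoost needs the categoricals encoded up front...
xgb = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), cat_cols),
        remainder="passthrough",
    ),
    XGBClassifier(n_estimators=300, learning_rate=0.05, eval_metric="logloss"),
)

# ...while CatBoost consumes the raw string columns directly.
cb = CatBoostClassifier(
    iterations=300, learning_rate=0.05, cat_features=cat_cols, verbose=False
)

for name, model in [("xgboost", xgb), ("catboost", cb)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean CV AUC = {scores.mean():.4f}")
```

Run the same cross-validation on your own data and let the scores, not the leaderboard lore, pick your default.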
📚 Want to Go Deeper?
If this post got you curious, you’ll love my upcoming book Mastering CatBoost: The Hidden Gem of Tabular AI.
It’s a practical, research-backed deep dive into CatBoost’s architecture, tuning strategies, and production deployment—packed with case studies, benchmarks, and code examples drawn from real-world projects.
Whether you’re a Kaggle competitor, a data scientist in industry, or an ML engineer looking to squeeze every last drop of performance from tabular data, this book will show you exactly how to make CatBoost your secret weapon.
References: