Reference papers
- CatBoost: unbiased boosting with categorical features
Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin. NeurIPS, 2018
A NeurIPS 2018 paper explaining the principles of ordered boosting and the ordered target statistics that CatBoost computes for categorical features.
- CatBoost: gradient boosting with categorical features support
Anna Veronika Dorogush, Vasily Ershov, Andrey Gulin. Workshop on ML Systems at NIPS 2017
A paper explaining CatBoost's working principles: how it handles categorical features, how it fights overfitting, and how GPU training and the fast formula applier are implemented.
- Minimal Variance Sampling in Stochastic Gradient Boosting
Bulat Ibragimov, Gleb Gusev. arXiv:1910.13204
A paper about Minimal Variance Sampling (MVS), the default sampling method in CatBoost.
- Finding Influential Training Samples for Gradient Boosted Decision Trees
Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, Maarten de Rijke. arXiv:1802.06640
A paper explaining several ways to extend the framework for finding influential training samples to non-parametric GBDT ensembles, a particular case of tree ensemble-based models, under the assumption that tree structures remain fixed. It also introduces a general scheme for obtaining further approximations to this method that balance performance against computational complexity.
- A Unified Approach to Interpreting Model Predictions
Scott Lundberg, Su-In Lee. arXiv:1705.07874
A paper presenting SHAP (SHapley Additive exPlanations), a unified framework for interpreting model predictions.
- Consistent feature attribution for tree ensembles
Scott M. Lundberg, Su-In Lee. arXiv:1706.06060
A paper explaining fast exact algorithms for computing SHAP (SHapley Additive exPlanations) values, a unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate.
- Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank
Andrey Gulin, Igor Kuralenok, Dimitry Pavlov. PMLR 14:63-76
A paper describing the theory underlying the YetiRank and YetiRankPairwise modes in CatBoost.
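As a rough illustration of the ordered target statistics described in the NeurIPS 2018 paper, the sketch below encodes a categorical feature. Each example is encoded using only the targets of examples that precede it in a random permutation, and the resulting mean is smoothed toward a prior, so no example ever sees its own label. This is a simplified, single-permutation sketch, not CatBoost's implementation. The paper and the library add refinements that are omitted here. The function name and the `prior`, `prior_weight`, `order` and `seed` parameters are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence


def ordered_target_statistics(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    prior: float,
    prior_weight: float = 1.0,
    order: Optional[Sequence[int]] = None,
    seed: int = 0,
) -> List[float]:
    """Hypothetical sketch of ordered target statistics for one categorical feature.

    Example k is encoded as
        (sum of targets of earlier examples with the same category + prior_weight * prior)
        / (count of those earlier examples + prior_weight),
    where "earlier" refers to positions in a permutation of the data.
    """
    n = len(categories)
    if order is None:
        # Draw a random permutation (reproducible via `seed`).
        order = list(range(n))
        random.Random(seed).shuffle(order)

    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    encoded = [0.0] * n
    for idx in order:
        c = categories[idx]
        s = sums.get(c, 0.0)
        k = counts.get(c, 0)
        # Smoothed mean over examples that came *before* idx in the permutation.
        encoded[idx] = (s + prior_weight * prior) / (k + prior_weight)
        # Only now add this example's own target, so it never leaks into its encoding.
        sums[c] = s + targets[idx]
        counts[c] = k + 1
    return encoded


if __name__ == "__main__":
    cats = ["a", "a", "b", "a"]
    ys = [1.0, 0.0, 1.0, 1.0]
    print(ordered_target_statistics(cats, ys, prior=0.5, order=[0, 1, 2, 3]))
```

With the identity ordering in the usage example, the first "a" and the only "b" fall back to the prior (0.5). The second "a" sees one earlier target, giving (1 + 0.5) / 2 = 0.75. The last "a" sees targets 1 and 0, giving (1 + 0.5) / 3 = 0.5.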