Reference papers
CatBoost: unbiased boosting with categorical features
Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin. NeurIPS, 2018
The NeurIPS 2018 paper explaining the principles of ordered boosting and ordered categorical feature statistics.
CatBoost: gradient boosting with categorical features support
Anna Veronika Dorogush, Vasily Ershov, Andrey Gulin. Workshop on ML Systems at NIPS 2017
A paper explaining how CatBoost works: how it handles categorical features, how it fights overfitting, and how GPU training and the fast formula applier are implemented.
Minimal Variance Sampling in Stochastic Gradient Boosting
Bulat Ibragimov, Gleb Gusev. arXiv:1910.13204
A paper about Minimal Variance Sampling, which is the default sampling in CatBoost.
Finding Influential Training Samples for Gradient Boosted Decision Trees
Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, Maarten de Rijke. arXiv:1802.06640
A paper that studies finding influential training samples for tree ensemble-based models. It proposes several ways of extending the framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed, and introduces a general scheme for obtaining further approximations to this method that balance the trade-off between performance and computational complexity.
A Unified Approach to Interpreting Model Predictions
Scott Lundberg, Su-In Lee. arXiv:1705.07874
A paper presenting SHAP (SHapley Additive exPlanations), a unified framework for interpreting model predictions.
Consistent feature attribution for tree ensembles
Scott M. Lundberg, Su-In Lee. arXiv:1706.06060
A paper presenting fast exact solutions for SHAP (SHapley Additive exPlanations) values, a unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate.
Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank
Andrey Gulin, Igor Kuralenok, Dimitry Pavlov. PMLR 14:63-76
A paper describing the theory underlying the YetiRank and YetiRankPairwise modes in CatBoost.