Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin. NeurIPS, 2018
A NeurIPS 2018 paper explaining the principles of ordered boosting and ordered statistics for categorical features.
Anna Veronika Dorogush, Vasily Ershov, Andrey Gulin. Workshop on ML Systems at NIPS 2017
A paper explaining how CatBoost works: how it handles categorical features, how it fights overfitting, and how GPU training and the fast formula applier are implemented.
Bulat Ibragimov, Gleb Gusev. arXiv:1910.13204
A paper about Minimal Variance Sampling, which is the default sampling method in CatBoost.
Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, Maarten de Rijke. arXiv:1802.06640
A paper that studies how to find influential training samples for a particular case of tree ensemble-based models. It explains several ways of extending the existing framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed, and introduces a general scheme for obtaining further approximations to this method that balance the trade-off between performance and computational complexity.
Scott Lundberg, Su-In Lee. arXiv:1705.07874
A paper explaining a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations).
Scott M. Lundberg, Su-In Lee. arXiv:1706.06060
A paper explaining fast exact solutions for SHAP (SHapley Additive exPlanations) values, a unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate.
Andrey Gulin, Igor Kuralenok, Dimitry Pavlov. PMLR 14:63-76
A paper presenting the theory underlying the YetiRank and YetiRankPairwise modes in CatBoost.
Ivan Lyzhin, Aleksei Ustimenko, Andrey Gulin, Liudmila Prokhorenkova. arXiv:2204.01500
A paper comparing the previously introduced LambdaMART, YetiRank, and StochasticRank approaches and proposing an improvement to YetiRank that allows it to optimize specific ranking loss functions.
Aleksei Ustimenko, Artem Beliakov, Liudmila Prokhorenkova. arXiv:2206.05608
A paper showing that gradient boosting based on symmetric decision trees can be equivalently reformulated as a kernel method that converges to the solution of a certain Kernel Ridge Regression problem. The authors thus obtain convergence to the posterior mean of a Gaussian process, which in turn allows them to transform gradient boosting into a sampler from the posterior and to estimate knowledge uncertainty through Monte Carlo estimation of the posterior variance. The proposed sampler yields better knowledge uncertainty estimates, leading to improved out-of-domain detection.