CatBoost is a fast, scalable, high-performance open-source library for gradient boosting on decision trees

Get started

New ways to explore your data

April 20, 2018

It’s time to release CatBoost v0.8. The aim of this release is to provide efficient tools for data and model exploration.

First of all, CatBoost now calculates per-object feature importances using the SHAP values algorithm from the ‘Consistent feature attribution for tree ensembles’ paper. As you can see in the picture below, it’s very easy to understand how each feature influences a given object. See the tutorial for more details.

Secondly, CatBoost now has a new algorithm for finding the most influential training samples for a given object. This mode calculates the effect of objects from the training dataset on the optimized metric values for the objects from the input dataset:

- Positive values reflect that the optimized metric increases.

- Negative values reflect that the optimized metric decreases.

The higher the deviation from 0, the bigger the impact an object has on the optimized metric. The method is an implementation of the approach described in the ‘Finding Influential Training Samples for Gradient Boosted Decision Trees’ paper. See the get_object_importance model method in the Python package and the ostr mode in the CLI version. A tutorial for Python is also available.

The third cool feature in the 0.8 release is ‘save model as code’. For now you can save a model as Python code with categorical features and as C++ code without categorical features (categorical features support for C++ is coming soon). Use --model-format CPP,Python in the CLI version and model.save_model(OUTPUT_PYTHON_MODEL_PATH, format="python") in Python.

To find out more, check out the release notes on GitHub. As usual, we are eager to see your feedback and contributions.

Latest News

0.10.x and 0.9.x releases review

The CatBoost team continues to make a lot of improvements and speedups. What new and interesting things have we added in our two latest releases, and why is it worth trying CatBoost now? We discuss it in this post.

CatBoost on GPU talk at GTC 2018

Come and listen to our talk about the fastest implementation of gradient boosting for GPU at GTC 2018 Silicon Valley! GTC will take place on March 26–29 and will provide an excellent opportunity to learn more about CatBoost performance on GPU.

Best in class inference and a ton of speedups

The new version of CatBoost has the industry’s fastest inference implementation. It’s 35 times faster than open-source alternatives and completely production-ready. Furthermore, the 0.6 release contains a lot of speedups and improvements. Find out more inside.

Contacts