CatBoost

class CatBoost(params=None)


Purpose

Training and applying models.

Parameters

params

Description

The list of parameters to start training with.

If omitted, default values are used.

Note

Some parameters duplicate the ones specified for the fit method. In these cases the values specified for the fit method take precedence.

Possible types: dict

Default value

None

Attributes

tree_count_

Return the number of trees in the model.

This number can differ from the value specified in the iterations training parameter in the following cases:

• The training is stopped by the overfitting detector.
• The use_best_model training parameter is set to True.

feature_importances_

Return the calculated feature importances. The output data depends on the type of the model's loss function.

random_seed_

The random seed used for training.

learning_rate_

The learning rate used for training.

feature_names_

The names of features in the dataset.

evals_result_

Return the values of metrics calculated during the training.

best_score_

Return the best result for each metric calculated on each validation dataset.

best_iteration_

Return the identifier of the iteration with the best result of the evaluation metric or loss function on the last validation set.

classes_

Return the names of classes for classification models. An empty list is returned for all other models.

The order of classes in this list corresponds to the order of classes in resulting predictions.

Methods

fit

Train a model.

predict

Apply the model to the given dataset.

calc_feature_statistics

Calculate and plot a set of statistics for the chosen feature.

calc_leaf_indexes

Return the indexes of the leaves to which the model trees map the objects from the pool.

compare

Draw train and evaluation metrics in Jupyter Notebook for two trained models.

copy

Copy the CatBoost object.

eval_metrics

Calculate the specified metrics for the specified dataset.

get_all_params

Return the values of all training parameters (including the ones that are not explicitly specified by users).

get_best_iteration

Return the identifier of the iteration with the best result of the evaluation metric or loss function on the last validation set.

get_best_score

Return the best result for each metric calculated on each validation dataset.

get_borders

Return the list of borders for numerical features.

get_evals_result

Return the values of metrics calculated during the training.

get_feature_importance

Calculate and return the feature importances.

get_metadata

Return a proxy object with metadata from the model's internal key-value string storage.

get_object_importance

Calculate the effect of objects from the train dataset on the optimized metric values for the objects from the input dataset:

• Positive values reflect that the optimized metric increases.
• Negative values reflect that the optimized metric decreases.

get_param

Return the value of the given parameter if it is explicitly specified by the user before starting the training. If this parameter is used with its default value, this function returns None.

get_params

Return the values of training parameters that are explicitly specified by the user. If all parameters are used with their default values, this function returns an empty dict.

get_scale_and_bias

Return the scale and bias of the model.

These values affect the results of applying the model, since the model prediction results are calculated as follows:
$\sum leaf\_values \cdot scale + bias$
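The formula can be worked through with plain arithmetic. The sketch below does not call the library. The leaf values, scale, and bias are hypothetical numbers that stand in for the leaf selected in each tree for one object.

```python
# Hypothetical leaf values: one selected leaf per tree for a single object.
leaf_values = [0.5, -0.2, 0.1]

# Hypothetical scale and bias of the model.
scale = 2.0
bias = 1.0

# Prediction = (sum of leaf values) * scale + bias.
prediction = sum(leaf_values) * scale + bias
print(prediction)
```

With these numbers the sum of the leaf values is about 0.4, so the prediction is about 1.8.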

get_test_eval

Return the formula values that were calculated for the objects from the validation dataset provided for training.

grid_search

A simple grid search over specified parameter values for a model.

load_model

Load the model from a file.

plot_predictions

Sequentially vary the value of the specified features to put them into all buckets and calculate predictions for the input objects accordingly.

plot_tree

Visualize the CatBoost decision trees.

randomized_search

A simple randomized search on hyperparameters.

save_borders

Save the model borders to a file.

select_features

Select the best features from the dataset using the Recursive Feature Elimination algorithm.

set_feature_names

Set names for all features in the model.

set_params

Set the training parameters.

set_scale_and_bias

Set the scale and bias.

shrink

Shrink the model. Only trees with indices from the range [ntree_start, ntree_end) are kept.

staged_predict

Apply the model to the given dataset and calculate the results taking into consideration only the trees in the range [0, i).