CatBoostRegressor

class CatBoostRegressor(iterations=None,
                        learning_rate=None,
                        depth=None,
                        l2_leaf_reg=None,
                        model_size_reg=None,
                        rsm=None,
                        loss_function='RMSE',
                        border_count=None,
                        feature_border_type=None,
                        input_borders=None,
                        output_borders=None,
                        fold_permutation_block=None,
                        od_pval=None,
                        od_wait=None,
                        od_type=None,
                        nan_mode=None,
                        counter_calc_method=None,
                        leaf_estimation_iterations=None,
                        leaf_estimation_method=None,
                        thread_count=None,
                        random_seed=None,
                        use_best_model=None,
                        best_model_min_trees=None,
                        verbose=None,
                        silent=None,
                        logging_level=None,
                        metric_period=None,
                        ctr_leaf_count_limit=None,
                        store_all_simple_ctr=None,
                        max_ctr_complexity=None,
                        has_time=None,
                        allow_const_label=None,
                        one_hot_max_size=None,
                        random_strength=None,
                        name=None,
                        ignored_features=None,
                        train_dir=None,
                        custom_metric=None,
                        eval_metric=None,
                        bagging_temperature=None,
                        save_snapshot=None,
                        snapshot_file=None,
                        snapshot_interval=None,
                        fold_len_multiplier=None,
                        used_ram_limit=None,
                        gpu_ram_part=None,
                        pinned_memory_size=None,
                        allow_writing_files=None,
                        final_ctr_computation_mode=None,
                        approx_on_full_history=None,
                        boosting_type=None,
                        simple_ctr=None,
                        combinations_ctr=None,
                        per_feature_ctr=None,
                        ctr_target_border_count=None,
                        task_type=None,
                        device_config=None,                        
                        devices=None,
                        bootstrap_type=None,
                        subsample=None,                        
                        sampling_unit=None,
                        dev_score_calc_obj_block_size=None,
                        max_depth=None,
                        n_estimators=None,
                        num_boost_round=None,
                        num_trees=None,
                        colsample_bylevel=None,
                        random_state=None,
                        reg_lambda=None,
                        objective=None,
                        eta=None,
                        max_bin=None,
                        gpu_cat_features_storage=None,
                        data_partition=None,
                        metadata=None,
                        early_stopping_rounds=None,
                        cat_features=None,
                        grow_policy=None,
                        min_data_in_leaf=None,
                        max_leaves=None,
                        score_function=None,
                        leaf_estimation_backtracking=None)

Purpose

Trains and applies models for regression problems. The applying methods return only the predicted numeric value for each object. Provides compatibility with scikit-learn tools.

Parameters

Parameter Description Default value

metadata

The key-value string pairs to store in the model's metadata storage after training.

Default value: None

cat_features

A one-dimensional array of categorical column indices (specified as integers) or names (specified as strings).

The array can mix indices and names: some elements can be given as indices and others as names.

If any feature in cat_features is specified by name rather than by index, feature names must be provided for the training dataset. In that case, the X parameter in subsequent calls of the fit function must be either a catboost.Pool with feature names defined or a pandas.DataFrame with column names defined.

Note.
  • If this parameter is not None and the training dataset passed as the X parameter to the fit function of this class has the catboost.Pool type, CatBoost checks that the categorical feature indices specified here are equivalent to those in the catboost.Pool object.

  • If this parameter is not None, passing objects of the catboost.FeaturesData type as the X parameter to the fit function of this class is prohibited.

Default value: None (all features are considered numerical)

See Python package training parameters for the full list of parameters.

Note. Some parameters duplicate the ones specified for the fit method. In these cases the values specified for the fit method take precedence.

Attributes

Attribute Description
tree_count_

Return the number of trees in the model.

feature_importances_

Return the calculated feature importances. The output data depends on the type of the model's loss function.

random_seed_

The random seed used for training.

learning_rate_

The learning rate used for training.

feature_names_

The names of features in the dataset.

evals_result_

Return the values of metrics calculated during the training.

best_score_

Return the best result for each metric calculated on each validation dataset.

best_iteration_

Return the identifier of the iteration with the best result of the evaluation metric or loss function on the last validation set.

Methods

Method Description
fit

Train a model.

predict

Apply the model to the given dataset.

calc_feature_statistics

Calculate and plot a set of statistics for the chosen feature.

copy

Copy the CatBoost object.

compare

Draw train and evaluation metrics in Jupyter Notebook for two trained models.

eval_metrics

Calculate the specified metrics for the specified dataset.

get_best_iteration

Return the identifier of the iteration with the best result of the evaluation metric or loss function on the last validation set.

get_best_score

Return the best result for each metric calculated on each validation dataset.

get_evals_result

Return the values of metrics calculated during the training.

get_feature_importance

Calculate and return the feature importances.

get_metadata

Return a proxy object with metadata from the model's internal key-value string storage.

get_object_importance

Calculate the effect of objects from the train dataset on the optimized metric values for the objects from the input dataset:
  • Positive values reflect that the optimized metric increases.
  • Negative values reflect that the optimized metric decreases.

get_param

Return the value of the specified training parameter.

get_params

Return the training parameters.

get_test_eval

Return the formula values that were calculated for the objects from the validation dataset provided for training.

is_fitted

Check whether the model is trained.

load_model

Load the model from a file.

plot_tree

Visualize the CatBoost decision trees.

save_borders

Save the model borders to a file.

save_model

Save the model to a file.

score

Calculate the RMSE metric for the objects in the given dataset.

set_params

Set the training parameters.

shrink

Shrink the model. Only trees with indices from the range [ntree_start, ntree_end) are kept.

staged_predict

Apply the model to the given dataset and calculate the results for each stage i, taking into consideration only the trees in the range [0, i).
