Feature analysis
The format depends on the feature importance type:
Feature importance
Feature interaction strength
ShapValues