CatBoost

CatBoost is a machine learning algorithm that uses gradient boosting on decision trees. It is available as an open-source library.
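
To make "gradient boosting on decision trees" concrete, here is a minimal sketch in plain Python. It is a toy illustration of the general technique, not CatBoost's implementation and not its API: it assumes a single numeric feature, squared loss, and depth-1 trees (stumps). All function names and the data are hypothetical.

```python
# Toy gradient boosting with depth-1 regression trees (stumps) and squared loss.
# Illustrative only: this is NOT CatBoost's implementation or API.


def fit_stump(xs, residuals):
    """Pick the threshold on one numeric feature that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # every split leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    """Fit an additive ensemble: each stump is trained on the current residuals."""
    base = sum(ys) / len(ys)  # start from the constant prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For the loss 1/2 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: low targets for small x, high targets for large x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

model = fit_boosting(xs, ys)
print(predict(model, 2.0), predict(model, 5.0))
```

Each round fits a small tree to the errors left by the rounds before it and adds a damped copy of that tree to the ensemble. The sections listed below cover what the library adds on top of this basic idea, such as categorical and text features, GPU training, and overfitting detection.
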
  • Training

    • Training
    • Training on GPU
    • Python train function
    • Cross-validation
    • Overfitting detector
    • Pre-trained data
    • Categorical features
    • Text features
    • Embeddings features
  • Applying models

    • Regular prediction
    • C and C++
    • Java
    • Rust
    • Calculate metrics
    • Staged prediction
    • Applying the model in ClickHouse
  • Model analysis

    • Feature importances
    • Object importances
  • Metrics

    • Implemented metrics
    • User-defined metrics
  • Recovering training after an interruption

    • Recovery
  • Visualization tools

    • Jupyter Notebook
    • TensorBoard
  • Exporting models

    • CoreML
    • Python or C++
    • JSON
    • ONNX
    • PMML
  • Educational materials

    • Tutorials
    • Reference papers
    • Videos