Parameter tuning
CatBoost provides a flexible interface for parameter tuning and can be configured to suit different tasks.
One-hot encoding
One-hot encoding can work well for categorical features that have only a few distinct values.
Commandline version parameters  Python parameters  R parameters  Description  Default value

--one-hot-max-size  one_hot_max_size  one_hot_max_size  Use one-hot encoding for all categorical features with a number of different values less than or equal to the given parameter value. Ctrs are not calculated for such features.  The default value depends on various conditions:
 N/A if training is performed on CPU in Pairwise scoring mode
 255 if training is performed on GPU and the selected Ctr types require target data that is not available during the training
 10 if training is performed in Ranking mode
 2 if none of the conditions above is met
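The threshold rule in the description can be illustrated with a small sketch. It is not CatBoost's implementation: the helper name and the sample columns are hypothetical, and the code only shows which features would fall at or below a given one_hot_max_size.

```python
def split_by_one_hot_max_size(columns, one_hot_max_size):
    """Split categorical columns by their number of distinct values.

    `columns` maps a feature name to its list of category values.
    Features with at most `one_hot_max_size` distinct values go to the
    one-hot group; the rest would be handled with Ctrs.
    """
    one_hot, ctr = [], []
    for name, values in columns.items():
        if len(set(values)) <= one_hot_max_size:
            one_hot.append(name)
        else:
            ctr.append(name)
    return one_hot, ctr


# Hypothetical data: "weekday" has 3 distinct values, "user_id" has 4.
columns = {
    "weekday": ["mon", "tue", "mon", "wed"],
    "user_id": ["u1", "u2", "u3", "u4"],
}
one_hot, ctr = split_by_one_hot_max_size(columns, one_hot_max_size=3)
```

With a threshold of 3, only the weekday feature is one-hot encoded in this sketch.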
Number of trees
Before tuning any other parameters, check that there is no obvious underfitting or overfitting. To do this, analyze the metric value on the validation dataset and select an appropriate number of iterations.
One way to do this is to set the number of iterations to a large value, enable the overfitting detector, and turn on the use best model option. The resulting model then contains only the first k iterations, where k is the iteration with the best metric value on the validation dataset.
The metric used to choose the best model may differ from the optimized objective. For example, the optimized function can be set to Logloss while AUC is used for the overfitting detector. To do so, set the evaluation metric parameter (eval_metric).
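As a rough illustration, the setup described in this section might look like the following in the Python package. The iteration budget and the od_wait value are illustrative assumptions, not recommended settings, and the helper function name is hypothetical.

```python
# Parameters named in the text: a large iteration budget, an overfitting
# detector, best-model selection, and an evaluation metric that differs
# from the optimized loss. The numeric values are illustrative only.
TREE_COUNT_PARAMS = {
    "iterations": 5000,
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "od_type": "Iter",
    "od_wait": 50,
    "use_best_model": True,
}


def fit_with_best_model(X_train, y_train, X_val, y_val, params=TREE_COUNT_PARAMS):
    """Train a classifier and return it with its best iteration index."""
    # Imported here so the parameter dict can be inspected without CatBoost installed.
    from catboost import CatBoostClassifier

    model = CatBoostClassifier(**params)
    # use_best_model requires a validation dataset, passed here as eval_set.
    model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False)
    return model, model.get_best_iteration()
```

The returned best iteration can be compared with the iteration budget to judge whether the budget was large enough.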
Commandline version parameters  Python parameters  R parameters  Description

-i, --iterations  iterations  iterations  The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.
--use-best-model  use_best_model  use_best_model  If this parameter is set, the number of trees that are saved in the resulting model is defined as follows:
 Build the number of trees defined by the training parameters.
 Use the validation dataset to identify the iteration with the optimal value of the metric specified in --eval-metric (eval_metric).
No trees are saved after this iteration. This option requires a validation dataset to be provided.
--eval-metric  eval_metric  eval_metric  The metric used for overfitting detection (if enabled) and best model selection (if enabled). Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Supported metrics:
 RMSE
 Logloss
 MAE
 CrossEntropy
 Quantile
 LogLinQuantile
 Lq
 MultiClass
 MultiClassOneVsAll
 MAPE
 Poisson
 PairLogit
 PairLogitPairwise
 QueryRMSE
 QuerySoftMax
 SMAPE
 Recall
 Precision
 F1
 TotalF1
 Accuracy
 BalancedAccuracy
 BalancedErrorRate
 Kappa
 WKappa
 LogLikelihoodOfPrediction
 AUC
 R2
 FairLoss
 NumErrors
 MCC
 BrierScore
 HingeLoss
 HammingLoss
 ZeroOneLoss
 MSLE
 MedianAbsoluteError
 Huber
 Expectile
 PairAccuracy
 AverageGain
 PFound
 NDCG
 DCG
 FilteredDCG
 NormalizedGini
 PrecisionAt
 RecallAt
 MAP

Overfitting detection settings
--od-type  od_type  od_type  The type of the overfitting detector to use. Possible values:
 IncToDec
 Iter
--od-pval  od_pval  od_pval  The threshold for the IncToDec overfitting detector type. The training is stopped when the specified value is reached. Requires that a validation dataset was input. For best results, it is recommended to set a value in the range . The larger the value, the earlier overfitting is detected. Restriction. Do not use this parameter with the Iter overfitting detector type.
--od-wait  od_wait  od_wait  The number of iterations to continue the training after the iteration with the optimal metric value. The purpose of this parameter differs depending on the selected overfitting detector type:
 IncToDec: Ignore the overfitting detector when the threshold is reached and continue learning for the specified number of iterations after the iteration with the optimal metric value.
 Iter: Consider the model overfitted and stop training after the specified number of iterations since the iteration with the optimal metric value.
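The Iter rule and best-model selection can be sketched in plain Python. This is an illustration of the behavior described in the table, not CatBoost's code. The function name and the validation values are hypothetical.

```python
def iter_overfitting_detector(metric_values, od_wait, higher_is_better=False):
    """Simulate the Iter detector on a non-empty list of validation metric values.

    Returns (best_iteration, stop_iteration), both zero-based. Training stops
    once `od_wait` iterations have passed since the best iteration.
    """
    best_iteration = 0
    best_value = metric_values[0]
    for iteration, value in enumerate(metric_values):
        improved = value > best_value if higher_is_better else value < best_value
        if improved:
            best_iteration, best_value = iteration, value
        elif iteration - best_iteration >= od_wait:
            return best_iteration, iteration
    # The detector never fired: training ran through the whole budget.
    return best_iteration, len(metric_values) - 1


# Hypothetical validation Logloss per iteration (lower is better).
validation_logloss = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.49, 0.52]
best_iteration, stop_iteration = iter_overfitting_detector(validation_logloss, od_wait=3)

# With use_best_model, no trees are saved after the best iteration.
trees_kept = best_iteration + 1
```

In this run the best value is reached at iteration 3, training stops at iteration 6, and four trees are kept.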
Learning rate
This setting is used for reducing the gradient step. It affects the overall time of training: the smaller the value, the more iterations are required for training. Choose the value based on the performance expectations.
By default, the learning rate is defined automatically based on the dataset properties and the number of iterations. The automatically defined value should be close to the optimal one.
Adjust the learning rate depending on the overfitting results:
 There is no overfitting on the last iterations of training (the training does not converge): increase the learning rate.
 Overfitting is detected: decrease the learning rate.
Commandline version parameters  Python parameters  R parameters  Description

-w, --learning-rate  learning_rate  learning_rate  The learning rate. Used for reducing the gradient step.
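The adjustment rule for the learning rate can be written down as a small decision helper. This is only a sketch: the function name is hypothetical, and treating "the best iteration is the last one" as a sign that training has not converged is an assumption made for illustration.

```python
def learning_rate_advice(best_iteration, iterations_run, overfitting_detected):
    """Suggest a direction for the next learning rate to try.

    `best_iteration` is zero-based. `iterations_run` is the number of
    iterations that were actually trained.
    """
    if overfitting_detected:
        return "decrease"
    # Assumption: a best iteration at the very end means the validation
    # metric was still improving, so training did not converge.
    if best_iteration == iterations_run - 1:
        return "increase"
    return "keep"
```

For example, a run of 1000 iterations whose best iteration is the last one would get the advice to increase the learning rate.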
Tree depth
In most cases, the optimal depth ranges from 4 to 10. Values in the range from 6 to 10 are recommended.
The maximum depth of the trees is limited to 8 for pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) when the training is performed on GPU.
Commandline version parameters  Python parameters  R parameters  Description

-n, --depth  depth  depth  Depth of the tree. The range of supported values depends on the processing unit type and the type of the selected loss function:
 CPU: Any integer up to 16.
 GPU: Any integer up to 8 for pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) and up to 16 for all other loss functions.
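The depth limits can be checked before training with a helper along these lines. The function names are hypothetical, and the lower bound of 1 is an assumption, since the text states only the upper limits.

```python
# Pairwise modes as listed in the text above.
PAIRWISE_MODES = {"YetiRank", "PairLogitPairwise", "QueryCrossEntropy"}


def max_supported_depth(processing_unit, loss_function):
    """Return the upper depth limit for a processing unit and loss function."""
    if processing_unit == "GPU" and loss_function in PAIRWISE_MODES:
        return 8
    return 16


def check_depth(depth, processing_unit="CPU", loss_function="Logloss"):
    """Return `depth` unchanged if it is within the supported range."""
    limit = max_supported_depth(processing_unit, loss_function)
    # Assumption: depth must be at least 1.
    if not 1 <= depth <= limit:
        raise ValueError(
            f"depth={depth} is outside 1..{limit} for {processing_unit} with {loss_function}"
        )
    return depth
```

A depth of 10 passes on CPU but is rejected for YetiRank on GPU.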
L2 regularization
Try different values for the regularizer to find the one that gives the best result.
Commandline version parameters  Python parameters  R parameters  Description

--l2-leaf-reg  l2_leaf_reg  l2_leaf_reg  Coefficient at the L2 regularization term of the cost function. Any positive value is allowed.
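Trying several values amounts to a simple sweep. The sketch below assumes a caller-supplied function that trains a model with a given l2_leaf_reg and returns its validation loss. The stand-in scoring function and the candidate values are hypothetical and exist only so the example runs on its own.

```python
def pick_best_value(candidates, validation_loss):
    """Return the candidate l2_leaf_reg with the lowest validation loss."""
    for value in candidates:
        if value <= 0:
            raise ValueError("l2_leaf_reg must be positive")
    return min(candidates, key=validation_loss)


def toy_validation_loss(l2_leaf_reg):
    # Stand-in for "train with this value and evaluate on the validation set".
    return (l2_leaf_reg - 3.0) ** 2


best_l2_leaf_reg = pick_best_value([1, 3, 5, 10], toy_validation_loss)
```

In practice the scoring function would train a model for each candidate, so the sweep costs one training run per value.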
Random strength
Try setting different values for the random_strength parameter.
Commandline version parameters  Python parameters  R parameters  Description

--random-strength  random_strength  random_strength  The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model. The value of this parameter is used when selecting splits. On every iteration each possible split gets a score (for example, the score indicates how much adding this split will improve the loss function for the training dataset). The split with the highest score is selected. The scores have no randomness. A normally distributed random variable is added to the score of the feature. It has a zero mean and a variance that decreases during the training. The value of this parameter is the multiplier of the variance. Note. This parameter is not supported for the following loss functions:
 QueryCrossEntropy
 YetiRankPairwise
 PairLogitPairwise
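The effect of random_strength on split selection can be sketched as follows. The decay schedule for the variance (1 / (1 + iteration)) is a hypothetical choice made for illustration, as the text says only that the variance decreases during training. The split names and scores are made up.

```python
import random


def select_split(scores, random_strength, iteration, rng):
    """Pick the split with the highest noisy score.

    Zero-mean normal noise is added to every score. `random_strength`
    multiplies the variance of that noise.
    """
    # Hypothetical schedule: the base variance shrinks as training proceeds.
    variance = random_strength / (1.0 + iteration)
    std = variance ** 0.5
    noisy = {split: score + rng.gauss(0.0, std) for split, score in scores.items()}
    return max(noisy, key=noisy.get)


scores = {"age < 30": 0.42, "income < 50000": 0.40, "city == A": 0.10}
rng = random.Random(0)

# With random_strength=0 the noise vanishes and the top-scoring split always wins.
deterministic_choice = select_split(scores, random_strength=0.0, iteration=0, rng=rng)

# With a positive value, a close runner-up can occasionally be selected instead.
noisy_choice = select_split(scores, random_strength=1.0, iteration=0, rng=rng)
```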
Bagging temperature
Try setting different values for the bagging_temperature parameter.
Commandline version parameters  Python parameters  R parameters  Description

--bagging-temperature  bagging_temperature  bagging_temperature  Defines the settings of the Bayesian bootstrap. It is used by default in classification and regression modes. Use the Bayesian bootstrap to assign random weights to objects. The weights are sampled from exponential distribution if the value of this parameter is set to “1”. All weights are equal to 1 if the value of this parameter is set to “0”. Possible values are in the range . The higher the value the more aggressive the bagging is. This parameter can be used if the selected bootstrap type is Bayesian.
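One weighting scheme that matches both endpoints in the description is to raise an exponentially distributed draw to the power of bagging_temperature. The sketch below uses that formula as an assumption, and it should not be read as a statement of CatBoost's internals.

```python
import math
import random


def bayesian_bootstrap_weights(n_objects, bagging_temperature, rng):
    """Sample one random weight per object.

    Each weight is an Exp(1) draw raised to `bagging_temperature`:
    temperature 1 gives exponential weights, temperature 0 gives all ones.
    """
    weights = []
    for _ in range(n_objects):
        u = 1.0 - rng.random()            # uniform on (0, 1]
        exponential_draw = math.log(1.0 / u)
        weights.append(exponential_draw ** bagging_temperature)
    return weights


rng = random.Random(42)
equal_weights = bayesian_bootstrap_weights(5, bagging_temperature=0, rng=rng)
exponential_weights = bayesian_bootstrap_weights(5, bagging_temperature=1, rng=rng)
```

Higher temperatures spread the weights further apart, which is the sense in which the bagging becomes more aggressive.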
Border count
The number of splits for numerical features.
By default, it is set to 254 (if training is performed on CPU) or 128 (if training is performed on GPU).
The value of this parameter significantly impacts the speed of training on GPU. The smaller the value, the faster the training is performed (refer to the Number of splits for numerical features section for details).
128 splits are enough for many datasets. However, try to set the value of this parameter to 254 when training on GPU if the best possible quality is required.
The value of this parameter does not significantly impact the speed of training on CPU. Try to set it to 254 for the best possible quality.
Commandline version parameters  Python parameters  R parameters  Description

-x, --border-count  border_count Alias: max_bin  border_count  The number of splits for numerical features. Allowed values depend on the processing unit type:
 CPU: integers from 1 to 65535 inclusively.
 GPU: integers from 1 to 255 inclusively.
Recommended values are up to 255. Larger values slow down the training.
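To make the role of the border count concrete, the sketch below quantizes a numerical feature into buckets separated by border_count borders. Placing the borders at uniform quantiles is a simplification chosen for illustration, and CatBoost's own border selection is not described in this section.

```python
import bisect
import statistics


def quantize(values, border_count):
    """Return (borders, bins) for a list of numerical feature values.

    `border_count` borders split the value range into border_count + 1 buckets.
    """
    # n cut points require n + 1 quantile groups.
    borders = statistics.quantiles(values, n=border_count + 1)
    bins = [bisect.bisect_left(borders, value) for value in values]
    return borders, bins


values = [float(v) for v in range(1000)]
borders, bins = quantize(values, border_count=128)
```

Fewer borders mean fewer candidate splits to evaluate per feature, which is why smaller values train faster.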
Internal dataset order
Use this option if the objects in your dataset are given in the required order. In this case, random permutations are not performed during the Transforming categorical features to numerical features and Choosing the tree structure stages.
Commandline version parameters  Python parameters  R parameters  Description

--has-time  has_time  has_time  Use the order of objects in the input data (do not perform random permutations during the Transforming categorical features to numerical features and Choosing the tree structure stages). The Timestamp column type is used to determine the order of objects if specified in the input data.
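The difference this option makes can be pictured with a tiny sketch of the object order used during training. The function is hypothetical and shows only the idea: input order is kept when has_time is set, and a random permutation is used otherwise.

```python
import random


def object_order(n_objects, has_time, seed=0):
    """Return the order in which objects would be processed."""
    order = list(range(n_objects))
    if not has_time:
        # Without has_time, a random permutation of the objects is used.
        random.Random(seed).shuffle(order)
    return order


ordered = object_order(5, has_time=True)
permuted = object_order(5, has_time=False)
```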
Tree growing policy
By default, CatBoost uses symmetric trees, which are built if the growing policy is set to SymmetricTree.
Such trees are built level by level until the specified depth is reached. On each iteration, all leaves from the last tree level are split with the same condition. The resulting tree structure is always symmetric.
Symmetric trees have a very good prediction speed (roughly 10 times faster than regular trees) and give better quality in many cases.
However, in some cases, other tree growing strategies can give better results than growing symmetric trees.
Compare the results obtained with different tree growing strategies.
Note the following when choosing a growing policy:
 Symmetric trees, which are used by default, can be applied much faster (up to 10 times faster).
 Model analysis tools like ShapValues are currently supported only for symmetric trees.
 Regular trees are currently supported only on GPU.
Commandline version parameters  Python parameters  R parameters  Description

--grow-policy  grow_policy  grow_policy  The tree growing policy. Defines how to perform greedy tree construction. Possible values:
 SymmetricTree: A tree is built level by level until the specified depth is reached. On each iteration, all leaves from the last tree level are split with the same condition. The resulting tree structure is always symmetric.
 Depthwise: A tree is built level by level until the specified depth is reached. On each iteration, all non-terminal leaves from the last tree level are split. Each leaf is split by the condition with the best loss improvement.
 Lossguide: A tree is built leaf by leaf until the specified maximum number of leaves is reached. On each iteration, the non-terminal leaf with the best loss improvement is split.
Note. The Depthwise and Lossguide growing policies are currently supported only in training and prediction modes. They are not supported for model analysis (such as Feature importance and ShapValues) and exporting to different model formats (such as AppleCoreML, onnx and json).
--min-data-in-leaf  min_data_in_leaf Alias: min_child_samples  min_data_in_leaf  The minimum number of training samples in a leaf. CatBoost does not search for new splits in leaves with a sample count less than the specified value. Can be used only with the Lossguide and Depthwise growing policies.
--max-leaves  max_leaves Alias: num_leaves  max_leaves  The maximum number of leaves in the resulting tree. Can be used only with the Lossguide growing policy. Tip. It is not recommended to use values greater than 64, since it can significantly slow down the training process.
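The compatibility rules between the three parameters can be checked up front. The helper below is a hypothetical sketch that encodes only the restrictions stated in the table and returns a dict of Python parameter names.

```python
GROW_POLICIES = {"SymmetricTree", "Depthwise", "Lossguide"}


def grow_policy_params(grow_policy="SymmetricTree", min_data_in_leaf=None, max_leaves=None):
    """Validate a growing-policy configuration and return it as a dict."""
    if grow_policy not in GROW_POLICIES:
        raise ValueError(f"unknown grow_policy: {grow_policy}")
    if min_data_in_leaf is not None and grow_policy not in {"Depthwise", "Lossguide"}:
        raise ValueError("min_data_in_leaf requires the Depthwise or Lossguide policy")
    if max_leaves is not None and grow_policy != "Lossguide":
        raise ValueError("max_leaves requires the Lossguide policy")

    params = {"grow_policy": grow_policy}
    if min_data_in_leaf is not None:
        params["min_data_in_leaf"] = min_data_in_leaf
    if max_leaves is not None:
        params["max_leaves"] = max_leaves
    return params


lossguide_params = grow_policy_params("Lossguide", min_data_in_leaf=10, max_leaves=31)
```

The resulting dict could be passed on to a training call. The values 10 and 31 are illustrative only.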
Golden features
An increased number of borders should not be set for all features. It is recommended to set it for one or two golden features.
Commandline version
Parameter  Description

--per-float-feature-quantization  A semicolon-separated list of quantization descriptions. Examples:
 --per-float-feature-quantization 0:border_count=1024
 In this example, the feature indexed 0 has 1024 borders.
 --per-float-feature-quantization 0:border_count=1024;1:border_count=1024
 In this example, features indexed 0 and 1 have 1024 borders.

Python package
Parameter  Description

per_float_feature_quantization  The quantization description for the specified feature or list of features. Examples:
 per_float_feature_quantization='0:border_count=1024'
 In this example, the feature indexed 0 has 1024 borders.
 per_float_feature_quantization=['0:border_count=1024', '1:border_count=1024']
 In this example, features indexed 0 and 1 have 1024 borders.

R package
Parameter  Description

per_float_feature_quantization  The quantization description for the specified feature or list of features. Examples:
 per_float_feature_quantization = '0:border_count=1024'
 In this example, the feature indexed 0 has 1024 borders.
 per_float_feature_quantization = c('0:border_count=1024', '1:border_count=1024')
 In this example, features indexed 0 and 1 have 1024 borders.
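When the golden features are chosen programmatically, the description strings can be generated instead of typed by hand. The helper below is a hypothetical convenience. It produces strings in the same form as the examples in this section.

```python
def golden_feature_quantization(border_counts):
    """Build quantization descriptions from {feature_index: border_count}."""
    return [
        f"{index}:border_count={count}"
        for index, count in sorted(border_counts.items())
    ]


# List form, as in the Python example for features 0 and 1.
descriptions = golden_feature_quantization({0: 1024, 1: 1024})

# Semicolon-separated form, as in the command-line example.
command_line_value = ";".join(descriptions)
```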