Speeding up the training

CatBoost provides several settings that can speed up the training.

Note

Certain changes to these parameters can decrease the quality of the resulting model.

Training on GPU

If the dataset is large enough (starting from tens of thousands of objects), training on GPU gives a significant speedup compared to training on CPU. The larger the dataset, the more significant is the speedup. For example, the speedup for training on datasets with millions of objects on Volta GPUs is around 40-50 times.
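
For example, a minimal sketch of switching training to the GPU with the Python package (this assumes a GPU-enabled CatBoost build and an available CUDA device; the synthetic data is for illustration only):

import numpy as np
from catboost import CatBoostClassifier

train_data = np.random.rand(10000, 20)
train_labels = np.random.randint(2, size=(10000,))

# task_type="GPU" runs the training on the GPU; devices selects the GPU indices to use
model = CatBoostClassifier(task_type="GPU", devices="0")
model.fit(train_data, train_labels, verbose=False)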

Iterations and learning rate

By default, CatBoost builds 1000 trees. The number of iterations can be decreased to speed up the training.

When the number of iterations decreases, the learning rate needs to be increased. By default, the value of the learning rate is defined automatically depending on the number of iterations and the input dataset. Changing the number of iterations to a smaller value is a good starting point for optimization.

The default learning rate is close to the optimal one, but it can be tuned to get the best possible quality. Look at the evaluation metric values on each iteration to tune the learning rate:

  • Decrease the learning rate if overfitting is observed.
  • Increase the learning rate if there is no overfitting and the error on the evaluation dataset still decreases on the last iterations.
Parameters
  • Command-line version: -i, --iterations; Python: iterations (aliases: num_boost_round, n_estimators, num_trees); R: iterations
  • Command-line version: -w, --learning-rate; Python: learning_rate (alias: eta); R: learning_rate
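
For illustration, a sketch of decreasing the number of trees while increasing the learning rate with the Python package (the particular values and the synthetic data are placeholders, not recommendations):

import numpy as np
from catboost import CatBoostRegressor

train_data = np.random.rand(1000, 10)
train_labels = np.random.rand(1000)
eval_data = np.random.rand(200, 10)
eval_labels = np.random.rand(200)

# fewer trees than the default 1000, compensated by a larger learning rate
model = CatBoostRegressor(iterations=300, learning_rate=0.3)

# eval_set provides the per-iteration metric values used to tune the learning rate
model.fit(train_data, train_labels, eval_set=(eval_data, eval_labels), verbose=100)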

Boosting type

By default, the boosting type is set to Ordered for small datasets. This prevents overfitting, but it is expensive in terms of computation. Try setting the value of this parameter to Plain to speed up the training.

Parameters
  • Command-line version: --boosting-type; Python: boosting_type; R: boosting_type
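
A minimal sketch with the Python package (synthetic data for illustration):

import numpy as np
from catboost import CatBoostClassifier

train_data = np.random.rand(1000, 10)
train_labels = np.random.randint(2, size=(1000,))

# Plain is the classic boosting scheme; Ordered is slower but better protects against overfitting
model = CatBoostClassifier(boosting_type="Plain")
model.fit(train_data, train_labels, verbose=False)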

Bootstrap type

By default, the method for sampling the weights of objects depends on the processing unit and the selected loss function. The training is performed faster if the Bernoulli method is set and the value of the sample rate for bagging (the subsample parameter) is smaller than 1.

Parameters
  • Command-line version: --bootstrap-type; Python: bootstrap_type; R: bootstrap_type
  • Command-line version: --subsample; Python: subsample; R: subsample
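
For example, with the Python package (the sample rate of 0.66 and the synthetic data are illustrative):

import numpy as np
from catboost import CatBoostRegressor

train_data = np.random.rand(1000, 10)
train_labels = np.random.rand(1000)

# Bernoulli bootstrap with subsample < 1 trains each tree on a random subset of the objects
model = CatBoostRegressor(bootstrap_type="Bernoulli", subsample=0.66)
model.fit(train_data, train_labels, verbose=False)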

One-hot encoding

By default, the maximum number of different values of categorical features for using one-hot encoding depends on various conditions:

  • N/A if training is performed on CPU in Pairwise scoring mode
  • 255 if training is performed on GPU and the selected Ctr types require target data that is not available during the training
  • 10 if training is performed in Ranking mode
  • 2 if none of the conditions above is met

Pairwise scoring is used by the following loss functions: YetiRankPairwise, PairLogitPairwise, and QueryCrossEntropy. It is slightly different from regular training on pairs, since pairs are generated only internally during the training for the corresponding metrics. One-hot encoding is not available for these loss functions.

Statistics are calculated for all other categorical features. This is more time consuming than using one-hot encoding.

Set a larger value for this parameter to speed up the training.

Parameters
  • Command-line version: --one-hot-max-size; Python: one_hot_max_size; R: one_hot_max_size
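
For example, with the Python package (the column names and the synthetic data are illustrative):

import pandas as pd
from catboost import CatBoostClassifier

# one categorical column with 40 distinct values and one numerical column
train_data = pd.DataFrame({
    "city": [str(i % 40) for i in range(1000)],
    "num": [float(i % 100) for i in range(1000)],
})
train_labels = [i % 2 for i in range(1000)]

# categorical features with up to 255 distinct values are one-hot encoded
# instead of being converted to ctr statistics
model = CatBoostClassifier(one_hot_max_size=255, cat_features=["city"])
model.fit(train_data, train_labels, verbose=False)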

Random subspace method

For datasets with hundreds of features this parameter speeds up the training and usually does not affect the quality. It is not recommended to change the default value of this parameter for datasets with few (10-20) features.

For example, set the parameter to 0.1. In this case, the training requires roughly 20% more iterations to converge. But each iteration is performed roughly ten times faster. Therefore, the training time is much shorter even though the resulting model contains more trees.

Parameters
  • Command-line version: --rsm; Python: rsm (alias: colsample_bylevel); R: rsm
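
For example, with the Python package (the synthetic wide dataset is illustrative):

import numpy as np
from catboost import CatBoostRegressor

# a dataset with hundreds of features, where feature subsampling pays off
train_data = np.random.rand(1000, 500)
train_labels = np.random.rand(1000)

# rsm=0.1: a random 10% of the features is considered at each split selection
model = CatBoostRegressor(rsm=0.1)
model.fit(train_data, train_labels, verbose=False)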

Leaf estimation iterations

This parameter defines the rules for calculating leaf values after selecting the tree structures. The default value depends on the training objective and can slow down the training for datasets with a small number of features (for example, 10 features).

Try setting the value to 1 or 5 to speed up the training on datasets with a small number of features.

Parameters
  • Command-line version: --leaf-estimation-iterations; Python: leaf_estimation_iterations; R: leaf_estimation_iterations
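
For example, with the Python package (synthetic data for illustration):

import numpy as np
from catboost import CatBoostClassifier

train_data = np.random.rand(1000, 10)
train_labels = np.random.randint(2, size=(1000,))

# a single gradient step per leaf when calculating leaf values
model = CatBoostClassifier(leaf_estimation_iterations=1)
model.fit(train_data, train_labels, verbose=False)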

Number of categorical features to combine

By default, the combinations of categorical features are generated in a greedy way. This slows down the training.

Try turning off the generation of categorical feature combinations (set the parameter to 1) or limiting the number of categorical features that can be combined to two (set the parameter to 2) to speed up the training.

This parameter can affect the training time only if the dataset contains categorical features.

Parameters
  • Command-line version: --max-ctr-complexity; Python: max_ctr_complexity; R: max_ctr_complexity
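
For example, with the Python package (the column names and the synthetic data are illustrative):

import pandas as pd
from catboost import CatBoostClassifier

# two categorical columns and one numerical column
train_data = pd.DataFrame({
    "cat_a": [str(i % 5) for i in range(1000)],
    "cat_b": [str(i % 7) for i in range(1000)],
    "num": [float(i % 100) for i in range(1000)],
})
train_labels = [i % 2 for i in range(1000)]

# max_ctr_complexity=1 turns off combinations of categorical features;
# a value of 2 limits combinations to at most two features
model = CatBoostClassifier(max_ctr_complexity=1, cat_features=["cat_a", "cat_b"])
model.fit(train_data, train_labels, verbose=False)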

Number of splits for numerical features

This parameter defines the number of splits considered for each feature.

The default value depends on the processing unit type and other parameters:

  • CPU: 254
  • GPU in PairLogitPairwise and YetiRankPairwise modes: 32
  • GPU in all other modes: 128

The value of this parameter significantly impacts the speed of training on GPU. The smaller the value, the faster the training is performed.

Try to set the value of this parameter to 32 if training is performed on GPU. In many cases, this does not affect the quality of the model but significantly speeds up the training.

The value of this parameter does not significantly impact the speed of training on CPU. Try to set it to 254 for the best possible quality.

Parameters
  • Command-line version: -x, --border-count; Python: border_count (alias: max_bin); R: border_count
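
For example, a sketch of GPU training with 32 borders per numerical feature (assumes a GPU-enabled CatBoost build and an available CUDA device; the synthetic data is illustrative):

import numpy as np
from catboost import CatBoostRegressor

train_data = np.random.rand(10000, 10)
train_labels = np.random.rand(10000)

# 32 splits per numerical feature instead of the GPU default of 128
model = CatBoostRegressor(task_type="GPU", border_count=32)
model.fit(train_data, train_labels, verbose=False)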

Reusing quantized datasets in Python

By default, the train and test datasets are quantized each time boosting is run.

If the dataset and quantization parameters are the same across multiple runs, the total wall clock time can be reduced by saving and reusing the quantized dataset. This optimization is applicable only for datasets without categorical features.

Example:

import numpy as np
from catboost import Pool, CatBoostRegressor


train_data = np.random.randint(1, 100, size=(10000, 10))
train_labels = np.random.randint(2, size=(10000))
quantized_dataset_path = 'quantized_dataset.bin'

# save quantized dataset
train_dataset = Pool(train_data, train_labels)
train_dataset.quantize()
train_dataset.save(quantized_dataset_path)

# fit multiple models without re-running dataset quantization
quantized_train_dataset = Pool(data='quantized://' + quantized_dataset_path)

model_depth_four = CatBoostRegressor(depth=4)
model_depth_four.fit(quantized_train_dataset)

model_depth_eight = CatBoostRegressor(depth=8)
model_depth_eight.fit(quantized_train_dataset)

Using pandas.Categorical type instead of object

Use the pandas.Categorical type instead of the object type to speed up preprocessing for datasets with categorical features by up to 200 times.
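
For example, a sketch of converting an object column to the category dtype before training (the column names and the synthetic data are illustrative):

import pandas as pd
from catboost import CatBoostClassifier

# an object-typed categorical column and a numerical column
df = pd.DataFrame({
    "color": ["red", "green", "blue", "yellow"] * 250,
    "value": [float(i) for i in range(1000)],
})
labels = [i % 2 for i in range(1000)]

# converting the column to pandas.Categorical speeds up the preprocessing
df["color"] = df["color"].astype("category")

model = CatBoostClassifier(cat_features=["color"])
model.fit(df, labels, verbose=False)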