train

train(pool=None, 
      params=None, 
      dtrain=None, 
      logging_level=None, 
      verbose=None, 
      iterations=None,
      num_boost_round=None, 
      evals=None, 
      eval_set=None, 
      plot=None, 
      verbose_eval=None, 
      metric_period=None,
      early_stopping_rounds=None, 
      save_snapshot=None, 
      snapshot_file=None, 
      snapshot_interval=None,
      init_model=None)

Purpose

Train a model.

Note. Training on GPU requires NVIDIA driver version 390.xx or higher.

Parameters

Parameter Possible types Description Default value Supported processing units

pool

Alias: dtrain

catboost.Pool

The input training dataset.

Required parameter

CPU and GPU

params dict

The dictionary of parameters to start training with.

Required parameter

CPU and GPU
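A minimal sketch of a call that passes only the two required parameters plus a few optional ones. The toy dataset, the helper names (make_toy_data, run_training) and the parameter values are illustrative assumptions, not part of the reference. The catboost import is placed inside the function so that the data helper can be used without the package installed.

```python
import random


def make_toy_data(n_rows=100, seed=0):
    """Build a small synthetic binary-classification dataset (illustrative only)."""
    rng = random.Random(seed)
    features = [[rng.random(), rng.random()] for _ in range(n_rows)]
    labels = [1 if x0 + x1 > 1.0 else 0 for x0, x1 in features]
    return features, labels


# Training parameters passed through `params`; the values are arbitrary.
params = {"loss_function": "Logloss", "learning_rate": 0.1, "depth": 4}


def run_training():
    """Train a small model on the toy data and return it."""
    # Imported lazily: requires the catboost package to be installed.
    from catboost import Pool, train

    features, labels = make_toy_data()
    pool = Pool(data=features, label=labels)
    return train(pool=pool, params=params, iterations=50, logging_level="Silent")
```

Calling run_training() returns the trained model object. The same call can use dtrain instead of pool, since the two names are aliases.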

logging_level string

The logging level to output to stdout.

Possible values:
  • Silent — Do not output any logging information to stdout.

  • Verbose — Output the following data to stdout:

    • optimized metric
    • elapsed time of training
    • remaining time of training
  • Info — Output additional information and the number of trees.

  • Debug — Output debugging information.
Restriction. Should not be used with the verbose parameter.

None (corresponds to the Verbose logging level)

CPU and GPU

verbose

Alias: verbose_eval

  • bool
  • int

The purpose of this parameter depends on the type of the given value:

  • bool — Defines the logging level:
    • “True” corresponds to the Verbose logging level
    • “False” corresponds to the Silent logging level
  • int — Use the Verbose logging level and set the logging period to the value of this parameter.
Restriction. Do not use this parameter with the logging_level parameter.

1

CPU and GPU
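The rules for logging_level and verbose can be summarized in a short sketch. The function resolve_logging is hypothetical and only mirrors the wording of this reference; it is not the library's internal code. Using a period of 1 when only logging_level is given is an assumption.

```python
def resolve_logging(verbose=None, logging_level=None):
    """Return (logging level, logging period) following the documented rules."""
    if verbose is not None and logging_level is not None:
        # The two parameters are documented as mutually exclusive.
        raise ValueError("Use either verbose or logging_level, not both")
    if logging_level is not None:
        return logging_level, 1  # period of 1 is an assumption
    if verbose is None:
        return "Verbose", 1  # documented defaults
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(verbose, bool):
        return ("Verbose" if verbose else "Silent"), 1
    if isinstance(verbose, int):
        return "Verbose", verbose
    raise TypeError("verbose must be a bool or an int")
```

For example, resolve_logging(verbose=100) describes a run that logs at the Verbose level every 100 iterations.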

iterations

Alias: num_boost_round

int

The maximum number of trees that can be built when solving machine learning problems.

When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.

1000

CPU and GPU

eval_set

Alias: evals

  • catboost.Pool
  • list of catboost.Pool
  • tuple (x, y)
  • list of tuples (x, y)
  • string (path to the dataset file)
  • list of strings (paths to dataset files)
The validation dataset or datasets used for the following processes:
None

CPU and GPU

Note. Only a single validation dataset can be passed if training is performed on GPU.

plot bool

Plot the following information during training:
  • the metric values;
  • the custom loss values;
  • the time that has passed since training started;
  • the remaining time until the end of training.
This option can be used if training is performed in a Jupyter notebook.

False

CPU

metric_period int

How often, in iterations, the values of objectives and metrics are calculated. The value should be a positive integer.

Setting a value greater than 1 speeds up training, because the metrics are calculated less often.

Note. It is recommended to increase the value of this parameter to maintain training speed when training on GPU.

1

CPU and GPU
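To make the effect of metric_period concrete, the following sketch lists the iterations on which metrics would be calculated. The helper evaluated_iterations is hypothetical, and it assumes that counting starts at iteration 0 and that the final iteration is always evaluated; neither detail is stated in this reference.

```python
def evaluated_iterations(iterations, metric_period=1):
    """List the iteration indices on which metrics are calculated."""
    if metric_period < 1:
        raise ValueError("metric_period should be a positive integer")
    steps = list(range(0, iterations, metric_period))
    last = iterations - 1
    # Assumption: the final iteration is evaluated even if it is off-period.
    if iterations > 0 and steps[-1] != last:
        steps.append(last)
    return steps
```

With the default 1000 iterations, a metric_period of 100 reduces the number of metric calculations from 1000 to 11 under these assumptions.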

early_stopping_rounds int

Sets the overfitting detector type to Iter and stops the training after the specified number of iterations since the iteration with the optimal metric value.

False

CPU and GPU
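The stopping rule can be illustrated with a small simulation over a list of validation metric values. This is a sketch of the rule as worded above, assuming a metric where lower is better; early_stop_iteration and train_with_early_stopping are hypothetical helper names, and the value 20 is arbitrary.

```python
def early_stop_iteration(metric_values, early_stopping_rounds):
    """Return (stop iteration, best iteration) for a lower-is-better metric."""
    best_value = float("inf")
    best_iter = -1
    for i, value in enumerate(metric_values):
        if value < best_value:
            best_value, best_iter = value, i
        elif i - best_iter >= early_stopping_rounds:
            # No improvement for the specified number of iterations.
            return i, best_iter
    return len(metric_values) - 1, best_iter


def train_with_early_stopping(train_pool, validation_pool, params):
    """Pass a validation set so that the detector has a metric to monitor."""
    # Imported lazily: requires the catboost package to be installed.
    from catboost import train

    return train(
        pool=train_pool,
        params=params,
        eval_set=validation_pool,
        early_stopping_rounds=20,
    )
```

In the simulation, a run whose best value occurs at iteration 2 stops at iteration 5 when early_stopping_rounds is 3.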

save_snapshot bool

Enable snapshotting for restoring the training progress after an interruption. If enabled, the default period for making snapshots is 600 seconds. Use the snapshot_interval parameter to change this period.

Note. This parameter is not supported in the params parameter of the cv function.

None

CPU and GPU

snapshot_file string

The name of the file to save the training progress information in. This file is used for recovering training after an interruption.

Depending on whether the specified file exists in the file system:
  • Missing — Write information about training progress to the specified file.
  • Exists — Load data from the specified file and continue training from where it left off.
Note. This parameter is not supported in the params parameter of the cv function.

experiment...

CPU and GPU

snapshot_interval int

The interval between saving snapshots in seconds.

The first snapshot is taken after the specified number of seconds since the start of training. Every subsequent snapshot is taken after the specified number of seconds since the previous one. The last snapshot is taken at the end of the training.

Note. This parameter is not supported in the params parameter of the cv function.

600

CPU and GPU
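The snapshot schedule described above can be sketched as follows. The helper snapshot_times is hypothetical and works in whole seconds; the file name and the 300-second interval in train_with_snapshots are arbitrary illustrative values.

```python
def snapshot_times(training_seconds, snapshot_interval=600):
    """Return the times (seconds since start) at which snapshots are taken."""
    if snapshot_interval <= 0:
        raise ValueError("snapshot_interval should be positive")
    # One snapshot per full interval elapsed since the start of training.
    times = list(range(snapshot_interval, training_seconds + 1, snapshot_interval))
    # The last snapshot is taken at the end of the training.
    if not times or times[-1] != training_seconds:
        times.append(training_seconds)
    return times


def train_with_snapshots(pool, params):
    """Enable snapshotting so that an interrupted run can be resumed."""
    # Imported lazily: requires the catboost package to be installed.
    from catboost import train

    return train(
        pool=pool,
        params=params,
        save_snapshot=True,
        snapshot_file="training_snapshot.cbsnapshot",  # hypothetical file name
        snapshot_interval=300,
    )
```

Running train_with_snapshots again with the same snapshot_file after an interruption continues training from where it left off, as described for the Exists case.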

init_model

  • model object (the initial model object)
  • string (the path to the input file that contains the initial model)

The model to continue learning from.

Note. The initial model must have the same problem type as the one being solved in the current training (binary classification, multiclassification or regression/ranking).

None (incremental learning is not used)

CPU
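A sketch of incremental learning with init_model, assuming both stages solve the same regression problem. The function name, the parameter values and the iteration counts are illustrative assumptions; the second stage reuses the same loss function so that the problem type matches.

```python
# Both stages use the same loss function so that the problem type matches.
first_stage_params = {"loss_function": "RMSE", "learning_rate": 0.1}
second_stage_params = dict(first_stage_params)


def train_in_two_stages(first_pool, second_pool):
    """Train a base model, then continue learning from it on CPU."""
    # Imported lazily: requires the catboost package to be installed.
    from catboost import train

    base_model = train(pool=first_pool, params=first_stage_params, iterations=100)
    # init_model also accepts a string path to a saved model file.
    return train(
        pool=second_pool,
        params=second_stage_params,
        iterations=50,
        init_model=base_model,
    )
```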
