train

train(pool=None,
      params=None,
      dtrain=None,
      logging_level=None,
      verbose=None,
      iterations=None,
      num_boost_round=None,
      evals=None,
      eval_set=None,
      plot=None,
      verbose_eval=None,
      metric_period=None,
      early_stopping_rounds=None,
      save_snapshot=None,
      snapshot_file=None,
      snapshot_interval=None,
      init_model=None)

Purpose

Train a model.

Note

Training or inference on a CUDA-enabled GPU requires NVIDIA driver version 450.80.02 or higher.

Parameters

pool

Alias: dtrain

Description

The input training dataset.

Possible types

catboost.Pool

Default value

Required parameter

Supported processing units

CPU and GPU

params

Description

The dictionary of parameters to start training with.

Possible types

dict

Default value

Required parameter

Supported processing units

CPU and GPU

logging_level

Description

The logging level to output to stdout.

Possible values:

  • Silent — Do not output any logging information to stdout.

  • Verbose — Output the following data to stdout:

    • optimized metric
    • elapsed time of training
    • remaining time of training
  • Info — Output additional information and the number of trees.

  • Debug — Output debugging information.

Alert

Do not use this parameter with the verbose parameter.

Possible types

string

Default value

None (corresponds to the Verbose logging level)

Supported processing units

CPU and GPU

verbose

Alias: verbose_eval

Description

The purpose of this parameter depends on the type of the given value:

  • bool — Defines the logging level:

    • True corresponds to the Verbose logging level
    • False corresponds to the Silent logging level
  • int — Use the Verbose logging level and set the logging period to the value of this parameter.

Alert

Do not use this parameter with the logging_level parameter.

Possible types

  • bool
  • int

Default value

1

Supported processing units

CPU and GPU
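The type-dependent behaviour described above can be sketched as a small helper. This is an illustration of the documented mapping only: resolve_verbose is a hypothetical name, it is not part of CatBoost, and the handling of non-positive integers is a choice made for this sketch.

```python
def resolve_verbose(verbose):
    """Map a `verbose` value to (logging_level, logging_period) as described above."""
    # bool must be checked first: in Python, bool is a subclass of int.
    if isinstance(verbose, bool):
        return ("Verbose" if verbose else "Silent", None)
    if isinstance(verbose, int):
        if verbose <= 0:
            # Not covered by the description above; this sketch simply rejects it.
            raise ValueError("this sketch only handles positive periods")
        return ("Verbose", verbose)
    raise TypeError("verbose must be bool or int")


print(resolve_verbose(True))   # ('Verbose', None)
print(resolve_verbose(False))  # ('Silent', None)
print(resolve_verbose(50))     # ('Verbose', 50)
```

In an actual call this corresponds to passing, for example, verbose=False to suppress output or verbose=50 to log every 50th iteration.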

iterations

Alias: num_boost_round

Description

The maximum number of trees that can be built when solving machine learning problems.

When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.

Possible types

int

Default value

1000

Supported processing units

CPU and GPU

eval_set

Alias: evals

Description

The validation dataset or datasets used for the following processes:

  • overfitting detector
  • best iteration selection
  • monitoring metrics' changes

Possible types

  • catboost.Pool
  • list of catboost.Pool
  • tuple (X, y)
  • list of tuples (X, y)
  • string (path to the dataset file)
  • list of strings (paths to dataset files)

Default value

None

Supported processing units

CPU and GPU

Note

Only a single validation dataset can be input if the training is performed on GPU.

plot

Description

Plot the following information during training:

  • the metric values;
  • the custom loss values;
  • the loss function change during feature selection;
  • the time elapsed since training started;
  • the remaining time until the end of training.

This option can be used if training is performed in a Jupyter notebook.

Possible types

bool

Default value

False

Supported processing units

CPU

metric_period

Description

The interval, in iterations, at which the values of objectives and metrics are calculated. The value must be a positive integer.

Setting this parameter to a value greater than 1 speeds up training.

Note

It is recommended to increase the value of this parameter to maintain training speed if a GPU processing unit type is used.

Possible types

int

Default value

1

Supported processing units

CPU and GPU

early_stopping_rounds

Description

Sets the overfitting detector type to Iter and stops the training after the specified number of iterations since the iteration with the optimal metric value.

Possible types

int

Default value

False

Supported processing units

CPU and GPU
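The stopping rule described above can be illustrated with a simplified sketch in plain Python. This is not CatBoost's implementation: stop_iteration is a hypothetical helper, the metric values are made up, and the sketch assumes that a lower metric value is better.

```python
def stop_iteration(metric_values, early_stopping_rounds):
    """Return (stop_index, best_index) under a simplified 'Iter' rule.

    Training stops once `early_stopping_rounds` iterations have passed
    since the iteration with the best (lowest) metric value.
    """
    best_index = 0
    for i, value in enumerate(metric_values):
        if value < metric_values[best_index]:
            best_index = i
        elif i - best_index >= early_stopping_rounds:
            return i, best_index
    # The rule never fired: training ran through all iterations.
    return len(metric_values) - 1, best_index


# Made-up validation losses: the best value so far is at index 2.
losses = [0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.54]
print(stop_iteration(losses, early_stopping_rounds=3))  # (5, 2)
```

In the example the later improvement at index 6 is never reached, because three iterations without improvement have already passed at index 5. In a real call the rule needs a validation dataset to measure the metric on, so early_stopping_rounds is typically passed together with eval_set.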

save_snapshot

Description

Enable snapshotting for restoring the training progress after an interruption. If enabled, the default period for making snapshots is 600 seconds. Use the snapshot_interval parameter to change this period.

Note

This parameter is not supported in the params parameter of the cv function.

Possible types

bool

Default value

None

Supported processing units

CPU and GPU

snapshot_file

Description

The name of the file to save the training progress information in. This file is used for recovering training after an interruption.

Depending on whether the specified file exists in the file system:

  • Missing — Write information about training progress to the specified file.
  • Exists — Load data from the specified file and continue training from where it left off.

Note

This parameter is not supported in the params parameter of the cv function.

Possible types

string

Default value

experiment.cbsnapshot

Supported processing units

CPU and GPU

snapshot_interval

Description

The interval between saving snapshots in seconds.

The first snapshot is taken after the specified number of seconds since the start of training. Every subsequent snapshot is taken after the specified number of seconds since the previous one. The last snapshot is taken at the end of the training.

Note

This parameter is not supported in the params parameter of the cv function.

Possible types

int

Default value

600

Supported processing units

CPU and GPU

init_model

Description

The model to continue learning from.

Note

The initial model must have the same problem type as the one being solved in the current training (binary classification, multiclassification or regression/ranking).

Possible types

  • catboost.CatBoost, catboost.CatBoostClassifier, catboost.CatBoostRegressor
  • string (the path to the input file that contains the initial model)

Default value

None (incremental learning is not used)

Supported processing units

CPU