train
train(pool=None,
params=None,
dtrain=None,
logging_level=None,
verbose=None,
iterations=None,
num_boost_round=None,
evals=None,
eval_set=None,
plot=None,
verbose_eval=None,
metric_period=None,
early_stopping_rounds=None,
save_snapshot=None,
snapshot_file=None,
snapshot_interval=None,
init_model=None)
Purpose
Train a model.
Note
Training or inference on CUDA-enabled GPUs requires NVIDIA driver version 450.80.02 or higher.
Parameters
pool
Alias: dtrain
Description
The input training dataset.
Possible types
catboost.Pool
Default value
Required parameter
Supported processing units
CPU and GPU
params
Description
The training parameters, given as a dictionary of parameter names and values.
Possible types
dict
Default value
Required parameter
Supported processing units
CPU and GPU
logging_level
Description
The logging level to output to stdout.
Possible values:
- Silent: Do not output any logging information to stdout.
- Verbose: Output the following data to stdout:
  - optimized metric
  - elapsed time of training
  - remaining time of training
- Info: Output additional information and the number of trees.
- Debug: Output debugging information.
Alert
Should not be used with the verbose parameter.
Possible types
string
Default value
None (corresponds to the Verbose logging level)
Supported processing units
CPU and GPU
verbose
Alias: verbose_eval
Description
The purpose of this parameter depends on the type of the given value:
- bool: Defines the logging level:
  - True corresponds to the Verbose logging level.
  - False corresponds to the Silent logging level.
- int: Use the Verbose logging level and set the logging period to the value of this parameter.
Alert
Do not use this parameter with the logging_level parameter.
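The rules for logging_level and verbose can be modeled in plain Python to make the mapping concrete. This is a sketch, not CatBoost code: resolve_logging is a hypothetical helper that only encodes what is described above (a bool selects Verbose or Silent, an int selects Verbose with that logging period, and the two parameters must not be combined). It covers only positive integer periods; how CatBoost treats 0 or negative values is not modeled here.

```python
from typing import Optional, Tuple, Union

LEVELS = ("Silent", "Verbose", "Info", "Debug")


def resolve_logging(
    logging_level: Optional[str] = None,
    verbose: Optional[Union[bool, int]] = None,
) -> Tuple[str, Optional[int]]:
    """Return (logging level, logging period) under the documented rules.

    Hypothetical helper, not part of CatBoost. A period of None means the
    helper does not set one.
    """
    if logging_level is not None and verbose is not None:
        # The documentation says the two parameters must not be combined.
        raise ValueError("set either logging_level or verbose, not both")
    if verbose is not None:
        # bool is a subclass of int in Python, so it must be checked first.
        if isinstance(verbose, bool):
            return ("Verbose" if verbose else "Silent", None)
        if isinstance(verbose, int):
            # This sketch only models positive periods, as described above.
            if verbose <= 0:
                raise ValueError("this sketch only models positive periods")
            return ("Verbose", verbose)
        raise TypeError("verbose must be a bool or an int")
    if logging_level is not None:
        if logging_level not in LEVELS:
            raise ValueError(f"unknown logging level: {logging_level!r}")
        return (logging_level, None)
    # Neither parameter given: the documented default is the Verbose level.
    return ("Verbose", None)


print(resolve_logging(verbose=100))  # ('Verbose', 100)
```

Checking bool before int matters because isinstance(True, int) is True in Python; with the checks reversed, verbose=True would be read as a logging period of 1.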
Possible types
- bool
- int
Default value
1
Supported processing units
CPU and GPU
iterations
Alias: num_boost_round
Description
The maximum number of trees that can be built when solving machine learning problems.
If other parameters that limit the number of iterations are used (for example, early_stopping_rounds), the final number of trees may be less than the number specified in this parameter.
Possible types
int
Default value
1000
Supported processing units
CPU and GPU
eval_set
Alias: evals
Description
The validation dataset or datasets used for the following processes:
- overfitting detector
- best iteration selection
- monitoring metrics' changes
Possible types
- catboost.Pool
- list of catboost.Pool
- tuple (X, y)
- list of tuples (X, y)
- string (path to the dataset file)
- list of strings (paths to dataset files)
Default value
None
Supported processing units
CPU and GPU
Note
Only a single validation dataset can be passed if training is performed on GPU.
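The accepted eval_set forms fall into two groups: a single dataset (a Pool, an (X, y) tuple, or a file path) and a list of such datasets. The sketch below is hypothetical and is not how CatBoost parses the argument: normalize_eval_set turns either group into a list and applies the single-dataset limit for GPU. The task_type argument stands in for the processing unit, which in CatBoost is chosen through the task_type training parameter; strings and tuples in the example are placeholders for real data.

```python
from typing import Any, List


def normalize_eval_set(eval_set: Any, task_type: str = "CPU") -> List[Any]:
    """Hypothetical helper, not part of CatBoost."""
    if eval_set is None:
        datasets: List[Any] = []
    elif isinstance(eval_set, list):
        # A list of Pool objects, (X, y) tuples, or file paths.
        datasets = list(eval_set)
    else:
        # A single Pool, a single (X, y) tuple, or a single file path.
        datasets = [eval_set]
    if task_type == "GPU" and len(datasets) > 1:
        raise ValueError("only a single validation dataset is supported on GPU")
    return datasets


print(normalize_eval_set(["val_a.tsv", "val_b.tsv"]))  # ['val_a.tsv', 'val_b.tsv']
```

Treating a tuple as one dataset and a list as many is what keeps (X, y) from being misread as two separate validation sets.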
plot
Description
Plot the following information during training:
- the metric values;
- the custom loss values;
- the loss function change during feature selection;
- the time that has passed since training started;
- the remaining time until the end of training.
This option can be used if training is performed in a Jupyter notebook.
Possible types
bool
Default value
False
Supported processing units
CPU
metric_period
Description
The frequency, in iterations, at which the values of objectives and metrics are calculated. The value must be a positive integer.
Calculating these values less often (a value greater than 1) speeds up training.
Note
If training is performed on GPU, it is recommended to increase the value of this parameter to maintain training speed.
Possible types
int
Default value
1
Supported processing units
CPU and GPU
early_stopping_rounds
Description
Sets the overfitting detector type to Iter and stops training once the specified number of iterations has passed since the iteration with the optimal metric value.
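The Iter detector behaves like patience-based early stopping. The following is a simplified model, not CatBoost's implementation. It assumes the metric is minimized, that one validation value is recorded per iteration, and that training stops at the first iteration lying early_stopping_rounds iterations past the best one. iter_detector is a hypothetical name.

```python
from typing import List, Tuple


def iter_detector(
    eval_losses: List[float], early_stopping_rounds: int
) -> Tuple[int, int]:
    """Return (best_iteration, stop_iteration) for a loss to be minimized.

    Simplified illustration of an Iter-style overfitting detector.
    """
    best_iter = 0
    best_loss = eval_losses[0]
    for i, loss in enumerate(eval_losses):
        if loss < best_loss:
            # New optimum: reset the count of iterations without improvement.
            best_iter, best_loss = i, loss
        elif i - best_iter >= early_stopping_rounds:
            # No improvement for early_stopping_rounds iterations: stop here.
            return best_iter, i
    # The detector never fired: training runs through the last iteration.
    return best_iter, len(eval_losses) - 1


losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.64, 0.66]
print(iter_detector(losses, early_stopping_rounds=3))  # (2, 5)
```

In the example, the best value is reached at iteration 2 and training stops at iteration 5, well before the eight iterations that were available. This is why the number of trees actually built can be smaller than iterations.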
Possible types
int
Default value
False
Supported processing units
CPU and GPU
save_snapshot
Description
Enable snapshotting for restoring training progress after an interruption. If enabled, the default period for making snapshots is 600 seconds. Use the snapshot_interval parameter to change this period.
Note
This parameter is not supported in the params parameter of the cv function.
Possible types
bool
Default value
None
Supported processing units
CPU and GPU
snapshot_file
Description
The name of the file to save the training progress information in. This file is used for recovering training after an interruption.
Depending on whether the specified file exists in the file system:
- Missing: Write information about training progress to the specified file.
- Exists: Load data from the specified file and continue training from where it left off.
Note
This parameter is not supported in the params parameter of the cv function.
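The file-existence rule for snapshot_file amounts to a single check. The sketch below is hypothetical (snapshot_action is not a CatBoost function); it only restates the documented Missing/Exists behavior, using the default file name listed for this parameter.

```python
import os


def snapshot_action(snapshot_file: str = "experiment.cbsnapshot") -> str:
    """Hypothetical helper, not part of CatBoost.

    Mirrors the documented rule: if the snapshot file already exists,
    training resumes from it; otherwise progress is written to it.
    """
    if os.path.exists(snapshot_file):
        return "resume"  # load saved progress and continue training
    return "write"       # train from the start and save progress to this file
```

A practical consequence: rerunning a script with snapshotting enabled and an unchanged snapshot_file continues the earlier run rather than starting over, so a stale snapshot file should be removed or renamed when a fresh run is intended.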
Possible types
string
Default value
experiment.cbsnapshot
Supported processing units
CPU and GPU
snapshot_interval
Description
The interval between saving snapshots in seconds.
The first snapshot is taken after the specified number of seconds since the start of training. Every subsequent snapshot is taken after the specified number of seconds since the previous one. The last snapshot is taken at the end of the training.
Note
This parameter is not supported in the params parameter of the cv function.
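The snapshot schedule described above can be worked out as simple arithmetic. The sketch below assumes an idealized clock: in practice snapshots are written between iterations, so real times are approximate. snapshot_times is a hypothetical helper, not part of CatBoost.

```python
from typing import List


def snapshot_times(training_seconds: float, snapshot_interval: float = 600) -> List[float]:
    """Seconds since the start of training at which snapshots are taken.

    Hypothetical helper: first snapshot after one interval, each later one
    one interval after the previous, and a final one at the end of training.
    """
    if snapshot_interval <= 0:
        raise ValueError("snapshot_interval must be positive")
    times: List[float] = []
    t = snapshot_interval
    while t < training_seconds:
        times.append(t)  # periodic snapshot
        t += snapshot_interval
    times.append(training_seconds)  # final snapshot at the end of training
    return times


print(snapshot_times(1500))  # [600, 1200, 1500]
```

With the default interval of 600 seconds, a 25-minute run produces snapshots at 10 and 20 minutes plus one at the end; a run shorter than one interval gets only the final snapshot.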
Possible types
int
Default value
600
Supported processing units
CPU and GPU
init_model
Description
The model to continue learning from.
Note
The initial model must have the same problem type as the one being solved in the current training (binary classification, multiclassification or regression/ranking).
Possible types
- catboost.CatBoost, catboost.CatBoostClassifier, catboost.CatBoostRegressor: The initial model object.
- string: The path to the input file that contains the initial model.
Default value
None (incremental learning is not used)
Supported processing units
CPU