fit

Train a model.

Note. To train the model on GPU, set the task_type parameter in the class constructor to GPU. Training on GPU requires an NVIDIA driver of version 390.xx or higher.

Method call format

fit(X, 
    y=None, 
    cat_features=None, 
    pairs=None, 
    sample_weight=None, 
    group_id=None,
    group_weight=None,
    subgroup_id=None,
    pairs_weight=None,
    baseline=None, 
    use_best_model=None, 
    eval_set=None,
    verbose=None,
    logging_level=None, 
    plot=False,
    column_description=None,
    verbose_eval=None, 
    metric_period=None, 
    silent=None, 
    early_stopping_rounds=None,
    save_snapshot=None, 
    snapshot_file=None, 
    snapshot_interval=None,
    init_model=None)

Parameters

Some parameters duplicate those that can be specified in the constructor of the CatBoost class. In such cases, the values specified for the fit method take precedence. All other training parameters must be set in the constructor of the CatBoost class.

Parameter Possible types Description Default value Supported processing units
X catboost.Pool

The input training dataset.

Note.

If a nontrivial value of the cat_features parameter is specified in the constructor of this class, CatBoost checks that the categorical feature indices specified in the constructor parameters are equivalent to those specified in this Pool object.

Required parameter

CPU and GPU

  • list
  • numpy.array
  • pandas.DataFrame
  • pandas.Series
  • string

The input training dataset in the form of a two-dimensional feature matrix.

catboost.FeaturesData

The input training dataset.

Note.

If a nontrivial value of the cat_features parameter is specified in the constructor of this class, it is prohibited to pass objects of this type.

y
  • list
  • numpy.array
  • pandas.DataFrame
  • pandas.Series

The target variables (in other words, the objects' label values) for the training dataset.

Must be in the form of a one-dimensional array. The type of data in the array depends on the machine learning task being solved:
  • Regression and ranking: Numeric values.
  • Binary classification: Numeric values.

    The interpretation of numeric values depends on the selected loss function:

    • Logloss: The value is considered a positive class if it is strictly greater than the value of the border parameter of the loss function. Otherwise, it is considered a negative class.
    • CrossEntropy: The value is interpreted as the probability that the dataset object belongs to the positive class. Possible values are in the range [0; 1].
  • Multiclassification: Integers or strings that represent the labels of the classes.

Note.

Do not use this parameter if the input training dataset (specified in the X parameter) is of type catboost.Pool.

None

CPU and GPU

cat_features
  • list
  • numpy.array

A one-dimensional array of categorical columns indices.

Use it only if the X parameter is a two-dimensional feature matrix (has one of the following types: list, numpy.ndarray, pandas.DataFrame, pandas.Series).

Note.

The cat_features parameter can also be specified in the constructor of the class. If it is, CatBoost checks the equivalence of the cat_features parameter specified in this method and in the constructor of the class.

None (all features are considered numerical)

CPU and GPU

pairs
  • list
  • numpy.array
  • pandas.DataFrame

The pairs description in the form of a two-dimensional matrix of shape N by 2:

  • N is the number of pairs.
  • The first element of the pair is the zero-based index of the winner object from the input dataset for pairwise comparison.
  • The second element of the pair is the zero-based index of the loser object from the input dataset for pairwise comparison.

This information is used to calculate and optimize Pairwise metrics.

None

Pairwise metrics require pairs data. If this data is not provided explicitly by specifying this parameter, pairs are generated automatically in each group using object label values.

CPU and GPU

sample_weight
  • list
  • numpy.array
  • pandas.DataFrame
  • pandas.Series

The weight of each object in the input data in the form of one-dimensional array-like data.

By default, it is set to 1 for all objects.

None

CPU and GPU

group_id
  • list
  • numpy.array

Group identifiers for all input objects. Supported identifier types are:
  • int
  • string types (string or unicode for Python 2 and bytes or string for Python 3).

None

CPU

group_weight
  • list
  • numpy.array