predict

Apply the model to the given dataset.

Note.
The model prediction results will be correct only if the data parameter with feature values contains all the features used in the model. Typically, the order of these features must match the order of the corresponding columns provided during training. But if feature names are provided both during training and when applying the model, features can be matched by name instead of by column order. Feature names can be specified if the data parameter has one of the following types:

Method call format

predict(data,
        prediction_type='RawFormulaVal',
        ntree_start=0,
        ntree_end=0,
        thread_count=-1,
        verbose=None)
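For a binary classifier, the three supported prediction_type values describe the same per-object score in different forms. The pure-Python sketch below shows how they relate, assuming a model trained with the Logloss loss function on labels 0 and 1. `sigmoid` and `convert_raw` are hypothetical helpers written for illustration; they are not part of the CatBoost API, and multiclass models (which normalize raw scores with a softmax instead) are not covered.

```python
import math
from typing import List, Sequence, Tuple, Union


def sigmoid(x: float) -> float:
    # Numerically stable logistic function: maps a log-odds value to (0, 1).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def convert_raw(
    raw: Sequence[float], prediction_type: str = "RawFormulaVal"
) -> Union[List[float], List[Tuple[float, float]], List[int]]:
    """Hypothetical helper, not CatBoost code.

    Assumes binary classification with Logloss and labels 0/1, so each raw
    score is the log-odds of class 1.
    """
    if prediction_type == "RawFormulaVal":
        return list(raw)
    if prediction_type == "Probability":
        # One pair per object: (P(class 0), P(class 1)).
        return [(1.0 - sigmoid(r), sigmoid(r)) for r in raw]
    if prediction_type == "Class":
        # Class 1 when the log-odds is positive, i.e. probability above 0.5.
        return [1 if r > 0 else 0 for r in raw]
    raise ValueError(f"Unsupported prediction_type: {prediction_type}")


raw_scores = [-2.0, 0.5, 1.5]
print(convert_raw(raw_scores, "Probability"))
print(convert_raw(raw_scores, "Class"))  # [0, 1, 1]
```

Because Probability and Class are derived from RawFormulaVal under these assumptions, requesting the raw values and converting them yourself gives the same information; the prediction_type parameter simply performs the conversion for you.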

Parameters

data
  Possible types:
    For multiple objects:
      • catboost.Pool
      • list of lists
      • numpy.ndarray of shape (object_count, feature_count)
      • pandas.DataFrame
      • pandas.Series
    For a single object:
      • list of feature values
      • one-dimensional numpy.ndarray with feature values
  Description: Feature values data. The format depends on the number of input objects:
      • Multiple: matrix-like data of shape (object_count, feature_count)
      • Single: an array
  Default value: Required parameter

prediction_type
  Possible types: string
  Description: The required prediction type. Supported prediction types:
      • Probability
      • Class
      • RawFormulaVal
  Default value: RawFormulaVal

ntree_start
  Possible types: int
  Description: To reduce the number of trees used when the model is applied, set the range of tree indices to [ntree_start; ntree_end). This parameter defines the index of the first tree to be used when applying the model (the inclusive left border of the range). Indices are zero-based.
  Default value: 0

ntree_end
  Possible types: int
  Description: To reduce the number of trees used when the model is applied, set the range of tree indices to [ntree_start; ntree_end). This parameter defines the index of the first tree not to be used when applying the model (the exclusive right border of the range). Indices are zero-based.
  Default value: 0 (the index of the last tree to use equals the number of trees in the model minus one)

thread_count
  Possible types: int
  Description: The number of threads to use when applying the model. Optimizes the speed of execution. This parameter doesn't affect results.
  Default value: -1 (the number of threads is equal to the number of processor cores)

verbose
  Possible types: bool
  Description: Output the measured evaluation metric to stderr.
  Default value: None
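The half-open [ntree_start; ntree_end) range and the special meaning of ntree_end=0 can be sketched in a few lines of Python. `selected_tree_indices` and `raw_formula_val` are hypothetical helpers for illustration, not CatBoost functions, and the toy raw score is simply the sum of one leaf value per tree for a single object, ignoring any additional terms a real model may apply.

```python
from typing import List


def selected_tree_indices(tree_count: int, ntree_start: int = 0, ntree_end: int = 0) -> List[int]:
    """Hypothetical helper: zero-based indices of the trees in [ntree_start; ntree_end).

    Following the defaults described above, ntree_end=0 means "up to and
    including the last tree" (index tree_count - 1).
    """
    end = tree_count if ntree_end == 0 else ntree_end
    if not 0 <= ntree_start <= end <= tree_count:
        raise ValueError("expected 0 <= ntree_start <= ntree_end <= tree_count")
    return list(range(ntree_start, end))


def raw_formula_val(leaf_values: List[float], ntree_start: int = 0, ntree_end: int = 0) -> float:
    # Toy model (assumption): leaf_values[i] is the value of the leaf the
    # object falls into in tree i; the raw score sums the selected trees only.
    indices = selected_tree_indices(len(leaf_values), ntree_start, ntree_end)
    return sum(leaf_values[i] for i in indices)


leaves = [0.5, 0.25, -0.125, 0.0625]
print(selected_tree_indices(5, 1, 3))     # [1, 2]
print(raw_formula_val(leaves))            # 0.6875 (all four trees)
print(raw_formula_val(leaves, ntree_end=2))  # 0.75 (trees 0 and 1 only)
```

Limiting ntree_end in this way is a simple means of inspecting how predictions change as more trees are included, without retraining the model.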