predict
Apply the model to the given dataset.
Note
The model prediction results will be correct only if the data parameter with feature values contains all the features used in the model. Typically, the order of these features must match the order of the corresponding columns provided during training. However, if feature names are provided both during training and when applying the model, features can be matched by name instead of by column order. Feature names can be specified if the data parameter has one of the following types:
- FeaturesData
- catboost.Pool
- pandas.DataFrame (in this case, feature names are taken from column names)
Method call format
predict(data,
prediction_type=None,
ntree_start=0,
ntree_end=0,
thread_count=-1,
verbose=None)
Parameters
data
Description
Feature values data.
The format depends on the number of input objects:
- Multiple: matrix-like data of shape (object_count, feature_count)
- Single: an array
Possible types
For multiple objects:
- scipy.sparse.spmatrix (all subclasses except dia_matrix)
- catboost.Pool
- list of lists
- numpy.ndarray of shape (object_count, feature_count)
- pandas.DataFrame
- pandas.SparseDataFrame
- pandas.Series
- catboost.FeaturesData
For a single object:
- list of feature values
- one-dimensional numpy.ndarray with feature values
Default value
Required parameter
prediction_type
Description
The required prediction type.
Supported prediction types:
- Probability
- Class
- RawFormulaVal
- Exponent
- LogProbability
Possible types
string
Default value
None (Exponent for Poisson and Tweedie, RawFormulaVal for all other loss functions)
ntree_start
Description
To reduce the number of trees to use when the model is applied or the metrics are calculated, set the range of the tree indices to [ntree_start; ntree_end).
This parameter defines the index of the first tree to be used when applying the model or calculating the metrics (the inclusive left border of the range). Indices are zero-based.
Possible types
int
Default value
0
ntree_end
Description
To reduce the number of trees to use when the model is applied or the metrics are calculated, set the range of the tree indices to [ntree_start; ntree_end).
This parameter defines the index of the first tree not to be used when applying the model or calculating the metrics (the exclusive right border of the range). Indices are zero-based.
Possible types
int
Default value
0 (the index of the last tree to use is equal to the number of trees in the model minus one)
thread_count
Description
The number of threads to use when calculating predictions.
Optimizes the speed of execution. This parameter doesn't affect results.
Possible types
int
Default value
-1 (the number of threads is equal to the number of processor cores)
verbose
Description
Output the measured evaluation metric to stderr.
Possible types
bool
Default value
None
Return value
Predictions for the given dataset.
The return value type depends on the number of input objects:
- Single object: the returned value depends on the value of the prediction_type parameter:
  - RawFormulaVal: raw formula value.
  - Class: class label.
  - Probability: one-dimensional numpy.ndarray with the probability for every class.
- Multiple objects: the returned value depends on the value of the prediction_type parameter:
  - RawFormulaVal: one-dimensional numpy.ndarray of raw formula values (one for each object).
  - Class: one-dimensional numpy.ndarray of class labels (one for each object).
  - Probability: two-dimensional numpy.ndarray of shape (number_of_objects, number_of_classes) with the probability for every class for each object.