grid_search

A simple grid search over specified parameter values for a model.

Note. After searching, the model is trained and ready to use.

Method call format

grid_search(param_grid,
            X,
            y=None,
            cv=3,
            partition_random_seed=0,
            calc_cv_statistics=True,
            search_by_train_test_split=True,
            refit=True,
            shuffle=True,
            stratified=None,
            train_size=0.8,
            verbose=True,
            plot=False)

Parameters

Parameter Possible types Description Default value

param_grid

  • dict
  • list

Dictionary with parameter names (strings) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grid spanned by each dictionary in the list is explored.

This enables searching over any sequence of parameter settings.

Required parameter
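To make the two accepted shapes of param_grid concrete, the following minimal sketch enumerates the combinations that a dictionary or a list of dictionaries describes. The helper expand_param_grid is hypothetical and written only for this illustration; it is not part of CatBoost and is not meant to mirror how grid_search enumerates candidates internally. The keys learning_rate, depth and l2_leaf_reg are sample parameter names.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Return every parameter combination described by a dict or a list of dicts.

    Hypothetical helper for illustration only; not a CatBoost function.
    """
    # A single dict is treated as a one-element list of grids.
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    combinations = []
    for grid in grids:
        names = sorted(grid)
        # Cartesian product of the value lists within this grid only.
        for values in product(*(grid[name] for name in names)):
            combinations.append(dict(zip(names, values)))
    return combinations


single_grid = {"learning_rate": [0.03, 0.1], "depth": [4, 6, 10]}
grid_list = [
    {"depth": [4, 6], "l2_leaf_reg": [1, 3]},
    {"learning_rate": [0.03, 0.1]},
]

single_combos = expand_param_grid(single_grid)
list_combos = expand_param_grid(grid_list)
print(len(single_combos))  # 2 * 3 = 6 combinations
print(len(list_combos))    # 2 * 2 + 2 = 6 combinations
```

With a list of dictionaries, each grid is expanded on its own, so the candidate counts add up across dictionaries instead of multiplying.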

X

catboost.Pool

The input training dataset.

Note.

If a nontrivial value of the cat_features parameter is specified in the constructor of this class, CatBoost checks that the categorical feature indices specified in the constructor match those specified in this Pool.

Required parameter

  • numpy.array
  • pandas.DataFrame

The input training dataset in the form of a two-dimensional feature matrix.

y
  • numpy.array
  • pandas.Series

The target variables (in other words, the objects' label values) for the training dataset.

Must be in the form of a one-dimensional array. The type of data in the array depends on the machine learning task being solved:
  • Regression and ranking — Numeric values.
  • Binary classification — Numeric values.

    The interpretation of numeric values depends on the selected loss function:

    • Logloss — The value is considered a positive class if it is strictly greater than the value of the border parameter of the loss function. Otherwise, it is considered a negative class.
    • CrossEntropy — The value is interpreted as the probability that the dataset object belongs to the positive class. Possible values are in the range [0; 1].
  • Multiclassification — Integers or strings that represent the class labels.
Note.

Do not use this parameter if the input training dataset (specified in the X parameter) type is catboost.Pool.

None
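The two interpretations of numeric labels for binary classification can be sketched as follows. Both helper functions are hypothetical and exist only for this illustration; they are not CatBoost functions. The border value of 0.5 is an assumed example value.

```python
def logloss_class(label, border):
    """Map a numeric label to 1 (positive) or 0 (negative).

    The comparison is strict: a label equal to the border is negative.
    """
    return 1 if label > border else 0


def check_cross_entropy_label(label):
    """Check that a CrossEntropy label is a probability in the range [0, 1]."""
    if not 0.0 <= label <= 1.0:
        raise ValueError(f"CrossEntropy label {label} is outside [0, 1]")
    return float(label)


labels = [0.0, 0.5, 0.7, 1.0]

# Logloss: every label is reduced to a class by comparing it with the border.
classes = [logloss_class(value, border=0.5) for value in labels]
print(classes)  # [0, 0, 1, 1]

# CrossEntropy: the label itself is kept as a probability of the positive class.
probabilities = [check_cross_entropy_label(value) for value in labels]
print(probabilities)  # [0.0, 0.5, 0.7, 1.0]
```

Note that the label 0.5 maps to the negative class under Logloss in this sketch, because the comparison with the border is strict.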
cv
  • int
  • scikit-learn splitter object
  • cross-validation generator
  • iterable

The cross-validation splitting strategy.

The interpretation of this parameter depends on the input data type:
  • None — Use the default three-fold cross-validation.
  • int — The number of folds in a (Stratified)KFold.
  • object — One of the scikit-learn Splitter Classes with the split method.
  • iterable — An iterable yielding train and test splits as arrays of indices.
3
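As an illustration of the iterable form of cv, the sketch below builds train and test index splits by hand. The generator kfold_index_splits is a hypothetical name used only here. It produces contiguous, unshuffled folds and uses plain Python lists where the description mentions arrays of indices, so treat it as a sketch under those assumptions, not as the splitting logic CatBoost applies.

```python
def kfold_index_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs for contiguous, unshuffled folds.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(kfold_index_splits(n_samples=10, n_folds=3))
for train, test in splits:
    print(len(train), test)
```

Each object index appears in exactly one test fold, and every pair covers the whole dataset. A sequence of pairs shaped like splits is the kind of value the iterable option describes.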

partition_random_seed

int

Use this as the seed value for random permutation of the data.

The permutation is performed before splitting the data for cross-validation.

Each seed generates unique data splits.