randomized_search

A simple randomized search on hyperparameters.

Unlike grid search, randomized search does not try every parameter value. Instead, a fixed number of parameter settings is sampled from the specified distributions. The n_iter parameter sets how many parameter settings are tried.

Note. After the search, the model is trained and ready to use (this assumes refit is left at its default value, True).

Method call format

randomized_search(param_distributions,
                  X,
                  y=None,
                  cv=3,
                  n_iter=10,
                  partition_random_seed=0,
                  calc_cv_statistics=True, 
                  search_by_train_test_split=True,
                  refit=True, 
                  shuffle=True, 
                  stratified=None, 
                  train_size=0.8, 
                  verbose=True)
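As an illustration of the call format above, the following is a minimal sketch rather than a reference implementation. The search space, the synthetic data, and the helper names run_search and demo are assumptions made for this example. The sketch also assumes that the method returns a dict whose 'params' entry holds the best parameters found.

```python
# Hypothetical search space; each list is sampled uniformly.
param_distributions = {
    "depth": [4, 6, 8],
    "learning_rate": [0.03, 0.1, 0.3],
    "l2_leaf_reg": [1, 3, 5, 7],
}


def run_search(X, y, n_iter=5):
    """Run a randomized search on a CatBoostClassifier (illustrative helper)."""
    # Imported lazily so the search space above can be inspected without catboost.
    from catboost import CatBoostClassifier

    model = CatBoostClassifier(iterations=50, verbose=False)
    result = model.randomized_search(
        param_distributions,
        X,
        y=y,
        cv=3,
        n_iter=n_iter,
        partition_random_seed=0,
        verbose=False,
    )
    return model, result


def demo():
    """Build a small synthetic binary classification dataset and run the search."""
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))            # two-dimensional feature matrix
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # one-dimensional numeric labels
    model, result = run_search(X, y)
    # Assumption: 'params' holds the best parameter setting found.
    print(result["params"])
    return model
```

Calling demo() requires catboost and numpy to be installed. With the default refit=True, the returned model can be used for prediction directly.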

Parameters

Parameter Possible types Description Default value

param_distributions

dict

Dictionary with parameter names (strings) as keys and distributions or lists of parameter settings to try as values. Distributions must provide an rvs method for sampling (such as those from scipy.stats.distributions).

If a list is given, it is sampled uniformly.

Required parameter
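The sampling rule described for param_distributions can be sketched in plain Python. This is an illustration of the idea only, not CatBoost's internal implementation: the UniformFloat class and the sample_settings function are hypothetical, and the sketch assumes rvs can be called without arguments.

```python
import random


class UniformFloat:
    """Toy distribution with an rvs() method (stand-in for a scipy.stats distribution)."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng

    def rvs(self):
        # Draw one value from the interval between low and high.
        return self.rng.uniform(self.low, self.high)


def sample_settings(param_distributions, n_iter, rng):
    """Draw n_iter parameter settings from the given distributions or lists."""
    settings = []
    for _ in range(n_iter):
        setting = {}
        for name, dist in param_distributions.items():
            if hasattr(dist, "rvs"):
                setting[name] = dist.rvs()               # distribution: call rvs()
            else:
                setting[name] = rng.choice(list(dist))   # list: uniform choice
        settings.append(setting)
    return settings


rng = random.Random(0)
space = {
    "learning_rate": UniformFloat(0.01, 0.3, rng),
    "depth": [4, 6, 8],
}
settings = sample_settings(space, n_iter=10, rng=rng)
```

Each element of settings is one candidate parameter setting. The number of elements corresponds to n_iter, regardless of how many values the lists contain.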

X

catboost.Pool

The input training dataset.

Note.

If a nontrivial value of the cat_features parameter is specified in the constructor of this class, CatBoost checks that the categorical feature indices given in the constructor match those specified in this Pool object.

  • numpy.array
  • pandas.DataFrame

The input training dataset in the form of a two-dimensional feature matrix.

Required parameter

y
  • numpy.array
  • pandas.Series

The target variables (in other words, the objects' label values) for the training dataset.

Must be in the form of a one-dimensional array. The type of data in the array depends on the machine learning task being solved:
  • Regression and ranking: Numeric values.
  • Binary classification: Numeric values.

    The interpretation of numeric values depends on the selected loss function:

    • Logloss: The value is considered a positive class if it is strictly greater than the value of the border parameter of the loss function. Otherwise, it is considered a negative class.
    • CrossEntropy: The value is interpreted as the probability that the dataset object belongs to the positive class. Possible values are in the range [0; 1].
  • Multiclassification: Integers or strings that represent the labels of the classes.

Note.

Do not use this parameter if the input training dataset (specified in the X parameter) is a catboost.Pool.
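The interpretation of numeric labels for binary classification, described in the list above, can be sketched as follows. The helper names are hypothetical, and the border value of 0.5 is an assumption used for illustration. The functions only mirror the rules stated above and are not part of the CatBoost API.

```python
def logloss_class(label, border=0.5):
    """Logloss rule: positive class (1) only if label is strictly greater than border."""
    return 1 if label > border else 0


def is_valid_cross_entropy_target(label):
    """CrossEntropy rule: the label is a probability, so it must lie in [0; 1]."""
    return 0.0 <= label <= 1.0


labels = [0, 0.5, 0.7, 1]
classes = [logloss_class(v) for v in labels]                # [0, 0, 1, 1]
valid = [is_valid_cross_entropy_target(v) for v in labels]  # all True
```

A label equal to the border falls into the negative class, because the comparison is strict.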