select_threshold

Return the probability boundary required to achieve the specified false positive or false negative rate.

Method call format

select_threshold(model=None, 
                 data=None, 
                 curve=None, 
                 FPR=None,
                 FNR=None,
                 thread_count=-1)

Parameters

model

Description

The trained model.

Possible types

catboost.CatBoost

Default value

None

data

Description

A set of samples to build the ROC curve with.

Should not be used with the curve parameter.

Possible types

  • catboost.Pool
  • list of catboost.Pool

Default value

None

curve

Description

ROC curve points.

Should not be used with the data parameter.

Required if the data and model parameters are set to None.

It is strongly recommended to pass the output of the get_roc_curve function as the value of this parameter.

The input data must meet the following criteria:

  • The threshold values must not increase from one point to the next.
  • The fpr-tpr-threshold triplets must not repeat.
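If the curve is built by hand rather than taken from get_roc_curve, the two criteria above can be checked before the call. The following is a minimal sketch in plain Python; check_curve is a hypothetical helper written for this illustration and is not part of CatBoost.

```python
def check_curve(curve):
    """Hypothetical helper (not a CatBoost function).

    Returns a list of problems found in a (fpr, tpr, thresholds) tuple;
    an empty list means both criteria are met.
    """
    fpr, tpr, thresholds = curve
    problems = []

    if not (len(fpr) == len(tpr) == len(thresholds)):
        return ["arrays differ in length"]

    # Criterion 1: thresholds must not increase from one point to the next.
    for previous, current in zip(thresholds, thresholds[1:]):
        if current > previous:
            problems.append("thresholds increase")
            break

    # Criterion 2: no (fpr, tpr, threshold) triplet may appear twice.
    triplets = list(zip(fpr, tpr, thresholds))
    if len(set(triplets)) != len(triplets):
        problems.append("repeated triplet")

    return problems


# Made-up curve points used only for illustration.
good = ([0.0, 0.0, 0.5, 1.0], [0.0, 0.5, 1.0, 1.0], [1.0, 0.8, 0.4, 0.0])
bad = ([0.0, 0.5, 0.5], [0.0, 1.0, 1.0], [0.4, 0.8, 0.8])

print(check_curve(good))  # []
print(check_curve(bad))   # ['thresholds increase', 'repeated triplet']
```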

Possible types

tuple of three arrays (fpr, tpr, thresholds)

Default value

None

FPR

Description

Return the boundary at which the given FPR value is reached. Possible values of the parameter are in the range [0; 1].

Should not be used with the FNR parameter.

Possible types

float

Default value

None.

In this case, the condition used to select the boundary depends on the value of the FNR parameter:

  • None — The boundary should satisfy the FNR=FPR expression
  • float in the [0; 1] range — The boundary should satisfy the given FNR value

FNR

Description

Return the boundary at which the given FNR value is reached. Possible values of the parameter are in the range [0; 1].

Should not be used with the FPR parameter.

Possible types

float

Default value

None.

In this case, the condition used to select the boundary depends on the value of the FPR parameter:

  • None — The boundary should satisfy the FNR=FPR expression
  • float in the [0; 1] range — The boundary should satisfy the given FPR value
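The three selection modes described for FPR and FNR (a target FPR, a target FNR, or FNR=FPR when neither is set) can be illustrated on a small hand-made curve. This is a simplified sketch, not CatBoost's implementation: pick_threshold is a hypothetical function, it only returns thresholds that appear in the curve, and the real function may treat values between curve points differently. It relies on the identity FNR = 1 - TPR.

```python
def pick_threshold(curve, fpr_target=None, fnr_target=None):
    """Hypothetical illustration of the three modes (not CatBoost code).

    Assumes the points are ordered by non-increasing threshold, so FPR
    does not decrease and FNR does not increase along the curve.
    """
    fpr, tpr, thresholds = curve
    fnr = [1.0 - value for value in tpr]  # FNR = 1 - TPR

    if fpr_target is not None and fnr_target is not None:
        raise ValueError("set only one of fpr_target and fnr_target")

    if fpr_target is not None:
        # Lowest threshold whose FPR still fits within the target.
        fitting = [i for i, value in enumerate(fpr) if value <= fpr_target]
        if not fitting:
            raise ValueError("no curve point satisfies the FPR target")
        return thresholds[fitting[-1]]

    if fnr_target is not None:
        # Highest threshold whose FNR still fits within the target.
        fitting = [i for i, value in enumerate(fnr) if value <= fnr_target]
        if not fitting:
            raise ValueError("no curve point satisfies the FNR target")
        return thresholds[fitting[0]]

    # Neither target is set: take the point where FNR is closest to FPR.
    best = min(range(len(fpr)), key=lambda i: abs(fnr[i] - fpr[i]))
    return thresholds[best]


# Made-up curve points used only for illustration.
curve = ([0.0, 0.0, 0.25, 0.5, 1.0],   # fpr
         [0.0, 0.5, 0.75, 1.0, 1.0],   # tpr
         [1.0, 0.9, 0.6, 0.3, 0.0])    # thresholds

print(pick_threshold(curve, fpr_target=0.3))  # 0.6
print(pick_threshold(curve, fnr_target=0.1))  # 0.3
print(pick_threshold(curve))                  # 0.6
```

Setting both targets raises an error in this sketch, mirroring the rule that FPR and FNR should not be used together.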

thread_count

Description

The number of threads to use.

Optimizes the speed of execution. This parameter doesn't affect results.

Possible types

int

Default value

-1 (the number of threads is equal to the number of processor cores)

Type of return value

float

Usage examples

from catboost import CatBoostClassifier, Pool
from catboost.utils import get_roc_curve, select_threshold

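# Toy dataset: four objects with two numerical features each.
train_data = [[1, 4],
              [2, 5],
              [4, 3],
              [0, 4]]
train_labels = [1, 1, 0, 1]
catboost_pool = Pool(train_data, train_labels)

model = CatBoostClassifier(learning_rate=0.03)
model.fit(train_data, train_labels, verbose=False)

# Build the ROC curve points (fpr, tpr, thresholds) on the pool.
roc_curve_values = get_roc_curve(model, catboost_pool)

# Find the probability boundary at which the FPR equals 0.01.
boundary = select_threshold(model,
                            curve=roc_curve_values,
                            FPR=0.01)
print(boundary)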

Output:

0.506369291052