eval_metric

Calculate the specified metric on raw approximated values (the raw predictions returned by the model) and label values.

Method call format

eval_metric(label,
            approx,
            metric,
            weight=None,
            group_id=None,
            subgroup_id=None,
            pairs=None,
            thread_count=-1)

Parameters

label

Description

A list of target variables (in other words, the label values of the objects).

Possible types

  • list
  • numpy.ndarray
  • pandas.DataFrame
  • pandas.Series

Default value

Required parameter

approx

Description

A list of approximated values (raw predictions) for all input objects.

Possible types

  • list
  • numpy.ndarray
  • pandas.DataFrame
  • pandas.Series

Default value

Required parameter

metric

Description

The evaluation metric to calculate.

Supported metrics
  • RMSE

  • Logloss

  • MAE

  • CrossEntropy

  • Quantile

  • LogLinQuantile

  • Lq

  • MultiRMSE

  • MultiClass

  • MultiClassOneVsAll

  • MultiLogloss

  • MultiCrossEntropy

  • MAPE

  • Poisson

  • PairLogit

  • PairLogitPairwise

  • QueryRMSE

  • QuerySoftMax

  • Tweedie

  • SMAPE

  • Recall

  • Precision

  • F

  • F1

  • TotalF1

  • Accuracy

  • BalancedAccuracy

  • BalancedErrorRate

  • Kappa

  • WKappa

  • LogLikelihoodOfPrediction

  • AUC

  • QueryAUC

  • R2

  • FairLoss

  • NumErrors

  • MCC

  • BrierScore

  • HingeLoss

  • HammingLoss

  • ZeroOneLoss

  • MSLE

  • MedianAbsoluteError

  • Cox

  • Huber

  • Expectile

  • PairAccuracy

  • AverageGain

  • PFound

  • NDCG

  • DCG

  • FilteredDCG

  • NormalizedGini

  • PrecisionAt

  • RecallAt

  • MAP

  • CtrFactor

  • YetiRank

  • YetiRankPairwise

  • StochasticFilter

  • StochasticRank

  • LambdaMart

Possible types

string

Default value

Required parameter
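
Many metrics accept additional parameters appended to the metric name after a colon. A minimal sketch (alpha=0.9 is an arbitrary illustrative value; the exact set of supported parameters depends on the metric):

from catboost.utils import eval_metric

labels = [0.2, -1, 0.4]
predictions = [0.4, 0.1, 0.9]

# Evaluate the quantile loss at the 0.9 quantile;
# 'alpha' is a parameter of the Quantile metric.
quantile = eval_metric(labels, predictions, 'Quantile:alpha=0.9')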

weight

Description

The weights of objects.

Possible types

  • list
  • numpy.ndarray
  • pandas.DataFrame
  • pandas.Series

Default value

None
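
A minimal sketch of passing per-object weights (all values below are made up for illustration):

from catboost.utils import eval_metric

labels = [1, 0, 1]
approxes = [1.5, -0.7, 2.1]  # raw approxes (logits), not probabilities
weights = [1.0, 0.5, 2.0]    # per-object weights

weighted_logloss = eval_metric(labels, approxes, 'Logloss', weight=weights)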

group_id

Description

Group identifiers for all input objects. Supported identifier types are:

  • int
  • string types (str or unicode for Python 2; bytes or str for Python 3).

Possible types

  • list
  • numpy.ndarray
  • pandas.DataFrame
  • pandas.Series

Default value

None

subgroup_id

Description

Subgroup identifiers for all input objects.

Possible types

  • list
  • numpy.ndarray

Default value

None

pairs

Description

The format of the pairs description depends on the type of the value passed:

Possible types

list, numpy.ndarray, pandas.DataFrame

The pairs description in the form of a two-dimensional matrix of shape N by 2:

  • N is the number of pairs.
  • The first element of the pair is the zero-based index of the winner object from the input dataset for pairwise comparison.
  • The second element of the pair is the zero-based index of the loser object from the input dataset for pairwise comparison.

This information is used for calculation and optimization of Pairwise metrics.

string

The path to the input file that contains the pairs description.

This information is used for calculation and optimization of Pairwise metrics.

Default value

None
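
A minimal sketch of evaluating a pairwise metric (PairLogit, from the list above) with an in-memory pairs matrix; all values are illustrative:

from catboost.utils import eval_metric

labels = [1, 0, 1, 0]
approxes = [0.8, 0.1, 0.4, 0.3]

# Each row is [winner_index, loser_index], zero-based.
pairs = [[0, 1], [2, 3]]

pair_logit = eval_metric(labels, approxes, 'PairLogit', pairs=pairs)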

thread_count

Description

The number of threads to use.

Optimizes the speed of execution. This parameter doesn't affect results.

Possible types

int

Default value

-1 (the number of threads is equal to the number of processor cores)

Type of return value

A list with metric values.
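
Because the return value is a list, a single metric value is usually extracted by indexing (a sketch reusing the regression example below):

from catboost.utils import eval_metric

rmse_value = eval_metric([0.2, -1, 0.4], [0.4, 0.1, 0.9], 'RMSE')[0]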

Usage examples

The following is an example of usage with a regression metric:

from catboost.utils import eval_metric

labels = [0.2, -1, 0.4]
predictions = [0.4, 0.1, 0.9]

rmse = eval_metric(labels, predictions, 'RMSE')

The following is an example of usage with a classification metric:

from catboost.utils import eval_metric
from math import log

labels = [1, 0, 1]
probabilities = [0.4, 0.1, 0.9]

# In binary classification it is necessary to apply the logit function
# to the probabilities to get approxes.

def logit(p):
    return log(p / (1 - p))

approxes = [logit(p) for p in probabilities]

accuracy = eval_metric(labels, approxes, 'Accuracy')

The following is an example of usage with a ranking metric:

from catboost.utils import eval_metric

# The dataset consists of five objects. The first two belong to one group
# and the other three to another.
group_ids = [1, 1, 2, 2, 2]

labels = [0.9, 0.1, 0.5, 0.4, 0.8]

# In ranking tasks it is not necessary to predict the exact label values.
# What matters is predicting the right relative order of objects within each group.
good_predictions = [0.5, 0.4, 0.2, 0.1, 0.3]
bad_predictions = [0.4, 0.5, 0.2, 0.3, 0.1]

good_ndcg = eval_metric(labels, good_predictions, 'NDCG', group_id=group_ids)
bad_ndcg = eval_metric(labels, bad_predictions, 'NDCG', group_id=group_ids)
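
Within each group, good_predictions orders the objects in the same way as the labels, while bad_predictions does not, so good_ndcg should be higher than bad_ndcg (a perfect within-group ordering yields an NDCG of 1).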