Ranking: objectives and metrics

Pairwise metrics

Pairwise metrics use special labeled information: pairs of dataset objects where one object is considered the winner and the other is considered the loser. This information might not be exhaustive (not all possible pairs of objects need to be labeled in this way). It is also possible to specify the weight for each pair.

If GroupId is specified and the dataset is used in pairwise modes, both members of every pair must belong to the same group.

GroupId is the identifier of the object's group. It is an arbitrary string, possibly representing an integer.

If the labeled pairs data is not specified for the dataset, then pairs are generated automatically in each group using per-object label values (labels must be specified and must be numerical). The object with a greater label value in the pair is considered the winner.
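The automatic pair generation described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the function name `generate_pairs` is hypothetical, and skipping objects with equal labels is an assumption of the sketch.

```python
from itertools import combinations


def generate_pairs(group_ids, labels):
    """Return (winner_index, loser_index) pairs built inside each group."""
    # Collect object indices per group, keeping the order of first appearance.
    by_group = {}
    for idx, gid in enumerate(group_ids):
        by_group.setdefault(gid, []).append(idx)

    pairs = []
    for members in by_group.values():
        for i, j in combinations(members, 2):
            if labels[i] > labels[j]:
                pairs.append((i, j))  # i has the greater label, so i wins
            elif labels[j] > labels[i]:
                pairs.append((j, i))
            # Equal labels: no winner, so no pair (assumption of this sketch).
    return pairs


pairs = generate_pairs(["q1", "q1", "q1", "q2", "q2"], [3, 1, 1, 0, 2])
print(pairs)
```

Pairs never cross group boundaries, which matches the requirement that both members of a pair come from the same group.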

The following variables are used in formulas of the described pairwise metrics:

• $p$ is the positive object in the pair.
• $n$ is the negative object in the pair.

See all common variables in Variables used in formulas.

PairLogit

$\displaystyle\frac{-\sum\limits_{p, n \in Pairs} w_{pn} \log\left(\displaystyle\frac{1}{1 + e^{- (a_{p} - a_{n})}}\right)}{\sum\limits_{p, n \in Pairs} w_{pn}}$

Note

The object weights are not used to calculate and optimize the value of this metric. The weights of object pairs are used instead.
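As an illustration of the PairLogit formula, here is a minimal Python sketch that evaluates it for explicit pairs. The function name `pair_logit` and its argument layout are hypothetical; pairs are given as (winner index, loser index) tuples and pair weights default to 1.

```python
import math


def pair_logit(approx, pairs, pair_weights=None):
    """Weighted mean of -log(sigmoid(a_p - a_n)) over the given pairs."""
    if pair_weights is None:
        pair_weights = [1.0] * len(pairs)
    total = 0.0
    for (p, n), w in zip(pairs, pair_weights):
        diff = approx[p] - approx[n]
        # -log(1 / (1 + exp(-diff))) == log(1 + exp(-diff)),
        # written in a form that does not overflow for large |diff|.
        total += w * (max(-diff, 0.0) + math.log1p(math.exp(-abs(diff))))
    return total / sum(pair_weights)


value = pair_logit([2.0, 0.0], [(0, 1)])
print(value)
```

When the winner and the loser receive the same prediction, each pair contributes $\log 2$, so lower values mean the winners are scored higher than the losers.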

Usage information See more.

User-defined parameters

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

max_pairs

The maximum number of generated pairs in each group. Takes effect only if no pairs are given, in which case the pairs are generated without repetition.

Default: All possible pairs are generated in each group

PairLogitPairwise

$\displaystyle\frac{-\sum\limits_{p, n \in Pairs} w_{pn} \log\left(\displaystyle\frac{1}{1 + e^{- (a_{p} - a_{n})}}\right)}{\sum\limits_{p, n \in Pairs} w_{pn}}$

This metric may give more accurate results on large datasets compared to PairLogit but it is calculated significantly slower.

This technique is described in the Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank paper.

Usage information See more.

Note

The object weights are not used to calculate and optimize the value of this metric. The weights of object pairs are used instead.

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

max_pairs

The maximum number of generated pairs in each group. Takes effect only if no pairs are given, in which case the pairs are generated without repetition.

Default: All possible pairs are generated in each group

PairAccuracy

$\displaystyle\frac{\sum\limits_{p, n \in Pairs} w_{pn} [a_{p} > a_{n}] }{\sum\limits_{p, n \in Pairs} w_{pn} }$

Note

The object weights are not used to calculate the value of this metric. The weights of object pairs are used instead.
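The PairAccuracy formula can be checked by hand with a short Python sketch. The name `pair_accuracy` is hypothetical; pairs are (winner index, loser index) tuples, and a pair counts as correct only if the winner's prediction is strictly greater.

```python
def pair_accuracy(approx, pairs, pair_weights=None):
    """Weighted share of pairs in which the winner is predicted above the loser."""
    if pair_weights is None:
        pair_weights = [1.0] * len(pairs)
    correct = sum(
        w for (p, n), w in zip(pairs, pair_weights) if approx[p] > approx[n]
    )
    return correct / sum(pair_weights)


accuracy = pair_accuracy(
    approx=[0.9, 0.2, 0.5],
    pairs=[(0, 1), (1, 2), (0, 2)],
    pair_weights=[1.0, 2.0, 1.0],
)
print(accuracy)
```

In this example the misordered pair carries weight 2 out of a total of 4, so the weighted accuracy is lower than the unweighted share of correct pairs.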

Can't be used for optimization. See more.

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

Groupwise metrics

YetiRank

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

An approximation of ranking metrics (such as NDCG and PFound). Allows ranking metrics to be used for optimization.

The value of this metric cannot be calculated. The metric that is written to output data if YetiRank is optimized depends on the range of all N target values ($i \in [1; N]$) of the dataset:

• $target_{i} \in [0; 1]$ — PFound
• $target_{i} \notin [0; 1]$ — NDCG

This metric gives less accurate results on big datasets compared to YetiRankPairwise but it is significantly faster.

Note

The object weights are not used to optimize this metric. The group weights are used instead.

This objective is used to optimize PairLogit. Automatically generated object pairs are used for this purpose. These pairs are generated independently for each object group. Use the Group weights file or the GroupWeight column of the Columns description file to change the group importance. In this case, the weight of each generated pair is multiplied by the value of the corresponding group weight.

Usage information See more.

User-defined parameters

decay

The probability of search continuation after reaching the current object.

Default: 0.85

permutations

The number of permutations.

Default: 10

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

YetiRankPairwise

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

An approximation of ranking metrics (such as NDCG and PFound). Allows ranking metrics to be used for optimization.

The value of this metric cannot be calculated. The metric that is written to output data if YetiRankPairwise is optimized depends on the range of all N target values ($i \in [1; N]$) of the dataset:

• $target_{i} \in [0; 1]$ — PFound
• $target_{i} \notin [0; 1]$ — NDCG

This metric gives more accurate results on big datasets compared to YetiRank but it is significantly slower.

This technique is described in the Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank paper.

Note

The object weights are not used to optimize this metric. The group weights are used instead.

This objective is used to optimize PairLogit. Automatically generated object pairs are used for this purpose. These pairs are generated independently for each object group. Use the Group weights file or the GroupWeight column of the Columns description file to change the group importance. In this case, the weight of each generated pair is multiplied by the value of the corresponding group weight.

Usage information See more.

User-defined parameters

decay

The probability of search continuation after reaching the current object.

Default: 0.85

permutations

The number of permutations.

Default: 10

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

StochasticFilter

Directly optimizes the FilteredDCG metric, which is calculated for a pre-defined order of objects, in order to filter objects under a fixed ranking. As a result, the FilteredDCG metric can be used for optimization.

$FilteredDCG = \sum\limits_{i=1}^{n}\displaystyle\frac{t_{i}}{i} { , where}$

$t_{i}$ is the relevance of an object in the group and the sum is computed over the documents with $a > 0$.

The filtration is defined via the raw formula value:

Zeros correspond to filtered instances and ones correspond to the remaining ones.

The ranking is defined by the order of objects in the dataset.

Warning

Sort objects by the column you are interested in before training with this loss function, and use the --has-time option (for the command-line version) to avoid further reordering of objects.

For optimization, a distribution of filtrations is defined:

$\mathbb{P}(\text{filter}|x) = \sigma(a) { , where}$

• $\sigma(z) = \displaystyle\frac{1}{1 + \text{e}^{-z}}$
• The gradient is estimated via REINFORCE.

Refer to the Learning to Select for a Predefined Ranking paper for calculation details.

Usage information See more.

User-defined parameters

sigma

The scale for multiplying predictions.

Default: 1

num_estimations

The number of gradient samples.

Default: 1

StochasticRank

Directly optimizes the selected metric. The value of the selected metric is written to output data.

Refer to the StochasticRank: Global Optimization of Scale-Free Discrete Functions paper for details.

Usage information See more.

User-defined parameters

Common parameters:

metric

The metric that should be optimized.

Default: Obligatory parameter
Supported values: DCG, NDCG, PFound.

num_estimations

The number of gradient estimation iterations.

Default: 1

mu

Controls the penalty for coinciding predictions (aka ties).

Default: 0

Metric-specific parameters:

Available if the corresponding metric is set in the metric parameter.

DCG

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

type

Metric calculation principles.

Default: Base.
Possible values: Base, Exp.

denominator

Metric denominator type.

Default: LogPosition.
Possible values: LogPosition, Position.

NDCG

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

type

Metric calculation principles.

Default: Base.
Possible values: Base, Exp.

denominator

Metric denominator type.

Default: LogPosition.
Possible values: LogPosition, Position.

PFound

decay

The probability of search continuation after reaching the current object.

Default: 0.85

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

QueryCrossEntropy

$QueryCrossEntropy(\alpha) = (1 - \alpha) \cdot LogLoss + \alpha \cdot LogLoss_{group}$

See the QueryCrossEntropy section for more details.

Usage information See more.

User-defined parameters

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

alpha

The coefficient $\alpha$ that balances the $LogLoss$ and $LogLoss_{group}$ terms in the formula above.

Default: 0.95

QueryRMSE

$\displaystyle\sqrt{\displaystyle\frac{\sum\limits_{Group \in Groups} \sum\limits_{i \in Group} w_{i} \left( t_{i} - a_{i} - \displaystyle\frac{\sum\limits_{j \in Group} w_{j} (t_{j} - a_{j})}{\sum\limits_{j \in Group} w_{j}} \right)^{2}} {\sum\limits_{Group \in Groups} \sum\limits_{i \in Group} w_{i}}}$
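To make the QueryRMSE formula concrete, the sketch below evaluates it in plain Python. The function name `query_rmse` is hypothetical. The key step is subtracting the weighted mean residual of each group, so a constant per-group offset in the predictions is not penalized.

```python
import math
from collections import defaultdict


def query_rmse(group_ids, targets, approx, weights=None):
    """RMSE of residuals after removing each group's weighted mean residual."""
    if weights is None:
        weights = [1.0] * len(targets)
    groups = defaultdict(list)
    for i, gid in enumerate(group_ids):
        groups[gid].append(i)

    numerator = 0.0
    for members in groups.values():
        w_sum = sum(weights[j] for j in members)
        # Weighted mean of (t_j - a_j) inside the group.
        shift = sum(weights[j] * (targets[j] - approx[j]) for j in members) / w_sum
        numerator += sum(
            weights[i] * (targets[i] - approx[i] - shift) ** 2 for i in members
        )
    return math.sqrt(numerator / sum(weights))


# Predictions are off by +1 in group 1 and by -2 in group 2, so the value is 0.
shifted = query_rmse([1, 1, 2, 2], [1, 2, 3, 5], [0, 1, 5, 7])
print(shifted)
```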

Usage information See more.

User-defined parameters

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

QuerySoftMax

$- \displaystyle\frac{\sum\limits_{Group \in Groups} \sum\limits_{i \in Group}w_{i} t_{i} \log \left(\displaystyle\frac{w_{i} e^{\beta a_{i}}}{\sum\limits_{j\in Group} w_{j} e^{\beta a_{j}}}\right)} {\sum\limits_{Group \in Groups} \sum_{i\in Group} w_{i} t_{i}}$
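A minimal Python sketch of the QuerySoftMax formula follows. The function name `query_softmax` is hypothetical, and objects whose product $w_{i} t_{i}$ is zero are skipped because they contribute nothing to either sum. The maximum scaled prediction is subtracted inside each group for numerical stability, which does not change the softmax ratio.

```python
import math
from collections import defaultdict


def query_softmax(group_ids, targets, approx, weights=None, beta=1.0):
    """Weighted negative log of the per-group softmax share of each object."""
    if weights is None:
        weights = [1.0] * len(targets)
    groups = defaultdict(list)
    for i, gid in enumerate(group_ids):
        groups[gid].append(i)

    numerator = 0.0
    normalizer = 0.0
    for members in groups.values():
        shift = max(beta * approx[j] for j in members)
        denom = sum(weights[j] * math.exp(beta * approx[j] - shift) for j in members)
        for i in members:
            wt = weights[i] * targets[i]
            if wt == 0.0:
                continue  # zero contribution; also avoids log(0) for zero weights
            log_share = (
                math.log(weights[i]) + beta * approx[i] - shift - math.log(denom)
            )
            numerator += wt * log_share
            normalizer += wt
    return -numerator / normalizer


loss = query_softmax(["q", "q"], [1.0, 0.0], [0.0, 0.0])
print(loss)
```

With two equally scored objects and one relevant target, the relevant object receives half of the softmax mass, so the value equals $\log 2$.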

Usage information See more.

User-defined parameters

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

beta

The input scale coefficient.

Default: 1

PFound

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

$PFound(top, decay) =$

$= \sum_{group \in groups} PFound(group, top, decay)$

See the PFound section for more details.
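The per-group term $PFound(group, top, decay)$ is not spelled out on this page. The sketch below assumes the commonly used cascade definition: the user looks at the first object with probability 1, and continues past an object with probability $(1 - t_{i}) \cdot decay$. The function name `pfound_group` is hypothetical, and the assumed recursion should be checked against the PFound section.

```python
def pfound_group(targets, approx, top=-1, decay=0.85):
    """PFound for one group, assuming p_look(i+1) = p_look(i) * (1 - t_i) * decay."""
    # Largest predictions first; ties broken by the lower target,
    # following the description of the top parameter.
    order = sorted(range(len(approx)), key=lambda i: (-approx[i], targets[i]))
    k = len(order) if top == -1 else min(top, len(order))

    p_look = 1.0  # probability that the user reaches the current position
    value = 0.0
    for idx in order[:k]:
        value += p_look * targets[idx]
        p_look *= (1.0 - targets[idx]) * decay
    return value


score = pfound_group(targets=[0.5, 1.0], approx=[0.9, 0.1])
print(score)
```

Targets are expected to lie in $[0; 1]$ here, since they are treated as probabilities of relevance.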

Can't be used for optimization. See more.

User-defined parameters

decay

The probability of search continuation after reaching the current object.

Default: 0.85

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

NDCG

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

$nDCG(top) = \frac{DCG(top)}{IDCG(top)}$

See the NDCG section for more details.
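The ratio above can be sketched in Python as follows. The exact gain and discount formulas are deferred to the NDCG section, so the ones used here are assumptions: Base uses the target itself as the gain, Exp uses $2^{t_{i}} - 1$, LogPosition divides by $\log_{2}(i + 1)$, and Position divides by $i$. The function names and the choice to return 0 when IDCG is 0 are also illustrative.

```python
import math


def dcg(ranked_targets, top=-1, gain_type="Base", denominator="LogPosition"):
    """DCG of targets that are already listed in rank order."""
    k = len(ranked_targets) if top == -1 else min(top, len(ranked_targets))
    total = 0.0
    for pos, t in enumerate(ranked_targets[:k], start=1):
        gain = t if gain_type == "Base" else 2.0 ** t - 1.0
        discount = math.log2(pos + 1) if denominator == "LogPosition" else float(pos)
        total += gain / discount
    return total


def ndcg(targets, approx, top=-1, gain_type="Base", denominator="LogPosition"):
    """DCG of the predicted order divided by DCG of the ideal order."""
    order = sorted(range(len(approx)), key=lambda i: (-approx[i], targets[i]))
    ranked = [targets[i] for i in order]
    ideal = sorted(targets, reverse=True)
    idcg = dcg(ideal, top, gain_type, denominator)
    if idcg == 0.0:
        return 0.0  # arbitrary choice of this sketch for groups without relevance
    return dcg(ranked, top, gain_type, denominator) / idcg


value = ndcg([1, 0, 1], [0.9, 0.5, 0.1], denominator="Position")
print(value)
```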

Can't be used for optimization. See more.

User-defined parameters

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

type

Metric calculation principles.

Default: Base.
Possible values: Base, Exp.

denominator

Metric denominator type.

Default: Position.
Possible values: LogPosition, Position.

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

DCG

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

$DCG(top)$

See the NDCG section for more details.

Can't be used for optimization. See more.

User-defined parameters

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

type

Metric calculation principles.

Default: Base.
Possible values: Base, Exp.

denominator

Metric denominator type.

Default: Position.
Possible values: LogPosition, Position.

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

FilteredDCG

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

See the FilteredDCG section for more details.

Can't be used for optimization. See more.

User-defined parameters

type

Metric calculation principles.

Default: Base.
Possible values: Base, Exp.

denominator

Metric denominator type.

Default: Position.
Possible values: LogPosition, Position.

AverageGain

Represents the average value of the label values for objects with the defined top $M$ label values.

See the AverageGain section for more details.

Can't be used for optimization. See more.

User-defined parameters

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: This parameter is obligatory (the default value is not defined).

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

PrecisionAt

The calculation of this function consists of the following steps:

1. The objects are sorted in descending order of predicted relevancies ($a_{i}$).

2. The metric is calculated as follows:

$PrecisionAt(top, border) = \frac{\sum\limits_{i=1}^{top} Relevant_{i}}{top} { , where}$

• $Relevant_{i} = \begin{cases} 1 { , } & t_{i} > {border} \\ 0 { , } & \text{in other cases} \end{cases}$
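The two steps above can be sketched for a single group in Python. The function name `precision_at` is hypothetical, and clamping `top` to the group size is an assumption of the sketch.

```python
def precision_at(targets, approx, top=-1, border=0.0):
    """Share of relevant objects among the top predicted objects of one group."""
    # Step 1: sort by descending prediction; ties go to the lower target,
    # following the description of the top parameter.
    order = sorted(range(len(approx)), key=lambda i: (-approx[i], targets[i]))
    k = len(order) if top == -1 else min(top, len(order))
    # Step 2: count objects whose target is strictly greater than the border.
    relevant = sum(1 for i in order[:k] if targets[i] > border)
    return relevant / k


p_at_3 = precision_at(targets=[1, 0, 1, 0], approx=[0.9, 0.8, 0.3, 0.1], top=3)
print(p_at_3)
```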

Can't be used for optimization. See more.

User-defined parameters

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

border

The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

Default: 0

RecallAt

The calculation of this function consists of the following steps:

1. The objects are sorted in descending order of predicted relevancies ($a_{i}$).

2. The metric is calculated as follows:

$RecallAt(top, border) = \frac{\sum\limits_{i=1}^{top} Relevant_{i}}{\sum\limits_{i=1}^{N} Relevant_{i}} { , where}$

• $Relevant_{i} = \begin{cases} 1 { , } & t_{i} > {border} \\ 0 { , } & \text{in other cases} \end{cases}$
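A single-group Python sketch of this calculation is shown below. The function name `recall_at` is hypothetical. Returning 0 for a group with no relevant objects and clamping `top` to the group size are assumptions of the sketch.

```python
def recall_at(targets, approx, top=-1, border=0.0):
    """Share of all relevant objects of one group that appear in the top positions."""
    order = sorted(range(len(approx)), key=lambda i: (-approx[i], targets[i]))
    k = len(order) if top == -1 else min(top, len(order))
    found = sum(1 for i in order[:k] if targets[i] > border)
    total = sum(1 for t in targets if t > border)
    if total == 0:
        return 0.0  # no relevant objects at all (assumption of this sketch)
    return found / total


r_at_2 = recall_at(targets=[1, 0, 1, 0], approx=[0.9, 0.8, 0.3, 0.1], top=2)
print(r_at_2)
```

Unlike precision, the denominator does not depend on `top`, so the value can only grow as `top` increases.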

Can't be used for optimization. See more.

User-defined parameters

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

border

The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

Default: 0

MAP

1. The objects are sorted in descending order of predicted relevancies ($a_{i}$).

2. The metric is calculated as follows:
$MAP(top, border) = \frac{1}{N_{groups}} \sum\limits_{j = 1}^{N_{groups}} AveragePrecisionAt_{j}(top, border) { , where}$

• $N_{groups}$ is the number of groups
• $AveragePrecisionAt(top, border) = \frac{\sum\limits_{i=1}^{top} Relevant_{i} * PrecisionAt_{i}}{\sum\limits_{i=1}^{top} Relevant_{i} }$

The value is calculated individually for each j-th group.

• $Relevant_{i} = \begin{cases} 1 { , } & t_{i} > {border} \\ 0 { , } & \text{in other cases} \end{cases}$
• $PrecisionAt_{i} = \frac{\sum\limits_{j=1}^{i} Relevant_{j}}{i}$
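The MAP definition above can be sketched in Python as follows. The function names are hypothetical, each group is passed as a `(targets, approx)` tuple, and a group without relevant objects in the top positions is assumed to contribute 0.

```python
def average_precision_at(targets, approx, top=-1, border=0.0):
    """AveragePrecisionAt for one group."""
    order = sorted(range(len(approx)), key=lambda i: (-approx[i], targets[i]))
    k = len(order) if top == -1 else min(top, len(order))
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order[:k], start=1):
        if targets[idx] > border:
            hits += 1
            precision_sum += hits / rank  # PrecisionAt_i at a relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(groups, top=-1, border=0.0):
    """Mean of AveragePrecisionAt over all groups."""
    return sum(average_precision_at(t, a, top, border) for t, a in groups) / len(groups)


groups = [
    ([1, 0, 1], [0.9, 0.5, 0.1]),  # relevant at ranks 1 and 3: AP = (1 + 2/3) / 2
    ([0, 1], [0.8, 0.2]),          # relevant at rank 2: AP = 1/2
]
map_value = mean_average_precision(groups)
print(map_value)
```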

Can't be used for optimization. See more.

User-defined parameters

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

border

The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

Default: 0

ERR

$ERR = \frac{1}{|Q|} \sum_{q=1}^{|Q|} ERR_q$

$ERR_q = \sum_{i=1}^{top} \frac{1}{i} t_{q,i} \prod_{j=1}^{i-1} (1 - t_{q,j})$

Targets should be from the range [0, 1].

$t_{q,i} \in [0, 1]$
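As an illustration, the ERR formulas can be evaluated with the Python sketch below. The function names are hypothetical, each group is a `(targets, approx)` tuple, and the index $i$ is assumed to be the position of the object after sorting the group by descending prediction.

```python
def err_group(targets, approx, top=-1):
    """ERR_q for one group; targets are treated as stop probabilities in [0, 1]."""
    order = sorted(range(len(approx)), key=lambda i: (-approx[i], targets[i]))
    k = len(order) if top == -1 else min(top, len(order))
    not_stopped = 1.0  # running product of (1 - t_j) over earlier positions
    value = 0.0
    for rank, idx in enumerate(order[:k], start=1):
        value += not_stopped * targets[idx] / rank
        not_stopped *= 1.0 - targets[idx]
    return value


def err(groups, top=-1):
    """Mean of ERR_q over all groups."""
    return sum(err_group(t, a, top) for t, a in groups) / len(groups)


single = err_group(targets=[0.5, 1.0], approx=[0.9, 0.1])
print(single)
```

In the example the first position contributes $0.5$ and the second contributes $0.5 \cdot 1 / 2 = 0.25$.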

Can't be used for optimization. See more.

User-defined parameters

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

MRR

$MRR = \frac{1}{|Q|} \sum_{q=1}^{|Q|} \frac{1}{rank_q}$, where $rank_q$ refers to the rank position of the first relevant document for the q-th query.
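A short Python sketch of the MRR calculation follows. The function names are hypothetical, each group is a `(targets, approx)` tuple, and an object is assumed to be relevant when its target is strictly greater than `border`. A group with no relevant object in the top positions is assumed to contribute 0.

```python
def reciprocal_rank(targets, approx, top=-1, border=0.0):
    """1 / rank of the first relevant object in one group, or 0 if none is found."""
    order = sorted(range(len(approx)), key=lambda i: (-approx[i], targets[i]))
    k = len(order) if top == -1 else min(top, len(order))
    for rank, idx in enumerate(order[:k], start=1):
        if targets[idx] > border:
            return 1.0 / rank
    return 0.0


def mrr(groups, top=-1, border=0.0):
    """Mean reciprocal rank over all groups (queries)."""
    return sum(reciprocal_rank(t, a, top, border) for t, a in groups) / len(groups)


groups = [
    ([0, 1, 0], [0.9, 0.8, 0.1]),  # first relevant object at rank 2
    ([1, 0], [0.7, 0.2]),          # first relevant object at rank 1
]
mrr_value = mrr(groups)
print(mrr_value)
```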

Can't be used for optimization. See more.

User-defined parameters

top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

Default: –1 (all label values are used).

border

The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

Default: 0

AUC

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

The type of AUC. Defines the metric calculation principles.

Classic type

$\displaystyle\frac{\sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} {\sum w_{i} \cdot w_{j}}$
The sum is calculated on all pairs of objects $(i,j)$ such that:

• $t_{i} = 1$
• $t_{j} = 0$
• $I(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}$

Refer to the Wikipedia article for details.

If the target type is not binary, then every object with target value $t$ and weight $w$ is replaced with two objects for the metric calculation:

• $o_{1}$ with weight $t \cdot w$ and target value 1
• $o_{2}$ with weight $(1 - t) \cdot w$ and target value 0.

Target values must be in the range [0; 1].

Ranking type

$\displaystyle\frac{\sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} {\sum w_{i} \cdot w_{j}}$

The sum is calculated on all pairs of objects $(i,j)$ such that:

• $t_{i} > t_{j}$
• $I(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}$

Can't be used for optimization. See more.

User-defined parameters

type

The type of AUC. Defines the metrics calculation principles.

Default: Classic.
Possible values: Classic, Ranking.
Examples: AUC:type=Classic, AUC:type=Ranking.

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: False for Classic type, True for Ranking type.
Examples: AUC:type=Ranking;use_weights=False.

QueryAUC

Classic type

$\displaystyle\frac{ \sum_q \sum_{i, j \in q} \sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} { \sum_q \sum_{i, j \in q} \sum w_{i} \cdot w_{j}}$
The sum is calculated on all pairs of objects $(i,j)$ such that:

• $t_{i} = 1$
• $t_{j} = 0$
• $I(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}$

Refer to the Wikipedia article for details.

If the target type is not binary, then every object with target value $t$ and weight $w$ is replaced with two objects for the metric calculation:

• $o_{1}$ with weight $t \cdot w$ and target value 1
• $o_{2}$ with weight $(1 - t) \cdot w$ and target value 0.

Target values must be in the range [0; 1].

Ranking type

$\displaystyle\frac{ \sum_q \sum_{i, j \in q} \sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} { \sum_q \sum_{i, j \in q} \sum w_{i} \cdot w_{j}}$

The sum is calculated on all pairs of objects $(i,j)$ such that:

• $t_{i} > t_{j}$
• $I(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}$

Can't be used for optimization. See more.

User-defined parameters

type

The type of QueryAUC. Defines the metric calculation principles.

Default: Ranking.
Possible values: Classic, Ranking.
Examples: QueryAUC:type=Classic, QueryAUC:type=Ranking.

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: False.
Examples: QueryAUC:type=Ranking;use_weights=False.

Used for optimization

Name               Optimization   GPU Support
PairLogit          +              +
PairLogitPairwise  +              +
PairAccuracy       -              -
YetiRank           +              +
YetiRankPairwise   +              +
StochasticFilter   +              -
StochasticRank     +              -
QueryCrossEntropy  +              +
QueryRMSE          +              +
QuerySoftMax       +              +
PFound             -              -
NDCG               -              -
DCG                -              -
FilteredDCG        -              -
AverageGain        -              -
PrecisionAt        -              -
RecallAt           -              -
MAP                -              -
ERR                -              -
MRR                -              -
AUC                -              -
QueryAUC           -              -