# Ranking: objectives and metrics

## Pairwise metrics

Pairwise metrics use special labeled information — pairs of dataset objects where one object is considered the winner and the other is considered the loser. This information might be not exhaustive (not all possible pairs of objects are labeled in such a way). It is also possible to specify the weight for each pair.

If GroupId is specified, then all pairs must have both members from the same group if this dataset is used in pairwise modes.

The identifier of the object's group. An arbitrary string, possibly representing an integer.

If the labeled pairs data is not specified for the dataset, then pairs are generated automatically in each group using per-object label values (labels must be specified and must be numerical). The object with a greater label value in the pair is considered the winner.

### Specific variables used

The following variables are used in formulas of the described pairwise metrics:

• $p$ is the positive object in the pair.
• $n$ is the negative object in the pair.

### PairLogit

Calculation principles

$\displaystyle\frac{-\sum\limits_{p, n \in Pairs} w_{pn} \left(log(\displaystyle\frac{1}{1 + e^{- (a_{p} - a_{n})}})\right)}{\sum\limits_{p, n \in Pairs} w_{pn}}$

Note

The object weights are not used to calculate and optimize the value of this metric. The weights of object pairs are used instead.

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

max_pairs

The maximum number of generated pairs in each group. Takes effect if no pairs are given and therefore are generated without repetition.

Default: All possible pairs are generated in each group

### PairLogitPairwise

Calculation principles

$\displaystyle\frac{-\sum\limits_{p, n \in Pairs} w_{pn} \left(log(\displaystyle\frac{1}{1 + e^{- (a_{p} - a_{n})}})\right)}{\sum\limits_{p, n \in Pairs} w_{pn}}$

This metric may give more accurate results on large datasets compared to PairLogit but it is calculated significantly slower.

This technique is described in the Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank paper.

Note

The object weights are not used to calculate and optimize the value of this metric. The weights of object pairs are used instead.

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

max_pairs

The maximum number of generated pairs in each group. Takes effect if no pairs are given and therefore are generated without repetition.

Default: All possible pairs are generated in each group

### PairAccuracy

Calculation principles

$\displaystyle\frac{\sum\limits_{p, n \in Pairs} w_{pn} [a_{p} > a_{n}] }{\sum\limits_{p, n \in Pairs} w_{pn} }$

Note

The object weights are not used to calculate the value of this metric. The weights of object pairs are used instead.

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

### Used for optimization

Name Optimization
PairLogit +
PairLogitPairwise +
PairAccuracy -

## Groupwise metrics

### YetiRank

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

An approximation of ranking metrics (such as NDCG and PFound). Allows to use ranking metrics for optimization.

The value of this metric can not be calculated. The metric that is written to output data if YetiRank is optimized depends on the range of all N target values ($i \in [1; N]$) of the dataset:

• $target_{i} \in [0; 1]$ — PFound
• $target_{i} \notin [0; 1]$ — NDCG

This metric gives less accurate results on big datasets compared to YetiRankPairwise but it is significantly faster.

Note

The object weights are not used to optimize this metric. The group weights are used instead.

This objective is used to optimize PairLogit. Automatically generated object pairs are used for this purpose. These pairs are generated independently for each object group. Use the Group weights file or the GroupWeight column of the Columns description file to change the group importance. In this case, the weight of each generated pair is multiplied by the value of the corresponding group weight.

User-defined parameters:

• decay

The probability of search continuation after reaching the current object.

Default:0.99

• permutations

The number of permutations.

10

• use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

true

### YetiRankPairwise

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

An approximation of ranking metrics (such as NDCG and PFound). Allows to use ranking metrics for optimization.

The value of this metric can not be calculated. The metric that is written to output data if YetiRank is optimized depends on the range of all N target values ($i \in [1; N]$) of the dataset:

• $target_{i} \in [0; 1]$ — PFound
• $target_{i} \notin [0; 1]$ — NDCG

This metric gives more accurate results on big datasets compared to YetiRank but it is significantly slower.

This technique is described in the Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank paper.

Note

The object weights are not used to optimize this metric. The group weights are used instead.

This objective is used to optimize PairLogit. Automatically generated object pairs are used for this purpose. These pairs are generated independently for each object group. Use the Group weights file or the GroupWeight column of the Columns description file to change the group importance. In this case, the weight of each generated pair is multiplied by the value of the corresponding group weight.

User-defined parameters:

• The probability of search continuation after reaching the current object.

The probability of search continuation after reaching the current object.

Default:0.99

• The number of permutations.

The number of permutations.

10

• Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

true

### StochasticFilter

Directly optimize the FilteredDCG metric calculated for a pre-defined order of objects for filtration of objects under a fixed ranking. As a result, the FilteredDCG metric can be used for optimization.

$FilteredDCG = \sum\limits_{i=1}^{n}\displaystyle\frac{t_{i}}{i} { , where}$

$t_{i}$ is the relevance of an object in the group and the sum is computed over the documents with $a > 0$.

The filtration is defined via the raw formula value: Zeros correspond to filtered instances and ones correspond to the remaining ones.

The ranking is defined by the order of objects in the dataset.

Sort objects by the column you are interested in before training with this loss function and use the --has-timefor the Command-line version option to avoid further objects reordering.

For optimization, a distribution of filtrations is defined:

$\mathbb{P}(\text{filter}|x) = \sigma(a) { , where}$

• $\sigma(z) = \displaystyle\frac{1}{1 + \text{e}^{-z}}$
• The gradient is estimated via REINFORCE.

Refer to the Learning to Select for a Predefined Ranking paper for calculation details.

User-defined parameters:

• sigma

The scale for multiplying predictions.

1

• num_estimations

1

### StochasticRank

Directly optimize the selected metric. The value of the selected metric is written to output data

Refer to the StochasticRank: Global Optimization of Scale-Free Discrete Functions paper for details.

User-defined parameters:

Common parameters:

• metric

The metric that should be optimized.

Supported values:

• DCG
• NDCG
• PFound

Obligatory parameter

• num_estimations

The number of gradient estimation iterations.

1

• mu

Controls the penalty for coinciding predictions (aka ties).

0

Metric-specific parameters (available if the corresponding metric is set in the metric parameter):

#### DCG

• top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

• type

Metric calculation principles.

Possible values:
- Base
- Exp

Base

• denominator

Metric denominator type.

Possible values:
- LogPosition
- Position

LogPosition

#### NDCG

• top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

• type

Metric calculation principles.

Possible values:
- Base
- Exp

Base

• denominator

Metric denominator type.

Possible values:
- LogPosition
- Position

LogPosition

#### PFound

• decay

The probability of search continuation after reaching the current object.

Default:0.85

• top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

### QueryCrossEntropy

Calculation principles

QueryCrossEntropy(\alpha) = (1 – \alpha) \cdot LogLoss + \alpha \cdot LogLoss_{group}

See the QueryCrossEntropy section for more details.

User-defined parameters:

• use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

true

• alpha

The coefficient used in quantile-based losses.

0.95

### QueryRMSE

Calculation principles

$\displaystyle\sqrt{\displaystyle\frac{\sum\limits_{Group \in Groups} \sum\limits_{i \in Group} w_{i} \left( t_{i} - a_{i} - \displaystyle\frac{\sum\limits_{j \in Group} w_{j} (t_{j} - a_{j})}{\sum\limits_{j \in Group} w_{j}} \right)^{2}} {\sum\limits_{Group \in Groups} \sum\limits_{i \in Group} w_{i}}}$

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

### QuerySoftMax

Calculation principles

$- \displaystyle\frac{\sum\limits_{Group \in Groups} \sum\limits_{i \in Group}w_{i} t_{i} \log \left(\displaystyle\frac{w_{i} e^{\beta a_{i}}}{\sum\limits_{j\in Group} w_{j} e^{\beta a_{j}}}\right)} {\sum\limits_{Group \in Groups} \sum_{i\in Group} w_{i} t_{i}}$

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

• beta

The input scale coefficient.

1

### PFound

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

Calculation principles

$PFound(top, decay) =$

$= \sum_{group \in groups} PFound(group, top, decay)$

See the PFound section for more details

User-defined parameters:

• The probability of search continuation after reaching the current object.

The probability of search continuation after reaching the current object.

Default:0.85

• The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

• use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

true

### NDCG

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

Calculation principles

$nDCG(top) = \frac{DCG(top)}{IDCG(top)}$

See the NDCG section for more details.

User-defined parameters:

• The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

• Metric calculation principles.

Possible values:
- Base
- Exp

Metric calculation principles.

Possible values:
- Base
- Exp

Base

• Metric denominator type.

Possible values:
- LogPosition
- Position

Metric denominator type.

Possible values:
- LogPosition
- Position

LogPosition

• use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

true

### DCG

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

Calculation principles

$DCG(top)$

See the NDCG section for more details.

User-defined parameters:

• The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

• Metric calculation principles.

Possible values:
- Base
- Exp

Metric calculation principles.

Possible values:
- Base
- Exp

Base

• Metric denominator type.

Possible values:
- LogPosition
- Position

Metric denominator type.

Possible values:
- LogPosition
- Position

LogPosition

• use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

true

### FilteredDCG

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

Calculation principles

See the FilteredDCG section for more details.

User-defined parameters:

• type

Metric calculation principles.

Possible values:
- Base
- Exp

Base

• denominator" %}

Metric denominator type.

Possible values:
- LogPosition
- Position

Position

### AverageGain

Represents the average value of the label values for objects with the defined top $M$ label values.

See the AverageGain section for more details.

User-defined parameters:

• top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

This parameter is obligatory (the default value is not defined)

• use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

true

### PrecisionAt

Calculation principles

The calculation of this function consists of the following steps:

1. The objectsare sorted in descending order of predicted relevancies ($a_{i}$)

2. The metric is calculated as follows:

$PrecisionAt(top, border) = \frac{\sum\limits_{i=1}^{top} Relevant_{i}}{top} { , where}$

• $Relevant_{i} = \begin{cases} 1 { , } & t_{i} > {border} \\ 0 { , } & {in other cases} \end{cases}$

User-defined parameters:

• top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

• border

The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

0.5

### RecallAt

Calculation principles

The calculation of this function consists of the following steps:

1. The objectsare sorted in descending order of predicted relevancies ($a_{i}$)

2. The metric is calculated as follows:
$RecalAt(top, border) = \frac{\sum\limits_{i=1}^{top} Relevant_{i}}{\sum\limits_{i=1}^{N} Relevant_{i}}$

• $Relevant_{i} = \begin{cases} 1 { , } & t_{i} > {border} \\ 0 { , } & {in other cases} \end{cases}$

User-defined parameters:

• top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

• border

The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

0.5

### MAP

Calculation principles

1. The objectsare sorted in descending order of predicted relevancies ($a_{i}$)

2. The metric is calculated as follows:
$MAP(top, border) = \frac{1}{N_{groups}} \sum\limits_{j = 1}^{N_{groups}} AveragePrecisionAt_{j}(top, border) { , where}$

• $N_{groups}$ is the number of groups
• $AveragePrecisionAt(top, border) = \frac{\sum\limits_{i=1}^{top} Relevant_{i} * PrecisionAt_{i}}{\sum\limits_{i=1}^{top} Relevant_{i} }$

The value is calculated individually for each j-th group.

• $Relevant_{i} = \begin{cases} 1 { , } & t_{i} > {border} \\ 0 { , } & {in other cases} \end{cases}$
• $PrecisionAt_{i} = \frac{\sum\limits_{j=1}^{i} Relevant_{j}}{i}$

User-defined parameters:

• top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

• border

The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

0.5

### ERR

Calculation principles

$ERR = \frac{1}{|Q|} \sum_{q=1}^{|Q|} ERR_q$

$ERR_q = \sum_{i=1}^{top} \frac{1}{i} t_{q,i} \prod_{j=1}^{i-1} (1 - t_{q,j})$

Targets should be from the range [0, 1].

$t_{q,i} \in [0, 1]$

User-defined parameters:

• top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

## AUC

### AUC

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

#### Classic

$\displaystyle\frac{\sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} {\sum w_{i} \cdot w_{j}}$
The sum is calculated on all pairs of objects $(i,j)$ such that:

• $t_{i} = 0$
• $t_{j} = 1$
• $I(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}$

Refer to the Wikipedia article for details.

If the target type is not binary, then every object with target value $t$ and weight $w$ is replaced with two objects for the metric calculation:

• $o_{1}$ with weight $t \cdot w$ and target value 1
• $o_{2}$ with weight (1 – t) \cdot w and target value 0.

Target values must be in the range [0; 1].

#### Ranking

$\displaystyle\frac{\sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} {\sum w_{i} * w_{j}}$

The sum is calculated on all pairs of objects $(i,j)$ such that:

• $t_{i} < t_{j}$
• $I(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}$

User-defined parameters:

• use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

false

• type

The type of AUC. Defines the metric calculation principles.

Possible values:

• Classic
• Ranking

Examples:

AUC:type=Classic

AUC:type=Ranking


## Used for optimization

### MRR

Calculation principles

$MRR = \frac{1}{|Q|} \sum_{q=1}^{|Q|} \frac{1}{rank_q}$, where $rank_q$ refers to the rank position of the first relevant document for the q-th query.

User-defined parameters:

• top

The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

–1 (all label values are used)

• border

The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

0.5

### QueryAUC

• type

The type of AUC. Defines the metric calculation principles.

Possible values:

• Classic
• Ranking

Examples:

AUC:type=Classic

AUC:type=Ranking


#### Classic

$\displaystyle\frac{ \sum_q \sum_{i, j \in q} \sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} { \sum_q \sum_{i, j \in q} \sum w_{i} \cdot w_{j}}$
The sum is calculated on all pairs of objects $(i,j)$ such that:

• $t_{i} = 0$
• $t_{j} = 1$
• $I(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}$

Refer to the Wikipedia article for details.

If the target type is not binary, then every object with target value $t$ and weight $w$ is replaced with two objects for the metric calculation:

• $o_{1}$ with weight $t \cdot w$ and target value 1
• $o_{2}$ with weight (1 – t) \cdot w and target value 0.

Target values must be in the range [0; 1].

#### Ranking

$\displaystyle\frac{ \sum_q \sum_{i, j \in q} \sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} { \sum_q \sum_{i, j \in q} \sum w_{i} * w_{j}}$

The sum is calculated on all pairs of objects $(i,j)$ such that:

• $t_{i} < t_{j}$
• $I(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}$

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: false

### Used for optimization

Name Optimization
YetiRank +
YetiRankPairwise +
StochasticFilter +
StochasticRank +
DCG +
QueryCrossEntropy +
QueryRMSE +
QuerySoftMax +
PFound -
NDCG -
DCG -
FilteredDCG -
AverageGain -
PrecisionAt -
RecallAt -
MAP -
ERR -
MRR -
AUC -
QueryAUC -