Ranking: objectives and metrics

Pairwise metrics

Pairwise metrics use special labeled information — pairs of dataset objects where one object is considered the winner and the other is considered the loser. This information might be not exhaustive (not all possible pairs of objects are labeled in such a way). It is also possible to specify the weight for each pair.

If GroupId is specified, then all pairs must have both members from the same group if this dataset is used in pairwise modes.

Read more about GroupId

The identifier of the object's group. An arbitrary string, possibly representing an integer.

If the labeled pairs data is not specified for the dataset, then pairs are generated automatically in each group using per-object label values (labels must be specified and must be numerical). The object with a greater label value in the pair is considered the winner.

Specific variables used

The following variables are used in formulas of the described pairwise metrics:

  • pp is the positive object in the pair.
  • nn is the negative object in the pair.

Objectives and metrics

PairLogit

Calculation principles

p,nPairswpn(log(11+e(apan)))p,nPairswpn\displaystyle\frac{-\sum\limits_{p, n \in Pairs} w_{pn} \left(log(\displaystyle\frac{1}{1 + e^{- (a_{p} - a_{n})}})\right)}{\sum\limits_{p, n \in Pairs} w_{pn}}

Note

The object weights are not used to calculate and optimize the value of this metric. The weights of object pairs are used instead.

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

max_pairs

The maximum number of generated pairs in each group. Takes effect if no pairs are given and therefore are generated without repetition.

Default: All possible pairs are generated in each group

PairLogitPairwise

Calculation principles

p,nPairswpn(log(11+e(apan)))p,nPairswpn\displaystyle\frac{-\sum\limits_{p, n \in Pairs} w_{pn} \left(log(\displaystyle\frac{1}{1 + e^{- (a_{p} - a_{n})}})\right)}{\sum\limits_{p, n \in Pairs} w_{pn}}

This metric may give more accurate results on large datasets compared to PairLogit but it is calculated significantly slower.

This technique is described in the Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank paper.

Note

The object weights are not used to calculate and optimize the value of this metric. The weights of object pairs are used instead.

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

max_pairs

The maximum number of generated pairs in each group. Takes effect if no pairs are given and therefore are generated without repetition.

Default: All possible pairs are generated in each group

PairAccuracy

Calculation principles

p,nPairswpn[ap>an]p,nPairswpn\displaystyle\frac{\sum\limits_{p, n \in Pairs} w_{pn} [a_{p} > a_{n}] }{\sum\limits_{p, n \in Pairs} w_{pn} }

Note

The object weights are not used to calculate the value of this metric. The weights of object pairs are used instead.

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

Used for optimization

Name Optimization
PairLogit +
PairLogitPairwise +
PairAccuracy -

Groupwise metrics

YetiRank

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

An approximation of ranking metrics (such as NDCG and PFound). Allows to use ranking metrics for optimization.

The value of this metric can not be calculated. The metric that is written to output data if YetiRank is optimized depends on the range of all N target values (i[1;N]i \in [1; N]) of the dataset:

  • targeti[0;1]target_{i} \in [0; 1] — PFound
  • targeti[0;1]target_{i} \notin [0; 1] — NDCG

This metric gives less accurate results on big datasets compared to YetiRankPairwise but it is significantly faster.

Note

The object weights are not used to optimize this metric. The group weights are used instead.

This objective is used to optimize PairLogit. Automatically generated object pairs are used for this purpose. These pairs are generated independently for each object group. Use the Group weights file or the GroupWeight column of the Columns description file to change the group importance. In this case, the weight of each generated pair is multiplied by the value of the corresponding group weight.

User-defined parameters:

  • decay

    The probability of search continuation after reaching the current object.

    Default:0.99

  • permutations

    The number of permutations.

    10

  • use_weights

    Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

    true

YetiRankPairwise

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

An approximation of ranking metrics (such as NDCG and PFound). Allows to use ranking metrics for optimization.

The value of this metric can not be calculated. The metric that is written to output data if YetiRank is optimized depends on the range of all N target values (i[1;N]i \in [1; N]) of the dataset:

  • targeti[0;1]target_{i} \in [0; 1] — PFound
  • targeti[0;1]target_{i} \notin [0; 1] — NDCG

This metric gives more accurate results on big datasets compared to YetiRank but it is significantly slower.

This technique is described in the Winning The Transfer Learning Track of Yahoo!’s Learning To Rank Challenge with YetiRank paper.

Note

The object weights are not used to optimize this metric. The group weights are used instead.

This objective is used to optimize PairLogit. Automatically generated object pairs are used for this purpose. These pairs are generated independently for each object group. Use the Group weights file or the GroupWeight column of the Columns description file to change the group importance. In this case, the weight of each generated pair is multiplied by the value of the corresponding group weight.

User-defined parameters:

  • The probability of search continuation after reaching the current object.

    The probability of search continuation after reaching the current object.

    Default:0.99

  • The number of permutations.

    The number of permutations.

    10

  • Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

    Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

    true

StochasticFilter

Directly optimize the FilteredDCG metric calculated for a pre-defined order of objects for filtration of objects under a fixed ranking. As a result, the FilteredDCG metric can be used for optimization.

FilteredDCG=i=1ntii,whereFilteredDCG = \sum\limits_{i=1}^{n}\displaystyle\frac{t_{i}}{i} { , where}

tit_{i} is the relevance of an object in the group and the sum is computed over the documents with a>0a > 0.

The filtration is defined via the raw formula value:

Zeros correspond to filtered instances and ones correspond to the remaining ones.

The ranking is defined by the order of objects in the dataset.

Sort objects by the column you are interested in before training with this loss function and use the --has-timefor the Command-line version option to avoid further objects reordering.

For optimization, a distribution of filtrations is defined:

P(filterx)=σ(a),where\mathbb{P}(\text{filter}|x) = \sigma(a) { , where}

  • σ(z)=11+ez\sigma(z) = \displaystyle\frac{1}{1 + \text{e}^{-z}}
  • The gradient is estimated via REINFORCE.

Refer to the Learning to Select for a Predefined Ranking paper for calculation details.

User-defined parameters:

  • sigma

    The scale for multiplying predictions.

    1

  • num_estimations

    The number of gradient samples.

    1

StochasticRank

Directly optimize the selected metric. The value of the selected metric is written to output data

Refer to the StochasticRank: Global Optimization of Scale-Free Discrete Functions paper for details.

User-defined parameters:

Common parameters:

  • metric

    The metric that should be optimized.

    Supported values:

    • DCG
    • NDCG
    • PFound

    Obligatory parameter

  • num_estimations

    The number of gradient estimation iterations.

    1

  • mu

    Controls the penalty for coinciding predictions (aka ties).

    0

Metric-specific parameters (available if the corresponding metric is set in the metric parameter):

DCG

  • top

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

  • type

    Metric calculation principles.

    Possible values:
    - Base
    - Exp

    Base

  • denominator

    Metric denominator type.

    Possible values:
    - LogPosition
    - Position

    LogPosition

NDCG

  • top

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

  • type

    Metric calculation principles.

    Possible values:
    - Base
    - Exp

    Base

  • denominator

    Metric denominator type.

    Possible values:
    - LogPosition
    - Position

    LogPosition

PFound

  • decay

    The probability of search continuation after reaching the current object.

    Default:0.85

  • top

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

QueryCrossEntropy

Calculation principles

QueryCrossEntropy(\alpha) = (1 – \alpha) \cdot LogLoss + \alpha \cdot LogLoss_{group}

See the QueryCrossEntropy section for more details.

User-defined parameters:

  • use_weights

    Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

    true

  • alpha

    The coefficient used in quantile-based losses.

    0.95

QueryRMSE

Calculation principles

GroupGroupsiGroupwi(tiaijGroupwj(tjaj)jGroupwj)2GroupGroupsiGroupwi\displaystyle\sqrt{\displaystyle\frac{\sum\limits_{Group \in Groups} \sum\limits_{i \in Group} w_{i} \left( t_{i} - a_{i} - \displaystyle\frac{\sum\limits_{j \in Group} w_{j} (t_{j} - a_{j})}{\sum\limits_{j \in Group} w_{j}} \right)^{2}} {\sum\limits_{Group \in Groups} \sum\limits_{i \in Group} w_{i}}}

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

QuerySoftMax

Calculation principles

GroupGroupsiGroupwitilog(wieβaijGroupwjeβaj)GroupGroupsiGroupwiti- \displaystyle\frac{\sum\limits_{Group \in Groups} \sum\limits_{i \in Group}w_{i} t_{i} \log \left(\displaystyle\frac{w_{i} e^{\beta a_{i}}}{\sum\limits_{j\in Group} w_{j} e^{\beta a_{j}}}\right)} {\sum\limits_{Group \in Groups} \sum_{i\in Group} w_{i} t_{i}}

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: true

  • beta

    The input scale coefficient.

    1

PFound

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

Calculation principles

PFound(top,decay)=PFound(top, decay) =

=groupgroupsPFound(group,top,decay)= \sum_{group \in groups} PFound(group, top, decay)

See the PFound section for more details

User-defined parameters:

  • The probability of search continuation after reaching the current object.

    The probability of search continuation after reaching the current object.

    Default:0.85

  • The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

  • use_weights

    Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

    true

NDCG

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

Calculation principles

nDCG(top)=DCG(top)IDCG(top)nDCG(top) = \frac{DCG(top)}{IDCG(top)}

See the NDCG section for more details.

User-defined parameters:

  • The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

  • Metric calculation principles.

    Possible values:
    - Base
    - Exp

    Metric calculation principles.

    Possible values:
    - Base
    - Exp

    Base

  • Metric denominator type.

    Possible values:
    - LogPosition
    - Position

    Metric denominator type.

    Possible values:
    - LogPosition
    - Position

    LogPosition

  • use_weights

    Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

    true

DCG

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

Calculation principles

DCG(top)DCG(top)

See the NDCG section for more details.

User-defined parameters:

  • The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

  • Metric calculation principles.

    Possible values:
    - Base
    - Exp

    Metric calculation principles.

    Possible values:
    - Base
    - Exp

    Base

  • Metric denominator type.

    Possible values:
    - LogPosition
    - Position

    Metric denominator type.

    Possible values:
    - LogPosition
    - Position

    LogPosition

  • use_weights

    Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

    true

FilteredDCG

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

Calculation principles

See the FilteredDCG section for more details.

User-defined parameters:

  • type

    Metric calculation principles.

    Possible values:
    - Base
    - Exp

    Base

  • denominator" %}

    Metric denominator type.

    Possible values:
    - LogPosition
    - Position

    Position

AverageGain

Represents the average value of the label values for objects with the defined top MM label values.

See the AverageGain section for more details.

User-defined parameters:

  • top

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    This parameter is obligatory (the default value is not defined)

  • use_weights

    Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

    true

PrecisionAt

Calculation principles

The calculation of this function consists of the following steps:

  1. The objectsare sorted in descending order of predicted relevancies (aia_{i})

  2. The metric is calculated as follows:

    PrecisionAt(top,border)=i=1topRelevantitop,wherePrecisionAt(top, border) = \frac{\sum\limits_{i=1}^{top} Relevant_{i}}{top} { , where}

    • Relevanti={1,ti>border0,inothercasesRelevant_{i} = \begin{cases} 1 { , } & t_{i} > {border} \\ 0 { , } & {in other cases} \end{cases}

User-defined parameters:

  • top

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

  • border

    The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

    0.5

RecallAt

Calculation principles

The calculation of this function consists of the following steps:

  1. The objectsare sorted in descending order of predicted relevancies (aia_{i})

  2. The metric is calculated as follows:
    RecalAt(top,border)=i=1topRelevantii=1NRelevantiRecalAt(top, border) = \frac{\sum\limits_{i=1}^{top} Relevant_{i}}{\sum\limits_{i=1}^{N} Relevant_{i}}

    • Relevanti={1,ti>border0,inothercasesRelevant_{i} = \begin{cases} 1 { , } & t_{i} > {border} \\ 0 { , } & {in other cases} \end{cases}

User-defined parameters:

  • top

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

  • border

    The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

    0.5

MAP

Calculation principles

  1. The objectsare sorted in descending order of predicted relevancies (aia_{i})

  2. The metric is calculated as follows:
    MAP(top,border)=1Ngroupsj=1NgroupsAveragePrecisionAtj(top,border),whereMAP(top, border) = \frac{1}{N_{groups}} \sum\limits_{j = 1}^{N_{groups}} AveragePrecisionAt_{j}(top, border) { , where}

    • NgroupsN_{groups} is the number of groups
    • AveragePrecisionAt(top,border)=i=1topRelevantiPrecisionAtii=1topRelevantiAveragePrecisionAt(top, border) = \frac{\sum\limits_{i=1}^{top} Relevant_{i} * PrecisionAt_{i}}{\sum\limits_{i=1}^{top} Relevant_{i} }

    The value is calculated individually for each j-th group.

    • Relevanti={1,ti>border0,inothercasesRelevant_{i} = \begin{cases} 1 { , } & t_{i} > {border} \\ 0 { , } & {in other cases} \end{cases}
    • PrecisionAti=j=1iRelevantjiPrecisionAt_{i} = \frac{\sum\limits_{j=1}^{i} Relevant_{j}}{i}

User-defined parameters:

  • top

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

  • border

    The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

    0.5

ERR

Calculation principles

ERR=1Qq=1QERRqERR = \frac{1}{|Q|} \sum_{q=1}^{|Q|} ERR_q

ERRq=i=1top1itq,ij=1i1(1tq,j)ERR_q = \sum_{i=1}^{top} \frac{1}{i} t_{q,i} \prod_{j=1}^{i-1} (1 - t_{q,j})

Targets should be from the range [0, 1].

tq,i[0,1]t_{q,i} \in [0, 1]

User-defined parameters:

  • top

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

AUC

AUC

The calculation of this metric is disabled by default for the training dataset to speed up the training. Use the hints=skip_train~false parameter to enable the calculation.

Classic

I(ai,aj)wiwjwiwj\displaystyle\frac{\sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} {\sum w_{i} \cdot w_{j}}
The sum is calculated on all pairs of objects (i,j)(i,j) such that:

  • ti=0t_{i} = 0
  • tj=1t_{j} = 1
  • I(x,y)={0,x<y0.5,x=y1,x>yI(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}

Refer to the Wikipedia article for details.

If the target type is not binary, then every object with target value tt and weight ww is replaced with two objects for the metric calculation:

  • o1o_{1} with weight twt \cdot w and target value 1
  • o2o_{2} with weight (1 – t) \cdot w and target value 0.

Target values must be in the range [0; 1].

Ranking

I(ai,aj)wiwjwiwj\displaystyle\frac{\sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} {\sum w_{i} * w_{j}}

The sum is calculated on all pairs of objects (i,j)(i,j) such that:

  • ti<tjt_{i} < t_{j}
  • I(x,y)={0,x<y0.5,x=y1,x>yI(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}

User-defined parameters:

  • use_weights

    Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

    false

  • type

    The type of AUC. Defines the metric calculation principles.

    Possible values:

    • Classic
    • Ranking

    Examples:

    AUC:type=Classic
    
    AUC:type=Ranking
    

Used for optimization

MRR

Calculation principles

MRR=1Qq=1Q1rankqMRR = \frac{1}{|Q|} \sum_{q=1}^{|Q|} \frac{1}{rank_q}, where rankqrank_q refers to the rank position of the first relevant document for the q-th query.

User-defined parameters:

  • top

    The number of top samples in a group that are used to calculate the ranking metric. Top samples are either the samples with the largest approx values or the ones with the lowest target values if approx values are the same.

    –1 (all label values are used)

  • border

    The label value border. If the value is strictly greater than this threshold, it is considered a positive class. Otherwise it is considered a negative class.

0.5

QueryAUC

  • type

    The type of AUC. Defines the metric calculation principles.

    Possible values:

    • Classic
    • Ranking

    Examples:

    AUC:type=Classic
    
    AUC:type=Ranking
    

Classic

qi,jqI(ai,aj)wiwjqi,jqwiwj\displaystyle\frac{ \sum_q \sum_{i, j \in q} \sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} { \sum_q \sum_{i, j \in q} \sum w_{i} \cdot w_{j}}
The sum is calculated on all pairs of objects (i,j)(i,j) such that:

  • ti=0t_{i} = 0
  • tj=1t_{j} = 1
  • I(x,y)={0,x<y0.5,x=y1,x>yI(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}

Refer to the Wikipedia article for details.

If the target type is not binary, then every object with target value tt and weight ww is replaced with two objects for the metric calculation:

  • o1o_{1} with weight twt \cdot w and target value 1
  • o2o_{2} with weight (1 – t) \cdot w and target value 0.

Target values must be in the range [0; 1].

Ranking

qi,jqI(ai,aj)wiwjqi,jqwiwj\displaystyle\frac{ \sum_q \sum_{i, j \in q} \sum I(a_{i}, a_{j}) \cdot w_{i} \cdot w_{j}} { \sum_q \sum_{i, j \in q} \sum w_{i} * w_{j}}

The sum is calculated on all pairs of objects (i,j)(i,j) such that:

  • ti<tjt_{i} < t_{j}
  • I(x,y)={0,x<y0.5,x=y1,x>yI(x, y) = \begin{cases} 0 { , } & x < y \\ 0.5 { , } & x=y \\ 1 { , } & x>y \end{cases}

User-defined parameters:

use_weights

Use object/group weights to calculate metrics if the specified value is true and set all weights to 1 regardless of the input data if the specified value is false.

Default: false

Used for optimization

Name Optimization
YetiRank +
YetiRankPairwise +
StochasticFilter +
StochasticRank +
DCG +
QueryCrossEntropy +
QueryRMSE +
QuerySoftMax +
PFound -
NDCG -
DCG -
FilteredDCG -
AverageGain -
PrecisionAt -
RecallAt -
MAP -
ERR -
MRR -
AUC -
QueryAUC -