msrank_10k
Load a reduced version of the Microsoft Learning to Rank dataset (msrank).
The training dataset contains 10000 objects. Each object is described by 138 columns. The first column contains the label value, and the second contains the identifier of the object's group (GroupId). All other columns contain features.
The validation dataset contains 10000 objects. The structure is identical to the training dataset.
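The column layout described above can be separated into labels, group identifiers, and features with positional indexing. The following is a minimal sketch, not part of the catboost API: the helper name `split_msrank_frame` is hypothetical, and the small `demo` frame is synthetic stand-in data that only mimics the layout (label, group identifier, then features).

```python
import pandas as pd


def split_msrank_frame(frame: pd.DataFrame):
    """Split a frame laid out like msrank_10k (hypothetical helper).

    Column 0 is the label, column 1 is the group identifier,
    and every remaining column is a feature.
    """
    labels = frame.iloc[:, 0]
    group_ids = frame.iloc[:, 1]
    features = frame.iloc[:, 2:]
    return labels, group_ids, features


# Synthetic stand-in with the same column order; not real msrank data.
demo = pd.DataFrame([
    [2.0, 1, 3, 0.5],
    [0.0, 1, 0, 1.5],
    [1.0, 2, 2, 2.5],
])

labels, group_ids, features = split_msrank_frame(demo)
print(features.shape)
```

Applied to either DataFrame returned by msrank_10k(), the same split should leave 136 feature columns (138 columns minus the label and the group identifier).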
Method call format
msrank_10k()
Type of return value
A tuple of two pandas.DataFrame objects (the training and validation datasets).
Usage examples
from catboost.datasets import msrank_10k
msrank_10k_train, msrank_10k_test = msrank_10k()
print(msrank_10k_train.head(3))
The output of this example:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ... 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137
0 2.0 1 3 3 0 0 3 1.0 1.0 0.000000 0.0 1.0 156 4 0 ... -4.474452 -23.634899 -28.119826 -13.581932 3 62 11089534 2 116 64034 13 3 0 0 0.0
1 2.0 1 3 0 3 0 3 1.0 0.0 1.000000 0.0 1.0 406 0 5 ... -24.041386 -5.143860 -28.119826 -11.411068 2 54 11089534 2 124 64034 1 2 0 0 0.0
2 0.0 1 3 0 2 0 3 1.0 0.0 0.666667 0.0 1.0 146 0 3 ... -24.041386 -14.689844 -28.119826 -11.436378 3 45 3 1 124 3344 14 67 0 0 0.0
[3 rows x 138 columns]