Load the UCI Adult Data Set.

This dataset is best suited for binary classification.
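The training dataset contains 32561 objects. Each object is described by 15 columns of numerical and categorical features. The label column is not explicitly designated; in the example output below, the last column, income, holds the class value (for example, <=50K).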

The validation dataset contains 16281 objects. The structure is identical to the training dataset.
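Because the label column is not designated, it has to be chosen before training a binary classifier. The following is a minimal sketch, not part of the library: it assumes that income is the label, that values starting with >50K mark the positive class, and that non-numeric columns should be treated as categorical. The helper name split_features_label is hypothetical, and the sample uses a subset of the columns and values from the example output on this page.

```python
import pandas as pd

# Assumption: "income" (the last column in the example output) is the label.
LABEL_COLUMN = "income"


def split_features_label(df, label_column=LABEL_COLUMN):
    """Hypothetical helper: return (X, y, cat_features) for an Adult-like DataFrame."""
    # Everything except the label column is treated as a feature.
    X = df.drop(columns=[label_column])
    # Assumption: values starting with ">50K" are the positive class (1), all others 0.
    y = df[label_column].astype(str).str.strip().str.startswith(">50K").astype(int)
    # Assumption: non-numeric columns are categorical features.
    cat_features = X.select_dtypes(exclude="number").columns.tolist()
    return X, y, cat_features


# Small sample built from a subset of the columns shown in the example output.
sample = pd.DataFrame({
    "age": [39.0, 50.0, 38.0],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
    "hours-per-week": [40.0, 13.0, 40.0],
    "income": ["<=50K", "<=50K", "<=50K"],
})
X, y, cat_features = split_features_label(sample)
```

The same helper could be applied to each DataFrame returned by adult(), since the training and validation datasets share one structure.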

Method call format

adult()


Type of return value

A tuple of two pandas.DataFrame objects: the training dataset and the validation dataset, in that order.

Usage examples

from catboost.datasets import adult
adult_train, adult_test = adult()
# Show the first three rows of the training dataset
adult_train.head(3)


The output of this example:

    age         workclass    fnlwgt  education  education-num      marital-status         occupation   relationship   race   sex  capital-gain  capital-loss  hours-per-week native-country income
0  39.0         State-gov   77516.0  Bachelors           13.0       Never-married       Adm-clerical  Not-in-family  White  Male        2174.0           0.0            40.0  United-States  <=50K
1  50.0  Self-emp-not-inc   83311.0  Bachelors           13.0  Married-civ-spouse    Exec-managerial        Husband  White  Male           0.0           0.0            13.0  United-States  <=50K
2  38.0           Private  215646.0    HS-grad            9.0            Divorced  Handlers-cleaners  Not-in-family  White  Male           0.0           0.0            40.0  United-States  <=50K