titanic
Load the dataset from the Kaggle competition Titanic: Machine Learning from Disaster.
This dataset is best suited for binary classification.
The training dataset contains 891 objects. Each object is described by 12 columns that hold numerical and categorical values, including the Survived column, which is typically used as the label.
The validation dataset contains 418 objects. It has the same structure as the training dataset, except that the Survived column is omitted.
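The relationship between the two datasets can be sketched as follows. This is a minimal illustration, not part of the catboost API: the helper names split_features_label and missing_in_validation are hypothetical, and the tiny frames are hand-built stand-ins with only a subset of the real columns, so the sketch runs without downloading anything.

```python
import pandas as pd


def split_features_label(train_df, label="Survived"):
    """Hypothetical helper: separate the label column from the feature columns."""
    features = train_df.drop(columns=[label])
    target = train_df[label]
    return features, target


def missing_in_validation(train_df, validation_df):
    """Hypothetical helper: columns present in train but absent from validation."""
    return [col for col in train_df.columns if col not in validation_df.columns]


# Stand-in frames for illustration only (assumption: a subset of the real columns).
train = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
})
# The real validation dataset holds different passengers; here only the
# column layout matters, so the label column is simply dropped.
validation = train.drop(columns=["Survived"])

X, y = split_features_label(train)
print(list(X.columns))                           # feature columns without the label
print(y.tolist())                                # label values
print(missing_in_validation(train, validation))  # columns omitted from validation
```

With the real data, the same helpers would presumably be applied to the two frames returned by titanic(); the validation frame has no label to split off.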
Method call format
titanic()
Type of return value
A tuple of two pandas.DataFrame objects: the training dataset and the validation dataset, in that order.
Usage examples
from catboost.datasets import titanic
titanic_train, titanic_test = titanic()
print(titanic_train.head(3))
The output of this example:
   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S
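The mix of numerical and categorical columns, and the NaN values visible in the Cabin column, can be inspected with plain pandas. The sketch below rebuilds the three printed rows by hand (an assumption made so it runs offline; the truncated name is kept exactly as printed) and groups columns by dtype. The dtype is only a first guess: a numeric column such as Pclass may still be better treated as categorical.

```python
import pandas as pd

# The three rows from the output above, typed in by hand for illustration.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Th...",
        "Heikkinen, Miss. Laina",
    ],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282"],
    "Fare": [7.25, 71.2833, 7.925],
    "Cabin": [None, "C85", None],
    "Embarked": ["S", "C", "S"],
})

# Columns with a numeric dtype; everything else is treated as text here.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = [col for col in sample.columns if col not in numeric_cols]

# Number of missing values in each column.
missing = sample.isna().sum()

print(numeric_cols)
print(text_cols)
print(missing[missing > 0])
```

The same three statements should work unchanged on the full training frame, where the counts of missing values will of course differ from this three-row sample.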