CatBoost is a high-performance open-source library for gradient boosting on decision trees.

Careem's Destination Prediction Service uses CatBoost

February 20, 2019

In Dubai, there are around 8,000 restaurants, the world’s tallest building, over 50 malls, lots of beaches, theme parks and much more. The number of destinations is vast - so where are you going next?

You’re not the only one trying to answer that question.

Careem is constantly trying to figure out where its users will go next. And not just in Dubai, but in all 15 of the countries in which we're present and for all 33 million of the app’s users.

Why does a ride-hailing service need the answer?

Imagine that, instead of asking users to search for their destination on a map or type in the address manually, we simply give them a shortlist of the most probable locations to choose from.

One-click booking makes things faster and easier for the customer, and knowing where a customer would like to go next - even before they tell us - allows us to improve our service by ensuring the busier areas have a greater supply of transport options.

The Careem data science team asked the question: “If we build a supervised machine-learning model to solve this problem, what are the most important and informative features needed to predict where a user goes?”

For example, "morning-home-work" sounds like a reasonable guess if it is supported by historical trips, as does "weekend-home-mall" if the person likes shopping. After verification with the data, it becomes clearer that the most relevant features are location, time, and users' interactions with the location at particular times of day.
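The "morning-home-work" intuition can be pictured as a simple frequency lookup over a user's past trips. The snippet below is only an illustrative baseline, not Careem's model: the trip tuples, the `daypart` buckets and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

def daypart(hour):
    """Bucket an hour (0-23) into a coarse time of day (hypothetical buckets)."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"

def build_history(trips):
    """Count destinations per (daypart, origin) context.

    `trips` is an iterable of (hour, origin, destination) tuples.
    """
    history = defaultdict(Counter)
    for hour, origin, destination in trips:
        history[(daypart(hour), origin)][destination] += 1
    return history

def shortlist(history, hour, origin, k=3):
    """Return up to k most frequent past destinations for this context."""
    counts = history.get((daypart(hour), origin), Counter())
    return [dest for dest, _ in counts.most_common(k)]

# Made-up trips for a single user: (hour, origin, destination)
trips = [
    (8, "home", "work"),
    (9, "home", "work"),
    (8, "home", "gym"),
    (19, "work", "home"),
    (20, "work", "mall"),
    (19, "work", "home"),
]
history = build_history(trips)
morning_from_home = shortlist(history, 8, "home")   # ["work", "gym"]
evening_from_work = shortlist(history, 19, "work")  # ["home", "mall"]
```

A lookup like this only covers contexts the user has already been in, which is one reason a learned model over location, time and their interactions is more useful.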

Here, CatBoost comes into play with its superpower of handling categorical features. CatBoost is a state-of-the-art machine learning algorithm which allows users to quickly handle categorical features for a large data set.

Day of the week and hour of the day are categorical features. Each location, or small subset of locations, is categorical too, and a city contains thousands of them. Encoding them manually becomes a giant problem: it drastically increases the data set size, slows down training and still gives very poor results.
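To get a feel for the size problem, take one-hot encoding, a common manual scheme in which every category becomes its own column. The arithmetic below is a rough sketch; the figure of 5,000 locations is an assumed stand-in for "thousands", not a Careem number.

```python
# Distinct values per categorical feature. The location count is an
# assumption standing in for "thousands of locations" in one city.
cardinalities = {
    "day_of_week": 7,
    "hour_of_day": 24,
    "pickup_location": 5000,  # assumed
}

# One-hot encoding creates one column per distinct value.
one_hot_columns = sum(cardinalities.values())

# Giving every (day, hour, location) combination its own column, to capture
# patterns that depend on all three at once, multiplies the counts instead.
combination_columns = 1
for size in cardinalities.values():
    combination_columns *= size

print(one_hot_columns)      # 5031
print(combination_columns)  # 840000
```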

Manual encoding was the norm before Yandex open-sourced CatBoost. CatBoost not only considers categorical features independently, it also takes advantage of combinations of categorical features, which is exactly what we need to handle patterns like "morning-home-work".
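One way to picture how a combination such as "morning-home" can be turned into a number is a target statistic: the combined category is replaced by a smoothed average of the target over earlier rows that share it. The sketch below is a simplified illustration in the spirit of the ordered target statistics described in the CatBoost documentation, not the library's actual implementation. The rows, the binary target "trip ended at work", and the `prior` and `weight` values are made up.

```python
from collections import defaultdict

def ordered_target_stat(categories, targets, prior=0.5, weight=1.0):
    """Encode each row using only the rows that came before it.

    value = (sum of earlier targets in the same category + weight * prior)
            / (number of earlier rows in the same category + weight)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = []
    for cat, target in zip(categories, targets):
        encoded.append((sums[cat] + weight * prior) / (counts[cat] + weight))
        # Update the running statistics only after encoding the row,
        # so a row never sees its own target.
        sums[cat] += target
        counts[cat] += 1
    return encoded

# Made-up rows: (daypart, origin, 1 if the trip ended at work else 0)
rows = [
    ("morning", "home", 1),
    ("morning", "home", 1),
    ("weekend", "home", 0),
    ("morning", "home", 0),
    ("weekend", "home", 0),
]
# Combine the two categorical features into a single key.
combined = [f"{part}-{origin}" for part, origin, _ in rows]
targets = [target for _, _, target in rows]
encoded = ordered_target_stat(combined, targets)
```

With the library itself none of this is written by hand: categorical columns are declared through the `cat_features` parameter, and the encoding is handled internally.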

Using this, we now have an algorithm that can very precisely predict a user’s next move. Careem has 150+ cities in its network, millions of trips performed and a huge number of requests being made all the time. This is a lot of data to process at any given time.

There's a general belief that 300 milliseconds is a really good response time for a machine learning system, but in Careem's case, that's just not good enough and would result in a customer experience that's too slow. At Careem we needed to do much better than that... so we did.

CatBoost’s performance in conjunction with the Careem app and A.I. exceeds all expectations, providing our customers with a single-digit millisecond response time.

The model - known as Destination Prediction Service - is currently running in the Careem app, serving users and ensuring that our system is optimal.

So the next time you hit "Yalla!" on your Careem app, remember that it's not just like sending an email: you've actually just started off a process that involves some serious artificial intelligence, and some of the fastest response times for a machine learning system.

When you book on our app, nearly 500 actions happen from the second you press the button to your ride arriving.

Careem’s A.I. platform – called Yoda – has a state-of-the-art machine-learning cycle. It can even predict with extremely high accuracy what the demand in a certain place will be in two weeks’ time and where drivers will be needed. This helps us ensure that waiting times are as low as possible and our drivers can secure more fares.

And if you use the Careem service for six months, the A.I. will be much more in tune with your needs because it keeps learning how to provide you with a better service.

Where are you going next? Yeah, we thought so.

This is a repost of Careem’s blog post “How Careem’s Destination Prediction Service speeds up your ride”.
