
MeliDataChallenge2020

ML Model for Mercado Libre Data Challenge 2020

This solution is called Efecto Bolo! and, in honor of this lovely name, it took first place on the final leaderboard!

The challenge

"Build a Machine Learning model to predict next purchase based on the user’s navigation history."

https://ml-challenge.mercadolibre.com/

Data

The dataset provided for this competition contains users' navigation histories (list of visited items and searches before the purchase), and the purchased item (target).

The training set contains 413,163 rows, and the test set used for submissions has 177,070 rows.

Additionally, there is a dataset with the catalog of all items available for making predictions (2M items). It contains some extra information about the products, such as the price, the listing's title, and categories.

https://ml-challenge.mercadolibre.com/downloads

Evaluation

Submissions were evaluated using the average NDCG score over the top ten predicted items. This metric takes the order of the recommendations into account: it assigns a relevance score when a prediction hits the purchased item, and a lower score to predicted items that share the target's domain (a domain is similar to a category). The latter means that a solution can focus on item domains. There are 7,894 domains in the catalog, so predicting the domain is an easier task than predicting the exact item.

https://ml-challenge.mercadolibre.com/rule
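As a rough illustration of the metric described above, here is a minimal sketch of NDCG over ten recommendations for a single user. The gain values (`item_gain`, `domain_gain`), the ideal ranking (the target first, followed by nine same-domain items), and all function and parameter names are assumptions made for this example; the rules page defines the official values.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))


def ndcg_at_10(recommended, target_item, item_domain,
               item_gain=12.0, domain_gain=1.0):
    """NDCG of the first ten recommendations for a single purchase.

    item_gain and domain_gain are placeholder relevance values (assumed),
    not taken from this README.
    """
    target_domain = item_domain[target_item]
    gains = []
    target_seen = False
    for item in recommended[:10]:
        if item == target_item and not target_seen:
            gains.append(item_gain)      # exact hit on the purchased item
            target_seen = True
        elif item != target_item and item_domain.get(item) == target_domain:
            gains.append(domain_gain)    # same domain as the purchased item
        else:
            gains.append(0.0)
    # Assumed ideal ranking: the target first, then nine same-domain items.
    ideal = [item_gain] + [domain_gain] * 9
    return dcg(gains) / dcg(ideal)
```

The competition score would then be the mean of this value over all test users.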

Models

The strategy used for this challenge has several parts:

Implicit rating

I started with an exploratory analysis. The highlight is that almost 30% of the targets appear in the user's own history. Another finding was that the most recently visited items are more likely to be purchased. (notebook)

So a simple first model could be a function to score the viewed items in the user's session. The implicit rating function I proposed is:

r_ui = sum(1 / log10(position_in_history + 1))

r_ui is the implicit rating of the user "u" for the item "i". The sum is over all pageviews of "i" in the user session "u". Position 1 is the latest pageview.

So the implicit rating increases when the item has more pageviews, and also when those pageviews are more recent.
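The rating function above can be sketched in a few lines. This is a minimal illustration, assuming the session is given as a list of item ids ordered from oldest to newest; the function name and the input format are hypothetical, not taken from the notebook.

```python
import math
from collections import defaultdict


def implicit_ratings(history):
    """Score every viewed item in one user session.

    history: item ids ordered from oldest to newest pageview (assumed format).
    Position 1 is the latest pageview, as in the formula above.
    """
    ratings = defaultdict(float)
    for position, item in enumerate(reversed(history), start=1):
        ratings[item] += 1.0 / math.log10(position + 1)
    return dict(ratings)


session = ["a", "b", "a"]  # "a" was viewed twice, the last time most recently
print(implicit_ratings(session))
```

In this toy session, "a" gets roughly 4.98 (positions 1 and 3) and "b" roughly 2.10 (position 2), so both repetition and recency raise the score.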

This simple approach (filling the rest of the recommendation set with popular items from the domain of the highest-scored item) has an NDCG score of 0.2639, which would place it in the top 20 positions of the competition. Not bad!
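A minimal sketch of that fill step, assuming the ratings are already computed and that two lookup tables exist: `item_domain` (item id to domain) and `popular_by_domain` (domain to item ids sorted by popularity). Both tables and the function name are hypothetical names for this example.

```python
def recommend(ratings, item_domain, popular_by_domain, k=10):
    """Rank viewed items by implicit rating, then pad with popular items.

    ratings: item id -> implicit rating for one session.
    item_domain: item id -> domain (assumed lookup built from the catalog).
    popular_by_domain: domain -> item ids, most popular first (assumed).
    """
    recs = sorted(ratings, key=ratings.get, reverse=True)[:k]
    if recs:
        # Pad with popular items from the domain of the highest-scored item.
        top_domain = item_domain.get(recs[0])
        for item in popular_by_domain.get(top_domain, []):
            if len(recs) >= k:
                break
            if item not in recs:
                recs.append(item)
    return recs
```

Sessions with no viewed items return an empty list here; a real submission would need some fallback for them, which this sketch leaves out.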

The same approach, combined with an "Items to Item" collaborative filtering strategy, had an NDCG score of 0.28755 (in the top 10 of the final leaderboard). This wasn't the final model strategy, but it's a good and lightweight one. (notebook)

Matrix Factorization

Alternating Least Squares (ALS) is a matrix factorization model. It uses implicit ratings of user-item interactions instead of explicit ratings (like star ratings for movies) or the binary representation commonly used for implicit interactions. So it's a great fit for the implicit ratings already created.
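To make the idea concrete, here is a small dense NumPy sketch of confidence-weighted ALS for implicit feedback, where the confidence of each cell is `1 + alpha * rating` and the preference is 1 for any observed interaction. It is not the notebook's code: the function name, hyperparameters, and toy matrix are assumptions, and a dense implementation like this would not scale to the competition data.

```python
import numpy as np


def implicit_als(ratings, factors=8, alpha=10.0, reg=0.1, iterations=10, seed=0):
    """Factorize a dense user-item matrix of implicit ratings.

    Returns (user_factors, item_factors). All hyperparameter defaults are
    illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    confidence = 1.0 + alpha * ratings          # higher rating, more confidence
    preference = (ratings > 0).astype(float)    # 1 if the user saw the item
    users = rng.normal(scale=0.1, size=(n_users, factors))
    items = rng.normal(scale=0.1, size=(n_items, factors))
    ridge = reg * np.eye(factors)

    def solve(fixed, conf, pref):
        # Weighted ridge regression per row, with the other side held fixed.
        out = np.empty((conf.shape[0], fixed.shape[1]))
        for row in range(conf.shape[0]):
            weighted = fixed * conf[row][:, None]
            out[row] = np.linalg.solve(fixed.T @ weighted + ridge,
                                       weighted.T @ pref[row])
        return out

    for _ in range(iterations):
        users = solve(items, confidence, preference)
        items = solve(users, confidence.T, preference.T)
    return users, items


# Toy matrix: users 0-1 only view items 0-1, users 2-3 only view items 2-3.
toy = np.array([[3.3, 1.0, 0.0, 0.0],
                [1.0, 2.1, 0.0, 0.0],
                [0.0, 0.0, 3.3, 1.0],
                [0.0, 0.0, 1.0, 2.1]])
user_factors, item_factors = implicit_als(toy, factors=2, iterations=20)
scores = user_factors @ item_factors.T  # predicted preference per user-item pair
```

Each row of `scores` can then be sorted to rank candidate items for that user.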

In addition to the item and user encodings in the model, I added item features and search terms to the user-item matrix representation, to enrich the semantics of the recommendations, especially for items with few pageviews (the cold start problem), and to favor recommending items from the same domain. (notebook)

Finally, the last submission was a weighted ensemble of the implicit ratings of the items viewed in the user session and the predicted scores of two ALS models with different settings. (notebook)
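A weighted ensemble of this kind could look like the sketch below. The weights, the scaling of each score set by its largest absolute value, and the function name are assumptions for illustration; the README does not state how the actual submission combined or normalized the scores.

```python
def weighted_ensemble(score_sets, weights, k=10):
    """Blend several item -> score dicts for one user and return the top k items.

    Each score set is divided by its largest absolute score so that the
    sources are on a comparable scale (an assumed normalization).
    """
    combined = {}
    for scores, weight in zip(score_sets, weights):
        scale = max((abs(s) for s in scores.values()), default=0.0) or 1.0
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score / scale
    return sorted(combined, key=combined.get, reverse=True)[:k]


session_ratings = {"a": 4.0, "b": 2.0}   # implicit ratings of viewed items
als_scores_1 = {"b": 0.9, "c": 0.3}      # hypothetical output of ALS model 1
als_scores_2 = {"c": 0.8, "b": 0.4}      # hypothetical output of ALS model 2
top_items = weighted_ensemble([session_ratings, als_scores_1, als_scores_2],
                              weights=[0.5, 0.3, 0.2])
```

With these made-up numbers, "b" ranks first because all three sources support it, followed by "a" and then "c".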

Results

| Model | Local score | Public score | Private score |
|---|---|---|---|
| Baseline (sorted visited items) | 0.2085 | | |
| Simple implicit rating | 0.2639 | | |
| Items to items (first submission) | 0.2889 | 0.28755 | |
| ALS Ensemble (last submission) | 0.3180 | 0.31293 | 0.30920 |

https://ml-challenge.mercadolibre.com/final_results

Contributors

leolnn
