rosetta-ai / rosetta_recsys2019

The 4th Place Solution to the 2019 ACM Recsys Challenge by Team RosettaAI

Home Page: https://blog.rosetta.ai/the-5th-place-approach-to-the-2019-acm-recsys-challenge-by-team-rosettaai-eb3c4e6178c4

License: Apache License 2.0

Jupyter Notebook 30.06% Python 69.93% Shell 0.02%
recommender-system deep-learning data-science machine-learning artificial-intelligence neural-network python boosting-tree lightgbm xgboost

rosetta_recsys2019's Introduction

The 4th Place Solution to the 2019 ACM RecSys Challenge

Team Members

Kung-hsiang (Steeve), Huang (Rosetta.ai); Yi-fu, Fu; Yi-ting, Lee; Tzong-hann, Lee; Yao-chun, Chan (National Taiwan University); Yi-hui, Lee (University of Texas, Dallas); Shou-de, Lin (National Taiwan University)

Contact: [email protected]

Introduction

This repository contains RosettaAI's approach to the 2019 ACM RecSys Challenge (paper, writeup). Instead of treating the task as a ranking problem, we use binary cross-entropy as our loss function. Three different models were implemented:

  1. Neural Networks (based on DeepFM and this YouTube paper)
  2. LightGBM
  3. XGBoost
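
To illustrate the binary cross-entropy formulation, the sketch below scores each displayed item as an independent clicked/not-clicked example and then sorts the items by predicted probability to obtain a ranking. It is a minimal illustration, not the repository's code: the item ids, labels, probabilities, and the helper names `binary_cross_entropy` and `rank_items` are all assumptions made for the example.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over (label, predicted probability) pairs."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def rank_items(item_ids, probs):
    """Order displayed items by predicted click probability, highest first."""
    pairs = sorted(zip(item_ids, probs), key=lambda pair: -pair[1])
    return [item for item, _ in pairs]

# One hypothetical session: four displayed items, the user clicked item "c".
items = ["a", "b", "c", "d"]
labels = [0, 0, 1, 0]
probs = [0.10, 0.30, 0.55, 0.05]  # made-up model outputs

loss = binary_cross_entropy(labels, probs)
ranking = rank_items(items, probs)
```

The loss is computed per item, while the ranking is only produced at prediction time by sorting, which is what distinguishes this pointwise setup from a pairwise or listwise ranking loss.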

Environment

  • Ubuntu 16.04
  • CUDA 9.0
  • Python==3.6.8
  • Numpy==1.16
  • Pandas==0.24.2
  • PyTorch==1.1.0
  • Sklearn==0.21.2
  • Scipy==1.3.0
  • LightGBM==2.2.4
  • XGBoost==0.9
  • timezonefinder==4.0.3
  • geopy==1.20.0

Project Structure

├── input
├── output
├── src
└── weights

Setup

Run the following command to create directories that conform to the project structure, then place the unzipped data into the input directory:

. setup.sh

Run the two Python scripts to pickle the input data and obtain the UTC offsets for each country:

cd src
python picklization.py
python country2utc.py
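
As a rough idea of what pickling the input data involves, the sketch below reads a CSV file once and caches it as a pickle so later runs can skip CSV parsing. It uses only the standard library and is not the content of picklization.py; the function names, the file names, and the `session_id` and `step` columns are hypothetical.

```python
import csv
import pickle
import tempfile
from pathlib import Path

def csv_to_pickle(csv_path, pickle_path):
    """Read a CSV file into a list of dicts and cache it as a pickle file."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(pickle_path, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)
    return len(rows)

def load_pickle(pickle_path):
    """Load the cached rows back from disk."""
    with open(pickle_path, "rb") as f:
        return pickle.load(f)

# Demo on a tiny, made-up CSV written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "train.csv"
    csv_path.write_text("session_id,step\ns1,1\ns1,2\n")
    pkl_path = Path(tmp) / "train.pkl"
    n_rows = csv_to_pickle(csv_path, pkl_path)
    rows = load_pickle(pkl_path)
```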

To train the models on the whole dataset, set debug and sub_sample to False in config.py:

class Configuration(object):

    def __init__(self):
        ...
        self.debug = False
        self.sub_sample = False
        ...
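
How the two flags are consumed is not shown in this README. The sketch below is one plausible reading, under the assumption that `debug` keeps a small fixed slice of sessions and `sub_sample` draws a random fraction. The `select_sessions` helper, its `debug_size` and `sample_frac` parameters, and the stand-in `Configuration` class are hypothetical.

```python
import random

class Configuration(object):
    """Hypothetical stand-in holding only the two flags mentioned above."""

    def __init__(self, debug=False, sub_sample=False):
        self.debug = debug
        self.sub_sample = sub_sample

def select_sessions(session_ids, config, debug_size=100, sample_frac=0.1, seed=0):
    """Return the sessions to train on, depending on the config flags."""
    if config.debug:
        # Assumed behavior: keep a small fixed slice for quick smoke tests.
        return session_ids[:debug_size]
    if config.sub_sample:
        # Assumed behavior: train on a reproducible random fraction.
        rng = random.Random(seed)
        k = max(1, int(len(session_ids) * sample_frac))
        return rng.sample(session_ids, k)
    # Both flags False: use the whole dataset.
    return list(session_ids)

all_sessions = [f"s{i}" for i in range(1000)]
full = select_sessions(all_sessions, Configuration())
small = select_sessions(all_sessions, Configuration(sub_sample=True))
tiny = select_sessions(all_sessions, Configuration(debug=True))
```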

Training & Submission

The models are all trained in an end-to-end fashion. To train and predict each of the three models, simply run the following commands:

python run_nn.py
python run_lgb.py
python run_xgb.py

The submission files are stored in the output directory.

The results generated from LightGBM alone would place us at the 5th position in the public leaderboard. To ensemble these three models, change the output name of each model in Merge.ipynb and run it.
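
The blending logic in Merge.ipynb is not described here. As a generic illustration of ensembling, the sketch below takes a weighted average of per-item scores from several models and re-ranks the items by the blended score. The `blend_scores` function, the equal default weights, and the example scores are assumptions, not the notebook's method.

```python
from collections import defaultdict

def blend_scores(model_scores, weights=None):
    """Rank items by the weighted mean of their scores across models.

    model_scores maps a model name to a dict of item id -> score.
    """
    if weights is None:
        weights = {name: 1.0 for name in model_scores}  # equal weights
    total_weight = sum(weights[name] for name in model_scores)
    blended = defaultdict(float)
    for name, scores in model_scores.items():
        for item, score in scores.items():
            blended[item] += weights[name] * score / total_weight
    return sorted(blended, key=lambda item: -blended[item])

# Made-up scores for three items in one session.
scores = {
    "nn": {"a": 0.2, "b": 0.5, "c": 0.3},
    "lgb": {"a": 0.6, "b": 0.3, "c": 0.1},
    "xgb": {"a": 0.5, "b": 0.4, "c": 0.1},
}
ensemble_ranking = blend_scores(scores)
```

With equal weights, item "a" ends up first even though the neural network alone prefers "b", which shows how blending can change the final order.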

Performance

Model     Local Validation MRR  Public Leaderboard MRR
LightGBM  0.685787              N/A
XGBoost   0.684521              0.681128
NN        0.675206              0.672117
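
For reference, MRR (mean reciprocal rank) averages 1/rank of the clicked item over all evaluated sessions. The sketch below computes it on made-up rankings; the function name and the convention that a missing clicked item contributes 0 are assumptions for illustration.

```python
def mean_reciprocal_rank(rankings, clicked_items):
    """Average of 1 / (1-based position of the clicked item) over sessions."""
    total = 0.0
    for ranking, clicked in zip(rankings, clicked_items):
        if clicked in ranking:
            total += 1.0 / (ranking.index(clicked) + 1)
        # Assumed convention: a clicked item missing from the ranking adds 0.
    return total / len(rankings)

# Three hypothetical sessions where item "a" was clicked each time.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
clicked = ["a", "a", "a"]
mrr = mean_reciprocal_rank(rankings, clicked)  # (1 + 1/2 + 1/3) / 3
```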
