sim's Introduction

Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction

A TensorFlow implementation of Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction.

Prerequisites

  • Python 2.x
  • TensorFlow 1.15.0

Data

Getting Started

First, prepare the data.

Amazon Prepare

  • Because downloading and processing the data is time-consuming, we have preprocessed the Amazon data and uploaded it for you. Extract the archive with:
tar -xzf data.tar.gz

Running

usage: train_taobao_and_book.py [-h] [-mode MODE] [-seed SEED]
                                [-use_first_att USE_FIRST_ATT]
                                [-first_att_top_k FIRST_ATT_TOP_K]
                                [-use_vec_loss USE_VEC_LOSS]
                                [-long_seq_split LONG_SEQ_SPLIT]
                                [-short_seq_split SHORT_SEQ_SPLIT]
                                [-short_model_type SHORT_MODEL_TYPE]
                                [-long_model_type LONG_MODEL_TYPE]
                                [-save_iter SAVE_ITER]
                                [-test_iter TEST_ITER] [-max_len MAX_LEN]
                                [-seq_len SEQ_LEN]
                                [-epoch EPOCH] [-memory_size MEMORY_SIZE]
                                [-batch_size BATCH_SIZE]
                                [-search_mode SEARCH_MODE] [-level LEVEL]
                                [-data_type DATA_TYPE]
                                [-att_func ATT_FUNC]

Base Model

An example using DNN:

python train.py -mode train \
    -data_type book \
    -max_len 100 \
    -short_model_type DIN \
    -short_seq_split '90:100' \
    -long_model_type DNN \
    -long_seq_split '0:90' \
    -seed 2 \
    -epoch 2 \
    -save_iter 10 \
    -test_iter 20 \
    -search_mode 'None'

The following model types are supported:

  • DNN
  • DIN
  • MIMN
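
The `-short_seq_split '90:100'` and `-long_seq_split '0:90'` values in the command above read like `start:end` bounds over a behavior sequence of length `-max_len 100`. The sketch below illustrates that reading only; `parse_split` is a hypothetical helper, not a function from this repository, and the oldest-first ordering of the sequence is an assumption.

```python
def parse_split(spec):
    """Turn a 'start:end' string into a Python slice (hypothetical helper)."""
    start, end = (int(part) for part in spec.split(":"))
    return slice(start, end)


# Stand-in for one user's behavior ids, assumed to be ordered oldest first.
behaviors = list(range(100))

long_part = behaviors[parse_split("0:90")]     # fed to the long-sequence model
short_part = behaviors[parse_split("90:100")]  # fed to the short-sequence model

print("long: {} behaviors, short: {} behaviors".format(len(long_part), len(short_part)))
```

Under this reading, the two splits partition the sequence: the first 90 behaviors go to the model named by `-long_model_type` and the last 10 to the model named by `-short_model_type`.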

SIM

You can train SIM with either of two kinds of search units:

  • hard-search
python train.py -mode train \
    -data_type book \
    -max_len 100 \
    -short_model_type DIN \
    -short_seq_split '90:100' \
    -long_model_type DIN \
    -long_seq_split '0:90' \
    -seed 2 \
    -epoch 2 \
    -save_iter 10 \
    -test_iter 20 \
    -search_mode 'cate' \
    -att_func 'dot'
  • soft-search
python train.py -mode train \
    -data_type book \
    -max_len 100 \
    -short_model_type DIN \
    -short_seq_split '90:100' \
    -long_model_type DIN \
    -long_seq_split '0:90' \
    -seed 2 \
    -epoch 2 \
    -data_thread_num 5 \
    -save_iter 10 \
    -test_iter 20 \
    -search_mode 'None' \
    -use_first_att True \
    -first_att_top_k 50 \
    -use_vec_loss True \
    -att_func 'dot'
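
To make the difference between the two search units concrete, here is a minimal pure-Python sketch of the idea as described in the SIM paper: hard-search keeps the behaviors whose category matches the target item, while soft-search keeps the top-k behaviors ranked by inner product with the target embedding. The function names and toy data are illustrative assumptions and do not mirror this repository's code. Reading `-search_mode 'cate'`, `-first_att_top_k` and `-att_func 'dot'` as the corresponding switches is likewise an interpretation of the flags.

```python
def hard_search(behavior_cates, target_cate):
    """Indices of behaviors whose category equals the target item's category."""
    return [i for i, cate in enumerate(behavior_cates) if cate == target_cate]


def soft_search(behavior_embs, target_emb, top_k):
    """Indices of the top_k behaviors by inner product with the target embedding."""
    scores = [
        sum(b * t for b, t in zip(emb, target_emb))
        for emb in behavior_embs
    ]
    # Rank behavior indices by score, highest first, and keep the first top_k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Toy data (made up for illustration): six past behaviors of one user.
behavior_cates = [3, 7, 3, 1, 7, 3]
behavior_embs = [
    [1.0, 0.0], [0.0, 1.0], [2.0, 0.0],
    [0.5, 0.5], [0.0, 3.0], [1.0, 1.0],
]

hard_idx = hard_search(behavior_cates, target_cate=3)
soft_idx = soft_search(behavior_embs, target_emb=[1.0, 0.1], top_k=2)
print(hard_idx)  # [0, 2, 5]
print(soft_idx)  # [2, 5]
```

Hard-search needs no learned parameters at retrieval time, whereas soft-search depends on the quality of the embeddings used for scoring.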

sim's People

Contributors

tttwwy

sim's Issues

DIEN not available for the SIM model

Hi,

We are trying to reproduce the paper's accuracy for SIM soft-search on the Amazon dataset. The paper reports a ROC AUC of 0.7510, but running the code in this repo we get only 0.7221. This is probably because we use DIN, whereas the paper uses DIEN in the second stage of the model. Will you provide a DIEN model in this repo so that the original paper's accuracy can be fully reproduced?

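About open-sourcing the Taobao dataset

The SIM paper uses three datasets in its experiments. Among them, the Taobao dataset is considerably larger than the currently open-sourced Amazon Book dataset, and it includes Timeinfo, which Amazon Book lacks. The paper says "The dataset will be published soon." Are there any plans to open-source the Taobao dataset in the near future?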
