Tencent-Ads-Algo-Comp-2020

MIT license

Git repo for Tencent Advertisement Algorithm Competition


Quick Start

cd ./Script
. prerequisite.sh
python3 input_generate.py
python3 input_split.py fine
python3 train_w2v.py creative 128
python3 train_w2v.py ad 128
python3 train_w2v.py advertiser 128
python3 train_w2v.py product 128
python3 train_w2v.py industry 64
python3 train_w2v.py product_category 64
python3 train_v2_age_final_pre_ln_tf_multiInp.py 40 2048 100 1e-3
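
The Quick Start steps that follow prerequisite.sh can also be driven from Python. The sketch below only assembles the commands listed above as argument lists. The names W2V_TARGETS and build_pipeline are illustrative and are not part of the repository.

```python
import sys

# Embedding targets and sizes copied from the Quick Start commands above.
W2V_TARGETS = [
    ("creative", 128),
    ("ad", 128),
    ("advertiser", 128),
    ("product", 128),
    ("industry", 64),
    ("product_category", 64),
]


def build_pipeline(python=sys.executable):
    """Return the Quick Start steps (after prerequisite.sh) as argv lists."""
    steps = [
        [python, "input_generate.py"],
        [python, "input_split.py", "fine"],
    ]
    # One train_w2v.py call per embedding target.
    steps += [
        [python, "train_w2v.py", target, str(size)]
        for target, size in W2V_TARGETS
    ]
    steps.append(
        [python, "train_v2_age_final_pre_ln_tf_multiInp.py",
         "40", "2048", "100", "1e-3"]
    )
    return steps


if __name__ == "__main__":
    # Print the commands only; nothing is executed here.
    for argv in build_pipeline("python3"):
        print(" ".join(argv))
```

To actually run the steps, each list could be passed to subprocess.run(argv, check=True, cwd="Script") so that the first failing step stops the pipeline.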

Script Documentation

Model Training V2

  • How to run training script

    Syntax: python3 train_v2_{some script name}.py 40 2048 100 1e-3

    Argument:

    1. (Required, INT) target epoch to train to
    2. (Required, INT) batch size for training
    3. (Required, INT) maximum length of the input sequence; a smaller length allows training with a larger batch size
    4. (Required, FLOAT) learning rate for the Adam optimizer
    5. (Optional, INT) epoch to resume training from; if omitted, the model is trained from scratch
    6. (Optional, INT) training file to resume training from; if omitted, the model is trained from scratch
      • Example: 9, 2 as arguments 5 and 6 resumes training from epoch 9, file 2.
  • Training script inventory

    |--Script
      |--data_loader_v2.py
      |
      |--clf_lstm.py             # Model based on stacked LSTM
      |--clf_gnmt.py             # Model based on GNMT (Google Neural Machine Translation)
      |--clf_tf_enc.py           # Model based on Encoder part of Transformer
      |--clf_esim.py             # Model based on ESIM (Enhanced Sequential Inference Model)
      |--clf_pre_ln_tf.py        # Model based on pre Layer Normalization Transformer
      |--clf_final.py            # Model for final submission
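
The positional arguments described under "How to run training script" can be read as in the minimal sketch below. It is not taken from the training scripts themselves: TrainArgs, parse_train_args, and the field names are hypothetical, and only the order and types of the arguments come from the list above.

```python
from typing import List, NamedTuple, Optional


class TrainArgs(NamedTuple):
    """Hypothetical container for the V2 positional arguments."""
    target_epoch: int
    batch_size: int
    max_seq_len: int
    learning_rate: float
    resume_epoch: Optional[int] = None  # None means train from scratch
    resume_file: Optional[int] = None   # None means train from scratch


def parse_train_args(argv: List[str]) -> TrainArgs:
    """Parse the arguments that follow the script name."""
    if not 4 <= len(argv) <= 6:
        raise ValueError("expected 4 required and up to 2 optional arguments")
    target_epoch, batch_size, max_seq_len = (int(a) for a in argv[:3])
    learning_rate = float(argv[3])
    # Arguments 5 and 6 are optional resume markers.
    resume_epoch = int(argv[4]) if len(argv) > 4 else None
    resume_file = int(argv[5]) if len(argv) > 5 else None
    return TrainArgs(target_epoch, batch_size, max_seq_len,
                     learning_rate, resume_epoch, resume_file)
```

For example, parse_train_args(["40", "2048", "100", "1e-3", "9", "2"]) yields resume_epoch=9 and resume_file=2, matching the "epoch 9, file 2" example above.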
    

Legacy - Model Training V1

  • How to run training script

    Syntax: python3 train_{some script name}.py 0 10 256 100 1e-3 split

    Argument:

    1. (Required, INT) 0 trains from scratch; a positive number loads the corresponding epoch and resumes training from there
    2. (Required, INT) number of epochs to train
    3. (Required, INT) batch size for training
    4. (Required, INT) maximum length of the input sequence; a smaller length allows training with a larger batch size
    5. (Required, FLOAT) learning rate for the Adam optimizer
    6. (Optional) If omitted, the model is trained on the unsplit files. If python3 input_split.py fine has been executed and a value is given, the model is trained on the list of split files.
  • Training script inventory

    |--Script
      |--data_loader.py
      |
      |--multi_seq_lstm_classifier.py
      |--train_age_multi_seq_lstm_classifier.py
      |--train_gender_multi_seq_lstm_classifier.py
      |
      |--transformer_encoder_classifier.py
      |--train_age_transformer_encoder_classifier_with_creative.py
      |
      |--GNMT_classifier.py
      |--train_age_GNMT_classifier_with_creative.py
      |
      |--multi_seq_GNMT_classifier.py
      |--train_age_multi_seq_GNMT_classifier.py
    

Data Preparation

  • Step 1: run
cd ./Script
. prerequisite.sh

Note that if the instance has no public internet connection, download the train and test files manually and put them under /Script. You should have the following files and directories after execution.

|--Script
  |--train_artifact
    |--user.csv
    |--click_log.csv
    |--ad.csv
  |--test_artifact
    |--click_log.csv
    |--ad.csv
  |--input_artifact
  |--embed_artifact
  |--model_artifact
  |--output_artifact
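
The layout expected after Step 1 can be verified with a short check such as the sketch below. The file and directory names are copied from the tree above; the function name missing_artifacts is an assumption made for illustration.

```python
from pathlib import Path

# Files and directories expected after running prerequisite.sh,
# copied from the tree above.
EXPECTED_FILES = [
    "train_artifact/user.csv",
    "train_artifact/click_log.csv",
    "train_artifact/ad.csv",
    "test_artifact/click_log.csv",
    "test_artifact/ad.csv",
]
EXPECTED_DIRS = [
    "input_artifact",
    "embed_artifact",
    "model_artifact",
    "output_artifact",
]


def missing_artifacts(script_dir):
    """Return the expected paths that do not exist under script_dir."""
    root = Path(script_dir)
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    # Run from inside the Script directory.
    for item in missing_artifacts("."):
        print("missing:", item)
```

An empty result means the directory matches the tree shown above.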
  • Step 2: run
python3 input_generate.py
python3 input_split.py

For machines with limited memory, replace the second line with python3 input_split.py fine. You should have the following files after execution.

|--Script
  |--input_artifact
    |--train_idx_shuffle.npy
    |--train_age.npy
    |--train_gender.npy
    |--train_creative_id_seq.pkl
    |--train_ad_id_seq.pkl
    |--train_advertiser_id_seq.pkl
    |--train_product_id_seq.pkl
    |--test_idx_shuffle.npy
    |--test_creative_id_seq.pkl
    |--test_ad_id_seq.pkl
    |--test_advertiser_id_seq.pkl
    |--test_product_id_seq.pkl
  |--embed_artifact
    |--embed_train_creative_id_seq.pkl
    |--embed_train_ad_id_seq.pkl
    |--embed_train_advertiser_id_seq.pkl
    |--embed_train_product_id_seq.pkl
  |--model_artifact
  |--output_artifact
  |--train_artifact
  |--test_artifact
  • Step 3: run
python3 train_w2v.py creative 128
python3 train_w2v.py ad 128
python3 train_w2v.py advertiser 128
python3 train_w2v.py product 128
python3 train_w2v.py industry 64
python3 train_w2v.py product_category 64

You should have the following files after execution.

|--Script
  |--embed_artifact
    |--w2v_registry.json
    |--wv_registry.json
    |--creative_sg_embed_s256_{random token}
    |--...
  |--model_artifact
  |--input_artifact
  |--output_artifact
  |--train_artifact
  |--test_artifact

Note that w2v_registry.json stores all the w2v model artifact paths and wv_registry.json stores all the KeyedVector artifact paths.
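
A registry file can be inspected with the standard json module, as in the sketch below. The internal layout of w2v_registry.json and wv_registry.json is not documented here, so the code makes no assumption beyond the file being valid JSON. The helper name load_registry is hypothetical.

```python
import json
from pathlib import Path


def load_registry(path):
    """Load a registry JSON file; return an empty dict if it does not exist."""
    path = Path(path)
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Paths are relative to the Script directory.
    for name in ("embed_artifact/w2v_registry.json",
                 "embed_artifact/wv_registry.json"):
        print(name, "->", load_registry(name))
```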

Materials

Contributors

liuzeyu1994, ywu94