
metaspore's People

Contributors

andafterall, cheng-su, codingfun2022, dependabot[bot], dmetasoul-opensource, dmetasoul01, gufenqing, hades-888, intelligencegear, is-shidian, javyxu, liusy12138, longborn, qinyy907, raphael-jin, trellixvulnteam, wikty, xuchen-plus


metaspore's Issues

[demo] Text-to-Image Multimodal Retrieval Demo

Text-to-image semantic retrieval demo based on the Unsplash Lite 2.5K image dataset. It lets users search for images with natural-language text.

The demo includes the following parts:

  • online retrieval pipeline service
  • offline model export, data fetch and index
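The split between the offline and online parts can be pictured with a minimal sketch. It assumes that images and text queries have already been encoded into vectors by some model; the embeddings, file names, and the functions `build_index` and `search` below are made up for illustration and are not taken from the demo.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(image_embeddings):
    """Offline step (hypothetical): store (image_id, vector) pairs."""
    return list(image_embeddings.items())

def search(index, query_embedding, top_k=2):
    """Online step (hypothetical): rank images by similarity to the query."""
    scored = [(cosine(query_embedding, vec), image_id) for image_id, vec in index]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_k]]

# Toy 3-dimensional embeddings, invented for this example.
index = build_index({
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
})
results = search(index, [0.8, 0.2, 0.0], top_k=2)
print(results)
```

A real deployment would likely replace the linear scan with an approximate nearest-neighbour index, but the offline/online division stays the same.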

[demo] DCN V2

Add DCN V2 to the CTR demo and report benchmarks on the MovieLens and Criteo datasets.

DCN V2
criteo_d5:
Train AUC: 0.7487 Test AUC: 0.7290
1m uid mid:
Train AUC: 0.8901 Test AUC: 0.8611
25m uid mid:
Train AUC: 0.8888 Test AUC: 0.8323
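For readers unfamiliar with the metric reported in these benchmarks, AUC can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the labels and scores are invented, and this is not the evaluation code used by the demo.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of examples,
    so it is only meant for small illustrative inputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up click labels and predicted click probabilities.
value = auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2])
print(value)
```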

[demo] DCN

Add DCN to the CTR demo and report benchmarks on the MovieLens and Criteo datasets.

25m uid mid:
Train AUC: 0.8972 Test AUC: 0.8430
1m uid mid:
Train AUC: 0.9021 Test AUC: 0.8746
criteo_d5:
Train AUC: 0.7413 Test AUC: 0.7304

[demo] MaximalMarginalRelevanceDiversifier demo

  1. Add diversification algorithms. We implemented a diversification model named MaximalMarginalRelevanceDiversifier, based on the dispersing method described in the paper "The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries". Compared with SimpleDiversifier, MaximalMarginalRelevanceDiversifier can take information from multiple dimensions into account.
  2. Integrate MaximalMarginalRelevanceDiversifier into the pipeline, now that its unit tests are complete. In addition, we have updated the configuration of the diversify method in the Consul file.
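The reranking idea behind MMR can be sketched as follows: each step picks the candidate that maximizes `lambda_ * relevance - (1 - lambda_) * max_similarity_to_already_selected`. This is a generic illustration of the technique from the cited paper, not the MaximalMarginalRelevanceDiversifier implementation; the function name `mmr_rerank`, the toy items, and the genre-based similarity are assumptions.

```python
def mmr_rerank(candidates, relevance, similarity, lambda_=0.7, top_k=None):
    """Greedy Maximal Marginal Relevance reranking (illustrative sketch).

    relevance:  dict mapping item -> relevance score
    similarity: callable (item, item) -> similarity in [0, 1]
    lambda_:    1.0 ranks purely by relevance, lower values favour diversity
    """
    remaining = list(candidates)
    selected = []
    top_k = len(remaining) if top_k is None else top_k
    while remaining and len(selected) < top_k:
        def mmr_score(item):
            # Penalize items that resemble something already selected.
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lambda_ * relevance[item] - (1 - lambda_) * max_sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data, invented for this example: two action movies and one comedy.
genres = {"a": "action", "b": "action", "c": "comedy"}
relevance = {"a": 0.9, "b": 0.85, "c": 0.6}

def same_genre(x, y):
    return 1.0 if genres[x] == genres[y] else 0.0

reranked = mmr_rerank(["a", "b", "c"], relevance, same_genre, lambda_=0.5)
print(reranked)
```

With `lambda_=0.5` the comedy moves ahead of the second action movie; with `lambda_=1.0` the order is determined by relevance alone. A similarity function that combines several item attributes would be one way to cover multiple dimensions.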

[movielens demo] add python.zip when we submit `fg_movielens.py` PySpark job

For the MovieLens demo, we should build python.zip before submitting the fg_movielens.py PySpark job.

import subprocess

from pyspark.sql import SparkSession

def init_spark():
    # Added line: bundle the helper modules so that executors can import them.
    subprocess.run(['zip', '-r', './python.zip', 'fg_neg_sampler.py', 'fg_sparse_features_extractor.py', 'fg_gbm_features_extractor.py'], cwd='./', check=True)
    spark = (SparkSession.builder
        .appName('MovieLens Demo')
        .config("spark.executor.memory", "10G")
        # Ship the archive built above together with the job.
        .config("spark.submit.pyFiles", "python.zip")
        .config("spark.executor.instances", "4")
        .config("spark.network.timeout", "500")
        .getOrCreate())
    ...

Moreover, please rename the function generate_spare_features to generate_sparse_features in fg_sparse_features_extractor.py.

[demo] DeepFM

Add DeepFM to the CTR demo and report benchmarks on the MovieLens and Criteo datasets.

25m uid mid:
Train AUC: 0.8908 Test AUC: 0.8359
1m uid mid:
Train AUC: 0.8891 Test AUC: 0.8658
criteo_d5:
Train AUC: 0.7531 Test AUC: 0.7271

[demo] PNN

Add PNN to the CTR demo and report benchmarks on the MovieLens and Criteo datasets.

iPNN
criteo_d5:
Train AUC: 0.7544 Test AUC: 0.7292
1m uid mid:
Train AUC: 0.8914 Test AUC: 0.8649
25m uid mid:
Train AUC: 0.8916 Test AUC: 0.8362

oPNN
criteo_d5:
Train AUC: 0.7533 Test AUC: 0.7287
1m uid mid:
Train AUC: 0.8896 Test AUC: 0.8633
25m uid mid:
Train AUC: 0.8905 Test AUC: 0.8353

[demo] Wide&Deep

Add Wide&Deep to the CTR demo and report benchmarks on the MovieLens and Criteo datasets.

25m uid mid:
Train AUC: 0.8898 Test AUC: 0.8343
1m uid mid:
Train AUC: 0.8937 Test AUC: 0.8682
criteo_d5:
Train AUC: 0.7394 Test AUC: 0.7294

[demo] Unify data processing for demo projects

Unify the data processing for MovieLens-1M, MovieLens-25M, Criteo-5D and other datasets, including feature generation, match dataset generation, ranking dataset generation, negative sampling, etc.
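As one example of a step that such a unified pipeline would share across datasets, negative sampling can be sketched as below. This is a minimal illustration, assuming implicit-feedback data where every observed (user, item) pair is a positive; the function name `sample_negatives` and the toy data are hypothetical and are not taken from the demo scripts.

```python
import random

def sample_negatives(interactions, all_items, num_negatives=2, seed=0):
    """Build (user, item, label) rows with uniformly sampled negatives.

    interactions: dict mapping user -> set of items the user interacted with
    all_items:    iterable of every known item id
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    rows = []
    for user, positives in interactions.items():
        # Candidates are items the user has never interacted with.
        candidates = sorted(set(all_items) - set(positives))
        negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
        rows.extend((user, item, 1) for item in sorted(positives))
        rows.extend((user, item, 0) for item in negatives)
    return rows

# Toy interaction log, invented for this example.
interactions = {"u1": {"m1", "m2"}, "u2": {"m3"}}
all_items = ["m1", "m2", "m3", "m4", "m5"]
rows = sample_negatives(interactions, all_items, num_negatives=2)
for row in rows:
    print(row)
```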

[demo] QA Multimodal Retrieval Demo

QA is a text-to-text semantic retrieval demo based on the 1M Baike-Question-Answer database.

The demo includes the following parts:

  1. online system: an end-to-end online retrieval service.
  2. offline system: model training and export, data fetch and index.

[training] Support kubeflow pipeline build

  1. Refactor the code organization into separate algo, runner, component and pipeline definitions.
  2. Automatically export kubeflow components for the built-in algo runners, and provide a python decorator for customized use.
  3. Load components by name to construct kubeflow pipeline and upload it automatically.
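The decorator and load-by-name ideas in the list above can be illustrated with a plain-Python sketch. It deliberately does not use the Kubeflow Pipelines SDK: `COMPONENT_REGISTRY`, `component`, and `build_pipeline` are hypothetical names, and real components would be exported as Kubeflow component definitions rather than called in-process.

```python
COMPONENT_REGISTRY = {}  # hypothetical name -> callable registry

def component(name):
    """Decorator that registers a function as a named pipeline component."""
    def decorator(func):
        COMPONENT_REGISTRY[name] = func
        return func
    return decorator

@component("load_data")
def load_data(context):
    context["rows"] = [1, 2, 3]  # stand-in for reading a dataset
    return context

@component("train")
def train(context):
    # Stand-in for training: the "model" is just the mean of the rows.
    context["model"] = sum(context["rows"]) / len(context["rows"])
    return context

def build_pipeline(step_names):
    """Look components up by name and chain them in the given order."""
    steps = [COMPONENT_REGISTRY[name] for name in step_names]
    def run(context=None):
        context = {} if context is None else context
        for step in steps:
            context = step(context)
        return context
    return run

pipeline = build_pipeline(["load_data", "train"])
result = pipeline()
print(result)
```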

[demo] AutoInt

Add AutoInt to the CTR demo and report benchmarks on the MovieLens and Criteo datasets.

AutoInt
criteo_d5:
Train AUC: 0.7558 Test AUC: 0.7361
1m uid mid:
Train AUC: 0.9028 Test AUC: 0.8741
25m uid mid:
Train AUC: 0.8968 Test AUC: 0.8421

[demo] xDeepFM

Add xDeepFM to the CTR demo and report benchmarks on the MovieLens and Criteo datasets.

xDeepFM
criteo_d5:
Train AUC: 0.7541 Test AUC: 0.7300
1m uid mid:
Train AUC: 0.8892 Test AUC: 0.8641
25m uid mid:
Train AUC: 0.8911 Test AUC: 0.8367

[serving] IPC framework between cpp and python

Goal

Provide a framework for calling python methods defined in user custom scripts. Python code is executed in a separate process rather than embedded in the cpp process. One CPython interpreter process is run per cpp compute thread to avoid GIL contention.

Design

  1. Control plane via gRPC and unix domain socket between cpp and python;
  2. Data plane via either gRPC for small data or shared memory for large data;
  3. Model packaged with customized python venv and user scripts;
  4. For each cpp compute thread, run a CPython interpreter process with user entry script;
  5. Provide an async iterator style interface for python.
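Two of the design points above, the size-based choice of data plane and the async iterator style interface, can be sketched on the python side as follows. This is an in-process illustration only: the request source is an `asyncio.Queue` standing in for the real control plane, and `SHM_THRESHOLD_BYTES`, `choose_transport`, `request_stream`, and `user_entry` are hypothetical names. The issue does not specify a size cutoff, so the value here is an assumption.

```python
import asyncio

SHM_THRESHOLD_BYTES = 64 * 1024  # assumed cutoff, not given in the issue

def choose_transport(payload: bytes) -> str:
    """Pick the data plane: gRPC for small data, shared memory for large."""
    return "shared_memory" if len(payload) >= SHM_THRESHOLD_BYTES else "grpc"

async def request_stream(queue):
    """Async iterator over incoming requests; None marks end of stream."""
    while True:
        request = await queue.get()
        if request is None:
            return
        yield request

async def user_entry(queue, replies):
    """What a user entry script could look like with this interface."""
    async for payload in request_stream(queue):
        replies.append((choose_transport(payload), len(payload)))

async def main():
    queue = asyncio.Queue()
    replies = []
    # One small request, one large request, then the end-of-stream marker.
    for payload in (b"x" * 10, b"y" * (128 * 1024), None):
        queue.put_nowait(payload)
    await user_entry(queue, replies)
    return replies

replies = asyncio.run(main())
print(replies)
```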
