Introduction

What is Samantha

  • A generic recommender and predictor server supporting both offline machine learning/recommendation modeling and fast online production serving.
  • MIT-licensed and oriented toward production use (online field experiments in research as well as typical industrial use).

What Samantha Can Do

  • A full-fledged, self-contained server that can be used in production right away with one configuration file, including the following components:
      • Data management, offline and online, inbound (indexing) and outbound (post-processing), through configurable backends for most relational databases (e.g. MySQL, PostgreSQL, SQL Server), Elasticsearch, or Redis
      • Model management, including online updating, building, loading, dumping, and serving
      • A data processing pipeline based on a data expansion and feature extraction framework
      • State-of-the-art models: collaborative filtering, matrix factorization, k-NN, trees, boosting, and bandits/reinforcement learning
      • An experimental framework for randomized A/B and bucket testing
      • Feedback (for online learning/optimization) and evaluation (for experimentation) loops among the application front-end, the application back-end server, and Samantha
      • An abstracted model parameter server (through extensible variable and index spaces)
      • A generic oracle-based optimization framework/solver with classic solvers
      • Flexible model dependencies, e.g. model ensembling, stacking, and boosting
      • Schedulers for regular model rebuilding and backup
      • Integration with other state-of-the-art systems, including XGBoost and TensorFlow
  • Control and customization of all these components through one centralized configuration file
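Samantha is built on the Play framework, whose configuration files use the HOCON format. The fragment below is purely an illustrative assumption, not Samantha's actual configuration schema; it only sketches how one centralized file might group a data indexer, a model, and a scheduler for a single engine.

```hocon
# Illustrative sketch only: these keys are hypothetical, not Samantha's real schema.
samantha.engines.movie-rec {
  indexers += {
    name = "ratingIndexer"          # data management: where incoming events are indexed
    backend = "elasticsearch"
  }
  predictors += {
    name = "mfPredictor"            # a matrix factorization model to serve
    modelFile = "data/models/mf.bin"
  }
  schedulers += {
    name = "nightlyRebuild"         # regular model rebuilding
    cron = "0 0 3 * * ?"
  }
}
```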

The Target Users of Samantha

  • Individuals or organizations who want to deploy a data-driven predictive system with minimal effort. They might need it to answer research questions that involve an intelligent predictive component in their system, or simply to try out such a component and observe its effects.
  • Individuals or organizations who are developing and comparing new machine learning or recommendation models or algorithms, especially those who care about deploying their models/algorithms into production and evaluating them in front of end users.

Documentation

Introduction

Setup

Citation of the Tool

  • Qian Zhao. 2018. User-Centric Design and Evaluation of Online Interactive Recommender Systems. Ph.D. Thesis. University of Minnesota.

Note

  • Samantha is a project developed by Qian Zhao, Ph.D. (graduated in May 2018), at the GroupLens Research lab, and it originated from his research projects there. Samantha might be integrated with LensKit in the future.

Publications Based On Samantha

  • Qian Zhao, Yue Shi, Liangjie Hong. GB-CENT: Gradient Boosted Categorical Embedding and Numerical Trees. In Proceedings of the 26th International World Wide Web Conference (WWW 2017), ACM, 2017. (see branch qian/gbcent, docs/README.md for details)

  • Qian Zhao, Jilin Chen, Minmin Chen, Sagar Jain, Alex Beutel, Francois Belletti, Ed Chi. 2018. Categorical-Attributes-Based Item Classification for Recommender Systems. In Proceedings of The 12th ACM Conference on Recommender Systems (RecSys’18). ACM, New York, NY, USA. (see branch qian/hsm for details)

  • Qian Zhao, Martijn Willemsen, Gediminas Adomavicius, F. Maxwell Harper, Joseph A. Konstan. 2019. From Preference Into Decision Making: Modeling User Interactions in Recommender Systems. In Proceedings of The 13th ACM Conference on Recommender Systems (RecSys’19). ACM, New York, NY, USA. (see branch qian/interaction for details)

  • Qian Zhao, F. Maxwell Harper, Gediminas Adomavicius, Joseph Konstan. Explicit or Implicit Feedback? Engagement or Satisfaction? A Field Experiment on Machine-Learning-Based Recommender Systems. In Proceedings of the 33rd ACM/SIGAPP Symposium On Applied Computing, Track of Recommender Systems: Theory, User Interactions and Applications (SAC 2018), ACM, 2018. (see Reinforce-State, Bandit-*, MF-*)

Contributors

  • qian2015
  • taavi223
  • will-qianzhao


Issues

SGD fails to converge when order of the data is sequential

For the MovieLens dataset, SGD (and ParallelSGD) fails to converge when using the following settings with a timestamp-ordered training set.

learningRate = 0.01
l2coef = 0.001

Randomizing the order of the training set fixes the problem—SGD converges and the resulting model has good performance.

We should (at the very least) note in the documentation that users may need to randomize the order of their datasets prior to training, particularly if they see poor performance or the algorithm fails to converge.

We may also want to offer a DAO that handles this randomization for the user.
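Such a DAO could boil down to something like the sketch below. This is a hypothetical helper, not Samantha's actual DAO API: it copies a timestamp-ordered training set and shuffles the copy before it is fed to SGD.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Samantha's actual DAO API: shuffle a
// timestamp-ordered training set before handing it to SGD.
public class ShuffledTraining {
    // A fixed seed keeps experiments reproducible while still breaking
    // the temporal ordering that hurts SGD convergence.
    public static <T> List<T> shuffled(List<T> orderedByTimestamp, long seed) {
        List<T> copy = new ArrayList<>(orderedByTimestamp);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

Returning a shuffled copy (rather than shuffling in place) keeps the original timestamp order available for models that do need sequential data.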

Fire warnings when a referenced attr/field is not present during feature extraction

```java
//TODO: warn if attr is not present
//TODO: most feature extractors currently use the attrName in the data as the
//      internal key representation; consider separating the two and using
//      attrName only as the default
public interface FeatureExtractor extends Serializable {
    Map<String, List<Feature>> extract(JsonNode entity, boolean update,
                                       IndexSpace indexSpace);
}
```
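One way to address the first TODO is a small guard that extractor implementations call before reading an attribute, warning once per missing attribute rather than on every entity. The sketch below is a hypothetical helper, not part of Samantha; it takes the entity as a plain `Map` for self-containment, whereas the real interface passes a Jackson `JsonNode`.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical helper, not part of Samantha: warn once per missing attribute
// during feature extraction instead of silently producing no feature.
public class AttrGuard {
    private static final Logger LOG = Logger.getLogger(AttrGuard.class.getName());
    private final Set<String> warned = ConcurrentHashMap.newKeySet();

    // Returns true when attrName is present in the entity; otherwise logs a
    // warning the first time that attribute is found missing and returns false.
    public boolean checkPresent(Map<String, ?> entity, String attrName) {
        if (entity.containsKey(attrName)) {
            return true;
        }
        if (warned.add(attrName)) {
            LOG.warning("Attribute " + attrName
                    + " not present in entity; the feature will be skipped.");
        }
        return false;
    }
}
```

Deduplicating by attribute name keeps the log readable when the same misconfigured extractor runs over millions of entities.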

Add the ability to set the number of cores for parallel learning methods

  • Allow the user to specify all available cores by passing in 0, null, or simply no number at all
  • Allow the user to specify an exact number of cores by passing in a number >= 1 (capped at the number of available cores)
  • Allow the user to specify a percentage of cores by passing in a number x in (0, 1); we then multiply the number of available cores by x and round to the nearest integer, clamping the result so it is neither 0 nor the full number of available cores
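The three rules above could be resolved in one place, roughly as sketched below. The class and method names are hypothetical, not an existing Samantha API.

```java
// Hypothetical sketch of the proposed core-count resolution rules.
public class CoreCount {
    // spec == null or 0  -> all available cores
    // spec >= 1          -> that many cores, capped at the available count
    // 0 < spec < 1       -> fraction of available cores, rounded to nearest,
    //                       clamped to stay strictly between 0 and available
    public static int resolve(Double spec, int available) {
        if (spec == null || spec == 0.0) {
            return available;
        }
        if (spec >= 1.0) {
            return Math.min((int) Math.floor(spec), available);
        }
        int n = (int) Math.round(spec * available);
        return Math.max(1, Math.min(available - 1, n));
    }
}
```

For example, with 8 available cores, a spec of 0.5 would resolve to 4 threads, while 0.99 would clamp to 7 rather than rounding up to all 8.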
