Bandit Panda

This repository is a collection of pandas-based implementations of algorithms for the Multi-Armed Bandit Problem, heavily inspired by John Myles White's Bandit Algorithms for Website Optimization. The original intent of this repository was to implement the algorithms presented in the book (epsilon-greedy, Softmax with and without annealing, and UCB1) using the convenient DataFrame structures and functions provided by pandas. Since it is a "training wheels" project, though, the repository will probably be expanded with other algorithms and tools.

Structure

The simulation_test.py script implements Myles White's testing framework: it creates Arm objects from a list of arm parameters and then runs a fixed number of simulations, each with a predefined run length. During each simulation the counts and average rewards for each arm are updated, and the result of every iteration (which arm was pulled and what reward it returned) is stored in a DataFrame, which is then saved as a CSV file.

Reward distributions are implemented as subclasses of the Arm class. Each arm must have a draw() method that takes no arguments and returns a single reward sampled from that arm.
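A minimal sketch of this contract, using BernoulliArm as the example. Declaring Arm as an abstract base class with abc.ABC is an assumption made here for illustration; the repository may define the base class differently.

```python
import random
from abc import ABC, abstractmethod


class Arm(ABC):
    """Base class (assumed ABC): every arm returns one reward per call to draw()."""

    @abstractmethod
    def draw(self):
        """Return a single reward sampled from this arm's distribution."""


class BernoulliArm(Arm):
    def __init__(self, p):
        self.p = p  # probability of a reward of 1

    def draw(self):
        return 1.0 if random.random() < self.p else 0.0


random.seed(42)
arm = BernoulliArm(0.3)
rewards = [arm.draw() for _ in range(10_000)]
print(sum(rewards) / len(rewards))  # should be close to 0.3
```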

Algorithms are implemented as subclasses of the Algorithm class. Besides a constructor, each one should provide an update method, which takes as input the ID of the arm chosen on a given pull and the reward obtained from it; a select_arm method with no parameters, which returns the arm chosen for a given pull; and an initialize method, which sets the initial counts and average rewards for each arm.
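A minimal sketch of that interface, with epsilon-greedy as the concrete subclass. Storing counts and average rewards as pandas Series indexed by arm ID, and sharing one incremental-mean update() in the base class, are assumptions for illustration rather than the repository's actual layout.

```python
import random

import pandas as pd


class Algorithm:
    """Base class (sketch): subclasses implement select_arm(); update() is shared."""

    def initialize(self, n_arms):
        # One entry per arm ID: number of pulls and running average reward.
        self.counts = pd.Series(0, index=range(n_arms), dtype="int64")
        self.values = pd.Series(0.0, index=range(n_arms))

    def select_arm(self):
        raise NotImplementedError

    def update(self, chosen_arm, reward):
        self.counts[chosen_arm] += 1
        n = self.counts[chosen_arm]
        # Incremental mean: new_avg = old_avg + (reward - old_avg) / n,
        # so past rewards never need to be stored.
        self.values[chosen_arm] += (reward - self.values[chosen_arm]) / n


class EpsilonGreedy(Algorithm):
    def __init__(self, epsilon):
        self.epsilon = epsilon  # probability of exploring on each pull

    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))  # explore uniformly
        return int(self.values.idxmax())  # exploit: best average so far


# Usage: three arms whose success rates are unknown to the algorithm.
random.seed(1)
true_rates = [0.2, 0.5, 0.8]
algo = EpsilonGreedy(epsilon=0.1)
algo.initialize(len(true_rates))
for _ in range(2000):
    arm = algo.select_arm()
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    algo.update(arm, reward)
print(algo.counts)
```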

Currently Implemented

Arms

  • BernoulliArm: rewards are drawn from a Bernoulli distribution with probability parameter p.

  • NormalArm: rewards are drawn from a Normal distribution with parameters mean and sigma.

  • LogNormalArm: rewards are drawn from a Log-Normal distribution, where the underlying Gaussian is defined by parameters mean and sigma.

  • RPVArm: a crude model of customer purchase behaviour. Rewards depend on a Bernoulli distribution with parameter p, which models the probability of a customer making a purchase, and on a Normal distribution with parameters mean and sigma, which describes their average ticket.
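The remaining arms can be sketched with Python's random module. How the repository combines the two draws in RPVArm is not spelled out above, so this sketch assumes the reward is 0 when no purchase happens and a Normal(mean, sigma) ticket value otherwise; no clipping of negative tickets is applied, which is also an assumption. The minimal Arm base class is repeated only so the block runs on its own.

```python
import random
from abc import ABC, abstractmethod


class Arm(ABC):
    @abstractmethod
    def draw(self):
        """Return a single reward."""


class NormalArm(Arm):
    def __init__(self, mean, sigma):
        self.mean, self.sigma = mean, sigma

    def draw(self):
        return random.gauss(self.mean, self.sigma)


class LogNormalArm(Arm):
    def __init__(self, mean, sigma):
        # mean and sigma describe the underlying Gaussian, not the reward itself
        self.mean, self.sigma = mean, sigma

    def draw(self):
        return random.lognormvariate(self.mean, self.sigma)


class RPVArm(Arm):
    """Assumed model: purchase with probability p, ticket value ~ Normal(mean, sigma)."""

    def __init__(self, p, mean, sigma):
        self.p, self.mean, self.sigma = p, mean, sigma

    def draw(self):
        if random.random() >= self.p:
            return 0.0  # no purchase, no revenue
        return random.gauss(self.mean, self.sigma)  # ticket of the purchase


random.seed(3)
arms = [NormalArm(1.0, 0.5), LogNormalArm(0.0, 0.25), RPVArm(0.05, 40.0, 10.0)]
print([arm.draw() for arm in arms])
```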

Algorithms

  • epsilon-greedy: with a probability given by the parameter epsilon, chooses an arm uniformly at random; otherwise, picks the arm with the best current average reward.

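  • Softmax: explores at random, choosing each arm with a probability that grows with its current average reward, controlled by a temperature parameter. In the annealing mode, the temperature decreases over time, so the algorithm gradually shifts from exploration to exploitation.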

  • UCB1: suitable for reward distributions with support on [0,1]. Instead of picking arms at random according to parameters, it chooses the arm that maximizes an upper confidence bound on its average reward.

  • UCB1-Normal: a UCB1 variant for the case in which the reward distributions are Gaussian.

  • UCB1-RPV: a variant of UCB1-Normal which ensures that each arm will have at least one draw with a non-null reward, giving a fairer trial to low-probability arms.
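The selection rules for Softmax (with an optional annealing schedule) and UCB1 can be sketched as standalone functions over per-arm counts and values Series, like the ones an Algorithm subclass keeps. The function names are hypothetical, and the logarithmic cooling schedule 1 / log(t + 1e-7) is an assumption; the UCB1 bonus sqrt(2 ln n / n_j) is the standard one.

```python
import math
import random

import pandas as pd


def softmax_select(values, temperature):
    """Pick an arm with probability proportional to exp(value / temperature)."""
    # Subtracting the max avoids overflow without changing the probabilities.
    z = (values - values.max()) / temperature
    weights = z.apply(math.exp)
    probs = weights / weights.sum()
    return random.choices(list(probs.index), weights=list(probs))[0]


def annealed_temperature(total_pulls):
    """Assumed logarithmic cooling: very hot at first, colder as pulls accumulate."""
    t = total_pulls + 1
    return 1.0 / math.log(t + 1e-7)  # the tiny offset avoids dividing by log(1) = 0


def ucb1_select(counts, values):
    """Play every arm once, then maximise value_j + sqrt(2 ln n / n_j)."""
    untried = counts[counts == 0]
    if not untried.empty:
        return int(untried.index[0])
    total = counts.sum()
    bonus = (2 * math.log(total) / counts) ** 0.5  # shrinks as an arm is pulled more
    return int((values + bonus).idxmax())


random.seed(0)
counts = pd.Series([5, 3, 12])
values = pd.Series([0.4, 0.1, 0.6])
print(softmax_select(values, 0.1))  # fixed temperature
print(softmax_select(values, annealed_temperature(int(counts.sum()))))  # annealing
print(ucb1_select(counts, values))
```

Unlike Softmax, UCB1 involves no randomness: the exploration bonus alone pushes rarely pulled arms back into play.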

TO DO

  • Write a decent and useful README

  • Improve algorithm performance

  • Compare pandas implementation with implementations using other languages or libraries

  • Implement Thompson sampling

  • Create scripts for test result visualization

  • Implement the Environment abstraction from Myles White's repository

  • Introduce sample techniques for analysing results

Contributors

marcoalmada