bgalbraith / bandits


731.0 731.0 155.0 711 KB

Python library for Multi-Armed Bandits

License: Apache License 2.0

Python 2.04% Jupyter Notebook 97.96%


bandits's Issues

Which version of Python does this project need?

A problem in bandits/bandits/bandit.py

Thank you very much for the code for the n-armed bandit problem.
While reading it, I found what may be a mistake in the file
bandits/bandits/bandit.py
line 38: return (np.random.normal(self.action_values[action]),

I think the correct version should be the following:
line 38: return (self.action_values[action],

With np.random.normal() here, every call generates a new random number with mean self.action_values[action] (and standard deviation 1, the NumPy default) instead of returning the stored value. This is not what we want, right?
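
To make the difference between the two variants concrete, here is a minimal sketch. It does not reproduce the library's bandit.py: `action_values`, `noisy_pull`, and `exact_pull` are hypothetical names, the arm means are made up, and the seed is arbitrary. It only illustrates that a draw from a normal distribution centred on the stored value differs on every call, while the long-run average of those draws approaches the stored value.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# Hypothetical true mean reward of each arm (not taken from the library).
action_values = np.array([0.1, 0.5, 0.9])


def noisy_pull(action):
    """Reward drawn from N(action_values[action], 1): differs on every call."""
    return rng.normal(action_values[action])


def exact_pull(action):
    """Reward equal to the stored mean: identical on every call."""
    return action_values[action]


noisy = np.array([noisy_pull(2) for _ in range(20000)])
exact = np.array([exact_pull(2) for _ in range(20000)])

print("noisy: mean=%.3f std=%.3f" % (noisy.mean(), noisy.std()))
print("exact: mean=%.3f std=%.3f" % (exact.mean(), exact.std()))
```

With noisy rewards, an agent has to estimate each arm's value from repeated samples. With exact rewards, a single pull of each arm already reveals its value.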

epsilon-greedy choose function may be wrong

class EpsilonGreedyPolicy(Policy):
    [................................]

    def choose(self, agent):
        if np.random.random() < self.epsilon:
            return np.random.choice(len(agent.value_estimates))
        else:
            action = np.argmax(agent.value_estimates)  # <---------
            check = np.where(agent.value_estimates == action)[0]  # <------
            if len(check) == 0:
                return action
            else:
                return np.random.choice(check)

I don't understand how the two marked lines work. action is an index into value_estimates, which is fine, but the second marked line compares that index with the values stored in value_estimates. That is why len(check) can be 0. I believe the correct code would be:


    def choose(self, agent):
        if np.random.random() < self.epsilon:
            return np.random.choice(len(agent.value_estimates))
        else:
            action = np.argmax(agent.value_estimates)  # <---------
            # Compare against the best value, not the index of the best value.
            check = np.where(agent.value_estimates == agent.value_estimates[action])[0]  # <------
            if len(check) == 1:  # <--- the argmax always matches itself, so there is at least 1
                return action
            else:  # <---- ties are broken randomly
                return np.random.choice(check)

Please let me know if I'm mistaken. Thank you!
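
The proposal above can be tried outside the library with a standalone sketch. The function `choose_epsilon_greedy`, its parameters, and the sample estimates are hypothetical and only mirror the logic described in this issue. They are not the project's API. The special case for a single match is dropped here, since choosing at random from a one-element set returns that element anyway.

```python
import numpy as np


def choose_epsilon_greedy(value_estimates, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit with random tie-breaking."""
    value_estimates = np.asarray(value_estimates, dtype=float)
    if rng.random() < epsilon:
        # Explore: any arm, uniformly at random.
        return int(rng.integers(len(value_estimates)))
    # Exploit: collect every arm whose estimate equals the maximum value.
    best = np.flatnonzero(value_estimates == value_estimates.max())
    return int(rng.choice(best))


rng = np.random.default_rng(0)  # arbitrary seed
estimates = [0.2, 0.7, 0.7, 0.1]  # arms 1 and 2 are tied for best

# epsilon=0.0 disables exploration, so only the tie-breaking is exercised.
picks = [choose_epsilon_greedy(estimates, epsilon=0.0, rng=rng) for _ in range(1000)]
counts = np.bincount(picks, minlength=len(estimates))
print(counts)
```

Only the two tied arms should receive picks, split roughly evenly between them.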

Do not understand the code

I see your policy in bandits/agent.py

class EpsilonGreedyPolicy(Policy):
    """
    The Epsilon-Greedy policy will choose a random action with probability
    epsilon and take the best apparent approach with probability 1-epsilon. If
    multiple actions are tied for best choice, then a random action from that
    subset is selected.
    """
    def __init__(self, epsilon):
        self.epsilon = epsilon

    def __str__(self):
        return '\u03B5-greedy (\u03B5={})'.format(self.epsilon)

    def choose(self, agent):
        if np.random.random() < self.epsilon:
            return np.random.choice(len(agent.value_estimates))
        else:
            action = np.argmax(agent.value_estimates)
            check = np.where(agent.value_estimates == action)[0]
            if len(check) == 0:
                return action
            else:
                return np.random.choice(check)

I do not understand what check means. action is the index of the largest element in the array, so what does check represent, and what does its length indicate?

Thank you~
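
One way to see what check ends up holding is to run the two quoted lines on small arrays. The sketch below is illustrative only: `original_check` and `value_check` are hypothetical helper names, and the sample arrays are made up. The first helper reproduces the comparison as quoted (estimates against the argmax index). The second compares the estimates against the best value, which is what the docstring's tie-breaking description appears to call for.

```python
import numpy as np


def original_check(value_estimates):
    """Reproduce the two quoted lines: compare the estimates with the argmax index."""
    action = np.argmax(value_estimates)
    check = np.where(value_estimates == action)[0]
    return int(action), check.tolist()


def value_check(value_estimates):
    """Variant that compares the estimates with the best value instead."""
    action = np.argmax(value_estimates)
    check = np.where(value_estimates == value_estimates[action])[0]
    return int(action), check.tolist()


fractional = np.array([0.2, 0.9, 0.9])   # best value 0.9, tied at indices 1 and 2
coincidence = np.array([0.0, 2.0, 5.0])  # best index is 2, and arm 1 happens to hold 2.0

print(original_check(fractional))   # (1, [])
print(value_check(fractional))      # (1, [1, 2])
print(original_check(coincidence))  # (2, [1])
print(value_check(coincidence))     # (2, [2])
```

In the quoted form, check is usually empty, so the first maximal index is returned and ties are never randomized. When an estimate happens to equal the index, check points at that arm instead of the best one.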

Got error when importing pymc3
