
TabulaRL

A library for implementing tabular reinforcement learning algorithms.

Implementing an RL algorithm is as simple as changing the parameters of a TaskController instance:

tc = TaskController(
    cliff_jump,                            # the task being performed
    learning_algorithm='FIRST_VISIT_MC',   # QLEARNING, SARSA, EXPECTED_SARSA, EVERY_VISIT_MC, FIRST_VISIT_MC
    exploration=True,                      # whether to explore at all
    exploration_decay='CONSTANT',          # decay of the exploration rate: CONSTANT, SIMULATED_ANNEALING, LOGARITHMIC
    exploration_strategy='SOFT_E_GREEDY',  # E_GREEDY, SOFT_E_GREEDY
    exploration_epsilon=0.8,               # exploration rate (epsilon)
    learning_rate=0.05,                    # learning rate (alpha)
    learning_rate_decay='EXPONENTIAL',     # decay of the learning rate, e.g. CONSTANT, LOGARITHMIC, EXPONENTIAL
    gamma=0.9,                             # discount factor applied to future rewards
    online_learning=False,                 # True: update during each episode (online); False: update after each episode (batch)
)
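The exploration_epsilon parameter controls how often the agent tries a non-greedy action. The sketch below shows generic epsilon-greedy action selection over one state's action values; it is an illustration, not TabulaRL's code, the function name epsilon_greedy_action is hypothetical, and the exact difference between the library's E_GREEDY and SOFT_E_GREEDY options is not documented here.

```python
import random
from typing import Sequence


def epsilon_greedy_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Hypothetical helper: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        # explore: any action, chosen uniformly at random
        return rng.randrange(len(q_values))
    best = max(q_values)
    # exploit: break ties among equally valued greedy actions at random
    candidates = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(candidates)
```

Because a random pick can also land on the greedy action, with exploration_epsilon=0.8 and four actions this sketch chooses the greedy action with probability 0.2 + 0.8/4 = 0.4.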

Results for cliff_jump, a toy task as defined in the book Reinforcement Learning: An Introduction, are shown below. The task is defined as follows:

[Image: the cliff_jump grid]

In the image above, the agent starts in the bottom-left corner and tries to reach the goal in the bottom-right corner. Reaching the goal gives 100 points, entering any of the red cells gives -50, and every move gives either 0 or -1 depending on the game type (-1 per move rewards reaching the goal as fast as possible).
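The reward scheme above can be captured in a small grid-world step function. This is a minimal sketch under several assumptions not stated in the README: a 4x12 grid with the red cells filling the bottom row between start and goal (as in the book's cliff-walking example), entering a red cell ending the episode, and the names CliffGrid and MOVES being hypothetical rather than part of TabulaRL.

```python
from dataclasses import dataclass
from typing import Tuple

State = Tuple[int, int]  # (row, column); row 0 is the top row

# Hypothetical action encoding: 0=up, 1=down, 2=left, 3=right
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


@dataclass
class CliffGrid:
    rows: int = 4               # assumption: 4x12 layout of the book's cliff-walking example
    cols: int = 12
    step_reward: float = 0.0    # 0.0 for game type 1, -1.0 for game type 2
    goal_reward: float = 100.0
    cliff_reward: float = -50.0

    @property
    def start(self) -> State:
        return (self.rows - 1, 0)  # bottom-left corner

    @property
    def goal(self) -> State:
        return (self.rows - 1, self.cols - 1)  # bottom-right corner

    def is_cliff(self, s: State) -> bool:
        # assumption: red cells are the bottom row strictly between start and goal
        return s[0] == self.rows - 1 and 0 < s[1] < self.cols - 1

    def step(self, s: State, action: int) -> Tuple[State, float, bool]:
        """Return (next_state, reward, episode_done)."""
        dr, dc = MOVES[action]
        # moves off the edge of the grid leave the agent where it is
        nxt = (min(max(s[0] + dr, 0), self.rows - 1),
               min(max(s[1] + dc, 0), self.cols - 1))
        if nxt == self.goal:
            return nxt, self.goal_reward, True
        if self.is_cliff(nxt):
            return nxt, self.cliff_reward, True  # assumption: entering a red cell ends the episode
        return nxt, self.step_reward, False
```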

GAME TYPE 1:

The agent's path is shown in blue. Each move gives 0, reaching the goal gives 100, and entering a red cell gives -50. Each agent was trained for 1000 episodes with individually tuned parameters.

Results:

Every-visit Monte Carlo | First-visit Monte Carlo
SARSA, online learning | SARSA, batch learning
Expected SARSA, online learning | Expected SARSA, batch learning
Q-learning, online learning | Q-learning, batch learning
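The two Monte Carlo variants differ only in which visits to a state-action pair contribute a sampled return. The sketch below is a generic textbook version, not TabulaRL's code; the function name mc_returns and the (state, action, reward) episode format are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Step = Tuple[Hashable, Hashable, float]  # (state, action, reward received after the action)


def mc_returns(episode: List[Step], gamma: float, first_visit: bool) -> Dict[tuple, List[float]]:
    """Collect the sampled returns G_t for each (state, action) pair in one episode."""
    first_t = {}
    for t, (s, a, _) in enumerate(episode):
        first_t.setdefault((s, a), t)  # time step of the first visit to (s, a)

    returns: Dict[tuple, List[float]] = defaultdict(list)
    g = 0.0
    # walk backwards so that G_t = r_t + gamma * G_{t+1}
    for t in range(len(episode) - 1, -1, -1):
        s, a, r = episode[t]
        g = r + gamma * g
        # every-visit keeps all returns; first-visit keeps only the earliest one
        if not first_visit or first_t[(s, a)] == t:
            returns[(s, a)].append(g)
    return dict(returns)
```

Averaging the collected returns for each pair across episodes gives the Monte Carlo estimate of Q(s, a). For the episode [("A", 0, 0.0), ("B", 0, 0.0), ("A", 0, 1.0)] with gamma = 0.5, first-visit records a single return of 0.25 for ("A", 0), while every-visit records both 1.0 and 0.25.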

GAME TYPE 2:

The agent's path is shown in blue. Each move gives -1, reaching the goal gives 100, and entering a red cell gives -50.

Here the difference between the algorithms becomes visible. Although travelling along the second-lowest row is the fastest route, and therefore gives the maximum reward in this game type, Expected SARSA still takes the safer route, while Q-learning takes the riskier but faster one.
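This behaviour follows from the bootstrap target each method uses. Q-learning bootstraps from the best next action regardless of exploration, so it values the edge route as if no exploratory step into a red cell could happen; Expected SARSA averages over the epsilon-greedy policy it actually follows, so that risk lowers the value of cells beside the red area. The sketch below uses the standard one-step targets; it is not TabulaRL's source, and the names td_target and td_update are hypothetical.

```python
from typing import Optional, Sequence


def td_target(reward: float, next_q: Sequence[float], gamma: float, method: str,
              next_action: Optional[int] = None, epsilon: float = 0.1,
              done: bool = False) -> float:
    """Bootstrap target for one-step SARSA, Expected SARSA or Q-learning."""
    if done:
        return reward  # no bootstrapping from a terminal state
    if method == "SARSA":
        # value of the action the behaviour policy actually picked next
        bootstrap = next_q[next_action]
    elif method == "EXPECTED_SARSA":
        # expectation of Q(s', .) under an epsilon-greedy policy
        n = len(next_q)
        greedy = max(range(n), key=lambda a: next_q[a])
        probs = [epsilon / n] * n
        probs[greedy] += 1.0 - epsilon
        bootstrap = sum(p * q for p, q in zip(probs, next_q))
    elif method == "QLEARNING":
        # value of the best next action, ignoring exploration
        bootstrap = max(next_q)
    else:
        raise ValueError(f"unknown method: {method}")
    return reward + gamma * bootstrap


def td_update(q_sa: float, target: float, alpha: float) -> float:
    """Move Q(s, a) a step of size alpha towards the target."""
    return q_sa + alpha * (target - q_sa)
```

With next-state values [0.0, 10.0], reward -1, gamma = 0.9 and epsilon = 0.2, the Q-learning target is 8.0 while the Expected SARSA target is 7.1: the exploratory chance of taking the poor action pulls Expected SARSA's estimate down.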

Results:

Q-learning, online learning | Expected SARSA, online learning


Contributors

gauravtendolkar, tendolkar3
