TabulaRL

Library for implementing Tabular Reinforcement Learning Algorithms

Implementing an RL algorithm is as simple as changing the parameters of a TaskController instance:

tc = TaskController(
    cliff_jump,                            # the task being performed
    learning_algorithm='FIRST_VISIT_MC',   # QLEARNING, SARSA, EXPECTED_SARSA, EVERY_VISIT_MC, FIRST_VISIT_MC
    exploration=True,                      # whether to explore or not
    exploration_decay='CONSTANT',          # decay of exploration rate: CONSTANT, SIMULATED_ANNEALING, LOGARITHMIC
    exploration_strategy='SOFT_E_GREEDY',  # E_GREEDY, SOFT_E_GREEDY
    exploration_epsilon=0.8,               # exploration rate
    learning_rate=0.05,                    # learning rate
    learning_rate_decay='EXPONENTIAL',     # decay of learning rate, e.g. CONSTANT, LOGARITHMIC, EXPONENTIAL
    gamma=0.9,                             # discount factor (how far to look ahead)
    online_learning=False                  # learn online (True) or in episodes (False)
)
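To make the exploration parameters concrete, here is a minimal sketch of textbook epsilon-greedy action selection. It is not TabulaRL's own code: the function names are hypothetical, and how the library distinguishes E_GREEDY from SOFT_E_GREEDY is not stated in this README.

```python
import random

def e_greedy_probabilities(q_values, epsilon):
    """Probability of picking each action under textbook epsilon-greedy.

    Hypothetical helper for illustration; not part of TabulaRL.
    """
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])  # ties go to the lowest index
    probs = [epsilon / n] * n       # exploration mass spread uniformly over all actions
    probs[best] += 1.0 - epsilon    # remaining mass goes to the greedy action
    return probs

def e_greedy_action(q_values, epsilon, rng=random):
    """Sample one action index according to the probabilities above."""
    probs = e_greedy_probabilities(q_values, epsilon)
    return rng.choices(range(n_actions := len(q_values)), weights=probs, k=1)[0]
```

With `exploration_epsilon=0.8` and four actions, each action gets a base probability of 0.2 and the greedy action gets an extra 0.2, so the agent still explores most of the time.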

Some results for the toy task cliff_jump, as defined in the book Reinforcement Learning: An Introduction, are shown below. The task is defined as follows:

[Image: the cliff_jump grid]

In the image above, the agent starts at the bottom-left corner and tries to reach the bottom-right corner (the goal). Reaching the goal gives 100 points, entering any red cell gives -50, and every move gives either 0 or -1 (the latter rewards reaching the goal as fast as possible), depending on the game type.
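The reward scheme described above can be sketched as a small function. This is an illustration only: the 4x12 grid size and the placement of the red cells along the bottom row (as in the book's cliff example) are assumptions, since this README does not state them, and none of these names come from TabulaRL.

```python
ROWS, COLS = 4, 12                 # assumed grid size (not stated in this README)
START = (ROWS - 1, 0)              # bottom-left corner
GOAL = (ROWS - 1, COLS - 1)        # bottom-right corner
# Assumed layout: red cells fill the bottom row between start and goal.
CLIFF = {(ROWS - 1, c) for c in range(1, COLS - 1)}

def reward(cell, step_reward=0):
    """Reward for entering `cell`; step_reward is 0 or -1 depending on the game type."""
    if cell == GOAL:
        return 100
    if cell in CLIFF:
        return -50
    return step_reward
```

Calling `reward(cell, step_reward=-1)` models the game variant in which every ordinary move costs one point.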

GAME TYPE 1:

The agent's path is shown in blue. Every move gives 0, reaching the goal gives 100, and entering the red area gives -50. Each agent was trained for 1000 episodes with various tuned parameters.
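Even with 0 reward per move, a discount factor below 1 still favours shorter paths, because the goal reward is discounted once per extra step. A minimal sketch of that arithmetic follows; the helper names and the path lengths used in the example are hypothetical, not taken from the library or from the results.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards discounted by gamma, computed backwards from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def path_rewards(num_moves, step_reward=0, goal_reward=100):
    """Rewards along a path whose final move enters the goal (illustrative)."""
    return [step_reward] * (num_moves - 1) + [goal_reward]

short = discounted_return(path_rewards(13), gamma=0.9)  # hypothetical path length
long = discounted_return(path_rewards(17), gamma=0.9)   # hypothetical path length
```

For instance, a three-move path with rewards 0, 0, 100 and `gamma=0.9` has a return of roughly 81 rather than 100, and `short` is larger than `long`.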

Results:

Every Visit Monte Carlo	First Visit Monte Carlo

SARSA with online learning	SARSA with batch learning

Expected SARSA with online learning	Expected SARSA with batch learning

Q-learning with online learning	Q-learning with batch learning

GAME TYPE 2:

The agent's path is shown in blue. Every move gives -1, reaching the goal gives 100, and entering the red area gives -50.

The difference between the algorithms is visible here. Even though travelling along the second-lowest row is the fastest route and gives the maximum reward for this type of game, Expected SARSA still takes the safer route, while Q-learning takes the riskier but faster one.
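The difference in behaviour comes from the update targets. The sketch below shows the standard textbook targets for the three TD algorithms; it is not TabulaRL's implementation, the function names are hypothetical, and terminal-state handling is left out for brevity.

```python
def q_learning_target(r, next_q, gamma):
    """Q-learning bootstraps from the best next action, ignoring exploration."""
    return r + gamma * max(next_q)

def sarsa_target(r, next_q, next_action, gamma):
    """SARSA bootstraps from the action actually taken in the next state."""
    return r + gamma * next_q[next_action]

def expected_sarsa_target(r, next_q, next_probs, gamma):
    """Expected SARSA averages next-state values under the behaviour policy."""
    expected = sum(p * q for p, q in zip(next_probs, next_q))
    return r + gamma * expected

# Illustrative numbers for a state next to the red cells:
next_q = [10.0, -50.0]      # action 0 moves on safely, action 1 enters a red cell
next_probs = [0.9, 0.1]     # an exploring policy that still picks action 1 sometimes
q_target = q_learning_target(-1, next_q, gamma=0.9)
es_target = expected_sarsa_target(-1, next_q, next_probs, gamma=0.9)
```

Here `q_target` is 8.0 while `es_target` is about 2.6: Expected SARSA accounts for the chance that exploration pushes the agent into a red cell, so states near the red cells look less attractive to it than they do to Q-learning.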

Results:

Q-learning in online learning mode	Expected SARSA in online learning mode

gauravtendolkar / tabularl
