gym-td3-keras

Reference code: gym-ddpg-keras (DDPG)

Keras implementation of TD3 (Twin Delayed Deep Deterministic Policy Gradient) with an optional PER (Prioritized Experience Replay) mode, built on the OpenAI Gym framework.

STATUS : IN PROGRESS

This branch is for debugging only; switch to the main branch.


Test on Simulation

  • RoboschoolInvertedPendulum-v1
  • RoboschoolHopper-v1
  • RoboschoolHalfCheetah-v1
  • RoboschoolAnt-v1

Experiment Details from paper

Network Model & Hyperparameter

  • For our implementation of DDPG, we use a two layer feedforward neural network of 400 and 300 hidden nodes respectively, with rectified linear units (ReLU) between each layer for both the actor and critic, and a final tanh unit following the output of the actor.
  • Unlike the original DDPG, the critic receives both the state and action as input to the first layer.
  • Both network parameters are updated using Adam with a learning rate of 10⁻³.
  • After each time step, the networks are trained with a mini-batch of 100 transitions, sampled uniformly from a replay buffer containing the entire history of the agent.
  • Both target networks are updated with τ = 0.005.
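
The target update with τ = 0.005 can be illustrated with a minimal sketch. It assumes the usual soft (Polyak) update, in which each target weight moves a fraction τ toward the corresponding online weight. Plain Python lists stand in for weight tensors here, and the name soft_update is hypothetical, not taken from this repository.

```python
TAU = 0.005  # target update rate quoted in the list above


def soft_update(target_weights, online_weights, tau=TAU):
    """Blend weights element by element: tau * online + (1 - tau) * target."""
    if len(target_weights) != len(online_weights):
        raise ValueError("weight lists must have the same length")
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]


# Illustrative values only: the target moves 0.5% of the way toward the online weights.
target = soft_update([0.0, 2.0], [1.0, 2.0])
```

With Keras models, the same blend would typically be applied to each array returned by get_weights() and written back with set_weights().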

Differences from DDPG

  • Target policy smoothing is implemented by adding ε ∼ N(0, 0.2) to the actions chosen by the target actor network, with the noise clipped to (−0.5, 0.5).

  • Delayed policy updates consist of updating the actor and target critic networks only every d iterations, with d = 2.

    (While a larger d would result in a larger benefit with respect to accumulating errors, for fair comparison, the critics are only trained once per time step, and training the actor for too few iterations would cripple learning.)
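
The two differences above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in this repository: the smoothing noise is taken to have standard deviation 0.2 as in the TD3 paper [1], actions are assumed to lie in [-max_action, max_action] (consistent with the tanh output of the actor), and the function names are hypothetical.

```python
import random

POLICY_NOISE = 0.2  # std of the smoothing noise; assumed from the TD3 paper [1]
NOISE_CLIP = 0.5    # smoothing noise is clipped to (-0.5, 0.5)
POLICY_DELAY = 2    # d = 2


def clip(x, low, high):
    """Clamp x to the closed interval [low, high]."""
    return max(low, min(high, x))


def smoothed_target_action(target_action, max_action=1.0, rng=random):
    """Add clipped Gaussian noise to each component of the target actor's action."""
    noisy = []
    for a in target_action:
        eps = clip(rng.gauss(0.0, POLICY_NOISE), -NOISE_CLIP, NOISE_CLIP)
        # Assumption: the smoothed action is clamped back into the valid range.
        noisy.append(clip(a + eps, -max_action, max_action))
    return noisy


def is_actor_update_step(iteration, delay=POLICY_DELAY):
    """True on every `delay`-th critic update (iterations counted from 1)."""
    return iteration % delay == 0
```

In a training loop, the critics would be updated on every iteration, while the actor and the target networks would be updated only when is_actor_update_step(iteration) is true.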

Exploration

  • To remove the dependency on the initial parameters of the policy, we use a purely exploratory policy for the first 10000 time steps of stable-length environments.

  • Afterwards, we use an off-policy exploration strategy, adding Gaussian noise N(0, 0.1) to each action.

    (We found that noise drawn from the Ornstein-Uhlenbeck process offered no performance benefits.)
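
A minimal sketch of this exploration schedule follows. It makes several assumptions that the text does not state: the purely exploratory policy is modelled as uniform sampling over the action range, 0.1 is treated as the standard deviation of the Gaussian noise (scaled by max_action), and noisy actions are clamped to the valid range. The name select_action and its parameters are hypothetical.

```python
import random

START_STEPS = 10000  # purely exploratory time steps
EXPL_NOISE = 0.1     # assumed to be the std of the Gaussian exploration noise


def select_action(policy, state, step, action_dim, max_action=1.0, rng=random):
    """Return a random action early on, then the policy action plus Gaussian noise."""
    if step < START_STEPS:
        # Assumption: "purely exploratory" means uniform over the action range.
        return [rng.uniform(-max_action, max_action) for _ in range(action_dim)]
    action = policy(state)
    return [
        max(-max_action, min(max_action, a + rng.gauss(0.0, EXPL_NOISE * max_action)))
        for a in action
    ]
```

Here policy is any callable that maps a state to a list of action components, for example a thin wrapper around the actor network's prediction.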

Evaluation

  • Each task is run for 1 million time steps with evaluations every 5000 time steps, where each evaluation reports the average reward over 10 episodes with no exploration noise.
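
The evaluation protocol could look roughly like the sketch below. It assumes the classic Gym interface, where reset() returns an observation and step() returns a 4-tuple (observation, reward, done, info), and it reads "average reward" as the mean undiscounted episode return. CountdownEnv is a hypothetical stub used only to make the example runnable.

```python
EVAL_FREQ = 5000    # evaluate every 5000 time steps
EVAL_EPISODES = 10  # average over 10 episodes


def evaluate(env, policy, episodes=EVAL_EPISODES):
    """Mean episode return of `policy`, acting without exploration noise."""
    total = 0.0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            state, reward, done, _info = env.step(policy(state))
            total += reward
    return total / episodes


def is_eval_step(step, freq=EVAL_FREQ):
    """True on the time steps at which an evaluation is due."""
    return step % freq == 0


class CountdownEnv:
    """Hypothetical stub: every episode lasts 3 steps with reward 1 per step."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= 3, {}


print(evaluate(CountdownEnv(), lambda state: 0.0))  # prints 3.0
```

Newer Gym and Gymnasium releases return a 5-tuple from step() and a tuple from reset(), so the loop would need adjusting for those versions.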

Easy Installation

  1. Make an independent environment using virtualenv
# install pip and the virtualenv module
sudo apt-get install python3-pip
sudo pip3 install virtualenv

# create a virtual environment named venv
virtualenv venv

# activate the environment
source venv/bin/activate

To leave the environment, run deactivate.

  2. Install the requirements
pip install -r requirements.txt
  3. Run the training script
# training
python train.py

Reference

[1] Addressing Function Approximation Error in Actor-Critic Methods

@misc{fujimoto2018addressing,
      title={Addressing Function Approximation Error in Actor-Critic Methods}, 
      author={Scott Fujimoto and Herke van Hoof and David Meger},
      year={2018},
      eprint={1802.09477},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

[2] CUN-bjy/gym-ddpg-keras

[3] sfujim/TD3

[4] quantumiracle/SOTA-RL-Algorithms

gym-td3-keras's Issues

td3_implementation analysis

The first TD3 implementation does not work well, so each of the differences from DDPG has to be analyzed.
