PongPG-CNN

Uses policy gradient to play the Atari game Pong (PongDeterministic-v4) on Gym.

Based on

Environment

  • tensorflow-cpu 1.8.0
  • PongDeterministic-v4

Dependencies & Usage

  • refer to (2)  

Main changes

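1. Preprocessing:

80x80 -> 40x80: crop the central 160x160 region of the frame and downsample it to 40x80 (at this resolution the ball still occupies exactly 1x1 pixel; if the frame is downsampled to 40x40, the ball disappears in some frames)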
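
The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the crop rows `35:195`, the reading of 40x80 as 40 rows by 80 columns, and the function name `preprocess` are all assumptions.

```python
import numpy as np

def preprocess(frame):
    """Crop a 210x160x3 Atari frame to 160x160 and downsample it to 40x80."""
    cropped = frame[35:195]        # assumed crop rows; leaves a 160x160x3 image
    small = cropped[::4, ::2, 0]   # keep every 4th row and every 2nd column, one color channel
    return small.astype(np.float32)

# Dummy frame with a bright patch standing in for the ball
# (assumed here to be about 4 pixels tall and 2 pixels wide).
frame = np.zeros((210, 160, 3), dtype=np.uint8)
frame[43:47, 10:12] = 236
x = preprocess(frame)
print(x.shape, int(np.count_nonzero(x)))
```

Under these assumptions, sampling every 4th row and every 2nd column leaves the patch as a single pixel, whereas sampling every 4th column as well could miss a 2-pixel-wide patch entirely.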

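2. Network input:

(cur_x-prev_x) -> [prev_x, cur_x]: instead of the difference between consecutive frames, the previous and current frames are fed in together (no experiment was run to compare the two)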
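
A small sketch of the two input encodings, assuming `prev_x` and `cur_x` are already preprocessed 40x80 arrays. The helper names `diff_input` and `stacked_input` are hypothetical.

```python
import numpy as np

def diff_input(prev_x, cur_x):
    """Frame-difference encoding: a single 40x80 channel."""
    return cur_x - prev_x

def stacked_input(prev_x, cur_x):
    """Stacked encoding: both frames kept as two channels, 40x80x2."""
    return np.stack([prev_x, cur_x], axis=-1)

prev_x = np.zeros((40, 80), dtype=np.float32)
cur_x = np.ones((40, 80), dtype=np.float32)
print(diff_input(prev_x, cur_x).shape)     # (40, 80)
print(stacked_input(prev_x, cur_x).shape)  # (40, 80, 2)
```

The stacked form keeps the absolute positions from both frames and leaves it to the network to extract motion.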

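3. Network architecture:
  • Before: a fully connected layer with 200 hidden units plus an output layer
  • After:
    • Input: 40x80x2,
    • Layer 1: 5x2, 4 channels,
    • Layer 2: 3x1, 8 channels,
    • Layer 3: 100, fully connected,
    • Output layer: 1, the action probability, here the probability of moving up (only two actions are considered)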
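
How a single output unit can drive a two-action policy is sketched below. It assumes the output unit produces a logit that a sigmoid turns into the probability of moving up, and that the Gym action ids for Pong are 2 (up) and 3 (down). The names `sigmoid` and `sample_action` are illustrative and not taken from the repository.

```python
import numpy as np

UP, DOWN = 2, 3  # assumed Gym action ids for moving the paddle up / down

def sigmoid(z):
    """Squash a logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sample_action(logit, rng):
    """Sample UP with probability sigmoid(logit), otherwise DOWN."""
    p_up = sigmoid(logit)
    action = UP if rng.uniform() < p_up else DOWN
    return action, p_up

rng = np.random.RandomState(0)
action, p_up = sample_action(0.0, rng)
print(action, p_up)
```

Sampling from the probability, rather than always taking the more likely action, is what lets a policy gradient method keep exploring during training.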
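4. Batch:
  • One complete match (first to 21 points) is one episode,
  • (2) trains once every n=10 episodes; one episode has roughly 300~1000 frames of data
  • In this method, each reward received (+1/-1) counts as one step; only the last 60 frames before the end of each step are kept, and training runs once every 40 steps, so a single training input has at most 2400 frames (these values can be adjusted)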
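
The batching scheme can be sketched as a small buffer. This is a minimal illustration under the numbers stated in the list; the class `StepBuffer` and its methods are hypothetical names, and frames are represented by arbitrary Python objects.

```python
FRAMES_PER_STEP = 60    # keep only the last 60 frames before each reward
STEPS_PER_UPDATE = 40   # train once every 40 steps

class StepBuffer:
    def __init__(self):
        self.current = []  # frames seen since the last nonzero reward
        self.steps = []    # finished steps as (frames, reward) pairs

    def add(self, frame, reward):
        """Record one frame; return True when enough steps are collected."""
        self.current.append(frame)
        if reward != 0:  # a point was scored, so this step ends
            self.steps.append((self.current[-FRAMES_PER_STEP:], reward))
            self.current = []
        return len(self.steps) >= STEPS_PER_UPDATE

    def drain(self):
        """Hand over the collected steps and start a new batch."""
        batch, self.steps = self.steps, []
        return batch

buf = StepBuffer()
ready = False
for step in range(40):
    for t in range(100):  # 100 dummy frames per step
        ready = buf.add(t, 1.0 if t == 99 else 0.0)
batch = buf.drain()
print(ready, len(batch), sum(len(frames) for frames, _ in batch))
```

With 40 steps of at most 60 frames each, one update sees at most 40 x 60 = 2400 frames, which matches the limit given in the list.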

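Training results

  • The network is small and is trained directly on CPU; after about 14 hours the reward reaches around 0 [(2) takes 50 hours of training on Pong-v0 to reach around 0]
  • After about 24 hours of continued training the reward reaches around +9; according to incomplete statistics, the agent still has a considerable win rate when playing on Pong-v0 at this point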

Contributors

  • wjllance
