susanth-24 / proximal-policy-optimization
Proximal Policy Optimization, or PPO, is a policy gradient method for reinforcement learning. The motivation was to have an algorithm with the data efficiency and reliable performance of TRPO