This is a LXH-unfriendly, but YGY-friendly and GCX-CrazyHappy project. Unstable Baselines provides a quick-start codebase for Reinforcement Learning beginners and a platform for agile algorithm development. Accordingly, only the basic version of each algorithm is implemented here, without the extra training tricks and code-level optimizations found in heavily tuned implementations. UB is currently maintained by researchers from LAMDA-RL, and a PyPI package will be published once the project is ready for release. The following algorithms are currently implemented:
- Deep Q Learning (DQN)
- Deep Deterministic Policy Gradient (DDPG)
- Soft Actor Critic (SAC)
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
- Randomized Ensembled Double Q-Learning (REDQ)
- Proximal Policy Optimization (PPO)
- Model-based Policy Optimization (MBPO)
- Efficient Off-policy Meta-learning via Probabilistic Context Variables (PEARL)
```bash
git clone --recurse-submodules https://github.com/x35f/unstable_baselines.git
cd unstable_baselines
conda env create -f env.yaml
conda activate rl_base
pip3 install -e .
```
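If the install succeeded, the package should now be importable. A minimal smoke test, assuming the top-level module is named `unstable_baselines` to match the repository name:

```bash
# the editable install should make the package importable from anywhere
python3 -c "import unstable_baselines; print(unstable_baselines.__file__)"
```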
To run an algorithm, point its `main.py` at one of its config files:

```bash
python3 /path/to/algorithm/main.py /path/to/algorithm/configs/some-config.json [optional arguments]
```
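As a concrete illustration, a SAC run might look like the line below; the paths and config name are hypothetical, so substitute the actual algorithm directory and config file from this repository:

```bash
# hypothetical paths: pass main.py one of the JSON configs shipped with the algorithm
python3 unstable_baselines/sac/main.py unstable_baselines/sac/configs/HalfCheetah-v3.json
```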
```bash
# install MetaWorld for the meta-RL benchmark
cd envs/metaworld
pip install -e .
```
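To verify the MetaWorld install, you can count its benchmark tasks. This assumes the bundled submodule exposes the standard `metaworld.ML1.ENV_NAMES` attribute; older versions of the package may differ:

```bash
# MetaWorld should import and expose the ML1 task names
python3 -c "import metaworld; print(len(metaworld.ML1.ENV_NAMES), 'ML1 tasks available')"
```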
```bash
# install the Atari dependencies (quotes keep zsh from globbing the brackets)
pip install 'gym[all]'
```
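A quick way to check the Atari setup is to build one environment and take a random step. This sketch assumes the classic pre-0.26 `gym` API; newer gym/gymnasium versions return a 5-tuple from `step`:

```bash
# smoke test: create an Atari env and step it once with a random action
python3 - <<'EOF'
import gym

env = gym.make("PongNoFrameskip-v4")  # any installed Atari id works here
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print("observation shape:", obs.shape)
env.close()
EOF
```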