themtank / cups-rl

Customisable Unified Physical Simulations (CUPS) for Reinforcement Learning. Experiments run on the ai2thor environment (http://ai2thor.allenai.org/) e.g. using A3C, RainbowDQN and A3C_GA (Gated Attention multi-modal fusion) for Task-Oriented Language Grounding (tasks specified by natural language instructions) e.g. "Pick up the Cup or else"

Home Page: http://www.themtank.org

License: MIT License

Python 100.00%
reinforcement-learning robotics cups simulated-environments multi-task-learning cup a3c rainbow model-based transfer-learning

cups-rl's Introduction

cups-rl - Customisable Unified Physical Simulations for Reinforcement Learning

This project focuses primarily on implementing and benchmarking different approaches to domain and task transfer learning in reinforcement learning. The focus lies on a diverse set of simplified domestic robot tasks using ai2thor, a realistic household 3D environment. For example, an agent could learn to pick up a cup under particular conditions and then zero/few-shot transfer to picking up many different cups in many different situations.

We also include our own wrapper for the environment, which supports modifying tasks through an OpenAI gym interface, so that new and more complex tasks can be developed efficiently to train and test the agent.

We have begun a long-running blog series which will go into more detail about this repo and how to use all of the features of our wrapper.

We currently use ai2thor version 0.0.44 and up. More detailed information on the ai2thor environment can be found on their website.

A3C agent learning during training on NaturalLanguagePickUpMultipleObjectTask in one of our customized scenes and tasks with the target object being CUPS!

Overview

This project includes implementations and adaptations of several papers as benchmarks of current state-of-the-art approaches to the problem, namely A3C, Rainbow DQN, and A3C with Gated Attention for task-oriented language grounding (see the project description above).

Implementations of these can be found in the algorithms folder and can be run on AI2ThorEnv with:

python algorithms/a3c/main.py

python algorithms/rainbow/main.py

Check the argparse help for more details and for variations of running the algorithms with different hyperparameters, as well as on the Atari environment.

Installation

Clone cups-rl repository:

CUPS_RL=/path/to/clone/cups-rl  # choose where to clone the repo
git clone https://github.com/TheMTank/cups-rl.git $CUPS_RL

Install Python dependencies (currently only Python 3.5+ is supported):

pip install -r $CUPS_RL/requirements.txt

Finally, add CUPS_RL to your PYTHONPATH environment variable and you are done.

How to use

The wrapper is based on the OpenAI gym interface as described in the gym documentation. Here is a simple example with the default configuration, which places the agent in a "Kitchen" scene for the task of picking up and putting down mugs.

from gym_ai2thor.envs.ai2thor_env import AI2ThorEnv

N_EPISODES = 20
env = AI2ThorEnv()  # default config: "Kitchen" scene, pick up / put down mugs
max_episode_length = env.task.max_episode_length
for episode in range(N_EPISODES):
    state = env.reset()
    for step_num in range(max_episode_length):
        action = env.action_space.sample()  # random action from the wrapper's action space
        state, reward, done, info = env.step(action)
        if done:
            break

Environment and Task configurations

The environment is typically defined by a JSON configuration file located in the gym_ai2thor/config_files folder. You can find a full example in config_example.json to see how to customize it. Here is another example:

# gym_ai2thor/config_files/myconfig.json
{
  "pickup_put_interaction": true,
  "open_close_interaction": true,
  "pickup_objects": ["Mug", "Apple", "Book"],
  "acceptable_receptacles": ["CounterTop", "TableTop", "Sink"],
  "openable_objects": ["Microwave"],
  "scene_id": "FloorPlan28",
  "gridSize": 0.1,
  "continuous_movement": true,
  "grayscale": true,
  "resolution": [300, 300],
  "task": {"task_name": "PickUp",
           "target_object": {"Mug": 1}}
}
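A custom file like this can then be loaded through the wrapper's config_file keyword argument (a minimal sketch, assuming a path relative to the repository root; check the example scripts for the exact usage):

from gym_ai2thor.envs.ai2thor_env import AI2ThorEnv

# Build the environment from the custom config file shown above
env = AI2ThorEnv(config_file='gym_ai2thor/config_files/myconfig.json')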

For experimentation it is important to be able to make slight modifications to the environment without having to create a new config file each time. The AI2ThorEnv class therefore accepts the keyword argument config_dict, a Python dictionary that is applied on top of the config file and overrides the parameters defined there.
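For example, a minimal sketch that keeps the config file defaults and overrides two of the keys shown above:

from gym_ai2thor.envs.ai2thor_env import AI2ThorEnv

# Override individual parameters without writing a new config file
config_dict = {'gridSize': 0.25, 'grayscale': False}
env = AI2ThorEnv(config_dict=config_dict)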

The tasks are defined in gym_ai2thor/tasks.py and allow for particular configurations of the rewards given and the termination conditions for an episode. You can use the tasks that we defined there or create your own by adding it as a subclass of BaseTask. Here is an example of a new task definition:

# gym_ai2thor/tasks.py
class MoveAheadTask(BaseTask):
    def __init__(self, *args, **kwargs):
        super().__init__(**kwargs)
        self.rewards = []

    def transition_reward(self, state):
        # Reward moving forward, penalise every other action
        reward = 1 if state.metadata['lastAction'] == 'MoveAhead' else -1
        self.rewards.append(reward)
        done = sum(self.rewards) > 100 or self.step_num > self.max_episode_length
        if done:
            self.rewards = []
        return reward, done

    def reset(self):
        self.step_num = 0
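Assuming new tasks are selected by name through the config's task entry, in the same way 'PickUp' is in the configuration example above (this name-to-class mapping is an assumption; check tasks.py for the exact lookup), the new task could then be enabled with something like:

from gym_ai2thor.envs.ai2thor_env import AI2ThorEnv

# Hypothetical: select the new task by its name in the task config entry
env = AI2ThorEnv(config_dict={'task': {'task_name': 'MoveAhead'}})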

We encourage you to explore the scripts in the examples folder to get familiar with the wrapper's functionality and to see how to create more customized versions of ai2thor environments and tasks. The agent can also rotate continuously in 10-degree increments by setting continuous_movement: true in the config, as in the sketch below; see task_on_ground_variation.py in examples.
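A minimal sketch of switching it on through config_dict (the key is the one from the configuration example above):

from gym_ai2thor.envs.ai2thor_env import AI2ThorEnv

# Enable continuous rotation in 10-degree increments
env = AI2ThorEnv(config_dict={'continuous_movement': True})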

In the config, build_file_name can be set to a file/folder combination within gym_ai2thor/build_files. We provide a preliminary Unity build that you can download from Google Drive here, but of course you can create your own by following the instructions in the ai2thor repository. We will be adding more builds in the future.
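As a minimal sketch, once a build is extracted under gym_ai2thor/build_files it can be selected through the config or config_dict; the folder and file names below are illustrative:

from gym_ai2thor.envs.ai2thor_env import AI2ThorEnv

# Point the wrapper at a specific Unity build inside gym_ai2thor/build_files/
config_dict = {'build_file_name': 'pickup_build_bowl/build_bowls_vs_cups_fp1_v_0.1.x86_64'}
env = AI2ThorEnv(config_dict=config_dict)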

Here is the desired result of an example task in which the goal of the agent is to place a cup in the sink.

Example of task "place cup in sink"

The Team

MTank is a non-partisan organisation that works solely to recognise the multifaceted nature of Artificial Intelligence research and to highlight key developments within all sectors affected by these advancements. Through the creation of unique resources, the combination of ideas and their provision to the public, this project hopes to encourage the dialogue which is beginning to take place globally.

To produce value for the individual, for researchers, for institutions and for the world.

License

This project is released under the MIT license.

cups-rl's People

Contributors

beduffy, fernandotorch


cups-rl's Issues

Regarding the policy/model/weights

Would you mind clarifying whether the policy/model/weights are saved after each epoch/iteration? If not, how should I make that happen? If yes, where are they saved? I see you just reference the model via 'args.model_path' in agent.py and call save(self, path, filename) without specifically assigning a path or name.

Thank you for replying.

plotting the result

Hi,
I have another question. After I run rainbow's main.py, no plot comes out, even though test.py contains a _plot_line call that should be invoked when main.py is executed. After checking, I found that in main.py, line 149:
avg_reward, avg_Q = test(env, mem_steps, args, dqn, val_mem, evaluate_only=True)
and line 179:
if num_steps % args.evaluation_interval == 0:
(and hence line 182: avg_reward, avg_Q = test(env, num_steps, args, dqn, val_mem))

were never executed, which means that:
if args.evaluate_only:  # line 147
and
if num_steps >= args.learn_start:  # line 172

were never satisfied.

I didn't modify any part of the code except changing --max-num-steps (line 34) to 1000 to check the result promptly.

Do you have any clue what the cause is?

Thank you.

making random actions in rainbow

Hi, may I ask whether it is possible to make a sequence of random actions by adding a line of code to Rainbow's main.py?

Multi agent

Hi, does the ai2thor_env allow multiple agents to run in the same room?

decouple RL algorithms from the gym wrapper

Hi, could you provide some guidelines on how to decouple the RL algorithms (A3C, Rainbow) from the gym wrapper? I would like to make ai2thor behave like a regular gym sub-environment, so that creating it could be done with env = gym.make(args.env_id), where env_id refers to ai2thor, without any RL algorithm being involved. I tried to just use

env = FrameStackEnv(AI2ThorEnv(config_file=args.config_file), args.history_length, args.device)

but this line asks for history_length and device, which I suppose are arguments for the RL algorithm. Also, the Env class is under the algorithm's directory, so everything seems very coupled. Is there some way to separate them?

I would really appreciate any guidance on this issue.

Is there any way to improve training speed?

I am currently training the rainbow model on a GPU. I tried to train two models simultaneously, but the training time became much slower. Do you have any suggestions for improving my training speed? I am training them on a GeForce RTX 2080 Ti.

Enquiry regarding running a random agent using the build file

Hello all,
First of all, great project and thanks for sharing the code. I am currently interested in training the PickupTask using a3c or rainbow. I am facing an issue while running a random agent using the [build file](https://drive.google.com/open?id=1UlmAnLuDVBYEiw_xPsGcbuXQTAiNwo8E) mentioned in the readme. I extracted it and put it in gym_ai2thor/build_files/.
The agent doesn't move or take any action after the controller is instantiated.

It contains:

  • build_bowls_vs_cups_fp1_v_0.1.x86_64 (file)
  • build_bowls_vs_cups_fp1_v_0.1_Data (folder)

The code I am running is as follows:

import time

import gym
from gym_ai2thor.envs.ai2thor_env import AI2ThorEnv

N_EPISODES = 3


if __name__ == '__main__':
    config_dict = {
        'max_episode_length': 200,
        "build_file_name": "pickup_build_bowl/build_bowls_vs_cups_fp1_v_0.1.x86_64"
    }
    env = AI2ThorEnv(config_dict=config_dict)
    max_episode_length = env.task.max_episode_length
    for episode in range(N_EPISODES):
        start = time.time()
        state = env.reset()
        for step_num in range(max_episode_length):
            action = env.action_space.sample()
            state, reward, done, _ = env.step(action)
            if done:
                break

            if step_num + 1 > 0 and (step_num + 1) % 100 == 0:
                print('Episode: {}. Step: {}/{}. Time taken: {:.3f}s'.format(episode + 1,
                                         (step_num + 1), max_episode_length, time.time() - start))
                start = time.time()

This is the output on the terminal:

home/srinjoym/Documents/cups-rl/gym_ai2thor/utils.py:62: UserWarning: Key: build_file_name already in config file with value False. Overwriting with value: pickup_build_bowl/build_bowls_vs_cups_fp1_v_0.1.x86_64
  warnings.warn('Key: {} already in config file with value {}. '
Build file path at: /home/srinjoym/Documents/cups-rl/gym_ai2thor/build_files/pickup_build_bowl/build_bowls_vs_cups_fp1_v_0.1.x86_64
Found path: /home/srinjoym/Documents/cups-rl/gym_ai2thor/build_files/pickup_build_bowl/build_bowls_vs_cups_fp1_v_0.1.x86_64
Mono path[0] = '/home/srinjoym/Documents/cups-rl/gym_ai2thor/build_files/pickup_build_bowl/build_bowls_vs_cups_fp1_v_0.1_Data/Managed'
Mono config path = '/home/srinjoym/Documents/cups-rl/gym_ai2thor/build_files/pickup_build_bowl/build_bowls_vs_cups_fp1_v_0.1_Data/Mono/etc'
Preloaded 'ScreenSelector.so'
Display 0 '0': 1920x1080 (primary device).
Logging to /home/srinjoym/.config/unity3d/Allen Institute for Artificial Intelligence/AI2-Thor/Player.log
1/3
Resetting environment and starting new episode

and the ai2thor simulator frame looks like this:

Screenshot from 2024-04-25 22-08-38

Any help is greatly appreciated. Thank you.

"open_close_interaction": true gives error

Hi, when I set "open_close_interaction": true, meaning that I want the agent to be able to open/close openable_objects, it returns the following error:

Traceback (most recent call last):

File "", line 1, in
runfile('/home/user/Documents/Zeyu/cups-rl/main.py', wdir='/home/user/Documents/Zeyu/cups-rl')

File "/home/user/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
execfile(filename, namespace)

File "/home/user/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/home/user/Documents/Zeyu/cups-rl/main.py", line 142, in
next_state, _, done, _ = env.step(env.action_space.sample())

File "/home/user/Documents/Zeyu/cups-rl2/algorithms/rainbow/env.py", line 140, in step
state, reward, done, info = self.env.step(action)

File "/home/user/Documents/Zeyu/cups-rl2/gym_ai2thor/envs/ai2thor_env.py", line 175, in step
obj['distance'] < distance and not obj['isopen'] and \

KeyError: 'isopen'

I tried this with a copy of your repo without changing anything else.

error after num_steps = 100000 / 50000000

Hi,
I was running rainbow directly with all default settings and this error popped up after 100000 steps:

Resetting environment and starting new episode
eval step 200
eval step 400
eval step 600
eval step 800
eval step 1000
Reached maximum episode length: 1000
Traceback (most recent call last):

File "", line 1, in
runfile('/home/user/Documents/Zeyu/cups-rl/main.py', wdir='/home/user/Documents/Zeyu/cups-rl')

File "/home/user/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
execfile(filename, namespace)

File "/home/user/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/home/user/Documents/Zeyu/cups-rl/main.py", line 182, in
avg_reward, avg_Q = test(env, num_steps, args, dqn, val_mem)

File "/home/user/Documents/Zeyu/cups-rl/algorithms/rainbow/test.py", line 76, in test
_plot_line(eval_steps, rewards, 'Reward', path='results')

File "/home/user/Documents/Zeyu/cups-rl/algorithms/rainbow/test.py", line 112, in _plot_line
}, filename=os.path.join(path, title + '.html'), auto_open=False)

File "/home/user/anaconda3/lib/python3.7/site-packages/plotly/offline/offline.py", line 596, in plot
auto_open=auto_open,

File "/home/user/anaconda3/lib/python3.7/site-packages/plotly/io/_html.py", line 527, in write_html
with open(file, "w") as f:

FileNotFoundError: [Errno 2] No such file or directory: 'results/Reward.html'

And indeed, there is no results/Reward.html in the repo. I wonder whether I need to create it myself?

One thing to note is that I copied rainbow's main.py to the top level of cups-rl and ran it from there, because otherwise lines such as:

from algorithms.rainbow.agent import Agent

return an error if main.py is run from cups-rl/algorithms/rainbow.

Thanks for your reply.

reset agent but not the environment

Hi,
I suppose
state, done = env.reset(), False
in lines 141 and 157 of rainbow's main.py resets both the agent and the environment. May I ask whether, after resetting the environment, the object locations are randomized or kept in their initial positions, and whether there is a way to reset only the agent's position while keeping the environment as it is?

Thank you.

Calling different tasks for different iterations

Hi,
May I check whether it is possible to define multiple tasks in tasks.py and call just one of them based on a certain condition? If it is possible, how do I do it?
My guess is to modify the rainbow_example.json script with an if-condition (I am using the rainbow algorithm to train the model), but since the task in that script is defined in a single entry, I'm not sure how exactly to implement that.

Thank you very much.

Error starting AI2THOR Env

Unable to start an Env. I tried this:
python random_walk.py

/home/kb/anaconda3/lib/python3.6/site-packages/ai2thor/controller.py:1152: UserWarning: start method depreciated. The server started when the Controller was initialized.
"start method depreciated. The server started when the Controller was initialized."
Traceback (most recent call last):
File "random_walk.py", line 15, in
env = AI2ThorEnv(config_dict=config_dict)
File "/home/kb/CUPS_RL/cups-rl/gym_ai2thor/envs/ai2thor_env.py", line 119, in init
self.controller.start()
File "/home/kb/anaconda3/lib/python3.6/site-packages/ai2thor/controller.py", line 1163, in start
self.server.start()
File "/home/kb/anaconda3/lib/python3.6/site-packages/ai2thor/fifo_server.py", line 203, in start
os.mkfifo(self.server_pipe_path)
FileExistsError: [Errno 17] File exists

regarding the input of Rainbow

Hi, may I ask whether Rainbow takes the metadata (especially metadata['objects']) as input? I feel that by just taking the frame, a lot of useful information is missing. If the metadata is taken as input, how is it fed in? Is it via some embedding method? Thanks for replying.

About task "place cup in microwave"

I'd like to ask how to create a task like "place cup in microwave". I think it's a hard task that consists of two simpler tasks. Should I create it as two separate tasks, or as a single harder task whose target objects include both kinds?
