
TEACh

Task-driven Embodied Agents that Chat

Aishwarya Padmakumar*, Jesse Thomason*, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment. The code and model weights are licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE). Please include appropriate licensing and attribution when using our data and code, and please cite our paper.

Citation:

@inproceedings{teach,
  title={{TEACh: Task-driven Embodied Agents that Chat}},
  author={Padmakumar, Aishwarya and Thomason, Jesse and Shrivastava, Ayush and Lange, Patrick and Narayan-Chen, Anjali and Gella, Spandana and Piramuthu, Robinson and Tur, Gokhan and Hakkani-Tur, Dilek},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={2},
  pages={2017--2025},
  year={2022}
}

As of 09/07/2022, the dataset has been updated to include the dialog acts annotated in the following paper:

Dialog Acts for Task-Driven Embodied Agents

Spandana Gella*, Aishwarya Padmakumar*, Patrick Lange, Dilek Hakkani-Tur

If using the dialog acts in your work, please cite the following paper:

@inproceedings{teachda,
  title={{Dialog Acts for Task-Driven Embodied Agents}},
  author={Gella, Spandana and Padmakumar, Aishwarya and Lange, Patrick and Hakkani-Tur, Dilek},
  booktitle={Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDial)},
  year={2022},
  pages={111--123}
}

Utterance interactions in games, EDH instances, and TfD instances now have an additional field da_metadata containing the dialog act annotations. See the data exploration notebook for sample code to view dialog acts.
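The snippet below is a minimal sketch (not a substitute for the data exploration notebook) that walks a downloaded game file and collects every da_metadata field. The file path and the "utterance" key are hypothetical, and the recursive walk deliberately avoids assuming where utterance interactions sit inside the JSON.

import json

def collect_dialog_acts(node, found=None):
    # Recursively walk a nested JSON structure and gather da_metadata entries.
    if found is None:
        found = []
    if isinstance(node, dict):
        if "da_metadata" in node:
            # "utterance" is an assumed sibling key; adjust to the actual schema.
            found.append((node.get("utterance"), node["da_metadata"]))
        for value in node.values():
            collect_dialog_acts(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_dialog_acts(item, found)
    return found

with open("/tmp/teach-dataset/games/train/example.game.json") as f:  # hypothetical path
    game = json.load(f)

for utterance, dialog_act in collect_dialog_acts(game):
    print(utterance, dialog_act)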

Prerequisites

  • python3 >=3.7,<=3.8
  • python3.x-dev, example: sudo apt install python3.8-dev
  • tmux, example: sudo apt install tmux
  • xorg, example: sudo apt install xorg openbox
  • ffmpeg, example: sudo apt install ffmpeg

Installation

pip install -r requirements.txt
pip install -e .

Downloading the dataset

Run the following script:

teach_download 

This will download and extract the archive files (experiment_games.tar.gz, all_games.tar.gz, images_and_states.tar.gz, edh_instances.tar.gz & tfd_instances.tar.gz) in the default directory (/tmp/teach-dataset).
Optional arguments:

  • -d/--directory: The location to store the dataset in. Default=/tmp/teach-dataset.
  • -se/--skip-extract: If set, skip extracting archive files.
  • -sd/--skip-download: If set, skip downloading archive files.
  • -f/--file: Specify the file name to be retrieved from S3 bucket.

File changes (12/28/2022): We have modified EDH instances so that the state changes checked to evaluate success are only those that contribute to task success in the main task of the gameplay session the EDH instance is created from. We have removed EDH instances that had no state changes meeting these requirements. Additionally, two game files and their corresponding EDH and TfD instances were deleted from the valid_unseen split due to issues in the game files. Version 3 of our paper on arXiv, which will be public on Dec 30, 2022, contains the updated dataset size and experimental results.

Remote Server Setup

If running on a remote server without a display, the following setup is needed to run episode replay, model inference, or any model training that invokes the simulator (student forcing / RL).

Start an X-server

tmux
sudo python ./bin/startx.py

Exit the tmux session (CTRL+B, D). Any other commands should be run in the main terminal / different sessions.

Replaying episodes

Most users should not need to do this since we provide this output in images_and_states.tar.gz.

The following steps can be used to read a .json file of a gameplay session, play it in the AI2-THOR simulator, and, at each time step, save egocentric observations of the Commander and Driver (Follower in the paper), the target object panel and mask seen by the Commander, and the difference between the current and initial state.

Replaying a single episode locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_fn /path/to/game/file \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--status-out-fn /path/to/desired/output/status/file.json

Note that --status-out-fn must end in .json. Also note that, by default, the script will not replay sessions for which an output subdirectory already exists under --write_frames_dir. Additionally, if the file passed to --status-out-fn already exists, the script will try to resume files not marked as replayed in that file, and it will error out if there is a mismatch between the status file and the output directories on which sessions have previously been replayed. It is recommended to use a new --write_frames_dir and a new --status-out-fn for additional runs that are not intended to resume from a previous one.

Replay all episodes in a folder locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_dir /path/to/dir/containing/.game.json/files \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--num_processes 50 \
--status-out-fn /path/to/desired/output/status/file.json

To generate a video, additionally specify --create_video. Note that for images to be saved, --write_frames must be specified and --write_frames_dir must be provided. For state changes to be saved, --write_states must be specified and --write_frames_dir must be provided.
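Once replay output (or the pre-extracted images_and_states.tar.gz) is available, a minimal sketch like the one below can be used to browse the saved per-timestep state-diff files. It assumes the <frames_dir>/<game_id>/statediff.<timestamp>.json layout of the released archive; verify the layout against your own download, and note that the game ID here is only an example.

import glob
import json
import os

frames_dir = "/tmp/teach-dataset/images/train"  # or your --write_frames_dir
game_id = "0396d773f627df48_0db5"               # example game ID

# List each saved state diff and the top-level keys it contains.
for path in sorted(glob.glob(os.path.join(frames_dir, game_id, "statediff.*.json"))):
    with open(path) as f:
        diff = json.load(f)
    print(os.path.basename(path), sorted(diff.keys()))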

Evaluation

We include sample scripts for inference and calculation of metrics: teach_inference and teach_eval. teach_inference is a wrapper that implements loading EDH instances, interacting with the simulator, and writing the game file and predicted action sequence as JSON files after each inference run. It dynamically loads the model based on the --model_module and --model_class arguments. Your model has to implement teach.inference.teach_model.TeachModel. See teach.inference.sample_model.SampleModel for an example implementation which takes random actions at every time step.
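As a starting point, here is a minimal sketch of a custom model patterned on the SampleModel just mentioned. The constructor and method signatures below (including process_index, edh_history_images, and prev_action) are assumptions based on the descriptions in this README; take the authoritative signatures from teach.inference.teach_model.TeachModel in this repository.

from teach.inference.teach_model import TeachModel


class AlwaysForwardModel(TeachModel):
    """Toy model that always predicts Forward; useful only for wiring checks."""

    def __init__(self, process_index, num_processes, model_args):
        # Load checkpoints / set devices here; process_index can be used to
        # spread replicas across GPUs (see Runtime Checks below).
        self.process_index = process_index

    def start_new_edh_instance(self, edh_instance, edh_history_images, edh_name=None):
        # Reset any per-instance state (dialog history, feature caches, ...).
        return True

    def get_next_action(self, img, edh_instance, prev_action, img_name=None, edh_name=None):
        # Return an action name and, for interaction actions, an (x, y) relative
        # coordinate in [0, 1]; navigation actions can return None for the coordinate.
        return "Forward", None

Such a class would then be selected via the --model_module and --model_class arguments shown in the sample run below.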

After running teach_inference, you can use teach_eval to compute metrics based on the output data produced by teach_inference.

Sample run:

export DATA_DIR=/path/to/data/with/games/and/edh_instances/as/subdirs   # default from teach_download: /tmp/teach-dataset
export OUTPUT_DIR=/path/to/output/folder/for/split
export METRICS_FILE=/path/to/output/metrics/file_without_extension

teach_inference \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE \
    --model_module teach.inference.sample_model \
    --model_class SampleModel

teach_eval \
    --data_dir $DATA_DIR \
    --inference_output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE

To run TfD inference instead of EDH inference add --benchmark tfd to the inference command.

TEACh Benchmark Challenge

For participation in the challenge, you will need to submit a Docker image containing your code and model. Docker containers using your image will serve your model as an HTTP API following the TEACh API Specification below. For your convenience, we include the teach_api command, which implements this API and is compatible with models implementing teach.inference.teach_model.TeachModel, also used by teach_inference.

We have also included two sample Docker images using teach.inference.sample_model.SampleModel and teach.inference.et_model.ETModel respectively in docker/.

When evaluating a submission, the submitted container will be started with access to a single GPU and no internet access. For details see Step 3 - Start your container.

The main evaluation code invoking your submission will also be run as a Docker container. It reuses the teach_inference CLI command together with teach.inference.remote_model.RemoteModel to call the HTTP API running in your container. For details on how to start it locally see Step 4 - Start the evaluation.

Please note that TfD inference is not currently supported via Docker image.

Testing Locally

The following steps assume you have downloaded the data to /home/ubuntu/teach-dataset and followed Prerequisites and Remote Server Setup.

Step 0 - Setup Environment

export HOST_DATA_DIR=/home/ubuntu/teach-dataset
export HOST_IMAGES_DIR=/home/ubuntu/images
export HOST_OUTPUT_DIR=/home/ubuntu/output
export API_PORT=5000
export SUBMISSION_PK=168888
export INFERENCE_GPUS='"device=0"'
export API_GPUS='"device=1"'
export SPLIT=valid_seen
export DOCKER_NETWORK=no-internet

mkdir -p $HOST_IMAGES_DIR $HOST_OUTPUT_DIR
docker network create --driver=bridge --internal $DOCKER_NETWORK

Note: If you run on a machine that only has a single GPU, set API_GPUS='"device=0"'.

Step 1 - Build the remote-inference-runner container

docker build -t remote-inference-runner -f docker/Dockerfile.RemoteInferenceRunner .

Step 2 - Build your container

Note: When customizing the images for your own usage, do not edit the following or your submission will fail:

  • teach_api options: --data_dir /data --images_dir /images --split $SPLIT
  • EXPOSE 5000, and don't change the port the Flask API listens on

For the SampleModel example, the corresponding command is:

docker build -t teach-model-api-samplemodel -f docker/Dockerfile.TEAChAPI-SampleModel .

For the baseline models, use the following commands, replacing MODEL_VARIANT=et with the desired variant, e.g. et_plus_a.

mkdir -p ./models
mv $HOST_DATA_DIR/baseline_models ./models/
mv $HOST_DATA_DIR/et_pretrained_models ./models/
docker build --build-arg MODEL_VARIANT=et -t teach-model-api-etmodel -f docker/Dockerfile.TEAChAPI-ETModel .

Step 3 - Start your container

For the SampleModel example, the corresponding command is:

docker run -d --rm \
    --gpus $API_GPUS \
    --name TeachModelAPI \
    --network $DOCKER_NETWORK \
    -e SPLIT=$SPLIT \
    -v $HOST_DATA_DIR:/data:ro \
    -v $HOST_IMAGES_DIR/$SUBMISSION_PK:/images:ro \
    -t teach-model-api-samplemodel    

For the baseline models, just replace the image name, e.g. if you followed the commands above:

docker run -d --rm \
    --gpus $API_GPUS \
    --name TeachModelAPI \
    --network $DOCKER_NETWORK \
    -e SPLIT=$SPLIT \
    -v $HOST_DATA_DIR:/data:ro \
    -v $HOST_IMAGES_DIR/$SUBMISSION_PK:/images:ro \
    -t teach-model-api-etmodel    

Verify the API is running with

docker exec TeachModelAPI curl @TeachModelAPI:5000/ping

Output:
{"action":"Look Up","obj_relative_coord":[0.1,0.2]}

Step 4 - Start the evaluation

docker run --rm \
    --privileged \
    -e DISPLAY=:0 \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    --name RemoteInferenceRunner \
    --network $DOCKER_NETWORK \
    --gpus $INFERENCE_GPUS \
    -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
    -v $HOST_DATA_DIR:/data:ro \
    -v $HOST_IMAGES_DIR/$SUBMISSION_PK:/images \
    -v $HOST_OUTPUT_DIR/$SUBMISSION_PK:/output \
    remote-inference-runner teach_inference \
        --data_dir /data \
        --output_dir /output \
        --images_dir /images \
        --split $SPLIT \
        --metrics_file /output/metrics_file \
        --model_module teach.inference.remote_model \
        --model_class RemoteModel \
        --model_api_host_and_port "@TeachModelAPI:$API_PORT"

Step 5 - Results

The evaluation metrics will be in $HOST_OUTPUT_DIR/$SUBMISSION_PK/metrics_file. Images for each episode will be in $HOST_IMAGES_DIR/$SUBMISSION_PK.

Running without docker

You may want to test your implementation without rebuilding Docker images. You can do so by directly calling the teach_api CLI command, e.g.

Using the teach.inference.sample_model.SampleModel:

export DATA_DIR=/home/ubuntu/teach-dataset
export IMAGE_DIR=/tmp/images

teach_api \
    --data_dir $DATA_DIR \
    --images_dir $IMAGE_DIR

Using teach.inference.et_model.ETModel, assuming you have already moved the models from the teach-dataset location to ./models following the instructions in Step 2 - Build your container:

export DATA_DIR=/home/ubuntu/teach-dataset
export IMAGE_DIR=/tmp/images

teach_api \
    --data_dir $DATA_DIR \
    --images_dir $IMAGE_DIR \
    --split valid_seen \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --model_dir ./models/baseline_models/et \
    --visual_checkpoint ./models/et_pretrained_models/fasterrcnn_model.pth \
    --object_predictor ./models/et_pretrained_models/maskrcnn_model.pth \
    --seed 4 

The corresponding command for running teach_inference against such an API without a container uses teach.inference.remote_model.RemoteModel:

export DATA_DIR=/home/ubuntu/teach-dataset
export OUTPUT_DIR=/home/ubuntu/output/valid_seen
export METRICS_FILE=/home/ubuntu/output/valid_seen/metrics
export IMAGE_DIR=/tmp/images

teach_inference \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE \
    --model_module teach.inference.remote_model \
    --model_class RemoteModel \
    --model_api_host_and_port 'localhost:5000' \
    --images_dir $IMAGE_DIR

Smaller split

For faster turnaround, it may be useful to locally create a smaller split in $DATA_DIR/edh_instances/test_seen containing a handful of files from $DATA_DIR/edh_instances/valid_seen.
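A minimal sketch of one way to do this (it only copies EDH instance files; copy the matching game files as well if your model needs them):

import os
import shutil

data_dir = os.environ.get("DATA_DIR", "/tmp/teach-dataset")
src = os.path.join(data_dir, "edh_instances", "valid_seen")
dst = os.path.join(data_dir, "edh_instances", "test_seen")
os.makedirs(dst, exist_ok=True)

# Copy a handful of instances for quick end-to-end runs.
for fn in sorted(os.listdir(src))[:5]:
    shutil.copy(os.path.join(src, fn), dst)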

Runtime Checks

The TEACh Benchmark Challenge places a maximum time limit of 36 hours when using all GPUs of a p3.16xlarge instance. The best way to verify that your code is likely to satisfy this requirement is to use a script that runs two Docker evaluation processes in sequence on a p3.16xlarge EC2 instance, one for the valid_seen split and one for the valid_unseen split. Note that you will need to specify export API_GPUS='"device=1,2,3,4,5,6,7"' (we reserve GPU 0 for ai2thor in our runs) to use all GPUs, and your model code will need to place different instances of the model on different GPUs for this test (see the use of process_index in ETModel.set_up_model() for an example). Also note that while the test splits are close in size to the validation splits, they are not identical, so your runtime estimate will necessarily be an approximation.
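For illustration, here is a minimal sketch of spreading model replicas across GPUs using the process index, in the spirit of the process_index handling in ETModel.set_up_model(); the actual baseline code may differ.

import torch

def pick_device(process_index: int) -> torch.device:
    # Reserve cuda:0 for ai2thor and spread model replicas over the remaining GPUs.
    num_gpus = torch.cuda.device_count()
    if num_gpus == 0:
        return torch.device("cpu")
    if num_gpus == 1:
        return torch.device("cuda:0")
    return torch.device(f"cuda:{1 + process_index % (num_gpus - 1)}")

# e.g. inside your model's constructor:
# self.device = pick_device(process_index)
# self.model.to(self.device)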

TEACh API Specification

As mentioned above, teach_api already implements this API, and it is usually not necessary to implement it yourself. During evaluation of submissions, edh_instances without ground truth and the images corresponding to the edh_instances' histories will be available in /data, and /images will contain images produced during inference at runtime. teach_api already handles loading these and passing them to your implementation of teach.inference.teach_model.TeachModel.

Start EDH Instance

This endpoint will be called once at the start of processing a new EDH instance. Currently, we ensure that the API processes only a single EDH instance from start to finish, i.e. once this endpoint is called, it can be assumed that the previous EDH instance has completed.

URL : /start_new_edh_instance
Method : POST
Payload:

{
    "edh_name": "[name of the EDH instance file]"
}

Responses:

Status Code: 200
Response: success

Status Code: 500
Response: [error message]

Get next action

This endpoint will be called at each timestep during inference to get the next predicted action from the model.

URL : /get_next_action
Method : POST
Payload:

{
    "edh_name": "[name of the EDH instance file]",
    "img_name": "[name of the image taken in the simulator after the previous action]",
    "prev_action": "[JSON string representation of previous action]", // this is optional
}

Responses:

Status Code: 200

{
    "action": "[An action name from all_agent_actions]",
    "obj_relative_coord": [0.1, 0.5] // see teach.inference.teach_model.TeachModel.get_next_action
}

Status Code: 500
Response: [error message]
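For a quick smoke test of a running model API, something like the sketch below can exercise both endpoints. The edh_name and img_name values are hypothetical, and the form-data encoding (and whether image bytes are uploaded in addition to img_name) is an assumption; teach_api and teach.inference.remote_model.RemoteModel implement the real client/server pair, so consult those for the authoritative request format.

import requests

api = "http://localhost:5000"  # inside the Docker network this would be TeachModelAPI:5000

# Tell the model to start a new EDH instance.
resp = requests.post(f"{api}/start_new_edh_instance",
                     data={"edh_name": "example.edh0.json"})  # hypothetical instance name
print(resp.status_code, resp.text)  # expect 200 and "success"

# Ask for the next action given the latest simulator image.
resp = requests.post(f"{api}/get_next_action",
                     data={"edh_name": "example.edh0.json",
                           "img_name": "driver.frame.0.jpeg"})  # hypothetical image name
print(resp.status_code, resp.json())  # e.g. {"action": "Look Up", "obj_relative_coord": [0.1, 0.2]}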

TEACh EDH Offline Evaluation

While the leaderboard for the TEACh EDH benchmark is not active, we recommend that researchers adopt the following protocol for evaluation. A division of the existing TEACh validation splits is provided in the src/teach/meta_data_files/divided_split directory. For your experiments, please use the divided_val_seen and divided_val_unseen splits for validation and divided_test_seen and divided_test_unseen for testing. Note that the TEACh code has not yet been modified to directly support these splits, so you will need to locally reorganize your data directory so that games, EDH instances, and image folders follow the divided split (a sketch of this reorganization follows the notes below). Some additional notes:

  1. If you have previously tuned hyperparameters using the full TEACh validation split, you will need to re-tune hyperparameters on just the divided_val_seen or divided_val_unseen splits for fair comparison to other papers.
  2. The divided test splits are likely to be easier than the original TEACh test split as the floorplans used in the divided_val_unseen and divided_test_unseen splits are identical.
  3. Please do not incorporate the divided_val_seen or divided_val_unseen splits into your training set and retrain after hyperparameter tuning if using this protocol, as the divided_test_unseen split will then no longer be unseen.
  4. We have observed that the ET model can show some variance when being retrained on ALFRED or TEACh even when changing only the random seeds, and as such we expect some performance differences between the full TEACh validation splits, TEACh test splits and divided splits.
  5. Alexa Prize SimBot Challenge participants: please refer to the challenge rules regarding publications.
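The following is a heavily hedged sketch of the local reorganization described above. It assumes each divided split is distributed as a list of game IDs, one per line, under src/teach/meta_data_files/divided_split (the file naming and format are assumptions; check the actual contents of that directory first), and it only handles EDH instance files, so repeat the idea for games and image folders.

import os
import shutil

DATA_DIR = "/tmp/teach-dataset"
DIVIDED_DIR = "src/teach/meta_data_files/divided_split"

def reorganize_edh_instances(divided_split, source_split):
    # Assumed format: one game ID per line in <divided_split>.txt (verify before use).
    with open(os.path.join(DIVIDED_DIR, divided_split + ".txt")) as f:
        game_ids = {line.strip() for line in f if line.strip()}
    src_dir = os.path.join(DATA_DIR, "edh_instances", source_split)
    dst_dir = os.path.join(DATA_DIR, "edh_instances", divided_split)
    os.makedirs(dst_dir, exist_ok=True)
    for fn in os.listdir(src_dir):
        # EDH instance files are named <game_id>.edhN.json.
        if fn.split(".")[0] in game_ids:
            shutil.copy(os.path.join(src_dir, fn), dst_dir)

# e.g. reorganize_edh_instances("divided_val_seen", "valid_seen")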

Security

See CONTRIBUTING for more information.

License

The code is licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE).


teach's Issues

How are EDH instances segmented?

Hello, what are the criteria for segmenting EDH instances from sessions?

Is it # of interactions or something else?

Thanks!

-Luke

Tfd Files

Hi @aishwaryap ,

I'm looking at the tfd.json files for each game. How come there are a different number of driver_actions_future and interactions? How do they correspond with one another? From the timestamps it looks like driver_actions_future just prunes some of the interactions.

About Model Input and Output

May I ask what the inputs and outputs of this model are? In my understanding, vision-and-language navigation tasks should only take language and visual information as input. Then, will the agent wait for you to navigate to the target point? Why is there no navigation section for the agent in the code? Also, may I ask why my success rate after running inference is zero?

When I try to start an X-server, I run into some problems

When I run sudo python ./bin/startx.py, I get the following:

Starting X on DISPLAY=:0
Traceback (most recent call last):
File "./bin/startx.py", line 108, in
main()
File "./bin/startx.py", line 104, in main
startx(display)
File "./bin/startx.py", line 78, in startx
for r in pci_records():
File "./bin/startx.py", line 19, in pci_records
output = subprocess.check_output(command).decode()
File "/root/miniconda3/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/root/miniconda3/lib/python3.8/subprocess.py", line 493, in run
with Popen(*popenargs, **kwargs) as process:
File "/root/miniconda3/lib/python3.8/subprocess.py", line 858, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/root/miniconda3/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'lspci'

So I tried installing pciutils, and then it reports:

X.Org X Server 1.20.13
X Protocol Version 11, Revision 0
Build Operating System: linux Ubuntu
Current Operating System: Linux autodl-container-b5c14e88ea-6759e3ee 5.4.0-162-generic #179-Ubuntu SMP Mon Aug 14 08:51:31 UTC 2023 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-5.4.0-162-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro cgroup_enable=memory swapaccount=1
Build Date: 15 January 2024 03:45:41PM
xorg-server 2:1.20.13-1ubuntu1~20.04.14 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.38.4
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Wed Jan 24 16:37:25 2024
(++) Using config file: "/tmp/tmpvxc79oan"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(EE)
Fatal server error:
(EE) parse_vt_settings: Cannot open /dev/tty0 (No such file or directory)
(EE)
(EE)
Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE)
(EE) Server terminated with error (1). Closing log file.

I have tried many methods but cannot solve this problem, and when I try other baseline models based on the TEACh dataset I run into similar problems, so I am asking for your help. Thanks very much!

No valid positions to place object found when placing slices on Bowl or Plate

Hello,

We have found a problem with interaction when attempting to place many objects on the same plate or bowl. We have found that when one or more object (e.g., slices) are already on a bowl or plate, attempting to place another slice on the plate often causes action failure with error message "No valid positions to place object found":

<ai2thor.server.Event at 0x7fd33070e4f0
    .metadata["lastAction"] = PutObject
    .metadata["lastActionSuccess"] = False
    .metadata["errorMessage"] = "No valid positions to place object found
    .metadata["actionReturn"] = None


We have also found that this error persists not only when specifying an x,y position (see the attached image) with the teach controller, but also when directly calling the simulator via:

self.controller.step(
    action="PutObject",
    objectId='Bowl|-01.86|+00.92|+01.80',
    forceAction=True,
)

The above example is from TfD instance: 9f0fa54e2587998b_97c6

This error significantly limits success on many of the tasks, as they require multiple objects to be placed on same receptacle for task success. Any help with this would be greatly appreciated! Thank you so much!

Issues about custom object properties

Hi,

I notice that the custom property simbotPickedUp of the picked object is not set to 0 after placing (see here). Is this a bug or intended behavior?

Besides, I find that some custom properties are weird in statediff files. For example, in images/train/2fbb46170662d917_06a8/statediff.464.9037940502167.json, the simbotLastParentReceptacle and simbotIsReceptacleOf of object Plate|-01.07|+01.65|-02.57 are both the object itself. How can that happen?

Thanks

User-friendly API for manipulating the data

Hi @aishwaryap,

Thanks for releasing your dataset. I was wondering whether you had a way to manipulate the low-level JSON data in a more user-friendly way. I can see from the codebase that there is a Dataset class that exposes a from_dict() method which is supposed to be used to create a Dataset object from a Dict. However, I'm currently having the following issue when doing so:

>>> with open("/tmp/teach/games/train/8cdf3d9a18cac7fe_6b02.game.json") as in_file:
            game = json.load(in_file)
>>> dataset = Dataset.from_dict(game)

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/asuglia/workspace/teach/src/teach/dataset/dataset.py", line 47, in from_dict
    tasks = [
  File "/Users/asuglia/workspace/teach/src/teach/dataset/dataset.py", line 48, in <listcomp>
    Task.from_dict(task_dict, definitions, process_init_state) for task_dict in dataset_dict.get("tasks")
  File "/Users/asuglia/workspace/teach/src/teach/dataset/task.py", line 35, in from_dict
    episodes = [
  File "/Users/asuglia/workspace/teach/src/teach/dataset/task.py", line 36, in <listcomp>
    Episode.from_dict(episode_dict, definitions, process_init_state)
  File "/Users/asuglia/workspace/teach/src/teach/dataset/episode.py", line 60, in from_dict
    initial_state=Initialization.from_dict(episode_dict["initial_state"])
  File "/Users/asuglia/workspace/teach/src/teach/dataset/initialization.py", line 47, in from_dict
    agents = [Pose_With_ID.from_dict(x) for x in initialization_dict["agents"]]
  File "/Users/asuglia/workspace/teach/src/teach/dataset/initialization.py", line 47, in <listcomp>
    agents = [Pose_With_ID.from_dict(x) for x in initialization_dict["agents"]]
  File "/Users/asuglia/workspace/teach/src/teach/dataset/pose.py", line 60, in from_dict
    return cls(identity=identity, pose=Pose.from_array(pose_with_id_dict["pose"]), is_object=is_object)
KeyError: 'pose'

In general, I would love to have a more object-oriented way of handling the data. I could write my own parser of the JSON data but I believe this logic must be somewhere already in your codebase. Potentially having an example script with an example showing how to explore the dataset might be useful to others as well. Any thoughts?

Support for Python 3.9?

README.md states that only Python versions >=3.7 and <=3.8 are supported, but setup.py only specifies >=3.7. Is Python 3.9 officially supported?

Possible bug in config file

Hi @aishwaryap

I was revisiting the code for the E.T. baseline and there seems to be a bug in the config file for training the model:

detach_lang_emb = False

I believe it should be detach_lang_emb = True since we do not want to propagate the gradients through the look-up table or the language encoder.

Please let me know your thoughts on this.

Thanks,
Divyam

UnboundLocalError when evaluating the Episodic Transformer baselines

Hello,

I have been trying to evaluate the Episodic Transformer baselines for the TEACh Benchmark Challenge. And I keep getting the following error message when I am running the evaluation script provided inside the ET directory. I have also tried running the evaluation via "teach_inference". The error is the same.

Traceback (most recent call last):
  File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/home/ubuntu/workplace/teach/src/teach/inference/inference_runner.py", line 221, in _run_edh_instance
    traj_steps_taken,
UnboundLocalError: local variable 'traj_steps_taken' referenced before assignment

I am doing the inference on an AWS instance. I have started the X-server and installed all requirements and prerequisites without bug.

Here's the script I used for evaluation.

#!/bin/sh

export AWS_ROOT=/home/ubuntu/workplace
export ET_DATA=$AWS_ROOT/data
export TEACH_ROOT_DIR=$AWS_ROOT/teach
export TEACH_SRC_DIR=$TEACH_ROOT_DIR/src
export ET_ROOT=$TEACH_SRC_DIR/teach/modeling/ET
export ET_LOGS=$TEACH_ROOT_DIR/src/teach/modeling/ET/checkpoints
export INFERENCE_OUTPUT_PATH=$TEACH_ROOT_DIR/inference_output
export PYTHONPATH=$TEACH_SRC_DIR:$ET_ROOT:$PYTHONPATH
export SPLIT=valid_seen

cd $TEACH_ROOT_DIR
python src/teach/cli/inference.py \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --data_dir $ET_DATA \
    --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_et_trial_$SPLIT \
    --split $SPLIT \
    --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_et_trial_$SPLIT.json \
    --seed 4 \
    --model_dir $ET_DATA/baseline_models/et \
    --object_predictor $ET_LOGS/pretrained/maskrcnn_model.pth \
    --visual_checkpoint $ET_LOGS/pretrained/fasterrcnn_model.pth \
    --device "cpu" \
    --images_dir $INFERENCE_OUTPUT_PATH/images

Could you help me with this? Thanks!

No module named "alfred"

Hi,

While running the teach_inference on the sample_model using the code below, the inference and the evaluation work fine.

teach_inference \
  --data_dir /Users/sakthi/Desktop/teach-main/data4/ \
  --output_dir /Users/sakthi/Desktop/teach-main/data4/outputs \
  --split valid_seen \
  --metrics_file /Users/sakthi/Desktop/teach-main/data4/outputs/metrics \
  --model_module teach.inference.sample_model \
  --model_class SampleModel \
  --images_dir /Users/sakthi/Desktop/teach-main/data4/images/

But, while trying to run it with the ET models using the code below, it throws the error "No module named 'alfred'".

teach_inference \
  --data_dir /Users/sakthi/Desktop/teach-main/data4/ \
  --output_dir /Users/sakthi/Desktop/teach-main/data4/outputs \
  --split valid_seen \
  --metrics_file /Users/sakthi/Desktop/teach-main/data4/outputs/metrics \
  --model_module teach.inference.et_model \
  --model_class ETModel \
  --images_dir /Users/sakthi/Desktop/teach-main/data4/images/ \
  --model_dir /Users/sakthi/Desktop/teach-main/data4/baseline_models/et \
  --object_predictor /Users/sakthi/Desktop/teach-main/data4/et_pretrained_models/maskrcnn_model.pth \
  --visual_checkpoint /Users/sakthi/Desktop/teach-main/data4/et_pretrained_models/fasterrcnn_model.pth

Error:

Traceback (most recent call last):
  File "/Users/sakthi/opt/anaconda3/envs/teach_edh/bin/teach_inference", line 8, in <module>
    sys.exit(main())
  File "/Users/sakthi/Desktop/teach-main/src/teach/cli/inference.py", line 150, in main
    model_class=dynamically_load_class(args.model_module, args.model_class),
  File "/Users/sakthi/Desktop/teach-main/src/teach/utils.py", line 390, in dynamically_load_class
    module = __import__(package_path, fromlist=[class_name])
  File "/Users/sakthi/Desktop/teach-main/src/teach/inference/et_model.py", line 11, in <module>
    from alfred import constants
ModuleNotFoundError: No module named 'alfred'

Could you please let me know how to solve it? Or add the required alfred package to the requirements?

Thanks!

How to setup docker to use multiple GPUs for inference

Hi,

I am trying to set up the evaluation environment on a multi-GPU AWS instance following the instructions in the TEACh Benchmark Challenge. However, I encounter two problems:
(1) The model can use only 1 GPU even if I have set the value of API_GPUS to multiple GPUs.
(2) When I start the inference runner, although it is able to launch multiple ai2thor instances by specifying --num_processes X, the processes are all on one GPU instead of on X GPUs. Also, I have to manually specify --model_api_host_and_port to include multiple API ports (e.g. "@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT" for --num_processes 3), which seems weird.

Besides, I notice that in this line it mentions that the model container will have access to only one GPU, while this line says that the model can use all GPUs of a p3.16xlarge instance. I wonder which would be the case, and if multiple GPUs are allowed, how to correctly setup the docker container.

Thanks!

Issue in rendering RGB image

Hi,

Thanks for this awesome work!

I am trying to replay the episodes. But I observe that the following function does not render an RGB image as expected and instead renders the RGB frame stitched with a segmentation mask: https://github.com/alexa/teach/blob/main/src/teach/simulators/simulator_THOR.py#L2114. I am attaching a sample output image.

Here is the exact command I am running:

teach_replay \
--game_fn teach_dataset/all_game_files/0396d773f627df48_0db5.game.json \
--write_frames_dir teach_videos \
--write_frames \
--num_processes 1 \
--status_out_fn teach_status.json \
--create_video  

I also tried the following minimalistic script and it works well:

from ai2thor.controller import Controller
import matplotlib.pyplot as plt
controller = Controller()
event = controller.step("MoveAhead")
rgb = event.frame
plt.figure()
plt.imshow(rgb)
plt.savefig('test.png')

Please let me know if I am doing something wrong, or if I should raise this issue with AI2Thor instead. Thanks!

Questions about TfD inference

Hi, I wonder whether the TfD inference will be released, and how can I obtain the goal state changes in the same way as in the paper? Thanks!

Trajectories that raise an error are ignored

Hi @aishwaryap,

I was reading about the recent changes to the code reported in #10 and we unfortunately get results that differ substantially from yours. I started dissecting the code to understand what's the reason for such discrepancies in the results. From my understanding of the inference_runner.py script, you spawn several processes, each with a given portion of the tasks. However, I can see that the exception handling logic simply ignores an instance that raises an error: https://github.com/emma-simbot/teach/blob/speaker_tokens/src/teach/inference/inference_runner.py#L130

This is detrimental because if a dataset instance errors for whatever reason, its contribution to the overall metrics is ignored. Instead, the proper way of dealing with this should be to ignore that trajectory and still add to the metrics that you were not successful. Potentially, such faulty trajectories should be reported in the metrics file for future debugging.

Am I missing something?

Implementations of the rule-based TATC baseline

Hi,

Thanks for releasing such a high-quality and large-scale dataset. For the Two-Agent Task Completion (TATC) benchmark, the paper proposes a rule-based approach for both agents as the baseline. Do you have the plan to release the implementations of such rule-based agents? It would be very helpful to work on this benchmark with such baseline implementations.

Thanks!

Evaluating Episodic Transformer baselines for EDH instances gives zero successes

Hi!

I have been trying to replicate the results of the Episodic Transformer (ET) baselines for the EDH benchmark. The inference script runs without any errors but the ET baselines provided along with this repository give zero successes on all the validation EDH splits (both ['valid-seen', 'valid-unseen']).

This behavior can be replicated using the instructions in the ET root directory (found here), specifically the following script:

CUDA_VISIBLE_DEVICES=4,5,6,7 python3 src/teach/cli/inference.py \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --data_dir $ET_DATA \
    --images_dir $IMAGES_DIR \
    --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_et_trial \
    --split valid_seen \
    --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_et_trial.json \
    --seed 4 \
    --model_dir $ET_DATA/baseline_models/et \
    --num_processes 50 \
    --object_predictor $ET_LOGS/pretrained/maskrcnn_model.pth \
    --visual_checkpoint $ET_LOGS/pretrained/fasterrcnn_model.pth \
    --device "cuda"

I also tried training the basic ET baseline from scratch. Running the evaluation script on this model also leads to zero successes.

Much higher scores when evaluating Episodic Transformer baselines for EDH instances

Hello,

I have finished the evaluation of the Episodic Transformer baselines for the TEACh Benchmark Challenge on the valid_seen.

However, one weird thing I found is that our reproduced result is much higher than what is reported in the paper. The result is shown below (all values are percentages). There is a total of 608 EDH instances (valid_seen) in the metrics file, which matches the number in the paper.

                         SR [TLW]       GC [TLW]
Reproduced               13.8 [3.2]     14 [8.7]
Reported in the paper    5.76 [0.90]    7.99 [1.65]

I believe I am using the correct checkpoints. And the only change I made to the code is mentioned in #9.

I am running on an AWS instance. I have started the X-server and installed all requirements and prerequisites without bugs. And the inference process is bugfree.

Here is the script I used for evaluation.

#!/bin/sh

export AWS_ROOT=/home/ubuntu/workplace
export ET_DATA=$AWS_ROOT/data
export TEACH_ROOT_DIR=$AWS_ROOT/teach
export TEACH_SRC_DIR=$TEACH_ROOT_DIR/src
export ET_ROOT=$TEACH_SRC_DIR/teach/modeling/ET
export ET_LOGS=$TEACH_ROOT_DIR/src/teach/modeling/ET/checkpoints
export INFERENCE_OUTPUT_PATH=$TEACH_ROOT_DIR/inference_output
export PYTHONPATH=$TEACH_SRC_DIR:$ET_ROOT:$PYTHONPATH
export SPLIT=valid_seen

cd $TEACH_ROOT_DIR
python src/teach/cli/inference.py \
    --model_module teach.inference.et_model \
    --model_class ETModel \
    --data_dir $ET_DATA \
    --output_dir $INFERENCE_OUTPUT_PATH/inference__teach_et_trial_$SPLIT \
    --split $SPLIT \
    --metrics_file $INFERENCE_OUTPUT_PATH/metrics__teach_et_trial_$SPLIT.json \
    --seed 4 \
    --model_dir $ET_DATA/baseline_models/et \
    --object_predictor $ET_LOGS/pretrained/maskrcnn_model.pth \
    --visual_checkpoint $ET_LOGS/pretrained/fasterrcnn_model.pth \
    --device "cpu" \
    --images_dir $INFERENCE_OUTPUT_PATH/images

I wonder if the data split provided in the dataset is the same as the paper. And if so, what would be the possible explanation for this?

Please let me know if someone else is getting similar results. Thank you!

Install teach within a headless docker container

Hello everyone,

thanks for contributing the benchmark! Is there any chance to run this from within a Docker container (as our infrastructure requires us to do)?

We run into trouble when starting the Python script, even from within a --privileged container:

sudo python ./bin/startx.py

Are there guidelines or configurations that have been tested (Docker versions, images, OS) to set this up?

Thanks!

About the evaluation time spent

Could you please let me know how much time was spent on the evaluation? It took me about two days to evaluate with 4 processes, and I found that a large part of the time was spent in the state initialization of EDH instances, as well as in reaching max_api_fails and max_traj_steps. The time for the agent to take a step is also very long and depends heavily on the CPU frequency. Can you tell me the settings of your experimental equipment? And is there any other way to evaluate the trained model?

Questions about the evaluation rules for the Alexa Simbot Challenge

I have three questions regarding the evaluation rules for the Alexa Simbot Challenge:

  1. Can we use "dialog_history_cleaned" rather than "dialog_history" in the edh instance?
  2. Using only the action history and dialogue history from the driver results in missing a key piece of information - the time of each user utterance made during the interaction. We argue that such causal information should be allowed.
  3. Could you elaborate on the "should not use task definitions" rule? For example, are we allowed to integrate the task structures provided in the task definitions into our model, while do not rely on any ground truth task information during inference as the model has to figure out the task and arguments by itself from the dialog input?

Thanks!

Error while running teach_eval - float division by zero

Hi,

I am trying to run the teach_eval command to evaluate the performance of the model for EDH by running the following command.

teach_eval --data_dir /scratch/smahali6/teach_edh/teach/data/ --inference_output_dir /scratch/smahali6/teach_edh/teach/data/outputs/ --split valid_seen --metrics_file /scratch/smahali6/teach_edh/teach/data/outputs/metrics/

Initially, I got the error below:

[MainThread-71052-INFO] teach.cli.eval: Evaluating split valid_seen requiring 608 files
INFO:teach.cli.eval:Evaluating split valid_seen requiring 608 files
Traceback (most recent call last):
  File "/home/smahali6/.conda/envs/teach_edh/bin/teach_eval", line 8, in <module>
    sys.exit(main())
  File "/scratch/smahali6/teach_edh/teach/src/teach/cli/eval.py", line 79, in main
    logger.info("Evaluating split %s requiring %d files" % (args.split, len(edh_instance_files)))
NameError: name 'edh_instance_files' is not defined

I solved it by using the line below:
edh_instance_files = set(os.listdir(os.path.join(args.data_dir, input_subdir, args.split)))

Now, I get a ZeroDivisonError mentioned below:

[MainThread-91566-INFO] teach.cli.eval: Evaluating split valid_seen requiring 608 files
INFO:teach.cli.eval:Evaluating split valid_seen requiring 608 files
[MainThread-91566-INFO] teach.cli.eval: Evaluating split valid_seen requiring 608 files
INFO:teach.cli.eval:Evaluating split valid_seen requiring 608 files
[MainThread-91566-INFO] teach.cli.eval: Found output files for 0 instances; treating remaining 608 as failed...
INFO:teach.cli.eval:Found output files for 0 instances; treating remaining 608 as failed...
Traceback (most recent call last):
  File "/home/smahali6/.conda/envs/teach_edh/bin/teach_eval", line 8, in <module>
    sys.exit(main())
  File "/scratch/smahali6/teach_edh/teach/src/teach/cli/eval.py", line 114, in main
    results = aggregate_metrics(traj_stats, args)
  File "/scratch/smahali6/teach_edh/teach/src/teach/eval/compute_metrics.py", line 87, in aggregate_metrics
    sr = float(num_successes) / num_evals
ZeroDivisionError: float division by zero

Does the pipeline require any files under the output directory, given the message Found output files for 0 instances; treating remaining 608 as failed...? How can I fix this and evaluate the model?

The directory tree for the project directory can be found in this link.
https://drive.google.com/file/d/16DrPFl-dcxbPgtbIz0ryeFZXxDRUOcIO/view?usp=sharing

Thanks!

Possible bugs in `get_state_changes`

Hi @aishwaryap,

Thank you for releasing the dataset. It seems that there is a bug in the get_state_changes function:

agent_final = init_state["agents"][idx]

I believe it should be agent_final = final_state["agents"][idx] instead. As a result, the state differences of the agents are empty in all teach-dataset/images/$SPLIT/$REPLAYED_CODE/statediff.*.json files.

Thanks,
Jiachen

Could I get Top-down view for dataset?

Thank you for your hard work in making this awesome benchmark!
As the title says, I want to get a top-down view from the commander and driver, but this does not seem to be supported in the dataset.
Would it be possible to get and use a top-down view for each episode in this benchmark?

Regards.

Connection refused in the first EDH instance during inference with RemoteInferenceRunner

Hi!

We encountered a bug when trying to evaluate the Docker image with RemoteInferenceRunner locally. Every time we start the evaluation there is a connection error in the very first instance; things then return to normal for the remaining instances.

Here is part of the console output:

DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): teachmodelapi:5000
ERROR:teach.inference.inference_runner:Failed to start_new_edh_instance for 38e166ac4f59b7a7_cf45.edh0
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fb92b9065e0>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='teachmodelapi', port=5000): Max retries exceeded with url: /start_new_edh_instance (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb92b9065e0>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/teach/inference/inference_runner.py", line 175, in _run_edh_instance
    model.start_new_edh_instance(edh_instance, edh_history_images, instance_file)
  File "/src/teach/inference/remote_model.py", line 94, in start_new_edh_instance
    resp = requests.post(self.start_edh_url, data=data, files=images)
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 117, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='teachmodelapi', port=5000): Max retries exceeded with url: /start_new_edh_instance (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb92b9065e0>: Failed to establish a new connection: [Errno 111] Connection refused'))
[MainThread-135-ERROR] teach.inference.inference_runner: Failed to start_new_edh_instance for 38e166ac4f59b7a7_cf45.edh0
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fb92b9065e0>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='teachmodelapi', port=5000): Max retries exceeded with url: /start_new_edh_instance (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb92b9065e0>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/teach/inference/inference_runner.py", line 175, in _run_edh_instance
    model.start_new_edh_instance(edh_instance, edh_history_images, instance_file)
  File "/src/teach/inference/remote_model.py", line 94, in start_new_edh_instance
    resp = requests.post(self.start_edh_url, data=data, files=images)
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 117, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='teachmodelapi', port=5000): Max retries exceeded with url: /start_new_edh_instance (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb92b9065e0>: Failed to establish a new connection: [Errno 111] Connection refused'))
ERROR:teach.inference.inference_runner:exception happened for instance=/data/edh_instances/valid_seen/38e166ac4f59b7a7_cf45.edh0.json, continue with the rest
Traceback (most recent call last):
  File "/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/src/teach/inference/inference_runner.py", line 228, in _run_edh_instance
    json.dump(pred_actions, handle)
UnboundLocalError: local variable 'pred_actions' referenced before assignment
[MainThread-135-ERROR] teach.inference.inference_runner: exception happened for instance=/data/edh_instances/valid_seen/38e166ac4f59b7a7_cf45.edh0.json, continue with the rest
Traceback (most recent call last):
  File "/src/teach/inference/inference_runner.py", line 121, in _run
    instance_id, instance_metrics = InferenceRunner._run_edh_instance(instance_file, config, model, er)
  File "/src/teach/inference/inference_runner.py", line 228, in _run_edh_instance
    json.dump(pred_actions, handle)
UnboundLocalError: local variable 'pred_actions' referenced before assignment

Things got normal starting from the second instance:

[MainThread-135-INFO] teach.inference.inference_runner: Elapsed time for episode replay: 19.24296934180893
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): teachmodelapi:5000
DEBUG:urllib3.connectionpool:http://teachmodelapi:5000 "POST /start_new_edh_instance HTTP/1.1" 200 7
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): teachmodelapi:5000
DEBUG:urllib3.connectionpool:http://teachmodelapi:5000 "POST /get_next_action HTTP/1.1" 200 50
[MainThread-135-DEBUG] teach.simulators.simulator_THOR: Driver: Turn Right
[MainThread-135-DEBUG] teach.simulators.simulator_THOR: Driver: Turn Right
DEBUG:teach.simulators.simulator_THOR:Driver: Turn Right
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): teachmodelapi:5000
DEBUG:urllib3.connectionpool:http://teachmodelapi:5000 "POST /get_next_action HTTP/1.1" 200 47
[MainThread-135-DEBUG] teach.simulators.simulator_THOR: Driver: Forward
[MainThread-135-DEBUG] teach.simulators.simulator_THOR: Driver: Forward
DEBUG:teach.simulators.simulator_THOR:Driver: Forward

Here is the script we used to create container and run the inference:

export HOST_DATA_DIR=/home/ubuntu/workplace/data
export HOST_IMAGES_DIR=/home/ubuntu/workplace/teach/doc_images
export HOST_OUTPUT_DIR=/home/ubuntu/workplace/teach/doc_output
export API_PORT=5000
export SUBMISSION_PK=168888
export INFERENCE_GPUS='"device=0"'
export API_GPUS='"device=0"'
export SPLIT=valid_seen
export DOCKER_NETWORK=no-internet


docker run -d --rm \
    --gpus $API_GPUS \
    --name TeachModelAPI \
    --network $DOCKER_NETWORK \
    -e SPLIT=$SPLIT \
    -v $HOST_DATA_DIR:/data:ro \
    -v $HOST_IMAGES_DIR/$SUBMISSION_PK:/images:ro \
    -t teach-model-api-etmodel \

docker run --rm \
    --privileged \
    -e DISPLAY=:0 \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    --name RemoteInferenceRunner \
    --network $DOCKER_NETWORK \
    --gpus $INFERENCE_GPUS \
    -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
    -v $HOST_DATA_DIR:/data:ro \
    -v $HOST_IMAGES_DIR/$SUBMISSION_PK:/images \
    -v $HOST_OUTPUT_DIR/$SUBMISSION_PK:/output \
    remote-inference-runner teach_inference \
        --data_dir /data \
        --output_dir /output \
        --images_dir /images \
        --split $SPLIT \
        --metrics_file /output/metrics_file \
        --model_module teach.inference.remote_model \
        --model_class RemoteModel \
        --model_api_host_and_port "@TeachModelAPI:$API_PORT"

Can you help us with this? Thanks in advance!

The same action prediction gets different evaluation metrics in different runs

Hi,

I ran the baseline ET model and found that two different runs produce significantly different evaluation metrics (this might be related to issue #10).
Run1:

SR: 77/608 = 0.127
GC: 487/3526 = 0.138
PLW SR: 0.026
PLW GC: 0.093

Run2:

SR: 52/608 = 0.086
GC: 321/3526 = 0.091
PLW SR: 0.007
PLW GC: 0.034

After taking a closer look at the output, I found that in some episodes the same set of predicted actions results in different evaluation metrics across runs. For example, for instance 66957a984ae5a714_f28d.edh4, the inference output for the first run is:

"66957a984ae5a714_f28d.edh4": {
        "instance_id": "66957a984ae5a714_f28d.edh4",
        "game_id": "66957a984ae5a714_f28d",
        "completed_goal_conditions": 2,
        "total_goal_conditions": 2,
        "goal_condition_success": 1,
        "success_spl": 0.55,
        "path_len_weighted_success_spl": 12.100000000000001,
        "goal_condition_spl": 0.55,
        "path_len_weighted_goal_condition_spl": 12.100000000000001,
        "gt_path_len": 22,
        "reward": 0,
        "success": 1,
        "traj_len": 40,
        "predicted_stop": 0,
        "num_api_fails": 30,
        "error": 0,
        "init_success": true,
        "pred_actions": [
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ]
        ]
    }

While for the second run it is:

"66957a984ae5a714_f28d.edh4": {
        "instance_id": "66957a984ae5a714_f28d.edh4",
        "game_id": "66957a984ae5a714_f28d",
        "completed_goal_conditions": 0,
        "total_goal_conditions": 2,
        "goal_condition_success": 0.0,
        "success_spl": 0.0,
        "path_len_weighted_success_spl": 0.0,
        "goal_condition_spl": 0.0,
        "path_len_weighted_goal_condition_spl": 0.0,
        "gt_path_len": 22,
        "reward": 0.0,
        "success": 0,
        "traj_len": 40,
        "predicted_stop": 0,
        "num_api_fails": 30,
        "error": 0,
        "init_success": true,
        "pred_actions": [
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ],
            [
                "Forward",
                null
            ]
        ]
    }

So the first evaluation result does not make sense, since the model should have had no chance of succeeding without performing any manipulation actions.
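
For reference, the per-instance numbers above are internally consistent with an SPL-style weighting; this is an assumption inferred from the field values rather than from the evaluation code. With gt_path_len = 22 and traj_len = 40, a successful episode scores 22 / 40 = 0.55 and its path-length-weighted value is 0.55 * 22 = 12.1, while a failed episode scores 0 for both, so the two runs differ only in whether the goal conditions were judged complete, not in the trajectory itself. A small sketch of that arithmetic:

def spl_fields(success, gt_path_len, traj_len):
    # Assumed SPL-style definitions inferred from the JSON above; the actual
    # teach evaluation code may differ in its details. Aggregate PLW metrics
    # are presumably obtained by summing these weighted values over instances
    # and dividing by the summed gt_path_len.
    spl = success * gt_path_len / max(gt_path_len, traj_len)
    return {"success_spl": spl, "path_len_weighted_success_spl": spl * gt_path_len}

print(spl_fields(success=1, gt_path_len=22, traj_len=40))
# {'success_spl': 0.55, 'path_len_weighted_success_spl': 12.100000000000001}
print(spl_fields(success=0, gt_path_len=22, traj_len=40))
# {'success_spl': 0.0, 'path_len_weighted_success_spl': 0.0}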

The first run was done on an AWS EC2 p3.8xlarge instance and the second on a p3.16xlarge; all other settings were the same. The full evaluation logs are available here: [run 1] [run 2]

Do you have any idea what might be causing this? Thanks!

Different vocab size between `data.vocab` and `embs_ann`?

Hi, I noticed that the data.vocab file stored with the baseline model has a different vocabulary length than the language embedding stored in the pretrained model.

For the baseline model "et_plus_h", the data.vocab file has Vocab(2554) for words, but if I load the pretrained checkpoint from baseline_models/et_plus_h/latest.pth, the embedding layer model.embs_ann.lmdb_simbot_edh_vocab_none.weight has torch.Size([2788, 768]).

Did I miss something?
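
One way to compare the two sizes directly is to load both files and print their shapes. The snippet below is a hypothetical inspection script: it assumes the ET-style checkpoint layout (a "model" state dict inside the saved file) and that data.vocab is a torch-pickled dict of Vocab objects, so the keys and paths may need adjusting for your checkout:

import torch

# Hypothetical inspection script; key names and paths follow the post above
# and the usual ET baseline layout, and may differ in practice.
ckpt = torch.load("baseline_models/et_plus_h/latest.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # some checkpoints store the state dict at the top level
emb = state_dict["embs_ann.lmdb_simbot_edh_vocab_none.weight"]
print("embedding rows:", emb.shape[0])  # reported as 2788 in the post

vocab = torch.load("baseline_models/et_plus_h/data.vocab")  # assumed location of data.vocab
print("word vocab size:", len(vocab["word"]))  # reported as 2554 in the post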
