
sound-spaces's Introduction

SoundSpaces is a realistic acoustic simulation platform for audio-visual embodied AI research. From audio-visual navigation and exploration to echolocation and audio-visual floor plan reconstruction, this platform expands embodied vision research to a broader scope of topics.


Click on the gif to view the video. Listen with headphones to hear the spatial sound properly!

Motivation

Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf---restricted to solely their visual perception of the environment. We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments. We further build SoundSpaces: a first-of-its-kind dataset of audio renderings based on geometrical acoustic simulations for two sets of publicly available 3D environments (Matterport3D and Replica), and we instrument Habitat to support the new sensor, making it possible to insert arbitrary sound sources in an array of real-world scanned environments.

Citing SoundSpaces

If you use the SoundSpaces platform in your research, please cite the following paper:

@inproceedings{chen22soundspaces2,
  title     =     {SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning},
  author    =     {Changan Chen and Carl Schissler and Sanchit Garg and Philip Kobernik and Alexander Clegg and Paul Calamia and Dhruv Batra and Philip W Robinson and Kristen Grauman},
  booktitle =     {NeurIPS 2022 Datasets and Benchmarks Track},
  year      =     {2022}
}
@inproceedings{chen20soundspaces,
  title     =     {SoundSpaces: Audio-Visual Navigation in 3D Environments},
  author    =     {Changan Chen and Unnat Jain and Carl Schissler and Sebastia Vicenc Amengual Gari and Ziad Al-Halah and Vamsi Krishna Ithapu and Philip Robinson and Kristen Grauman},
  booktitle =     {ECCV},
  year      =     {2020}
}

If you use any of the 3D scene assets (Matterport3D, Replica, HM3D, Gibson, etc.), please make sure you cite these papers as well!

Installation

Follow the step-by-step installation guide to install the repo.

Usage

This repo renders audio-visual observations with high acoustic and spatial correspondence. It supports various visual-acoustic learning tasks, including audio-visual embodied navigation, acoustics prediction from egocentric observations, etc. In this repo, we provide code for training and evaluating audio-visual navigation agents. For other downstream tasks, please check out each paper's respective repo, e.g., visual acoustic matching and audio-visual dereverberation.

Below we show some example commands for training and evaluating AudioGoal with depth sensor on Replica.

  1. Training
python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth
  2. Validation (evaluate each checkpoint and generate a validation curve)
python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/val_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth
  3. Test the best validation checkpoint based on the validation curve
python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.XXX.pth
  4. Generate a demo video with audio
python ss_baselines/av_nav/run.py --run-type eval --exp-config ss_baselines/av_nav/config/audionav/replica/test_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth EVAL_CKPT_PATH_DIR data/models/replica/audiogoal_depth/data/ckpt.220.pth VIDEO_OPTION [\"disk\"] TASK_CONFIG.SIMULATOR.USE_RENDERED_OBSERVATIONS False TASK_CONFIG.TASK.SENSORS [\"POINTGOAL_WITH_GPS_COMPASS_SENSOR\",\"SPECTROGRAM_SENSOR\",\"AUDIOGOAL_SENSOR\"] SENSORS [\"RGB_SENSOR\",\"DEPTH_SENSOR\"] EXTRA_RGB True TASK_CONFIG.SIMULATOR.CONTINUOUS_VIEW_CHANGE True DISPLAY_RESOLUTION 512 TEST_EPISODE_COUNT 1
  5. Interactive demo
python scripts/interactive_demo.py
  6. [New] Training a continuous navigation agent
python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/mp3d/train_telephone/audiogoal_depth_ddppo.yaml --model-dir data/models/ss2/mp3d/dav_nav CONTINUOUS True
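The SPECTROGRAM_SENSOR-style input an AudioGoal agent consumes can be pictured as a log-magnitude spectrogram per ear. Below is a minimal numpy sketch of that kind of feature; the FFT size, hop size, and input length are illustrative assumptions, not the repo's actual configuration:

```python
import numpy as np

def binaural_spectrogram(waveform, n_fft=512, hop=160):
    """Log-magnitude spectrogram per channel of a (2, num_samples) waveform.
    The FFT/hop sizes here are illustrative assumptions, not SoundSpaces' defaults."""
    specs = []
    for channel in waveform:
        # Slice the waveform into overlapping frames, then FFT each frame.
        frames = np.stack([channel[i:i + n_fft]
                           for i in range(0, len(channel) - n_fft + 1, hop)])
        specs.append(np.log1p(np.abs(np.fft.rfft(frames, axis=1))))
    return np.stack(specs)  # (2, num_frames, n_fft // 2 + 1)

spec = binaural_spectrogram(np.random.randn(2, 16000))
print(spec.shape)  # (2, 97, 257)
```

The agent's audio CNN would then treat the two ears as input channels of this 3-D tensor.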

SoundSpaces 1.0

We provide acoustically realistic audio renderings for the Replica and Matterport3D datasets. The renderings take the form of pre-rendered room impulse responses (RIRs), which users can convolve with any source sounds they wish during training. See dataset for more details.
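Rendering audio from a pre-rendered RIR therefore amounts to a convolution of the source waveform with each ear's impulse response. A minimal scipy sketch, using random stand-in arrays rather than actual dataset files:

```python
import numpy as np
from scipy.signal import fftconvolve

# Stand-ins for a loaded binaural RIR (2 ears x rir_len samples)
# and a 1 s mono source sound at 44.1 kHz; real data would be loaded from disk.
rir = np.random.randn(2, 4410) * np.exp(-np.linspace(0, 8, 4410))
source = np.random.randn(44100)

# Convolving the source with each ear's impulse response yields binaural audio.
binaural = np.stack([fftconvolve(source, rir[ch]) for ch in range(2)])
print(binaural.shape)  # (2, 48509) = (2, 44100 + 4410 - 1)
```

Because the RIRs are fixed per (source, receiver) pair, the same RIR can be reused with arbitrary source sounds at training time.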
Note that we do not open source the rendering code at this time.

SoundSpaces 2.0

SoundSpaces 2.0 is a fast, continuous, configurable, and generalizable audio-visual simulation platform that allows users to render sounds for arbitrary spaces and environments. As a result of rendering accuracy improvements, the rendered IRs differ from those of SoundSpaces 1.0. Check out the Jupyter notebook for a quick tutorial. The documentation of the APIs can be found here.

Contributing

See the CONTRIBUTING file for how to help out.

License

SoundSpaces is CC-BY-4.0 licensed, as found in the LICENSE file.

The trained models and the task datasets are considered data derived from the corresponding scene datasets.


sound-spaces's Issues

The validation did not stop, but suspended

The printed output shows that the validation run has evaluated the last checkpoint, ckpt.799.pth, but the main program then hangs there without terminating.

Wrong function call

In https://github.com/facebookresearch/sound-spaces/blob/master/soundspaces/simulator.py, line 535:

def _update_observations_with_audio(self, observations):
    audio = self.get_current_audio_observation()
    observations.update({"audio": audio})

I think it should be:

def _update_observations_with_audio(self, observations):
    audio = self.get_current_audiogoal_observation()
    observations.update({"audio": audio})

In addition, I wonder why audiogoal/spectrogram is not included in the agent's sensor list.
Thanks

About validation: not all of the checkpoints have been evaluated

I run the following command:

python av_nav/run.py --run-type eval --exp-config av_nav/config/replica/val_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth

my validation log is as follows:

2020-10-31 13:00:19,730 Average episode reward: -0.001720
2020-10-31 13:00:19,730 Average episode distance_to_goal: 6.458000
2020-10-31 13:00:19,730 Average episode normalized_distance_to_goal: 0.993270
2020-10-31 13:00:19,731 Average episode na: 3.872000
2020-10-31 13:00:19,731 Average episode sna: 0.000000
2020-10-31 13:00:19,731 Average episode success: 0.000000
2020-10-31 13:00:19,731 Average episode spl: 0.000000
2020-10-31 13:00:19,731 Average episode softspl: 0.029563
2020-10-31 13:00:21,749 =======current_ckpt: data/models/replica/audiogoal_depth/data/ckpt.1.pth=======
2020-10-31 13:00:21,920 Initializing dataset AudioNav
2020-10-31 13:02:35,308 Average episode reward: 1.280840
2020-10-31 13:02:35,309 Average episode distance_to_goal: 5.529000
2020-10-31 13:02:35,309 Average episode normalized_distance_to_goal: 0.791555
2020-10-31 13:02:35,309 Average episode na: 18.516000
2020-10-31 13:02:35,309 Average episode sna: 0.015994
2020-10-31 13:02:35,309 Average episode success: 0.050000
2020-10-31 13:02:35,309 Average episode spl: 0.034297
2020-10-31 13:02:35,309 Average episode softspl: 0.174971
2020-10-31 13:02:37,325 =======current_ckpt: data/models/replica/audiogoal_depth/data/ckpt.2.pth=======
2020-10-31 13:02:37,485 Initializing dataset AudioNav
2020-10-31 13:09:06,809 Average episode reward: 3.441059
2020-10-31 13:09:06,810 Average episode distance_to_goal: 4.371000
2020-10-31 13:09:06,810 Average episode normalized_distance_to_goal: 0.595446
2020-10-31 13:09:06,811 Average episode na: 50.294000
2020-10-31 13:09:06,811 Average episode sna: 0.038717
2020-10-31 13:09:06,811 Average episode success: 0.182000
2020-10-31 13:09:06,812 Average episode spl: 0.092030
2020-10-31 13:09:06,812 Average episode softspl: 0.242258
2020-10-31 13:09:08,848 =======current_ckpt: data/models/replica/audiogoal_depth/data/ckpt.3.pth=======
2020-10-31 13:09:09,059 Initializing dataset AudioNav
2020-10-31 13:19:53,576 Average episode reward: 6.741738
2020-10-31 13:19:53,577 Average episode distance_to_goal: 3.357000
2020-10-31 13:19:53,577 Average episode normalized_distance_to_goal: 0.444476
2020-10-31 13:19:53,577 Average episode na: 75.626000
2020-10-31 13:19:53,578 Average episode sna: 0.102722
2020-10-31 13:19:53,578 Average episode success: 0.436000
2020-10-31 13:19:53,578 Average episode spl: 0.252236
2020-10-31 13:19:53,579 Average episode softspl: 0.361664
2020-10-31 13:19:55,617 =======current_ckpt: data/models/replica/audiogoal_depth/data/ckpt.4.pth=======
2020-10-31 13:19:55,916 Initializing dataset AudioNav
2020-10-31 13:41:02,260 Average episode reward: 8.549353
2020-10-31 13:41:02,261 Average episode distance_to_goal: 2.026000
2020-10-31 13:41:02,261 Average episode normalized_distance_to_goal: 0.290701
2020-10-31 13:41:02,261 Average episode na: 159.964000
2020-10-31 13:41:02,261 Average episode sna: 0.102214
2020-10-31 13:41:02,262 Average episode success: 0.568000
2020-10-31 13:41:02,262 Average episode spl: 0.255077
2020-10-31 13:41:02,262 Average episode softspl: 0.339762
2020-10-31 13:41:04,296 =======current_ckpt: data/models/replica/audiogoal_depth/data/ckpt.5.pth=======
.
.

2020-11-01 09:40:54,197 =======current_ckpt: data/models/replica/audiogoal_depth/data/ckpt.64.pth=======
2020-11-01 09:40:54,414 Initializing dataset AudioNav

(In the raw log, each line was printed multiple times, with one extra copy per checkpoint evaluated; the duplicates are collapsed here.)
  1. The total number of checkpoints is 800 (0~799), but I only see 64 checkpoints evaluated in the validation log (ckpt.1.pth to ckpt.64.pth). What happened?
  2. My validation curve is attached.

The validation didn't run all of the 500 episodes

Hi Changan, I'm facing a problem: when I evaluate my trained models, the validation never runs all of the 500 episodes. It gets to something like 5%|██ | 25/500 [00:18<05:18, 1.49it/s] and then stops evaluating. I have checked the validation JSON, which includes 4 scenes and 501 episodes. So what is the problem? Thanks a lot!

On agent start position and goal position

In an episode for training, validation, or test, how are the agent's starting position and goal set? Are they randomly generated, or given by the dataset? And if they come from the dataset, were those positions themselves generated randomly, or by some other rule?

On validation

  1. NUM_PROCESSES is set to 1 during validation (evaluating each checkpoint and generating a validation curve), but it is set to 5 during training. Why is this parameter different during validation? Is it recommended to change NUM_PROCESSES to 5 when validating?
  2. About evaluating multiple checkpoints in order:
    1. There is an infinite loop in the eval function in av_nav/common/base_trainer.py. Will this cause the validation process to never end?
    2. There is also a sleep for 2 seconds before polling again (time.sleep(2); I asked about this last time). Is it recommended not to execute this statement?
    3. The validation log is still very large (I asked about this last time): the line "=======current_ckpt: data/models/replica/audiogoal_depth/data/ckpt.xxx.pth=======" is repeated xxx times, with xxx going from 1 to 800. I have not found the cause of this phenomenon; can you give some suggestions?
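The ever-growing repetition described in point 3 is the symptom of a log handler being re-registered on every checkpoint iteration, so each message is emitted once per attached handler. A minimal reproduction of the symptom with Python's logging module (an illustration, not the repo's actual code):

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("eval_demo")
logger.setLevel(logging.INFO)

# Simulate re-entering the eval loop three times, attaching a new
# handler each time without removing the previous ones.
for _ in range(3):
    logger.addHandler(logging.StreamHandler(stream))

logger.info("=======current_ckpt: ckpt.3.pth=======")
print(stream.getvalue().count("current_ckpt"))  # 3: one copy per handler

# A common guard: only attach a handler if none is registered yet.
if not logger.handlers:
    logger.addHandler(logging.StreamHandler(stream))
```

Checking `logger.handlers` before calling `addHandler` (or removing old handlers first) keeps each message at one copy regardless of how many checkpoints are evaluated.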

Discrepancy in episode data of Matterport3D

Hi Changan.
I noticed some discrepancies in the episode data of Matterport3D.

The paper says there are 85 Matterport3D scenes, and from issue #12 I learned that there are actually 83 instead of 85. But when I count the scenes in all the episode data, I find only 81: 59/10/11 for train/val/test, respectively. Am I missing something?

Here are the scenes of val and test that I count.

val: 'QUCTc6BB5sX', 'EU6Fwq7SyZv', '2azQ1b91cZZ', 'Z6MFQCViBuw', 'pLe4wQe7qrG', 'oLBMNvg9in8', 'X7HyMhZNoso', 'zsNo4HB9uLZ', 'TbHJrupSAjP', '8194nk5LbLH'

test: 'pa4otMbVnkk', 'yqstnuAEVhm', '5ZKStnWn8Zo', 'Vt2qJdWjCF2', 'wc2JMjhGNzB', 'fzynW3qQPVF', 'UwV83HsGsw3', 'q9vSo1VnCiC', 'ARNzJeq3xxb', 'gYvKGZ5eRqb', 'jtcxE69GiFV', 'gxdoqLR6rwA'

Thank you for your time.

issue with "use_belief_predictor"

Hello, I appreciate your wonderful work.

I tried to rerun your savi experiments following the usage you wrote. However, an error occurred at line 203 of ppo_trainer.py when I ran python ss_baselines/savi/run.py --exp-config ss_baselines/savi/config/semantic_audionav/savi.yaml --model-dir data/models/savi.

self.belief_predictor.load_state_dict(ckpt_dict["belief_predictor"])

The error was KeyError: 'belief_predictor'. In fact, I confirmed that ckpt.399.pth, which had been created during pre-training, does not contain a key named belief_predictor. Attached is the error.log.
How should I pretrain savi whose ckpt.XXX.pth stores belief_predictor?

Also, I changed use_belief_predictor in savi.yaml from True to False. The training itself ran without errors, but the result was considerably worse than what you reported in your paper.


Is that a reasonable result for you?

SoundSpaces Challenge

Hi @rfalcon100 and @Pixie412,

I hope you're doing well.

I saw you posted questions in the sound-spaces github repo and you seem to be interested in and using our audio-visual simulator. For your information, we're organizing the first SoundSpaces Challenge on audio-visual navigation this year as part of the CVPR 2021 Embodied AI Workshop. Since you might be doing research related to this, we're emailing to invite you to participate in this challenge. Details of the challenge can be found at https://soundspaces.org/challenge and the challenge will be live until May 31st. The top teams on the leaderboard will present their approaches at the Embodied AI Workshop and the winning team will also win $5k Amazon AWS credits.

Do consider participating and sharing with any relevant members in your research circle!

Sorry for tagging you here since I couldn't find your public email. Please feel free to reach out to us with any questions.

Thanks,
Changan Chen, Unnat Jain, and Kristen Grauman

Discrepancy in the Matterport3D dataset

Hi,
I noticed some discrepancies with the matterport dataset.

  1. The original Matterport3D dataset has 90 scenes, but only 85 are considered here. Is there a specific reason for that?
  2. The sampling rate for Replica is 44.1 kHz, whereas Matterport3D is 16 kHz. Could you please let me know why the sampling rates differ?
  3. Finally, the partial binaural RIRs (867 GB) you provide contain data for only 83 scenes instead of 85.

Thanks for your time.

Is the complete download address of the SoundSpaces data set available?

Is there a complete download location for the SoundSpaces data? That is, the rendered sounds, the original Replica dataset, and the Matterport3D dataset with its segmentations. It is difficult to get a response when applying for Matterport3D access.

Questions about sound source of Matterport3D.

Hi, Changan. The sampling rate for Replica is 44.1 kHz whereas Matterport3D is 16 kHz, and I have 2 questions:
1. Is the generation of the binaural_rirs data related to the 1 s source sound?
2. If so, which sampling rate was used to generate the binaural_rirs data for Matterport3D: 44.1 kHz or 16 kHz?

Besides, in another Facebook Research project, VisualEchoes, should FREQ be changed to 16000 if we want to generate echo data for the Matterport3D dataset? Here is the related code in getEchoes.py of VisualEchoes:
https://github.com/facebookresearch/VisualEchoes/blob/master/getEchoes.py
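For what it's worth, whichever rate is used, the source waveform and the RIR must share the same sampling rate before convolution, so a 16 kHz Matterport3D RIR should be convolved with a 16 kHz source. A minimal sketch with toy signals (all values illustrative; the real RIRs come from the downloaded binaural_rirs):

```python
import numpy as np

sr = 16000  # Matterport3D RIRs are distributed at 16 kHz
t = np.arange(sr) / sr                  # 1 s of source signal
source = np.sin(2 * np.pi * 440 * t)    # toy 440 Hz tone
rir = np.zeros(sr // 10)                # toy 100 ms impulse response
rir[0], rir[800] = 1.0, 0.5             # direct path + one echo at 50 ms

received = np.convolve(source, rir)     # what the agent would hear
print(received.shape)                   # len(source) + len(rir) - 1 samples
```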

On the shape of the variable observation_space.spaces['audiogoal']

1) In scripts/interactive_demo.py, I observed that observation_space.spaces['audiogoal'].shape is (2, 44100).

2) In scripts/interactive_demo.py, observation_space is obtained from the following code:

    observation_space = None
    if observation_space is None:
        observation_space = env.observation_space

3) In av_nav/rl/models/audio_cnn.py, the following code

    observation_space.spaces['audiogoal'].shape[2],

implies that observation_space.spaces['audiogoal'].shape has at least 3 dimensions.

4) In av_nav/rl/models/audio_cnn.py, through the following code:

    def forward(self, observations):
        cnn_input = []

        audio_observations = observations[self._audiogoal_sensor]
        # permute tensor to dimension [BATCH x CHANNEL x HEIGHT X WIDTH]
        audio_observations = audio_observations.permute(0, 3, 1, 2)
        cnn_input.append(audio_observations)

        cnn_input = torch.cat(cnn_input, dim=1)

        return self.cnn(cnn_input)

I found that the batched audio observation must be 4-dimensional (the batch dimension is prepended at runtime, so the per-observation shape itself is 3-D).

5) How do I construct observation_space.spaces['audiogoal'] in scripts/interactive_demo.py?

Can you give me some suggestions? I wrote some code, and the following error occurs at runtime:

    self._n_input_audio = observation_space.spaces[audiogoal_sensor].shape[2]
    IndexError: tuple index out of range
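As a hedged illustration of why the demo's raw (2, 44100) waveform doesn't match audio_cnn.py's expectation: the trainer consumes a spectrogram laid out as (frequency, time, channel), so shape[2] indexes the 2 binaural channels. The FFT parameters below are assumptions for illustration, not the repo's exact values:

```python
import numpy as np

def waveform_to_spectrogram(audio, n_fft=512, hop=160):
    """Hedged sketch: turn a (2, num_samples) binaural waveform into the
    (freq, time, channel) layout that audio_cnn.py indexes with shape[2].
    Window and hop sizes here are illustrative, not the repo's values."""
    channels = []
    for ch in audio:                                          # left, right
        frames = [ch[i:i + n_fft] for i in range(0, len(ch) - n_fft, hop)]
        spec = np.abs(np.fft.rfft(np.stack(frames), axis=1))  # (time, freq)
        channels.append(np.log1p(spec).T)                     # (freq, time)
    return np.stack(channels, axis=-1)                        # (freq, time, 2)

audio = np.zeros((2, 44100))      # raw audiogoal as seen in the demo
spec = waveform_to_spectrogram(audio)
print(spec.ndim, spec.shape[2])   # 3 dims; shape[2] is the 2 channels
```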


Why do we need to sleep for 2 seconds when evaluating multiple checkpoints in order?

In av_nav/common/base_trainer.py:

                # evaluate multiple checkpoints in order
                while True:
                    current_ckpt = None
                    while current_ckpt is None:
                        current_ckpt = poll_checkpoint_folder(
                            self.config.EVAL_CKPT_PATH_DIR, prev_ckpt_ind, eval_interval
                        )
                        time.sleep(2)  # sleep for 2 secs before polling again
                    logger.info(f"=======current_ckpt: {current_ckpt}=======")

Why do we need to sleep for 2 seconds when evaluating multiple checkpoints in order? It makes the program run longer.
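If the extra delay is the concern, here is a sketch of the same polling idea that sleeps only while no new checkpoint exists, so already-written checkpoints are consumed back-to-back. This is a suggestion, not the repo's code; poll_fn stands in for poll_checkpoint_folder:

```python
import time

def wait_for_checkpoint(poll_fn, prev_ind, interval=2):
    """Hedged sketch: block until poll_fn returns a checkpoint path,
    sleeping only when nothing new has been written yet."""
    while True:
        ckpt = poll_fn(prev_ind)
        if ckpt is not None:
            return ckpt
        time.sleep(interval)  # wait only while training hasn't produced one

# toy poll function: a checkpoint appears on the second call
calls = iter([None, "ckpt.1.pth"])
print(wait_for_checkpoint(lambda _: next(calls), 0, interval=0))
```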


Initial Run & Speed-up

Hi Changan,

I am new to this interesting repo and would like to make a few quick enquiries, if possible. Thanks in advance for your time.

  1. When running "python av_nav/run.py --exp-config av_nav/config/replica/train_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth", I noticed an intermediate message like "Current scene: frl_apartment_3 and sound: None". Does it indicate that no sound is used for the AudioNav task? Is that expected?

  2. (closed - see below) Could you please also leave some comments and guidance about speeding up the training process? I am using all default settings, and it shows "env-time: 26.767s pth-time: 154.669s frames: 8250".

Looking forward to your reply, and best wishes.

AttributeError: 'SoundSpaces' object has no attribute '_sim'

  1. When I run python scripts/cache_observations.py, an error occurs as follows:
    Caching Replica observations ...

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0926 15:13:51.761842 40370 AssetAttributesManager.cpp:122] Asset attributes (capsule3DSolid : capsule3DSolid_hemiRings_4_cylRings_1_segments_12_halfLen_0.75_useTexCoords_false_useTangents_false) created and registered.
I0926 15:13:51.761885 40370 AssetAttributesManager.cpp:122] Asset attributes (capsule3DWireframe : capsule3DWireframe_hemiRings_8_cylRings_1_segments_16_halfLen_1) created and registered.
I0926 15:13:51.761914 40370 AssetAttributesManager.cpp:122] Asset attributes (coneSolid : coneSolid_segments_12_halfLen_1.25_rings_1_useTexCoords_false_useTangents_false_capEnd_true) created and registered.
I0926 15:13:51.761930 40370 AssetAttributesManager.cpp:122] Asset attributes (coneWireframe : coneWireframe_segments_32_halfLen_1.25) created and registered.
I0926 15:13:51.761940 40370 AssetAttributesManager.cpp:122] Asset attributes (cubeSolid : cubeSolid) created and registered.
I0926 15:13:51.761946 40370 AssetAttributesManager.cpp:122] Asset attributes (cubeWireframe : cubeWireframe) created and registered.
I0926 15:13:51.761968 40370 AssetAttributesManager.cpp:122] Asset attributes (cylinderSolid : cylinderSolid_rings_1_segments_12_halfLen_1_useTexCoords_false_useTangents_false_capEnds_true) created and registered.
I0926 15:13:51.761986 40370 AssetAttributesManager.cpp:122] Asset attributes (cylinderWireframe : cylinderWireframe_rings_1_segments_32_halfLen_1) created and registered.
I0926 15:13:51.761996 40370 AssetAttributesManager.cpp:122] Asset attributes (icosphereSolid : icosphereSolid_subdivs_1) created and registered.
I0926 15:13:51.762004 40370 AssetAttributesManager.cpp:122] Asset attributes (icosphereWireframe : icosphereWireframe_subdivs_1) created and registered.
I0926 15:13:51.762017 40370 AssetAttributesManager.cpp:122] Asset attributes (uvSphereSolid : uvSphereSolid_rings_8_segments_16_useTexCoords_false_useTangents_false) created and registered.
I0926 15:13:51.762030 40370 AssetAttributesManager.cpp:122] Asset attributes (uvSphereWireframe : uvSphereWireframe_rings_16_segments_32) created and registered.
I0926 15:13:51.762037 40370 AssetAttributesManager.cpp:108] AssetAttributesManager::buildCtorFuncPtrMaps : Built default primitive asset templates : 12
W0926 15:13:51.762580 40370 ObjectAttributesManager.cpp:326] Cannot find ./data/objects or ./data/objects.phys_properties.json. Aborting parse.
I0926 15:13:51.762589 40370 PhysicsAttributesManager.cpp:39] File (./data/default.phys_scene_config.json) Based physics manager attributes created and registered.
I0926 15:13:51.762646 40370 StageAttributesManager.cpp:79] File (data/scene_datasets/replica/apartment_0/habitat/mesh_semantic.ply) Based stage attributes created and registered.
I0926 15:13:51.762653 40370 Simulator.cpp:145] Loading navmesh from data/scene_datasets/replica/apartment_0/habitat/mesh_semantic.navmesh
I0926 15:13:51.762989 40370 Simulator.cpp:147] Loaded.
I0926 15:13:51.763332 40370 SceneGraph.h:93] Created DrawableGroup:
Renderer: GeForce RTX 2060/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 430.64
Using optional features:
GL_ARB_ES2_compatibility
GL_ARB_direct_state_access
GL_ARB_get_texture_sub_image
GL_ARB_invalidate_subdata
GL_ARB_multi_bind
GL_ARB_robustness
GL_ARB_separate_shader_objects
GL_ARB_texture_filter_anisotropic
GL_ARB_texture_storage
GL_ARB_texture_storage_multisample
GL_ARB_vertex_array_object
GL_KHR_debug
Using driver workarounds:
no-forward-compatible-core-context
no-layout-qualifiers-on-old-glsl
nv-zero-context-profile-mask
nv-implementation-color-read-format-dsa-broken
nv-cubemap-inconsistent-compressed-image-size
nv-cubemap-broken-full-compressed-image-query
nv-compressed-block-size-in-bits
I0926 15:13:55.722658 40370 ResourceManager.cpp:302] ResourceManager::loadStage : Not loading semantic mesh
I0926 15:13:55.724828 40370 simulator.py:181] Loaded navmesh data/scene_datasets/replica/apartment_0/habitat/mesh_semantic.navmesh
    Traceback (most recent call last):
      File "scripts/cache_observations.py", line 159, in <module>
        main('replica')
      File "scripts/cache_observations.py", line 146, in main
        obs, rotation_index = simulator.step(None)
      File "scripts/cache_observations.py", line 83, in step
        sim_obs = self._sim.get_sensor_observations()
    AttributeError: 'SoundSpaces' object has no attribute '_sim'
I0926 15:13:56.307615 40370 PhysicsManager.cpp:33] Deconstructing PhysicsManager
I0926 15:13:56.307637 40370 SemanticScene.h:41] Deconstructing SemanticScene
I0926 15:13:56.307670 40370 SceneManager.h:25] Deconstructing SceneManager
I0926 15:13:56.307674 40370 SceneGraph.h:26] Deconstructing SceneGraph
I0926 15:13:56.307956 40370 Sensor.h:81] Deconstructing Sensor
I0926 15:13:56.307973 40370 Sensor.h:81] Deconstructing Sensor
I0926 15:13:56.312932 40370 Renderer.cpp:34] Deconstructing Renderer
I0926 15:13:56.312947 40370 WindowlessContext.h:17] Deconstructing WindowlessContext
I0926 15:13:56.319006 40370 Simulator.cpp:46] Deconstructing Simulator

  2. My environment is as follows (I modified the habitat_sim and habitat versions before installing soundspaces):

    $ python
    Python 3.6.12 |Anaconda, Inc.| (default, Sep 8 2020, 23:10:56)
    [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>> import habitat_sim
    >>> import habitat
    >>> habitat_sim.__version__
    '0.1.6'
    >>> habitat.__version__
    '0.1.6'
    >>> import sounspace
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'sounspace'
    >>> import sounspaces
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'sounspaces'
    >>> import soundspaces
    >>> soundspaces.__version__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'soundspaces' has no attribute '__version__'

Questions about point goal navigation.

Hi, Changan! Is it true that the agent still needs to predict a stop action to end the current episode in the point-goal task? Why can't it just stop when (delta x, delta y) equals (0, 0)?

How can I change the size of agent observations?

Hi,
scripts/cache_observations.py only provides 128×128-resolution RGB and depth images. Is there any way to get observations at other resolutions? In habitat-sim I can change the "img_size" input of ImageExtractor(), but I don't know how to do it in soundspaces. Can you provide any suggestions?
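If the standard habitat-lab sensor keys apply here (an assumption on my part), the sensor resolution can usually be raised in the experiment yaml. Note, however, that the pre-rendered observation cache is 128×128, so observations would likely need to be re-rendered at the new size rather than read from the cache:

```yaml
# Hedged sketch, assuming the standard habitat-lab sensor keys apply:
TASK_CONFIG:
  SIMULATOR:
    RGB_SENSOR:
      WIDTH: 256
      HEIGHT: 256
    DEPTH_SENSOR:
      WIDTH: 256
      HEIGHT: 256
```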

Issues in running interactive_demo.py

Hi @ChanganVR,

I am trying to run python scripts/interactive_demo.py, but I get this error:

(habitat) gyan@gyan-Lenovo-Y50:~/Documents/sound-spaces$ python scripts/interactive_demo.py
pygame 2.0.1 (SDL 2.0.14, Python 3.6.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "scripts/interactive_demo.py", line 242, in <module>
    main()
  File "scripts/interactive_demo.py", line 219, in main
    run_type=args.run_type)
  File "/home/gyan/Documents/sound-spaces/ss_baselines/av_nav/config/default.py", line 177, in get_config
    config = merge_from_path(_C.clone(), config_paths)
  File "/home/gyan/Documents/sound-spaces/ss_baselines/av_nav/config/default.py", line 155, in merge_from_path
    config.merge_from_file(config_path)
  File "/home/gyan/miniconda3/envs/habitat/lib/python3.6/site-packages/yacs/config.py", line 211, in merge_from_file
    with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'baselines/config/audiogoal_rgb_demo.yaml'

When I run python scripts/interactive_demo.py --exp-config "/home/gyan/Documents/sound-spaces/configs/audionav/av_nav/mp3d/audiogoal.yaml", I get this error:

(habitat) gyan@gyan-Lenovo-Y50:~/Documents/sound-spaces$ python scripts/interactive_demo.py --exp-config "/home/gyan/Documents/sound-spaces/configs/audionav/av_nav/mp3d/audiogoal.yaml"
pygame 2.0.1 (SDL 2.0.14, Python 3.6.13)
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "scripts/interactive_demo.py", line 242, in <module>
    main()
  File "scripts/interactive_demo.py", line 219, in main
    run_type=args.run_type)
  File "/home/gyan/Documents/sound-spaces/ss_baselines/av_nav/config/default.py", line 178, in get_config
    config.TASK_CONFIG = get_task_config(config_paths=config.BASE_TASK_CONFIG_PATH)
  File "/home/gyan/Documents/sound-spaces/ss_baselines/av_nav/config/default.py", line 226, in get_task_config
    config.merge_from_file(config_path)
  File "/home/gyan/miniconda3/envs/habitat/lib/python3.6/site-packages/yacs/config.py", line 211, in merge_from_file
    with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'configs/tasks/pointgoal.yaml'

When I change _C.BASE_TASK_CONFIG_PATH in ss_baselines/av_nav/config/default.py to configs/audionav/av_nav/mp3d/pointgoal.yaml, I get this error:

VIDEO_DIR: data/models/output/video_dir
VIDEO_OPTION: ['disk', 'tensorboard']
VISUALIZATION_OPTION: ['top_down_map']
2021-06-17 15:08:59,368 Initializing dataset AudioNav
Traceback (most recent call last):
  File "scripts/interactive_demo.py", line 242, in <module>
    main()
  File "scripts/interactive_demo.py", line 231, in main
    dataset = make_dataset(id_dataset=config.TASK_CONFIG.DATASET.TYPE, config=config.TASK_CONFIG.DATASET)
  File "/home/gyan/Documents/habitat-lab/habitat/datasets/registration.py", line 18, in make_dataset
    assert _dataset is not None, "Could not find dataset {}".format(id_dataset)
AssertionError: Could not find dataset AudioNav

Could you please fix these errors and help me run interactive_demo.py?

How to generate smooth video

I would like to generate a video of an agent moving in a sounding environment, and I successfully ran scripts/interactive_demo.py on the Replica dataset.
However, in the video generated by scripts/interactive_demo.py, the agent does not move smoothly in response to commands such as move forward, turn left, and turn right.
How can I generate a video where the agent moves smoothly, as shown in the README (https://www.youtube.com/watch?v=4uiptTUyq30)?

semantic label to acoustic material mapping

Hi, Changan. As per the paper,
"For each semantic class that was deemed to be acoustically relevant, we provide a mapping to an equivalent acoustic material from an existing material database. For the floor, wall, and ceiling classes, we assume acoustic materials of carpet, gypsum board, and acoustic tile, respectively."

It would be really helpful if you can kindly provide the mapping for all the semantic labels.

Thanks in advance.

How did you draw Figure 8 in the paper?

How did you draw Figure 8 in the paper? Can you give me a reference? Specifically, I mean the following 3 elements:

  1. The grid
  2. The response waveform plots for position 1 and position 2
  3. The red sound source

Thank you.

How to generate sounds when facing a direction other than 0, 90, 180, 270 degrees

Hi Changan.

Thank you for sharing the impressive dataset. You have provided RIRs for generating sounds when an agent faces 0, 90, 180, or 270 degrees. I wonder whether it is possible to generate sounds when the agent faces other directions (e.g. 30°)?

It would be helpful if you could provide some relevant references or code if possible.

Thanks.
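Heavily hedged aside: the released discrete RIRs cannot be exactly rotated, and the following is not the authors' method. One crude stopgap is a linear crossfade between the two nearest headings; this is a rough approximation at best, and re-rendering (which SoundSpaces 2.0 supports for continuous orientations) is the proper route:

```python
import numpy as np

def blend_rirs(rir_by_angle, heading_deg):
    """Rough approximation: crossfade the RIRs of the two nearest headings.
    rir_by_angle maps 0/90/180/270 to equal-length RIR arrays."""
    lo = (heading_deg // 90) * 90 % 360
    hi = (lo + 90) % 360
    w = (heading_deg % 90) / 90.0
    return (1 - w) * rir_by_angle[lo] + w * rir_by_angle[hi]

rirs = {a: np.full(8, float(a)) for a in (0, 90, 180, 270)}  # toy RIRs
print(blend_rirs(rirs, 30)[0])  # between the 0- and 90-degree responses
```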

Topdown heatmap viewer

I noticed that your paper presents figures (Fig. 1 and Fig. 2) with top-down and side-view visualizations of sound pressure.

I want to visualize a similar heatmap using sparse samples and bilinear interpolation. Could you share which tools you used to generate the visualizations (for the top-down and side views, respectively)?

Was it just Blender?
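Not the authors' tool, but the approach described above can be sketched with plain NumPy: bilinearly interpolate sparse pressure samples onto a dense grid, then hand the result to something like matplotlib's imshow. The corner values are illustrative dB levels at four sampled positions:

```python
import numpy as np

# Hedged sketch: bilinear interpolation of four sound-pressure samples
# placed at the corners of a unit square onto a dense top-down grid.
corners = np.array([[60.0, 70.0],    # SPL at (x=0, y=0) and (x=1, y=0)
                    [70.0, 80.0]])   # SPL at (x=0, y=1) and (x=1, y=1)

n = 50
x = np.linspace(0.0, 1.0, n)[None, :]   # (1, n)
y = np.linspace(0.0, 1.0, n)[:, None]   # (n, 1)

heat = ((1 - x) * (1 - y) * corners[0, 0] + x * (1 - y) * corners[0, 1]
        + (1 - x) * y * corners[1, 0] + x * y * corners[1, 1])

print(heat.shape)  # dense (n, n) heatmap, ready for plt.imshow(heat)
```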

Question about metadata

Hi, thanks for this great work! May I ask how the metadata was created? I read in the paper how the graph is built, and I wonder whether this part of the code could also be released.

Thanks a lot for your help!

Any API to query a 3-d point's visibility?

Hi, I'm interested in your work. Recently, I tried to modify the baseline ss_baselines/av_nav/run.py, and I may need some additional APIs to implement my method.

More specifically, I need to query whether a given point ([x, y, z] in world coordinates) is visible to the agent at each timestep.

However, I haven't found a feasible API for this need. Can you refer me to any related API? Any help is appreciated! Thanks :)
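I'm not aware of a single API for this, but as a hedged geometric sketch: a field-of-view and range test only, ignoring occlusion. A full visibility check would additionally compare the point's distance against the rendered depth image along the corresponding pixel ray:

```python
import numpy as np

def in_view(point, agent_pos, agent_forward, hfov_deg=90.0, max_depth=10.0):
    """Hedged sketch: purely geometric visibility (FOV cone + range),
    with no occlusion handling. All parameter values are illustrative."""
    rel = np.asarray(point, dtype=float) - np.asarray(agent_pos, dtype=float)
    dist = np.linalg.norm(rel)
    if dist == 0 or dist > max_depth:
        return False
    cos_angle = np.dot(rel / dist, agent_forward)
    return bool(cos_angle >= np.cos(np.radians(hfov_deg / 2)))

agent = np.array([0.0, 0.0, 0.0])
forward = np.array([0.0, 0.0, -1.0])              # habitat agents face -z
print(in_view([0.0, 0.0, -2.0], agent, forward))  # straight ahead
print(in_view([0.0, 0.0, 3.0], agent, forward))   # behind the agent
```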

ConnectionResetError: [Errno 104] Connection reset by peer

Hi @ChanganVR,

I am using habitat v0.1.7 and when I run python ss_baselines/av_nav/run.py --exp-config ss_baselines/av_nav/config/audionav/replica/train_telephone/audiogoal_depth.yaml --model-dir data/models/replica/audiogoal_depth, I get this error:

Traceback (most recent call last):
  File "ss_baselines/av_nav/run.py", line 101, in <module>
    main()
  File "ss_baselines/av_nav/run.py", line 95, in main
    trainer.train()
  File "/home/i21_gtatiya/projects/sound-spaces/ss_baselines/av_nav/ppo/ppo_trainer.py", line 316, in train
    episode_steps
  File "/home/i21_gtatiya/projects/sound-spaces/ss_baselines/av_nav/ppo/ppo_trainer.py", line 149, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 448, in step
    return self.wait_step()
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 436, in wait_step
    self.wait_step_at(index_env) for index_env in range(self.num_envs)
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 436, in <listcomp>
    self.wait_step_at(index_env) for index_env in range(self.num_envs)
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 409, in wait_step_at
    return self._connection_read_fns[index_env]()
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/core/vector_env.py", line 97, in __call__
    res = self.read_fn()
  File "/home/i21_gtatiya/projects/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
    buf = self.recv_bytes()
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

I have attached the complete log here:
train_telephone_audiogoal_depth_log.txt

I was not getting this error with habitat v0.1.6. Could you please fix it?

Collision is defined twice in `nav.py` and other questions

  1. Collision is defined twice in nav.py : Is this a typo?
  2. Is SoundSpacesSim supposed to subclass HabitatSimulator and not just the generic simulator class? At present the USE_RENDERED_OBSERVATIONS: False seems like it doesn't set several flags since the generic simulator init is a no-op.

How to change batch size in your code?

How can the batch size be changed in your code?
Is it set in a *.yaml file? I guess the batch size is related to "RL.PPO.num_mini_batch"; am I right?
1) Partial training yaml (screenshot attached)

2) Partial validation yaml (screenshot attached)
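For context, in habitat-style PPO trainers the rollout buffer typically holds NUM_PROCESSES × num_steps transitions, and RL.PPO.num_mini_batch controls how many chunks each update epoch splits that rollout into. A hedged arithmetic sketch (the values are illustrative, not the repo's defaults):

```python
# Hedged sketch of PPO batch arithmetic in habitat-style trainers.
num_processes = 10      # NUM_PROCESSES in the yaml (illustrative)
num_steps = 150         # RL.PPO.num_steps (illustrative)
num_mini_batch = 1      # RL.PPO.num_mini_batch (illustrative)

rollout_size = num_processes * num_steps          # transitions per update
mini_batch_size = rollout_size // num_mini_batch  # samples per SGD step
print(rollout_size, mini_batch_size)
```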

Question about evaluating.

Hi, Changan. When you run in 'eval' or 'test' mode, do you call .eval() on the model? If you do, where does that happen?

How to set checkpoints (label_predictor.pth and best_val.pth)?

@ChanganVR
Thank you, I appreciate your wonderful work.

I tried to rerun your SAVi experiments following the usage instructions you wrote. I have 1 GPU.
I have questions as follows:

  1. How do I set the label-predictor checkpoint?
  • Is it set by modifying RL.PPO.SCENE_MEMORY_TRANSFORMER.pretrained_path?
  • What should RL.PPO.SCENE_MEMORY_TRANSFORMER.pretrained_path be set to?
  2. How do I use the checkpoint you provided? Specifically, which configuration should be modified to use it?
  3. How can I train from scratch to obtain the two checkpoints you provided?
  • label_predictor.pth
  • best_val.pth
  4. How do I set the DDPPO.pretrained_weights checkpoint at line 60 of ss_baselines/savi/config/semantic_audionav/savi.yaml?
  5. I use the PPO trainer. Do I need to modify the DDPPO parameter configuration?

How can I make the GPU accelerate my training?

Thank you. I have trained your code on the 9-scene Replica dataset. In addition, I would like to ask the following two questions:

  1. Is my GPU actually being used? The experiment runs under habitat_sim v0.1.5 with 1 GPU and 5 threads. Training on the 9-scene Replica dataset took almost 4 days. The GPU shows 5 processes (3M each), but GPU utilization never exceeds 10%. Is this normal?
  2. I see the configuration parameter config.TASK_CONFIG.SIMULATOR.HABITAT_SIM_V0.GPU_GPU = False. Is the GPU not really used because of this parameter?
     My training results are as follows (screenshots attached).

Embodiment Constraints Calculation

From the paper, it is clear that the receiver cannot go to every location in the scene, so points that are not navigable are removed from the dataset. At which point are the navigable points computed (I am unable to figure this out from scripts/cache_observations.py)? Or is it precomputed and saved in each scene's graph.pkl file?
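Assuming the navigability filtering is indeed precomputed (my reading, not confirmed by the authors), each scene's metadata ships with a graph.pkl that can be loaded with plain pickle. A toy stand-in below; with the real dataset you would open data/metadata/&lt;dataset&gt;/&lt;scene&gt;/graph.pkl instead:

```python
import io
import pickle

# Hedged sketch: a toy dict of navigable node positions stands in for the
# real graph object stored in each scene's graph.pkl.
toy_graph = {0: (0.0, 0.0, 0.0), 1: (0.5, 0.0, 0.0)}  # node -> world position
buf = io.BytesIO(pickle.dumps(toy_graph))

graph = pickle.load(buf)  # real use: pickle.load(open('.../graph.pkl', 'rb'))
print(len(graph))         # number of navigable receiver locations
```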

self._audiogoal_cache is increasing all the time

In soundspaces/simulator.py, the code below (screenshot attached) may have a memory problem.
While the program runs, I found that self._audiogoal_cache keeps growing. Is there a way to clear it so that only the current episode is kept, i.e. clear it every time a new episode starts? Does that idea seem reasonable?
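A hedged sketch of the suggested fix: clear the cache once per episode, assuming no other code re-reads entries from earlier episodes (CacheOwner is a stand-in for the simulator class, not the repo's code):

```python
# Hedged sketch: bound the audiogoal cache by clearing it on episode reset.
class CacheOwner:
    def __init__(self):
        self._audiogoal_cache = {}

    def reset(self):
        """Called once per new episode; drops all cached waveforms."""
        self._audiogoal_cache.clear()

sim = CacheOwner()
sim._audiogoal_cache[("node", 0)] = b"waveform"  # filled during an episode
sim.reset()                                      # new episode starts
print(len(sim._audiogoal_cache))                 # cache is empty again
```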

About acoustic simulations

Hi, I have a question regarding the acoustic simulation.

It seems that the code in soundspaces/simulator.py only supports loading pre-rendered BRIRs, which can be downloaded via the links provided in the documentation.

Is the acoustic simulator code available somewhere?

Thanks

FileNotFoundError: [Errno 2] No such file or directory: 'configs/semantic_audionav/av_snav/mp3d/semantic_audiogoal_no_segmentation.yaml'

Hi @ChanganVR,

When I run python ss_baselines/savi/run.py --exp-config ss_baselines/savi/config/semantic_audionav/savi_pretraining.yaml --model-dir data/models/savi, I get this error:

Traceback (most recent call last):
  File "ss_baselines/savi/run.py", line 144, in <module>
    main()
  File "ss_baselines/savi/run.py", line 95, in main
    config = get_config(args.exp_config, args.opts, args.model_dir, args.run_type, args.overwrite)
  File "/home/i21_gtatiya/projects/sound-spaces/ss_baselines/savi/config/default.py", line 253, in get_config
    config.TASK_CONFIG = get_task_config(config_paths=config.BASE_TASK_CONFIG_PATH)
  File "/home/i21_gtatiya/projects/sound-spaces/ss_baselines/savi/config/default.py", line 313, in get_task_config
    config.merge_from_file(config_path)
  File "/home/i21_gtatiya/miniconda3/envs/avn/lib/python3.6/site-packages/yacs/config.py", line 211, in merge_from_file
    with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'configs/semantic_audionav/av_snav/mp3d/semantic_audiogoal_no_segmentation.yaml'

Could you please fix this error?
