[RSS2024] Official implementation of "Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation"

Home Page: https://hovsg.github.io

License: MIT License

Python 100.00%

3d-scene-graph natural-language-understanding open-vocabulary robot-navigation robot-planning

HOV-SG

This repository is the official implementation of the paper:

Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Abdelrhman Werby*, Chenguang Huang*, Martin Büchner*, Abhinav Valada, and Wolfram Burgard.
*Equal contribution.

arXiv preprint arXiv:2403.17846, 2024
(Accepted for Robotics: Science and Systems (RSS), Delft, Netherlands, 2024.)

🏗 Setup

Clone and set up the HOV-SG repository

git clone https://github.com/hovsg/HOV-SG.git
cd HOV-SG

# set up virtual environment and install habitat-sim afterwards separately to avoid errors.
conda env create -f environment.yaml
conda activate hovsg
conda install habitat-sim -c conda-forge -c aihabitat

# set up the HOV-SG python package
pip install -e .

Open CLIP

HOV-SG uses the Open CLIP model to extract features from RGB-D frames. To download the Open CLIP model checkpoint CLIP-ViT-H-14-laion2B-s32B-b79K please refer to Open CLIP.

mkdir -p checkpoints
wget "https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/resolve/main/open_clip_pytorch_model.bin?download=true" -O checkpoints/temp_open_clip_pytorch_model.bin && mv checkpoints/temp_open_clip_pytorch_model.bin checkpoints/laion2b_s32b_b79k.bin

Another option is to use the OVSeg fine-tuned Open CLIP model, which is available here:

pip install gdown
gdown --fuzzy https://drive.google.com/file/d/17C9ACGcN7Rk4UT4pYD_7hn3ytTa3pFb5/view -O checkpoints/ovseg_clip.pth

SAM

HOV-SG uses SAM to generate class-agnostic masks for the RGB-D frames. To download the SAM model checkpoint sam_vit_h_4b8939.pth, execute the following:

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O checkpoints/sam_vit_h_4b8939.pth
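
Before running the pipeline, it can help to confirm that the checkpoints landed where the download commands above put them. The following is a minimal sketch, assuming the file names used in those commands; `missing_checkpoints` is a hypothetical helper and not part of HOV-SG.

```python
from pathlib import Path

# File names taken from the download commands above.
# ovseg_clip.pth is optional and therefore not listed here.
REQUIRED = ("laion2b_s32b_b79k.bin", "sam_vit_h_4b8939.pth")


def missing_checkpoints(checkpoint_dir="checkpoints", names=REQUIRED):
    """Return the expected checkpoint files that are absent or empty."""
    root = Path(checkpoint_dir)
    missing = []
    for name in names:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_checkpoints()
    print("all checkpoints found" if not absent else f"missing: {absent}")
```

An interrupted wget can leave an empty file behind, which is why the sketch treats zero-byte files as missing.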

🖼️ Prepare dataset

Habitat Matterport 3D Semantics

HOV-SG takes posed RGB-D sequences as input. To represent hierarchical multi-story scenes, we use the Habitat Matterport 3D Semantics dataset (HM3DSem). We provide a script and pose files (data/hm3dsem_poses/) to generate RGB-D sequences using the habitat-sim simulator. The script can be found under data/habitat/.

Download the Habitat Matterport 3D Semantics dataset.

To generate RGBD sequences, run the following command:

python data/habitat/gen_hm3dsem_from_poses.py --dataset_dir <hm3dsem_dir> --save_dir data/hm3dsem_walks/

Make sure that the hm3dsem_dir has the following structure

├── hm3dsem_dir
│   ├── hm3d_annotated_basis.scene_dataset_config.json # this file is necessary
│   ├── val
│   │   └── 00824-Dd4bFSTQ8gi
│   │         ├── Dd4bFSTQ8gi.basis.glb
│   │         ├── Dd4bFSTQ8gi.basis.navmesh
│   │         ├── Dd4bFSTQ8gi.glb
│   │         ├── Dd4bFSTQ8gi.semantic.glb
│   │         └── Dd4bFSTQ8gi.semantic.txt
...
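
The layout above can be checked before launching the generation script. This is a minimal sketch under the assumption that every scene folder follows the pattern shown for 00824-Dd4bFSTQ8gi; `check_hm3dsem_scene` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Names copied from the directory listing above.
CONFIG_NAME = "hm3d_annotated_basis.scene_dataset_config.json"
SCENE_SUFFIXES = (
    ".basis.glb",
    ".basis.navmesh",
    ".glb",
    ".semantic.glb",
    ".semantic.txt",
)


def check_hm3dsem_scene(hm3dsem_dir, scene, split="val"):
    """Return the expected paths (relative to hm3dsem_dir) that are missing.

    `scene` is a folder name such as "00824-Dd4bFSTQ8gi"; the part after
    the dash is the file stem used inside that folder.
    """
    root = Path(hm3dsem_dir)
    stem = scene.split("-", 1)[1]
    expected = [root / CONFIG_NAME]
    expected += [root / split / scene / f"{stem}{suffix}" for suffix in SCENE_SUFFIXES]
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]
```

An empty list means the scene matches the listing; any returned path points at a file that still needs to be downloaded or moved.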

We only used the following scenes from the Habitat Matterport 3D Semantics dataset:

00824-Dd4bFSTQ8gi
00829-QaLdnwvtxbs
00843-DYehNKdT76V
00847-bCPU9suPUw9
00849-a8BtkwhxdRV
00861-GLAQ4DNUx5U
00862-LT9Jq6dN3Ea
00873-bxsVRursffK
00877-4ok3usBNeis
00890-6s7QHgap2fW

To evaluate semantic segmentation capabilities, we used ScanNet and Replica.

ScanNet

To get an RGB-D sequence for ScanNet, download the ScanNet dataset from the official website. The dataset contains RGB-D frames compressed as .sens files. To extract the frames, use the SensReader/python tool. We used the following scenes from the ScanNet dataset:

scene0011_00
scene0050_00
scene0231_00
scene0378_00
scene0518_00

Replica

To get an RGB-D sequence for Replica, download the scanned RGB-D trajectories of the Replica dataset provided by Nice-SLAM instead of the original Replica dataset. They contain trajectories rendered from the mesh models of the original Replica dataset. Download the Replica RGB-D scan dataset using the download script from Nice-SLAM:

wget https://cvg-data.inf.ethz.ch/nice-slam/data/Replica.zip -O data/Replica.zip && unzip data/Replica.zip -d data/Replica_RGBD && rm data/Replica.zip

To evaluate against the ground-truth semantic labels, you also need to download the original Replica dataset, as it contains the ground-truth semantic labels as .ply files.

git clone https://github.com/facebookresearch/Replica-Dataset.git data/Replica-Dataset
chmod +x data/Replica-Dataset/download.sh && data/Replica-Dataset/download.sh data/Replica_original

We only used the following scenes from the Replica dataset:

office0
office1
office2
office3
office4
room0
room1
room2

📂 Dataset file structure

The data folder should have the following structure:

├── hm3dsem_walks
│   ├── val
│   │   ├── 00824-Dd4bFSTQ8gi
│   │   │   ├── depth
│   │   │   │   ├── Dd4bFSTQ8gi-000000.png
│   │   │   │   ├── ...
│   │   │   ├── rgb
│   │   │   │   ├── Dd4bFSTQ8gi-000000.png
│   │   │   │   ├── ...
│   │   │   ├── semantic
│   │   │   │   ├── Dd4bFSTQ8gi-000000.png
│   │   │   │   ├── ...
│   │   │   ├── pose
│   │   │   │   ├── Dd4bFSTQ8gi-000000.txt
│   │   │   │   ├── ...
│   │   ├── 00829-QaLdnwvtxbs
│   │   ├── ..
├── Replica
│   ├── office0
│   │   ├── results
│   │   │   ├── depth0000.png
│   │   │   ├── ...
│   │   │   ├── rgb0000.png
│   │   │   ├── ...
│   │   ├── traj.txt
│   ├── office1
│   ├── ...
├── ScanNet
│   ├── scans
│   │   ├── scene0011_00
│   │   │   ├── color
│   │   │   │   ├── 0.jpg
│   │   │   │   ├── ...
│   │   │   ├── depth
│   │   │   │   ├── 0.png
│   │   │   │   ├── ...
│   │   │   ├── poses
│   │   │   │   ├── 0.txt
│   │   │   │   ├── ...
│   │   │   ├── internsics
│   │   │   │   ├── intrinsics_color.txt
│   │   │   │   ├── intrinsics_depth.txt
│   │   ├── ..
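
Since each frame needs an image, a depth map, and a pose, a quick consistency check of a prepared scene folder can catch incomplete exports early. The sketch below assumes that files belonging to the same frame share a name apart from the extension, as in the hm3dsem_walks listing above; `unmatched_frames` is a hypothetical helper, not part of HOV-SG.

```python
from pathlib import Path


def frame_ids(folder):
    """Collect frame identifiers (file names without extension) in a folder."""
    return {p.stem for p in Path(folder).iterdir() if p.is_file()}


def unmatched_frames(scene_dir, modalities=("rgb", "depth", "pose")):
    """Map each modality to the frame ids it lacks relative to the others.

    Modalities with a complete set of frames are left out of the result,
    so an empty dict means all folders agree.
    """
    ids = {m: frame_ids(Path(scene_dir) / m) for m in modalities}
    all_ids = set().union(*ids.values())
    return {m: sorted(all_ids - found) for m, found in ids.items() if all_ids - found}
```

For a ScanNet scene, the folder names from the listing would be passed instead, for example `modalities=("color", "depth", "poses")`.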

🚀 Run

Create Scene Graphs (only for Habitat Matterport 3D Semantics):

python application/create_graph.py main.dataset=hm3dsem main.dataset_path=data/hm3dsem_walks/val/00824-Dd4bFSTQ8gi/ main.save_path=data/scene_graphs/00824-Dd4bFSTQ8gi

This will generate a scene graph for the specified RGB-D sequence and save it. The following files are generated:

├── graph
│   ├── floors
│   │   ├── 0.json
│   │   ├── 0.ply
│   │   ├── 1.json
│   │   ├── ...
│   ├── rooms
│   │   ├── 0_0.json
│   │   ├── 0_0.ply
│   │   ├── 0_1.json
│   │   ├── ...
│   ├── objects
│   │   ├── 0_0_0.json
│   │   ├── 0_0_0.ply
│   │   ├── 0_0_1.json
│   │   ├── ...
│   ├── nav_graph
├── tmp
├── full_feats.pt
├── mask_feats.pt
├── full_pcd.ply
├── masked_pcd.ply

The graph folder contains the generated scene graph hierarchy. In each file name, the first number is the floor index, the second is the room index, and the third is the object index. The tmp folder holds intermediate results produced during graph construction. The files full_feats.pt and mask_feats.pt contain the features extracted from the RGB-D frames using the Open CLIP and SAM models: the former holds per-point features and the latter holds the features of the object masks. The files full_pcd.ply and masked_pcd.ply contain the point cloud of the RGB-D frames and the instance masks of all objects, respectively.
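
The naming scheme can be decoded mechanically. The following sketch relies only on the file-name convention described above (floor, then room, then object index, separated by underscores); `parse_node_file` and `parent_stem` are hypothetical helpers and do not read the contents of the JSON or PLY files.

```python
from pathlib import Path

# Number of underscore-separated indices -> hierarchy level.
LEVELS = {1: "floor", 2: "room", 3: "object"}


def parse_node_file(file_name):
    """Return (level, indices) for a graph file such as "0_1_2.json"."""
    indices = tuple(int(part) for part in Path(file_name).stem.split("_"))
    if len(indices) not in LEVELS:
        raise ValueError(f"unexpected graph file name: {file_name}")
    return LEVELS[len(indices)], indices


def parent_stem(indices):
    """File stem of the parent node, or None for a floor."""
    return "_".join(str(i) for i in indices[:-1]) or None


level, idx = parse_node_file("0_1_2.json")
print(level, idx, parent_stem(idx))  # object (0, 1, 2) 0_1
```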

Visualize Scene Graphs

python application/visualize_graph.py graph_path=data/scene_graphs/hm3dsem/00824-Dd4bFSTQ8gi/graph

Interactive visualization of Scene Graphs with Queries

Setup OpenAI

In order to test graph queries with HOV-SG, you need to set up an OpenAI API account with the following steps:

Sign up for an OpenAI account, log in, and add at least one payment method.
Create an OpenAI API key and copy it.
Open your ~/.bashrc file, add the line export OPENAI_KEY=<your copied key>, save the file, and reload it with source ~/.bashrc. Alternatively, run export OPENAI_KEY=<your copied key> in the terminal where you want to run the query code.
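
To confirm that the variable is visible to Python before starting a query session, a small check like the following can be used. It is a sketch only: it assumes the variable name OPENAI_KEY from the steps above, and `get_openai_key` is a hypothetical helper rather than a function from the repository.

```python
import os


def get_openai_key(env=os.environ):
    """Return the key stored in OPENAI_KEY or raise a readable error."""
    key = env.get("OPENAI_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_KEY is not set; export it in this shell before running the query script"
        )
    return key
```

Calling `get_openai_key()` in the same terminal where the query script will run shows right away whether the export took effect.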

Evaluate query against pre-built hierarchical scene graph

python application/visualize_query_graph.py main.graph_path=data/scene_graphs/hm3dsem/00824-Dd4bFSTQ8gi/graph

After launching the code, you will be asked to enter a hierarchical query, for example: chair in the living room on floor 0. The visualization then shows the top 5 target objects and the rooms they lie in.

Extract feature map for Semantic Segmentation (only for ScanNet and Replica)

python application/semantic_segmentation.py main.dataset=replica main.dataset_path=Replica/office0 main.save_path=data/sem_seg/office0

Evaluate Semantic Segmentation (only for ScanNet and Replica)

python application/eval/evaluate_sem_seg.py dataset=replica scene_name=office0 feature_map_path=data/sem_seg/office0

Evaluate Scene Graphs (WIP)

python application/eval/evaluate_graph.py main.graph_path=data/scene_graphs/00824-Dd4bFSTQ8gi

📔 Abstract

Recent open-vocabulary robot mapping methods enrich dense geometric maps with pre-trained visual-language features. While these maps allow for the prediction of point-wise saliency maps when queried for a certain language concept, large-scale environments and abstract queries beyond the object level still pose a considerable hurdle, ultimately limiting language-grounded robotic navigation. In this work, we present HOV-SG, a hierarchical open-vocabulary 3D scene graph mapping approach for language-grounded indoor robot navigation. Leveraging open-vocabulary vision foundation models, we first obtain state-of-the-art open-vocabulary segment-level maps in 3D and subsequently construct a 3D scene graph hierarchy consisting of floor, room, and object concepts, each enriched with open-vocabulary features. Our approach is able to represent multi-story buildings and allows robotic traversal of those using a cross-floor Voronoi graph. HOV-SG is evaluated on three distinct datasets and surpasses previous baselines in open-vocabulary semantic accuracy on the object, room, and floor level while producing a 75% reduction in representation size compared to dense open-vocabulary maps. In order to prove the efficacy and generalization capabilities of HOV-SG, we showcase successful long-horizon language-conditioned robot navigation within real-world multi-story environments.

If you find our work useful, please consider citing our paper:

@article{werby23hovsg,
Author = {Abdelrhman Werby and Chenguang Huang and Martin Büchner and Abhinav Valada and Wolfram Burgard},
Title = {Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation},
Year = {2024},
journal = {Robotics: Science and Systems},
}

👩‍⚖️ License

For academic usage, the code is released under the MIT license. For any commercial purpose, please contact the authors.

🙏 Acknowledgment

This work was funded by the German Research Foundation (DFG) Emmy Noether Program grant number 468878300, the BrainLinks-BrainTools Center of the University of Freiburg, and an academic grant from NVIDIA.

hov-sg's Issues

CUDA out of memory

When I tried to create scene graphs, it showed 'CUDA out of memory'. Is there any good solution?

~/HOV-SG$ python application/create_graph.py main.dataset=hm3dsem main.dataset_path=data/hm3dsem_walks/val/00824-Dd4bFSTQ8gi/ main.save_path=data/scene_graphs/00824-Dd4bFSTQ8gi
[2024-07-23 11:58:29,144][root][INFO] - Loaded ViT-H-14 model config.
[2024-07-23 11:58:37,214][root][INFO] - Loading pretrained ViT-H-14 weights (checkpoints/laion2b_s32b_b79k.bin).
Creating RGB-D point cloud: 100%|█████████████| 226/226 [00:18<00:00, 12.31it/s]
Extracting features:   0%|                              | 0/226 [00:00<?, ?it/s]
Error executing job with overrides: ['main.dataset=hm3dsem', 'main.dataset_path=data/hm3dsem_walks/val/00824-Dd4bFSTQ8gi/', 'main.save_path=data/scene_graphs/00824-Dd4bFSTQ8gi']
Traceback (most recent call last):
  File "/home/guojb/HOV-SG/application/create_graph.py", line 19, in main
    hovsg.create_feature_map() # create feature map
  File "/home/guojb/HOV-SG/hovsg/graph/graph.py", line 177, in create_feature_map
    F_2D, F_masks, masks, F_g = extract_feats_per_pixel(
  File "/home/guojb/HOV-SG/hovsg/models/sam_clip_feats_extractor.py", line 102, in extract_feats_per_pixel
    masks = mask_generator.generate(image)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/segment_anything/automatic_mask_generator.py", line 163, in generate
    mask_data = self._generate_masks(image)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/segment_anything/automatic_mask_generator.py", line 206, in _generate_masks
    crop_data = self._process_crop(image, crop_box, layer_idx, orig_size)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/segment_anything/automatic_mask_generator.py", line 236, in _process_crop
    self.predictor.set_image(cropped_im)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/segment_anything/predictor.py", line 60, in set_image
    self.set_torch_image(input_image_torch, image.shape[:2])
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/segment_anything/predictor.py", line 89, in set_torch_image
    self.features = self.model.image_encoder(input_image)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/segment_anything/modeling/image_encoder.py", line 112, in forward
    x = blk(x)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/segment_anything/modeling/image_encoder.py", line 174, in forward
    x = self.attn(x)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/segment_anything/modeling/image_encoder.py", line 234, in forward
    attn = add_decomposed_rel_pos(attn, q, self.rel_pos_h, self.rel_pos_w, (H, W), (H, W))
  File "/home/guojb/anaconda3/envs/hovsg/lib/python3.9/site-packages/segment_anything/modeling/image_encoder.py", line 358, in add_decomposed_rel_pos
    attn.view(B, q_h, q_w, k_h, k_w) + rel_h[:, :, :, :, None] + rel_w[:, :, :, None, :]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU 

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
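
The traceback ends in add_decomposed_rel_pos inside SAM's image encoder, and the failed 1024 MiB allocation is consistent with the size of a single global-attention tensor. The following back-of-the-envelope sketch assumes the commonly published SAM ViT-H settings (1024x1024 input, 16x16 patches, 16 attention heads), float32 values, and one image per forward pass; these numbers are assumptions and are not taken from the log.

```python
def attention_matrix_mib(image_size=1024, patch_size=16, num_heads=16, bytes_per_value=4):
    """Size in MiB of one (heads, tokens, tokens) attention tensor for one image."""
    tokens = (image_size // patch_size) ** 2  # 64 * 64 = 4096 tokens
    return num_heads * tokens * tokens * bytes_per_value / 2**20


print(attention_matrix_mib())  # 1024.0
```

If this estimate holds, the allocation comes from the image encoder itself, which runs once per image regardless of how many point prompts are used. Freeing GPU memory held by other processes is then more likely to help than changing the prompt settings of the mask generator.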

create_graph.py fails for hm3dsem dataset

Also, when I am trying to run "create_graph.py" script, I am getting the following error:

Have you seen such a problem previously? What could be the reason?

BTW, the command in the README.md seems incorrect.
Path to the dataset should be:
main.dataset_path=data/hm3dsem_walks/val/00824-Dd4bFSTQ8gi/ instead of main.dataset_path=hm3dsem_walks/val/00824-Dd4bFSTQ8gi/ like in README.md

(hovsg) ➜  HOV-SG git:(main) ✗ python application/create_graph.py main.dataset=hm3dsem main.dataset_path=data/hm3dsem_walks/val/00824-Dd4bFSTQ8gi/ main.save_path=data/scene_graphs/00824-Dd4bFSTQ8gi
/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/scipy/__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.26.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
[2024-07-22 17:39:49,622][root][INFO] - Loaded ViT-H-14 model config.
[2024-07-22 17:39:57,253][root][INFO] - Loading pretrained ViT-H-14 weights (checkpoints/laion2b_s32b_b79k.bin).
Creating RGB-D point cloud:   0%|                                                                                                                                                   | 0/226 [00:00<?, ?it/s]
Error executing job with overrides: ['main.dataset=hm3dsem', 'main.dataset_path=data/hm3dsem_walks/val/00824-Dd4bFSTQ8gi/', 'main.save_path=data/scene_graphs/00824-Dd4bFSTQ8gi']
Traceback (most recent call last):
  File "/home/ncdev/repos/HOV-SG/application/create_graph.py", line 39, in <module>
    main()
  File "/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/ncdev/repos/HOV-SG/application/create_graph.py", line 19, in main
    hovsg.create_feature_map() # create feature map
  File "/home/ncdev/repos/HOV-SG/hovsg/graph/graph.py", line 154, in create_feature_map
    rgb_image, depth_image, pose, _, depth_intrinsics = self.dataset[i]
  File "/home/ncdev/repos/HOV-SG/hovsg/dataloader/hm3dsem.py", line 52, in __getitem__
    pose = self._load_pose(pose_path)
  File "/home/ncdev/repos/HOV-SG/hovsg/dataloader/hm3dsem.py", line 123, in _load_pose
    transformation_matrix = np.array(values).reshape((4, 4))
ValueError: cannot reshape array of size 1 into shape (4,4)

Camera Pose Incorrect Error & Building Scene Graph not Run on CUDA

Hi, I used the hm3d dataset to test the scene graph, following the instructions in the README. However, I got the following error when running the command python application/create_graph.py main.dataset=hm3dsem main.dataset_path=./data/hm3dsem_walks/val/00843-DYehNKdT76V main.save_path=./data/scene_graphs/0843-DYehNKdT76V.

Upon checking, the camera poses extracted into the folder HOV-SG/data/hm3dsem_walks/val/00843-DYehNKdT76V/pose by the command python data/habitat/gen_hm3dsem_from_poses.py --dataset_dir <hm3dsem_dir> --save_dir data/hm3dsem_walks/ look something like:

which is why reading them yields a 7-dimensional array. I fixed this by changing the code so that the per-frame camera poses are saved as 4x4 matrices in text form, like (note that this is not the same camera as in the previous screenshot):

However, even though this fixes the first error, I still find that the process of merging 3D masks sequentially is extremely slow, as in the following:

Upon checking, this process is not running on CUDA at all.
Could you provide any solution for this? Thanks.
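
The mismatch described in this issue, seven values per pose file where a 4x4 matrix is expected, can also be handled at load time. The sketch below is an illustration only: it assumes the seven values are ordered x, y, z, qw, qx, qy, qz, which is a guess that should be checked against the generator script, and `pose_to_matrix` is a hypothetical helper rather than the loader used in HOV-SG.

```python
def pose_to_matrix(values):
    """Convert a flat pose into a 4x4 row-major homogeneous transform.

    Accepts 16 values (an already flattened 4x4 matrix) or 7 values,
    assumed here to be x, y, z, qw, qx, qy, qz.
    """
    values = [float(v) for v in values]
    if len(values) == 16:
        return [values[i:i + 4] for i in range(0, 16, 4)]
    if len(values) != 7:
        raise ValueError(f"expected 7 or 16 pose values, got {len(values)}")
    x, y, z, qw, qx, qy, qz = values
    norm = (qw * qw + qx * qx + qy * qy + qz * qz) ** 0.5
    if norm == 0:
        raise ValueError("zero-norm quaternion")
    # Normalize so that the rotation block is a proper rotation matrix.
    qw, qx, qy, qz = qw / norm, qx / norm, qy / norm, qz / norm
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw), x],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw), y],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy), z],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

Raising on any other length keeps a malformed file, such as the size-1 array reported in the earlier issue, from silently producing a wrong pose.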

RGB-D data collection in real world

Hi,

Thank you for sharing this great work! I notice that you use Fast-LIO for localization which is very smooth in reconstructing the real-world building. I also try to use Fast-LIO from the official repo however the performance of the localization is not that good. Could you provide details about your modified version of Fast-LIO? Thanks!!

"create_graph.py" takes a lot of time!

It completes only 15 iterations in 4 hours, which seems quite unreasonable. Is this expected, or are there parameters that can be configured to make it run faster?

It takes up to 17 GB of GPU memory on a 4090 GPU, and up to about 45 GB of my flash memory.

python application/create_graph.py main.dataset=hm3dsem main.dataset_path=data/hm3dsem_walks/val/00824-Dd4bFSTQ8gi/ main.save_path=data/scene_graphs/00824-Dd4bFSTQ8gi
[2024-08-20 15:24:57,570][root][INFO] - Loaded ViT-H-14 model config.
[2024-08-20 15:25:04,095][root][INFO] - Loading pretrained ViT-H-14 weights (checkpoints/laion2b_s32b_b79k.bin).
Creating RGB-D point cloud: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 226/226 [00:12<00:00, 18.73it/s]
Extracting features: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 226/226 [12:27<00:00,  3.31s/it]
Merging 3d masks sequentially
  7%|██████████▍                                        15/225 [4:19:08<204:03:45, 3498.22s/it]

Semantic evaluation

Hi, first of all congrats for your great work!

I wanted to ask, how did you compute the mIoU from Tables I and II of the paper?
Did you compute the metrics for each scene and then average across all scenes? Or did you aggregate class statistics across all scenes and then compute the metrics?

Also, it would be really useful if you could upload the generated scene reconstructions. Then I would be able to compare your method against others that compute mIoU including background objects, like OpenNerf.

Thanks!

Tomás
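
The two aggregation strategies named in this question can give different numbers on the same predictions. The sketch below illustrates the difference on made-up per-class intersection and union counts; it does not claim which variant was used for the tables in the paper, and both function names are hypothetical.

```python
def miou_per_scene_then_average(scenes):
    """Compute mIoU inside each scene, then average the per-scene scores.

    scenes: {scene_name: {class_name: (intersection, union)}}
    """
    scene_scores = []
    for stats in scenes.values():
        ious = [i / u for i, u in stats.values() if u]
        if ious:
            scene_scores.append(sum(ious) / len(ious))
    return sum(scene_scores) / len(scene_scores)


def miou_pooled(scenes):
    """Sum intersections and unions per class over all scenes, then average class IoUs."""
    totals = {}
    for stats in scenes.values():
        for cls, (i, u) in stats.items():
            ti, tu = totals.get(cls, (0, 0))
            totals[cls] = (ti + i, tu + u)
    ious = [i / u for i, u in totals.values() if u]
    return sum(ious) / len(ious)


# Illustrative counts only, not results from any dataset.
example = {
    "office0": {"chair": (50, 100), "table": (10, 10)},
    "room0": {"chair": (90, 100)},
}
print(f"{miou_per_scene_then_average(example):.3f}")  # 0.825
print(f"{miou_pooled(example):.3f}")                  # 0.850
```

Pooling weights every class by its total area across scenes, while per-scene averaging gives each scene equal weight, so classes that appear in only a few scenes influence the two scores differently.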

have you tried building the scene graph online?

Hi,
Have you tried to build the scene graph online? I mean use a real-time odometry and RGB-D camera to provide sensor data and odometry when the robot is moving, and setup a process to take them as input to build the scene graph.

create_graph.py crashes after some time

@abwerby thank you for the quick fix! I don't encounter such problems right now.

However, when I am trying to generate scene graph, I repeatedly face crashes:

(hovsg) ➜  HOV-SG git:(main) ✗ python application/create_graph.py main.dataset=hm3dsem main.dataset_path=data/hm3dsem_walks/val/00824-Dd4bFSTQ8gi/ main.save_path=data/scene_graphs/00824-Dd4bFSTQ8gi
/home/ncdev/miniforge3/envs/hovsg/lib/python3.9/site-packages/scipy/__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.26.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
[2024-07-23 12:37:18,487][root][INFO] - Loaded ViT-H-14 model config.
[2024-07-23 12:37:26,890][root][INFO] - Loading pretrained ViT-H-14 weights (checkpoints/laion2b_s32b_b79k.bin).
Creating RGB-D point cloud: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 226/226 [00:26<00:00,  8.58it/s]
Extracting features: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 226/226 [20:37<00:00,  5.47s/it]
Merging 3d masks sequentially
  6%|██████████                                                                                                                                                        | 14/225 [25:23<19:20:49, 330.09s/it][1]    100314 killed     python application/create_graph.py main.dataset=hm3dsem

Traces show that some processes were killed on my machine due to an out-of-memory problem, even though the machine has 62 GB of RAM.

What are the memory requirements for this script to run properly?

Originally posted by @ADetilie in #6 (comment)

Script crash during dataset preparation

TL;DR

I am getting the following error during Habitat Matterport dataset prep:

-------------
{}
agent_state: position [13.376938    0.09943584  0.18875428] rotation quaternion(1, 0, 0, 0)
saving frame 3584/3586: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3585/3585 [03:05<00:00, 19.28it/s]
GL::Context::current(): no current context
[1]    234262 IOT instruction (core dumped)

Steps to reproduce:

I've downloaded "Downloading HM3D v0.2" from https://github.com/matterport/habitat-matterport-3dresearch
and prepared it as described in the README.md.

See the image:

I've modified https://github.com/hovsg/HOV-SG/blob/main/data/habitat/gen_hm3dsem_from_poses.py#L84 and https://github.com/hovsg/HOV-SG/blob/main/data/habitat/gen_hm3dsem_from_poses.py#L85 from "quiet" to "verbose" in order to get a more detailed error message.
After that, I executed the following command:

python data/habitat/gen_hm3dsem_from_poses.py --dataset_dir <hm3dsem_dir> --save_dir data/hm3dsem_walks/

Results:

The script ran for some time, but in the end it failed with the following error:

agent_state: position [13.376938    0.09943584  0.18875428] rotation quaternion(1, 0, 0, 0)
saving frame 3584/3586: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3585/3585 [03:48<00:00, 15.68it/s]
[15:40:40:663264]:[Debug]:[Physics] PhysicsManager.cpp(57)::~PhysicsManager : Deconstructing PhysicsManager
[15:40:40:663379]:[Debug]:[Scene] SceneManager.h(21)::~SceneManager : Deconstructing SceneManager
[15:40:40:663397]:[Debug]:[Scene] SceneGraph.h(25)::~SceneGraph : Deconstructing SceneGraph
[15:40:40:663973]:[Debug]:[Sensor] Sensor.cpp(68)::~Sensor : Deconstructing Sensor
[15:40:40:664186]:[Debug]:[Sensor] Sensor.cpp(68)::~Sensor : Deconstructing Sensor
[15:40:40:664413]:[Debug]:[Sensor] Sensor.cpp(68)::~Sensor : Deconstructing Sensor
[15:40:40:664608]:[Debug]:[Sensor] Sensor.cpp(68)::~Sensor : Deconstructing Sensor
[15:40:40:664647]:[Debug]:[Scene] SceneGraph.h(25)::~SceneGraph : Deconstructing SceneGraph
[15:40:40:667051]:[Debug]:[Scene] SemanticScene.h(53)::~SemanticScene : Deconstructing SemanticScene
[15:40:40:668724]:[Debug]:[Gfx] Renderer.cpp(71)::~Impl : Deconstructing Renderer
[15:40:40:668769]:[Debug]:[Gfx] WindowlessContext.h(17)::~WindowlessContext : Deconstructing WindowlessContext
[15:40:40:767203]:[Debug]:[Sim] Simulator.cpp(61)::~Simulator : Deconstructing Simulator
[15:40:40:767945]:[Debug]:[Physics] PhysicsManager.cpp(57)::~PhysicsManager : Deconstructing PhysicsManager
[15:40:40:767990]:[Debug]:[Scene] SceneManager.h(21)::~SceneManager : Deconstructing SceneManager
[15:40:40:768002]:[Debug]:[Scene] SceneGraph.h(25)::~SceneGraph : Deconstructing SceneGraph
GL::Context::current(): no current context
[1]    235042 IOT instruction (core dumped)  python data/habitat/gen_hm3dsem_from_poses.py --dataset_dir  --save_dir

Generated data looks like this:

Questions

Have you seen similar problems? Could you help figure out what might be the root cause of the issue? Thanks.

By the way, it seems like it might be a habitat problem. Could you specify the version of habitat you are using?

empty evaluate_graph.py

Hi,
Thank you for your great work! However, the evaluate_graph.py in the current code is empty.

I saw in the README that evaluation can be done.

I would like to know whether this code can be provided. Thanks!!

missing semantic_segmentation_config file

It seems that the semantic_segmentation_config file was not provided, so the segmentation task on Replica and ScanNet cannot be run?

Out of Memory during "Merging the Masks"

Hi, thank you for sharing this excellent work.

I encountered an out-of-memory issue when running the code during the "Merging the masks" stage. My system has 32GB of RAM and 32GB of swap, but the code still runs out of memory early in the process.

I attempted the solution provided in Issue #7, but the problem persists regardless of whether the merge type is set to 'sequential' or 'hierarchical'.

#### Hierarchical
~/my_code/HOV-SG (dev) » python application/semantic_segmentation.py main.dataset=replica main.dataset_path=/home/ubuntu/workspace/dataset/Replica/room0 main.save_path=data/sem_seg/room0
[2024-07-28 19:52:08,172][root][INFO] - Loaded ViT-H-14 model config.
[2024-07-28 19:52:12,736][root][INFO] - Loading pretrained ViT-H-14 weights (checkpoints/laion2b_s32b_b79k.bin).
Creating RGB-D point cloud: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:14<00:00, 13.60it/s]
Extracting features: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [17:38<00:00,  5.29s/it]
Merging 3d masks hierarchically
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:11<00:00,  1.32s/it]
th:  0.7252525252525253
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [03:17<00:00,  3.96s/it]
th:  0.700762729334158
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [05:40<00:00, 13.62s/it]
th:  0.6768043960008246
 92%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████              | 12/13 [10:53<00:54, 54.48s/it]
th:  0.653887729334158
 86%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                          | 6/7 [31:23<05:13, 313.85s/it]
th:  0.6330543960008246
  0%|                                                                                                                                                                                                | 0/4 [00:00<?, ?it/s]
[1]    2516582 killed     python application/semantic_segmentation.py main.dataset=replica

#### Sequential
~/my_code/HOV-SG (dev*) » python application/semantic_segmentation.py main.dataset=replica main.dataset_path=/home/ubuntu/workspace/dataset/Replica/room0 main.save_path=data/sem_seg/room0            127 ↵ ubuntu@ubuntu
[2024-07-29 08:46:23,681][root][INFO] - Loaded ViT-H-14 model config.
[2024-07-29 08:46:28,274][root][INFO] - Loading pretrained ViT-H-14 weights (checkpoints/laion2b_s32b_b79k.bin).
Creating RGB-D point cloud: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:13<00:00, 14.47it/s]
Extracting features: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [16:58<00:00,  5.09s/it]
Merging 3d masks sequentially
  3%|████▌                                                                                                                                                                               | 5/199 [02:43<2:51:02, 52.90s/it]
[1]  2988673 killed     python application/semantic_segmentation.py main.dataset=replica

For reference, I am running the code on room0 of the Replica dataset, which is a relatively small scene. Could you share the settings you use to run the code, or any tips for handling this issue?
Any suggestions would be greatly appreciated. Thanks!
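
A process that ends with a bare "killed" message, as in the runs above, is often terminated by the Linux out-of-memory killer, although the logs alone do not confirm that. One way to check is to log the peak resident memory after each merge step. The following is a minimal sketch using only the Python standard library; peak_rss_mb and log_peak are hypothetical helpers and are not part of HOV-SG.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_peak(step: str) -> str:
    """Format a one-line memory report for the given pipeline step."""
    return f"[mem] {step}: peak RSS {peak_rss_mb():.1f} MiB"


if __name__ == "__main__":
    buffers = []
    for i in range(3):
        # Stand-in allocation; in practice, call log_peak after each merge step.
        buffers.append(bytearray(10 * 1024 * 1024))
        print(log_peak(f"merge step {i}"))
```

If the reported peak approaches the machine's physical memory shortly before the process dies, memory exhaustion is the likely cause.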

Bug in evaluation

https://github.com/hovsg/HOV-SG/blob/6e7560061ec5ea1f11670e8ebcbda255818b8f24/application/eval/evaluate_sem_seg.py#L46C13-L46C32

The device argument on this line should be device=params.main.device, so that evaluation runs on the device set in the config.
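
The idea behind the fix can be sketched as follows: read the device from the config object instead of hard-coding it. This is only an illustration. select_device is a hypothetical helper, the SimpleNamespace object stands in for the real config, and the CPU fallback is an added assumption rather than something the repository does.

```python
from types import SimpleNamespace


def select_device(params, cuda_available: bool) -> str:
    """Return the device named in params.main.device, falling back to CPU."""
    requested = str(params.main.device)
    # Assumption: fall back to CPU when a CUDA device is requested but absent.
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested


# Stand-in for the config; mirrors the main.device key shown in the YAML config.
params = SimpleNamespace(main=SimpleNamespace(device="cuda"))
print(select_device(params, cuda_available=True))
print(select_device(params, cuda_available=False))
```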

Run HOV-SG on own datasets

I want to run HOV-SG on my own dataset.

I drove a simulated robot and gathered RGB images, depth images, and poses. Then I reformatted this data into a structure similar to the hm3dsem dataset.
I created two datasets:

one with low-res RGB and depth images (640x480), and
one with high-res RGB and depth images (1080x720).

HOV-SG failed to build proper scene graphs on these datasets (I will post full logs and results for each dataset in comments on this issue).

Command I used for graph generation:

python application/create_graph.py main.dataset=<path to my dataset> main.save_path=<output path>

Config:

main:
  device: cuda
  dataset: hm3dsem # scannet, replica
  scene_id: 00824-Dd4bFSTQ8gi # scene0011_00
  dataset_path: /data/buechner/hm3dsem_walks/val/00824-Dd4bFSTQ8gi
  save_path: /data/buechner/hovsg/
models:
  clip:
    type:  ViT-H-14 # ViT-L/14@336px # ViT-H-14
    checkpoint: checkpoints/laion2b_s32b_b79k.bin 
    # checkpoint: checkpoints/ovseg_clipl14_9a1909.pth checkpoints/laion2b_s32b_b79k.bin
  sam:
    checkpoint: checkpoints/sam_vit_h_4b8939.pth
    type: vit_h
    points_per_side: 12
    pred_iou_thresh: 0.88
    points_per_batch: 144
    crop_n_layers: 0
    stability_score_thresh: 0.95
    min_mask_region_area: 100
pipeline:
  create_graph: True
  voxel_size: 0.02
  skip_frames: 10
  init_overlap_thresh: 0.75
  overlap_thresh_factor: 0.025
  iou_thresh: 0.05
  clip_masked_weight: 0.4418
  clip_bbox_margin: 50 # in pixels
  feature_dbscan_eps: 0.01
  max_mask_distance: 10000 # 6.4239 in meters
  min_pcd_points: 100
  depth_weighting: false
  grid_resolution: 0.05
  merge_type: hierarchical # hierarchical, sequential
  save_intermediate_results: true
  obj_labels: HM3DSEM_LABELS
  merge_objects_graph: false

Could you help identify possible problems and give some hints on how to debug the issue?
Details will follow in comments on this issue.

Data examples.

RGB:

Depth:

Pos:
27.000001408.txt
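
Before debugging the graph itself, a quick check is whether every frame has a matching RGB image, depth image, and pose file. The following is a minimal sketch under stated assumptions: the subfolder names rgb, depth, and pose are guesses about the layout and should be adjusted to the actual dataset, and find_unmatched_frames is a hypothetical helper, not part of HOV-SG.

```python
import tempfile
from pathlib import Path


def find_unmatched_frames(root, subdirs=("rgb", "depth", "pose")):
    """Return, per subfolder, the frame names that are missing from it.

    Frames are matched by file name without the final extension.
    The subfolder names are assumptions; adjust them to the real layout.
    """
    root = Path(root)
    stems = {
        name: {p.stem for p in (root / name).iterdir() if p.is_file()}
        for name in subdirs
    }
    all_stems = set().union(*stems.values())
    return {name: sorted(all_stems - found) for name, found in stems.items()}


if __name__ == "__main__":
    # Tiny synthetic dataset: frame "b" has an RGB image but no depth or pose.
    with tempfile.TemporaryDirectory() as tmp:
        for sub, files in {
            "rgb": ["a.png", "b.png"],
            "depth": ["a.png"],
            "pose": ["a.txt"],
        }.items():
            (Path(tmp) / sub).mkdir()
            for f in files:
                (Path(tmp) / sub / f).touch()
        print(find_unmatched_frames(tmp))
```

An empty list for every subfolder means the three streams line up frame by frame; any other result points at frames to inspect first.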

Question about the process of collecting RGB-D and pose data in real-world

If I understand correctly, the HOV-SG method requires collecting RGB-D and pose data of a scene before it can be applied in the real world. However, the paper does not explain the data collection process in detail. Was the robotic dog driven through the scene manually, or did it move autonomously under an algorithm?
