Loci-Segmented: Improving Scene Segmentation Learning

TL;DR: Introducing Loci-Segmented, an extension to Loci with a dynamic background module. It demonstrates over 32% relative IoU improvement over the state of the art (SOTA) on the MOVi dataset.

Requirements

A suitable conda environment named loci-s can be created and activated with:

conda env create -f environment.yml
conda activate loci-s

Dataset and trained models

Preprocessed datasets together with model checkpoints can be found here

Reproducing the results from the paper

Make sure you have downloaded all necessary datasets and model checkpoints. To reproduce the MOVi results, run:

run-movi-evalulation.sh
python eval-movi.py

To reproduce the evaluation on the datasets presented in the review paper "Compositional scene representation learning via reconstruction: A survey", run:

run-review.sh
process-review.sh
python eval-review.py

Use your own data

We provide an example dataset creation script that you can adjust to your needs.

You can also inspect any compatible dataset using our Dataset Viewer:

data/plot_hdf5.py <dataset>.hdf5

Training Guide

Our training pipeline employs multi-GPU configurations and extensive pretraining to accelerate model convergence. Specifically, we use a single node with 8 x GTX1080 GPUs for the pretraining phase, and a single node with 8 x A100 GPUs for the final Loci-s training. Below are the details for each stage of the training pipeline.

Note: The following examples use a single GPU setup, which is suboptimal for performance. Multi-GPU configurations are highly recommended.

Pretraining Phases

Decoder Pretraining

Pretrain individual decoders for mask, depth, and RGB using the following commands:

python -m model.main -cfg configs/pretrain-mask-decoder.json --pretrain-objects --single-gpu
python -m model.main -cfg configs/pretrain-depth-decoder.json --pretrain-objects --single-gpu
python -m model.main -cfg configs/pretrain-rgb-decoder.json --pretrain-objects --single-gpu
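
If you prefer to launch the three decoder runs from one script, the following is a minimal sketch. It only wraps the commands listed above; the helper names (`decoder_pretrain_command`, `run_decoder_pretraining`) and the `dry_run` switch are hypothetical and not part of the repository, and `sys.executable` stands in for `python`.

```python
import subprocess
import sys

# Config paths copied from the commands above.
DECODER_CONFIGS = [
    "configs/pretrain-mask-decoder.json",
    "configs/pretrain-depth-decoder.json",
    "configs/pretrain-rgb-decoder.json",
]


def decoder_pretrain_command(config):
    """Build the argv list for one decoder pretraining run (hypothetical helper)."""
    return [
        sys.executable, "-m", "model.main",
        "-cfg", config,
        "--pretrain-objects", "--single-gpu",
    ]


def run_decoder_pretraining(dry_run=True):
    """Run the three decoder pretraining jobs one after another.

    With dry_run=True the commands are only printed and nothing is launched.
    """
    commands = [decoder_pretrain_command(cfg) for cfg in DECODER_CONFIGS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a run exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_decoder_pretraining(dry_run=True)
```

Call `run_decoder_pretraining(dry_run=False)` from the repository root to actually start the runs.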

Encoder-Decoder Pretraining

Pretrain the Loci encoder with already pretrained mask, depth, and RGB decoders:

python -m model.main -cfg configs/pretrain-encoder-decoder-stage1.json --pretrain-objects --single-gpu --load-mask <mask-decoder>.ckpt --load-depth <depth-decoder>.ckpt --load-rgb <rgb-decoder>.ckpt

For a version that utilizes depth as an input feature, append -depth to the config name.
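
As an illustration of the naming rule above, here is a small sketch that derives the depth variant of a config path. It assumes that "-depth" is inserted directly before the `.json` extension; check the `configs` directory for the actual file names. The function name `depth_variant` is hypothetical.

```python
from pathlib import PurePosixPath


def depth_variant(config):
    """Return the config path with "-depth" appended to the file stem.

    Assumption: "-depth" goes before the ".json" extension.
    """
    path = PurePosixPath(config)
    return str(path.with_name(f"{path.stem}-depth{path.suffix}"))


print(depth_variant("configs/pretrain-encoder-decoder-stage1.json"))
# prints configs/pretrain-encoder-decoder-stage1-depth.json
```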

Hyper-Network Pretraining

Execute three passes through the encoder-decoder architecture to train the internal hyper-networks:

python -m model.main -cfg configs/pretrain-encoder-decoder-stage2.json --pretrain-objects --single-gpu --load-stage1 <encoder-decoder>.ckpt

Background Module Pretraining

Train the background module:

python -m model.main -cfg configs/pretrain-background.json --pretrain-bg --single-gpu

Final Training: Loci-s

Execute full-scale training for Loci-s:

python -m model.main -cfg configs/loci-s.json --train --single-gpu --load-objects <encoder-decoder>.ckpt --load-bg <background>.ckpt
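
Because the final training depends on checkpoints from two earlier stages, it can help to verify that both files exist before starting a long run. The sketch below builds the command shown above and fails early if a checkpoint is missing. The helper name `final_training_command` is hypothetical, and `sys.executable` stands in for `python`.

```python
import sys
from pathlib import Path


def final_training_command(objects_ckpt, bg_ckpt, config="configs/loci-s.json"):
    """Build the Loci-s training command, failing early on missing checkpoints."""
    for ckpt in (objects_ckpt, bg_ckpt):
        if not Path(ckpt).is_file():
            raise FileNotFoundError(f"checkpoint not found: {ckpt}")
    return [
        sys.executable, "-m", "model.main",
        "-cfg", config,
        "--train", "--single-gpu",
        "--load-objects", str(objects_ckpt),
        "--load-bg", str(bg_ckpt),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.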

Visualization Guide

Generate visualizations to inspect the model at various stages of pretraining and during the final phase.

Pretraining Visualizations

To visualize individual components like mask, depth, RGB, objects, or background during pretraining:

python -m model.main -cfg <config> --save-<mask|depth|rgb|objects|bg> --single-gpu --add-text --load <checkpoint>.ckpt
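
The `--save-<...>` pattern above expands to one flag per component. A minimal sketch that picks the flag and rejects unknown component names could look like this; the helper name `pretraining_visualization_command` is hypothetical, and the list of components is taken from the pattern above.

```python
import sys

# Components named in the --save-<mask|depth|rgb|objects|bg> pattern above.
COMPONENTS = ("mask", "depth", "rgb", "objects", "bg")


def pretraining_visualization_command(config, checkpoint, component):
    """Build the visualization command for one pretraining component."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component {component!r}, expected one of {COMPONENTS}")
    return [
        sys.executable, "-m", "model.main",
        "-cfg", config,
        f"--save-{component}",
        "--single-gpu", "--add-text",
        "--load", checkpoint,
    ]
```

For example, passing `component="mask"` yields a command containing `--save-mask`.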

Final Model Visualizations

For visualizing the fully trained Loci-s model:

python -m model.main -cfg <config> --save --single-gpu --add-text --load <checkpoint>.ckpt

Note: To visualize using the segmentation pretraining network, append the --load-proposal flag followed by the corresponding checkpoint:

--load-proposal <proposal>.ckpt
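
To show where the optional flag ends up, here is a short sketch that assembles the final-model visualization command with and without `--load-proposal`. The helper name `final_visualization_command` is hypothetical, and `sys.executable` stands in for `python`.

```python
import sys


def final_visualization_command(config, checkpoint, proposal_checkpoint=None):
    """Build the Loci-s visualization command, optionally with a proposal checkpoint."""
    cmd = [
        sys.executable, "-m", "model.main",
        "-cfg", config,
        "--save", "--single-gpu", "--add-text",
        "--load", checkpoint,
    ]
    if proposal_checkpoint is not None:
        # Appended at the end, as described in the note above.
        cmd += ["--load-proposal", proposal_checkpoint]
    return cmd
```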
