Loci-Segmented: Improving Scene Segmentation Learning

TL;DR: Loci-Segmented extends Loci with a dynamic background module and achieves a relative IoU improvement of over 32% over the previous SOTA on the MOVi dataset.
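For readers unfamiliar with the metric, the following minimal sketch shows how IoU between two binary masks and a relative improvement between two scores can be computed. The masks and scores are illustrative values chosen for this example, not results from the paper, and the function names are hypothetical rather than part of this repository.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as flat sequences of 0/1."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


def relative_improvement(new_score, baseline_score):
    """Relative gain of new_score over baseline_score, as a fraction."""
    return (new_score - baseline_score) / baseline_score


# Illustrative masks only, not taken from the paper.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
print(iou(pred, target))  # 2 overlapping pixels / 4 pixels in the union = 0.5

# Hypothetical scores: a baseline IoU of 0.50 and a new IoU of 0.66.
print(round(relative_improvement(0.66, 0.50), 2))  # 0.32, i.e. 32% relative
```

Note that a relative improvement of 32% is measured against the baseline score; it is not an absolute gain of 32 IoU points.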

Requirements

A suitable conda environment named loci-s can be created and activated with:

conda env create -f environment.yml
conda activate loci-s

Dataset and trained models

Preprocessed datasets together with model checkpoints can be found here

Reproducing the results from the paper

Make sure you have downloaded all necessary datasets and model checkpoints. To reproduce the MOVi results, run:

run-movi-evalulation.sh
python eval-movi.py

To reproduce the evaluation on the datasets presented in the review paper "Compositional scene representation learning via reconstruction: A survey", run:

run-review.sh
process-review.sh
python eval-review.py

Use your own data

We provide an example dataset creation script that you can adjust to your needs.

You can also inspect any compatible dataset using our Dataset Viewer:

data/plot_hdf5.py <dataset>.hdf5
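If you only want a quick textual overview of what an HDF5 file contains before opening it in the viewer, a generic walk over the file can help. The sketch below is not part of this repository: `summarize` and `print_summary` are hypothetical helpers, and it assumes the third-party `h5py` package is installed. It makes no assumption about the group or dataset names used by the Loci-s datasets.

```python
def summarize(node, prefix=""):
    """Return (path, shape, dtype) for every array-like leaf below `node`.

    Groups are anything with an `items()` method (h5py.File and h5py.Group
    qualify); leaves are expected to expose `shape` and `dtype`.
    """
    rows = []
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if hasattr(child, "items"):
            # Nested group: descend and collect its leaves.
            rows.extend(summarize(child, path))
        else:
            rows.append((path, tuple(child.shape), str(child.dtype)))
    return rows


def print_summary(path):
    """Open an HDF5 file read-only and print one line per dataset."""
    import h5py  # third-party; imported lazily so summarize() works without it

    with h5py.File(path, "r") as f:
        for name, shape, dtype in summarize(f):
            print(f"{name}: shape={shape}, dtype={dtype}")
```

Calling `print_summary("<dataset>.hdf5")` with the path to your own file lists every dataset together with its shape and dtype.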

Training Guide

Our training pipeline employs multi-GPU configurations and extensive pretraining to accelerate model convergence. Specifically, we use a single node with 8 x GTX1080 GPUs for the pretraining phase, and a single node with 8 x A100 GPUs for the final Loci-s training. Below are the details for each stage of the training pipeline.

Note: The following examples use a single GPU setup, which is suboptimal for performance. Multi-GPU configurations are highly recommended.

Pretraining Phases

  1. Decoder Pretraining

    Pretrain individual decoders for mask, depth, and RGB using the following commands:

    python -m model.main -cfg configs/pretrain-mask-decoder.json --pretrain-objects --single-gpu
    python -m model.main -cfg configs/pretrain-depth-decoder.json --pretrain-objects --single-gpu
    python -m model.main -cfg configs/pretrain-rgb-decoder.json --pretrain-objects --single-gpu

  2. Encoder-Decoder Pretraining

    Pretrain the Loci encoder with already pretrained mask, depth, and RGB decoders:

    python -m model.main -cfg configs/pretrain-encoder-decoder-stage1.json --pretrain-objects --single-gpu --load-mask <mask-decoder>.ckpt --load-depth <depth-decoder>.ckpt --load-rgb <rgb-decoder>.ckpt

    For a version that utilizes depth as an input feature, append -depth to the config name.

  3. Hyper-Network Pretraining

    Execute three passes through the encoder-decoder architecture to train the internal hyper-networks:

    python -m model.main -cfg configs/pretrain-encoder-decoder-stage2.json --pretrain-objects --single-gpu --load-stage1 <encoder-decoder>.ckpt

  4. Background Module Pretraining

    Train the background module:

    python -m model.main -cfg configs/pretrain-background.json --pretrain-bg --single-gpu

Final Training: Loci-s

Execute full-scale training for Loci-s:

python -m model.main -cfg configs/loci-s.json --train --single-gpu --load-objects <encoder-decoder>.ckpt --load-bg <background>.ckpt
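The stages above depend on each other through their checkpoints, so it can help to see the whole sequence in one place. The following sketch only assembles and prints the single-GPU commands listed in this guide; it does not run them. The helper names (`stage_command`, `pipeline`), the dictionary keys, and the checkpoint file names are hypothetical and chosen for illustration. The config paths and flags are the ones shown above.

```python
import shlex

BASE = ["python", "-m", "model.main"]


def stage_command(config, mode_flag, loads=None):
    """Build one `model.main` invocation as an argument list.

    `loads` maps a load flag (e.g. "--load-mask") to a checkpoint path.
    """
    cmd = BASE + ["-cfg", f"configs/{config}.json", mode_flag, "--single-gpu"]
    for flag, checkpoint in (loads or {}).items():
        cmd += [flag, checkpoint]
    return cmd


def pipeline(ckpt):
    """Return the training commands in the order given in this guide.

    `ckpt` holds checkpoint paths under hypothetical keys chosen for this sketch.
    """
    pre = "--pretrain-objects"
    return [
        # 1. Decoder pretraining (mask, depth, RGB)
        stage_command("pretrain-mask-decoder", pre),
        stage_command("pretrain-depth-decoder", pre),
        stage_command("pretrain-rgb-decoder", pre),
        # 2. Encoder-decoder pretraining on top of the three decoders
        stage_command("pretrain-encoder-decoder-stage1", pre, {
            "--load-mask": ckpt["mask"],
            "--load-depth": ckpt["depth"],
            "--load-rgb": ckpt["rgb"],
        }),
        # 3. Hyper-network pretraining, starting from the stage 1 checkpoint
        stage_command("pretrain-encoder-decoder-stage2", pre,
                      {"--load-stage1": ckpt["stage1"]}),
        # 4. Background module pretraining
        stage_command("pretrain-background", "--pretrain-bg"),
        # Final Loci-s training from the object and background checkpoints
        stage_command("loci-s", "--train", {
            "--load-objects": ckpt["objects"],
            "--load-bg": ckpt["bg"],
        }),
    ]


# Hypothetical checkpoint file names; use the files your own runs produce.
checkpoints = {
    "mask": "mask-decoder.ckpt",
    "depth": "depth-decoder.ckpt",
    "rgb": "rgb-decoder.ckpt",
    "stage1": "stage1.ckpt",
    "objects": "encoder-decoder.ckpt",
    "bg": "background.ckpt",
}
for cmd in pipeline(checkpoints):
    print(shlex.join(cmd))
```

Each stage must finish and write its checkpoint before the next command that loads it can start. To actually launch a stage from Python, each argument list could be passed to `subprocess.run(cmd, check=True)`.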

Visualization Guide

Generate visualizations to inspect the model at various stages of pretraining and during the final phase.

Pretraining Visualizations

To visualize individual components like mask, depth, RGB, objects, or background during pretraining:

python -m model.main -cfg <config> --save-<mask|depth|rgb|objects|bg> --single-gpu --add-text --load <checkpoint>.ckpt

Final Model Visualizations

For visualizing the fully trained Loci-s model:

python -m model.main -cfg <config> --save --single-gpu --add-text --load <checkpoint>.ckpt

Note: To visualize using the segmentation pretraining network, append the --load-proposal flag followed by the corresponding checkpoint:

--load-proposal <proposal>.ckpt
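The two visualization commands differ only in the save flag and the optional proposal checkpoint. As a minimal sketch, the helper below builds either variant as an argument list. `visualization_command` is a hypothetical name, not a function from this repository, and the config and checkpoint paths in the example are placeholders. The flags themselves are the ones documented above.

```python
COMPONENTS = ("mask", "depth", "rgb", "objects", "bg")


def visualization_command(config, checkpoint, component=None, proposal=None):
    """Build a visualization command as an argument list.

    component: one of COMPONENTS for a pretraining stage, or None for the
               fully trained model (plain --save).
    proposal:  optional checkpoint passed via --load-proposal.
    """
    if component is not None and component not in COMPONENTS:
        raise ValueError(f"unknown component: {component!r}")
    save_flag = "--save" if component is None else f"--save-{component}"
    cmd = ["python", "-m", "model.main", "-cfg", config, save_flag,
           "--single-gpu", "--add-text", "--load", checkpoint]
    if proposal is not None:
        cmd += ["--load-proposal", proposal]
    return cmd


# Placeholder paths for illustration only.
print(" ".join(visualization_command("configs/pretrain-mask-decoder.json",
                                     "mask-decoder.ckpt", component="mask")))
print(" ".join(visualization_command("configs/loci-s.json", "loci-s.ckpt",
                                     proposal="proposal.ckpt")))
```

Rejecting unknown component names up front avoids passing a `--save-...` flag that the command line does not document.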
