
ssms_event_cameras's Introduction

State Space Models for Event Cameras (Spotlight)

[YouTube video]

This is the official PyTorch implementation of the CVPR 2024 paper State Space Models for Event Cameras.

🖼️ Check Out Our Poster! 🖼️ here

✅ Updates

  • June 14th, 2024: Everything is updated! Poster released! Check it out above.
  • June 6th, 2024: Video released! To watch our video, simply click on the YouTube play button above.
  • June 1st, 2024: Our CVPR conference paper has also been accepted as a Spotlight presentation at "The 3rd Workshop on Transformers for Vision (T4V)."
  • April 19th, 2024: The code, along with the best checkpoints, is released! The poster and video will be released shortly before CVPR 2024.

Citation

If you find this work and/or code useful, please cite our paper:

@InProceedings{Zubic_2024_CVPR,
  author  = {Zubi\'c, Nikola and Gehrig, Mathias and Scaramuzza, Davide},
  title   = {State Space Models for Event Cameras},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2024},
}

SSM-ViT

  • The S5 model used in our SSM-ViT pipeline can be seen here.
  • In particular, S5 is used instead of an RNN in a 4-stage hierarchical ViT backbone, and its forward function is exposed here. The advantage of this approach is that we do not need a 'for' loop over the sequence dimension; instead, we employ a parallel scan algorithm (a minimal sketch follows this list). This model assumes that a hidden state is carried over.
  • For a standalone model that can be used for any sequence modeling problem, this formulation with a carried-over hidden state is not used by default. The implementation matches the original JAX implementation and can be downloaded as a zip archive from ssms_event_cameras/RVT/models/s5.zip.
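
The idea behind replacing the time loop with a parallel scan can be illustrated with a minimal, self-contained sketch (an illustration only, not the repository's implementation): a diagonal linear recurrence x_t = A x_{t-1} + B u_t is an associative composition of affine maps, so all T states can be computed in O(log T) combine steps rather than a length-T loop.

import torch

def sequential_recurrence(A, Bu, x0):
    # Reference: naive loop over the sequence (time) dimension.
    # A: (N,) diagonal state matrix, Bu: (T, N) inputs already multiplied by B, x0: (N,) carried-over state.
    xs, x = [], x0
    for t in range(Bu.shape[0]):
        x = A * x + Bu[t]
        xs.append(x)
    return torch.stack(xs)

def scan_recurrence(A, Bu, x0):
    # Same recurrence as an inclusive scan with the associative operator
    # (a1, b1) o (a2, b2) = (a1 * a2, a2 * b1 + b2): Hillis-Steele doubling needs
    # only O(log T) combine rounds, each fully vectorized over the time dimension.
    T, N = Bu.shape
    a = torch.cat([torch.ones(1, N), A.expand(T - 1, N)], dim=0)
    b = Bu.clone()
    b[0] = A * x0 + Bu[0]  # fold the carried-over hidden state into the first element
    step = 1
    while step < T:
        a_prev = torch.cat([torch.ones(step, N), a[:-step]], dim=0)
        b_prev = torch.cat([torch.zeros(step, N), b[:-step]], dim=0)
        b = a * b_prev + b
        a = a * a_prev
        step *= 2
    return b  # b[t] holds the state after processing input t

A = torch.full((4,), 0.9)
Bu = torch.randn(8, 4)
x0 = torch.randn(4)
assert torch.allclose(sequential_recurrence(A, Bu, x0), scan_recurrence(A, Bu, x0), atol=1e-5)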

Installation

Conda

We highly recommend using Mambaforge to reduce the installation time.

conda create -y -n events_signals python=3.11
conda activate events_signals
conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install lightning wandb pandas plotly opencv-python tabulate pycocotools bbox-visualizer StrEnum hydra-core einops torchdata tqdm numba h5py hdf5plugin lovely-tensors tensorboardX pykeops scikit-learn          
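
Optionally, a quick sanity check (not part of the repository) that the CUDA build of PyTorch was installed correctly:

import torch

print(torch.__version__)               # expected 2.2.1 with the command above
print(torch.cuda.is_available())       # should print True for the CUDA 11.8 build
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))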

Required Data

To evaluate or train the S5-ViT model, you will need to download the required preprocessed datasets:

                        1 Mpx       Gen1
pre-processed dataset   download    download
crc32                   c5ec7c38    5acab6f3

You may also pre-process the dataset yourself by following the instructions.

Pre-trained Checkpoints

1 Mpx

                         S5-ViT-Base   S5-ViT-Small
pre-trained checkpoint   download      download

Gen1

                         S5-ViT-Base   S5-ViT-Small
pre-trained checkpoint   download      download

Evaluation

  • Evaluation scripts with the concrete parameters we used to train our models can be seen here.

  • Set DATA_DIR as the path to either the 1 Mpx or Gen1 dataset directory

  • Set CKPT_PATH to the path of the correct checkpoint matching the choice of the model and dataset

  • Set

    • MDL_CFG=base or
    • MDL_CFG=small

    to load either the base or small model configuration.

  • Set GPU_ID to the PCI bus ID of the GPU that you want to use, e.g. GPU_ID=0. Only a single GPU is supported for evaluation.

1 Mpx

python RVT/validation.py dataset=gen4 dataset.path=${DATA_DIR} checkpoint=${CKPT_PATH} \
use_test_set=1 hardware.gpus=${GPU_ID} +experiment/gen4="${MDL_CFG}.yaml" \
batch_size.eval=12 model.postprocess.confidence_threshold=0.001

Gen1

python RVT/validation.py dataset=gen1 dataset.path=${DATA_DIR} checkpoint=${CKPT_PATH} \
use_test_set=1 hardware.gpus=${GPU_ID} +experiment/gen1="${MDL_CFG}.yaml" \
batch_size.eval=8 model.postprocess.confidence_threshold=0.001

We use the same batch size for evaluation and training: 12 for the 1 Mpx dataset and 8 for the Gen1 dataset.

Evaluation results

Evaluation should give the same results as shown below:

  • 47.7 and 47.8 mAP on Gen1 and 1 Mpx datasets for the base model, and
  • 46.6 and 46.5 mAP on Gen1 and 1 Mpx datasets for the small model.

Training

  • Set DATA_DIR as the path to either the 1 Mpx or Gen1 dataset directory

  • Set

    • MDL_CFG=base or
    • MDL_CFG=small

    to load either the base or the small configuration.

  • Set GPU_IDS to the PCI bus IDs of the GPUs that you want to use, e.g. GPU_IDS=[0,1] to use GPUs 0 and 1. Using a list of IDs enables single-node multi-GPU training. Pay attention to the batch size, which is defined per GPU.

  • Set BATCH_SIZE_PER_GPU such that the effective batch size matches the parameters below. The effective batch size is (batch size per GPU) * (number of GPUs).

  • If you would like to change the effective batch size, we found the following learning rate scaling to work well for all models on both datasets:

    lr = 2e-4 * sqrt(effective_batch_size/8); a short worked example follows this list.

  • The training code uses W&B for logging during training. Hence, we assume that you have a W&B account.

    • The training script below will create a new project called ssms_event_cameras. Adapt the project name and group name if necessary.
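
As a worked example of the scaling rule above (plain arithmetic, not a helper provided by this repository):

import math

def scaled_lr(effective_batch_size, base_lr=2e-4, base_batch_size=8):
    # lr = 2e-4 * sqrt(effective_batch_size / 8)
    return base_lr * math.sqrt(effective_batch_size / base_batch_size)

print(scaled_lr(8))    # 2.00e-4  (default effective batch size for Gen1)
print(scaled_lr(12))   # ~2.45e-4 (default effective batch size for 1 Mpx)
print(scaled_lr(16))   # ~2.83e-4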

1 Mpx

  • The effective batch size for the 1 Mpx training is 12.
  • For training the model on the 1 Mpx dataset, we need 2x A100 80 GB GPUs and use 12 workers per GPU for training and 4 workers per GPU for evaluation:
GPU_IDS=[0,1]
BATCH_SIZE_PER_GPU=6
TRAIN_WORKERS_PER_GPU=12
EVAL_WORKERS_PER_GPU=4
python RVT/train.py model=rnndet dataset=gen4 dataset.path=${DATA_DIR} wandb.project_name=ssms_event_cameras \
wandb.group_name=1mpx +experiment/gen4="${MDL_CFG}.yaml" hardware.gpus=${GPU_IDS} \
batch_size.train=${BATCH_SIZE_PER_GPU} batch_size.eval=${BATCH_SIZE_PER_GPU} \
hardware.num_workers.train=${TRAIN_WORKERS_PER_GPU} hardware.num_workers.eval=${EVAL_WORKERS_PER_GPU}

If, for example, you want to execute the training on 4 GPUs, simply adapt GPU_IDS and BATCH_SIZE_PER_GPU accordingly:

GPU_IDS=[0,1,2,3]
BATCH_SIZE_PER_GPU=3

Gen1

  • The effective batch size for the Gen1 training is 8.
  • For training the model on the Gen1 dataset, we need 1x A100 80 GB GPU, using 24 workers for training and 8 workers for evaluation:
GPU_IDS=0
BATCH_SIZE_PER_GPU=8
TRAIN_WORKERS_PER_GPU=24
EVAL_WORKERS_PER_GPU=8
python RVT/train.py model=rnndet dataset=gen1 dataset.path=${DATA_DIR} wandb.project_name=ssms_event_cameras \
wandb.group_name=gen1 +experiment/gen1="${MDL_CFG}.yaml" hardware.gpus=${GPU_IDS} \
batch_size.train=${BATCH_SIZE_PER_GPU} batch_size.eval=${BATCH_SIZE_PER_GPU} \
hardware.num_workers.train=${TRAIN_WORKERS_PER_GPU} hardware.num_workers.eval=${EVAL_WORKERS_PER_GPU}

Code Acknowledgments

This project has used code from the following projects:

  • RVT - Recurrent Vision Transformers for Object Detection with Event Cameras in PyTorch
  • S4 - Structured State Spaces for Sequence Modeling, in particular S4 and S4D models in PyTorch
  • S5 - Simplified State Space Layers for Sequence Modeling in JAX
  • S5 PyTorch - S5 model in PyTorch

ssms_event_cameras's People

Contributors

davsca, nikolazubic


ssms_event_cameras's Issues

Questions about the train time

Hi @NikolaZubic ,
Thanks for your wonderful work on event cameras.

Recently, I have been trying to replicate your work. I have installed the corresponding PyTorch and other libraries according to your README file. However, with four 4090s, the training speed on the Gen1 dataset is only 0.8 it/s. Is this speed normal?
What is the training speed on an A100?

Following is my training script.

#!/bin/bash
DATA_DIR="/home/data/gen1"
MDL_CFG=small
GPU_IDS=[0,1,2,3]
BATCH_SIZE_PER_GPU=4
TRAIN_WORKERS_PER_GPU=8
EVAL_WORKERS_PER_GPU=4
python RVT/train.py model=rnndet dataset=gen1 dataset.path=${DATA_DIR} wandb.project_name=ssms_event_cameras \
wandb.group_name=gen1 +experiment/gen1="${MDL_CFG}.yaml" hardware.gpus=${GPU_IDS} \
batch_size.train=${BATCH_SIZE_PER_GPU} batch_size.eval=${BATCH_SIZE_PER_GPU} \
hardware.num_workers.train=${TRAIN_WORKERS_PER_GPU} hardware.num_workers.eval=${EVAL_WORKERS_PER_GPU}

Questions about the paper

Hi @NikolaZubic

Hello, recently I carefully studied the paper and source code. There are a few aspects I don't quite understand.

Firstly, in the process of Output masking, why is only 'C' masked and not 'A' and 'B'?

Secondly, in your code, if only 'step_scale' is adjusted, the mask of 'C' remains unaffected.

step = step_scale * torch.exp(self.log_step)

freqs = step / step_scale * self.Lambda[:, 1].abs() / (2 * math.pi)
mask = torch.where(freqs < bandlimit * 0.5, 1, 0)  # (64, )

So freqs is not linked to the value of step_scale (freqs = step / step_scale * self.Lambda[:, 1].abs() / (2 * math.pi) simplifies to freqs = torch.exp(self.log_step) * self.Lambda[:, 1].abs() / (2 * math.pi)). Is the code wrong?

Question 2 is accompanied by another question, Question 3. In the paper, it's only mentioned that generalization from low frequency to high frequency is achieved by masking 'C', but in the code, the discretization of 'A' and 'B' is also related to 'step_scale'. So, I wonder, for generalization from low frequency to high frequency, is it necessary to adjust all three values 'A', 'B', and 'C'?

if not torch.is_tensor(step_scale) or step_scale.ndim == 0:
    # step_scale = torch.ones(signal.shape[-2], device=signal.device) * step_scale
    step = step_scale * torch.exp(self.log_step)

FLOPs

Thank you for your wonderful work. How can I calculate the FLOPs of the model?
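
One generic way to estimate FLOPs for a PyTorch model, not specific to this repository, is fvcore's FlopCountAnalysis; the toy model, channel count, and resolution below are placeholders, and custom ops the tool does not recognize are skipped:

import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis  # pip install fvcore

# Toy stand-in model; replace with the actual detector and an event-representation
# tensor of the correct shape for your dataset.
model = nn.Sequential(nn.Conv2d(20, 32, 3, padding=1), nn.ReLU(), nn.Conv2d(32, 64, 3, padding=1))
dummy_input = torch.randn(1, 20, 240, 304)  # hypothetical Gen1-like resolution; adjust as needed
flops = FlopCountAnalysis(model, dummy_input)
print(f"{flops.total() / 1e9:.2f} GFLOPs")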

Evaluation time question

Hello! I have been interested in RVT for a long time. I think you have done a great job! I have a question and I hope you can help me answer it. After reproducing RVT, I get the following evaluation output:

Loading and preparing results... DONE (t=0.41s)
creating index... index created!
Running per image evaluation... Evaluate annotation type *bbox*... DONE (t=7.23s)
Accumulating evaluation results... DONE (t=2.31s)

May I ask how to measure the inference time in ms, like the 7.81 ms reported in your paper on the Gen1 dataset?
I am using a 3090 Ti GPU with a batch_size of 1, so what should the speed in ms be?
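
A common, repository-agnostic way to measure per-forward-pass latency in milliseconds on a GPU is to synchronize around the timed region; the model and input below are placeholders:

import time
import torch
import torch.nn as nn

@torch.no_grad()
def time_forward_ms(model, x, warmup=10, iters=100):
    for _ in range(warmup):          # warm-up so one-time launch/caching costs are excluded
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()         # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / iters * 1000.0

# Toy stand-in; replace with the detector and a batch-size-1 event tensor.
model = nn.Sequential(nn.Conv2d(20, 64, 3, padding=1), nn.ReLU()).cuda().eval()
x = torch.randn(1, 20, 240, 304, device="cuda")
print(f"{time_forward_ms(model, x):.2f} ms per forward pass")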

Reproduced results differ from paper and training time inquiry

Hi, thank you for publishing great work!

I have been trying to reproduce the results on the Gen1 dataset, but my findings are significantly different from those reported in the paper. Below are the details of my experiment setup and the results I obtained.

Environment Configuration:

GPU_IDS: [0,1]         
BATCH_SIZE_PER_GPU: 4  
TRAIN_WORKERS_PER_GPU: 12
EVAL_WORKERS_PER_GPU: 4
DATA_DIR: /root/data/gen1
MDL_CFG: base

I have used an A6000 40GB GPU (two units) instead of the A100 80GB (one unit) as mentioned in the GitHub repository. Since the total memory is the same, I divided the configurations accordingly to allocate the memory across the two GPUs.

I have not modified anything in RVT/config/experiment/gen1/default.yaml.

Observed Results:
During training, I noticed that the max_step was set to 400,000, causing the process to stop after one epoch. The total training time was approximately 67 hours (2 days and 19 hours).

Here are the validation metrics I obtained:

- val/AP: 0.4406683404220787
- val/AP_50: 0.6706345137471073
- val/AP_75: 0.47589861083495194
- val/AP_L: 0.3979635967270493
- val/AP_M: 0.5222804973442176
- val/AP_S: 0.3784906607967766

These results are considerably lower than those reported in the paper.

Questions:
1. Why are the results different from the paper?
2. Is the training time normal? Is it expected to only run for 1 epoch?

Additionally, in the supplementary material, there are detection results images for the DSEC dataset. Could you please let me know if there are any available results or code for training on the DSEC dataset?

Thank you for your assistance.

Testing in a higher frequency

Hi @NikolaZubic

Thanks for your nice work and open-source release!
I have a question: when testing at a higher frequency, which parameters should we change?

I have found two places involving 'step_scale':

class S5SSM(torch.nn.Module):
    def __init__(
        self,
        lambdaInit: torch.Tensor,
        V: torch.Tensor,
        Vinv: torch.Tensor,
        h: int,
        p: int,
        dt_min: float,
        dt_max: float,
        liquid: bool = False,
        factor_rank: Optional[int] = None,
        discretization: Literal["zoh", "bilinear"] = "bilinear",
        bcInit: Initialization = "factorized",
        degree: int = 1,
        bidir: bool = False,
        step_scale: float = 1.0,
        bandlimit: Optional[float] = None,
    ):

and

    def forward(self, signal, prev_state, step_scale: float | torch.Tensor = 1.0): 

in the S5SSM module.

If I want to test at a frequency of 200 Hz (with training at a frequency of 20 Hz), how should I adjust the parameter?
Should I change step_scale from 1.0 to 0.1 in both places?

Questions about datasets

Hello, I am unable to download the dataset from your server right now. Could you please check if it is a server issue? Thanks!

S5 pytorch license

Hey there,

I'm the author of s5-pytorch, and noticed you added the code to the project. I'm happy it's been of use to you, but I would appreciate it if the license for that directory indicated the original license it had (MPL-2.0), or included a link to the original source so the license can be found there.

Thanks in advance
