This repository contains the PyTorch implementation of the CVPR 2024 paper (Highlight), IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection.
Issue Description:
Hi there, thank you for the excellent code release. However, I've run into a problem: the multi_scale_deformable_attn_function module referenced by the middle encoders is missing from the repository. It should provide MultiScaleDeformableAttnFunction_fp32 and ms_deform_attn_core_pytorch, which are required to run the code and reproduce the results reported in the paper. Without this module, the codebase cannot be used as released.
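For anyone blocked on this: ms_deform_attn_core_pytorch is the well-known pure-PyTorch fallback for multi-scale deformable attention from the Deformable DETR codebase (MultiScaleDeformableAttnFunction_fp32 additionally needs the compiled CUDA extension, which this sketch does not replace). Below is my transcription of that fallback, not the authors' missing file, so treat it as a stopgap:

```python
import torch
import torch.nn.functional as F


def ms_deform_attn_core_pytorch(value, value_spatial_shapes,
                                sampling_locations, attention_weights):
    """Pure-PyTorch multi-scale deformable attention (slow debug/CPU fallback).

    value:                (N, S, M, D)  flattened multi-level features, S = sum(H*W)
    value_spatial_shapes: list of (H, W), one per feature level
    sampling_locations:   (N, Lq, M, L, P, 2) in [0, 1]
    attention_weights:    (N, Lq, M, L, P), normalized over the L*P samples
    returns:              (N, Lq, M*D)
    """
    N, S, M, D = value.shape
    _, Lq, M, L, P, _ = sampling_locations.shape
    # split the flattened value tensor back into per-level feature maps
    value_list = value.split([H * W for H, W in value_spatial_shapes], dim=1)
    # map [0, 1] sampling locations to grid_sample's [-1, 1] coordinate range
    sampling_grids = 2 * sampling_locations - 1
    sampling_value_list = []
    for lid, (H, W) in enumerate(value_spatial_shapes):
        # (N, H*W, M, D) -> (N*M, D, H, W)
        value_l = value_list[lid].flatten(2).transpose(1, 2).reshape(N * M, D, H, W)
        # (N, Lq, M, P, 2) -> (N*M, Lq, P, 2)
        grid_l = sampling_grids[:, :, :, lid].transpose(1, 2).flatten(0, 1)
        # bilinear sampling at the P points per query: (N*M, D, Lq, P)
        sampling_value_list.append(
            F.grid_sample(value_l, grid_l, mode='bilinear',
                          padding_mode='zeros', align_corners=False))
    # (N, Lq, M, L, P) -> (N*M, 1, Lq, L*P)
    attention_weights = attention_weights.transpose(1, 2).reshape(N * M, 1, Lq, L * P)
    # weighted sum over all levels and points, then restore (N, Lq, M*D)
    output = (torch.stack(sampling_value_list, dim=-2).flatten(-2)
              * attention_weights).sum(-1).view(N, M * D, Lq)
    return output.transpose(1, 2).contiguous()
```

Dropping this into a multi_scale_deformable_attn_function.py under the middle encoders should at least unblock CPU runs; matching the reported numbers still requires the fp32 CUDA kernel.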
Thank you very much for sharing your excellent work! I'd like to record some implementation details.
Some files appear to be missing from mmdet3d/models.
I've tried to supplement them with the corresponding files from the AutoAlignV2 project.
In mmdet3d/models/__init__.py, I had to delete line 20 (from .vtransforms import *) so the code would run.
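For clarity, the resulting edit looks like the excerpt below. The surrounding imports are typical mmdet3d entries shown only for context and may not match the actual file; the point is simply that the vtransforms line is removed:

```python
# mmdet3d/models/__init__.py (illustrative excerpt, not the verbatim file)
from .backbones import *  # noqa: F401,F403
from .builder import *  # noqa: F401,F403
from .detectors import *  # noqa: F401,F403
# from .vtransforms import *  # line 20: deleted, since the vtransforms
#                             # package is not present in this repository
```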
However, I still have some questions:
In the project, I can't find the Point-to-Grid Transformer module mentioned in your paper.
How did you obtain the baseline TransFusion checkpoint that achieves 65.1 mAP? Did you train it from scratch? The checkpoint you provided (IS-Fusion_epoch_10.pth) appears to contain final weights rather than a pre-trained model.
Sorry for taking up your time; I look forward to your reply.