
clip2scene's People

Contributors

runnanchen


clip2scene's Issues

GPU memory

Thank you for your impressive work and for publishing the code!
How much GPU memory is needed for “Annotation-free” and “Fine-tuning”?
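Not an answer from the authors, but you can measure the peak yourself for either setting with PyTorch's built-in memory counters; a minimal sketch:

    import torch

    # Wrap one training step of pretrain.py / downstream fine-tuning
    # with these calls to see the peak allocation on your GPU.
    torch.cuda.reset_peak_memory_stats()
    # ... run one forward/backward pass here ...
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"peak GPU memory: {peak_gib:.2f} GiB")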

Question about Semantic-guided Spatial-temporal Consistency Regularization

Thanks for the great work!
I have three questions about Semantic-guided Spatial-temporal Consistency Regularization.

  1. What is the reason for dividing the complete stitched point cloud into regular grids rather than using short-term temporality directly?
  2. What does the symbol * represent in Equation 3? Does it indicate a cross product operation?
  3. It is stated that the image is matched to the first frame of the point cloud $P_1$ using pixel-point correspondences $\{\hat{x}_i^1, \hat{p}_i^1\}_{i=1}^{\hat{M}}$. This implies that for values of $k$ ranging from 1 to $K$, we have $t_{\hat{i}}^k = t_{\hat{i}}^1$ and $\hat{x}_{\hat{i}}^k = \hat{x}_{\hat{i}}^1$. However, in Equation 4, the text embeddings are denoted as $t_{\hat{i}}^1$, while the image embeddings are denoted as $\hat{x}_{\hat{i}}^{\hat{k}}$. Why is this the case?

Performance on the detection task

Thank you for the great work. Have you tested this method on the 3D object detection task? If not, do you think it could benefit 3D object detection? Thanks.

No such file or directory: './list_keyframes_train.json'

        # downstream/dataloader_nuscenes.py: every phase reads its keyframe
        # list and pixel-point correspondence info from hardcoded relative
        # paths in the current working directory.
        if phase in ("train", "val", "test", "parametrizing", "verifying"):
            with open(f"./list_keyframes_{phase}.json", "r") as f:
                self.list_keyframes = json.load(f)
            with open(f"./save_dict_{phase}.json", "r") as f:
                self.frames_corrs_info = json.load(f)

Can you help me obtain the list_keyframes_train.json file? Thanks!

Hardcoded paths

Thank you for releasing the code. The downstream/dataloader_nuscenes.py file contains multiple hardcoded paths that set the class attributes self.list_keyframes and self.frames_corrs_info.

How do you generate these JSON files?
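They don't appear to be documented. As a starting point, here is a hedged sketch of how a keyframe list could be rebuilt with the official nuScenes devkit; the schema (which fields the dataloader expects, and what goes into the save_dict_* files) is an assumption, not the repo's confirmed format:

    import json
    from nuscenes.nuscenes import NuScenes

    # Assumption: list_keyframes_train.json holds one record per keyframe
    # sample; the real schema used by dataloader_nuscenes.py may differ,
    # and splitting scenes into train/val (nuscenes.utils.splits) is omitted.
    nusc = NuScenes(version="v1.0-trainval", dataroot="/path/to/nuscenes")

    keyframes = [
        {"token": sample["token"],
         "LIDAR_TOP": sample["data"]["LIDAR_TOP"]}  # lidar sample_data token
        for sample in nusc.sample                   # every sample is a keyframe
    ]

    with open("./list_keyframes_train.json", "w") as f:
        json.dump(keyframes, f)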


Confusion about the calculation of image_pred

Hi, thanks for your great work.

m_pred = tuple(pairing_images.T.long())              # one index tensor per dimension
image_global_allpoints = image_global.permute(0, 2, 3, 1)[m_pred]  # (num_points, C)
image_pred = image_pred[m_pred]

However, I have some confusion about the above calculation process (L213-L215 in lightning_trainer.py).

What is the specific meaning of m_pred?

Why can image_global/image_pred be indexed with m_pred even though their shapes are not the same as m_pred's? This operation puzzles me.

If you can provide an answer, thank you very much!
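Not an official answer, but a minimal self-contained sketch of the PyTorch advanced-indexing pattern involved, assuming each row of pairing_images is a (batch, row, column) pixel index for one matched 3D point:

    import torch

    B, C, H, W = 2, 8, 4, 6
    image_global = torch.randn(B, C, H, W)          # dense per-pixel features

    # Assumed layout: one (batch, row, col) triple per matched point.
    pairing_images = torch.tensor([[0, 1, 2],
                                   [1, 3, 5]])

    # Transposing yields one index tensor per tensor dimension; passing the
    # tuple triggers advanced indexing, selecting one pixel per matched point.
    m_pred = tuple(pairing_images.T.long())
    feats = image_global.permute(0, 2, 3, 1)[m_pred]  # shape: (num_points, C)
    assert feats.shape == (2, C)

So m_pred is not an array with the same shape as image_global; it is a tuple of per-dimension index vectors, which is why the shapes need not match.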

model_points method for nuscenes pretrain

Hi, thanks for the amazing work and for sharing the code base. I really appreciate the effort. Can you please let me know which model_points network to use, minkunet or voxelnet, for pre-training on the nuScenes dataset?

Here in the config, it's selected as minkunet:

model_points : "minkunet"

However, I get an assertion error due to a mismatch in the SparseTensor type here:

sparse_input = spvcnn_SparseTensor(sinput_F, sinput_C)

Can you please help? Thank you.
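Not from the repo, but a hypothetical sketch of the mismatch: MinkUNet consumes MinkowskiEngine sparse tensors, while spvcnn_SparseTensor is torchsparse-based, so the input must be built to match model_points. The helper name, the backend mapping, and the constructor signatures (which vary across MinkowskiEngine/torchsparse versions) are all assumptions:

    def build_sparse_input(sinput_F, sinput_C, model_points):
        """Hypothetical helper: wrap (features, coords) for the chosen backbone."""
        if model_points == "minkunet":
            # MinkowskiEngine-based MinkUNet; an spvcnn_SparseTensor here
            # would trip the type assertion reported above.
            import MinkowskiEngine as ME
            return ME.SparseTensor(features=sinput_F, coordinates=sinput_C)
        else:
            # SPVCNN-style (voxel) backbones expect a torchsparse tensor.
            from torchsparse import SparseTensor
            return SparseTensor(sinput_F, sinput_C)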

Question about Semantic-guided Spatial-temporal Consistency Regularization

Thanks for your great work! I have some questions about the Semantic-guided Spatial-temporal Consistency Regularization.

If we want to make loss_ssr smaller, we need to make $D(p, f_n)$ larger in each grid. According to Equation 3 in your paper, making $D(p, f_n)$ larger requires making $b$ much larger than $a$. According to Equation 4, $b$ being much greater than $a$ means that $D(p, t)$ is much greater than $D(x, t)$. I want to know whether I understand this correctly, and why this has a constraining effect.

How to obtain the ‘ViT16_clip_weights.pth’ ?

Thank you for this excellent work.

When I run pre-training with python pretrain.py --cfg_file config/clip2scene_nuscenes_pretrain.yaml, the following error is reported.

FileNotFoundError: [Errno 2] No such file or directory: '/nvme/konglingdong/runnan/CLIP2Scene/ViT16_clip_weights.pth'

How can I obtain the file ViT16_clip_weights.pth?
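One plausible way to produce such a file, assuming it is simply the state dict of OpenAI's CLIP ViT-B/16 saved with torch.save; the repo may expect renamed keys or only the visual encoder, so treat this as a guess:

    import clip   # pip install git+https://github.com/openai/CLIP.git
    import torch

    # Download pretrained CLIP ViT-B/16 and dump its weights to the
    # filename the pretrain script looks for (assumed format).
    model, _ = clip.load("ViT-B/16", device="cpu")
    torch.save(model.state_dict(), "ViT16_clip_weights.pth")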
