
clip2scene's People

Contributors

runnanchen


clip2scene's Issues

GPU memory

Thank you for your impressive work and for publishing the code!
How much GPU memory is needed for “Annotation-free” and “Fine-tuning”?
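Not an answer from the authors, but you can measure the peak yourself for either setting with PyTorch's built-in memory counters; a minimal sketch:

    import torch

    # Wrap one training step of pretrain.py / downstream fine-tuning
    # with these calls to see the peak allocation on your GPU.
    torch.cuda.reset_peak_memory_stats()
    # ... run one forward/backward pass here ...
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    print(f"peak GPU memory: {peak_gib:.2f} GiB")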

Question about Semantic-guided Spatial-temporal Consistency Regularization

Thanks for the great work!
I have three questions about Semantic-guided Spatial-temporal Consistency Regularization.

  1. What is the reason for dividing the complete stitched point cloud into regular grids rather than using short-term temporality directly?
  2. What does the symbol * represent in Equation 3? Does it indicate a cross product operation?
  3. It is stated that the image is matched to the first frame of the point cloud $P_1$ using pixel-point correspondences $\{\hat{x}_i^1, \hat{p}_i^1\}_{i=1}^{\hat{M}}$. This implies that for values of $k$ ranging from 1 to $K$, we have $t_{\hat{i}}^k = t_{\hat{i}}^1$ and $\hat{x}_{\hat{i}}^k = \hat{x}_{\hat{i}}^1$. However, in Equation 4, the text embeddings are denoted as $t_{\hat{i}}^1$, while the image embeddings are denoted as $\hat{x}_{\hat{i}}^{\hat{k}}$. Why is this the case?

Performance on the detection task

Thank you for the great work. Have you tested this method on the 3D object detection task? If not, do you think it could benefit 3D object detection? Thanks.

No such file or directory: './list_keyframes_train.json'

        # downstream/dataloader_nuscenes.py: every phase reads its keyframe
        # list and pixel-point correspondence info from hardcoded relative
        # paths in the current working directory.
        if phase in ("train", "val", "test", "parametrizing", "verifying"):
            with open(f"./list_keyframes_{phase}.json", "r") as f:
                self.list_keyframes = json.load(f)
            with open(f"./save_dict_{phase}.json", "r") as f:
                self.frames_corrs_info = json.load(f)

Can you help me obtain the list_keyframes_train.json file? Thanks!

Hardcoded paths

Thank you for releasing the code. The downstream/dataloader_nuscenes.py file contains multiple hardcoded paths that set the class attributes self.list_keyframes and self.frames_corrs_info.

How do you generate these JSON files?
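They don't appear to be documented. As a starting point, here is a hedged sketch of how a keyframe list could be rebuilt with the official nuScenes devkit; the schema (which fields the dataloader expects, and what goes into the save_dict_* files) is an assumption, not the repo's confirmed format:

    import json
    from nuscenes.nuscenes import NuScenes

    # Assumption: list_keyframes_train.json holds one record per keyframe
    # sample; the real schema used by dataloader_nuscenes.py may differ,
    # and splitting scenes into train/val (nuscenes.utils.splits) is omitted.
    nusc = NuScenes(version="v1.0-trainval", dataroot="/path/to/nuscenes")

    keyframes = [
        {"token": sample["token"],
         "LIDAR_TOP": sample["data"]["LIDAR_TOP"]}  # lidar sample_data token
        for sample in nusc.sample                   # every sample is a keyframe
    ]

    with open("./list_keyframes_train.json", "w") as f:
        json.dump(keyframes, f)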


Confusion about the calculation of image_pred

Hi, thanks for your great work.

m_pred = tuple(pairing_images.T.long())              # one index tensor per dimension
image_global_allpoints = image_global.permute(0, 2, 3, 1)[m_pred]  # (num_points, C)
image_pred = image_pred[m_pred]

However, I have some confusion about the above calculation process (L213-L215 in lightning_trainer.py).

What is the specific meaning of m_pred?

Why can image_global/image_pred be indexed with m_pred even though their shapes are not the same as m_pred's? This operation puzzles me.

If you can provide an answer, thank you very much!
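Not an official answer, but a minimal self-contained sketch of the PyTorch advanced-indexing pattern involved, assuming each row of pairing_images is a (batch, row, column) pixel index for one matched 3D point:

    import torch

    B, C, H, W = 2, 8, 4, 6
    image_global = torch.randn(B, C, H, W)          # dense per-pixel features

    # Assumed layout: one (batch, row, col) triple per matched point.
    pairing_images = torch.tensor([[0, 1, 2],
                                   [1, 3, 5]])

    # Transposing yields one index tensor per tensor dimension; passing the
    # tuple triggers advanced indexing, selecting one pixel per matched point.
    m_pred = tuple(pairing_images.T.long())
    feats = image_global.permute(0, 2, 3, 1)[m_pred]  # shape: (num_points, C)
    assert feats.shape == (2, C)

So m_pred is not an array with the same shape as image_global; it is a tuple of per-dimension index vectors, which is why the shapes need not match.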

model_points method for nuscenes pretrain

Hi, thanks for the amazing work and for sharing the code base. I really appreciate the effort. Can you please let me know which model_points network to use, minkunet or voxelnet, for pre-training on the nuScenes dataset?

Here in the config, it's selected as minkunet:

model_points : "minkunet"

However, I get an assertion error due to a mismatch in the SparseTensor type here:

sparse_input = spvcnn_SparseTensor(sinput_F, sinput_C)

Can you please help? Thank you.
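Not from the repo, but a hypothetical sketch of the mismatch: MinkUNet consumes MinkowskiEngine sparse tensors, while spvcnn_SparseTensor is torchsparse-based, so the input must be built to match model_points. The helper name, the backend mapping, and the constructor signatures (which vary across MinkowskiEngine/torchsparse versions) are all assumptions:

    def build_sparse_input(sinput_F, sinput_C, model_points):
        """Hypothetical helper: wrap (features, coords) for the chosen backbone."""
        if model_points == "minkunet":
            # MinkowskiEngine-based MinkUNet; an spvcnn_SparseTensor here
            # would trip the type assertion reported above.
            import MinkowskiEngine as ME
            return ME.SparseTensor(features=sinput_F, coordinates=sinput_C)
        else:
            # SPVCNN-style (voxel) backbones expect a torchsparse tensor.
            from torchsparse import SparseTensor
            return SparseTensor(sinput_F, sinput_C)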

Question about Semantic-guided Spatial-temporal Consistency Regularization

Thanks for your great work! I have some questions about the Semantic-guided Spatial-temporal Consistency Regularization.

If we want to make loss_ssr smaller, we need to make $D(p, f_n)$ larger in each grid. According to Equation 3 in your paper, making $D(p, f_n)$ larger requires making $b$ much larger than $a$. According to Equation 4, $b$ being much greater than $a$ means that $D(p, t)$ is much greater than $D(x, t)$. I want to know whether I understand this correctly, and why this has a constraining effect.

How to obtain the ‘ViT16_clip_weights.pth’ ?

Thank you for this excellent work.

When I run pre-training with python pretrain.py --cfg_file config/clip2scene_nuscenes_pretrain.yaml, the following error is reported.

FileNotFoundError: [Errno 2] No such file or directory: '/nvme/konglingdong/runnan/CLIP2Scene/ViT16_clip_weights.pth'

How can I obtain the file ViT16_clip_weights.pth?
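One plausible way to produce such a file, assuming it is simply the state dict of OpenAI's CLIP ViT-B/16 saved with torch.save; the repo may expect renamed keys or only the visual encoder, so treat this as a guess:

    import clip   # pip install git+https://github.com/openai/CLIP.git
    import torch

    # Download pretrained CLIP ViT-B/16 and dump its weights to the
    # filename the pretrain script looks for (assumed format).
    model, _ = clip.load("ViT-B/16", device="cpu")
    torch.save(model.state_dict(), "ViT16_clip_weights.pth")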
