Comments (14)
It seems your feature size is 512, which means you also need to extract SlowFast features.
from qd-detr.
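For reference, the feature concatenation the model expects can be sketched like this. This is a minimal illustration, not repository code: `build_video_features` is a hypothetical helper, and the official pipeline loads pre-extracted feature files instead of dummy arrays.

```python
import numpy as np

def build_video_features(slowfast_feats, clip_feats):
    """Concatenate per-clip SlowFast (2304-d) and CLIP (512-d) features.

    Hypothetical helper for illustration only; the official pipeline
    loads these arrays from pre-extracted feature files.
    """
    assert slowfast_feats.shape[0] == clip_feats.shape[0], "clip counts must match"
    return np.concatenate([slowfast_feats, clip_feats], axis=1)

# 75 dummy clips, matching the sequence length seen in the errors below
sf = np.zeros((75, 2304), dtype=np.float32)  # SlowFast features
cl = np.zeros((75, 512), dtype=np.float32)   # CLIP features
vid = build_video_features(sf, cl)
print(vid.shape)  # (75, 2816)
```

The two extra dimensions in the model's expected 2818-d input come from the temporal endpoint features appended at runtime, as in Moment-DETR.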
Sorry for the inconvenience.
I think that the pretrained weight is from Moment-DETR not from our GitHub repository.
Can you try again with the weights provided in our repository?
Video only weights : https://www.dropbox.com/s/yygwyljw8514d9r/videoonly.ckpt?dl=0
V + A weights : https://www.dropbox.com/s/hsc7jk21ppqasjt/videoaudio.ckpt?dl=0
Thanks for your reply. I used videoaudio.ckpt and got this error:
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for QDDETR:
size mismatch for input_vid_proj.0.LayerNorm.weight: copying a param with shape torch.Size([4868]) from checkpoint, the shape in current model is torch.Size([2818]).
size mismatch for input_vid_proj.0.LayerNorm.bias: copying a param with shape torch.Size([4868]) from checkpoint, the shape in current model is torch.Size([2818]).
size mismatch for input_vid_proj.0.net.1.weight: copying a param with shape torch.Size([256, 4868]) from checkpoint, the shape in current model is torch.Size([256, 2818]).
Can you try with the checkpoint trained only with video?
To use the video+audio checkpoint, you may have to change some code and your dataset to have extracted audio features.
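For illustration, the size mismatch above (4868 vs. 2818) implies the V+A checkpoint expects 4868 - 2818 = 2050 extra input dimensions for audio. A minimal sketch, assuming audio features are simply concatenated onto the video features; the audio extractor and its exact dimension are assumptions inferred from the error, not confirmed by the thread.

```python
import numpy as np

# 4868 (V+A checkpoint) - 2818 (video-only input) = 2050 assumed audio dims
video_feats = np.zeros((75, 2818), dtype=np.float32)  # video (+tef) features
audio_feats = np.zeros((75, 2050), dtype=np.float32)  # assumed audio feature size

src_vid = np.concatenate([video_feats, audio_feats], axis=1)
print(src_vid.shape)  # (75, 4868)
```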
I have tried the checkpoint trained only with video (videoonly.ckpt), but the error still happens. The shapes of the model and the weights do not match.
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/normalization.py", line 190, in forward
input, self.normalized_shape, self.weight, self.bias, self.eps)
File "/usr/local/lib64/python3.6/site-packages/torch/nn/functional.py", line 2347, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[2818], expected input with shape [*, 2818], but got input of size [1, 75, 514]
If you look at the provided training script, the feature dimension should be 2304 (SlowFast) + 512 (CLIP).
It looks like you only have CLIP features.
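One way to check which features a checkpoint was trained with is to read the input projection's LayerNorm size from its state dict (the key name is taken from the size-mismatch error above). In practice the state dict would come from `torch.load("videoonly.ckpt", map_location="cpu")`; a stand-in array keeps this sketch self-contained.

```python
import numpy as np

def expected_vid_dim(state_dict):
    """Read the video input dimension a checkpoint expects from its first
    projection layer (key name taken from the size-mismatch error above)."""
    return state_dict["input_vid_proj.0.LayerNorm.weight"].shape[0]

# Stand-in for torch.load("videoonly.ckpt", map_location="cpu")["model"]
state = {"input_vid_proj.0.LayerNorm.weight": np.zeros(2818)}
print(expected_vid_dim(state))  # 2818 = 2304 (SlowFast) + 512 (CLIP) + 2 (tef)
```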
I also have an error when running run_on_video/run.py. I have used both videoonly.ckpt (https://www.dropbox.com/s/yygwyljw8514d9r/videoonly.ckpt?dl=0) and video_model_best.ckpt (run_on_video/qd_detr_ckpt/)
Error logs are below:
File "run_on_video/run.py", line 126, in <module>
run_example()
File "run_on_video/run.py", line 109, in run_example
predictions = qd_detr_predictor.localize_moment(
File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "run_on_video/run.py", line 57, in localize_moment
outputs = self.model(**model_inputs)
File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/projects/moment-retrieval/QD-DETR/qd_detr/model.py", line 110, in forward
src_vid = self.input_vid_proj(src_vid)
File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/projects/moment-retrieval/QD-DETR/qd_detr/model.py", line 505, in forward
x = self.LayerNorm(x)
File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward
return F.layer_norm(
File "/home/ubuntu/projects/moment-retrieval/envs/moment-detr/lib/python3.8/site-packages/torch/nn/functional.py", line 2503, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[2818], expected input with shape [*, 2818], but got input of size [1, 75, 514]
I have the same issue. I believe the script in the repo should not produce this error when used as is.
Hello.
For all of you in this thread, thank you for your interest, and sorry for the inconvenience.
I'll let you know through this thread when the model checkpoint trained only with CLIP features is ready.
Thanks.
We've uploaded a pretrained model trained only with CLIP features to support run_on_video.
You may try the example with it!
Thank you.
Which one is it?
model_best.ckpt is the model trained with only CLIP features.
It works now, thanks. I suggest changing the default model used on master.
Thank you for the suggestion.
Do you mean to change the default loaded model in run_on_video/run.py?
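One hedged way to sketch the suggested change: pick the checkpoint whose input projection matches the feature dimension available at runtime. The paths follow the checkpoints mentioned in this thread, but the dimension-to-checkpoint mapping itself is an assumption for illustration, not the repository's code.

```python
# Paths follow the checkpoints mentioned in this thread; the mapping below
# is an illustrative assumption, not the repository's actual code.
CKPTS = {
    2818: "run_on_video/qd_detr_ckpt/video_model_best.ckpt",  # SlowFast+CLIP (+tef)
    514: "run_on_video/qd_detr_ckpt/model_best.ckpt",         # CLIP only (+tef)
}

def pick_ckpt(vid_feat_dim):
    """Choose the checkpoint whose input projection matches the features."""
    if vid_feat_dim not in CKPTS:
        raise ValueError(f"no checkpoint for feature dim {vid_feat_dim}")
    return CKPTS[vid_feat_dim]

print(pick_ckpt(514))  # run_on_video/qd_detr_ckpt/model_best.ckpt
```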