tgc1997 / RMN
IJCAI2020: Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Hello! I'd like to ask how the POS tags here are obtained. I see only three types (0, 1, 2); which part-of-speech categories do they represent? Thanks!
Hi,
I see you are using 2D CNN features (1536-dim), 3D CNN features (1024-dim), and RCNN features (2048-dim). I also see something called spatial features with 5 dimensions. What are these features? I could not find them mentioned anywhere in the paper.
File "/RMN-master/models/allennlp_beamsearch.py", line 257, in search
state_tensor.reshape(batch_size, self.beam_size, *last_dims)
RuntimeError: gather_out_cuda(): Expected dtype int64 for index
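For anyone hitting this: `torch.gather` requires its index tensor to be `int64` (`torch.long`). A minimal standalone sketch of the error and the usual fix (casting the index before the gather); the tensors here are illustrative, not the actual beam-search state:

```python
import torch

# Illustrative beam-search-like state: 3 rows, 4 candidates each.
state = torch.arange(12.0).reshape(3, 4)

# An int32 index tensor triggers "Expected dtype int64 for index" in gather,
# so cast it to long first:
idx = torch.tensor([[0, 1], [2, 3], [1, 0]], dtype=torch.int32)
gathered = state.gather(1, idx.long())
print(gathered)  # rows picked per-index: [[0., 1.], [6., 7.], [9., 8.]]
```

The same cast (`index.long()`) applied to the index tensor built in `allennlp_beamsearch.py` should resolve the error, assuming nothing upstream depends on the smaller dtype.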
Hi tgc, I'd like to test this model on my own video. How could I get the extracted features as inputs?
The shape of sfeats in msvd_region_feature.h5 is 1970 x 26 x 36 x 5.
What is the meaning of the last dimension? Thank you!
Hello, may I ask what method you use to extract frame features and region features from videos? Thank you.
Line 128 in 14a9eff
if bsz == opt.train_batch_size:
loss_count /= 10
elif bsz < opt.train_batch_size and i % 10 == 0:
loss_count /= 10
else:
loss_count /= i % 10
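The quoted branch keeps a running average of the loss, printed every 10 batches: a full window is divided by 10, while the final partial window is divided by however many batches actually contributed (`i % 10`). A minimal standalone sketch of that averaging logic:

```python
# Sketch of the windowed loss averaging in train.py: full windows of 10
# are divided by 10; a trailing partial window is divided by its own length,
# mirroring the `loss_count /= i % 10` branch.
def average_window(losses, window=10):
    averages = []
    for start in range(0, len(losses), window):
        chunk = losses[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages

print(average_window([2.0] * 10 + [4.0] * 3))  # [2.0, 4.0]
```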
The project has restarted on my server now. If it still works well after one epoch completes, I will come back and report.
Hi, may I check what the difference is between evaluate.py and train.py? Thank you very much!
(rmn) E:\video_caption\rmn\RMN-master>python evaluate.py --dataset=msvd --model=RMN --result_dir=results/msvd_model --attention=gumbel --use_loc --use_rel --use_func --hidden_size=512 --att_size=512 --test_batch_size=2 --beam_size=2 --eval_metric=CIDEr
335it [01:21, 4.13it/s]
init COCO-EVAL scorer
tokenization...
Traceback (most recent call last):
File "evaluate.py", line 107, in
metrics = evaluate(opt, net, opt.test_range, opt.test_prediction_txt_path, reference)
File "evaluate.py", line 75, in evaluate
scores, sub_category_score = scorer.score(reference, prediction_json, prediction_json.keys())
File "./caption-eval\cocoeval.py", line 64, in score
print('tokenization...')
OSError: [WinError 1] Incorrect function.
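For context: the coco-eval tokenization step shells out to the Java PTBTokenizer via a subprocess, which is what fails here on Windows. One hedged workaround (an approximation, not the exact PTB tokenizer behaviour) is to substitute a pure-Python tokenizer:

```python
import re

# Rough stand-in for the PTBTokenizer used by coco-eval: lowercase the
# caption, split punctuation off from words, and rejoin with spaces.
# This approximates, but does not exactly match, PTB tokenization.
def simple_tokenize(caption):
    caption = caption.lower()
    tokens = re.findall(r"[a-z0-9]+|[^\sa-z0-9]", caption)
    return " ".join(tokens)

print(simple_tokenize("A man is playing guitar."))  # a man is playing guitar .
```

Scores computed this way may differ slightly from the official PTB-tokenized ones; running the evaluation on Linux avoids the issue entirely.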
Hey @tgc1997
Thanks for providing the implementation of such awesome work!!!
I wanted to know how one goes about using the pre-trained models for inference on raw custom videos?
Sir, can you share the link to the code that processes the text features and generates the caption.pkl file?
How to apply the code to my own dataset? Could you please provide the code about feature extraction?
Hi, Ganchao!
I am having difficulty reproducing the experimental results for MSR-VTT.
I have run the project on MSR-VTT several times and always get unsatisfactory results:
the CIDEr scores only fluctuate between 45 and 46.5, which is far from the 49.6 reported in the paper.
Would it be convenient for you to share the random seed values you used in your MSR-VTT experiments?
Training on MSR-VTT is very time-consuming, about 6 days on a single GPU.
Looking forward to your help, thanks!
I obtained this error when running evaluate.py and train.py.
May I know how to solve this issue?
I used 8 GPUs with batch size 32, and training 3 epochs took 11 hours.
How long did it take you to train 20 epochs?
thanks for your work!
Hi Ganchao, here is a bug report.
While debugging and reproducing the project, I found an error that might lead to inaccurate model reproduction or training results:
the att_size=1024 set on the command line does not take effect. The reason is as follows:
although the att_size parameter in the initializers of the SoftAttention and GumbelAttention classes defaults to opt.att_size (=1024), all of the att_size arguments actually passed at the instantiation sites are opt.hidden_size (=512 for the MSVD dataset / =1300 for the MSR-VTT dataset).
related code lines:
Line 18 in 14a9eff
Line 45 in 14a9eff
Line 171 in 14a9eff
Line 175 in 14a9eff
Line 207 in 14a9eff
Line 211 in 14a9eff
Line 245 in 14a9eff
Line 285 in 14a9eff
Line 287 in 14a9eff
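A minimal sketch of the mismatch being reported: the class accepts an att_size, but the call sites pass opt.hidden_size instead of opt.att_size, so the command-line flag is silently ignored. Class and attribute names here are simplified stand-ins, not the repo's exact code:

```python
# Simplified stand-in for SoftAttention/GumbelAttention: the constructor
# takes att_size, so whatever the call site passes wins.
class SoftAttention:
    def __init__(self, feat_size, hidden_size, att_size):
        self.att_size = att_size  # size of the attention projection layer

class Opt:  # mimics the parsed command-line options
    hidden_size = 512
    att_size = 1024

opt = Opt()
# Reported bug: call sites use SoftAttention(2048, opt.hidden_size, opt.hidden_size),
# so --att_size=1024 never reaches the module. The fixed call would be:
attn = SoftAttention(2048, opt.hidden_size, opt.att_size)
print(attn.att_size)  # 1024
```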
Please provide a valid link. Thanks
According to the code, the hidden size is set to 1300 instead of the widely used 1024 or 2048. What is the main consideration behind this choice?
awesome work!
When I reproduce the results you report in this repository (i.e. a CIDEr score of 97.8 on the MSVD dataset), errors indicating a size mismatch for the whole CapModel occur when running evaluate.py with your pretrained file results/msvd_model/msvd_best_cider.pth,
e.g.:
RuntimeError: Error(s) in loading state_dict for CapModel:
size mismatch for encoder.bi_lstm1.weight_ih_l0: copying a parameter with shape torch.Size([2048, 1000]) from checkpoint, the shape in current model is torch.Size([5200, 1000]).
size mismatch ……
size mismatch ……
It seems that you modified the model but did not update msvd_best_cider.pth.
If so, please let me know,
and I would appreciate it if you could provide the new version of the PTH file so that I can reproduce the results you report in this repository.
By the way, why were the final higher results not published in the paper?
Thanks!
When I run sample.py, I get the following error at line 102:
net.load_state_dict(torch.load(opt.model_pth_path))
RuntimeError: Error(s) in loading state_dict for CapModel:
Unexpected key(s) in state_dict: "decoder.module_selection.loc_fc.weight", "decoder.module_selection.loc_fc.bias", "decoder.module_selection.rel_fc.weight", "decoder.module_selection.rel_fc.bias", "decoder.module_selection.func_fc.weight", "decoder.module_selection.func_fc.bias", "decoder.module_selection.module_attn.wh.weight", "decoder.module_selection.module_attn.wh.bias", "decoder.module_selection.module_attn.wv.weight", "decoder.module_selection.module_attn.wv.bias", "decoder.module_selection.module_attn.wa.weight".
Can you help me? Thank you very much!
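One hedged workaround when a checkpoint carries extra keys (here, the module_selection weights) that the current model does not define is to load with `strict=False`, which skips unexpected keys instead of raising. A minimal sketch with a toy module rather than the repo's actual CapModel:

```python
import torch
import torch.nn as nn

# Toy module standing in for CapModel; "extra.weight" simulates one of the
# unexpected decoder.module_selection keys in the checkpoint.
net = nn.Linear(4, 2)
state = net.state_dict()
state["extra.weight"] = torch.zeros(1)

# strict=False tolerates unexpected/missing keys and reports them instead
# of raising a RuntimeError.
result = net.load_state_dict(state, strict=False)
print(result.unexpected_keys)  # ['extra.weight']
```

Note the usual caveat: if the checkpoint and model genuinely diverge (e.g. the model was trained with --use_loc/--use_rel/--use_func enabled), the right fix is to instantiate the model with the matching flags rather than to silently drop weights.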
Which directory is this msr-vtt_model.pth in?
Hi, tgc! I tried using torchvision's fasterrcnn_resnet50_fpn pretrained model to extract the region features of a video, but found that the features I extracted had shape only [823, 4], which is far from the [26, 36, 2048] and [26, 36, 5] in the dataset you provided. What does the extra dimension mean, or what do these three dimensions mean respectively?
I also wonder whether it is feasible to use torchvision's fasterrcnn_resnet50_fpn model to extract features instead of Caffe's Fast R-CNN model. The size of the features extracted with torchvision's fasterrcnn_resnet50_fpn model is far too small. How can I extract more features with the accurate feature dimensions that meet the requirements?
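For what it's worth, a common convention for a 5-dim "spatial feature" accompanying per-region 2048-dim appearance features is the normalized bounding box plus its relative area: [x1/W, y1/H, x2/W, y2/H, area/(W*H)]. Whether this repo uses exactly that encoding is an assumption; a minimal sketch:

```python
# Assumed spatial-feature encoding (not confirmed against the repo):
# normalized box corners plus box area relative to the frame area.
def spatial_feature(box, width, height):
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    return [x1 / width, y1 / height,
            x2 / width, y2 / height,
            area / (width * height)]

# A box covering the top-left quadrant of a 320x240 frame:
print(spatial_feature((0, 0, 160, 120), width=320, height=240))
# [0.0, 0.0, 0.5, 0.5, 0.25]
```

Under this reading, [26, 36, 2048] would be 26 sampled frames x 36 detected regions x 2048-dim RoI features, and [26, 36, 5] the matching per-region spatial vectors; the [823, 4] output from torchvision is just the raw detected boxes, so the 2048-dim RoI-pooled features would have to be extracted separately (e.g. from the box head) and padded/truncated to 36 regions per frame.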