
asg2cap's Introduction

Hi there 👋


asg2cap's People

Contributors

cshizhe


asg2cap's Issues

Question about datasets

Thank you for your great work! I want to know: can I use my own dataset with this method? Thank you!

Could you please tell me the details about how to generate ASG ?

Hello Shizhe, thanks for your great work! I am very interested in it. I checked the supplementary materials related to ASG, but I still have some doubts about the implementation details, and I don't know how to apply the method if I change datasets. Could you share the code for automatic ASG generation? Thank you very much.

The mtype=rgcn.flow.memory?

Your paper is really great, but I have run into some trouble. Could you tell me how to deal with the mtype=rgcn.flow.memory error in the config file?

How to compute Div-n?

Hi, thanks for the awesome work! I'm trying to use your method as a comparison for my own work, and I am confused about the calculation of n-gram diversity (Div-n). It is defined in the paper as "the ratio of distinct n-grams to the total number of words in the best 5 sampled captions".

My questions are:

  1. Does it mean that, for each image, you use 5 different ASGs to obtain 5 captions, calculate a Div-n score over these captions, and then average the Div-n scores over all images in the test set to get the final Div-n score?
  2. How to obtain the best captions?
  3. Which dataset split do you use for evaluating the n-gram diversity?
  4. Would you mind providing me your implementation of the Div-n score?

I would be very happy if you can answer the above questions so that I can make a fair comparison to your work.

Best regards
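For reference, the quoted definition can be sketched as follows. This is only my reading of it (compute the ratio per image over its 5 sampled captions, then average over images), not the authors' implementation; the function names and tokenized-caption format are assumptions:

```python
def div_n(captions, n):
    """Distinct n-grams / total number of words over one image's captions.
    captions: list of tokenized captions, e.g. [["a", "cat"], ["a", "dog"]]."""
    ngrams, total_words = set(), 0
    for tokens in captions:
        total_words += len(tokens)
        for i in range(len(tokens) - n + 1):
            ngrams.add(tuple(tokens[i:i + n]))
    return len(ngrams) / total_words if total_words else 0.0

def corpus_div_n(all_captions, n):
    """Average the per-image Div-n scores over the dataset."""
    scores = [div_n(caps, n) for caps in all_captions]
    return sum(scores) / len(scores)
```

For example, div_n([["a", "cat"], ["a", "dog"]], 1) gives 3 distinct unigrams over 4 words = 0.75.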

Hi, Where is the supplementary material?

The paper says "The details of automatic ASG generation are provided in the supplementary material.", but I can't find it. Could you please tell me where it is? Thanks!

How to automatically generate ASG.

I checked the supplementary materials related to ASG, and I still have some doubts about the implementation details. Could you share the code for automatic ASG generation? Thank you very much.

Can we fine tune model on a novel dataset?

Hi! First of all, I must appreciate the great effort. I want to build an image captioning model for road accidents. Could you let me know whether it is possible to fine-tune the model trained on MSCOCO on accident images? If yes, kindly point me to a pathway, as I am a beginner in this field.
Thanks a lot in advance.

CPU&GPU

I would like to ask why CPU usage is close to 90% during training while the GPU is not used at all, even though torch.cuda.is_available() returns True. I hope you can reply when you have time.
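A hedged sanity check for the high-CPU / idle-GPU symptom (not from this repo): torch.cuda.is_available() being True does not mean the model or the batches were ever moved to CUDA. Verifying device placement explicitly usually settles it:

```python
import torch

# Toy stand-ins for the real model and batch; the point is the .to(device)
# calls and the device printout, not the shapes.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 2).to(device)   # model parameters moved to the device
x = torch.randn(8, 4, device=device)       # inputs must live on the same device
print(next(model.parameters()).device)     # should read cuda:0 when the GPU is used
print(x.device)
```

If both print cpu despite CUDA being available, the training script is not calling .to(device) / .cuda() on the model and batches.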

RuntimeError: CUDA error: device-side assert triggered

Could anyone tell me how to modify the code?

The detailed error information is as follows:

/opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THC/THCTensorScatterGather.cu:100: block: [0,0,0], thread: [106,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THC/generic/THCTensorScatterGather.cu line=67 error=710 : device-side assert triggered
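This assert is typically an out-of-range index reaching a scatter/gather or embedding lookup, and because CUDA kernels run asynchronously, the reported call site is often unrelated; rerunning with CUDA_LAUNCH_BLOCKING=1, or once on CPU, surfaces the real line. A generic illustration of the bounds that must hold (not this repo's code; the vocabulary size is an example value):

```python
import torch

# The device-side assert in THCTensorScatterGather fires when an index is
# negative or >= the size of the indexed dimension. Checking bounds on CPU
# turns the opaque CUDA assert into a readable failure.
vocab_size = 11123                       # example; use your model's vocab size
emb = torch.nn.Embedding(vocab_size, 512)
ids = torch.tensor([[0, 5, 11122]])      # max legal id is vocab_size - 1
assert 0 <= int(ids.min()) and int(ids.max()) < vocab_size, "index out of range"
out = emb(ids)                           # safe once the bounds hold
print(out.shape)                         # torch.Size([1, 3, 512])
```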

Debug Error!

When I run the inference, I get "No such file or directory: 'java': 'java'".

Also, after entering the training command, a KeyError is raised for no apparent reason. Please help me with these errors.

After I downloaded the target data file, it had been split into 4 files: objrels1 to objrels4.
The target file path is:
anaconda3/envs/asg2cap/controlimcap/driver/configs/ControllableImageCaption/VisualGenome/ordered_feature/SA/X_101_32x8d/

and the error is:
python asg2caption.py $resdir/model.json $resdir/path.json $mtype --eval_loss --is_train --num_workers 8
2022-04-17 22:16:55,677 mp_encoder: ft_embed.weight, shape=torch.Size([512, 2560]), num:1310720
2022-04-17 22:16:55,678 mp_encoder: ft_embed.bias, shape=torch.Size([512]), num:512
2022-04-17 22:16:55,678 attn_encoder: attr_order_embeds, shape=torch.Size([20, 2048]), num:40960
2022-04-17 22:16:55,678 attn_encoder: layers.0.loop_weight, shape=torch.Size([2048, 512]), num:1048576
2022-04-17 22:16:55,678 attn_encoder: layers.0.weight, shape=torch.Size([6, 2048, 512]), num:6291456
2022-04-17 22:16:55,678 attn_encoder: layers.1.loop_weight, shape=torch.Size([512, 512]), num:262144
2022-04-17 22:16:55,678 attn_encoder: layers.1.weight, shape=torch.Size([6, 512, 512]), num:1572864
2022-04-17 22:16:55,678 attn_encoder: node_embedding.weight, shape=torch.Size([3, 2048]), num:6144
2022-04-17 22:16:55,678 decoder: embedding.we.weight, shape=torch.Size([11123, 512]), num:5694976
2022-04-17 22:16:55,678 decoder: attn_lstm.weight_ih, shape=torch.Size([2048, 1536]), num:3145728
2022-04-17 22:16:55,678 decoder: attn_lstm.weight_hh, shape=torch.Size([2048, 512]), num:1048576
2022-04-17 22:16:55,679 decoder: attn_lstm.bias_ih, shape=torch.Size([2048]), num:2048
2022-04-17 22:16:55,679 decoder: attn_lstm.bias_hh, shape=torch.Size([2048]), num:2048
2022-04-17 22:16:55,679 decoder: lang_lstm.weight_ih, shape=torch.Size([2048, 1024]), num:2097152
2022-04-17 22:16:55,679 decoder: lang_lstm.weight_hh, shape=torch.Size([2048, 512]), num:1048576
2022-04-17 22:16:55,679 decoder: lang_lstm.bias_ih, shape=torch.Size([2048]), num:2048
2022-04-17 22:16:55,679 decoder: lang_lstm.bias_hh, shape=torch.Size([2048]), num:2048
2022-04-17 22:16:55,679 decoder: attn.linear_query.weight, shape=torch.Size([512, 512]), num:262144
2022-04-17 22:16:55,679 decoder: attn.linear_query.bias, shape=torch.Size([512]), num:512
2022-04-17 22:16:55,679 decoder: attn.attn_w.weight, shape=torch.Size([1, 512]), num:512
2022-04-17 22:16:55,679 decoder: attn_linear_context.weight, shape=torch.Size([512, 512]), num:262144
2022-04-17 22:16:55,679 decoder: address_layer.0.weight, shape=torch.Size([512, 1024]), num:524288
2022-04-17 22:16:55,679 decoder: address_layer.0.bias, shape=torch.Size([512]), num:512
2022-04-17 22:16:55,679 decoder: address_layer.2.weight, shape=torch.Size([4, 512]), num:2048
2022-04-17 22:16:55,679 decoder: address_layer.2.bias, shape=torch.Size([4]), num:4
2022-04-17 22:16:55,679 decoder: memory_update_layer.0.weight, shape=torch.Size([512, 1024]), num:524288
2022-04-17 22:16:55,680 decoder: memory_update_layer.0.bias, shape=torch.Size([512]), num:512
2022-04-17 22:16:55,680 decoder: memory_update_layer.2.weight, shape=torch.Size([1024, 512]), num:524288
2022-04-17 22:16:55,680 decoder: memory_update_layer.2.bias, shape=torch.Size([1024]), num:1024
2022-04-17 22:16:55,680 decoder: sentinal_layer.0.weight, shape=torch.Size([512, 512]), num:262144
2022-04-17 22:16:55,680 decoder: sentinal_layer.0.bias, shape=torch.Size([512]), num:512
2022-04-17 22:16:55,680 decoder: sentinal_layer.2.weight, shape=torch.Size([1, 512]), num:512
2022-04-17 22:16:55,680 decoder: sentinal_layer.2.bias, shape=torch.Size([1]), num:1
2022-04-17 22:16:55,680 num params 33, num weights 25942021
2022-04-17 22:16:55,680 trainable: num params 32, num weights 25901061
2022-04-17 22:17:52,931 mp_fts (96738, 2048)
2022-04-17 22:17:53,020 num_data 3397459
/home/lianjunliang/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
2022-04-17 22:17:55,916 mp_fts (4925, 2048)
2022-04-17 22:17:55,921 num_data 172290
Traceback (most recent call last):
File "/home/lianjunliang/anaconda3/envs/asg2cap/controlimcap/driver/asg2caption.py", line 146, in
main()
File "/home/lianjunliang/anaconda3/envs/asg2cap/controlimcap/driver/asg2caption.py", line 94, in main
_model.train(trn_reader, val_reader, path_cfg.model_dir, path_cfg.log_dir,
File "/home/lianjunliang/anaconda3/envs/asg2cap/controlimcap/driver/framework/modelbase.py", line 191, in train
metrics = self.validate(val_reader)
File "/home/lianjunliang/anaconda3/envs/asg2cap/controlimcap/driver/caption/models/captionbase.py", line 66, in validate
for batch_data in val_reader:
File "/home/lianjunliang/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/lianjunliang/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/home/lianjunliang/anaconda3/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/home/lianjunliang/anaconda3/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/lianjunliang/anaconda3/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/lianjunliang/anaconda3/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/lianjunliang/anaconda3/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/lianjunliang/anaconda3/envs/asg2cap/controlimcap/readers/imgsgreader.py", line 398, in __getitem__
'mp_fts': self.mp_fts[self.img_id_to_ftidx_name[image_id][0]],
KeyError: '2334484'
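The KeyError means image id '2334484' occurs in the caption data but has no entry in the feature index map, which can happen when the split feature files (objrels1 to objrels4) are not fully merged. A hypothetical pre-flight check (the names follow the traceback and are not verified against the repo) to list missing ids before training:

```python
def find_missing_ids(caption_img_ids, img_id_to_ftidx_name):
    """Return caption image ids that have no entry in the feature index map."""
    return sorted(i for i in set(caption_img_ids) if i not in img_id_to_ftidx_name)

# Toy example: id '2334484' has caption annotations but no feature entry.
missing = find_missing_ids(['1', '2334484'], {'1': ('1.npy', 0)})
print(missing)   # ['2334484']
```

If this returns a non-empty list, the fix is to re-merge or re-download the feature files rather than patch the reader.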
