
image_captioning_ai_challenger's Introduction

Image Captioning in Chinese (trained on AI Challenger)

This provides the code to reproduce my result on AI Challenger Captioning contest (#3 on test b).

This is based on my ImageCaptioning.pytorch repository and self-critical.pytorch. (They all share a lot of the same git history)

Requirements

  • Python 2.7
  • PyTorch 0.2 (along with torchvision)
  • tensorboard-pytorch
  • jieba
  • hashlib

Pretrained models (not supported)

Train your own network on AI Challenger

Download ai_challenger dataset and preprocessing

First, download the ai_challenger images from link. We need both the training and validation data. Decompress the data into the same folder, say data/ai_challenger; the structure should look like:

├── data
│   ├── ai_challenger
│   │   ├── caption_train_annotations_20170902.json
│   │   ├── caption_train_images_20170902
│   │   │   ├── ...
│   │   ├── caption_validation_annotations_20170910.json
│   │   ├── caption_validation_images_20170910
│   │   │   ├── ...
│   ├── ...

Once we have the images and the annotations, we can invoke the prepro_*.py scripts, which will read all of this in and create a dataset (two feature folders, an hdf5 label file and a json file).

$ python scripts/prepro_split_tokenize.py --input_json ./data/ai_challenger/caption_train_annotations_20170902.json ./data/ai_challenger/caption_validation_annotations_20170910.json --output_json ./data/data_chinese.json --num_val 10000 --num_test 10000
$ python scripts/prepro_labels.py --input_json data/data_chinese.json --output_json data/chinese_talk.json --output_h5 data/chinese_talk --max_length 20 --word_count_threshold 20
$ python scripts/prepro_reference_json.py --input_json ./data/ai_challenger/caption_train_annotations_20170902.json ./data/ai_challenger/caption_validation_annotations_20170910.json --output_json ./data/eval_reference.json
$ python scripts/prepro_ngrams.py --input_json data/data_chinese.json --dict_json data/chinese_talk.json --output_pkl data/chinese-train --split train

prepro_split_tokenize.py will combine the training and validation data, randomly split the dataset into train, val and test, and tokenize the captions using jieba.
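For reference, the tokenization step amounts to segmenting each Chinese caption with jieba; a minimal sketch (the exact options used in prepro_split_tokenize.py may differ):

import jieba

caption = u'一个穿着红色衣服的女人在海边散步'
# jieba.lcut returns the segmented words as a list.
tokens = jieba.lcut(caption)
print(u' '.join(tokens))
# e.g. 一个 穿着 红色 衣服 的 女人 在 海边 散步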

prepro_labels.py will map all words that occur <= 20 times to a special token, and create a vocabulary for all the remaining words. The image information and vocabulary are dumped into data/chinese_talk.json and discretized caption data are dumped into data/chinese_talk_label.h5.
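The thresholding logic is roughly the following; this is a sketch under the assumption that rare words map to a single UNK token, not the exact script:

from collections import Counter

def build_vocab(tokenized_captions, count_thr=20):
    # Count word occurrences over all tokenized captions.
    counts = Counter(w for cap in tokenized_captions for w in cap)
    # Frequent words form the vocabulary; everything else gets
    # replaced by the special UNK token when encoding labels.
    vocab = [w for w, n in counts.items() if n > count_thr]
    vocab.append('UNK')
    return vocab

def encode_caption(cap, word_to_ix, max_length=20):
    # Truncate to max_length and map out-of-vocab words to UNK.
    return [word_to_ix.get(w, word_to_ix['UNK']) for w in cap[:max_length]]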

prepro_reference_json.py will prepare the json file for caption evaluation.

prepro_ngrams.py will prepare the file needed for self-critical training.
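In essence, this step precomputes n-gram statistics (document frequencies) over the training captions so that CIDEr-D rewards can be computed quickly during self-critical training. A rough sketch of the counting, with function names chosen for illustration:

from collections import defaultdict

def count_ngrams(words, n=4):
    # Count all n-grams (n = 1..4) in one tokenized caption.
    counts = defaultdict(int)
    for k in range(1, n + 1):
        for i in range(len(words) - k + 1):
            counts[tuple(words[i:i + k])] += 1
    return counts

def compute_doc_freq(all_refs):
    # Document frequency: the number of images whose reference
    # captions contain each n-gram (the idf term in CIDEr).
    doc_freq = defaultdict(int)
    for refs in all_refs:  # refs: list of tokenized captions per image
        ngrams = set(ng for ref in refs for ng in count_ngrams(ref))
        for ng in ngrams:
            doc_freq[ng] += 1
    return doc_freq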

(Check the prepro scripts for more options, like other resnet models or other attention sizes.)

Prepare the features

We use bottom-up features to get the best results; however, the code also supports using resnet101 features.

  • Using resnet101
$ python scripts/prepro_feats.py --input_json data/data_chinese.json --output_dir data/chinese_talk --images_root data/ai_challenger --att_size 7

This extracts the resnet101 features (both the fc feature and the last conv feature) of each image. The features are saved in data/chinese_talk_fc and data/chinese_talk_att; the resulting files are about 100GB.
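In outline, the extraction runs each image through a torchvision resnet101 and keeps the pooled fc vector plus a spatially pooled conv map; a simplified sketch (the real prepro_feats.py also handles loading, preprocessing and saving):

import torch.nn.functional as F
import torchvision.models as models

resnet = models.resnet101(pretrained=True).eval()

def extract_feats(img, att_size=7):
    # img: a (1, 3, H, W) tensor, already ImageNet-normalized.
    x = resnet.conv1(img)
    x = resnet.bn1(x)
    x = resnet.relu(x)
    x = resnet.maxpool(x)
    x = resnet.layer1(x)
    x = resnet.layer2(x)
    x = resnet.layer3(x)
    x = resnet.layer4(x)                      # (1, 2048, h, w) last conv map
    fc = x.mean(3).mean(2).squeeze(0)         # 2048-d fc feature
    att = F.adaptive_avg_pool2d(x, (att_size, att_size))
    att = att.squeeze(0).permute(1, 2, 0)     # (att_size, att_size, 2048)
    return fc, att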

  • Using bottom-up-features

Pre-extracted features are available for download at link.

Code for extracting the features is here

Download the evaluation code

Clone from link and link

Start training

$ mkdir xe
$ bash run_train.sh

Evaluate on test split

$ python eval.py --dump_images 0 --num_images -1 --split test  --model log_dense_box_bn/model-best.pth --language_eval 1 --beam_size 5 --temperature 1.0 --sample_max 1  --infos_path log_dense_box_bn/infos_dense_box_bn-best.pkl

To run an ensemble:

$ python eval_ensemble.py --dump_images 0 --language_eval 1 --batch_size 5 --num_images -1 --split test  --ids dense_box_bn dense_box_bn1 --beam_size 5 --temperature 1.0 --sample_max 1

Acknowledgements

Thanks to the original neuraltalk2 and the awesome PyTorch team.

image_captioning_ai_challenger's People

Contributors

gujiuxiang · raoyongming · ruotianluo


image_captioning_ai_challenger's Issues

Hello Dr. Luo! Could you please briefly introduce "use_maxout" in your code?

Hello Dr. Luo!
I am a beginner, and I find that "use_maxout" appears many times in the model code. Could you please briefly introduce "use_maxout"?
Looking at the language-model code under the models folder, I noticed that you appear to have hand-written the LSTM rather than using PyTorch's built-in nn.LSTM or nn.LSTMCell. What was your purpose in doing that, and roughly what does use_maxout do?
Or could you point me to the paper that proposed it?

TypeError: forward() takes exactly 4 arguments (5 given)

Hi Dr. Luo,
When I try to train some models (show tell, show tell attention, att2in, etc.) using your Image_Captioning_AI_Challenger code, I meet the following error.

ps: Training of the attention-mechanism models works normally.

DataLoader loading json file:  data/chinese_talk.json
vocab size is  4461
DataLoader loading h5 file:  data/chinese_bu_fc data/chinese_bu_att data/chinese_bu_box data/chinese_talk_label.h5
max sequence length in data is 20
read 240000 image features
assigned 220000 images to split train
assigned 10000 images to split val
assigned 10000 images to split test
Read data: 0.80700802803
Traceback (most recent call last):
  File "train.py", line 241, in <module>
    train(opt)
  File "train.py", line 129, in train
    loss = crit(dp_model(fc_feats, att_feats, labels, att_masks), labels[:, 1:], masks[:, 1:])
  File "/home/andrewcao95/anaconda3/envs/Image_Captioning_AI_Challenger-newest/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/andrewcao95/anaconda3/envs/Image_Captioning_AI_Challenger-newest/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 58, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/andrewcao95/anaconda3/envs/Image_Captioning_AI_Challenger-newest/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() takes exactly 4 arguments (5 given)
Terminating BlobFetcher

I tried to debug and traced the problem to https://github.com/ruotianluo/Image_Captioning_AI_Challenger/blob/master/train.py#L125-L130; the parameter att_masks is the cause.

if not sc_flag:
    loss = crit(dp_model(fc_feats, att_feats, labels, att_masks), labels[:,1:], masks[:,1:])
else:
    gen_result, sample_logprobs = dp_model(fc_feats, att_feats, att_masks, opt={'sample_max':0}, mode='sample')
    reward = get_self_critical_reward(dp_model, fc_feats, att_feats, att_masks, data, gen_result, opt)
    loss = rl_crit(sample_logprobs, gen_result.data, Variable(torch.from_numpy(reward).float().cuda(), requires_grad=False))

Comparing this with the corresponding parts of your other projects ImageCaptioning.pytorch and self-critical.pytorch, I find you have made a lot of code changes here, and it is hard to port this part to the newer code because eval_utils.py and train.py have both been modified heavily.

If I want to use Image_Captioning_AI_Challenger's code to run those baseline models on the Chinese AI_Challenger dataset, how can I do it with the minimum modification cost?
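For what it's worth, the mismatch is that train.py always passes att_masks, while the non-attention models apparently define forward with one argument fewer. A hypothetical low-cost workaround (not the repo's actual fix) is to let those models accept and ignore an optional att_masks:

import torch.nn as nn

class ShowTellSketch(nn.Module):
    # Hypothetical sketch, not the repo's actual class. train.py calls
    # dp_model(fc_feats, att_feats, labels, att_masks), so a forward()
    # that takes only (fc_feats, att_feats, seq) raises the TypeError.
    def forward(self, fc_feats, att_feats, seq, att_masks=None):
        pass  # original decoding logic unchanged; att_masks is unused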

feature

What do chinese_bu_fc, chinese_bu_att and chinese_bu_box represent?
How many boxes do you use?
Can you share the code for extracting the features?

Training error: ValueError: could not broadcast input array from shape (7,7,2048) into shape (5,7,7)

Using Python 2.7 and PyTorch 0.2.1.post2. All the prepro scripts have been run. Training fails with:
Traceback (most recent call last):
  File "train.py", line 233, in <module>
    train(opt)
  File "train.py", line 118, in train
    data = loader.get_batch('train')
  File "/data/disk1/private/chenweize/concreteness_ch/dataset/Image_Captioning_AI_Challenger/dataloader.py", line 163, in get_batch
    data['att_feats'][i*seq_per_img:(i+1)*seq_per_img, :att_batch[i].shape[0]] = att_batch[i]
ValueError: could not broadcast input array from shape (7,7,2048) into shape (5,7,7)
Terminating BlobFetcher

Also, in run_train.sh the training options --input_fc_dir and --input_att_dir point to data/chinese_bu_fc (and _att). Should the "bu" there be "talk"? I do not see any step that generates a chinese_bu_fc folder. The traceback above was produced after changing "bu" to "talk"; without that change, the error is:
Traceback (most recent call last):
  File "train.py", line 233, in <module>
    train(opt)
  File "train.py", line 118, in train
    data = loader.get_batch('train')
  File "/data/disk1/private/chenweize/concreteness_ch/dataset/Image_Captioning_AI_Challenger/dataloader.py", line 135, in get_batch
    ix, tmp_wrapped = self._prefetch_process[split].get()
  File "/data/disk1/private/chenweize/concreteness_ch/dataset/Image_Captioning_AI_Challenger/dataloader.py", line 260, in get
    tmp = self.split_loader.next()
  File "/data/disk1/private/chenweize/concreteness_ch/env/local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 201, in next
    return self._process_next_batch(batch)
  File "/data/disk1/private/chenweize/concreteness_ch/env/local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IOError: Traceback (most recent call last):
  File "/data/disk1/private/chenweize/concreteness_ch/env/local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/disk1/private/chenweize/concreteness_ch/dataset/Image_Captioning_AI_Challenger/dataloader.py", line 189, in __getitem__
    att_feat = np.load(os.path.join(self.input_att_dir, str(self.info['images'][ix]['id']) + '.npz'))['feat']
  File "/data/disk1/private/chenweize/concreteness_ch/env/local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
IOError: [Errno 2] No such file or directory: 'data/chinese_bu_att/74e1bd18c6836e7e0b88e42923f1f7d9a87d9a91.jpg.npz'

How to generate "data/eval_reference_new.json"?

Hi, Dr. Luo,
Thanks for your awesome project. When I use your code to train the model with --language_eval 1 set in run_train.sh, an error occurs during training: in the function language_eval() in eval_utils.py, the program cannot find annFile = 'data/eval_reference_new.json'. Would you kindly tell me how to generate data/eval_reference_new.json if I want to set language_eval=1? Thanks for your help!

Training problem

Hello! I extracted features for the dataset with resnet18, but I do not know how to train on them; the training code seems to target bottom-up-attention features. I am also not sure how to extract features for the AI Challenger training set with bottom-up-attention.

No module named pyciderevalcap.ciderD.ciderD

The original code:
import sys
sys.path.append("cider")
from pyciderevalcap.ciderD.ciderD import CiderD
sys.path.append("AI_Challenger/Evaluation/caption_eval")

I have added coco-caption to my current directory; however, I cannot import the module pyciderevalcap.ciderD.ciderD. How can I load this module?
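For context, sys.path.append("cider") only helps if a checkout of the cider repository (which provides pyciderevalcap) actually sits at the project root; coco-caption alone does not contain that package. A minimal check under that assumed layout:

import os
import sys

# Assumed layout: ./cider/pyciderevalcap/ciderD/ciderD.py
# (a checkout of the cider repo, separate from coco-caption).
sys.path.append('cider')
if not os.path.isdir(os.path.join('cider', 'pyciderevalcap')):
    raise IOError('clone the cider repo into ./cider first')
from pyciderevalcap.ciderD.ciderD import CiderD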

eval error

Hello, after training the model with the parameters from your GitHub, I met the following error during eval:

Traceback (most recent call last):
  File "eval.py", line 137, in <module>
    vars(opt))
  File "/home/dusen/YKQ/Image_Captioning_AI_Challenger/eval_utils.py", line 97, in eval_split
    data['att_masks'][np.arange(loader.batch_size) * loader.seq_per_img]]
KeyError: 'att_masks'

I hope you can help with this. Many thanks!

Support Python3

I edited some files to support Python 3, for anyone who needs it. The only issue is that it does not support multi-GPU training.

The width and the height of the images

Hi ruotian, thanks a lot for your code, but I have a problem. I find there are some images that have entries in the json file but do not exist in the image folder. In this case, how do you handle the width and height of these images?

pre_trained model for test

Hi! I want to test the model to determine whether it is suitable for my use case, so I wonder whether you would be willing to share the trained model. If you can, you have my sincere thanks.

Idea behind denseatt?

Hi RT,
I applied your denseatt model to the MSCOCO dataset and achieved great results, and I would love to learn which observations and thoughts led you to create this double attention mechanism.
Thanks a lot for the wonderful code.

No such file or directory: 'log_dense_box_bn/infos_dense_box_bn.pkl'

The data and the features are all prepared, but training keeps failing with this error:

read 240000 image features
assigned 220000 images to split train
assigned 10000 images to split val
assigned 10000 images to split test
Traceback (most recent call last):
  File "train.py", line 228, in <module>
    train(opt)
  File "train.py", line 48, in train
    with open(os.path.join(opt.start_from, 'infos_'+opt.id+'.pkl')) as f:
IOError: [Errno 2] No such file or directory: 'log_dense_box_bn/infos_dense_box_bn.pkl'
Terminating BlobFetcher

Request for the AI Challenger dataset

Hello, we have recently been working on captioning and hope to build on the AI Challenger dataset, but the official download channel has been closed. Would you be willing to share the data with us?

an error during training progress about broadcasting

run_train.sh: the parameters we changed are as follows
#! /bin/sh

#larger batch

id="dense_box_bn"$1
ckpt_path="log_"$id
if [ ! -d $ckpt_path ]; then
mkdir $ckpt_path
fi
if [ ! -f $ckpt_path"/infos_"$id".pkl" ]; then
start_from=""
else
start_from="--start_from "$ckpt_path
fi

The error we meet when running train.py:
python train.py --id $id --caption_model denseatt --input_json data/chinese_talk.json --input_label_h5 data/chinese_talk_label.h5 --input_fc_dir data/chinese_talk_fc --input_att_dir data/chinese_talk_att --seq_per_img 5 --batch_size 50 --beam_size 1 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path $ckpt_path $start_from --save_checkpoint_every 3000 --language_eval 1 --val_images_use 10000 --max_epoch 37 --rnn_size 1300 --use_box 0 --use_bn 0
vipsl-422-1@vipsl-422-1:~/enjoy-zhangyi/ImageCaptioninginChinese$ bash run_train.sh
Tensorflow not installed; No tensorboard logging.
DataLoader loading json file: data/chinese_talk.json
vocab size is 4461
DataLoader loading h5 file: data/chinese_talk_fc data/chinese_talk_att data/cocotalk_box data/chinese_talk_label.h5
max sequence length in data is 20
read 240000 image features
assigned 220000 images to split train
assigned 10000 images to split val
assigned 10000 images to split test
Traceback (most recent call last):
  File "train.py", line 229, in <module>
    train(opt)
  File "train.py", line 115, in train
    data = loader.get_batch('train')
  File "/home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/dataloader.py", line 163, in get_batch
    data['att_feats'][i*seq_per_img:(i+1)*seq_per_img, :att_batch[i].shape[0]] = att_batch[i]
ValueError: could not broadcast input array from shape (7,7,2048) into shape (5,7,7)
Terminating BlobFetcher

scst

When training with SCST, is combining CIDEr and BLEU as the reward better than CIDEr alone? What weights do you give to each?
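For context, a mixed SCST reward is typically just a weighted sum of the per-caption metric scores; the weights below are illustrative only, since the repo's actual choice is exactly what this issue asks about:

def mixed_reward(cider_scores, bleu_scores, w_cider=1.0, w_bleu=0.5):
    # Illustrative sketch: combine per-caption CIDEr and BLEU
    # scores into a single self-critical reward. The weights are
    # hypothetical, not the settings used in this repository.
    return [w_cider * c + w_bleu * b
            for c, b in zip(cider_scores, bleu_scores)]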
