
image_captioning_ai_challenger's Introduction

Image Captioning in Chinese (trained on AI Challenger)

This provides the code to reproduce my result on AI Challenger Captioning contest (#3 on test b).

This is based on my ImageCaptioning.pytorch repository and self-critical.pytorch. (They all share a lot of the same git history)

Requirements

  • Python 2.7
  • PyTorch 0.2 (along with torchvision)
  • tensorboard-pytorch
  • jieba
  • hashlib

Pretrained models (not supported)

Train your own network on AI Challenger

Download ai_challenger dataset and preprocessing

First, download the ai_challenger images from link. We need both the training and validation data. Decompress the data into the same folder, say data/ai_challenger; the structure should look like:

├── data
│   ├── ai_challenger
│   │   ├── caption_train_annotations_20170902.json
│   │   ├── caption_train_images_20170902
│   │   │   ├── ...
│   │   ├── caption_validation_annotations_20170910.json
│   │   ├── caption_validation_images_20170910
│   │   │   ├── ...
│   ├── ...

Once we have the images and the annotations, we can invoke the prepro_*.py scripts, which will read all of this in and create a dataset (two feature folders, an hdf5 label file and a json file).

$ python scripts/prepro_split_tokenize.py --input_json ./data/ai_challenger/caption_train_annotations_20170902.json ./data/ai_challenger/caption_validation_annotations_20170910.json --output_json ./data/data_chinese.json --num_val 10000 --num_test 10000
$ python scripts/prepro_labels.py --input_json data/data_chinese.json --output_json data/chinese_talk.json --output_h5 data/chinese_talk --max_length 20 --word_count_threshold 20
$ python scripts/prepro_reference_json.py --input_json ./data/ai_challenger/caption_train_annotations_20170902.json ./data/ai_challenger/caption_validation_annotations_20170910.json --output_json ./data/eval_reference.json
$ python scripts/prepro_ngrams.py --input_json data/data_chinese.json --dict_json data/chinese_talk.json --output_pkl data/chinese-train --split train

prepro_split_tokenize.py will combine the training and validation data, randomly split the dataset into train, val and test, and tokenize the captions using jieba.
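For reference, the tokenization step amounts to segmenting each Chinese caption with jieba; a minimal sketch (the exact options used in prepro_split_tokenize.py may differ):

import jieba

caption = u'一个穿着红色衣服的女人在海边散步'
# jieba.lcut returns the segmented words as a list.
tokens = jieba.lcut(caption)
print(u' '.join(tokens))
# e.g. 一个 穿着 红色 衣服 的 女人 在 海边 散步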

prepro_labels.py will map all words that occur <= 20 times to a special token, and create a vocabulary for all the remaining words. The image information and vocabulary are dumped into data/chinese_talk.json and discretized caption data are dumped into data/chinese_talk_label.h5.
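The thresholding logic is roughly the following; this is a sketch under the assumption that rare words map to a single UNK token, not the exact script:

from collections import Counter

def build_vocab(tokenized_captions, count_thr=20):
    # Count word occurrences over all tokenized captions.
    counts = Counter(w for cap in tokenized_captions for w in cap)
    # Frequent words form the vocabulary; everything else gets
    # replaced by the special UNK token when encoding labels.
    vocab = [w for w, n in counts.items() if n > count_thr]
    vocab.append('UNK')
    return vocab

def encode_caption(cap, word_to_ix, max_length=20):
    # Truncate to max_length and map out-of-vocab words to UNK.
    return [word_to_ix.get(w, word_to_ix['UNK']) for w in cap[:max_length]]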

prepro_reference_json.py will prepare the json file for caption evaluation.

prepro_ngrams.py will prepare the file needed for self-critical training.
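In essence, this step precomputes n-gram statistics (document frequencies) over the training captions so that CIDEr-D rewards can be computed quickly during self-critical training. A rough sketch of the counting, with function names chosen for illustration:

from collections import defaultdict

def count_ngrams(words, n=4):
    # Count all n-grams (n = 1..4) in one tokenized caption.
    counts = defaultdict(int)
    for k in range(1, n + 1):
        for i in range(len(words) - k + 1):
            counts[tuple(words[i:i + k])] += 1
    return counts

def compute_doc_freq(all_refs):
    # Document frequency: the number of images whose reference
    # captions contain each n-gram (the idf term in CIDEr).
    doc_freq = defaultdict(int)
    for refs in all_refs:  # refs: list of tokenized captions per image
        ngrams = set(ng for ref in refs for ng in count_ngrams(ref))
        for ng in ngrams:
            doc_freq[ng] += 1
    return doc_freq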

(Check the prepro scripts for more options, like other resnet models or other attention sizes.)

Prepare the features

We use bottom-up features to get the best results; however, the code also supports using resnet101 features.

  • Using resnet101
$ python scripts/prepro_feats.py --input_json data/data_chinese.json --output_dir data/chinese_talk --images_root data/ai_challenger --att_size 7

This extracts the resnet101 features (both the fc feature and the last conv feature) of each image. The features are saved in data/chinese_talk_fc and data/chinese_talk_att; the resulting files are about 100GB.
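In outline, the extraction runs each image through a torchvision resnet101 and keeps the pooled fc vector plus a spatially pooled conv map; a simplified sketch (the real prepro_feats.py also handles loading, preprocessing and saving):

import torch.nn.functional as F
import torchvision.models as models

resnet = models.resnet101(pretrained=True).eval()

def extract_feats(img, att_size=7):
    # img: a (1, 3, H, W) tensor, already ImageNet-normalized.
    x = resnet.conv1(img)
    x = resnet.bn1(x)
    x = resnet.relu(x)
    x = resnet.maxpool(x)
    x = resnet.layer1(x)
    x = resnet.layer2(x)
    x = resnet.layer3(x)
    x = resnet.layer4(x)                      # (1, 2048, h, w) last conv map
    fc = x.mean(3).mean(2).squeeze(0)         # 2048-d fc feature
    att = F.adaptive_avg_pool2d(x, (att_size, att_size))
    att = att.squeeze(0).permute(1, 2, 0)     # (att_size, att_size, 2048)
    return fc, att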

  • Using bottom-up-features

Pre-extracted features are available for download at link.

Code for extracting the features is here

Download the evaluation code

Clone from link and link

Start training

$ mkdir xe
$ bash run_train.sh

Evaluate on test split

$ python eval.py --dump_images 0 --num_images -1 --split test  --model log_dense_box_bn/model-best.pth --language_eval 1 --beam_size 5 --temperature 1.0 --sample_max 1  --infos_path log_dense_box_bn/infos_dense_box_bn-best.pkl

To run an ensemble:

$ python eval_ensemble.py --dump_images 0 --language_eval 1 --batch_size 5 --num_images -1 --split test  --ids dense_box_bn dense_box_bn1 --beam_size 5 --temperature 1.0 --sample_max 1

Acknowledgements

Thanks to the original neuraltalk2 and the awesome PyTorch team.

image_captioning_ai_challenger's People

Contributors

gujiuxiang · raoyongming · ruotianluo


image_captioning_ai_challenger's Issues

Hello Dr. Luo! Could you please briefly introduce "use_maxout" in your code?

Hello Dr. Luo!
I am a beginner, and I find that "use_maxout" appears many times in the model code. Could you please briefly introduce "use_maxout"?
Looking at the language-model code under the models folder, I noticed that you appear to have hand-written the LSTM rather than using PyTorch's built-in nn.LSTM or nn.LSTMCell. What was your purpose in doing that, and roughly what does use_maxout do?
Or could you point me to the paper that proposed it?

TypeError: forward() takes exactly 4 arguments (5 given)

Hi Dr. Luo,
When I try to train some models (show tell, show tell attention, att2in, etc.) using your Image_Captioning_AI_Challenger code, I meet the following error.

ps: Training of the attention-mechanism models works normally.

DataLoader loading json file:  data/chinese_talk.json
vocab size is  4461
DataLoader loading h5 file:  data/chinese_bu_fc data/chinese_bu_att data/chinese_bu_box data/chinese_talk_label.h5
max sequence length in data is 20
read 240000 image features
assigned 220000 images to split train
assigned 10000 images to split val
assigned 10000 images to split test
Read data: 0.80700802803
Traceback (most recent call last):
  File "train.py", line 241, in <module>
    train(opt)
  File "train.py", line 129, in train
    loss = crit(dp_model(fc_feats, att_feats, labels, att_masks), labels[:, 1:], masks[:, 1:])
  File "/home/andrewcao95/anaconda3/envs/Image_Captioning_AI_Challenger-newest/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/andrewcao95/anaconda3/envs/Image_Captioning_AI_Challenger-newest/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 58, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/andrewcao95/anaconda3/envs/Image_Captioning_AI_Challenger-newest/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() takes exactly 4 arguments (5 given)
Terminating BlobFetcher

I tried to debug and traced the problem to https://github.com/ruotianluo/Image_Captioning_AI_Challenger/blob/master/train.py#L125-L130; the parameter att_masks is the cause.

if not sc_flag:
    loss = crit(dp_model(fc_feats, att_feats, labels, att_masks), labels[:,1:], masks[:,1:])
else:
    gen_result, sample_logprobs = dp_model(fc_feats, att_feats, att_masks, opt={'sample_max':0}, mode='sample')
    reward = get_self_critical_reward(dp_model, fc_feats, att_feats, att_masks, data, gen_result, opt)
    loss = rl_crit(sample_logprobs, gen_result.data, Variable(torch.from_numpy(reward).float().cuda(), requires_grad=False))

Comparing this with the corresponding parts of your other projects ImageCaptioning.pytorch and self-critical.pytorch, I find you have made a lot of code changes here, and it is hard to port this part to the newer code because eval_utils.py and train.py have both been modified heavily.

If I want to use Image_Captioning_AI_Challenger's code to run those baseline models on the Chinese AI_Challenger dataset, how can I do it with the minimum modification cost?
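For what it's worth, the mismatch is that train.py always passes att_masks, while the non-attention models apparently define forward with one argument fewer. A hypothetical low-cost workaround (not the repo's actual fix) is to let those models accept and ignore an optional att_masks:

import torch.nn as nn

class ShowTellSketch(nn.Module):
    # Hypothetical sketch, not the repo's actual class. train.py calls
    # dp_model(fc_feats, att_feats, labels, att_masks), so a forward()
    # that takes only (fc_feats, att_feats, seq) raises the TypeError.
    def forward(self, fc_feats, att_feats, seq, att_masks=None):
        pass  # original decoding logic unchanged; att_masks is unused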

feature

What do chinese_bu_fc, chinese_bu_att and chinese_bu_box represent?
How many boxes do you use?
Can you share the code for extracting the features?

Training error: ValueError: could not broadcast input array from shape (7,7,2048) into shape (5,7,7)

Using Python 2.7 and PyTorch 0.2.1.post2. All the prepro scripts have been run. Training fails with:
Traceback (most recent call last):
  File "train.py", line 233, in <module>
    train(opt)
  File "train.py", line 118, in train
    data = loader.get_batch('train')
  File "/data/disk1/private/chenweize/concreteness_ch/dataset/Image_Captioning_AI_Challenger/dataloader.py", line 163, in get_batch
    data['att_feats'][i*seq_per_img:(i+1)*seq_per_img, :att_batch[i].shape[0]] = att_batch[i]
ValueError: could not broadcast input array from shape (7,7,2048) into shape (5,7,7)
Terminating BlobFetcher

Also, in run_train.sh the training options --input_fc_dir and --input_att_dir point to data/chinese_bu_fc (and _att). Should the "bu" there be "talk"? I do not see any step that generates a chinese_bu_fc folder. The traceback above was produced after changing "bu" to "talk"; without that change, the error is:
Traceback (most recent call last):
  File "train.py", line 233, in <module>
    train(opt)
  File "train.py", line 118, in train
    data = loader.get_batch('train')
  File "/data/disk1/private/chenweize/concreteness_ch/dataset/Image_Captioning_AI_Challenger/dataloader.py", line 135, in get_batch
    ix, tmp_wrapped = self._prefetch_process[split].get()
  File "/data/disk1/private/chenweize/concreteness_ch/dataset/Image_Captioning_AI_Challenger/dataloader.py", line 260, in get
    tmp = self.split_loader.next()
  File "/data/disk1/private/chenweize/concreteness_ch/env/local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 201, in next
    return self._process_next_batch(batch)
  File "/data/disk1/private/chenweize/concreteness_ch/env/local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 221, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IOError: Traceback (most recent call last):
  File "/data/disk1/private/chenweize/concreteness_ch/env/local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/disk1/private/chenweize/concreteness_ch/dataset/Image_Captioning_AI_Challenger/dataloader.py", line 189, in __getitem__
    att_feat = np.load(os.path.join(self.input_att_dir, str(self.info['images'][ix]['id']) + '.npz'))['feat']
  File "/data/disk1/private/chenweize/concreteness_ch/env/local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
IOError: [Errno 2] No such file or directory: 'data/chinese_bu_att/74e1bd18c6836e7e0b88e42923f1f7d9a87d9a91.jpg.npz'

How to generate "data/eval_reference_new.json"?

Hi, Dr. Luo,
Thanks for your awesome project. When I use your code to train the model with --language_eval 1 set in run_train.sh, an error occurs during training: in the function language_eval() in eval_utils.py, the program cannot find annFile = 'data/eval_reference_new.json'. Would you kindly tell me how to generate data/eval_reference_new.json if I want to set language_eval=1? Thanks for your help!

Training problem

Hello! I extracted features for the dataset with resnet18, but I do not know how to train on them; the training code seems to target bottom-up-attention features. I am also not sure how to extract features for the AI Challenger training set with bottom-up-attention.

No module named pyciderevalcap.ciderD.ciderD

The original code:
import sys
sys.path.append("cider")
from pyciderevalcap.ciderD.ciderD import CiderD
sys.path.append("AI_Challenger/Evaluation/caption_eval")

I have added coco-caption to my current directory; however, I cannot import the module pyciderevalcap.ciderD.ciderD. How can I load this module?
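For context, sys.path.append("cider") only helps if a checkout of the cider repository (which provides pyciderevalcap) actually sits at the project root; coco-caption alone does not contain that package. A minimal check under that assumed layout:

import os
import sys

# Assumed layout: ./cider/pyciderevalcap/ciderD/ciderD.py
# (a checkout of the cider repo, separate from coco-caption).
sys.path.append('cider')
if not os.path.isdir(os.path.join('cider', 'pyciderevalcap')):
    raise IOError('clone the cider repo into ./cider first')
from pyciderevalcap.ciderD.ciderD import CiderD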

eval error

Hello, after training the model with the parameters from your GitHub, I met the following error during eval:

Traceback (most recent call last):
  File "eval.py", line 137, in <module>
    vars(opt))
  File "/home/dusen/YKQ/Image_Captioning_AI_Challenger/eval_utils.py", line 97, in eval_split
    data['att_masks'][np.arange(loader.batch_size) * loader.seq_per_img]]
KeyError: 'att_masks'

I hope you can help with this. Many thanks!

Support Python3

I edited some files to support Python 3, for anyone who needs it. The only issue is that it does not support multi-GPU training.

The width and the height of the images

Hi ruotian, thanks a lot for your code, but I have a problem. I find there are some images that have entries in the json file but do not exist in the image folder. In this case, how do you handle the width and height of these images?

pre_trained model for test

Hi! I want to test the model to determine whether it is suitable for my use case, so I wonder whether you would be willing to share the trained model. If you can, you have my sincere thanks.

Idea behind denseatt?

Hi RT,
I applied your denseatt model to the MSCOCO dataset and achieved great results, and I would love to learn which observations and thoughts led you to create this double attention mechanism.
Thanks a lot for the wonderful code.

No such file or directory: 'log_dense_box_bn/infos_dense_box_bn.pkl'

The data and the features are all prepared, but training keeps failing with this error:

read 240000 image features
assigned 220000 images to split train
assigned 10000 images to split val
assigned 10000 images to split test
Traceback (most recent call last):
  File "train.py", line 228, in <module>
    train(opt)
  File "train.py", line 48, in train
    with open(os.path.join(opt.start_from, 'infos_'+opt.id+'.pkl')) as f:
IOError: [Errno 2] No such file or directory: 'log_dense_box_bn/infos_dense_box_bn.pkl'
Terminating BlobFetcher

Request for the AI Challenger dataset

Hello, we have recently been working on captioning and hope to build on the AI Challenger dataset, but the official download channel has been closed. Would you be willing to share the data with us?

an error during training progress about broadcasting

run_train.sh: the parameters we changed are as follows
#! /bin/sh

#larger batch

id="dense_box_bn"$1
ckpt_path="log_"$id
if [ ! -d $ckpt_path ]; then
mkdir $ckpt_path
fi
if [ ! -f $ckpt_path"/infos_"$id".pkl" ]; then
start_from=""
else
start_from="--start_from "$ckpt_path
fi

The error we meet when running train.py:
python train.py --id $id --caption_model denseatt --input_json data/chinese_talk.json --input_label_h5 data/chinese_talk_label.h5 --input_fc_dir data/chinese_talk_fc --input_att_dir data/chinese_talk_att --seq_per_img 5 --batch_size 50 --beam_size 1 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path $ckpt_path $start_from --save_checkpoint_every 3000 --language_eval 1 --val_images_use 10000 --max_epoch 37 --rnn_size 1300 --use_box 0 --use_bn 0
vipsl-422-1@vipsl-422-1:~/enjoy-zhangyi/ImageCaptioninginChinese$ bash run_train.sh
Tensorflow not installed; No tensorboard logging.
DataLoader loading json file: data/chinese_talk.json
vocab size is 4461
DataLoader loading h5 file: data/chinese_talk_fc data/chinese_talk_att data/cocotalk_box data/chinese_talk_label.h5
max sequence length in data is 20
read 240000 image features
assigned 220000 images to split train
assigned 10000 images to split val
assigned 10000 images to split test
Traceback (most recent call last):
  File "train.py", line 229, in <module>
    train(opt)
  File "train.py", line 115, in train
    data = loader.get_batch('train')
  File "/home/vipsl-422-1/enjoy-zhangyi/ImageCaptioninginChinese/dataloader.py", line 163, in get_batch
    data['att_feats'][i*seq_per_img:(i+1)*seq_per_img, :att_batch[i].shape[0]] = att_batch[i]
ValueError: could not broadcast input array from shape (7,7,2048) into shape (5,7,7)
Terminating BlobFetcher

scst

When training with SCST, is combining CIDEr and BLEU as the reward better than CIDEr alone? What weights do you give to each?
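For context, a mixed SCST reward is typically just a weighted sum of the per-caption metric scores; the weights below are illustrative only, since the repo's actual choice is exactly what this issue asks about:

def mixed_reward(cider_scores, bleu_scores, w_cider=1.0, w_bleu=0.5):
    # Illustrative sketch: combine per-caption CIDEr and BLEU
    # scores into a single self-critical reward. The weights are
    # hypothetical, not the settings used in this repository.
    return [w_cider * c + w_bleu * b
            for c, b in zip(cider_scores, bleu_scores)]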
