visdial's Introduction

VisDial

Code for the paper

Visual Dialog
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
arxiv.org/abs/1611.08669
CVPR 2017 (Spotlight)

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the AI agent has to answer the question.

Demo: demo.visualdialog.org

This repository contains code for training, evaluating and visualizing results for all combinations of encoder-decoder architectures described in the paper. Specifically, we have 3 encoders: Late Fusion (LF), Hierarchical Recurrent Encoder (HRE), Memory Network (MN), and 2 kinds of decoding: Generative (G) and Discriminative (D).

[Figure: supported encoder-decoder model combinations]

If you find this code useful, consider citing our work:

@inproceedings{visdial,
  title={{V}isual {D}ialog},
  author={Abhishek Das and Satwik Kottur and Khushi Gupta and Avi Singh
    and Deshraj Yadav and Jos\'e M.F. Moura and Devi Parikh and Dhruv Batra},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2017}
}

Setup

All our code is implemented in Torch (Lua). Installation instructions are as follows:

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
TORCH_LUA_VERSION=LUA51 ./install.sh

Additionally, our code uses the following packages: torch/torch7, torch/nn, torch/nngraph, Element-Research/rnn, torch/image, lua-cjson, loadcaffe, torch-hdf5. After Torch is installed, these can be installed/updated using:

luarocks install torch
luarocks install nn
luarocks install nngraph
luarocks install image
luarocks install lua-cjson
luarocks install loadcaffe
luarocks install luabitop
luarocks install totem

NOTE: luarocks install rnn installs torch/rnn by default. Follow these steps to install Element-Research/rnn instead:

git clone https://github.com/Element-Research/rnn.git
cd rnn
luarocks make rocks/rnn-scm-1.rockspec

Installation instructions for torch-hdf5 are given here.

NOTE: torch-hdf5 does not work with some versions of gcc. It is recommended that you use gcc 4.8 / gcc 4.9 with Lua 5.1 for proper installation of torch-hdf5.

Running on GPUs

Although our code should work on CPUs, it is highly recommended to use GPU acceleration with CUDA. You'll also need torch/cutorch, torch/cudnn and torch/cunn.

luarocks install cutorch
luarocks install cunn
luarocks install cudnn

Training your own network

Preprocessing VisDial

The preprocessing script is in Python, and you'll need NLTK, NumPy, and h5py installed:

pip install nltk
pip install numpy
pip install h5py
python -c "import nltk; nltk.download('all')"

The VisDial v1.0 dataset can be downloaded and preprocessed as shown below. The path provided as -image_root must have four subdirectories: train2014 and val2014 from the COCO dataset, and VisualDialog_val2018 and VisualDialog_test2018, which can be downloaded from here.

cd data
python prepro.py -download -image_root /path/to/images
cd ..

To download and preprocess the VisDial v0.9 dataset, pass an extra -version 0.9 argument.

This script will generate the files data/visdial_data.h5 (contains tokenized captions, questions, answers, image indices) and data/visdial_params.json (contains vocabulary mappings and COCO image ids).
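
To sanity-check the preprocessing output, the generated files can be inspected directly. A minimal Python sketch (the JSON key name word2ind below is an assumption about the file layout, not a documented interface):

import json
import h5py

# List every dataset in the tokenized-dialog file with its shape and dtype.
with h5py.File('data/visdial_data.h5', 'r') as f:
    for name, dset in f.items():
        print(name, dset.shape, dset.dtype)

# Load the vocabulary mappings and COCO image ids.
with open('data/visdial_params.json') as f:
    params = json.load(f)
print('vocabulary entries:', len(params['word2ind']))  # assumed key name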

Extracting image features

Since we don't finetune the CNN, training is significantly faster if image features are pre-extracted. Currently this repository provides support for extraction from VGG-16 and ResNets. We use image features from VGG-16. The VGG-16 model can be downloaded and features extracted using:

sh scripts/download_model.sh vgg 16  # works for 19 as well
cd data
# For all models except mn-att-ques-im-hist
th prepro_img_vgg16.lua -imageRoot /path/to/images -gpuid 0
# For mn-att-ques-im-hist
th prepro_img_vgg16.lua -imageRoot /path/to/images -imgSize 448 -layerName pool5 -gpuid 0

Similarly, ResNet models released by Facebook can be used for feature extraction, in the same manner as VGG-16:

sh scripts/download_model.sh resnet 200  # works for 18, 34, 50, 101, 152 as well
cd data
th prepro_img_resnet.lua -imageRoot /path/to/images -cnnModel /path/to/t7/model -gpuid 0

Running either of these should generate data/data_img.h5 containing features for train, val and test splits corresponding to VisDial v1.0.

Training

Finally, we can get to training models! All supported encoders are in the encoders/ folder (lf-ques, lf-ques-im, lf-ques-hist, lf-ques-im-hist, hre-ques-hist, hre-ques-im-hist, hrea-ques-im-hist, mn-ques-hist, mn-ques-im-hist, mn-att-ques-im-hist), and decoders in the decoders/ folder (gen and disc).

Generative (gen) decoding maximizes the likelihood of the ground-truth response and only has access to single input-output pairs of dialog, while discriminative (disc) decoding makes use of the 100 candidate responses provided for every round of dialog and maximizes the likelihood of the correct option.
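
The two objectives can be made concrete with a schematic NumPy sketch (illustrative only, not the repository's Torch code): the generative decoder accumulates token-level log-likelihood of the ground-truth answer, while the discriminative decoder scores all 100 options and applies softmax cross-entropy against the correct one.

import numpy as np

def generative_loss(token_logprobs):
    # token_logprobs: log p(w_t | w_<t, encoder state) for each token
    # of the ground-truth response; minimizing this maximizes likelihood.
    return -np.sum(token_logprobs)

def discriminative_loss(option_scores, gt_index):
    # option_scores: one score per candidate response (100 per round);
    # softmax cross-entropy against the index of the correct option.
    logits = option_scores - option_scores.max()  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum())
    return -log_softmax[gt_index]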

Encoders and decoders can be arbitrarily plugged together. For example, to train an HRE model with question and history information only (no images), and generative decoding:

th train.lua -encoder hre-ques-hist -decoder gen -gpuid 0

Similarly, to train a Memory Network model with question, image and history information, and discriminative decoding:

th train.lua -encoder mn-ques-im-hist -decoder disc -gpuid 0

Note: For attention-based encoders, set both the imgSpatialSize and imgFeatureSize command-line params; feature dimensions are interpreted as (batch x spatial x spatial x feature). For other encoders, imgSpatialSize is unused.
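
For instance, with VGG-16 pool5 features extracted at 448x448 input, which yield a 14x14x512 feature map, a plausible invocation would be (the flag values here are illustrative, derived from the extraction settings above, not documented defaults):

th train.lua -encoder mn-att-ques-im-hist -decoder disc -imgSpatialSize 14 -imgFeatureSize 512 -gpuid 0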

The training script saves model snapshots at regular intervals in the checkpoints/ folder.

It takes about 15-20 epochs to train models with generative decoding to convergence, and 4-8 epochs for discriminative decoding.

Evaluation

We evaluate model performance by how highly it ranks the human response among the 100 candidate responses for every round of dialog, using retrieval metrics: mean reciprocal rank (MRR), R@1, R@5, R@10, and mean rank (MR).
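
These metrics follow directly from the rank of the human response in each round. A minimal NumPy sketch (for illustration; not the repository's evaluation code):

import numpy as np

def retrieval_metrics(gt_ranks):
    # gt_ranks: 1-indexed rank of the human response among the
    # 100 candidate options, one entry per dialog round.
    r = np.asarray(gt_ranks, dtype=np.float64)
    return {
        'MRR':  float(np.mean(1.0 / r)),
        'R@1':  float(np.mean(r <= 1)),
        'R@5':  float(np.mean(r <= 5)),
        'R@10': float(np.mean(r <= 10)),
        'MR':   float(np.mean(r)),
    }

print(retrieval_metrics([1, 3, 12, 2, 7]))  # toy ranks, not real results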

Model evaluation can be run using:

th evaluate.lua -loadPath checkpoints/model.t7 -gpuid 0

Note that evaluation requires image features data/data_img.h5, tokenized dialogs data/visdial_data.h5 and vocabulary mappings data/visdial_params.json.

Running Beam Search & Visualizing Results

We also include code for running beam search on your model snapshots. This gives significantly nicer results than argmax decoding, and can be run as follows:

th generate.lua -loadPath checkpoints/model.t7 -maxThreads 50

This computes predictions for 50 dialog threads from the val split and saves the results to vis/results/results.json.
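
For reference, the core of beam search is model-agnostic: keep the beam_size highest-scoring partial sequences, expanding each by its top next-token candidates at every step. A generic Python sketch (not the code in generate.lua; step_fn stands in for the decoder):

import numpy as np

def beam_search(step_fn, start_token, end_token, beam_size=5, max_len=20):
    # step_fn(tokens) returns a log-probability vector over the
    # vocabulary for the next token, given the sequence so far.
    beams = [([start_token], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            logprobs = step_fn(tokens)
            for tok in np.argsort(logprobs)[-beam_size:]:
                candidates.append((tokens + [int(tok)], score + logprobs[tok]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            elif len(beams) < beam_size:
                beams.append((tokens, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

Note that summing raw log-probabilities, as above, favors shorter sequences; length-normalizing the score is a common remedy (see the beam-search question among the issues below).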

cd vis
# python 3.6
python -m http.server
# python 2.7
# python -m SimpleHTTPServer

Now visit localhost:8000 in your browser to see generated results.

Sample results from HRE-QIH-G available here.

Download Extracted Features & Pretrained Models

v0.9

Extracted features for v0.9 train and val are available for download.

Pretrained models

Trained on v0.9 train, results on v0.9 val.

| Encoder | Decoder | CNN | MRR | R@1 | R@5 | R@10 | MR | Download |
|---------|---------|-----|-----|-----|-----|------|----|----------|
| lf-ques | gen | VGG-16 | 0.5048 | 0.3974 | 0.6067 | 0.6649 | 17.8003 | lf-ques-gen-vgg16-18 |
| lf-ques-hist | gen | VGG-16 | 0.5099 | 0.4012 | 0.6155 | 0.6740 | 17.3974 | lf-ques-hist-gen-vgg16-18 |
| lf-ques-im | gen | VGG-16 | 0.5206 | 0.4206 | 0.6165 | 0.6760 | 17.0578 | lf-ques-im-gen-vgg16-22 |
| lf-ques-im-hist | gen | VGG-16 | 0.5146 | 0.4086 | 0.6205 | 0.6828 | 16.7553 | lf-ques-im-hist-gen-vgg16-26 |
| lf-att-ques-im-hist | gen | VGG-16 | 0.5354 | 0.4354 | 0.6355 | 0.6941 | 16.7663 | lf-att-ques-im-hist-gen-vgg16-80 |
| hre-ques-hist | gen | VGG-16 | 0.5089 | 0.4000 | 0.6154 | 0.6739 | 17.3618 | hre-ques-hist-gen-vgg16-18 |
| hre-ques-im-hist | gen | VGG-16 | 0.5237 | 0.4223 | 0.6228 | 0.6811 | 16.9669 | hre-ques-im-hist-gen-vgg16-14 |
| hrea-ques-im-hist | gen | VGG-16 | 0.5238 | 0.4213 | 0.6244 | 0.6842 | 16.6044 | hrea-ques-im-hist-gen-vgg16-24 |
| mn-ques-hist | gen | VGG-16 | 0.5131 | 0.4057 | 0.6176 | 0.6770 | 17.6253 | mn-ques-hist-gen-vgg16-102 |
| mn-ques-im-hist | gen | VGG-16 | 0.5258 | 0.4229 | 0.6274 | 0.6874 | 16.9871 | mn-ques-im-hist-gen-vgg16-78 |
| mn-att-ques-im-hist | gen | VGG-16 | 0.5341 | 0.4354 | 0.6318 | 0.6903 | 17.0726 | mn-att-ques-im-hist-gen-vgg16-100 |
| lf-ques | disc | VGG-16 | 0.5491 | 0.4113 | 0.7020 | 0.7964 | 7.1519 | lf-ques-disc-vgg16-10 |
| lf-ques-hist | disc | VGG-16 | 0.5724 | 0.4319 | 0.7308 | 0.8251 | 6.2847 | lf-ques-hist-disc-vgg16-8 |
| lf-ques-im | disc | VGG-16 | 0.5745 | 0.4331 | 0.7398 | 0.8340 | 5.9801 | lf-ques-im-disc-vgg16-12 |
| lf-ques-im-hist | disc | VGG-16 | 0.5911 | 0.4490 | 0.7563 | 0.8493 | 5.5493 | lf-ques-im-hist-disc-vgg16-8 |
| lf-att-ques-im-hist | disc | VGG-16 | 0.6079 | 0.4692 | 0.7731 | 0.8635 | 5.1965 | lf-att-ques-im-hist-disc-vgg16-20 |
| hre-ques-hist | disc | VGG-16 | 0.5668 | 0.4265 | 0.7245 | 0.8207 | 6.3701 | hre-ques-hist-disc-vgg16-4 |
| hre-ques-im-hist | disc | VGG-16 | 0.5818 | 0.4461 | 0.7373 | 0.8342 | 5.9647 | hre-ques-im-hist-disc-vgg16-4 |
| hrea-ques-im-hist | disc | VGG-16 | 0.5821 | 0.4456 | 0.7378 | 0.8341 | 5.9646 | hrea-ques-im-hist-disc-vgg16-4 |
| mn-ques-hist | disc | VGG-16 | 0.5831 | 0.4388 | 0.7507 | 0.8434 | 5.8090 | mn-ques-hist-disc-vgg16-20 |
| mn-ques-im-hist | disc | VGG-16 | 0.5971 | 0.4562 | 0.7627 | 0.8539 | 5.4218 | mn-ques-im-hist-disc-vgg16-12 |
| mn-att-ques-im-hist | disc | VGG-16 | 0.6082 | 0.4700 | 0.7724 | 0.8623 | 5.2930 | mn-att-ques-im-hist-disc-vgg16-28 |

v1.0

Extracted features for v1.0 train, val and test are available for download.

Pretrained models

Trained on v1.0 train + v1.0 val, results on v1.0 test-std. Leaderboard here.

| Encoder | Decoder | CNN | NDCG | MRR | R@1 | R@5 | R@10 | MR | Download |
|---------|---------|-----|------|-----|-----|-----|------|----|----------|
| lf-ques-im-hist | gen | VGG-16 | 0.5121 | 0.4568 | 35.08 | 55.92 | 64.02 | 18.8140 | lf-ques-im-hist-gen-vgg16-24 |
| hre-ques-im-hist | gen | VGG-16 | 0.5245 | 0.4561 | 34.78 | 56.18 | 63.72 | 18.7778 | hre-ques-im-hist-gen-vgg16-20 |
| mn-ques-im-hist | gen | VGG-16 | 0.5280 | 0.4580 | 35.05 | 56.35 | 63.92 | 19.3128 | mn-ques-im-hist-gen-vgg16-92 |
| lf-att-ques-im-hist | gen | VGG-16 | 0.5362 | 0.4697 | 36.58 | 57.40 | 64.48 | 18.9550 | lf-att-ques-im-hist-gen-vgg16-82 |
| mn-att-ques-im-hist | gen | VGG-16 | 0.5367 | 0.4650 | 36.00 | 56.80 | 64.25 | 19.3470 | mn-att-ques-im-hist-gen-vgg16-100 |
| lf-ques-im-hist | disc | VGG-16 | 0.4531 | 0.5542 | 40.95 | 72.45 | 82.83 | 5.9532 | lf-ques-im-hist-disc-vgg16-8 |
| hre-ques-im-hist | disc | VGG-16 | 0.4546 | 0.5416 | 39.93 | 70.45 | 81.50 | 6.4082 | hre-ques-im-hist-disc-vgg16-4 |
| mn-ques-im-hist | disc | VGG-16 | 0.4750 | 0.5549 | 40.98 | 72.30 | 83.30 | 5.9245 | mn-ques-im-hist-disc-vgg16-12 |
| lf-att-ques-im-hist | disc | VGG-16 | 0.4976 | 0.5707 | 42.08 | 74.82 | 85.05 | 5.4092 | lf-att-ques-im-hist-disc-vgg16-24 |
| mn-att-ques-im-hist | disc | VGG-16 | 0.4958 | 0.5690 | 42.42 | 74.00 | 84.35 | 5.5852 | mn-att-ques-im-hist-disc-vgg16-24 |

License

BSD

visdial's Issues

Issue with prepro.py at line 150.

Why is the loop over 'j' at line 150 in the prepro.py script necessary, when at test time we only need to look at the options in the last round?

prepro.py: TypeError: slice indices must be integers or None or have an __index__ method

Hi VisDial team,

Thank you for sharing the great work!
After I ran the command "python prepro.py -download 1", I got the following output:

Reading json...
train2014
Tokenizing captions...
Tokenizing questions...
Tokenizing answers...
val2014
Tokenizing captions...
Tokenizing questions...
Tokenizing answers...
Building vocabulary...
Words: 8845
Encoding based on vocabulary...
Creating data matrices...
Traceback (most recent call last):
File "prepro.py", line 161, in
captions_train, captions_train_len, questions_train, questions_train_len, answers_train, answers_train_len, options_train, options_train_list, options_train_len, answers_train_index, images_train_index, images_train_list = create_data_mats(data_train_toks, ques_train_inds, ans_train_inds, args)
File "prepro.py", line 94, in create_data_mats
captions[i][0:caption_len[i]] = data_toks[image_id]['caption_inds'][0:max_cap_len]
TypeError: slice indices must be integers or None or have an __index__ method

Do you know how to fix this?

Thank you!

Best,
Rui

Inconsistent tensor size, training with mn-att-ques-im-hist

Hello,
I am trying to execute train.lua with the mn-att-ques-im-hist encoder. I downloaded the data_img_pool5.h5 file from the Google Drive link you provided in issue #12: https://drive.google.com/open?id=0B-iGspODhEtrUXg5dXV5TlRJUmM

I execute the model on CPU with:
th train.lua -encoder mn-att-ques-im-hist -decoder gen -gpuid -1 -rnnHiddenSize 380 -numEpochs 40 -numLayers 1

and I get the below error:
/home/ubuntu/torch2/install/bin/luajit: /home/ubuntu/torch2/install/share/lua/5.1/nn/CAddTable.lua:16: inconsistent tensor size, expected r_ [400 x 512], t [400 x 512] and src [400 x 380] to have the same number of elements, but got 204800, 204800 and 152000 elements respectively at /home/ubuntu/torch2/pkg/torch/lib/TH/generic/THTensorMath.c:887
stack traceback:
[C]: in function 'add'
/home/ubuntu/torch2/install/share/lua/5.1/nn/CAddTable.lua:16: in function 'func'
.../ubuntu/torch2/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
.../ubuntu/torch2/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
./model.lua:229: in function 'forwardBackward'
./model.lua:74: in function 'trainIteration'
train.lua:72: in main chunk
[C]: in function 'dofile'
...ntu/torch2/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

Can you please tell me how to solve this?
Regards,
Enid

About sampling

Hi,

In your code here in model.lua, shouldn't the decoder first be coupled with the encoder, so that the sampling is also conditioned on the image features and the previous conversation?

It seems to me that the sampling is only based on the word from the previous time-step, without knowledge of the image features and so on.

Best,
Rui

bad argument #2 to 'add' (sizes do not match at /torch-addons/cutorch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:269)

local img_tr = nn.Dropout(0.5)(

This is the error I am getting when running train.lua with the mn-att-ques-im-hist encoder and gen decoder; it corresponds to the encoder forward pass in the forwardBackward function.
I am able to get it running by changing params.imgFeatureSize (whose value is 4096) in the above-mentioned line to 512.
@abhshkdz

Invalid permutation error reading image feature you gave

Hi, I just want to run the pretrained model, but an error occurs while reading the image features (Invalid permutation).

I downloaded preprocessed data below.

preprocessed data (data/)
visdial_data_trainval.h5, visdial_params_trainval.json, data_img_vgg16_relu7_trainval.h5

pretrained model (checkpoints/)
lf-att-ques-im-hist-disc-vgg16-24.t7

I ran the command below (test split):
th evaluate.lua -loadPath checkpoints/lf-att-ques-im-hist-disc-vgg16-24.t7 -gpuid 0 -split test

stack trace
{
useGt : false
inputQues : "data/visdial_data_trainval.h5"
batchSize : 30
split : "test"
loadPath : "checkpoints/lf-att-ques-im-hist-disc-vgg16-24.t7"
inputJson : "data/visdial_params_trainval.json"
saveRanks : true
saveRankPath : "models/test.json"
backend : "cudnn"
gpuid : 1
inputImg : "data/data_img_vgg16_relu7_trainval.h5"
}
DataLoader loading json file: data/visdial_params.json
Vocabulary size (with ,): 11403

DataLoader loading h5 file: data/visdial_data.h5
DataLoader loading h5 file: data/data_img.h5
Reading image features..
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/Tensor.lua:543: Invalid permutation
stack traceback:

[C]: in function 'assert'
/root/torch/install/share/lua/5.1/torch/Tensor.lua:543: in function 'permute'
dataloader.lua:71: in function 'initialize'
evaluate.lua:81: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Error with command "python prepro.py -download 1 -image_root /path/to/coco/images"

Hi, I have an issue with the prepro.py file.
I ran "python prepro.py -download 1 -image_root /path/to/coco/images"
It returns:

 python prepro.py -download 1 -image_root /path/to/coco/images
/home/ai8503/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
usage: prepro.py [-h] [-download] [-train_split {train,trainval}]
                 [-input_json_train INPUT_JSON_TRAIN]
                 [-input_json_val INPUT_JSON_VAL]
                 [-input_json_test INPUT_JSON_TEST] [-image_root IMAGE_ROOT]
                 [-input_vocab INPUT_VOCAB] [-output_json OUTPUT_JSON]
                 [-output_h5 OUTPUT_H5] [-max_ques_len MAX_QUES_LEN]
                 [-max_ans_len MAX_ANS_LEN] [-max_cap_len MAX_CAP_LEN]
                 [-word_count_threshold WORD_COUNT_THRESHOLD]
prepro.py: error: unrecognized arguments: 1

So, I changed the command to "python prepro.py -download -image_root /path/to/coco/images", and it worked well. But then I hit an issue at line 286.

Saving hdf5...
[train2014] Preparing image paths with image_ids...
  0%|                                                                 | 0/82783 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "prepro.py", line 286, in <module>
    out['unique_img_train'] = get_image_ids(data_train, args, 'train')
  File "prepro.py", line 188, in get_image_ids
    image_ids[i] = id2path[image_id]
KeyError: 378466

I think the json file is the problem.
How can I solve it?

feature extraction error

{
imgSize : 224
layerName : "relu7"
cnnModel : "models/vgg16/VGG_ILSVRC_16_layers.caffemodel"
batchSize : 50
outName : "data_img.h5"
inputJson : "visdial_params.json"
gpuid : 3
cnnProto : "models/vgg16/VGG_ILSVRC_16_layers_deploy.prototxt"
backend : "nn"
imageRoot : "/home/tommy/caffe-recurrent/data/coco/tools/images"
}
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded models/vgg16/VGG_ILSVRC_16_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
Processing 82783 images...
/home/tommy/torch/install/bin/lua: ...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/tommy/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #4 to 'v' (weight tensor must be 2D (nOutputPlane,nInputPlane*kH*kW) at /tmp/luarocks_cunn-1.0-0-5194/cunn/lib/THCUNN/SpatialConvolutionMM.cu:13)
stack traceback:
[C]: in function 'v'
/home/tommy/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:79: in function <...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:76>
(tail call): ?
[C]: in function 'xpcall'
...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
prepro_img.lua:94: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
prepro_img.lua:94: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?

Pytorch starter code

Hi,

Do you guys plan to release starter code in PyTorch for the challenge? visdial-rl does provide some insights, but it is tailored more toward the Visual Dialog agents described in the paper "Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning".

Evaluation of pretrained late fusion model fails

th evaluate.lua -loadPath checkpoints/lf-qih-d.t7 -gpuid 0

Using the pre-trained models and the preprocessed data available for download results in the following error:

Setting up model..==== 104200/104242 =========>.]  ETA: 0ms | Step: 0ms         
Encoder:        lf-ques-im-hist
Decoder:        gen
Evaluating..
numThreads      40504
/home/user/torch/install/bin/lua: ...me/user/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 3 module of nn.Sequential:
...ome/user/torch/install/share/lua/5.1/rnn/SeqLSTM.lua:99: nn.SeqLSTM expecting previous call to setZeroMask(zeroMask) with maskzero=true

Issue on extracting image feature

Hi, I'm running into a problem extracting image features.
I ran this command: "th prepro_img_vgg16.lua -imageRoot ~/Desktop/2014/ -gpuid 0"
It returns:

/home/ai8503/torch/install/bin/lua: ...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: ...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: ...home/ai8503/torch/install/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 579
stack traceback:
	[C]: in function 'error'
	...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	prepro_img_vgg16.lua:3: in main chunk
	[C]: in function 'dofile'
	.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: ?

How can I fix this?

dimension doesn't match during evaluation

In function Model:predict in model.lua, line 215:
ranks[{{startId, nextStartId - 1}, {}}] = self:retrieveBatch(batch):view(nextStartId - startId, -1, self.params.numOptions);
The size of self:retrieveBatch(batch) is 300x1, while it is about to be reshaped to 10x300, so there is an error: "The number of covered elements is not a multiple of all elements."

Models for Android Use?

Hi all,

I was hoping to find out more about using your model in my standalone Android app. I'm interested in keeping the same domain initially, as I'm only attempting to incorporate this model within Android. Would I just need the data.h5, params.json, and img.h5 files, or would I be able to skip that step since my domain is the same?

Thanks.

Beam search with length-normalized log likelihood?

Hi guys,

About the beam search here: if I understand correctly, this is standard beam search without length-normalized log likelihood, so it should tend to find shorter sequences, right? Did you also try beam search with length-normalized log likelihood?

Thank you!

Best,
Rui

Why is the dimension of input image features in attention case 14x14x4096?

imgFeats = imgFeats:view(-1, self.params.imgSpatialSize, self.params.imgSpatialSize, self.params.imgFeatureSize)

Shouldn't the dimensions be 14x14x512 when using the image features with the attention-based encoders?
I am facing this issue (in the encoder forward pass of the forwardBackward function) while running the mn-att-ques-im-hist encoder with the gen decoder.
The exact problem is also mentioned in #24.

Error in forwardBackward() function while training

After setting up the dependencies, executing th train.lua -encoder lf-ques -decoder gen -gpuid 0 raises the error shown in the snapshot below. I tried different PC configurations and the error persists, so it is not a system-specific problem.
[image: error snapshot]

THCudaCheck FAIL error=2 : out of memory... MultiGPU training

Hi,

I tried to train the model using the command "th train.lua -encoder hre-ques-hist -decoder gen -gpuid 1", but it fails with THCudaCheck FAIL error=2: out of memory. I was able to train after reducing the batch size below 40.

How can I utilize two GPUs for this training? Kindly advise.
