memnn's Introduction

Memory-Augmented Neural Networks

This project contains implementations of memory-augmented neural networks, with the code organized into per-model subdirectories.

Other third-party implementations

  • python-babi: MemN2N implementation on the bAbI tasks, with a very nice interactive demo.
  • theano-babi: MemN2N implementation in Theano for the bAbI tasks.
  • tf-lang: MemN2N language-model implementation in TensorFlow.
  • tf-babi: Another MemN2N implementation in TensorFlow, this one for the bAbI tasks.

memnn's People

Contributors

alexholdenmiller, facebook-github-bot, jaseweston, saravananselvamohan, tesatory

memnn's Issues

Help with KB-based KVMemNN for QA experiments

I want to use a KB for question answering with KVMemNN. This repository only contains code for experimenting with KVMemNN on Wikipedia documents. I would appreciate it if you could provide the source code for the KB-based experiments mentioned in the paper. Thank you.

Couldn't position encoding be like temporal encoding?

Why should the position encoding change along the dimensions of the word embedding? Shouldn't the entire embedding be multiplied element-wise by a single constant? Consider the sentence "john, went, to, the, hallway": doesn't it suffice to multiply "john" element-wise by a small constant, say 0.1, and the last word "hallway" by a larger one? I am trying to understand the reason for varying the weight of the position encoding along the dimensions of a word embedding.
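
For reference, the position encoding (PE) of Section 4.1 of the End-To-End Memory Networks paper is l_kj = (1 - j/J) - (k/d)(1 - 2j/J), where j indexes the word position and k the embedding dimension. A minimal numpy sketch (names are illustrative, not from this repository):

import numpy as np

def position_encoding(J, d):
    # l[j-1, k-1] = (1 - j/J) - (k/d) * (1 - 2*j/J), with 1-based j (word
    # position, J words total) and k (embedding dimension, d dims total).
    l = np.empty((J, d))
    for j in range(1, J + 1):
        for k in range(1, d + 1):
            l[j - 1, k - 1] = (1 - j / J) - (k / d) * (1 - 2 * j / J)
    return l

# A temporal-style scheme would scale each word's entire embedding by one
# scalar; PE instead gives every embedding dimension its own position
# profile, so the summed sentence vector mixes the words differently in
# each dimension rather than by a single per-word weight.
print(position_encoding(5, 4))  # "john went to the hallway", toy d = 4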

Dataset can no longer be downloaded

The human-annotated dataset is no longer available, and without it the published results are difficult to reproduce. Could you please provide a workaround?

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>AllAccessDisabled</Code>
  <Message>All access to this object has been disabled</Message> 
  <RequestId>00EC33BECAAF3FB7</RequestId>
  <HostId>3gzW5QZH/lqRs4tq5zuQcaFbrQtrjgluiSx/leIG3SW9IRtAniZZ10iW3kCZyums5G29LV9gnJs=</HostId>
</Error>

Cannot read "dict-hash.txt"

Hello,

I have a problem while running "build_hash.sh". When it tries to run "build_hash.lua", it does not detect dictFile="./data/torch/dict-hash.txt"; I get an error that dict-hash.txt is nil, even though the file exists and contains the processed information. I cannot fix the error; do you know how I can solve it?

Thank you so much,

Andrea

Errors when running AskingQuestions/reinforce

First I ran ./setup_data.sh to download the data, then ran AskingQuestions/reinforce/try.sh, but I get the following errors:
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-8495/cutorch/lib/THC/generated/../generic/THCTensorMathPointwise.cu line=64 error=59 : device-side assert triggered
/home/wuqiong/torch/install/bin/luajit: /home/wuqiong/.luarocks/share/lua/5.1/nn/Normalize.lua:40: cuda runtime error (59) : device-side assert triggered at /tmp/luarocks_cutorch-scm-1-8495/cutorch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:64
stack traceback:
[C]: in function 'abs'
/home/wuqiong/.luarocks/share/lua/5.1/nn/Normalize.lua:40: in function 'func'
/home/wuqiong/.luarocks/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/home/wuqiong/.luarocks/share/lua/5.1/nngraph/gmodule.lua:380: in function 'func'
/home/wuqiong/.luarocks/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
/home/wuqiong/.luarocks/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...g/Mem/MemNN-raw/AskingQuestions/reinforce/RL_memmnet.lua:248: in function 'Forward_Policy_AQorQA'
...g/Mem/MemNN-raw/AskingQuestions/reinforce/RL_memmnet.lua:128: in function 'test'
...g/Mem/MemNN-raw/AskingQuestions/reinforce/RL_memmnet.lua:451: in function 'train'
train_RL.lua:18: in main chunk
[C]: in function 'dofile'
...iong/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
THCudaCheckWarn FAIL file=/tmp/luarocks_cutorch-scm-1-8495/cutorch/lib/THC/THCCachingHostAllocator.cpp line=196 error=59 : device-side assert triggered
THCudaCheckWarn FAIL file=/tmp/luarocks_cutorch-scm-1-8495/cutorch/lib/THC/THCCachingHostAllocator.cpp line=211 error=59 : device-side assert triggered

Is there a bug in the code?

Vocabulary size of z/R in output module of EntNet

Hi,

I have been looking at the code and I'm not sure why the output vocabulary size includes both the word and key embeddings when the keys are not tied -- link to code. That step is followed by a narrow operation limiting the logsoftmax to only the words. Is there a reason for this design choice, or can we get rid of the extra rows of z/R?
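
One observation, hedged since it may not capture the authors' intent: narrowing after the logsoftmax is not the same as restricting the vocabulary before it. A toy numpy illustration (not the repository's code):

import numpy as np

def log_softmax(x):
    x = x - x.max()                          # numerical stability
    return x - np.log(np.exp(x).sum())

# Toy scores over 3 word rows followed by 2 key rows.
scores = np.array([1.0, 0.5, 0.2, 2.0, 1.5])

full_then_narrow = log_softmax(scores)[:3]   # normalizer includes key rows
words_only       = log_softmax(scores[:3])   # normalizer over words only

# The two differ by a constant offset, so the argmax is unchanged, but the
# NLL loss is not: narrowing after the logsoftmax still penalizes any
# probability mass the model places on the key rows.
print(full_then_narrow - words_only)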

NaN in gradient on A matrix

For a model with adjacent weight tying, as in Section 2.2.1, the gradient goes to NaN after a while. The model targets bAbI (the 1k dataset). I tried lowering the learning rate from 1e-2 to 1e-5, which didn't help.
The parameters are initialized according to Section 4.2 of the paper: the weights A, C, T_A (temporal encoding), and T_C are drawn from a Gaussian with mean 0 and std 0.1; the number of hops is 3; the maximum gradient norm is 40; the batch size is 32; and the embedding dimension is 40.
During training, the gradients of A and T_A become NaN after about 10 epochs; this doesn't happen for C and T_C. The learning rate is annealed by a factor of 0.5 every 15 epochs.

  1. What can I try to address the NaN gradients of A and T_A? These weights are used only during the first hop.

  2. The paper states: "On some tasks, we observed a large variance in the performance of our model (i.e. sometimes failing badly, other times not, depending on the initialization). To remedy this, we repeated each training 10 times with different random initializations, and picked the one with the lowest training error." What were the other initializations that worked for you?
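
No authoritative answer here, but since gradient clipping cannot repair a gradient that is already NaN, asserting finiteness before the clipped update helps localize the first bad step. A minimal numpy sketch, assuming gradients are plain arrays and using the max norm of 40 quoted above:

import numpy as np

def clip_by_global_norm(grads, max_norm=40.0):
    # Fail fast on the step where the first non-finite value appears.
    if any(not np.all(np.isfinite(g)) for g in grads):
        raise FloatingPointError("non-finite gradient before clipping")
    total = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads

If the first failure turns out to be in the softmax over memories, subtracting the per-example maximum score before exponentiating is the standard stabilization.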

Training EntNet on CBT dataset

I have read the paper "Tracking the World State with Recurrent Entity Networks". In the Section 5 experiments, Table 4 reports EntNet results on CBT (the Children's Book Test), and I have a question about training on it. A CBT example looks like {story, query, candidates, answer}, while bAbI is {story, query, answer}. If I want to train on CBT, how should I feed the candidates to the model? Or are the candidates only used to prepare the data as window sentences?

Thanks!
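
Not an authoritative answer, but a common way to use the candidates at training and test time is to restrict the output distribution to them rather than scoring the full vocabulary. A hypothetical numpy sketch (vocab_logits and candidate_ids are illustrative names, not this repository's API):

import numpy as np

def candidate_probs(vocab_logits, candidate_ids):
    # Keep only the logits of the 10 CBT candidate words and renormalize
    # over that subset, so the model is scored on the candidates alone.
    scores = vocab_logits[candidate_ids]
    scores = scores - scores.max()       # numerical stability
    p = np.exp(scores)
    return p / p.sum()                   # distribution over the candidates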

KVmemnn setup.sh

setup.sh installs the C library into the torch path, but not the library/*.lua files.

For now, I can arrange for the library/*.lua files to be findable under ${s%luarocks}../share/lua/5.1/library by continuing the setup.sh approach.

A better approach might install libmemnn and its Lua interface via a .spec file (in which case "library/" is probably not a great name for the Lua interface).

When will it be converted to a non-Matlab language?

I can't afford Matlab, and I can't read Matlab code. When will this be ported to another language such as Python, C++, or Java? I'd like to see it implemented in a free and open-source language.

Running the model

Would it be possible to provide more details about how to train/run the model and what output to expect? For example, what will running th online_simulate.lua [params] produce, and how long should it run?

Human-in-the-Loop data cannot be downloaded

The data referenced in setup_data.sh and setup_turk_data.sh is no longer available for download:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>AllAccessDisabled</Code>
  <Message>All access to this object has been disabled</Message> 
  <RequestId>00EC33BECAAF3FB7</RequestId>
  <HostId>3gzW5QZH/lqRs4tq5zuQcaFbrQtrjgluiSx/leIG3SW9IRtAniZZ10iW3kCZyums5G29LV9gnJs=</HostId>
</Error>

Why is the stored memory used twice in a single layer?

Hi MemNN authors (@tesatory)!

I read the paper "End-To-End Memory Networks", and I have one question: why is the stored memory x_1, ..., x_i used twice in a single layer, as in Figure 1?

Why twice, and not once, or three or more times? Is there a mathematical reason, or is it just a rule of thumb?
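
As I read Figure 1, the memory is attended to once, but each stored sentence is embedded twice: once through A to produce the keys matched against the query, and once through C to produce the values summed into the output. A toy numpy sketch of one hop, with one-word "sentences" for brevity:

import numpy as np

def memn2n_hop(u, story_ids, A, C):
    # A and C are (vocab, d) embedding matrices; u is the (d,) query.
    m = A[story_ids]                     # m_i = A x_i  (addressing keys)
    c = C[story_ids]                     # c_i = C x_i  (output values)
    s = m @ u                            # match scores
    p = np.exp(s - s.max())
    p = p / p.sum()                      # p_i = softmax(u . m_i)
    o = p @ c                            # o = sum_i p_i c_i
    return u + o                         # input to the next hop

Reading memory more times corresponds to stacking more hops, which the paper treats as a hyperparameter rather than a mathematical necessity.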

KVMemNN for KBQA

Excuse me, I am applying the key-value memory network model to KBQA, but my experiment reaches only 46% accuracy on the test set, while the Key-Value MemNN paper reports 93% accuracy on the KB task. Does it really achieve such high accuracy on KBQA? In addition, I trained the model on the QA dataset alone, without any pre-training; does that matter? Thank you very much to anyone who can give me a reply or some guidance.

Hash-lookup with KVMemNN

Hi,

I have a question regarding the hash-lookup performed by KVMemNNs.
Do you compute the hashes based on the actual words or on the embeddings?
And what kind of hashing function do you use?

Best regards,
Sebastian
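
For what it's worth, the Key-Value Memory Networks paper describes key hashing as an inverted index over the actual words, not the embeddings: a memory slot is retrieved if it shares at least one word with the question, ignoring words above a frequency cutoff (1,000 in the paper) so that stopwords don't match everything. An illustrative sketch, not this repository's code:

from collections import defaultdict

def build_inverted_index(keys_tokenized, max_freq=1000):
    # Count how many keys each word appears in, then index each slot under
    # its non-frequent (informative) words.
    df = defaultdict(int)
    for words in keys_tokenized:
        for w in set(words):
            df[w] += 1
    index = defaultdict(set)
    for slot, words in enumerate(keys_tokenized):
        for w in set(words):
            if df[w] < max_freq:
                index[w].add(slot)
    return index

def lookup(index, question_words):
    # Union of all slots sharing at least one informative word.
    slots = set()
    for w in question_words:
        slots |= index.get(w, set())
    return slots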
