
Dependency-based Convolutional Neural Networks for Sentence Embedding

This is the code for our ACL 2015 paper "Dependency-based Convolutional Neural Networks for Sentence Embedding".

Thanks to Yoon Kim for sharing his code and for his suggestions on this project. Our project extends the code for his paper "Convolutional Neural Networks for Sentence Classification" (EMNLP 2014). We post this project with Yoon's permission. You are welcome to adapt and optimize our project, but please do not use our code for commercial purposes.

This code runs on Python 2.6.6 and Theano 0.7.

Our model is purely word-based; no POS tag information is included. There are many ways to improve performance by including tag information. The simplest is to treat tags as words and include them in the convolution. Another is to use different convolution filters (w in the paper) for words with different tags. You are welcome to discuss or collaborate with us on these extensions.
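To make the first extension idea concrete, here is a minimal sketch (the function name is ours and this is not part of the repository) of interleaving POS tags with words so that an unmodified word-level convolution sees the tags as ordinary tokens:

```python
def interleave_tags(words, tags):
    """Turn a tagged sentence into one token sequence so that a
    word-level convolution also sees the POS tags as 'words'."""
    tokens = []
    for word, tag in zip(words, tags):
        tokens.append(word)
        tokens.append(tag)  # the tag is treated as just another token
    return tokens

print(interleave_tags(["dogs", "bark"], ["NNS", "VBP"]))
# -> ['dogs', 'NNS', 'bark', 'VBP']
```

The interleaved tags would then receive their own embeddings, exactly like word types.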

The paper can be found at: http://people.oregonstate.edu/~mam/pdf/papers/DCNN.pdf

This version only contains the tree+sib model, which can easily be extended to the tree+sib+seq model.

File description:

1. folder "TREC" contains the TREC dataset with 6 categories. The data is from here: http://cogcomp.cs.illinois.edu/Data/QA/QC/ . "TREC_all.txt" is the original data. After parsing the TREC data set with the Stanford parser, we get "TREC_all_parsed.txt". "label_all.txt" contains the label for each sentence in "TREC_all.txt".

2. "preindex.py" converts each sentence into a tree format based on the parse file.

3. "process_TREC.py" is the file for text preprocessing.

4. "conv_net_classes.py" contains basic classes and functions for the CNN.

5. "conv_sib_gpu.py" is our main script.

6. folder "data" is where you should put the word2vec binary file so that "process_TREC.py" works. You can find the file here: https://code.google.com/p/word2vec/

7. "log_170.txt" records the accuracy on the training, dev, and test sets for each epoch. This result was generated on a GPU; "170" means it was produced with a batch size of 170. The other training settings can be found in "conv_sib_gpu.py".
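The pairing of "TREC_all.txt" with "label_all.txt" described above is line-aligned. A minimal sketch of loading the two files together (the function name and loading logic are our illustration, not code from this repository):

```python
def load_trec(sent_path, label_path):
    """Pair each sentence in TREC_all.txt with its label in
    label_all.txt.  Assumes both files are line-aligned, with one
    sentence / one label per line."""
    with open(sent_path) as f:
        sentences = [line.strip() for line in f]
    with open(label_path) as f:
        labels = [line.strip() for line in f]
    assert len(sentences) == len(labels), "files must be line-aligned"
    return list(zip(sentences, labels))
```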
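The tree format that "preindex.py" produces lets the convolution slide over each word together with its dependency ancestors rather than its linear neighbors. As a rough illustration only (the function name and the root-padding choice are ours, not the script's actual implementation), the ancestor path for each word can be read off the head indices of a dependency parse:

```python
def ancestor_paths(heads, depth=2):
    """For each word i (0-based), return [i, parent, grandparent, ...]
    by following the head indices up to `depth` ancestors.
    heads[i] is the 0-based index of word i's head, or -1 for the root.
    The root is repeated as its own ancestor (a padding choice)."""
    paths = []
    for i in range(len(heads)):
        path, node = [i], i
        for _ in range(depth):
            if heads[node] != -1:
                node = heads[node]
            path.append(node)
        paths.append(path)
    return paths

# "dogs bark loudly": "bark" is the root, heads of "dogs"/"loudly" are "bark"
print(ancestor_paths([1, -1, 1]))
# -> [[0, 1, 1], [1, 1, 1], [2, 1, 1]]
```

A tree-based convolution filter can then be applied over each word and its ancestors instead of a linear window.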

Instructions:

1. Download the word2vec file and save it in the "data" folder.

2. Run "python process_TREC.py" ("preindex.py" will be run from this script).

3. Run:
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python conv_sib_gpu.py 170
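The word2vec binary that "process_TREC.py" reads starts with a text header ("vocab_size dim"), followed by one space-terminated word and dim raw float32 values per entry. A minimal sketch of a reader for that format (the actual loader in "process_TREC.py" may differ in details):

```python
import numpy as np

def load_word2vec_bin(path):
    """Minimal reader for the word2vec binary format: a text header
    'vocab_size dim', then for each entry a space-terminated word
    followed by dim raw float32 values and a trailing newline."""
    vectors = {}
    with open(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            word = b""
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break
                if ch != b"\n":  # skip the newline ending the previous entry
                    word += ch
            vectors[word.decode("utf-8")] = np.frombuffer(
                f.read(4 * dim), dtype=np.float32)
    return vectors
```

Loading the full GoogleNews file this way takes a few minutes; for production use, a library loader is usually preferable.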

This code was run on a GPU (Tesla K80), but it also works on a CPU. To test on a CPU, change device=gpu to device=cpu and floatX=float32 to floatX=float64 in the command above:

THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float64 python conv_sib_gpu.py 170

Because of the precision difference between GPU and CPU, the results will differ slightly in some cases. Compared with the other hyperparameters, the model's performance is relatively sensitive to batch_size and lr_decay, so I would suggest tuning these two hyperparameters first.

In our implementation, we use 10% of the training data as the dev set. We do not recycle the dev set to train the model again; some people do, and I believe doing so would improve performance.
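The 10% hold-out above can be sketched as follows (the function name and seed are our own illustration; the repository's actual split logic may differ):

```python
import random

def train_dev_split(examples, dev_frac=0.1, seed=0):
    """Hold out dev_frac of the (shuffled) examples as a dev set.
    The seed value is arbitrary; it just makes the split reproducible."""
    examples = list(examples)
    rng = random.Random(seed)
    rng.shuffle(examples)
    n_dev = int(len(examples) * dev_frac)
    return examples[n_dev:], examples[:n_dev]
```

Recycling the dev set would simply mean concatenating the two returned lists for a final training pass after hyperparameters are fixed.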

Mingbo Ma

[email protected]

EECS

Oregon State University

Sep 25 2015
