
vLSTM

Vectorized Long Short-term Memory (LSTM) using Matlab and GPU

It supports both the regular LSTM described here and the multimodal LSTM described here.

If you are interested, visit here for details of the experiments described in the multimodal LSTM paper.

Hardware/software requirements

To run the code, you need an NVIDIA GPU with at least 4GB of GPU memory. The code was tested on Ubuntu 14.04 and Windows 7 with MATLAB 2014b.
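A quick way to confirm your GPU meets this requirement is to query it from MATLAB. The sketch below uses gpuDevice from the Parallel Computing Toolbox (which this code base requires anyway):

```matlab
% Check that a suitable GPU is visible to MATLAB.
d = gpuDevice;                                  % select the default GPU
fprintf('GPU: %s\n', d.Name);
fprintf('Total memory: %.1f GB\n', d.TotalMemory / 2^30);
assert(d.TotalMemory >= 4 * 2^30, ...
    'vLSTM needs a GPU with at least 4GB of memory.');
```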

Character level language generation

The task is the same as that in the char-rnn project, which is a good indicator of whether the LSTM implementation is effective.

Generation using a pre-trained model

Open the applications/writer folder but don't enter it. Run lstm_writer_test.m and it will start generating text. In the first few lines of lstm_writer_val.m you can adjust the starting character. Currently it starts with "I", so a typical generation looks like this:

I can be the most programmers who would be try to them. But I was anyway that the most professors and press right. It's hard to make them things like the startups that was much their fundraising the founders who was by being worth in the side of a startup would be to be the smart with good as work with an angel round by companies and funding a lot of the partners is that they want to competitive for the top was a strange could be would be a company that was will be described startups in the paper we could probably be were the same thing that they can be some to investors...
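Changing the seed is a one-line edit near the top of lstm_writer_val.m. The exact variable name there may differ; this is only an illustration of the kind of change involved:

```matlab
% In lstm_writer_val.m, near the top of the file (variable name is
% illustrative -- check the actual script for the name it uses):
start_char = 'T';   % seed the generator with "T" instead of "I"
```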

Data generation and training

Paul Graham's essays are used in this sample. All text is stored in data/writer/all_text.mat as a string; you may load it manually to see the content. The whole text contains about 2 million characters. To generate the training data, run data/writer/gen_char_data_from_text_2.m. It will generate four .mat files under data/writer/graham; each file contains 10,000 character sequences of length 50, so the four files add up to 2 million characters.
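The arithmetic behind the split can be sketched as follows. This is not the actual gen_char_data_from_text_2.m, just a minimal illustration of slicing the corpus into fixed-length sequences (the variable name all_text is an assumption about what the .mat file contains):

```matlab
% Minimal sketch: slice the corpus into length-50 training sequences.
load('data/writer/all_text.mat');            % assumed to provide all_text
seq_len = 50;
n_seq   = floor(numel(all_text) / seq_len);  % ~2,000,000 / 50 = 40,000
seqs = cell(n_seq, 1);
for i = 1:n_seq
    seqs{i} = all_text((i-1)*seq_len + 1 : i*seq_len);
end
% 40,000 sequences split across four .mat files = 10,000 sequences each.
```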

Once the data is ready, you may run lstm_writer_train.m under applications/writer to start training. During training, intermediate models are saved under results/writer. You may launch another MATLAB instance and run lstm_writer_test.m with a newly saved model instead of writer.mat to test it.
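Pointing the test script at a checkpoint is just a matter of swapping the model path it loads. The file name below is hypothetical; use whatever checkpoint names actually appear under results/writer:

```matlab
% In lstm_writer_test.m, load an intermediate checkpoint instead of the
% shipped writer.mat (checkpoint file name below is hypothetical):
model_path = 'results/writer/checkpoint_epoch_10.mat';
load(model_path);
```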

Multimodal LSTM for speaker naming

The training procedure of the multimodal speaker-naming LSTM, as well as the pre-processed data (ready to use off-the-shelf), has been released. Please follow the instructions below to perform the training.

Download data

Please go here or here to download all the pre-processed training data and put all the files under data/speaker-naming/processed_training_data/, following the existing folder structure inside.

In addition, please go here or here to download the pre-processed multimodal validation data and put all the files under data/speaker-naming/raw_full/, following the existing folder structure inside.

Start training

Once all the data is in place, you may train three types of models, namely a model that classifies only the face features, a model that classifies only the audio features, and a model that simultaneously classifies the combined face+audio multimodal features (the multimodal LSTM).

To train the face only model, you may run this script.
To train the audio only model, you may run this script.
To train the face+audio multimodal LSTM model, you may run this script.

Meanwhile, you can also run tests for the aforementioned three models using the pre-trained models.
Run this script to test the pre-trained face-only model.
Run this script to test the pre-trained audio-only model.
Run this script to test the pre-trained face+audio multimodal LSTM model.

Citations

Jimmy SJ. Ren, Yongtao Hu, Yu-Wing Tai, Chuan Wang, Li Xu, Wenxiu Sun, Qiong Yan, "Look, Listen and Learn - A Multimodal LSTM for Speaker Identification", The 30th AAAI Conference on Artificial Intelligence (AAAI-16).

