
variational-recurrent-autoencoder-tensorflow's Introduction

Generating Sentences from a Continuous Space

Tensorflow implementation of Generating Sentences from a Continuous Space.

Prerequisites

  1. Python packages:
    • Python 3.4 or higher
    • TensorFlow r0.12
    • NumPy

Setting up the environment:

  1. Clone this repository:
git clone https://github.com/Chung-I/Variational-Recurrent-Autoencoder-Tensorflow.git
  2. Set up a conda environment:
conda create -n vrae python=3.6
conda activate vrae
  3. Install Python package requirements:
pip install -r requirements.txt
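
To verify that the environment is set up correctly, you can print the installed TensorFlow version; this project expects r0.12:

python -c "import tensorflow as tf; print(tf.__version__)"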

Usage

Training:

python vrae.py --model_dir models --do train --new True

Reconstruct:

python vrae.py --model_dir models --do reconstruct --new False --input input.txt --output output.txt

Sample (this script reads only the first line of input.txt, generates num_pts samples, and writes them to output.txt):

python vrae.py --model_dir models --do sample --new False --input input.txt --output output.txt

Interpolate (this script requires that input.txt consist of exactly two sentences; it generates num_pts interpolations between them and writes the interpolated sentences to output.txt):

python vrae.py --model_dir models --do interpolate --new False --input input.txt --output output.txt

model_dir: The location of the config file config.json and the checkpoint file.

do: Accepts four values: train, reconstruct, sample, or interpolate.

new: Create a model with fresh parameters if set to True; otherwise read model parameters from checkpoints in model_dir.
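
As a concrete end-to-end sketch of the interpolate workflow, assuming a model has already been trained into models (the two sentences below are illustrative placeholders, in the spirit of the homotopy examples from the paper):

printf 'he was silent for a long moment .\nit was my turn .\n' > input.txt
python vrae.py --model_dir models --do interpolate --new False --input input.txt --output output.txt
cat output.txt

output.txt will then contain num_pts interpolated sentences, one per line.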

config.json

Hyperparameters are not passed on the command line as they are in tensorflow/models/rnn/translate/translate.py. Instead, vrae.py reads hyperparameters from config.json in model_dir.

Below are the hyperparameters in config.json (an illustrative example file is sketched after this list):

  • model:

    • size: embedding size, and encoder/decoder state size.
    • latent_dim: latent space size.
    • in_vocab_size: source vocabulary size.
    • out_vocab_size: target vocabulary size.
    • data_dir: path to the corpus.
    • num_layers: number of layers for encoder and decoder.
    • use_lstm: whether to use LSTM cells for the encoder and decoder. BasicLSTMCell is used if set to True; otherwise GRUCell is used.
    • buckets: A list of pairs of [input size, output size] for each bucket.
    • bidirectional: bidirectional_rnn is used if set to True.
    • probablistic: variance is set to zero if set to False, making the model deterministic.
    • orthogonal_initializer: orthogonal_initializer is used if set to True; else uniform_unit_scaling_initializer is used.
    • iaf: inverse autoregressive flow is used if set to True.
    • activation: activation for encoder-to-latent layer and latent-to-decoder layer.
      • elu: exponential linear unit.
      • prelu: parametric rectified linear unit. (default)
      • None: linear.
  • train:

    • batch_size
    • beam_size: beam size for decoding. Warning: beam search is not yet implemented; a NotImplementedError is raised if beam_size is set greater than 1.
    • learning_rate: learning rate parameter passed into AdamOptimizer.
    • steps_per_checkpoint: save checkpoint every steps_per_checkpoint steps.
    • anneal: do KL cost annealing if set to True.
    • kl_rate_rise_factor: the KL term weight is increased by this amount every steps_per_checkpoint steps.
    • max_train_data_size: Limit on the size of training data (0: no limit).
    • feed_previous: If True, only the first of decoder_inputs will be used (the "GO" symbol), and all other decoder inputs will be generated by: next = embedding_lookup(embedding, argmax(previous_output)). In effect, this implements a greedy decoder. It can also be used during training to emulate http://arxiv.org/abs/1506.03099. If False, decoder_inputs are used as given (the standard decoder case).
    • kl_min: the minimum information constraint. Should be a non-negative float (where 0 is no constraint).
    • max_gradient_norm: gradients will be clipped to maximally this norm.
    • word_dropout_keep_prob: probability of keeping each conditioned-on word token; dropped tokens are replaced with the generic unknown word token UNK. When set to 0, the decoder sees no input.
  • reconstruct:

    • feed_previous
    • word_dropout_keep_prob
  • sample:

    • feed_previous
    • word_dropout_keep_prob
    • num_pts: number of points to sample.
  • interpolate:

    • feed_previous
    • word_dropout_keep_prob
    • num_pts: number of interpolation points to generate.
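
For concreteness, here is a sketch of what a complete config.json might look like. The section and key names follow the list above, but every value is an illustrative placeholder; consult the config.json bundled with the repository (or the pretrained model) for actual defaults.

{
  "model": {
    "size": 512,
    "latent_dim": 32,
    "in_vocab_size": 20000,
    "out_vocab_size": 20000,
    "data_dir": "data",
    "num_layers": 1,
    "use_lstm": false,
    "buckets": [[5, 10], [10, 15], [20, 25], [40, 50]],
    "bidirectional": true,
    "probablistic": true,
    "orthogonal_initializer": true,
    "iaf": false,
    "activation": "prelu"
  },
  "train": {
    "batch_size": 32,
    "beam_size": 1,
    "learning_rate": 0.001,
    "steps_per_checkpoint": 200,
    "anneal": true,
    "kl_rate_rise_factor": 0.01,
    "max_train_data_size": 0,
    "feed_previous": false,
    "kl_min": 2.0,
    "max_gradient_norm": 5.0,
    "word_dropout_keep_prob": 0.75
  },
  "reconstruct": {
    "feed_previous": true,
    "word_dropout_keep_prob": 1.0
  },
  "sample": {
    "feed_previous": true,
    "word_dropout_keep_prob": 1.0,
    "num_pts": 10
  },
  "interpolate": {
    "feed_previous": true,
    "word_dropout_keep_prob": 1.0,
    "num_pts": 10
  }
}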

Data

The Penn Treebank corpus is included in the repo. We also provide a Chinese poem corpus, its preprocessed version (set {"model": {"data_dir": "<corpus_dir>"}} in <model_dir>/config.json to point to it), and a pretrained model (set model_dir to it), all of which can be found here.


variational-recurrent-autoencoder-tensorflow's Issues

TensorFlow placeholder error

I'm very excited to see this project here -- I've been wondering about this technique ever since reading the paper!

I'm encountering a problem when I try to run main.py. It completes the first epoch of training, with this result…

Epoch: 1 Train costs: -0.670
Epoch: 1 Train KL divergence: -4.003
Epoch: 1 Train reconstruction costs: 0.000

…but then when it switches to testing, it halts at the first batch with this error:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Train/Model/Placeholder' with dtype int32 and shape [32,5]

Have you encountered errors like this before? Any advice?

Thanks for your consideration!

Trained model produces completely different sentences in reconstruct/interpolate

Hi, I trained this model with the default config on 100 thousand cleaned sentences taken from the Google corpus, for ~16000 steps.
When I ran the sampling tasks, the model seemed to completely change my input. For example, when I use the input "what" for reconstruction, I get "is there american yes" as output.
May I ask what I am doing wrong? The same phenomenon occurs for all the other tasks as well (for example, in interpolation I do get one sentence morphing into another... just neither of them is my original input. Both are very far from my input, in fact).
I don't think this is expected to happen, right? Since the paper includes a sentence completion task.

---note---
My suspicions are that

  1. The default vocab size was 20000, so maybe many words are not encoded at all? I can't test this right now because increasing the size significantly slows down training.
  2. The default word keep rate was 0. If I set it to 1, would the model at least be able to keep the fed-previous tokens?

Thanks in advance!

TypeError: 'Tensor' object is not iterable.

Thanks for your great work! I wonder whether you have tried using an LSTM to run the model. This error occurs when I set use_lstm=True: TypeError: 'Tensor' object is not iterable. It points to this line of code. Do you have any idea about it? Thanks in advance!

Training time

It's been training on a single-GPU machine for 3 days, and there is no sign of finishing. How long should it take?
