Code Monkey home page Code Monkey logo

text-generation's Introduction

Text Generation

This repository demonstrates the word-level language modelling using LSTM (Long Short-Term Memory) networks in Tensorflow and Keras. The LSTM model is trained to predict next word in the sequence by capturing semantic inherent context of the words it has seen in the sequence during training time. The trained LSTM model can then be used to generate any number of words as a new text sequence that share similar statistical properties as the original training sequence data.

Requirements

  • Python 3.9
  • tensorflow 2.8
  • numpy 1.22.3

Usage

  • load_doc() loads the data and clean_doc() preprocesses the data to make it suitable for training the LSTM model. Moreover, save_doc() is used to save the data to the desired path attributed to out_filename variable.
  • build_model() builds the model. The average loss and associated model accuracy are printed after every epoch. Furthermore, to train a new network, run Text Generation.py. To extend the model to use pre-trained GloVe embeddings, please set pre_trained variable to True. Besides, all hyperparameters to control training and generation of the new text sequences are provided in the given .py file.

Generation

  • generate_seq() uses the training dataset to randomly sample the line (set of words) from it and then create vocabulary dictionary for the words lie in the randomly sampled sequence.
  • generate_seq() then generates the words using the preTrained LSTM-based language model.
  • n_words variable can be set to generate as many words as intended to form a new text sequence.

Samples

Input text sequence from The Republic by Plato dataset

when he said that a man when he grows old may learn many things for he can no more learn much than he 
can run much youth is the time for any extraordinary toil of course and therefore calculation and 
geometry and all the other elements of instruction which are a

Generated text sequence

preparation for dialectic should be presented to the name of idle spendthrifts of whom the other is the 
manifold and the unjust and is the best and the other which delighted to be the opening of the soul of 
the soul and the embroiderer will have to be said at

text-generation's People

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.