Code Monkey home page Code Monkey logo

poems_categorisation_neural_networks's Introduction

Poem Categorisation using Neural Networks

The purpose of this project is to categorize poems into different types.

Dataset Used

The dataset used for training was compiled by PoetryFoundation.org which is available on Kaggle and is also included in the repo. The dataset contains the 574 poems along with their poets and their genre (type). We only require the poems and their type for our training needs.

Packages Required

The code was built on a Python 3.6 environment with the following external libraries:

  • keras
  • pandas
  • numpy
  • scikit-learn
  • nltk
  • textblob

What's in the Repo

  1. poems.csv - Dataset used for training the model (also available on Kaggle)
  2. poems.py - Main trainer program that pre-processes the data, builds and trains the model

Solution

Pre-Processing Data

I had never worked with text data for categorisation before this point so I had to research a bit in order to train a model that could categorize poems in the form of text. Cleaning of the dataset comes first. First of all, not all the words in the poem necessarily contribute in determining the category of the poem. Words like a, you, my, is, etc. have least impact so they were removed with the help of stopwords python module. The most frequent and least frequent words were also removed. Suffixes such as ing, ly, s, etc. were removed for maintaining consistency across all poems. At this point all the basic pre-processing has been done.

Now in order to work with text in neural networks we need to convert them to certain numbers or hashes so that they can be fed into the neural networks and trained. For this task the function for one hot coding in the keras pre-processing library was used. The function creates hashes for each word in the dataset vocabulary.

Building the Model

I have built a fairly simple model that uses three layers namely Embedding Layer, Flatten Layer, and Dense Layer for training. Embedding is a layer that’s frequently used while dealing with text classification. Flatten is used to reduce the dimensionality of Embedding layer’s output since the Dense layer takes in one dimensional data as input.

Output Snippet

  • A snippet showing last few Epochs and the Accuracy

Snippet

Conclusion

The model gives an accuracy that varies from 71.55% to 72.3% which I think is fair enough for a model this simple and for a dataset this limited. But is this accuracy good enough? Probably not, but the model can definitely be improved using various other layers and a better dataset.

There are three categories of poems in the dataset. So, even if a person randomly guesses the categories his probability of being accurate will be 33.33% and the model gives an accuracy of 72% so we can say that model is performing a significantly good categorization.

Note

I haven't deployed this model but I probably will in the near future.

poems_categorisation_neural_networks's People

Contributors

anantsinghcross avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.