genre_classification's Introduction

Judging Books By Their Cover: A Multi-Label Genre Classification System

  • This repository contains the code for the Judging Books By Their Cover project.
  • This project aims to create a multi-label classification system capable of predicting the genres of a book.
  • The task mentioned above is accomplished with the help of transfer learning and NLP.

Problem Statement

Over the past decade, online bookstores and e-commerce websites have taken over from traditional bookstores, and most people now order their books online. E-bookstores therefore face the challenge of organizing books effectively by parameters such as genre, rating, and user votes. Each book can belong to multiple genres and appeal to different readers. The task is to build a system that predicts the genres of a book from its description and so helps organize the catalog effectively.

Solution

  • Multi-label classification is the problem of assigning categories to an input that may belong to several categories at once. An input may have more than one label, or no matching label at all. This suits genre classification well, because each input text represents its own blend of genres. It also covers inputs that do not belong to any category in the dataset: the classifier can return no label instead of forcing a wrong prediction that could be misleading or harmful.
  • In this project, we use the ULMFiT approach to create a multi-label classifier that accomplishes this task.
  • We use the AWD-LSTM pre-trained language model provided by fastai and fine-tune it on our dataset. After fine-tuning, we use transfer learning to create the multi-label genre predictor.
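
As a minimal sketch of the multi-label decision rule described above: each genre gets its own independent sigmoid score, and every genre whose score clears a threshold is returned, which may be several genres or none. The function names, the genre names, and the 0.5 threshold are assumptions for illustration, not values taken from the notebooks.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_genres(logits, genres, threshold=0.5):
    """Return every genre whose independent probability clears the threshold.

    Unlike softmax classification, the scores are not forced to sum to 1,
    so the result can contain several genres or be empty.
    """
    if len(logits) != len(genres):
        raise ValueError("logits and genres must have the same length")
    return [g for g, z in zip(genres, logits) if sigmoid(z) >= threshold]


# Hypothetical genre names and raw model outputs.
GENRES = ["aliens", "dystopia", "time_travel"]
print(predict_genres([2.0, -1.0, 0.3], GENRES))    # two genres pass
print(predict_genres([-3.0, -2.0, -1.0], GENRES))  # no genre passes
```

Raising the threshold trades recall for precision: fewer genres are returned, but each one is more likely to be correct.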

Dataset

  • We use the Science Fiction Books (10,000+) dataset from Kaggle.
  • The dataset contains 12 CSV files which contain the most relevant information you can find on a book page.
  • Each CSV file represents data from a subgenre of science fiction. For example, sf_aliens.csv contains information about books belonging to the genre Science Fiction (Aliens).
  • Download the dataset and place it in the main directory.
  • Download from here.
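
Because each CSV file covers one subgenre, a book that appears in several files naturally carries several labels. The sketch below shows one way the files could be merged into one record per book using only the standard library. The `sf_*.csv` naming pattern is inferred from the `sf_aliens.csv` example, and the column names `Book_Title` and `Book_Description` are assumptions; check them against the actual CSV headers before use.

```python
import csv
from pathlib import Path


def load_multilabel_books(data_dir, title_col="Book_Title", desc_col="Book_Description"):
    """Merge per-subgenre CSV files into one record per book with a set of genres.

    The column names are hypothetical defaults; pass the real header names
    if they differ.
    """
    books = {}
    for csv_path in sorted(Path(data_dir).glob("sf_*.csv")):
        genre = csv_path.stem[len("sf_"):]  # "sf_aliens" -> "aliens"
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                title = row[title_col].strip()
                # The first description seen for a title is kept.
                entry = books.setdefault(
                    title, {"description": row[desc_col], "genres": set()}
                )
                entry["genres"].add(genre)
    return books
```

Keying on the title alone is a simplification; two different books with the same title would be merged, so a title plus author key may be safer if the data provides an author column.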

Description of files

  • Part 1- genre_classification_language_model.ipynb: In this notebook, we preprocess our data and create a custom language model by fine-tuning the AWD-LSTM pre-trained model provided by fastai.
  • Part 2- genre_classification_eda.ipynb: In this notebook, we perform exploratory data analysis (EDA) to gain a better understanding of the dataset.
  • Part 3- genre_classifier_multi_label.ipynb: In this notebook, we take the fine-tuned language model created in Part 1 and repurpose its knowledge to create a multi-label classifier.
  • Part 4- genre_classification_app.ipynb: Finally, in this part, we put our models to the test. We create a random story generator and a multi-label genre predictor using ipywidgets.
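
The two-stage workflow of Parts 1 and 3 could look roughly like the sketch below. It is not the notebooks' actual code. It assumes the fastai v2 `fastai.text.all` API and a pandas DataFrame with one row per book. The function name `train_genre_classifier`, the column names `description` and `genres`, the `;` label delimiter, the encoder file name, and every hyperparameter are assumptions for illustration.

```python
def train_genre_classifier(df, text_col="description", label_col="genres",
                           label_delim=";", encoder_name="finetuned_encoder"):
    """Fine-tune an AWD-LSTM language model on `df`, then train a
    multi-label classifier on top of the fine-tuned encoder."""
    # Imported here so the file can be loaded without fastai installed.
    from fastai.text.all import (
        AWD_LSTM, MultiCategoryBlock, Perplexity, TextDataLoaders,
        accuracy, accuracy_multi, language_model_learner,
        text_classifier_learner,
    )

    # Stage 1: adapt the pre-trained language model to book descriptions.
    dls_lm = TextDataLoaders.from_df(
        df, text_col=text_col, is_lm=True, valid_pct=0.1
    )
    lm_learner = language_model_learner(
        dls_lm, AWD_LSTM, drop_mult=0.3, metrics=[accuracy, Perplexity()]
    )
    lm_learner.fit_one_cycle(1, 2e-2)      # illustrative schedule only
    lm_learner.save_encoder(encoder_name)  # keep the encoder, drop the LM head

    # Stage 2: reuse the vocabulary and encoder for multi-label classification.
    dls_clf = TextDataLoaders.from_df(
        df, text_col=text_col, label_col=label_col, label_delim=label_delim,
        y_block=MultiCategoryBlock, text_vocab=dls_lm.vocab, valid_pct=0.2,
    )
    clf_learner = text_classifier_learner(
        dls_clf, AWD_LSTM, drop_mult=0.5, metrics=accuracy_multi
    )
    clf_learner.load_encoder(encoder_name)
    clf_learner.fit_one_cycle(1, 2e-2)     # illustrative schedule only
    return clf_learner
```

Passing the language model's vocabulary to the classifier data loaders keeps the token ids consistent, so the saved encoder weights line up with the classifier's embeddings. A fuller ULMFiT run would typically also unfreeze the layers gradually across several training rounds, which this sketch omits.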

Requirements

  • Python 3.5+
  • voila
  • fastai
  • spacy==2.2.4
  • matplotlib
  • seaborn
  • plotly
  • packaging
  • ipywidgets==7.5.1
  • Linux

Contributors

  • Mehul Jain
