dhrse_project

Linguistic analysis of literary texts written by Shakespeare

Dataset description

This project contains a dataset with the complete text of "Hamlet" and code that finds the 10 most frequent words in the text. "Hamlet" is a tragedy written by William Shakespeare. The play, which is among Shakespeare's most famous and frequently performed works, explores themes of treachery, revenge, incest, and moral corruption. It is structured in five acts, each divided into a series of scenes.

File structure

The dataset is organized as follows:

hamlet.txt: This plain text file contains the entire play "Hamlet," organized by acts and scenes. Each act and scene is clearly marked, and character dialogue is presented in the order in which it appears in the play.
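
As a rough illustration of the word-frequency analysis described above, the sketch below counts the most frequent words in a string. It is not the project's actual code: the function name `top_words` and the tokenization regex are assumptions, and no stop word removal or lemmatization is applied at this stage.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 10) -> list:
    """Return the n most frequent lowercase word tokens as (word, count) pairs."""
    # Match runs of letters, allowing internal apostrophes so that
    # contractions such as "o'er" stay in one piece (an assumed tokenization).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens).most_common(n)


# Small inline sample; the real input would be the contents of hamlet.txt.
sample = "To be, or not to be, that is the question."
print(top_words(sample, n=3))
```

To run this on the dataset, the text could be loaded first, for example with `pathlib.Path("hamlet.txt").read_text(encoding="utf-8")`; the UTF-8 encoding is an assumption about the file.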

Method and the process

Stop word removal

Stop word removal is a natural language processing (NLP) step in which very common function words (such as "and," "the," and "is") are removed from the text. These words are usually filtered out because they carry little meaning on their own and would otherwise dominate the frequency counts. Removing them focuses the analysis on the more informative terms and reduces the amount of text to process.
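
A minimal sketch of this filtering step is shown below. The stop list here is a tiny hand-written set chosen for illustration only (an assumption); in practice a fuller list would typically come from a library such as NLTK or spaCy. The name `remove_stop_words` is likewise hypothetical.

```python
import re

# Tiny illustrative stop list (an assumption, not the list used by the project).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def remove_stop_words(text: str, stop_words=STOP_WORDS) -> list:
    """Lowercase and tokenize the text, then drop any token in stop_words."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("Something is rotten in the state of Denmark."))
# ['something', 'rotten', 'state', 'denmark']
```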

Lemmatization

Lemmatization is an NLP process that reduces words to their dictionary base form, known as a "lemma." Unlike stemming, which simply strips prefixes or suffixes and may produce a string that is not a real word, lemmatization uses context and morphological analysis so that the base form is a valid word. This generally makes lemmatization more accurate than stemming.
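
The contrast with stemming can be seen in a toy example. The sketch below is deliberately simplified and uses no NLP library: `LEMMAS` is a small hypothetical lookup table standing in for a real lemmatizer's vocabulary, and `naive_stem` is a crude suffix stripper, not a standard stemming algorithm.

```python
# Toy lemma table (hypothetical); a real lemmatizer relies on a full
# vocabulary plus part-of-speech and morphological information.
LEMMAS = {
    "was": "be",
    "were": "be",
    "studies": "study",
    "daggers": "dagger",
}


def naive_stem(word: str) -> str:
    """Strip one common suffix without checking that the result is a real word."""
    word = word.lower()
    for suffix in ("ies", "es", "ing", "ed", "s"):
        # Require a few remaining characters so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the toy table; unknown words are returned unchanged."""
    return LEMMAS.get(word.lower(), word.lower())


for w in ("studies", "was", "daggers"):
    print(w, "| stem:", naive_stem(w), "| lemma:", toy_lemmatize(w))
```

Here "studies" stems to "stud", which is not the intended base word, while the lookup returns "study"; the irregular form "was" is untouched by suffix stripping but maps to the lemma "be".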

N-gram Extraction

The project uses n-grams (contiguous sequences of n words) to extract key terms from the text. Specifically, bigrams (two-word sequences) and trigrams (three-word sequences) are extracted to uncover significant word patterns.
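
For illustration, bigrams and trigrams can be built from a token list with the standard library alone. This is a sketch under assumptions: the helper name `ngrams` is hypothetical, and the project itself may rely on one of its listed libraries for this step.

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> list:
    """Return all contiguous n-token sequences as tuples, in order of appearance."""
    # zip over n shifted copies of the list; it stops at the shortest copy,
    # so only complete n-grams are produced.
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = "to be or not to be".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Count how often each bigram occurs and show the most frequent one.
print(Counter(bigrams).most_common(1))
print(trigrams)
```

Counting the resulting tuples with `Counter` gives the same kind of frequency ranking as for single words, only over word sequences.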

Visualizing the data

example_graph.png

Required libraries

  • pandas
  • gensim
  • spacy
  • nltk
  • plotly
  • re (part of the Python standard library)
  • math (part of the Python standard library)

Contributors

  • fernandaalvaf
  • annafurtado
