NU_453_NLP

The sotu_corpus_small.csv file contains 101 speeches and has no cell breaks. Please use this file for the project.

Project Scope

The project applies NLP techniques to mine the text of 101 State of the Union addresses from 1791 to 2019. It computes term frequency–inverse document frequency (TF-IDF) values, estimates latent Dirichlet allocation (LDA) models for topic modeling, and performs sentiment analysis.
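To make the TF-IDF step concrete, here is a minimal sketch in plain Python. It uses the unsmoothed textbook form, where tf is a term's count divided by the document length and idf is log(N / df). This is an illustration, not the project's notebook code. The function names `tokenize` and `tf_idf` are hypothetical, and the tokenizer is deliberately naive: it keeps lowercase alphabetic runs and does not remove stop words.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: tf-idf score} dict per input document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = count / document length; idf = log(N / df), natural log.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores
```

A word such as "the", which appears in every address, receives an idf of log(1) = 0, so it scores 0 everywhere. Rarer, speech-specific words score higher. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2-normalizes rows by default, so its numbers will differ from this sketch.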

Desired Outcome

Use sentiment analysis, topic modeling, and TF-IDF and LDA results to derive deeper insights into American politics across the centuries, and to deepen understanding of NLP processes and their results.
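For readers new to LDA, here is a compact collapsed Gibbs sampler that shows how topic assignments are estimated. It is written in plain Python with no dependencies. This is a sketch of the standard technique and is not claimed to be what the course notebooks use. The function name `lda_gibbs` and the default hyperparameters (`alpha`, `beta`, `iterations`, `seed`) are illustrative assumptions. Each input document must already be a list of tokens.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns (theta, phi): theta[d][k] is document d's topic proportions,
    phi[k][word] is topic k's word distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    K, V = num_topics, len(vocab)

    ndk = [[0] * K for _ in docs]         # topic counts per document
    nkw = [[0] * V for _ in range(K)]     # word counts per topic
    nk = [0] * K                          # total words per topic
    z = []                                # topic assignment for every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [{vocab[v]: (nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)}
           for k in range(K)]
    return theta, phi
```

On 101 full-length speeches this pure-Python loop will be slow. Library implementations such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` (which uses variational inference rather than Gibbs sampling) are the practical choice. The sketch is meant only to show what the estimated `theta` and `phi` values represent.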

Corpus Development

The corpus is developed from SOTU addresses published on the State of the Union website. A scoped-down subset (101 of the 243 available files) was used for speed and simplicity.
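A minimal sketch for loading the corpus CSV with Python's standard `csv` module. The column names `year` and `text` are hypothetical: check the header row of sotu_corpus_small.csv and adjust them before use. The function name `read_speeches` is also illustrative.

```python
import csv

# Hypothetical column names: verify against the actual CSV header.
TEXT_COLUMN = "text"
YEAR_COLUMN = "year"


def read_speeches(lines, text_column=TEXT_COLUMN, year_column=YEAR_COLUMN):
    """Return (year, text) pairs sorted by year.

    `lines` is any iterable of CSV lines, such as an open file handle.
    """
    reader = csv.DictReader(lines)
    speeches = [(int(row[year_column]), row[text_column]) for row in reader]
    return sorted(speeches)
```

Usage: `with open("sotu_corpus_small.csv", newline="", encoding="utf-8") as f: speeches = read_speeches(f)`. Opening with `newline=""` is what the `csv` module documentation recommends, so that quoted fields containing line breaks are parsed correctly.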

Model

The NLP modeling incorporates a variety of scripts and Jupyter notebooks from the MSDS 453 (Winter 2019) course, from GitHub, and from the SOTU Kaggle website.

GitHub credits:

Daniel Bashir, https://github.com/db7894/sentiment-of-the-union

Shayne, https://github.com/shngli/SOTU-mining
