The sotu_corpus_small.csv file contains 101 speeches and does not have any of the cell breaks. Please use this one for the project.
The project uses NLP techniques to perform data mining, compute term frequency–inverse document frequency (TF-IDF) values, produce latent Dirichlet allocation (LDA) estimations, and carry out topic modeling and sentiment analysis on 101 State of the Union addresses from 1791 to 2019.
The goal is to use sentiment analysis, topic modeling, TF-IDF, and LDA values to derive deeper insights into American politics through the centuries and to deepen understanding of NLP processes and results.
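As a rough illustration of the TF-IDF step, the sketch below scores terms in pure Python. It is not the project's actual script: the `tokenize` and `tf_idf` helpers and the `sample_docs` texts are hypothetical stand-ins, and the formula is the textbook one (term count divided by document length, times log(N / df)). Libraries such as scikit-learn typically apply smoothed variants, so their numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumes tf = count / document length and idf = log(N / df),
    the unsmoothed textbook form.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical stand-in texts, not taken from the SOTU corpus.
sample_docs = [
    "the state of the union is strong",
    "the economy of the nation is growing",
    "peace and the union",
]

for doc_scores in tf_idf(sample_docs):
    # Show the three highest-scoring terms per document.
    top_terms = sorted(doc_scores, key=doc_scores.get, reverse=True)[:3]
    print(top_terms)
```

A term that appears in every document (here, "the") scores zero, which is why TF-IDF tends to surface words that distinguish one address from the others.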
The corpus is developed from SOTU addresses published on the State of the Union website. A scoped-down subset of the full set of 243 files was used for speed and simplicity.
The NLP modeling will incorporate a variety of scripts and/or Jupyter notebooks from the MSDS 453 Winter 2019 course, from GitHub, and from the SOTU Kaggle website.
GitHub credits:
Daniel Bashir, https://github.com/db7894/sentiment-of-the-union