A full Data Science pipeline project that analyses newspaper headlines from Perú.
Its goal is to show how the narrative in local media changes over time, across both independent and mainstream news outlets in Perú.
- Show associations between words over time, to see how the narrative around certain topics changes
- Show how the media can control the narrative, by looking at the different reactions people have to the news outlets' tweets
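One simple way to approach the first goal is to count how often two words appear in the same headline within each time period, then compare those counts across periods. The sketch below is a minimal, stdlib-only illustration of that idea; the `(period, headline)` input format, the sample headlines, the stopword set and the function names are all hypothetical, and the project's own pipelines may measure associations differently.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(headline: str, stopwords: set[str]) -> list[str]:
    """Lowercase a headline and return its alphabetic tokens, minus stopwords."""
    tokens = re.findall(r"\w+", headline.lower())
    return [t for t in tokens if t.isalpha() and t not in stopwords]


def cooccurrence_by_period(rows, stopwords=frozenset()):
    """Count unordered word pairs that appear in the same headline, per period.

    rows: iterable of (period, headline) tuples (hypothetical format).
    Returns {period: Counter({(word_a, word_b): count})} with word_a < word_b.
    """
    counts = defaultdict(Counter)
    for period, headline in rows:
        # set() so a word repeated within one headline is only paired once
        words = sorted(set(tokenize(headline, stopwords)))
        counts[period].update(combinations(words, 2))
    return counts


# Hypothetical sample data, not taken from the project's dataset
rows = [
    ("2023-01", "Congreso aprueba nueva ley"),
    ("2023-01", "Protestas contra el Congreso"),
    ("2023-02", "Congreso y protestas en Lima"),
    ("2023-02", "Protestas rodean el Congreso"),
]
stop = {"el", "en", "y", "contra"}
by_period = cooccurrence_by_period(rows, stop)
for period in sorted(by_period):
    print(period, by_period[period][("congreso", "protestas")])
# 2023-01 1
# 2023-02 2
```

A rising pair count between periods is only a crude signal; normalising by the number of headlines per period, or using a measure such as PMI, would make periods of different sizes easier to compare.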
The project consists of six pipelines that take the data from retrieval all the way to the data structures used for Data Analysis and Visualisation. Below is a diagram of the pipeline flow.
You can also check out the sister project nlp-newspapersDashboard, where I built a dashboard with the data coming from this project.
- Data Retrieval
- Cleaning and Preprocessing
- Feature Engineering
- EDA
- Sentiment & Emotion Analysis
- Topic Modeling
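Conceptually, each pipeline consumes datasets produced by earlier ones and writes new datasets for later ones. The plain-Python sketch below only illustrates that idea of stages reading and writing named datasets through a shared catalog. It is not Kedro's API, and every stage function and dataset name in it is a hypothetical placeholder; the sentiment and topic-modeling stages are omitted because they would depend on PySentimiento and Gensim.

```python
from typing import Callable

# A "catalog" maps dataset names to in-memory data. Kedro has its own
# DataCatalog; this dict is a simplified stand-in, not that API.
Catalog = dict[str, object]

# Each stage: (name, input dataset names, output dataset name, function).
Stage = tuple[str, list[str], str, Callable[..., object]]


def run_pipelines(stages: list[Stage], catalog: Catalog) -> Catalog:
    """Run stages in order, feeding each one its named inputs from the catalog."""
    for name, inputs, output, func in stages:
        missing = [i for i in inputs if i not in catalog]
        if missing:
            raise KeyError(f"Stage {name!r} is missing inputs: {missing}")
        catalog[output] = func(*(catalog[i] for i in inputs))
    return catalog


# Hypothetical placeholder stages; real pipelines would do far more work.
stages: list[Stage] = [
    ("data_retrieval", ["raw_source"], "raw_headlines", lambda src: list(src)),
    ("cleaning_preprocessing", ["raw_headlines"], "clean_headlines",
     lambda hs: [h.strip().lower() for h in hs]),
    ("feature_engineering", ["clean_headlines"], "headline_lengths",
     lambda hs: [len(h.split()) for h in hs]),
    ("eda", ["clean_headlines", "headline_lengths"], "eda_summary",
     lambda hs, ls: {"n": len(hs), "mean_words": sum(ls) / len(ls)}),
]

catalog = run_pipelines(
    stages, {"raw_source": ["  Congreso aprueba ley ", "Protestas en Lima"]}
)
print(catalog["eda_summary"])  # {'n': 2, 'mean_words': 3.0}
```

Naming every intermediate dataset makes it possible to run or debug one stage in isolation and to persist its outputs, which is one of the motivations for structuring a project as separate pipelines.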
Full documentation of the project and the data warehouse is available in the project documentation. If you want to learn more about the building process, you can read the accompanying blog post series on Nou de Data.
- Python version: 3.11.5
- Packages: Kedro, pandas, NumPy, Plotly, Requests, Gensim, TextBlob, pysentimiento