Analyse the Common Crawl News dataset.
Features:
- Scan the Common Crawl News dataset for mentions of German political parties
- Store the raw HTML and the extracted news articles in a SQLite database
- Analyse the results with plotly and pandas
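
The scan and store steps listed above could look roughly like the following. This is a minimal sketch using only the standard library, not the code in pipeline.py: the party list, the `articles` table layout, and the helper names `find_parties`, `open_db`, and `store_article` are all assumptions made for illustration. Matching is case-sensitive and whole-word only.

```python
import re
import sqlite3

# Hypothetical party list; the project's actual search terms may differ.
PARTIES = ["CDU", "CSU", "SPD", "FDP", "AfD", "Die Linke", "Bündnis 90/Die Grünen"]

# Whole-word match, so "SPD" is not reported for a longer token like "SPDler".
PARTY_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(p) for p in PARTIES) + r")(?!\w)"
)


def find_parties(text):
    """Return the sorted, de-duplicated party names mentioned in text."""
    return sorted(set(PARTY_PATTERN.findall(text)))


def open_db(path=":memory:"):
    """Open the SQLite database and create the (assumed) articles table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, raw_html TEXT, text TEXT, parties TEXT)"
    )
    return conn


def store_article(conn, url, raw_html, text):
    """Store an article only if it mentions a party; return the parties found."""
    parties = find_parties(text)
    if parties:
        conn.execute(
            "INSERT OR REPLACE INTO articles (url, raw_html, text, parties) "
            "VALUES (?, ?, ?, ?)",
            (url, raw_html, text, ",".join(parties)),
        )
        conn.commit()
    return parties
```

Keeping the raw HTML next to the extracted text means the extraction step can be redone later without fetching the crawl data again.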
Usage:
- Install the package: pip install -e .
- Run the pipeline: python pipeline.py
- Open and run analysis.ipynb