
visualization-app-on-world-events's Introduction

Visualization App on World Events

Making some cool maps about what is happening in Ukraine right now.

#SupportUkraine

Workflow

View Google Drive

1. Preprocess images

image_preprocessing.ipynb

  • Requires:
    • PyMuPDF and Pillow: pip3 install PyMuPDF Pillow
  • Input: data (see Google Drive for all files; sample on GitHub)
  • Output: data_resized (see Google Drive for all files; sample on GitHub)
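
A minimal sketch of what a resizing pass with Pillow might look like. The notebook's actual logic is not shown here, so the 2000-pixel cap, the PNG-only search, and the names `scaled_size` and `resize_folder` are assumptions for illustration.

```python
from pathlib import Path


def scaled_size(width: int, height: int, max_side: int = 2000) -> tuple[int, int]:
    """Return a size whose longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_folder(src: str = "data", dst: str = "data_resized", max_side: int = 2000) -> None:
    """Resize every PNG under src and write it to the same relative path under dst."""
    from PIL import Image  # imported lazily so scaled_size works without Pillow

    for path in Path(src).rglob("*.png"):
        out = Path(dst) / path.relative_to(src)
        out.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(path) as img:
            img.resize(scaled_size(*img.size, max_side)).save(out)
```

Keeping the size calculation separate from the file handling makes it easy to check the aspect-ratio logic without touching any images.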

2. Extract text with Optical Character Recognition

text_extraction.ipynb

  • Requires:
    • PyMuPDF and Pillow: pip3 install PyMuPDF Pillow
    • PyTesseract: pip install pytesseract
    • Tesseract: brew install tesseract
  • Input: data_resized (see Google Drive for all files; sample on GitHub)
  • Output: raw_text (see Google Drive for all files; sample on GitHub)
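
As a rough sketch, the OCR step could pair each resized image with a text file of the same name. The folder layout mirrored between `data_resized` and `raw_text`, and the helper names `text_path_for` and `ocr_image`, are assumptions; only `pytesseract.image_to_string` and `PIL.Image.open` are real library calls.

```python
from pathlib import Path


def text_path_for(image_path: str, src: str = "data_resized", dst: str = "raw_text") -> Path:
    """Map data_resized/<...>/page.png to raw_text/<...>/page.txt."""
    relative = Path(image_path).relative_to(src)
    return (Path(dst) / relative).with_suffix(".txt")


def ocr_image(image_path: str) -> Path:
    """Run Tesseract on one image and save the recognized text."""
    import pytesseract  # lazy imports: only needed when OCR actually runs
    from PIL import Image

    out = text_path_for(image_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(image_path) as img:
        out.write_text(pytesseract.image_to_string(img), encoding="utf-8")
    return out
```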

3. Process text

text_processing.ipynb

  • Input: raw_text (see Google Drive for all files; sample on GitHub)
  • Output: cleaned_text (see Google Drive for all files; sample on GitHub)

4. Named Entity Recognition and analysis

  • Run Named Entity Recognition on GPU: Open In Colab
    • Input: cleaned_text (see Google Drive for all files; sample on GitHub)
    • Output: charts in Notebook and on website

5. Geolocation

  • Run Geoparsing on GPU: Open In Colab
    • Input: cleaned_text (see Google Drive for all files; sample on GitHub)
    • Output: geoparse.csv
  • Run Dynamic Mapping on GPU: Open In Colab
    • Requires Mapbox API token
    • Input: geoparse_clean.csv
    • Output: world map and Ukraine map in Notebook and on website

6. Web page

See our live website here and interact with our maps!

visualization-app-on-world-events's People

Contributors

alexdseo, auderoy

visualization-app-on-world-events's Issues

6. Create Dynamic Map

Make two dynamic maps that display the geolocated entities over time (day by day):
a) The world map, where we can observe the activities/mentions of the larger entities, such as countries.
b) The Ukraine map, to show the smaller entities that appear in the news, such as cities.
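
Before any plotting, a day-by-day map needs one set of counts per day. The sketch below shows one way to build those frames; the `(date, place)` row shape and the `daily_frames` name are assumptions, and dates are taken to be ISO strings so that they sort chronologically.

```python
from collections import Counter, defaultdict


def daily_frames(rows):
    """Group (date, place) mentions into one Counter per day, in date order."""
    frames = defaultdict(Counter)
    for date, place in rows:
        frames[date][place] += 1
    return dict(sorted(frames.items()))
```

Each frame can then be joined with the geoparsed coordinates and passed to whichever mapping library draws the animation.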

7. Host a Webpage

Showcase our work, including the distributions of all entities, the dynamic maps, and the additional analysis and explanations.

5. Geo-parsing

Perform geoparsing on the GPE and LOC entities using the Python GeoPy library. Be mindful of potential API rate limits and minimize the number of queries.
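
One simple way to keep the query count down is to geocode each distinct place name only once. The sketch below assumes nothing about the project's actual code: `geocode_unique` is a hypothetical helper, and the geocoder is passed in as a plain callable so that any GeoPy geocoder can be plugged in.

```python
def geocode_unique(place_names, geocode):
    """Geocode each distinct place name once and reuse the result for repeats."""
    cache = {}
    for name in place_names:
        key = name.strip().lower()
        if key and key not in cache:
            cache[key] = geocode(name.strip())  # one query per distinct name
    return cache


# With GeoPy, the `geocode` argument could be a rate-limited Nominatim call:
#   from geopy.geocoders import Nominatim
#   from geopy.extra.rate_limiter import RateLimiter
#   locator = Nominatim(user_agent="world-events-demo")  # hypothetical user agent
#   geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
```

Since the same countries and cities recur across many articles, deduplicating before querying is likely to remove most of the requests.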

4. Named Entities Recognition and Analysis (1)

Perform Named Entity Recognition on the extracted text. Use spaCy or NLTK.

When using spaCy, extract all PERSON, NORP, FAC, ORG, GPE, LOC, PRODUCT, EVENT, DATE, and TIME entities from the text.
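
A minimal sketch of the extraction and filtering with spaCy. The model name `en_core_web_sm` and the helper names are assumptions; the project may use a different pipeline.

```python
from collections import Counter

# Entity labels requested above (spaCy label names).
WANTED = {"PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT", "DATE", "TIME"}


def count_entities(entities, wanted=WANTED):
    """Count (text, label) pairs per label, keeping only the wanted labels."""
    counts = {label: Counter() for label in sorted(wanted)}
    for text, label in entities:
        if label in wanted:
            counts[label][text] += 1
    return counts


def extract_entities(text, model="en_core_web_sm"):
    """Return (text, label) pairs from a spaCy pipeline; the model name is an assumption."""
    import spacy  # lazy import so count_entities runs without spaCy installed

    nlp = spacy.load(model)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]
```

The per-label counters returned by `count_entities` are one possible input for the distribution charts.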

4. Named Entities Recognition and Analysis (3)

Compare the distributions of the entities between the news sources.

Note: The Fox News dataset is missing the days between March 1st and March 11th. When comparing the news sources, take this into account and exclude those dates from CNN and Al Jazeera. In the end, we should get 15 bar charts, one for each required entity type and each news source.
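
The date exclusion can be sketched as a filter applied to every source before counting. The `(source, day, label)` record shape and the `comparable_counts` name are assumptions, as are the inclusive bounds; the year is not stated in the note, so the 2022 in the example call is also an assumption.

```python
from collections import Counter
from datetime import date


def comparable_counts(records, gap_start, gap_end):
    """Count entity labels per source, skipping records dated inside the gap (inclusive)."""
    counts = {}
    for source, day, label in records:
        if gap_start <= day <= gap_end:
            continue  # drop the gap for every source so the comparison stays fair
        counts.setdefault(source, Counter())[label] += 1
    return counts


# Example call (year and inclusive bounds are assumptions):
#   comparable_counts(records, date(2022, 3, 1), date(2022, 3, 11))
```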

3. Text preprocessing

Save the extracted text in a format that lets you fetch it again when needed. It can be saved as JSON, as a pickled DataFrame, or as a set of text files that are named and organized into folders so that you know the date and news source of each text.

Make sure to demonstrate that the script can remove unnecessary extra text from menus or ads.
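
One possible heuristic for menu and ad text is that it repeats across many pages, while article text does not. The sketch below drops such repeated lines and saves the result as JSON. The `"<source>/<date>"` key format, the 50% threshold, the `cleaned_text.json` path, and both function names are assumptions, not the project's actual method.

```python
import json
from collections import Counter


def strip_repeated_lines(pages, min_share=0.5):
    """Drop lines that recur on at least min_share of pages (likely menus or ads)."""
    seen = Counter()
    for text in pages.values():
        seen.update({line.strip() for line in text.splitlines() if line.strip()})
    limit = max(2, min_share * len(pages))
    return {
        key: "\n".join(
            line for line in text.splitlines()
            if line.strip() and seen[line.strip()] < limit
        )
        for key, text in pages.items()
    }


def save_texts(pages, path="cleaned_text.json"):
    """Store the cleaned text in one JSON file keyed by "<source>/<date>"."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(pages, handle, ensure_ascii=False, indent=2)
```

Comparing a page before and after `strip_repeated_lines` is a quick way to show that the navigation text has been removed.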
