
companysustainabilityassessment-backend's Introduction

Company sustainability evaluation using AI-based news analysis

By Ayko Schwedler, written as part of a bachelor thesis.

Technologies used

  • Language: Python
  • Python packages: see requirements.txt

How to start it

  1. Install the desired PyTorch build (CUDA or CPU); see https://pytorch.org/get-started/locally/.
  2. Install the remaining dependencies with pip install -r requirements.txt.
  3. Optionally adjust config.yaml, e.g. the number of threads used to search for news items. Every available option is described in detail in the file itself.
  4. Start the API with uvicorn api:app --host 0.0.0.0 --port 8000 --reload and the news analysis with python news_text_analysis.py.
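For convenience, the two start commands from step 4 can also be launched from one small Python script. This is only a sketch: the helper names build_commands and start_all are hypothetical and not part of the repository, and it assumes the script is run from the repository root with uvicorn already installed.

```python
import subprocess
import sys


def build_commands(host: str = "0.0.0.0", port: int = 8000) -> dict[str, list[str]]:
    """Return the two start commands from step 4 as argument lists."""
    return {
        # Equivalent to: uvicorn api:app --host 0.0.0.0 --port 8000 --reload
        "api": [sys.executable, "-m", "uvicorn", "api:app",
                "--host", host, "--port", str(port), "--reload"],
        # Equivalent to: python news_text_analysis.py
        "analysis": [sys.executable, "news_text_analysis.py"],
    }


def start_all() -> list[subprocess.Popen]:
    """Start the API and the news analysis as two separate processes."""
    return [subprocess.Popen(cmd) for cmd in build_commands().values()]
```

Calling start_all() returns the process handles, so both processes can be stopped later with terminate().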

Functionality

This backend periodically searches for news articles on its own and evaluates them with several AI models. It also exposes the results, together with functions for managing the data, through an API.

How the analysis works

  1. The stored company name and each of its stored synonyms are submitted in parallel as search terms to GNews. Each search yields roughly 100 news articles.
  2. The analysis then starts, using multiprocessing depending on whether a CUDA-capable GPU is available.
    1. First, check which companies appear in the given news article.
      • If at least one company is present and the news article has not yet been analysed, continue.
    2. Classify the news article against the sustainability indicators.
    3. Run a sentiment analysis on the news article as a whole.
    4. Save the results in the database.
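The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the repository: search_news, classify_indicators and sentiment_score are hypothetical stand-ins that return canned values instead of calling GNews or the AI models, the database is a plain dict keyed by URL, and the analysis loop runs sequentially for brevity where the project uses multiprocessing.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Article:
    url: str
    text: str
    companies: list[str] = field(default_factory=list)


def search_news(term: str) -> list[Article]:
    """Hypothetical stand-in for a GNews query; returns canned data."""
    return [Article(url=f"https://example.com/{term.lower()}",
                    text=f"{term} extends log retention.")]


def find_companies(text: str, names: list[str]) -> list[str]:
    """Basic, case-insensitive string matching of company names."""
    lowered = text.lower()
    return [name for name in names if name.lower() in lowered]


def classify_indicators(text: str) -> dict[str, float]:
    """Placeholder for the sustainability indicator classifier."""
    return {"Data Safety": 0.56}


def sentiment_score(text: str) -> float:
    """Placeholder for the sentiment model (scale 0 to 10)."""
    return 5.0


def run_analysis(search_terms, company_names, database):
    # Step 1: one search per term, executed in parallel threads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        batches = list(pool.map(search_news, search_terms))
    for article in (a for batch in batches for a in batch):
        # Step 2.1: skip articles that name no known company
        # or that have already been analysed.
        article.companies = find_companies(article.text, company_names)
        if not article.companies or article.url in database:
            continue
        # Steps 2.2 to 2.4: classify, score sentiment, store.
        database[article.url] = {
            "companies": article.companies,
            "indicators": classify_indicators(article.text),
            "sentiment": sentiment_score(article.text),
        }
    return database
```

Running run_analysis(["Microsoft", "MSFT"], ["Microsoft", "Apple"], {}) stores only the first canned article, because the second one does not contain any stored company name.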

Several optimisations were made. For example, news articles that already exist are re-examined for relevant companies when new companies have been added, but they are not analysed again with the resource-intensive AI models.
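A minimal sketch of this optimisation, assuming stored articles are kept as dicts with hypothetical keys text, companies, indicators and sentiment: only the cheap company matching is repeated, while the stored AI results are left untouched.

```python
def refresh_companies(stored: dict, company_names: list[str]) -> dict:
    """Re-run only the company matching on already analysed articles."""
    for record in stored.values():
        text = record["text"].lower()
        record["companies"] = [n for n in company_names if n.lower() in text]
        # record["indicators"] and record["sentiment"] are deliberately
        # not recomputed, so no AI model is invoked here.
    return stored
```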

Example analysis

Analyzed news article: Microsoft extends security log retention following State Department ... - Cybersecurity Dive

Named companies are:

  • Name: Microsoft

Results of classification:

Label Prob
Not Relevant to ESG 0.9
Risk Management and Internal Control 0.72
Data Safety 0.56
Corporate Governance 0.35
Environmental Management 0.28
Climate Risks 0.27
Supply Chain (Economic / Governance) 0.26
Land Acquisition and Resettlement (S) 0.24
Biodiversity 0.19
Values and Ethics 0.18
Wastewater Management 0.16
Responsible Investment & Greenwashing 0.15
Strategy Implementation 0.15
Waste Management 0.14
Product Safety and Quality 0.14
Surface Water Pollution 0.12
Human Rights 0.12
Supply Chain (Social) 0.11
Forced Labour 0.11
Natural Resources 0.1
Employee Health and Safety 0.1
Planning Limitations 0.1
Retrenchment 0.1
Emergencies (Social) 0.1
Soil and Groundwater Impact 0.09
Physical Impacts 0.09
Land Acquisition and Resettlement (E) 0.09
Discrimination 0.09
Hazardous Materials Management 0.08
Land Rehabilitation 0.08
Emergencies (Environmental) 0.08
Energy Efficiency and Renewables 0.07
Animal Welfare 0.07
Disclosure 0.07
Economic Crime 0.06
Indigenous People 0.06
Landscape Transformation 0.06
Legal Proceedings & Law Violations 0.06
Water Consumption 0.06
Labor Relations Management 0.05
Minimum Age and Child Labour 0.05
Air Pollution 0.05
Greenhouse Gas Emissions 0.04
Freedom of Association and Right to Organise 0.04
Communities Health and Safety 0.04
Supply Chain (Environmental) 0.03
Cultural Heritage 0.03

Obtained sentiment: 6.99/10 (Neutral)
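The probabilities in the table add up to far more than 1, so they appear to be independent per-label scores rather than a single distribution. One way to read such output is to rank the labels and keep those above a cut-off. The sketch below uses the first five rows of the table; the threshold of 0.5 and the helper name labels_above are assumptions for illustration and are not taken from the project.

```python
scores = {
    "Not Relevant to ESG": 0.9,
    "Risk Management and Internal Control": 0.72,
    "Data Safety": 0.56,
    "Corporate Governance": 0.35,
    "Environmental Management": 0.28,
}


def labels_above(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return labels whose score reaches the threshold, highest first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, prob in ranked if prob >= threshold]


print(labels_above(scores))
```

With these five scores, the call keeps "Not Relevant to ESG", "Risk Management and Internal Control" and "Data Safety".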

How the API works

The REST API is built with FastAPI. The following endpoints exist:

Query parameters marked with * are optional.

API Access Points company_name date_range max_sentiment indicator_name synonym_name
/companies
/do_news_exist ✓ *
/news_minimum ✓ *
/news ✓ *
/sustainability_indicators
/indicator_stats ✓ *
/companies (POST)
/synonyms (POST)
/companies (DELETE)
/synonyms (DELETE)
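A request URL for one of these endpoints can be assembled as in the sketch below. It assumes the API is reachable at http://localhost:8000 (the port from the uvicorn command) and that /news accepts company_name as a query parameter. Which parameters each endpoint actually takes should be checked against the table or the FastAPI documentation page, and build_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_url(base: str, path: str, **params) -> str:
    """Build a request URL, leaving out parameters that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}{path}?{query}" if query else f"{base}{path}"


print(build_url("http://localhost:8000", "/news", company_name="Microsoft"))
```

Passing None for an optional parameter simply omits it from the query string, which mirrors the optional parameters marked with *.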

Modelling of the database by means of an ER diagram.

  • Each news item is assigned at least one company and each sustainability indicator exactly once.
  • A news indicator consists of exactly one sustainability indicator.

ER diagram of the database
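The two constraints listed above can be expressed as a small data model. This is only a sketch: the class and field names are hypothetical and do not reflect the actual database schema, which is shown in the ER diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsIndicator:
    indicator_name: str  # refers to exactly one sustainability indicator
    probability: float


@dataclass
class News:
    title: str
    companies: list[str]
    indicators: list[NewsIndicator]


def validate(news: News, all_indicator_names: list[str]) -> None:
    """Raise ValueError if a news item violates the two constraints."""
    if not news.companies:
        raise ValueError("a news item needs at least one company")
    names = [ni.indicator_name for ni in news.indicators]
    if sorted(names) != sorted(all_indicator_names):
        raise ValueError("each sustainability indicator must appear exactly once")
```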

Possible improvements to be made

  • Instead of returning results for either one or all indicators, let the API user choose several indicators per request.
  • Also let the user choose multiple companies for the sake of comparison.
  • Allow analysis of news in multiple languages, not just English.
  • Should problems arise, upgrade company identification from basic string matching to a more advanced technique such as NER. This would, however, increase the processing time per news article.
  • Use text or sentence similarity algorithms so that only one of several news articles on the same topic is analysed.
  • Let the user choose among various news sources instead of always using Google News.
  • Perform further analysis of the data.
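As a rough illustration of the similarity idea in the list above, the sketch below drops near-duplicate headlines using character-level similarity from Python's standard difflib module. This is a simple stand-in for the sentence similarity models that item has in mind, and the threshold of 0.8 is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def deduplicate(titles: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a title only if it is not too similar to one already kept."""
    kept: list[str] = []
    for title in titles:
        is_new = all(
            SequenceMatcher(None, title.lower(), other.lower()).ratio() < threshold
            for other in kept
        )
        if is_new:
            kept.append(title)
    return kept
```

Only the titles that survive this filter would then be passed on to the resource-intensive analysis.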

Notes

  • A large part of this code is in German, as the Bachelor thesis itself is written in German.

