
predicting-merger-decision-outcomes's People

Contributors

adellegia

predicting-merger-decision-outcomes's Issues

Model using SVM

Model

  • Run grid search using LinearSVC
  • Move all grid search runs to a new notebook
  • Select best parameters
  • Evaluate model
  • Compute feature importance for phase2=0 and phase2=1
  • Run by section
  • Save evaluation metrics and confusion matrix

Writeup

  • Check false positives and false negatives
  • Describe confusion matrix - nature of mergers, nace_code, year, article
  • Describe confusion matrix using n-gram importance
  • Writeup
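
One way to describe the false positives and false negatives by metadata is to count each outcome type per field value. The sketch below uses only the standard library; the records and their field names (`y_true`, `y_pred`, `year`, `nace_code`) are hypothetical stand-ins for the evaluation output.

```python
from collections import Counter

# Hypothetical records: true label, predicted label, and metadata fields.
records = [
    {"y_true": 1, "y_pred": 1, "year": 2010, "nace_code": "C"},
    {"y_true": 0, "y_pred": 1, "year": 2010, "nace_code": "C"},
    {"y_true": 1, "y_pred": 0, "year": 2012, "nace_code": "J"},
    {"y_true": 0, "y_pred": 0, "year": 2012, "nace_code": "J"},
    {"y_true": 0, "y_pred": 1, "year": 2012, "nace_code": "C"},
]


def error_type(record):
    """Return 'TP', 'TN', 'FP', or 'FN' for one prediction."""
    if record["y_true"] == 1:
        return "TP" if record["y_pred"] == 1 else "FN"
    return "FP" if record["y_pred"] == 1 else "TN"


def breakdown(records, field):
    """Count each outcome type per value of a metadata field."""
    counts = Counter()
    for record in records:
        counts[(record[field], error_type(record))] += 1
    return counts


by_nace = breakdown(records, "nace_code")
by_year = breakdown(records, "year")
```

The same `breakdown` call works for any other metadata column, such as the article of the decision.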

Robustness

  • Change random seed
  • Change pre-processing: lowercase, stopwords
  • Compare metrics
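
The pre-processing variants for the robustness check could be generated with one switchable function, as in this standard-library sketch. The `preprocess` function shown here and its tiny stopword list are assumptions for illustration, not the repository's own pre-processing code.

```python
import re

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "of", "and", "to", "in", "a"}


def preprocess(text, lowercase=True, remove_stopwords=True):
    """Tokenize on letter runs, with optional lowercasing and stopword removal."""
    if lowercase:
        text = text.lower()
    # Keeps alphabetic tokens only, so digits and punctuation are dropped.
    tokens = re.findall(r"[A-Za-z]+", text)
    if remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return " ".join(tokens)


sample = "The Commission cleared the merger in Phase 1"

# One output per (lowercase, remove_stopwords) combination.
variants = {
    (lc, sw): preprocess(sample, lowercase=lc, remove_stopwords=sw)
    for lc in (True, False)
    for sw in (True, False)
}
```

Each variant can then be fed to the same model and seed so that any change in metrics is attributable to the pre-processing choice alone.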

Labels

  1. Phase1 vs. Phase2
  2. Conditions vs. W/o conditions
  3. OPTIONAL: Competitive vs. anticompetitive (variation of the conditions label with Art 8.3 added)

Convert to text representations

  • N-grams, TF-IDF
  • Word embeddings
  • N-gram embeddings for SVC, then get feature importance
  • Clustering by topic (BERTopic?)
  • Check maximum token length
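
As a rough illustration of the n-gram and token-length items above, the following standard-library sketch extracts word n-grams and flags documents above a length limit. The two documents are invented, and the 512 limit is an assumption based on the usual BERT-style input size; whitespace tokens also undercount subword tokens, so this is only a first check.

```python
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max, lowercased."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# Invented example documents.
docs = ["serious doubts were raised", "no serious doubts"]

# Raw n-gram counts per document (the input to a TF-IDF weighting).
counts = [Counter(word_ngrams(d)) for d in docs]

# Assumed limit for a BERT-style model; adjust to the model actually used.
MAX_TOKENS = 512
lengths = [len(d.split()) for d in docs]
too_long = [i for i, n in enumerate(lengths) if n > MAX_TOKENS]
```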

Model using DNNs

Option 1

  • word embeddings + multilayer perceptron

Option 2

  • legal-bert
  • transfer learning
  • how to finetune

Further pre-processing of outliers text

  • Add function to parse by section for outliers
  • Change functions extract_year and extract_simp_text (simp_text must be <=5 len and art6.1b)
  • Re-run parsing for all?

Create tables, plots, and charts

Visualization

  • wordcloud per section using n-gram features
  • wordcloud of all sections n-gram features for svm_wc main and robustness

Baseline model: logit regression and SVM for phase2

Baseline model

  • Remove obvious words?
  • Run grid search using LOGIT and SVM
  • Select best parameters
  • Evaluate model
  • feature importance for phase2=0 and phase2=1
  • For reproducibility: Create pipeline to vectorize, oversample/undersample, train, evaluate
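
The oversampling step of such a pipeline can be sketched with the standard library alone. This is a minimal random-oversampling sketch under the assumption that it is applied to the training split only (so duplicated examples never leak into evaluation); the `random_oversample` helper and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until all classes match the largest."""
    rng = random.Random(seed)  # fixed seed keeps the resampling reproducible
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy training split: four majority examples, one minority example.
train_texts = ["a", "b", "c", "d", "e"]
train_labels = [0, 0, 0, 0, 1]
bal_texts, bal_labels = random_oversample(train_texts, train_labels, seed=42)
class_counts = Counter(bal_labels)
```

Undersampling follows the same pattern with `min` as the target and `rng.sample` to pick which majority examples to keep.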

Labels

  1. Phase1 vs. Phase2
  2. Conditions vs. W/o conditions
  3. Competitive vs. anticompetitive (variation of the conditions label with Art 8.3 added)

Modeling of main research question

Main research question

  • single_bidder: 1 if there was no competition for the contract
  • contract_eu_funded: 1 if the contract was supported by EU funding
  • est_value: the raw estimated value of the contract (currency/VAT unclear)
  • contract_value: the raw value of the contract (currency/VAT unclear)

Update typos in pre-processing

  • use preprocess_function
  • Save the pre-processed df again after removing obvious words and duplicates, plus the other data cleaning done in the baseline notebook

Setup experiments to do

Questions

  • What to do with simplified decisions? Probably disregard them.
  • How many models to run? Main: logit, svm, legal_bert
  • Only one logit baseline on full_text with phase2?
  • Why use precision-recall and F1-score instead of accuracy?
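
On the last question, a small worked example shows why accuracy can mislead when one class is rare. The 5/95 class split below is invented for illustration and says nothing about the actual label balance in the data.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented imbalance: 5 positives out of 100 cases.
y_true = [1] * 5 + [0] * 95

# A degenerate classifier that always predicts the majority class.
always_negative = [0] * 100
m = binary_metrics(y_true, always_negative)
```

Here the classifier never finds a positive case, yet accuracy is 0.95 while recall and F1 are 0.0, which is the gap that precision-recall and F1 are meant to expose.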

Experiments

  • Experiment 1: label1 (phase2) vs. label2 (wc, with conditions), using full text
  • Experiment 2: using single-section texts from the four main sections
  • Experiment 3: using word embeddings for deep learning
