Entropy in Legal Language

This repository contains the code developed for the paper:

"Entropy in Legal Language" by Roland Friedrich, Mauro Luzzatto, Elliott Ash (2020), Proceedings of the 2020 Natural Legal Language Processing (NLLP) Workshop, 24 August 2020

The paper introduces a method for measuring word ambiguity in a corpus, expressed as local word entropy, based on a word2vec model.

The code was developed to investigate word ambiguity in the written opinions of the U.S. Supreme Court (SCOTUS) and the German Bundesgerichtshof (BGH), which are representative courts of the common-law and civil-law systems, respectively.

Getting Started

Installation

Clone the GitHub repository:

git clone https://github.com/MauroLuzzatto/legal-entropy

Run the Makefile to install all Python modules needed to run the code:

make init

Or install the Python requirements and the spaCy language models manually:

pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm

Run Code

After installation, run the code as follows:

  1. Define the corpora to be processed in corpus_setup.py
  2. Define the corpora to be evaluated in experiment_setup.py
  3. Run TextPreprocessing.py
  4. Run ModelTraining.py
  5. Run EntropyEvaluation.py
  6. Run EntropyVisualization.py
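
Steps 3 to 6 above can be chained in a small driver script. The following is only a sketch: it assumes the four scripts sit in the current working directory and run without command-line arguments, which the README does not state. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Script names taken from the numbered steps above; their location and
# argument-free invocation are assumptions.
PIPELINE = (
    "TextPreprocessing.py",
    "ModelTraining.py",
    "EntropyEvaluation.py",
    "EntropyVisualization.py",
)


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one command (as an argument list) per pipeline script, in order."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE):
    """Run each script in order; stop at the first one that exits with an error."""
    for command in build_commands(scripts):
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` after editing corpus_setup.py and experiment_setup.py would run the four stages back to back. Because of `check=True`, a failure in one stage raises `subprocess.CalledProcessError` and the later stages are not started.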

Code Overview

The code is structured in five parts:

  1. Experiment Setup
  2. Text Preprocessing
  3. Model Training
  4. Entropy Calculation
  5. Entropy Visualization

1) Experiment Setup

In the experiment setup, the relevant corpora are loaded and the type of experiment is defined.

  • corpus_setup.py: defines the corpora to be loaded and preprocessed
  • experiment_setup.py: defines the experiments to be conducted
  • config.ini: defines the main path where the results are saved
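
As an illustration of how a main path could be read from an INI file, here is a minimal sketch using Python's standard `configparser`. The section name `paths` and the key `main_path` are assumptions made for this example; the actual layout of config.ini in the repository may differ.

```python
import configparser
from pathlib import Path

# Hypothetical file content: the section and key names are assumptions,
# not taken from the repository's config.ini.
EXAMPLE_CONFIG = """
[paths]
main_path = ./results
"""


def load_main_path(config_text):
    """Parse INI text and return the configured main path as a Path object."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return Path(parser["paths"]["main_path"])
```

To read from disk instead of a string, `parser.read("config.ini")` can replace the `read_string` call.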

2) Text Preprocessing

First, the text is preprocessed and cleaned: the corpus is split into sentences, which are then cleaned (e.g. lowercased and lemmatized). This step also creates bigrams and trigrams using gensim.

  • TextPreprocessing.py: main class for the text preprocessing and cleaning
  • preprocessing.py: contains helper functions for the data preparation
  • n_grams.json: defines the threshold and min_count of words for the bigram and trigram creation
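
The preprocessing step can be sketched roughly as follows. This is not the repository's implementation: `clean_sentences` is a hypothetical helper that only lowercases and tokenizes with regular expressions (the lemmatization that the repository presumably performs with spaCy is left out), and `add_ngrams` shows one common way to build bigrams and then trigrams with gensim's `Phrases`. The values for `min_count` and `threshold` are placeholders for whatever n_grams.json defines.

```python
import re


def clean_sentences(text):
    """Split raw text into sentences and return lowercase token lists."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    cleaned = []
    for sentence in sentences:
        # Keep alphabetic tokens only; umlauts and ß are included for German text.
        tokens = re.findall(r"[a-zäöüß]+", sentence.lower())
        if tokens:
            cleaned.append(tokens)
    return cleaned


def add_ngrams(sentences, min_count=5, threshold=10.0):
    """Merge frequent word pairs, then frequent triples, into single tokens.

    Requires gensim; imported lazily so the cleaning helper works without it.
    """
    from gensim.models.phrases import Phrases

    bigram = Phrases(sentences, min_count=min_count, threshold=threshold)
    bigram_sentences = [bigram[sentence] for sentence in sentences]
    # A second pass over the bigram output can join a bigram with a third word.
    trigram = Phrases(bigram_sentences, min_count=min_count, threshold=threshold)
    return [trigram[sentence] for sentence in bigram_sentences]
```

A higher `threshold` yields fewer merged phrases, and `min_count` ignores word pairs that occur too rarely to be scored reliably.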

3) Model Training

After the text preprocessing, the word2vec model is trained using a defined set of hyperparameters.

  • ModelTraining.py: main class for word2vec model training; the hyperparameters are defined in the JSON file
  • hyperparemeters.json: defines the hyperparameters for the word2vec model training
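
A minimal sketch of how hyperparameters stored as JSON could be passed to gensim's `Word2Vec`. The keys shown follow the parameter names of gensim 4.x (`vector_size`, `window`, `min_count`, `sg`, `epochs`); both the keys and the values are assumptions for illustration and are not copied from the repository's JSON file, which may target an older gensim release with different names.

```python
import json

# Hypothetical hyperparameter file content (gensim 4.x parameter names).
EXAMPLE_HYPERPARAMETERS = """
{"vector_size": 100, "window": 5, "min_count": 5, "sg": 1, "epochs": 5}
"""


def load_hyperparameters(json_text):
    """Parse the JSON text and make sure it is a flat key/value mapping."""
    params = json.loads(json_text)
    if not isinstance(params, dict):
        raise ValueError("hyperparameters must be a JSON object")
    return params


def train_word2vec(sentences, params):
    """Train a word2vec model on tokenized sentences.

    Requires gensim; imported lazily so the loader works without it.
    """
    from gensim.models import Word2Vec

    return Word2Vec(sentences=sentences, **params)
```

Keeping the hyperparameters in a file rather than in code makes it easy to train several models on the same corpus with different settings.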

4) Entropy Calculation

The trained word2vec models are used to calculate the conditional probability of the context words of each center word. From this probability distribution, the entropy is calculated at the word level (local word entropy).

  • EntropyEvaluation.py: main class used for the entropy calculation based on the predicted context word probability
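
The idea can be illustrated with a small, dependency-free sketch. It assumes that the conditional probability of a context word given a center word is a softmax over the dot products of the center vector with each context vector, and that the entropy is the Shannon entropy of that distribution. The README does not give the exact formula, so the paper and EntropyEvaluation.py remain the reference; the function names below are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def entropy(probs, base=2.0):
    """Shannon entropy of a distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)


def local_word_entropy(center_vec, context_vecs, base=2.0):
    """Entropy of the assumed context-word distribution for one center word."""
    scores = [
        sum(a * b for a, b in zip(center_vec, context_vec))
        for context_vec in context_vecs
    ]
    return entropy(softmax(scores), base)
```

Under these assumptions, a center word whose context words are all equally likely reaches the maximum entropy (log of the number of context words), while a word that almost always predicts the same context word has an entropy close to zero.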

5) Entropy Visualization

Finally, the calculated word entropies are visualized at the corpus level.

  • Visualization.py: functions for visualizing the results

Authors
