This repository contains the code developed for the paper:
"Entropy in Legal Language" by Roland Friedrich, Mauro Luzzatto, Elliott Ash (2020), Proceedings of the 2020 Natural Legal Language Processing (NLLP) Workshop, 24 August 2020
The paper introduces a novel method to measure word ambiguity, i.e. local word entropy, in a corpus, based on a word2vec model. The code was developed to investigate word ambiguity in the written opinions of the U.S. Supreme Court (SCOTUS) and the German Bundesgerichtshof (BGH), representative courts of the common-law and civil-law systems, respectively.
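In brief, and in our notation rather than the paper's: for a center word $w$, the trained word2vec model yields a probability distribution over possible context words, and the local word entropy of $w$ is the Shannon entropy of that distribution,

$$
H(w) = -\sum_{c \in V} p(c \mid w)\,\log p(c \mid w),
$$

where $V$ is the vocabulary and $p(c \mid w)$ is the model's predicted probability of context word $c$ given center word $w$. The pipeline below computes exactly these quantities.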
Download the GitHub repository:

```
git clone https://github.com/MauroLuzzatto/legal-entropy
```
Run the makefile to install all Python modules needed to run the code:

```
make init
```
Or install the Python requirements and the spaCy language models manually:

```
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm
```
After the installation, run the code as follows:

- Define the corpora to be processed in `corpus_setup.py`
- Define the corpora to be evaluated in `experiment_setup.py`
- Run `TextPreprocessing.py`
- Run `ModelTraining.py`
- Run `EntropyEvaluation.py`
- Run `EntropyVisualization.py`
The code is structured in five parts:
In the experiment setup, the relevant corpora are loaded and the type of experiment is defined.

- `corpus_setup.py`: defines the corpora that should be loaded and preprocessed
- `experiment_setup.py`: defines the experiments that should be conducted
- `config.ini`: defines the main path where the results should be saved (see the sketch below)
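A minimal sketch of how such a config file is typically consumed; the section and key names below are illustrative assumptions, not taken from the repository's `config.ini`:

```python
from configparser import ConfigParser

# Hypothetical sketch: read the results path from config.ini.
# The section/key names ("paths", "main_path") are illustrative only.
config = ConfigParser()
config.read("config.ini")
results_path = config["paths"]["main_path"]  # directory where results are saved
print(results_path)
```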
In a first step, the text is preprocessed and cleaned: the corpus is split into a set of cleaned (e.g. lowercased, lemmatized) sentences. This also includes the creation of bigrams and trigrams using gensim, as sketched after this list.

- `TextPreprocessing.py`: main class for the text preprocessing and cleaning
- `preprocessing.py`: contains helper functions for the data preparation
- `n_grams.json`: defines the threshold and min_count of words for the bigram and trigram creation
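A minimal sketch of bigram/trigram detection with gensim's `Phrases`; the `min_count` and `threshold` arguments correspond to the keys described for `n_grams.json`, but the toy sentences and values here are our own, not the repository's:

```python
from gensim.models.phrases import Phrases, Phraser

# Toy corpus: lists of already-tokenized, cleaned sentences
sentences = [
    ["the", "supreme", "court", "ruled"],
    ["the", "supreme", "court", "decided"],
    ["the", "supreme", "court", "held"],
]

# Detect frequent token pairs, then frequent pairs over the bigrammed corpus
bigram = Phraser(Phrases(sentences, min_count=1, threshold=1))
trigram = Phraser(Phrases(bigram[sentences], min_count=1, threshold=1))

# Tokens that co-occur often enough are joined with "_" into single tokens
print(trigram[bigram[sentences[0]]])
```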
After the text preprocessing, the word2vec model is trained using a defined set of hyperparameters.

- `ModelTraining.py`: main class for the word2vec model training; the hyperparameters are defined in the json file (see the sketch below)
- `hyperparemeters.json`: defines the hyperparameters for the word2vec model training
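A minimal sketch of training from a hyperparameter file; the keys shown in the comment are standard gensim 4.x parameters and may differ from the exact set used in the repository's json file:

```python
import json
from gensim.models import Word2Vec

# Load hyperparameters, e.g. {"vector_size": 100, "window": 5, "min_count": 1, "sg": 1}
with open("hyperparemeters.json") as f:
    params = json.load(f)

# Toy corpus stands in for the preprocessed sentences from the previous step
sentences = [["the", "court", "held"], ["the", "court", "ruled"]]
model = Word2Vec(sentences=sentences, **params)
model.save("word2vec.model")
```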
The trained word2vec models are used to calculate the conditional probability of each center word's context words. Based on this probability distribution, the entropy is calculated on the word level (local word entropy), as sketched below.

- `EntropyEvaluation.py`: main class used for the entropy calculation based on the predicted context word probabilities
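A sketch of the idea under our reading, not the exact code in `EntropyEvaluation.py`: score the center word's input vector against the model's output embeddings, softmax the scores to obtain $p(c \mid w)$, and take the Shannon entropy of the result:

```python
import numpy as np
from gensim.models import Word2Vec

model = Word2Vec.load("word2vec.model")

def local_word_entropy(model, word):
    v = model.wv[word]              # input vector of the center word
    scores = model.syn1neg @ v      # output-embedding scores (negative-sampling weights)
    p = np.exp(scores - scores.max())
    p /= p.sum()                    # softmax -> p(context | center)
    return float(-np.sum(p * np.log(p + 1e-12)))  # Shannon entropy in nats

print(local_word_entropy(model, "court"))
```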
Finally, the calculated word entropies are visualized on the corpus level, as sketched below.

- `Visualization.py`: functions for visualizing the results
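A hypothetical corpus-level view, assuming the pipeline yields one entropy value per word and corpus; the entropy lists below are placeholders, not results from the paper:

```python
import matplotlib.pyplot as plt

# Placeholder values standing in for the pipeline's per-word entropies
scotus_entropies = [4.1, 4.5, 3.9, 5.0, 4.3]
bgh_entropies = [3.8, 4.0, 4.2, 3.6, 3.9]

# Overlay the two entropy distributions for a corpus-level comparison
plt.hist(scotus_entropies, bins=10, alpha=0.5, label="SCOTUS")
plt.hist(bgh_entropies, bins=10, alpha=0.5, label="BGH")
plt.xlabel("local word entropy")
plt.ylabel("number of words")
plt.legend()
plt.show()
```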
- Mauro Luzzatto - Maurol