Hate crime detection and entity extraction

This repo contains code and data for the following paper: "Reporting the unreported: Event Extraction for Analyzing the Local Representation of Hate Crimes" (EMNLP 2019).

The method includes a Multi-Instance Learning model for detecting hate crimes in local news articles and an RNN model for extracting the entities of each hate crime. The annotated datasets for three types of crimes (hate, kidnapping and homicide) are included in the Data folder.

Getting Started

To run the code, you need to download the Data and embeddings directories.

Prerequisites

This project uses Python 3.6.2. The following libraries must be installed:

sklearn 0.19.1
tensorflow-gpu 1.11.0
pandas 0.23.0
nltk 3.2.5
tqdm 4.28.1
numpy 1.14.3
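
The pinned versions above can be checked against the active environment before running anything. The following is a minimal sketch, not part of the repository: PINNED and find_mismatches are hypothetical names, and the check assumes each library exposes a __version__ attribute. The keys are import names, so the pip package tensorflow-gpu appears as tensorflow.

```python
import importlib

# Versions copied from the prerequisites list; keys are import names
# (the pip package tensorflow-gpu is imported as "tensorflow").
PINNED = {
    "sklearn": "0.19.1",
    "tensorflow": "1.11.0",
    "pandas": "0.23.0",
    "nltk": "3.2.5",
    "tqdm": "4.28.1",
    "numpy": "1.14.3",
}


def find_mismatches(pinned):
    """Return {module: (expected, found)} for every module whose version differs.

    `found` is None when the module cannot be imported or has no __version__.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            module = importlib.import_module(name)
            found = getattr(module, "__version__", None)
        except ImportError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means the environment matches the list exactly. Whether the code also runs with nearby versions is not stated here.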

Parameters

All the parameters of the code are set in params.json. The parameters are defined as follows:

  "hidden_size": 100 # hidden size of the LSTM
  "art_filter_sizes": [2, 3, 4] # filter sizes in detect model
  "art_num_filters": 10 # number of different filters in detect model
  "pretrain": true # if set to true, uses GloVe embeddings
  "embedding_size": 300 # size of the embedding
  "learning_rate": 0.00001 # learning rate for the detect task
  "keep_ratio": 0.75 # keep ratio for the detect task
  "epochs": 30 # number of epochs
  "entity_keep_ratio": 0.75 # keep ratio in extract task
  "entity_learning_rate": 0.001 # learning rate in extract task
  "batch_size": 5 # size of batches, shows the number of articles in each batch
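
The # annotations in the list above are explanatory only. Standard JSON has no comment syntax, so a file parsed with Python's json module must contain just the keys and values. The sketch below shows one way such a file could be read and checked for the documented keys. DOCUMENTED_PARAMS, parse_params and load_params are hypothetical names, and how the repository itself reads the file is not shown here.

```python
import json

# Keys and example values taken from the parameter list above.
DOCUMENTED_PARAMS = {
    "hidden_size": 100,
    "art_filter_sizes": [2, 3, 4],
    "art_num_filters": 10,
    "pretrain": True,
    "embedding_size": 300,
    "learning_rate": 0.00001,
    "keep_ratio": 0.75,
    "epochs": 30,
    "entity_keep_ratio": 0.75,
    "entity_learning_rate": 0.001,
    "batch_size": 5,
}


def parse_params(text):
    """Parse JSON text and fail early if a documented key is missing."""
    params = json.loads(text)
    missing = sorted(set(DOCUMENTED_PARAMS) - set(params))
    if missing:
        raise KeyError("missing parameters: " + ", ".join(missing))
    return params


def load_params(path="params.json"):
    """Read a parameter file from disk and validate it."""
    with open(path) as handle:
        return parse_params(handle.read())
```

Checking for missing keys up front gives a clearer error than a failure partway through training.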

Running the detection code

In order to run the detection code, use the following script:

python3 run_detect.py --model <MODEL_NAME> --goal <GOAL> --dataset <DATASET> --params <PARAMS_FILE>

Substitute the following tokens according to the task at hand:

  • <MODEL_NAME>: you can either use MICNN (the model used in the paper) or ATTN (the hierarchical attention baseline)
  • <GOAL>: the goal of the task is either train or predicts
  • <DATASET>: use one of the three datasets (hate, kidnap or homicide) to perform the detection
  • <PARAMS_FILE>: the .json file that includes all the model parameters. The model uses params.json by default.
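
When several runs are scripted, it can help to assemble the command from validated pieces. The helper below is a hypothetical sketch, not part of the repository: build_detect_command is an invented name, and it checks only the model and dataset values listed above, passing the goal through unchanged.

```python
# Allowed values as listed in the token descriptions above.
MODELS = ("MICNN", "ATTN")
DATASETS = ("hate", "kidnap", "homicide")


def build_detect_command(model, goal, dataset, params="params.json"):
    """Return the run_detect.py invocation as an argument list."""
    if model not in MODELS:
        raise ValueError("unknown model: " + repr(model))
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: " + repr(dataset))
    return [
        "python3", "run_detect.py",
        "--model", model,
        "--goal", goal,
        "--dataset", dataset,
        "--params", params,
    ]


if __name__ == "__main__":
    # Prints: python3 run_detect.py --model MICNN --goal train --dataset hate --params params.json
    print(" ".join(build_detect_command("MICNN", "train", "hate")))
```

The returned list can be passed to subprocess.run from inside a checkout of the repository.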

Running the extraction code

In order to run the extraction code, use the following script:

python3 run_extract.py --goal <GOAL> --params <PARAMS_FILE>

Substitute the following tokens according to the task at hand:

  • <GOAL>: the goal of the task is either train or predicts
  • <PARAMS_FILE>: the .json file that includes all the model parameters. The model uses params.json by default.
