
Hierarchical-Earthquake-Casualty-Information-Retrieval

Near-real-time Earthquake-induced Fatality Estimation using Crowdsourced Data and Large-Language Models

(Figure: framework overview)

This is the repo for the hierarchical classifier and information extractor in the project Human Loss Estimation via LLMs, which introduces a novel methodology for effective automatic retrieval of earthquake casualty information from social media by leveraging LLMs' advanced natural language understanding.

The repo contains:

  • The hierarchical event classifier and human cost information extractor we propose.
  • The code for visualizing the models' results.
  • Sample outputs from our models.

(Please feel free to email sxu83[AT]jh[DOT]edu or chenguang[DOT]wang[AT]stonybrook[DOT]edu for any questions or feedback)

Contents

Overview

The framework for near-real-time earthquake-induced fatality estimation hinges on a sophisticated hierarchical data extraction process, designed to tackle the challenges of extracting reliable information from social media and traditional media platforms. This process begins with a hierarchical classifier, based on XLM-RoBERTa and trained on the CrisisNLP dataset, which efficiently filters through the vast sea of noisy crowdsourced data. This classifier identifies earthquake-related information with high accuracy, ensuring that only relevant data is processed.
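
As a rough illustration of this stage, the sketch below fine-tunes XLM-RoBERTa as a binary relevance classifier with the Hugging Face transformers library. The data file name, column names, label scheme, and hyperparameters are illustrative assumptions, not the repository's actual configuration; the real training scripts live in code/Hierarchical Event Classifier.

```python
# Minimal sketch: fine-tuning XLM-RoBERTa as a binary relevance classifier.
# File name, columns ("text", "label"), and hyperparameters are assumptions.
import pandas as pd
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

class TweetDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(list(texts), truncation=True, padding=True,
                             max_length=max_len)
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Hypothetical CSV with "text" and "label" columns (1 = earthquake-related).
df = pd.read_csv("data/crisisnlp_sample.csv")
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="output/classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=TweetDataset(df["text"], df["label"], tokenizer),
)
trainer.train()
```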

Once filtered, the relevant data is passed to the extraction module, which utilizes Few-Shot Learning techniques based on GPT-J to retrieve the exact number of earthquake-induced casualties. This allows for accurate extraction of critical data points, such as casualty statistics, without requiring extensive training or fine-tuning.

The hierarchical structure of the classifier ensures that only the most pertinent information is processed, significantly reducing noise and improving data handling efficiency. By leveraging the robust prior knowledge embedded in GPT-J, the framework swiftly adapts to varying linguistic patterns and nuances across different regions, enabling the accurate processing of multilingual text.

In practice, this framework provides rapid and precise estimations of human losses during seismic events, making it invaluable for emergency response teams, governments, and non-governmental organizations. It empowers decision-makers to act promptly based on reliable data, ultimately saving lives and resources in the wake of natural disasters.

Run code

The code for training the hierarchical classifier and for human loss extraction is located, respectively, in:

  • code/Hierarchical Event Classifier
  • code/Human Cost Information Extraction

To get the hierarchical classifier and human loss extraction running, follow these steps:

Pre-requisites

  1. Docker: Ensure Docker is installed on your server. If not, install it from the official Docker website.

  2. Pull Docker Image: Download the specific PyTorch Docker image:

    docker pull pytorch/pytorch:1.12.0-cuda11.3-cudnn8-devel

  3. Run Docker Image: Start a container from the same image:

    docker run -it --name pytorch-container pytorch/pytorch:1.12.0-cuda11.3-cudnn8-devel

Setup and Installation

  1. Clone the GitHub repository to your local machine:

    git clone https://github.com/SusuXu-s-Lab/Hierarchical-Earthquake-Casualty-Information-Retrieval.git
  2. Download the data from CrisisNLP. For convenience, this repository includes the data. Pretrained models can be obtained from HuggingFace. The earthquake crowdsourced data not included in the CrisisNLP dataset was gathered through our Crowdsourced Data Retrieval Project.

  3. Navigate to the repository's directory and install the necessary Python dependencies:

    cd Hierarchical-Earthquake-Casualty-Information-Retrieval
    pip install -r requirements.txt
  4. Run the code:

  • Update the out_dir and data_dir variables in the code according to your requirements.
  • To change the model, replace the model card with another from HuggingFace Models.
  • To add new data, place the CSV/XLSX/TSV files in the data folder and update the file paths in the code accordingly.

Once these adjustments are made, run the code:

cd [classifier or extraction code folder]
python [classifier or extraction code file name].py

The results will be available in: output/[out_dir]

  5. Visualization: visualize the results using visualization.ipynb; a minimal plotting sketch is shown below.
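
For orientation, here is a minimal plotting sketch in the spirit of visualization.ipynb. The output file path and column names (report_time, deaths) are assumptions about the extractor's output format, not the notebook's actual code.

```python
# Minimal sketch of plotting extracted fatality counts over time.
# The path and columns ("report_time", "deaths") are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("output/sample_out/extracted_casualties.csv",  # hypothetical file
                 parse_dates=["report_time"])
df = df.sort_values("report_time")

plt.figure(figsize=(8, 4))
plt.plot(df["report_time"], df["deaths"], "o--", color="tab:blue",
         label="Extracted from Twitter")
plt.xlabel("Reporting time (UTC)")
plt.ylabel("Cumulative reported deaths")
plt.legend()
plt.tight_layout()
plt.show()
```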

Performance Comparison

This figure compares fatalities from (a) the 2022 Luding, China earthquake, (b) the 2022 Philippines earthquake, and (c) the 2021 Haiti earthquake. The data, extracted using our framework from Twitter (dashed line with blue markers) and news articles (solid red line), is compared against manually searched data (dashed line with black markers) and the final official toll (black dotted line). Each data point is labeled with the earliest reporting time (UTC). Refer to our paper for the complete results and discussion.

(Figure: performance comparison for the Luding, Philippines, and Haiti earthquakes)

Prompt

We used the following prompt template for few-shot learning to extract information from crowdsourced data:

    Extract casualty statistics from tweets.

    [Tweet]: {Example 1}
    [Query]: |Deaths|Injured|City|Country|Earthquake|
    [Key]: |{death number}|{injury number}|{city name}|{country name}|{yes or no}|

    ###

    [Tweet]: {Example 2}
    [Query]: |Deaths|Injured|City|Country|Earthquake|
    [Key]: |{death number}|{injury number}|{city name}|{country name}|{yes or no}|

    ###

    ... 

    ###

    [Tweet]: {Example n}
    [Query]: |Deaths|Injured|City|Country|Earthquake|
    [Key]: |{death number}|{injury number}|{city name}|{country name}|{yes or no}|

    ###

    [Tweet]: {Tweet from crowdsourced data}
    [Query]: |Deaths|Injured|City|Country|Earthquake|
    [Key]: 
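
As a rough illustration of how this template could be fed to GPT-J with the Hugging Face transformers library, the sketch below assembles the prompt, generates a completion greedily, and splits the returned table row. The example tweet, decoding parameters, and parsing step are simplifying assumptions rather than the project's exact implementation.

```python
# Minimal sketch of few-shot extraction with GPT-J via Hugging Face transformers.
# The example tweet, decoding settings, and parsing are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16).to("cuda")

few_shot_examples = "..."  # the filled-in [Tweet]/[Query]/[Key] blocks above
tweet = "A 6.8 magnitude earthquake struck Luding; at least 46 people died."

prompt = (
    "Extract casualty statistics from tweets.\n\n"
    f"{few_shot_examples}\n###\n\n"
    f"[Tweet]: {tweet}\n"
    "[Query]: |Deaths|Injured|City|Country|Earthquake|\n"
    "[Key]:"
)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)

# The completion continues the table row, e.g. "|46|0|Luding|China|yes|".
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
fields = [f.strip() for f in completion.split("\n")[0].split("|") if f.strip()]
print(fields)
```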

ToDo

  • Include newer crisis data in training.
  • Implement our method on the latest models.
  • Revise the paper.

Citation

We kindly request that you cite our paper if you find our code beneficial. Your acknowledgment is greatly appreciated.

@article{wang2023near,
  title={Near-real-time earthquake-induced fatality estimation using crowdsourced data and large-language models},
  author={Wang, Chenguang and Engler, Davis and Li, Xuechun and Hou, James and Wald, David J and Jaiswal, Kishor and Xu, Susu},
  journal={arXiv preprint arXiv:2312.03755},
  year={2023}
}

@inproceedings{hou2022near,
  title={Near-Real-Time Seismic Human Fatality Information Retrieval from Social Media with Few-Shot Large-Language Models},
  author={Hou, James and Xu, Susu},
  booktitle={Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems},
  pages={1141--1147},
  year={2022}
}
