This code creates a machine learning pipeline that can be used to classify tweets sent during an emergency so that help can be sent from an appropriate agency. The project also includes a website where individuals can input new messages and get classification results in several categories.
- Editor: VSCode
- Python Version: 3.12.0
- General Purpose: numpy, pandas
- Data Manipulation: SQLAlchemy
- Data Visualization: matplotlib, plotly
- Natural Language Processing: nltk
- NLTK Resources: punkt, averaged_perceptron_tagger, maxent_ne_chunker, wordnet
- Machine Learning: scikit-learn, joblib
- Web App: Flask, Bootstrap
Note: If you're using a virtual environment, please make sure it's activated before you run these commands.
To set up the database and machine learning model, run the following commands:

- To run the ETL pipeline that cleans the data and stores it in the database:

  `python data/process_data.py data/01_raw/disaster_messages.csv data/01_raw/disaster_categories.csv data/02_stg/stg_disaster_response.db`

- To train the classifier on the base parameters and save the resulting model:

  `python models/train_classifier.py data/02_stg/stg_disaster_response.db models/classifier.pkl`
The script will then issue the following prompts; respond "yes", "no", or "exit":
1. Whether to retrain the base model. If yes, the script loads the base parameters, builds a model with them, trains it, evaluates it, and saves it to a pickle file.
2. Whether to estimate the grid search runtime. If yes, the script loads the grid search parameters and runs a grid search on a small subset of the data to estimate the runtime.
3. Whether to run a full grid search. If yes, the script runs the grid search, saves the results, and saves the best parameters it finds.
4. Whether to retrain the model using the optimized parameters found by the grid search. If yes, the script loads the optimized parameters, builds a model with them, trains it, evaluates it, and saves it to a pickle file.
WARNING: If you're running the pipeline locally, this might take a few minutes. The script will use n-1 cores.
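The prompt flow described above boils down to a small yes/no/exit helper; a sketch (hypothetical, the actual script's wording may differ):

```python
def ask(prompt: str) -> bool:
    """Ask a yes/no question on stdin; 'exit' aborts the script."""
    while True:
        answer = input(f"{prompt} (yes/no/exit): ").strip().lower()
        if answer == "exit":
            raise SystemExit(0)
        if answer in ("yes", "no"):
            return answer == "yes"
        print("Please answer 'yes', 'no' or 'exit'.")
```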
To run the Flask app:

- Go to the `app` directory: `cd app`
- Run the web app: `python run.py`
- Open http://127.0.0.1:3000 (or the equivalent address shown in the console) in your browser to view the app

Note: The first address is localhost and is restricted to your local machine. The second address is the network address of the server, which can be accessed from any machine on your local network.
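For orientation, `run.py` presumably boils down to something like this sketch (the route body here is a placeholder; the real app renders templates and loads the trained model):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Placeholder: the real app renders an HTML template with plotly charts
    return "Disaster Response Classifier"

if __name__ == "__main__":
    # host="0.0.0.0" binds the network address mentioned above
    # as well as localhost; port 3000 matches the URL in the steps
    app.run(host="0.0.0.0", port=3000, debug=True)
```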
The model was built on a combination of the following two data sets:
- disaster_messages.csv
  - Contains messages sent during the disaster. Each message is labeled with one or more disaster-related categories, such as "water", "food", and "medical help".
  - Messages can be in a variety of languages. The 'original' column holds the source text, predominately in Haitian Creole; the corresponding English translation is in the 'message' column.
  - Messages are classified into one of three genres: direct, news, and social.
- disaster_categories.csv
  - Contains the corresponding categories for each message in the disaster_messages dataset. Each category is represented by a binary value (0 or 1), indicating whether the message belongs to that category or not.
  - The 'related' column indicates if the message is related to the disaster or not. In the raw data, there are three possible values: 1 (related), 0 (not related), and 2 (ambiguous). The ambiguous messages have been dropped from the training set.
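The cleaning described above (merge the two files on their shared id, expand the category string into binary columns, and drop the ambiguous `related == 2` rows) can be sketched as a small pandas function. The column names follow the raw files, but the function itself is illustrative, not the project's exact ETL code:

```python
import pandas as pd

def clean_data(messages: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Merge the raw files and expand 'categories' into binary columns."""
    df = messages.merge(categories, on="id")
    # 'categories' holds strings like "related-1;water-0;...;direct_report-0"
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
    for col in cats.columns:
        cats[col] = cats[col].str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), cats], axis=1)
    # Drop ambiguous rows (related == 2) and exact duplicates
    return df[df["related"] != 2].drop_duplicates()
```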
The model is designed as a machine learning pipeline that processes text and classifies it into any of the 36 categories in the dataset. The pipeline consists of three main steps:
1. Text Processing: The text data is first processed with a custom `tokenize` function built on nltk. It normalizes case, tokenizes, and lemmatizes the text, and also handles URL detection and replacement, punctuation removal, and stop-word removal.
2. Vectorization and TF-IDF Transformation: The processed text is vectorized with `CountVectorizer` using the custom tokenizer. After vectorization, a TF-IDF transformation is applied to the vectorized data.
3. Multi-output Classification: The transformed data is classified with a multi-output `RandomForestClassifier`.
The trained model is saved to a pickle file for future use.
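The three steps above map naturally onto a scikit-learn `Pipeline`; a sketch (the tokenizer argument is left as scikit-learn's default here, whereas the project wires in its custom `tokenize` function):

```python
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

def build_model(tokenizer=None) -> Pipeline:
    return Pipeline([
        ("vect", CountVectorizer(tokenizer=tokenizer)),
        ("tfidf", TfidfTransformer()),
        # MultiOutputClassifier fits one forest per category column;
        # n_jobs=-2 leaves one core free, matching the "n-1 cores" note
        ("clf", MultiOutputClassifier(
            RandomForestClassifier(n_estimators=100, n_jobs=-2))),
    ])

# joblib handles large numpy-backed models better than raw pickle:
# joblib.dump(build_model(), "models/classifier.pkl")
```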
Here are the median values for the original model:
| output_class | precision | recall | f1-score |
|---|---|---|---|
| 0 | 96 | 100 | 98 |
| 1 | 75 | 8 | 14 |
| macro avg | 85 | 54 | 57 |
| weighted avg | 96 | 96 | 95 |
I used GridSearchCV to tune the model for accuracy, and tested the following parameters:
| Parameter | Values |
|---|---|
| `vect__ngram_range` | ((1, 1), (1, 2)) |
| `clf__estimator__n_estimators` | [50, 100, 200] |
| `clf__estimator__min_samples_split` | [2, 3, 4] |
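These parameters plug into `GridSearchCV` as a `param_grid`; a sketch (the `cv` and `n_jobs` settings here are assumptions):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# Step names prefix the parameter keys: vect__*, clf__estimator__*
param_grid = {
    "vect__ngram_range": ((1, 1), (1, 2)),
    "clf__estimator__n_estimators": [50, 100, 200],
    "clf__estimator__min_samples_split": [2, 3, 4],
}

# 2 x 3 x 3 = 18 candidates, each fit once per CV fold
cv = GridSearchCV(pipeline, param_grid=param_grid, cv=2, n_jobs=-2, verbose=1)
```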
This process resulted in the following 'optimized' values:
| Parameter | Original Value | Optimized Value |
|---|---|---|
| `vect__ngram_range` | (1, 1) | (1, 2) |
| `clf__estimator__n_estimators` | 100 | 200 |
| `clf__estimator__min_samples_split` | 2 | 2 |
Here are the median values for the optimized model:
| output_class | precision | recall | f1-score |
|---|---|---|---|
| 0 | 96 | 100 | 98 |
| 1 | 78 | 4 | 7 |
| macro avg | 85 | 52 | 53 |
| weighted avg | 95 | 96 | 94 |
Here are the percent changes between the two models:

| output_class | precision | recall | f1-score |
|---|---|---|---|
| 0 | 0.00 | 0.0 | 0.00 |
| 1 | 4.00 | -50.0 | -50.00 |
| macro avg | 0.00 | -3.7 | -7.02 |
| weighted avg | -1.04 | 0.0 | -1.05 |
The data shows that the optimized model increased precision by 4% for relevant tweets but decreased recall and the F1-score by 50%. The optimized model detects positive cases more accurately, but at the expense of finding fewer of them. This is not a trade-off we want to make, because we want to capture as many true requests for help as possible. Furthermore, the recall rates for both models are extremely low, highlighting a major drawback in prioritizing precision at a considerable cost to overall performance.
In addition, changing `vect__ngram_range` from (1, 1) to (1, 2) and `clf__estimator__n_estimators` from 100 to 200 increased training time from approximately one minute to seven minutes (a sevenfold increase in computational time). The optimized model is also 561.85 MB larger than the original model (when neither file is compressed).
The machine learning pipeline developed in this project demonstrates a promising approach to classifying disaster-related messages into 36 categories. However, the model's performance varies across different classes, with some classes achieving high precision at the cost of reduced recall. This trade-off is not ideal for our use case, as we aim to capture as many true positive requests for help as possible.
The model's performance was optimized using GridSearchCV, which significantly increased the computational time. While this improved precision for some classes, it decreased the overall F1-score, which balances precision and recall. This suggests that the model's performance could be further improved.
- Optimize Grid Search for Weighted F1-Score instead of Accuracy: Accuracy is not always the best metric for evaluating a model's performance, especially for imbalanced datasets. Optimizing for the weighted F1-score, which considers both precision and recall, could lead to a more balanced model.
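A sketch of how that could be wired up with `make_scorer` (an illustration, not the project's current code):

```python
from sklearn.metrics import f1_score, make_scorer

# Score grid search candidates by weighted F1 instead of the default accuracy;
# zero_division=0 silences warnings for categories with no predicted positives
weighted_f1 = make_scorer(f1_score, average="weighted", zero_division=0)

# Then pass it to the search, e.g.:
# cv = GridSearchCV(pipeline, param_grid, scoring=weighted_f1)
```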
- Use a Translation API for Consistent Tweet Translations: The dataset contains messages in various languages, and the quality of translations can significantly impact the model's performance. Using a reliable translation API could ensure consistent and accurate translations.
- Consider Class Imbalance: Some classes in the dataset have significantly fewer samples than others, which can bias the model towards the majority classes. Techniques such as oversampling the minority classes or undersampling the majority classes could help address this issue.
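One lightweight alternative to resampling, shown here as a sketch (an illustration, not something the project currently does), is to let each forest reweight classes inversely to their frequency:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# class_weight="balanced" upweights rare positive labels inside each tree,
# without changing the training data itself
clf = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=50, class_weight="balanced")
)
```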
- Feature Engineering: Additional features could be engineered from the text data to potentially improve the model's performance. For example, the length of the message, the number of words, or the presence of certain keywords could be useful features.
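Such features can be bolted onto the existing text pipeline with a `FeatureUnion` and a small custom transformer; the `TextLengthExtractor` below is hypothetical:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import FeatureUnion, Pipeline

class TextLengthExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: message length in words as one numeric column."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([[len(text.split())] for text in X])

# Run the TF-IDF features and the length feature side by side,
# then concatenate them column-wise for the classifier
features = FeatureUnion([
    ("text_pipeline", Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
    ])),
    ("length", TextLengthExtractor()),
])
```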