nlp-with-disaster-tweets's Introduction

Disaster Tweets Classification

Overview

This project focuses on building a machine learning model to predict whether a tweet is about a real disaster or not. In today's digital age, Twitter has become a crucial communication channel during emergencies. The widespread use of smartphones allows individuals to report real-time observations of emergencies, making it a valuable source of information for disaster relief organizations and news agencies.

However, the challenge lies in distinguishing between tweets that genuinely announce a disaster and those that use disaster-related terms metaphorically or in different contexts. For instance:

[Image: example tweet]

In this tweet, the word "DISASTER" expresses frustration with heavy traffic; it does not refer to an actual disaster such as a natural calamity. The tweet describes a common, non-emergency situation. This ambiguity is why a reliable machine learning model is needed to separate genuine disaster reports from figurative uses of disaster-related words.

Data

The dataset for this project comes from Kaggle's NLP Getting Started competition and can be downloaded from the competition's data page.

The dataset is a collection of tweets that have been manually classified into two categories:

  • Real Disasters (Target = 1): Tweets associated with actual disasters or emergencies.
  • Non-Disasters (Target = 0): Tweets that are not related to real disasters.

Code

The Jupyter Notebook contains the entire code for data loading, preprocessing, model training, and evaluation. The key steps include:

  1. Data Cleaning and Exploration: Checking for missing values and exploring the dataset.
  2. Text Preprocessing: Cleaning the text data by removing URLs, HTML tags, and non-alphabetic characters. Tokenization and removal of stop words are also performed.
  3. TF-IDF Vectorization: Transforming the cleaned text data into numerical features using TF-IDF vectorization.
  4. Model Building: Creating an ensemble model using three classifiers - Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine.
  5. Model Evaluation: Assessing the model performance on a validation set, including accuracy, confusion matrix, and classification report.
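The text preprocessing step above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the function name `clean_tweet`, the regular expressions, and the tiny `STOP_WORDS` set are all assumptions made for the example (the notebook may well use a fuller stop-word list and a different tokenizer).

```python
import re

# A tiny illustrative stop-word list (an assumption; a real run would
# likely use a fuller list, e.g. from an NLP library).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "this", "to"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # http(s) links and www. links
HTML_TAG_RE = re.compile(r"<[^>]+>")            # anything shaped like <tag>
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # digits, punctuation, symbols


def clean_tweet(text):
    """Return lowercase tokens with URLs, HTML tags, non-alphabetic
    characters, and stop words removed."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = text.split()  # simple whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(clean_tweet("Traffic is a <b>DISASTER</b> today!! http://t.co/abc123"))
# prints ['traffic', 'disaster', 'today']
```

Removing URLs before stripping non-alphabetic characters matters here: otherwise the letters inside a link would survive as meaningless tokens.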

Results

The ensemble model achieved a validation accuracy of approximately 81.44%. Detailed metrics, including precision, recall, and F1 score, can be found in the classification report.
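For readers who want to see how such an ensemble and its metrics might be produced, here is a hedged sketch with scikit-learn. It assumes the three classifiers are combined with `VotingClassifier` using hard (majority) voting and that `LinearSVC` stands in for the Support Vector Machine; the README does not say how the notebook combines them. The tweets below are made-up placeholders, not rows from the Kaggle dataset, so the scores will not match the figure reported above.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up placeholder tweets (NOT from the Kaggle dataset); 1 = real disaster.
texts = [
    "forest fire spreading near the village evacuate now",
    "earthquake damaged several buildings downtown",
    "flood waters rising residents rescued by boat",
    "wildfire forces thousands to evacuate homes",
    "tornado destroyed houses emergency crews respond",
    "hurricane makes landfall power outages reported",
    "my exam was a total disaster",
    "this traffic is a disaster today",
    "that movie was fire loved it",
    "my phone battery died what a catastrophe",
    "flooded with emails this morning",
    "the party last night was a blast",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hold out a stratified validation split.
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Hard voting: each classifier casts one vote and the majority label wins.
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", LinearSVC()),
    ],
    voting="hard",
)

# TF-IDF features feed directly into the ensemble.
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(X_train, y_train)

val_pred = model.predict(X_val)
accuracy = accuracy_score(y_val, val_pred)
cm = confusion_matrix(y_val, val_pred, labels=[0, 1])

print(f"Validation accuracy: {accuracy:.4f}")
print(cm)
print(classification_report(y_val, val_pred, labels=[0, 1], zero_division=0))
```

Hard voting is used in this sketch because `LinearSVC` does not expose class probabilities, which soft voting would require.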

Testing

The trained model is applied to the test dataset (test.csv), and the predictions are stored in submission.csv.
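Writing the predictions file could look something like the sketch below. It assumes the submission has two columns named `id` and `target`, as in the Kaggle competition's sample submission; the helper `write_submission` and the demo ids and labels are hypothetical stand-ins for the notebook's real test ids and model output.

```python
import csv
import os
import tempfile


def write_submission(ids, predictions, path):
    """Write one (id, target) row per test tweet to a CSV file."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "target"])  # assumed column names
        for tweet_id, label in zip(ids, predictions):
            writer.writerow([tweet_id, int(label)])


# Hypothetical ids and labels standing in for the real test set predictions.
demo_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
write_submission([0, 2, 3], [1, 0, 1], demo_path)

# Read the file back to confirm its layout.
with open(demo_path, newline="", encoding="utf-8") as handle:
    rows = list(csv.reader(handle))
print(rows)
# prints [['id', 'target'], ['0', '1'], ['2', '0'], ['3', '1']]
```

Casting each label with `int` keeps the `target` column as plain 0/1 values even if the model returns NumPy integers.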

Future enhancements

  • Incorporating additional training data may improve the model's performance.
  • Exploring advanced NLP techniques and fine-tuning the model may further improve accuracy.

Contributors

payal-soni28
