Phishing URL Detector Plugin API

Introduction

Phishing URL Detection

Overview

This project detects phishing URLs using machine learning. It has three main parts: data loading and cleaning, feature extraction, and model training and evaluation. The final model is a Gradient Boosting Classifier, which reaches an accuracy of 97.4% in the model comparison below.
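A minimal sketch of the training and evaluation step, assuming scikit-learn is the library behind the Gradient Boosting Classifier (the README does not name it). The function name `train_and_evaluate`, the 80/20 split, and the synthetic data are illustrative assumptions; the real project would pass in the features extracted from phishing.csv.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, seed=0):
    """Fit a gradient boosting model and return (model, held-out accuracy)."""
    # Hold out 20% of the rows for evaluation (split ratio is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = GradientBoostingClassifier(random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Synthetic stand-in for the extracted URL features; NOT the project's data.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
model, accuracy = train_and_evaluate(X, y)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here describes the synthetic data only and says nothing about the figures reported for the phishing dataset.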

Installation

To run the project, follow these steps:

  1. Clone the repository: git clone https://github.com/your-username/Phishing-URL-Detection.git
  2. Change into the project directory: cd Phishing-URL-Detection
  3. Install the required packages: pip install -r requirements.txt
  4. Run the Flask application: python app.py

Directory Tree

├── db
│   ├── load_data.py
│   ├── save_data.py
│   ├── train_model.py
├── pickle
│   ├── model.pkl
├── Phishing URL Detection.ipynb
├── README.md
├── app.py
├── database.db
├── feature.py
├── phishing.csv
├── requirements.txt

Files

  • app.py: Flask web application for testing the model
  • feature.py: script for extracting features from URLs
  • database.db: SQLite database for storing URLs and their labels
  • phishing.csv: dataset containing URLs and their labels
  • pickle/model.pkl: serialized model object
  • joblib/gbc_model.joblib: the model serialized with joblib (this path is not shown in the directory tree above)
  • db/load_data.py: script for loading data into the database
  • db/save_data.py: script for saving data to the database
  • db/train_model.py: script for training and evaluating the model
  • Phishing URL Detection.ipynb: Jupyter notebook containing the project code and documentation
  • README.md: readme file explaining the project
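As an illustration of how the database scripts might store URLs and labels in SQLite, here is a minimal sketch using Python's built-in sqlite3 module. The table name `urls`, its columns, the helper names, and the label encoding (1 for phishing, 0 for legitimate) are all hypothetical; the actual schema in database.db is not documented here.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per URL with an integer label.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        "url TEXT PRIMARY KEY, "
        "label INTEGER NOT NULL)"
    )


def save_url(conn, url, label):
    # INSERT OR REPLACE keeps a single row per URL and updates its label.
    conn.execute(
        "INSERT OR REPLACE INTO urls (url, label) VALUES (?, ?)", (url, label)
    )
    conn.commit()


def load_urls(conn):
    # Return all stored (url, label) pairs, sorted by URL.
    return conn.execute("SELECT url, label FROM urls ORDER BY url").fetchall()


# An in-memory database keeps the example from touching database.db.
conn = sqlite3.connect(":memory:")
init_db(conn)
save_url(conn, "https://example.com/", 0)
save_url(conn, "http://login-example.test/verify", 1)
rows = load_urls(conn)
print(rows)
```

Parameterized queries (the `?` placeholders) matter here because the stored values are untrusted URLs.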

Model Comparison

| ML Model | Accuracy | F1 Score | Recall | Precision |
|---|---|---|---|---|
| Gradient Boosting Classifier | 0.974 | 0.977 | 0.994 | 0.986 |
| Multi-layer Perceptron | 0.971 | 0.974 | 0.990 | 0.991 |
| XGBoost Classifier | 0.969 | 0.973 | 0.993 | 0.984 |
| Random Forest | 0.966 | 0.970 | 0.994 | 0.984 |
| Support Vector Machine | 0.964 | 0.968 | 0.980 | 0.965 |
| Decision Tree | 0.958 | 0.962 | 0.991 | 0.993 |
| K-Nearest Neighbors | 0.956 | 0.961 | 0.991 | 0.989 |
| Logistic Regression | 0.934 | 0.941 | 0.943 | 0.927 |
| Naive Bayes Classifier | 0.914 | 0.922 | 0.907 | 0.922 |

The table above reports accuracy, F1 score, recall, and precision for each model trained on the phishing URL dataset. The Gradient Boosting Classifier has the highest accuracy (0.974) and F1 score (0.977), and it ties with Random Forest for the highest recall (0.994). Its precision (0.986) is not the highest: Decision Tree (0.993), Multi-layer Perceptron (0.991), and K-Nearest Neighbors (0.989) score higher on that metric.
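For readers unfamiliar with the four metrics, the sketch below computes them from true and predicted labels using their standard definitions. The function name `classification_metrics`, the toy labels, and the choice of 1 as the positive class are assumptions for illustration; the README does not say which class was treated as positive or how the reported numbers were produced.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (not project data): 3 true positives, 1 false negative,
# 3 true negatives, 1 false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # every metric is 0.75 for these toy labels
```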

Feature importance for Phishing URL Detection

[Figure: feature importance for phishing URL detection]

Conclusion

The present research work aimed to explore various machine learning models and perform exploratory data analysis on a phishing dataset to understand the features that affect the models' ability to detect whether a URL is safe or not.

The work was carried out in a notebook and was a significant learning experience in phishing detection. The findings show that certain features, such as "HTTPS", "AnchorURL", "LinkInScriptTags", "PrefixSuffix", and "WebsiteTraffic", were crucial in classifying URLs as phishing or not.
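To make two of these feature names concrete, here is a simplified sketch of how "HTTPS" and "PrefixSuffix" could be derived from a raw URL with the standard library. The function name `extract_basic_features`, the 1 / -1 encoding (1 for legitimate-looking, -1 for suspicious), and the rules themselves are assumptions, not the logic in feature.py. The other features named above would presumably need the page content or external data and are not covered.

```python
from urllib.parse import urlsplit


def extract_basic_features(url):
    """Return a simplified, assumed encoding of two URL-based features."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        # Simplification: only checks the scheme, not the certificate.
        "HTTPS": 1 if parts.scheme == "https" else -1,
        # Assumption: a hyphen in the host name counts as a prefix/suffix trick.
        "PrefixSuffix": -1 if "-" in host else 1,
    }


print(extract_basic_features("https://example.com/login"))
print(extract_basic_features("http://secure-login.example.com/"))
```

A hyphen in the path is deliberately ignored; only the host name is inspected.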

After testing various machine learning models, the Gradient Boosting Classifier emerged as the best-performing model, with an accuracy of 97.4%. This result suggests the model could help reduce the chance that a phishing URL goes undetected.

Overall, this project showcases the significance of machine learning models in detecting phishing URLs and the importance of feature selection in the model's performance. Future research can extend this project to evaluate more advanced features and models, leading to even more accurate results.

Contributing

If you would like to contribute to the project, you can create a pull request with your changes. Please make sure to follow the project's coding conventions and include tests for any new functionality.

Contributors

jayanth-mkv