Phishing URL Detector Plugin API

Table of Content

Introduction
Installation
Directory Tree
Model Comparison
Conclusion

Introduction

Phishing URL Detection

Overview

This project is about detecting phishing URLs using machine learning algorithms. The project consists of three main parts: data loading and cleaning, feature extraction, and model training and evaluation. The project uses the Gradient Boosting Classifier to classify phishing URLs with an accuracy of over 96%.

Installation

To run the project, you can follow these steps:

Clone the repository: git clone https://github.com/your-username/Phishing-URL-Detection.git
Install the required packages: pip install -r requirements.txt
Run the Flask application: python app.py

To see project click here.

Directory Tree

├── db
│   ├── load_data.py
│   ├── save_data.py
│   ├── train_model.py
├── pickle
│   ├── model.pkl
├── Phishing URL Detection.ipynb
├── README.md
├── app.py
├── database.db
├── feature.py
├── phishing.csv
├── requirements.txt

Files

app.py: Flask web application for testing the model
feature.py: script for extracting features from URLs
database.db: SQLite database for storing URLs and their labels
phishing.csv: dataset containing URLs and their labels
pickle/model.pkl: serialized model object
joblib/gbc_model.joblib: serialized model object using joblib
db/load_data.py: script for loading data into the database
db/save_data.py: script for saving data to the database
db/train_model.py: script for training and evaluating the model
Phishing URL Detection.ipynb: Jupyter notebook containing the project code and documentation
README.md: readme file explaining the project

Technologies Used

Model Comparison

ML Model	Accuracy	F1 Score	Recall	Precision
Gradient Boosting Classifier	0.974	0.977	0.994	0.986
Multi-layer Perceptron	0.971	0.974	0.990	0.991
XGBoost Classifier	0.969	0.973	0.993	0.984
Random Forest	0.966	0.970	0.994	0.984
Support Vector Machine	0.964	0.968	0.980	0.965
Decision Tree	0.958	0.962	0.991	0.993
K-Nearest Neighbors	0.956	0.961	0.991	0.989
Logistic Regression	0.934	0.941	0.943	0.927
Naive Bayes Classifier	0.914	0.922	0.907	0.922

The table above shows the performance metrics of various machine learning models trained on the phishing URL dataset. The accuracy, F1 score, recall, and precision are reported for each model. The results show that the Gradient Boosting Classifier has the highest accuracy, F1 score, recall, and precision among all models, with an accuracy of 0.974, F1 score of 0.977, recall of 0.994, and precision of 0.986.```

Feature importance for Phishing URL Detection

Conclusion

The present research work aimed to explore various machine learning models and perform exploratory data analysis on a phishing dataset to understand the features that affect the models' ability to detect whether a URL is safe or not.

The research project involved the creation of a notebook, which provided a significant learning experience in the domain of phishing detection. The project's findings revealed that certain features, such as "HTTPS," "AnchorURL,""LinkInScriptTags,""PrefixSuffix," and "WebsiteTraffic," were crucial in classifying URLs as phishing URLs or not.

After testing various machine learning models, the Gradient Boosting Classifier emerged as the best-performing model, with an accuracy of 97.4%. This performance indicates a promising reduction in the likelihood of malicious attachments.

Overall, this project showcases the significance of machine learning models in detecting phishing URLs and the importance of feature selection in the model's performance. Future research can extend this project to evaluate more advanced features and models, leading to even more accurate results.

Contributing

If you would like to contribute to the project, you can create a pull request with your changes. Please make sure to follow the project's coding conventions and include tests for any new functionality.

jayanth-mkv / phishy Goto Github PK

phishy's Introduction

Phishing URL Detector Plugin API

Table of Content

Introduction

Phishing URL Detection

Overview

Installation

Directory Tree

Files

Technologies Used

Model Comparison

Conclusion

Contributing

phishy's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent