arvindaroo / gcp-clickbait-detection

License: MIT License

Python 88.37% JavaScript 6.15% HTML 5.48%

gcp-clickbait-detection's Introduction

AI/ML tools in GCP

Team Members -

Anurag Khanra - PES1UG19CS072
Arvind Krishna - PES1UG19CS090

Setup

Requirements
- GCP account with resources available
- The auth token for the GCP user, saved as key.json in API/
- Python3, pip & virtualenv
Usage
- Clone the repository using git clone git@github.com:ArvindAROO/GCP-clickbait-detection.git
- Create a virtualenv using virtualenv venv # python -m venv venv
- Activate it with source venv/bin/activate # ./venv/Scripts/activate
- Install dependencies with pip3 install -r requirements.txt
- Start the backend with cd API && python app.py
- Load the plugin/ folder into any Chromium-based browser as an extension using the Load unpacked option

Working

As soon as the button is clicked, the extension makes an API call to the backend, with the link of the current website included in the query.
The backend, a Flask app, fetches that website by scraping it and separates the title and the body, which are used for further processing.
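
The title/body separation step can be sketched as follows. This is a minimal illustration using only Python's standard library html.parser, not the repository's actual scraping code; the class name TitleBodyParser and the function split_title_and_body are hypothetical, and skipping script/style/noscript content is an assumption about what counts as body text.

```python
from html.parser import HTMLParser


class TitleBodyParser(HTMLParser):
    """Collects the <title> text and the visible text inside <body>."""

    # Assumption: text inside these tags is not part of the readable body.
    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title_parts.append(text)
        elif self._in_body:
            self.body_parts.append(text)


def split_title_and_body(html):
    """Return (title, body_text) extracted from an HTML string."""
    parser = TitleBodyParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title_parts), " ".join(parser.body_parts)
```

In a Flask handler, the HTML string would come from fetching the URL passed in the query; the fetching itself is left out here.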

Usage of GCP

Pretrained models
- TensorFlow says: "A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale task. You either use the pretrained model as is or use transfer learning to customize this model to a given task."
- The pretrained Cloud Natural Language API is used to find the category of the text in both the title and the body
- Comparing the two sets of categories gives a reasonable indication of whether the body is even related to the title
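
One way to compare the categories of the title and the body is sketched below. It assumes each text has already been classified into a mapping of category name to confidence (the Cloud Natural Language classification response provides a name and a confidence per category). The function name category_similarity and the weighted-overlap measure are assumptions for illustration, not the repository's method.

```python
def category_similarity(title_categories, body_categories):
    """Weighted overlap in [0, 1] between two {category_name: confidence} maps.

    1.0 means identical categories and confidences, 0.0 means nothing shared.
    """
    all_names = set(title_categories) | set(body_categories)
    shared = 0.0
    total = 0.0
    for name in all_names:
        title_conf = title_categories.get(name, 0.0)
        body_conf = body_categories.get(name, 0.0)
        shared += min(title_conf, body_conf)  # agreement on this category
        total += max(title_conf, body_conf)   # strongest claim on this category
    # No categories at all (or all-zero confidences): treat as unrelated.
    return shared / total if total else 0.0
```

A low similarity suggests that the body is about something other than what the title promises.
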
AutoML Model
- Google describes AutoML as follows: "AutoML enables developers with limited machine learning expertise to train high-quality models specific to their business needs."
- The dataset, found on Kaggle, has 32000 rows of news article titles, each classified as either clickbait or not clickbait
- This dataset was cleaned and uploaded to AutoML with MultiLabel Classification, trained to an accuracy of 99.94% and a precision of 99.6%, and deployed
- Our backend then fetches the title, sends it to this endpoint, and gets a response saying whether it is clickbait, along with its confidence
- Model Statistics:
The final result is a combination of both of these in a 60:40 ratio
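
The 60:40 combination can be sketched as a weighted average. The text does not say which signal receives the 60% weight, so giving it to the AutoML confidence is an assumption here, as is turning the title/body relatedness into a "mismatch" score by subtracting it from 1. The function name combine_scores is hypothetical.

```python
def combine_scores(automl_clickbait_confidence, title_body_relatedness, automl_weight=0.6):
    """Blend two signals in [0, 1] into one clickbait score in [0, 1].

    automl_clickbait_confidence: confidence that the title is clickbait.
    title_body_relatedness: how related the body is to the title.
    automl_weight: assumed 0.6; the remaining 0.4 goes to the mismatch signal.
    """
    for value in (automl_clickbait_confidence, title_body_relatedness, automl_weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie in [0, 1]")
    # Assumption: an unrelated body (low relatedness) counts toward clickbait.
    mismatch = 1.0 - title_body_relatedness
    return automl_weight * automl_clickbait_confidence + (1.0 - automl_weight) * mismatch
```

For example, a title flagged with full confidence whose body is completely unrelated scores 1.0, while a confidently non-clickbait title with a fully related body scores 0.0.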

Dataset

This dataset (https://www.kaggle.com/datasets/amananandrai/clickbait-dataset) was cleaned, corrected, and finally used for the AutoML model.

Schema:

Column: Headline: Contains the headline of the news article as a string
Column: Clickbait: A boolean-like value, either "Clickbait" or "clickbaitnt", depending on whether the headline is clickbait or not
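
Reading rows in this schema can be sketched as below. This assumes the cleaned data is stored as a CSV file with a header row naming the Headline and Clickbait columns, and that labels are matched case-insensitively; the function name load_rows is hypothetical.

```python
import csv
import io

# Maps the two label strings from the schema to real booleans.
LABEL_TO_BOOL = {"clickbait": True, "clickbaitnt": False}


def load_rows(csv_file):
    """Return a list of (headline, is_clickbait) tuples from an open CSV file."""
    rows = []
    for record in csv.DictReader(csv_file):
        label = record["Clickbait"].strip().lower()
        if label not in LABEL_TO_BOOL:
            raise ValueError(f"unexpected label: {record['Clickbait']!r}")
        rows.append((record["Headline"].strip(), LABEL_TO_BOOL[label]))
    return rows


# Usage with an in-memory file; the headlines are made-up illustrations.
sample = io.StringIO(
    "Headline,Clickbait\n"
    "You won't believe this,Clickbait\n"
    "Council approves budget,clickbaitnt\n"
)
parsed_rows = load_rows(sample)
```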

Example Images

A non-clickbait article vs. a clickbait article
