

Fake News Detector Powered By Machine Learning

A complete example of building an end-to-end machine learning project from initial idea to deployment.

This repo accompanies the blog post series describing how to build a fake news detection application. The series includes the following posts:

  • Initial Setup and Tooling: Describes project ideation, setting up your repository, and initial project tooling.

  • Exploratory Data Analysis: Describes how to acquire a dataset and perform exploratory data analysis with tools like Pandas in order to better understand the problem.

  • Building a V1 Model Training/Testing Pipeline: Describes how to get a functional training/evaluation pipeline for the first ML model (a random-forest classifier), including how to properly test various parts of your pipeline.

  • Error Analysis and Model V2: Describes how to interpret what your first model has learned through feature analysis (via techniques like Shapley values) and error analysis. Also works toward a second model powered by RoBERTa.

  • Model Deployment and Continuous Integration: Describes how to deploy your model using FastAPI and Docker and build an accompanying Chrome extension. Also illustrates key components of a continuous integration system for collaborating on the application with other team members in a scalable and reproducible fashion.


How to Use It

Go to the root directory of the repo and run:

pip install -r requirements.txt

Download the data from this link into data/raw.
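Before training, it can help to confirm that the download actually landed in data/raw. The following is a minimal sketch using only the Python standard library; the helper name list_raw_files is hypothetical and not part of this repo, and it makes no assumption about which file names the dataset uses.

```python
from pathlib import Path


def list_raw_files(raw_dir="data/raw"):
    """Return the sorted names of regular files found directly in raw_dir.

    Hypothetical helper for illustration; returns an empty list when the
    directory is missing or contains no files.
    """
    path = Path(raw_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


if __name__ == "__main__":
    files = list_raw_files()
    if files:
        print(f"Found {len(files)} file(s) in data/raw: {files}")
    else:
        print("data/raw is missing or empty; download the dataset first.")
```

Run it from the root directory of the repo so that the relative path data/raw resolves correctly.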

You're ready to go!

Train

To train the random forest baseline, run the following from the root directory:

dvc repro train-random-forest

Your output should look something like the following:

INFO - 2021-01-21 21:26:49,779 - features.py - Creating featurizer from scratch...
INFO - 2021-01-21 21:26:49,781 - tree_based.py - Initializing model from scratch...
INFO - 2021-01-21 21:26:49,781 - train.py - Training model...
INFO - 2021-01-21 21:26:50,163 - features.py - Saving featurizer to disk...
INFO - 2021-01-21 21:26:50,169 - tree_based.py - Featurizing data from scratch...
INFO - 2021-01-21 21:26:59,360 - tree_based.py - Saving model to disk...
INFO - 2021-01-21 21:26:59,459 - train.py - Evaluating model...
INFO - 2021-01-21 21:26:59,584 - train.py - Val metrics: {'val f1': 0.7587628865979381, 'val accuracy': 0.7266355140186916, 'val auc': 0.8156070164865074, 'val true negative': 381, 'val false negative': 116, 'val false positive': 235, 'val true positive': 552}
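To sanity-check the logged metrics, the accuracy and F1 values can be recomputed from the four confusion-matrix counts in the last log line. The sketch below is plain arithmetic, not the repo's evaluation code; the function name metrics_from_counts is hypothetical, and precision and recall are derived values that do not appear in the log. AUC cannot be recovered from these counts because it depends on the predicted scores.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts copied from the validation log line above.
metrics = metrics_from_counts(tp=552, fp=235, fn=116, tn=381)
print(metrics)
```

The resulting accuracy (933 / 1284) and F1 (1104 / 1455) agree with the 'val accuracy' and 'val f1' values in the log.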

Deploy

Once you have successfully trained a model using the step above, you should have a model checkpoint saved in model_checkpoints/random_forest.

Now build your deployment Docker image:

docker build . -f deploy/Dockerfile.serve -t fake-news-deploy

Once your image is built, you can run the model locally via a REST API with:

docker run -p 8000:80 -e MODEL_DIR="/home/fake-news/random_forest" -e MODULE_NAME="fake_news.server.main" fake-news-deploy

From here you can interact with the API using Postman or through a simple cURL request:

curl -X POST http://127.0.0.1:8000/api/predict-fakeness -d '{"text": "some example string"}'
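The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the container from the previous step is running on port 8000. The helper names build_request and predict are hypothetical, the explicit Content-Type header is an assumption about what the server expects, and the response schema is not documented here, so the sketch simply returns whatever JSON the server sends back.

```python
import json
import urllib.request

# URL taken from the cURL example above.
API_URL = "http://127.0.0.1:8000/api/predict-fakeness"


def build_request(text, url=API_URL):
    """Build a POST request carrying {"text": ...} as a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the server expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, url=API_URL, timeout=10):
    """Send the request and return the decoded JSON response (schema not assumed)."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# With the Docker container running:
# print(predict("some example string"))
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server.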
