Detecting Fake News with Natural Language Processing and Machine Learning in Python
This project employs various natural language processing techniques and machine learning algorithms to classify fake news articles. Using scikit-learn libraries in Python, we aim to differentiate between legitimate and fabricated news.
To set up and run this project on your local machine for development and testing, follow these steps:
Ensure you have the following:
- Python 3.6: If not already installed, download Python from python.org and set up PATH variables if necessary.
- Anaconda: Alternatively, you can download Anaconda from anaconda.com.
- Required Packages: After installing Python or Anaconda, install the necessary packages by running the following commands:
- If using Python 3.6:
pip install -U scikit-learn
pip install numpy
pip install scipy
- If using Anaconda, run these commands in Anaconda prompt:
conda install -c anaconda scikit-learn
conda install -c anaconda numpy
conda install -c anaconda scipy
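To confirm that the packages installed correctly, a short check like the following can be run from the same Python environment. This is only a sketch: the helper name installed_versions is made up for illustration and is not part of this project.

```python
import importlib


def installed_versions(package_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most scientific packages expose __version__; fall back if one does not.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Note: scikit-learn is imported under the module name "sklearn".
report = installed_versions(["sklearn", "numpy", "scipy"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs to be installed again with the matching pip or conda command above.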
We use the LIAR dataset, originally designed for Fake News Detection, which contains statements classified into "True" and "False" categories.
- DataPrep.py: Preprocesses and analyzes data, including exploratory data analysis and data quality checks.
- FeatureSelection.py: Implements feature extraction and selection methods, including bag-of-words, n-grams, and term frequency weighting.
- classifier.py: Builds and evaluates classifiers, including Naive Bayes, Logistic Regression, SVM, Stochastic Gradient Descent, and Random Forest.
- prediction.py: Utilizes the final Logistic Regression classifier to predict the class of a news headline provided by the user.
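The feature extraction and classifier steps described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the code in FeatureSelection.py or classifier.py: the toy statements are invented, and the pipeline layout, n-gram range, and hyperparameters are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy statements invented for illustration only (not taken from the LIAR dataset).
statements = [
    "The city council approved the new budget on Tuesday",
    "The senator voted for the infrastructure bill last year",
    "Unemployment fell slightly in the last quarter",
    "The governor signed the education bill into law",
    "Scientists confirm the moon is made of cheese",
    "The president secretly replaced all voters with robots",
    "Drinking coffee makes people immune to all diseases",
    "The new tax law doubles every citizen's income overnight",
]
labels = ["True", "True", "True", "True", "False", "False", "False", "False"]


def build_pipeline(classifier):
    """Bag-of-words with unigrams and bigrams, TF-IDF weighting, then a classifier."""
    return Pipeline([
        ("counts", CountVectorizer(ngram_range=(1, 2))),
        ("tfidf", TfidfTransformer()),
        ("clf", classifier),
    ])


# The five classifier families named in the file list, with assumed settings.
candidates = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(),
    "linear_svm": LinearSVC(),
    "sgd": SGDClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

fitted = {}
for name, classifier in candidates.items():
    fitted[name] = build_pipeline(classifier).fit(statements, labels)
    prediction = fitted[name].predict(["The mayor announced a new budget"])[0]
    print(f"{name}: {prediction}")
```

With so few examples the predictions carry no meaning. A real comparison would fit each pipeline on the training split and score it on held-out data.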
Our best-performing model, Logistic Regression, achieved an F1 score in the 70s (on a 0 to 100 scale). The learning curves illustrate model performance.
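For readers who want to produce a learning curve of their own, the sketch below uses scikit-learn's learning_curve with F1 scoring. It runs on synthetic data from make_classification, so its numbers say nothing about this project's results. The fold count and training sizes are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic binary classification data, used only to make the example runnable.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Fit the model on growing subsets of the training folds and score each with F1.
train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5,
    scoring="f1",
)

# Each row holds one training size; each column holds one cross-validation fold.
mean_train = train_scores.mean(axis=1)
mean_valid = valid_scores.mean(axis=1)
for size, train_f1, valid_f1 in zip(train_sizes, mean_train, mean_valid):
    print(f"{int(size):4d} samples  train F1={train_f1:.2f}  validation F1={valid_f1:.2f}")
```

A gap between training and validation F1 that narrows as the sample count grows suggests that more training data could still help.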
Future improvements could include feature selection techniques like POS tagging, word2vec, and topic modeling, as well as increasing the training data size to enhance model accuracy.
- Clone this repository to your local machine.
- Navigate to the project directory.
- Run the prediction.py file as follows:
  - Anaconda:
    python prediction.py
  - Python 3.6: Replace python with the full path to your Python executable.
Follow the on-screen instructions to input a news headline and receive the classification and probability of truth.
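The classification and probability step can be sketched as below. This is not the code in prediction.py. The function name classify_headline, the toy training statements, and the choice to train in place rather than load a saved model are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training statements invented for illustration only.
train_statements = [
    "The city council approved the new budget on Tuesday",
    "The senator voted for the infrastructure bill last year",
    "Unemployment fell slightly in the last quarter",
    "Scientists confirm the moon is made of cheese",
    "The president secretly replaced all voters with robots",
    "Drinking coffee makes people immune to all diseases",
]
train_labels = ["True", "True", "True", "False", "False", "False"]

# TF-IDF features over unigrams and bigrams feeding a Logistic Regression classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_statements, train_labels)


def classify_headline(fitted_model, headline):
    """Return (predicted_label, probability_of_true) for a single headline."""
    probabilities = fitted_model.predict_proba([headline])[0]
    classes = [str(c) for c in fitted_model.classes_]
    label = classes[int(probabilities.argmax())]
    return label, float(probabilities[classes.index("True")])


# In an interactive script the headline could come from input() instead.
label, truth_probability = classify_headline(model, "The council approved a new bill")
print(f"Predicted class: {label}")
print(f"Probability of truth: {truth_probability:.2f}")
```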
Made By: Shivam Verma