accmohamedsaber / financial-fraud-detection-using-llms

This project forked from amitkedia007/financial-fraud-detection-using-llms

The aim of this dissertation is to assess the effectiveness of LLMs such as FinBERT and GPT-2 in detecting fraudulent activities in financial reports and statements. This repo provides the code for implementing LLMs, traditional machine learning models, and deep learning models on the labelled dataset.

Python 19.14% Jupyter Notebook 80.86%

financial-fraud-detection-using-llms's Introduction

Financial Fraud Detection Using AI

Project Overview

Introduction: This project uses machine learning, deep learning, and Large Language Models (LLMs) to detect financial fraud. It is based on a dataset derived from financial filings to the U.S. Securities and Exchange Commission (SEC), and it aims to compare and improve AI models for identifying fraudulent financial activities. For more information, check out the PDF in the repo.

Objective: The goal is to foster a collaborative platform where data scientists and researchers can develop, test, and improve AI models for detecting financial fraud.

Dataset Description

Source: The dataset includes financial filings from 170 companies, split equally (85 each) between companies involved in fraudulent activities and companies that were not.

Structure: Each dataset entry contains details such as Central Index Key (CIK), filing year, company name, and a categorical indicator of fraud.
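As a rough illustration of the entry structure described above, the sketch below models one record and checks class balance. The field names, the boolean label encoding, and the sample rows are all assumptions made for this example; the actual column names and label values in the dataset may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingRecord:
    """One dataset entry. Field names are illustrative, not the repo's actual columns."""
    cik: str           # Central Index Key, kept as a string to preserve leading zeros
    filing_year: int
    company_name: str
    is_fraud: bool     # categorical fraud indicator, modelled here as a boolean


def class_balance(records):
    """Count how many records fall into each fraud class."""
    return Counter(r.is_fraud for r in records)


# Made-up rows for illustration only; these are not real companies or filings.
records = [
    FilingRecord("0000000001", 2010, "Example Corp A", True),
    FilingRecord("0000000002", 2011, "Example Corp B", False),
    FilingRecord("0000000003", 2012, "Example Corp C", True),
    FilingRecord("0000000004", 2012, "Example Corp D", False),
]

balance = class_balance(records)
is_balanced = balance[True] == balance[False]
```

Running the same check on the full dataset is a quick way to confirm the equal split between the two classes before training.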

Final Dataset: The final dataset is available on Kaggle.

Data Preprocessing

Preprocessing steps involve text cleaning, tokenization, and transforming data into machine-readable formats, ensuring balanced and fair model training.
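The steps named above (cleaning, tokenization, conversion to a machine-readable form) could look roughly like the following. This is a minimal sketch using only the standard library, not the repository's actual pipeline; the function names `clean_text`, `tokenize`, `build_vocab`, and `encode` are hypothetical, and the cleaning rules are an assumption.

```python
import re


def clean_text(text):
    """Lowercase, drop HTML-like tags and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML-like tags
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # keep only letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into whitespace-delimited tokens."""
    return clean_text(text).split()


def build_vocab(token_lists):
    """Map each distinct token to a positive integer id; 0 is reserved for unknowns."""
    tokens = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(tokens, vocab):
    """Turn a token list into a list of integer ids."""
    return [vocab.get(tok, 0) for tok in tokens]


# Made-up snippet standing in for filing text.
sample = "<p>Net Income: $1,200 (restated)</p>"
tokens = tokenize(sample)
vocab = build_vocab([tokens])
ids = encode(tokens, vocab)
```

Transformer models such as FinBERT and GPT-2 normally ship with their own subword tokenizers, so a hand-rolled tokenizer like this one is more relevant to the classical models.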

Model Implementation

The project encompasses a variety of models, including Logistic Regression, SVM, Random Forest, XGBoost, ANN, HAN, GPT-2, and FinBERT, selected for their NLP capabilities and potential in fraud detection.
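When comparing this many models, it helps to score every one with the same metrics on the same held-out labels. The sketch below shows one way to do that in plain Python; the function name `binary_metrics`, the 1 = fraud encoding, and the toy predictions are assumptions for illustration and do not represent results from this project.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels and predictions, invented for this example.
y_true = [1, 1, 0, 0]
predictions = {
    "model_a": [1, 0, 0, 0],   # misses one fraud case
    "model_b": [1, 1, 1, 0],   # raises one false alarm
}

scores = {name: binary_metrics(y_true, preds)
          for name, preds in predictions.items()}
```

In this toy case both models have the same accuracy but differ in precision and recall, which is why accuracy alone is a weak basis for ranking fraud detectors.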

To Reproduce

Codebase: Complete code for data extraction, preprocessing, model training, and evaluation is available in this repository.

Environment: A requirements.txt file is provided for setting up a consistent environment.

Documentation: Each script is documented in the README.md, with instructions covering environment setup, script execution, and result interpretation.

Contribution Guidelines

Getting Started:

  • Fork the repository.
  • Set up your environment with requirements.txt.
  • Familiarize yourself with the code and dataset.

Contributing:

  • Add or improve models, or refine preprocessing methods.
  • Ensure your code is documented and aligns with the project's style.
  • Submit pull requests with a detailed description of changes.

Reporting Issues:

  • Use GitHub Issues for bug reports, feature requests, or discussions.
  • Provide detailed bug descriptions and reproduction steps.

Community:

  • Engage in discussions, share results, ask questions.
  • Adhere to community guidelines for a collaborative environment.

License

This project is open source and available under the MIT License.

Acknowledgements

Thanks to all contributors and community members for their valuable participation and insights in advancing AI in financial fraud detection.

financial-fraud-detection-using-llms's People

Contributors

amitkedia007
