Code Monkey home page Code Monkey logo

dataspeakgpt's Introduction

DataSpeakGPT

Overview

DataSpeakGPT is an advanced and feature-rich text processing and Optical Character Recognition (OCR) suite powered by the cutting-edge GPT-3.5 Turbo language model. This comprehensive toolset consists of two robust scripts, FileReaderGPT.py and OcrGPT.py, designed to handle a diverse range of file formats and significantly enhance text recognition accuracy.

Features

1. FileReaderGPT.py

File Format Support

  • CSV, JSON, PDF, Text: FileReaderGPT.py supports reading and processing files in these formats, providing a versatile solution for different data structures and content types.

GPT-3.5 Turbo Integration

  • Natural Language Processing: Leveraging the GPT-3.5 Turbo language model, the script offers sophisticated natural language processing capabilities for user interactions and content analysis.

PDF Extraction

  • PyPDF2 and pdfplumber Integration: Seamless integration with PyPDF2 and pdfplumber for efficient PDF text extraction, ensuring reliable and accurate processing of PDF documents.

Dynamic Chunking

  • Optimized Processing: For large text files, FileReaderGPT.py dynamically chunks the content, optimizing interactions with GPT-3.5 Turbo and enhancing performance.

2. OcrGPT.py

Optical Character Recognition (OCR)

  • EasyOCR Library: OcrGPT.py utilizes the EasyOCR library for accurate Optical Character Recognition from images, supporting multiple languages for enhanced versatility.

GPT-3.5 Turbo Text Enhancement

  • Grammar and Word Fixing: After OCR, GPT-3.5 Turbo is employed to fix grammar issues and improve the recognized text, ensuring the highest quality output.

Multilingual Support

  • Language Selection: OcrGPT.py supports OCR in multiple languages, providing flexibility for users working with diverse linguistic content.

Interactive User Experience

  • Real-time GPT-3.5 Turbo Responses: Users can interactively experience real-time responses from GPT-3.5 Turbo, providing an engaging and dynamic user experience.

Usage

  1. FileReaderGPT.py:

    • Run the script.
    • Enter the path of the file you want to process.
    • Experience intelligent file content analysis and receive improvement suggestions.
  2. OcrGPT.py:

    • Run the script.
    • Enter the path of the image you want to perform OCR on.
    • Witness accurate text extraction and GPT-3.5 Turbo-powered text enhancement.

Getting Started

  1. Installation:

    • Ensure Python and required dependencies are installed.
    • Clone the repository: git clone <repository-url>
    • Navigate to the project directory: cd DataSpeakGPT
    • Install necessary dependencies: pip install -r requirements.txt
  2. Examples:

    • FileReaderGPT.py: python FileReaderGPT.py
    • OcrGPT.py: python OcrGPT.py

Contributing

Contributions are welcome! Please follow the contribution guidelines.

License

This project is licensed under the MIT License.


DataSpeakGPT empowers users with advanced text processing and OCR capabilities, seamlessly integrating GPT-3.5 Turbo for unparalleled natural language understanding. Elevate your data transformation and refinement processes with this comprehensive suite.

Explore the limitless possibilities of DataSpeakGPT and transform your data into refined, polished information effortlessly.

dataspeakgpt's People

Contributors

mshojaei77 avatar

Stargazers

 avatar  avatar Hossein Molaei avatar Ali Dehkhodaei avatar Vargha Khallokhi avatar Elmira Ghorbani avatar mahdi ebrahimpour avatar  avatar some happy boy avatar ALi.w avatar  avatar Bach avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.