This project forked from prathyyyyy/medical-data-extraction

Medical data extraction with pytesseract (a Python wrapper for Google's Tesseract OCR engine) and computer vision

Python 34.79% Jupyter Notebook 65.21%

Medical Data Extraction :

Introduction :

This project implements medical data extraction: it automatically classifies medical documents and extracts useful information from them.
It is built with pytesseract (a Python wrapper for Google's Tesseract OCR engine), OpenCV for computer vision, regular expressions, pdf2image, and pytest.
First, the pdf2image library converts each PDF into images. OpenCV then cleans each image with adaptive thresholding, and pytesseract (OCR) together with regular expressions extracts the useful data.
The project works well on medical documents (for example, extracting the patient name, patient details, and medicines). It reduces manual work and cuts processing time from about 15 minutes to about 2 minutes.
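
To make the adaptive thresholding step more concrete, here is a simplified pure-Python illustration of the idea. It is not the project's code, which relies on OpenCV; the function name `adaptive_mean_threshold` and its default parameters are assumptions chosen for this sketch. Each pixel is compared with the mean of its own neighbourhood rather than with one global cutoff, which is why the technique copes with unevenly lit scans.

```python
def adaptive_mean_threshold(gray, block_size=3, c=2):
    """Binarise a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes 255 (white) if it is brighter than the mean of its
    block_size x block_size neighbourhood minus c, otherwise 0 (black).
    Neighbours outside the image reuse the nearest edge pixel.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd integer >= 3")

    height, width = len(gray), len(gray[0])
    radius = block_size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so border pixels are replicated.
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    total += gray[ny][nx]
            local_mean = total / (block_size * block_size)
            row.append(255 if gray[y][x] > local_mean - c else 0)
        result.append(row)
    return result


# Hypothetical 3x3 patch: a bright page with one dark "ink" pixel in the centre.
patch = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
binary = adaptive_mean_threshold(patch)
```

In this patch only the dark centre pixel falls below its local threshold, so it is the only pixel set to 0.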

Major Libraries and tools :

pdf2image - Converts each medical document PDF file into images.
OpenCV - Processes and cleans the images using the adaptive thresholding method.
pytesseract - Converts images containing useful information into text. It is a Python wrapper for Google's Tesseract optical character recognition (OCR) engine.
regex - Regular expressions extract the useful and required fields from the OCR text.
pytest - Used to test the project.
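
The sketch below shows one way these libraries could be chained together. It is an assumption-laden outline rather than the project's actual source: the function names (`is_valid_block_size`, `preprocess_image`, `pdf_to_text`), the 300 DPI setting, and the `block_size` and `c` values are all hypothetical and would need tuning on real documents. Running it requires pdf2image (with Poppler), opencv-python, numpy, and pytesseract (with the Tesseract binary) to be installed.

```python
def is_valid_block_size(block_size):
    """cv2.adaptiveThreshold expects an odd neighbourhood size greater than 1."""
    return block_size >= 3 and block_size % 2 == 1


def preprocess_image(pil_image, block_size=61, c=11):
    """Convert a PIL page image to a cleaned black-and-white array."""
    if not is_valid_block_size(block_size):
        raise ValueError("block_size must be an odd integer >= 3")

    # Third-party imports are deferred so this module can be imported
    # even when an optional dependency is missing.
    import cv2
    import numpy as np

    gray = cv2.cvtColor(np.array(pil_image.convert("RGB")), cv2.COLOR_RGB2GRAY)
    return cv2.adaptiveThreshold(
        gray,
        255,                             # value given to pixels above the threshold
        cv2.ADAPTIVE_THRESH_GAUSSIAN_C,  # weighted local mean
        cv2.THRESH_BINARY,
        block_size,                      # neighbourhood size (assumed value)
        c,                               # constant subtracted from the mean (assumed value)
    )


def pdf_to_text(pdf_path, dpi=300):
    """Run PDF -> images -> thresholding -> OCR and return the combined text."""
    import pytesseract
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    texts = [
        pytesseract.image_to_string(preprocess_image(page), lang="eng")
        for page in pages
    ]
    return "\n".join(texts)
```

A call such as `pdf_to_text("prescription.pdf")` (a hypothetical file name) would return the raw OCR text, ready for field extraction with regular expressions.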

Project workflow :

Documents : Sample data used in this project - [https://github.com/prathyyyyy/Medical-Data-Extraction/tree/main/document_extractor/Documents].
Source code : Major files - [https://github.com/prathyyyyy/Medical-Data-Extraction/tree/main/document_extractor/src].
Test files - [https://github.com/prathyyyyy/Medical-Data-Extraction/tree/main/document_extractor/tests].
Minor files : Notebooks and other minor files used to explore the dataset - [https://github.com/prathyyyyy/Medical-Data-Extraction/tree/main/document_extractor/Notebook].

Project tasks :

Collect the dataset.
Analyse the data and get useful insights from it.
Use pdf2image to convert each PDF to images.
Use computer vision (OpenCV) to process and clean the images with adaptive thresholding.
Use pytesseract (OCR) to convert the processed, cleaned images to text.
Use regular expressions (RE) to extract useful information from the extracted text.
Use pytest to test the project and the quality of the code.
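
The last two tasks can be sketched together as follows. The field labels (`Name:`, `Date:`, `Prescription:`), the sample text, and all function names are assumptions made for illustration. The real documents and patterns in the repository may look quite different.

```python
import re

# Hypothetical patterns: each captures the value after a label on one line.
FIELD_PATTERNS = {
    "patient_name": re.compile(r"^Name:[ \t]*(.+)$", re.MULTILINE),
    "date": re.compile(r"^Date:[ \t]*(\d{1,2}/\d{1,2}/\d{4})[ \t]*$", re.MULTILINE),
    "medicine": re.compile(r"^Prescription:[ \t]*(.+)$", re.MULTILINE),
}


def extract_fields(text):
    """Return a dict of field name -> extracted value (None if not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1).strip() if match else None
    return fields


# Invented OCR output used only for the tests below.
SAMPLE_TEXT = (
    "Name: Jane Doe\n"
    "Date: 05/12/2020\n"
    "Prescription: Prednisone 20 mg\n"
)


# pytest collects functions whose names start with "test_".
def test_extract_fields_reads_labelled_lines():
    fields = extract_fields(SAMPLE_TEXT)
    assert fields["patient_name"] == "Jane Doe"
    assert fields["date"] == "05/12/2020"
    assert fields["medicine"] == "Prednisone 20 mg"


def test_extract_fields_returns_none_for_missing_field():
    fields = extract_fields("Name: Jane Doe\n")
    assert fields["medicine"] is None
```

Saved in a file such as `test_extraction.py` (a hypothetical name), these tests would run with the `pytest` command. Keeping the patterns in one dictionary means a new field can be added without touching the extraction loop.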

Future Goals Regarding This Project :

The current approach already works well. With more advanced computer vision techniques, such as region of interest (ROI) selection, data could be extracted with higher accuracy.
Additional regular expressions could extract more useful fields from each document, giving more insight and saving further time.
