AksharaJaana
AksharaJaana is a package that uses Tesseract OCR in the backend to convert read-only Kannada text into an editable format. A special feature is that it can separate the columns on a page, which makes the output easier to read and edit. Do consider using this package if you need it, and feel free to DM me for any clarifications; my mail ID is: [email protected]. Happy coding and installing!
The Requirements
A Conda environment is preferred for smooth use.
- AksharaJaana (pip package), check out the latest version available
- Tesseract
- poppler
Details for Installation
Complete installation including requirements
Ubuntu
- Installing tesseract-ocr in the system
- open terminal
- sudo apt-get update -y
- sudo apt-get install -y tesseract-ocr
- Installing poppler in the system
- sudo apt-get install -y poppler-utils
- Installing python and pip (if pip is not installed)
- sudo apt-get install -y python3 python3-pip (these instructions were written against Python 3.6.9)
- Installing packages for AksharaJaana
- pip install AksharaJaana
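After the steps above, it can help to confirm that the external tools are actually reachable before running any OCR. The following is a minimal sketch, not part of AksharaJaana: the helper names `find_tools`, `parse_langs` and `installed_langs` are made up for illustration. It assumes `pdftoppm` (one of the binaries shipped in poppler-utils) is a reasonable stand-in for "poppler is installed", and that Kannada shows up in Tesseract's language list under the code `kan`. On Ubuntu the Kannada data is usually packaged separately (commonly as `tesseract-ocr-kan`), though that is worth confirming for your release.

```python
import shutil
import subprocess


def find_tools(names=("tesseract", "pdftoppm")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def parse_langs(list_langs_output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in list_langs_output.splitlines()]
    # The first line is a header such as "List of available languages (2):"
    return [line for line in lines if line and not line.lower().startswith("list of")]


def installed_langs():
    """Return the language codes tesseract reports, or [] if tesseract is missing."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    # Older tesseract releases print the list to stderr instead of stdout
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    for name, path in find_tools().items():
        print("{}: {}".format(name, path or "NOT FOUND on PATH"))
    print("Kannada data installed:", "kan" in installed_langs())
```

If a tool is reported as not found, repeating the corresponding install step is the first thing to try.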
Windows
- Installing tesseract-ocr in the system
- go to the website --> click on --> tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe (64 bit)
- After downloading, open that file --> hit enter --> you will be given an option to choose the languages --> choose Kannada in both script and language
- Next, check if the folder *C:\Program Files\Tesseract-OCR* is present
- If yes, follow the procedure below
- Add C:\Program Files\Tesseract-OCR\ to your system PATH by doing the following:
1. Click on the Windows start button, search for Edit the system environment variables, click on Environment Variables
2. Under System variables, look for and double-click on PATH, click on New
3. Then add C:\Program Files\Tesseract-OCR, click OK
- If not, manually copy the Tesseract-OCR folder (found in your downloads folder after extraction) into C:\Program Files, then follow the same procedure
- Installing poppler in the system
- go to this page
- click on --> poppler-0.54_x86
- After downloading, extract the archive
- Next, check if this folder is present: C:\Program Files\poppler-0.68.0_x86\
- If yes, follow the procedure below
- Add C:\Program Files\poppler-0.68.0_x86\bin to your system PATH by doing the following:
1. Click on the Windows start button, search for Edit the system environment variables, click on Environment Variables
2. Under System variables, look for and double-click on PATH, click on New
3. Then add C:\Program Files\poppler-0.68.0_x86\bin, click OK
- If not, manually copy the poppler-0.68.0_x86 folder (found in your downloads folder after extraction) into C:\Program Files, then follow the same procedure
- Installing python and pip in the system (If pip is not installed)
- Download Python from the official Python website and install it
- Installing packages for AksharaJaana
- open command prompt
- pip install AksharaJaana
- Reboot the system before you start using the package
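Since the Windows setup depends on two manual PATH edits, a quick check that both directories were really added can save some debugging. This is a minimal sketch under stated assumptions: `missing_path_entries` is a hypothetical helper (not part of AksharaJaana), and the two directories are the ones named in the steps above, so adjust them if you installed elsewhere.

```python
import os


def missing_path_entries(expected_dirs, path_value=None, sep=os.pathsep):
    """Return the directories from expected_dirs that are not listed in PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(entry):
        # Ignore surrounding quotes, trailing slashes and letter case,
        # since Windows treats such entries as the same directory
        return entry.strip().strip('"').rstrip("\\/").lower()

    present = {norm(entry) for entry in path_value.split(sep) if entry.strip()}
    return [d for d in expected_dirs if norm(d) not in present]


if __name__ == "__main__":
    # Assumed install locations, taken from the steps above
    expected = [
        r"C:\Program Files\Tesseract-OCR",
        r"C:\Program Files\poppler-0.68.0_x86\bin",
    ]
    missing = missing_path_entries(expected)
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("Both directories are on PATH")
```

Run it in a new command prompt opened after the reboot, because terminals that were already open keep the old PATH.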
Python Script
It's in test.py in the GitHub repo:
import AksharaJaana.main as ak

# Replace the placeholder with the path to the file you want to OCR
text = ak.ocr_engine("Your file Path")
print(text)
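The script above prints the recognized text; in practice you may want to save it to a file instead. The following is a minimal sketch of one way to do that. `ocr_to_file` and its `engine` parameter are hypothetical names introduced here, not part of AksharaJaana, and the sketch assumes that `ak.ocr_engine` returns the recognized text as a string (which the `print(text)` call above suggests but does not guarantee).

```python
from pathlib import Path


def ocr_to_file(input_path, output_path, engine=None):
    """Run OCR on input_path and write the recognized text to output_path."""
    source = Path(input_path)
    if not source.is_file():
        raise FileNotFoundError("Input file not found: {}".format(source))
    if engine is None:
        # Imported lazily so a missing install only fails when OCR is requested
        import AksharaJaana.main as ak
        engine = ak.ocr_engine
    text = engine(str(source))
    # Assumption: the engine returns text; str() guards against other return types
    Path(output_path).write_text(str(text), encoding="utf-8")
    return text
```

A call such as `ocr_to_file("page.pdf", "page.txt")` would then write the result as UTF-8, which keeps the Kannada characters intact; both file names are placeholders. Passing a different callable as `engine` makes it possible to try the function without Tesseract installed.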