cneud / ocr-gt
OCR & Ground Truth Resources
Home Page: https://cneud.github.io/ocr-gt/
ht @M3ssman
This could be very simple, just a Jekyll page iterating through the entries in the YAML data and presenting them in a minimal HTML version at https://cneud.github.io/ocr-gt. It would even automatically rebuild for every change to the data. Or do you have another plan?
Most of the information here has now been merged into kba/awesome-ocr, and Markdown tables are neither fun to edit nor particularly practical to reuse. Some of the remaining information cannot easily be carried over to the awesome list without cluttering it, but may still be relevant to some people. Would a JSON file make more sense for that? Or are there other ideas or opinions on this? @kba @jbaiter
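A minimal sketch of the JSON idea, assuming the existing tables are simple pipe-delimited Markdown tables with one header row and one separator row. The function name `markdown_table_to_records` and the example rows are hypothetical placeholders, not entries from this list. It uses only Python's standard library and turns each table row into a JSON object keyed by the header cells.

```python
import json


def markdown_table_to_records(table: str) -> list[dict[str, str]]:
    """Parse a simple pipe-delimited Markdown table into a list of dicts.

    Hypothetical helper. Assumes one header row, one separator row
    (e.g. |---|---|), and no escaped pipes inside cells.
    """
    # Keep only non-empty lines of the table.
    rows = [line.strip() for line in table.strip().splitlines() if line.strip()]

    def split_row(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(rows[0])
    records = []
    for line in rows[2:]:  # rows[1] is the |---| separator row
        cells = split_row(line)
        records.append(dict(zip(header, cells)))
    return records


# Placeholder excerpt: names and values are illustrative, not real dataset entries.
example = """
| Name | Language | Pages |
|------|----------|-------|
| dataset-a | deu | 100 |
| dataset-b | lat | 50 |
"""

print(json.dumps(markdown_table_to_records(example), indent=2))
```

Escaped pipes (`\|`) inside cells are not handled here; a real migration of the tables would need to account for them.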
Would you like to add links to these repos?
I noticed that you were the one who uploaded the data to these repos a few years ago :-)
I assume that the first two are the same as the existing pol & spa data in your table.
Still, you can add them as alternative links.
First of all, great work, thank you for putting together this list :-)
However, the only metric currently available for comparison is the number of pages, which can be misleading. Archiscribe is listed with >3000 pages and, e.g., the various IMPACT datasets with merely 50-200 pages. This makes the Archiscribe corpus seem far bigger than it is: for each page, only one line was transcribed, while for IMPACT every line was transcribed.
Since the line is the basis for training models in most OCR engines, I think the comparison should reflect the number of lines as well.
https://files.ifi.uzh.ch/cl/OCR19thSAC/
OCR output and partly crowd-sourced OCR corrections for the Text+Berg Digital project.
README: https://files.ifi.uzh.ch/cl/OCR19thSAC/readme.txt
License: CC-BY (https://files.ifi.uzh.ch/cl/OCR19thSAC/license.html)
The IAM Handwriting Database contains forms of handwritten English text which can be used to train and test handwritten text recognizers and to perform writer identification and verification experiments.
https://fki.tic.heia-fr.ch/databases/iam-handwriting-database