c-jordi / pdf2data
A pdf segmentation and annotation tool for archival documents.
License: MIT License
So far, only a handful of font types (Calibri, Times New Roman, etc.) are considered for the extraction of features. However, we may encounter many different fonts. We need to implement the following:
This feature will be part of the feature extraction process, contained in server/application/feature.
Running the feature extraction with textline leads to a ValueError.
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 2 is required by AgglomerativeClustering.
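The message suggests that an empty array (shape `(0, 1)`) reached `AgglomerativeClustering`, which could happen, for example, when a page yields no text lines. A minimal sketch of a guard follows, assuming that is the cause; `cluster_labels` is a hypothetical helper for illustration and not a function from this repository.

```python
def cluster_labels(values, n_clusters=2):
    """Return one cluster label per value.

    Hypothetical helper: falls back to a trivial labelling when there are
    too few samples for AgglomerativeClustering (which needs at least 2).
    """
    samples = [float(v) for v in values]

    # Degenerate cases (e.g. a page with no text lines): skip clustering.
    if len(samples) < 2:
        return [0] * len(samples)

    # Imported lazily so the guard above does not depend on scikit-learn.
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # Shape (n_samples, 1), matching the shape reported in the error.
    X = np.asarray(samples).reshape(-1, 1)
    model = AgglomerativeClustering(n_clusters=min(n_clusters, len(samples)))
    return model.fit_predict(X).tolist()
```

With this guard, an empty input returns an empty list of labels instead of raising, and a single sample is assigned to cluster 0. Whether skipping silently or logging the empty page is preferable depends on how the extraction pipeline uses the labels.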
Although we will add more complex models to extract features, perhaps we should add the following extras to each of the tree levels:
We need to create one entry per page in the Table DB. The keys to fill in are quite straightforward; the only open question is when to do it.
Since this is just a matter of committing the entries to the DB, could you decide when the best moment is, @c-jordi, and implement it?
Thanks!