core's People

Contributors

b2m, bertsky, cneud, finkf, hnesk, j23d, joschrew, kba, m3ssman, mehmedgit, mexthecat, mikegerber, mweidling, stweil, tdoan2010, witiko, wrznr

core's Issues

Feature request for ocrd workspace

It would be nice to have the following subcommands:

  • list-group: list the USE attribute of all file groups.
  • list-id: list the group ID of all files in a file group.
  • ``: get the mimetype of a file referenced by USE and GROUPID or by ID.
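A minimal sketch of what these three lookups could do against a METS file, using only the Python standard library. The function names, the sample METS snippet, and the file IDs are hypothetical illustrations, not the actual ocrd workspace implementation.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical sample document, for illustration only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-IMG">
      <mets:file ID="IMG_0001" GROUPID="P0001" MIMETYPE="image/tiff"/>
      <mets:file ID="IMG_0002" GROUPID="P0002" MIMETYPE="image/tiff"/>
    </mets:fileGrp>
    <mets:fileGrp USE="OCR-D-GT-PAGE">
      <mets:file ID="PAGE_0001" GROUPID="P0001" MIMETYPE="application/vnd.prima.page+xml"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def list_groups(root):
    """Return the USE attribute of every file group."""
    return [grp.get("USE") for grp in root.iterfind(".//mets:fileGrp", NS)]


def list_group_ids(root, use):
    """Return the GROUPID of every file in the file group with the given USE."""
    path = ".//mets:fileGrp[@USE='{}']/mets:file".format(use)
    return [f.get("GROUPID") for f in root.iterfind(path, NS)]


def get_mimetype(root, file_id=None, use=None, group_id=None):
    """Look up a file by ID, or by USE plus GROUPID, and return its MIMETYPE."""
    for grp in root.iterfind(".//mets:fileGrp", NS):
        for f in grp.iterfind("mets:file", NS):
            if file_id is not None:
                if f.get("ID") == file_id:
                    return f.get("MIMETYPE")
            elif grp.get("USE") == use and f.get("GROUPID") == group_id:
                return f.get("MIMETYPE")
    return None  # no matching file


root = ET.fromstring(SAMPLE_METS)
print(list_groups(root))
print(list_group_ids(root, "OCR-D-IMG"))
print(get_mimetype(root, file_id="PAGE_0001"))
print(get_mimetype(root, use="OCR-D-IMG", group_id="P0002"))
```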

Naming conflict in model spec

FileNotFoundError: [Errno 2] No such file or directory: '/home/kmw/projects/dwds/ocr/src/ocrd_kraken/env/lib/python3.5/site-packages/ocrd/model/yaml/ocrd_oas3.spec.yml'

but we have
pyocrd/ocrd/model/yaml/ocrd_oas3.yml

cached file names should retain extension

Currently files are cached under a name derived from the URL with all non-alphanumeric characters removed. This confuses tools that rely on the file extension to detect the file type.

An easy fix would be to replace each run of one or more non-alphanumeric characters with a single "." character.
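The proposed fix can be sketched with a single regular expression. The function name cache_name is hypothetical, "alphanumeric" is assumed here to mean ASCII letters and digits, and stripping leading or trailing dots is an extra assumption not stated in the issue.

```python
import re


def cache_name(url):
    """Derive a cache file name that keeps the extension of the URL.

    Each run of one or more non-alphanumeric characters becomes a single
    ".", so "00000005.tif" keeps its ".tif" suffix.
    """
    name = re.sub(r"[^A-Za-z0-9]+", ".", url)
    return name.strip(".")  # assumption: avoid leading/trailing dots


print(cache_name("http://localhost:5001/00000005.tif"))
# prints: http.localhost.5001.00000005.tif
```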

(Smoke) testing

To ensure the code remains functional, some basic testing of functionality is required. This also helps while refactoring the code (e.g. #28 #9 #20).

First, fix output (currently XML declaration is duplicated for each page tree).

Then have an example (e.g. the one in @OCR-D/spec) with the expected output and create a test script to ensure it's still produced.

Make sure the tool is deployable and functional by adding CI to test continuously. Extend test script / examples to reflect extended CLI/specs.
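A smoke test along these lines might look like the sketch below. It checks that the output carries at most one XML declaration and that it matches a stored expected output. All names are hypothetical, and the way the actual output is produced (e.g. by running the CLI on an example) is left out on purpose.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(r"<\?xml\b[^?]*\?>")


def count_xml_declarations(text):
    """Count XML declarations; more than one means the output is malformed."""
    return len(XML_DECL.findall(text))


def normalize(text):
    """Ignore trailing whitespace and surrounding blank lines when comparing."""
    return "\n".join(line.rstrip() for line in text.strip().splitlines())


def smoke_check(actual, expected):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if count_xml_declarations(actual) > 1:
        problems.append("XML declaration is duplicated")
    if normalize(actual) != normalize(expected):
        problems.append("output differs from expected output")
    return problems


expected = '<?xml version="1.0"?>\n<PcGts>\n  <Page/>\n</PcGts>\n'
broken = '<?xml version="1.0"?>\n<?xml version="1.0"?>\n<PcGts>\n  <Page/>\n</PcGts>\n'
print(smoke_check(expected, expected))
print(smoke_check(broken, expected))
```

In CI, a script like this would exit non-zero whenever smoke_check returns a non-empty list.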

Replace "pyocrd"

Still used in some constants and filenames such as tempfile names.

Drop dependency on xmllint

xmllint from libxml2 arguably produces the most predictable pretty-printed XML, but it should be an optional dependency, with a graceful fallback to lxml tools when it is not available.
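One way such a fallback could be structured is sketched below. The function name pretty_print_xml is hypothetical. The last-resort branch using xml.dom.minidom is an addition beyond what the issue proposes, included only so the sketch still runs when neither xmllint nor lxml is installed. Note that the three branches differ in whether they emit an XML declaration.

```python
import shutil
import subprocess


def pretty_print_xml(xml_bytes):
    """Pretty-print XML, preferring xmllint and falling back gracefully."""
    # 1. Use xmllint when it is on the PATH; "-" makes it read from stdin.
    if shutil.which("xmllint"):
        result = subprocess.run(
            ["xmllint", "--format", "-"],
            input=xml_bytes,
            stdout=subprocess.PIPE,
            check=True,
        )
        return result.stdout.decode("utf-8")
    # 2. Fall back to lxml when it is installed.
    try:
        from lxml import etree
    except ImportError:
        etree = None
    if etree is not None:
        # Drop whitespace-only text nodes so pretty_print can re-indent.
        parser = etree.XMLParser(remove_blank_text=True)
        tree = etree.fromstring(xml_bytes, parser)
        return etree.tostring(tree, pretty_print=True, encoding="unicode")
    # 3. Last resort (an assumption, not part of the proposal): standard library.
    from xml.dom import minidom
    return minidom.parseString(xml_bytes).toprettyxml(indent="  ")


print(pretty_print_xml(b"<a><b>x</b></a>"))
```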

Line detection via ocrd: PAGE XML is overwritten, result is empty

Using

ocrd process -m ocrd-assets/dist/mets.xml characterize/exif segment-region/tesserocr

works great (i.e. metadata and regions show up in OUTPUT PAGE XML).
However, adding line detection

ocrd process -m ocrd-assets/dist/mets.xml characterize/exif segment-region/tesserocr segment-line/tesserocr

results in “empty” XML:

<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2017-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2017-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2017-07-15/pagecontent.xsd">
        <Page imageFileName="http://localhost:5001/00000005.tif">
        </Page>
</PcGts>
