
ocrmultieval

Proof-of-concept for extensible evaluation of (intermediate) results of an OCR workflow

Installation

make deps install

Architecture

All evaluation functionality is provided by backends.

Every backend inherits from EvalBackend and must implement a compare_files method that accepts paths to, and media types of, the Ground Truth and detection results, performs the actual evaluation, and returns an EvalReport.

An EvalReport is a map of metrics to their respective values and can be serialized as JSON or CSV for further processing/analysis.
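
To make that contract concrete, here is a minimal sketch of a custom backend. The compare_files signature and make_report are taken from the CorAsvAnnEval issue below; the import path, class name, and the toy metric are hypothetical illustrations:

    from ocrmultieval.backends import EvalBackend  # exact import path is an assumption

    class CharAccuracyEval(EvalBackend):
        """Toy backend: positional character accuracy over plain-text files."""

        def compare_files(self, gt_mediatype, gt_file, ocr_mediatype, ocr_file, pageId):
            # this sketch only handles plain text
            assert gt_mediatype == ocr_mediatype == 'text/plain'
            with open(gt_file, encoding='utf-8') as f:
                gt = f.read()
            with open(ocr_file, encoding='utf-8') as f:
                ocr = f.read()
            # naive positional comparison; a real backend would align first
            matches = sum(a == b for a, b in zip(gt, ocr))
            accuracy = matches / max(len(gt), 1)
            # make_report wraps the metrics into an EvalReport
            return self.make_report(gt_file, ocr_file, pageId, accuracy=accuracy)

Since the resulting EvalReport is just a map of metrics to values, serializing it as JSON or CSV amounts to dumping that map.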

The glue code for running the backends is in ocrmultieval/runner.py.

Usage

CLI

The ocrmultieval compare command line tool allows evaluating individual pages of Ground Truth and detection results with any of the available backends.

Usage: ocrmultieval compare [OPTIONS] {dinglehopper|ocrevalUAtion|PrimaTextEval|CorAsvAnnEval|CorAsvAnnCompare|OcrdSegmentEvaluate|IsriOcreval} GT_FILE OCR_FILE

Options:
  --gt-mediatype TEXT
  --ocr-mediatype TEXT
  --format [csv|json|yaml|xml]
  -g, --pageId TEXT             pageId to uniquely identify pages in a workspace
  --help                        Show this message and exit.
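
For example, to evaluate one page with the dinglehopper backend and emit a JSON report (the file names and pageId are hypothetical):

    ocrmultieval compare --format json --pageId phys_0001 dinglehopper GT/phys_0001.xml OCR/phys_0001.xml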

OCR-D processor

The ocrd-ocrmultieval command line tool implements the OCR-D processor API and can be used to process complete workspaces.

Usage: ocrd-ocrmultieval [OPTIONS]

  Evaluate

  > Eval processor

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "backend" [string - "PrimaTextEval"]
    Backend to use
    Possible values: ["PrimaTextEval", "ocrevalUAtion", "dinglehopper",
    "OcrdSegmentEvaluate", "IsriOcreval", "CorAsvAnnCompare"]
   "format" [string - "csv"]
    Output format
    Possible values: ["csv", "json", "yaml", "xml"]
   "config" [object]
    Configuration to override default

Default Wiring:
  ['GT,OCR1'] -> ['GT_VS_OCR1']
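
A hypothetical invocation following that default wiring, overriding the default backend and output format via -P:

    ocrd-ocrmultieval -m mets.xml -I GT,OCR1 -O GT_VS_OCR1 -P backend dinglehopper -P format json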


ocrmultieval's Issues

CorAsvAnnEval: wrap CLI instead of API

https://github.com/kba/ocrmultieval/blob/5de79f3021b48f83f9cb798a484fd472d21ed94b/ocrmultieval/backends/CorAsvAnnEval.py#L18-L20

I can see it would be hard to repeat everything in ocrd_cor_asv_ann.scripts.compare. So why not instead call the CLI and have it write its report where we want it?

Alternatively, keep the API as it is and emulate Click's context as in:

from io import StringIO
import json

from ocrd_cor_asv_ann.scripts.compare import cli as standalone_cli

...

def compare_files(self, gt_mediatype, gt_file, ocr_mediatype, ocr_file, pageId):
    outfile = StringIO()
    kwargs = {'output_file': outfile,
              'gt_file': gt_file,
              'ocr_files': [ocr_file],
              # add more of your non-default choices here, e.g. normalization,
              # gt_level, confusion, histogram;
              # if gt_mediatype is a plaintext path list, you need to set
              # 'file_lists': True
    }
    # call the undecorated callback directly to bypass Click's CLI parsing
    standalone_cli.callback(**kwargs)
    # parse the report back into a dict (assuming JSON serialization)
    report = json.loads(outfile.getvalue())
    # dive into this file pair's aggregates
    metrics = report[gt_file + ',' + ocr_file]
    # maybe delete the individual line metrics
    del metrics['lines']
    return self.make_report(gt_file, ocr_file, pageId, **metrics)

OcrdSegmentEvaluate: use different metric

https://github.com/kba/ocrmultieval/blob/5de79f3021b48f83f9cb798a484fd472d21ed94b/ocrmultieval/backends/OcrdSegmentEvaluate.py#L27-L28

This retrieves only the mAP score, which is the least useful/adequate metric and has only been added for comparison with similar benchmarks. The better keys would be precision | recall | pixel_precision | pixel_recall | pixel_iou | oversegmentation | undersegmentation, nested under either by-category → category or by-image → pageId → category.
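
For illustration, a sketch of collecting those keys instead of mAP; the nesting is inferred from the key names above and not verified against ocrd_segment, and pick_metrics is a hypothetical helper:

    SEGMENT_METRICS = ('precision', 'recall', 'pixel_precision', 'pixel_recall',
                       'pixel_iou', 'oversegmentation', 'undersegmentation')

    def pick_metrics(result, category):
        # `result` is assumed to be the raw evaluation dict (see above)
        return {key: result['by-category'][category][key]
                for key in SEGMENT_METRICS}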

deployment: separate dependencies for backends?

I'm afraid the heavy dependency footprint of packages like ocrd_segment and ocrd_cor_asv_ann (both of which require TensorFlow) might not only be annoying here, but could quickly lead to dependency clashes between backends.

I understand it would be harder to integrate backends via their CLIs, but that would make it possible to call them from sub-venvs or Docker, and to opt out of them (at least during ocrmultieval setup).

OcrdSegmentEvaluate: ensure binarized image fits page

https://github.com/kba/ocrmultieval/blob/5de79f3021b48f83f9cb798a484fd472d21ed94b/ocrmultieval/backends/OcrdSegmentEvaluate.py#L21-L23

This does not cover the case where the binary image is itself cropped or deskewed, i.e. does not represent the full PAGE; if that condition is not satisfied, it will run into an assertion failure. You should check the @comments for cropped or deskewed: either find an image without them (i.e. one binarized before cropping), or do not pass the binary image at all (which effectively runs without only-fg, i.e. on the full segment masks). Also, try to fetch an image without clipped, as clipping obviously distorts the evaluation.

In ocrd_segment.evaluate.EvaluateSegmentation (the OCR-D wrapper) we add relative coordinates in this case (i.e. whatever is consistent with the binary image).
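
A minimal sketch of fetching a suitable image via the OCR-D Workspace API, filtering out derived images whose @comments contain the problematic features (variable names are placeholders):

    # `workspace` is an ocrd.Workspace, `page` the parsed PAGE, `page_id` its ID
    page_image, page_coords, _ = workspace.image_from_page(
        page, page_id,
        feature_selector='binarized',               # we do want a binarized image
        feature_filter='cropped,deskewed,clipped')  # but none of these features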
