Code Monkey home page Code Monkey logo

eark-ip-test-corpus's Introduction

eark-ip-test-corpus

The purpose of the E-ARK test corpus is to make it possible to see if any given validator is covering all the rules laid out in the specifications.

Quick Start

Introduction

This repository is home to the E-ARK information package validation test corpus.

The E-ARK test corpus comprises a set of human readable and machine readable test cases which also includes E-ARK information packages that contain valid and invalid samples related to each of the requirements within the following specifications:

The corpus serves two purposes:

  • It provides a set of information package samples that can be used to test software validators for the CSIP, SIP, and DIP, this includes our own validator.
  • The creation of the corpus tests our understanding of the current specification document. Every test case represents a simple, implemented example of a validation error or warning with accompanying valid cases.

Status

The test corpus is currently in development.


Scope & Coverage

Specification coverage

An ideal validation corpus should be a physical demonstration of each and every requirement stated in the specifications. In testing terms, this aspiration is expressed as 100% coverage of the specification. The progress towards this milestone can be expressed as a percentage coverage of the specification requirements.

Out of scope

There are some test case and corpus examples that we consider to be explicitly out of scope

Testing METS schema validation

We won't be providing test cases and corpus examples for issues that should be caught by automated XML parsing with XML schema validation enabled. The E-ARK CSIP METS implementation is based upon version 1.12 of the METS schema: https://www.loc.gov/standards/mets/version112/mets.xsd. We won't be creating test cases that represent a missing mets root element, the lack of a structMap element, or mis-spelled METS element and attribute names in general.


Corpus Structure

To Be Discussed


Validation Corpus Creation

Process Overview

The E-ARK Common Specification is the source of requirements and the yardstick by which progress is measured. The test corpus creation process is driven by the specification conditions. These requirements are read carefully and used to produce test cases. These documented test cases provide the backlog for the creation of corpus test packages. For each test case there should be corpus packages demonstrating the pass and fail conditions.

Requirements to Test Cases

While the aim is to produce test cases, analysis of the specification will also lead to improvements to the specification. To illustrate, consider a seemingly simple requirement from the specification:

CSIP4: The Information Package folder MUST include a metadata file named METS.xml, which includes information about the identity and structure of the package and its components;

At first this suggests a simple pass and fail case, one with the file present and another from which it is missing. A more careful reading reveals that the file should contain "information about the identity and structure of the package...". The wording is problematic as there are no details of what the information should be. Reading with a testing mindset yields further varieties of test case, e.g. what about case-sensitivity in the name - are mets.xml, METS.XML, or even mEts.xMl acceptable?

Adding Test Cases to the backlog

The next actions here should be:

  • add the set of test cases to be produced to the project issue tracker (Guide / Template / this example coming soon).
  • log an issue on the specification project tracker suggesting that the wording for the second part of the requirement is reviewed or removed.

Creating Test Cases

The full guide to test case creation is in the TESTCASES.md document.

Tracking Progress & Metrics

Test Case backlog

The number of open issues with a test case label... To Be Discussed

eark-ip-test-corpus's People

Contributors

aebkmd avatar andersbonielsen avatar carlwilson avatar koit avatar ltaimre avatar phillipaasvangtommerholt avatar philliptommerholt avatar scfkmd avatar yllemagin avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.