Chronoi Corpus Processing

This is a collection of loosely related scripts and other resources that were used in setting up the Chronoi pilot corpus as well as the extended multilingual corpus. They are meant to document and make reproducible the following steps in corpus setup and analysis:

Some scripts cover experimental steps that were not used in the final setup. These include:

  • automatic translation of temponyms and corpus data (at translation)
  • using machine learning approaches in the detection step (at learning)

Setup and use

The main container, chronoi-pilot, expects two directory paths in a .env file: one for output and one for input. An example is given in .env.example. The input folder is expected to contain the PDF and/or text files to process.
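One way to get started is to copy the shipped template and then edit the two paths by hand. The following is a minimal sketch, assuming it is run from the repository root where .env.example lives; the helper name init_env is hypothetical and not part of the repository.

```shell
# Create .env from the shipped template unless one already exists.
# Assumption: the current directory is the repository root.
# init_env is a hypothetical helper name, not part of the repository.
init_env() {
    if [ -e .env ]; then
        echo ".env already exists, leaving it untouched"
    else
        cp .env.example .env
        echo "created .env from .env.example; now edit the two directory paths"
    fi
}
```

Calling init_env once and then adjusting the input and output paths in .env should be enough before starting the containers. An existing .env is never overwritten.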

To pull our heideltime fork as a submodule, run:

git submodule update --init

The complete setup consists of three Docker containers, which can now be started with:

docker-compose up

Besides the chronoi-pilot container, that command starts two additional containers.

The heideltime container provides a heideltime command that can be run with, for example, docker exec. It mounts the output directory of the chronoi-pilot container so that it can work on the data produced by that container.
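As an illustration of the docker exec route, a small wrapper can forward arguments from the host to the heideltime command. This is a sketch under assumptions: the variable HEIDELTIME_CONTAINER and the default container name "heideltime" are hypothetical, since docker-compose may assign a different name (docker ps shows the actual one), and the arguments heideltime accepts are not covered here.

```shell
# Forward all arguments to the heideltime command inside its container.
# Assumption: HEIDELTIME_CONTAINER is a hypothetical variable; the default
# name "heideltime" may differ from the name docker-compose assigns.
run_heideltime() {
    docker exec "${HEIDELTIME_CONTAINER:-heideltime}" heideltime "$@"
}
```

Calling run_heideltime with the arguments you would otherwise pass to heideltime runs the command inside the running container. The examples in the experiments folder remain the authoritative reference for actual invocations.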

The container tempeval3 also mounts the output directory of the chronoi-pilot container so that it can work on the data produced by that container. It was mainly used for checking our evaluation against an official script and will probably not be needed for most use cases.

Examples for the usage of the containers from the host are given in the experiments folder.

Contributors

neuged
