
ekorpkit's Introduction

ekorpkit 【iːkɔːkɪt】 : eKonomic Research Python Toolkit

eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization. Its powerful config composition is backed by Hydra.

Key features

Easy Configuration

  • You can compose your configuration dynamically, so each research project gets a configuration tailored to it.
  • You can override everything from the command line, which makes experimentation fast, and removes the need to maintain multiple similar configuration files.
  • With the help of the eKonf class, it is also easy to compose configurations in a Jupyter notebook environment.
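
The idea of composing a configuration from reusable groups and then overriding individual values can be sketched in plain Python. This is only an illustration of the concept, not the eKonf or Hydra API: the helper names (`deep_merge`, `apply_override`, `compose`) and the config keys are hypothetical, and override values are kept as strings for simplicity.

```python
import copy


def deep_merge(base, extra):
    """Return a new dict in which values from `extra` override those in `base`."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_override(config, override):
    """Apply one 'dotted.key=value' override string to `config` in place."""
    dotted_key, _, raw_value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = raw_value  # values stay strings in this sketch


def compose(groups, overrides=()):
    """Merge config groups in order, then apply command-line style overrides."""
    config = {}
    for group in groups:
        config = deep_merge(config, group)
    for override in overrides:
        apply_override(config, override)
    return config


# Hypothetical config groups, each of which could live in its own file.
corpus_defaults = {"corpus": {"name": "sample", "lang": "ko"}}
tokenizer_defaults = {"tokenizer": {"type": "whitespace", "lowercase": "false"}}

config = compose(
    [corpus_defaults, tokenizer_defaults],
    overrides=["corpus.lang=en", "tokenizer.lowercase=true"],
)
print(config)
```

Because the overrides are applied last, one set of default groups can serve many experiments without maintaining near-duplicate configuration files.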

No Boilerplate

  • eKorpkit lets you focus on the problem at hand instead of spending time on boilerplate code such as command line flags, configuration file loading, and logging.

Workflows

  • A workflow is a configurable automated process that will run one or more jobs.
  • You can divide your research into several unit jobs (tasks), then combine those jobs into one workflow.
  • You can have multiple workflows, each of which can perform a different set of tasks.
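
The relationship between unit jobs and workflows described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the eKorpkit implementation: the `Job` and `Workflow` classes and the three example jobs are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Job:
    """A unit job: a named function that takes a state dict and returns a new one."""
    name: str
    run: Callable[[State], State]


@dataclass
class Workflow:
    """A workflow: an ordered list of jobs executed one after another."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def execute(self, state=None):
        state = dict(state or {})
        for job in self.jobs:
            state = job.run(state)
        return state


def extract(state):
    return {**state, "texts": ["Hello World", "Config First"]}


def tokenize(state):
    return {**state, "tokens": [text.lower().split() for text in state["texts"]]}


def count(state):
    return {**state, "n_tokens": sum(len(tokens) for tokens in state["tokens"])}


# Two workflows built from the same unit jobs, each running a different set of tasks.
preprocess = Workflow("preprocess", [Job("extract", extract), Job("tokenize", tokenize)])
full = Workflow("full", preprocess.jobs + [Job("count", count)])

result = full.execute()
print(result["n_tokens"])
```

Keeping each job as a small function over a shared state makes it straightforward to reuse the same job in several workflows.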

Sharable and Reproducible

  • With eKorpkit, you can easily share your datasets and models.
  • Sharing configs along with datasets and models makes your research reproducible.
  • You can share individual unit jobs or an entire workflow.

Pluggable Architecture

  • eKorpkit has a pluggable architecture, so you can combine it with your own implementations.
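
One common way to make a toolkit pluggable is a registry that maps names to implementations, so that a configuration value can select user-supplied code. The sketch below illustrates that general pattern only; it is an assumption about how such a design could look, not eKorpkit's actual plugin mechanism, and every name in it (`TOKENIZERS`, `register_tokenizer`, `tokenize`) is hypothetical.

```python
from typing import Callable, Dict, List

# Registry mapping a plugin name to a tokenizer function.
TOKENIZERS: Dict[str, Callable[[str], List[str]]] = {}


def register_tokenizer(name):
    """Decorator that registers a tokenizer function under `name`."""
    def decorator(func):
        TOKENIZERS[name] = func
        return func
    return decorator


@register_tokenizer("whitespace")
def whitespace_tokenizer(text):
    return text.split()


# A user-supplied implementation plugs in the same way as a built-in one.
@register_tokenizer("char")
def char_tokenizer(text):
    return [char for char in text if not char.isspace()]


def tokenize(text, tokenizer="whitespace"):
    """Look up the tokenizer by name (for example, from a config value) and apply it."""
    return TOKENIZERS[tokenizer](text)


print(tokenize("a bc", tokenizer="char"))
```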

Tutorials for the ekorpkit package can be found at https://entelecheia.github.io/ekorpkit-book/

Install the latest version of ekorpkit:

pip install ekorpkit

To install all extra dependencies (the quotes keep shells such as zsh from expanding the brackets):

pip install "ekorpkit[all]"

The eKorpkit Corpus is a large, diverse, bilingual (ko/en) language modelling dataset.

Citation

@software{lee_2022_6497226,
  author       = {Young Joon Lee},
  title        = {eKorpkit: eKonomic Research Python Toolkit},
  month        = apr,
  year         = 2022,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.6497226},
  url          = {https://doi.org/10.5281/zenodo.6497226}
}
@software{lee_2022_ekorpkit,
  author       = {Young Joon Lee},
  title        = {eKorpkit: eKonomic Research Python Toolkit},
  month        = apr,
  year         = 2022,
  publisher    = {GitHub},
  url          = {https://github.com/entelecheia/ekorpkit}
}

License

  • eKorpkit is licensed under the MIT License. This license covers the eKorpkit package and all of its components.
  • Each corpus adheres to its own license policy. Please check the license of the corpus before using it!
