
corpora

corpora is a Django project for gathering corpora for different languages. It has been built to support any number of languages, with te reo Māori as its first.

The goal of this app is to streamline corpora gathering for minority languages so that dictation, personal assistants, and other technologies can work in te reo Māori, ʻōlelo Hawaiʻi, and other indigenous languages.

Supported Languages

  • Māori

You can help us add more languages by translating this app. The current project is live at https://koreromaori.com/. If you'd like to lead a corpora-gathering campaign for your language, get in touch; we'd love to help.

Kōrero Māori

Kōrero Māori is the project funding the development of corpora. Kōrero Māori is an initiative started by Te Hiku Media and supported by a number of organizations. The goal is to train machines to transcribe thousands of hours of audio recordings of native speakers, making native te reo Māori more accessible to language learners as our native speaker population declines.

We are always looking for more support, whether financial or through in-kind contributions. If you're keen to get involved, please get in touch by emailing us at [email protected].


License: Kaitiakitanga

Corpora (the code in this repository) is copyrighted by Te Reo Irirangi o Te Hiku o Te Ika (Te Hiku Media) under our Kaitiakitanga License. Kaitiaki is a Māori word without a direct English translation, but its meaning is similar to the words guardian, protector, and custodian. In this context we protect the code in this repository and will provide access to it as we deem fit through our tikanga (Māori customs and protocols).

While we recognize the importance of open source technology, we're mindful that the majority of tangata whenua and other indigenous peoples may not have access to the resources that enable them to benefit from open source technologies. As tangata whenua, our ability to grow, develop, and innovate has been stymied through colonization. We must protect our ability to grow as tangata whenua. By simply open sourcing our data and knowledge, we further allow ourselves to be colonised digitally in the modern world.

The Kaitiakitanga License is a work in progress. It's a living license and will evolve as we see fit. We hope to develop a license that serves as an international example of indigenous peoples' retention of mana over data and other intellectual property within a Western construct.

While the Kaitiakitanga License is still under development, here are some of its terms:

  1. You must contact us and seek permission to access, use, contribute towards, or modify code in this repository;
  2. You may not use code in this repository or any derivations for commercial purposes unless we explicitly grant you the right to do so;
  3. All works derived from code in this repository are bound by the Kaitiakitanga License;
  4. All works that make use of any code in this repository are bound by the Kaitiakitanga License.

This project is funded through past grievances committed by the Crown

This project was made possible by funds that support the revitalisation of te reo Māori and the growth of Māori people and organizations in the ICT industry. These funds were made available because the New Zealand Government failed to uphold the Treaty of Waitangi (https://teara.govt.nz/en/treaty-of-waitangi). Te reo Māori was once banned in schools, with some students being physically abused for speaking it. The Crown recognizes its unlawful behaviour in disenfranchising the tangata whenua of New Zealand, and legislation such as the Māori Language Act has enabled steps towards the revitalisation of te reo Māori.
