
Our voices model competition

About

We want to see - and incentivise! - great diversity, equity and inclusion-conscious work being done with the Common Voice dataset. We are running a model and methods competition with three broad themes, plus an open category.

Register your interest

Start today by registering your interest with this form, and you'll receive a participant pack with guidance, resources, advice and more to help you.

Please read the full rules if you are considering applying.

Participant packs

Localised versions of the participant packs can be found on our Google Drive.

What are we looking for?

Your entry must be a diversity, equity and inclusion-conscious model or method under one of the following categories. It must primarily make use of Mozilla Common Voice data from the 11th release (September 2022). Outside of this, we are being deliberately open-ended. However, here are some illustrative examples:

Categories

  • Gender - An STT model for an under-resourced language that performs equally well for women
  • Variant, Dialect or Accent - 1) Proof of concept for an under-served language variant delivered with a small 'toy' corpus; 2) Accent classifiers by, and for, a community
  • Methods and Measures - 1) A benchmark bias corpus; 2) Dataset audit methodology
  • Open - Exciting DEI work primarily using Common Voice that doesn't fit into the categories above

Judging will be done by Sponsor or its designees, who shall have sole discretion in determining winners based on the following equally weighted criteria:

  • Word Error Rate ("WER") - how many word-recognition mistakes your model makes when used on a fresh dataset
  • WER when balanced by gender or accent demographics (as per the competition theme)
  • Utility - a judgment score from panelists evaluating how effective, original and useful your method or measure would be
  • Social need / ecosystem value - whether this model adds value to the universe of other models for the same language. We do not disqualify submissions that are not open source, but when considering ecosystem value-add within the wider rubric, we will consider the license under which your work is available.
  • Deployability rating - a judgment score from panelists evaluating how easy your model would be to install in an application
  • Environmental impact rating (via GPU usage) - this has two components: an expert panel rating, plus your required provision of processing stats. In other words: how 'hungry' is your model, and is it written to be efficient?
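To make the first two criteria concrete, here is a minimal sketch of how WER and a demographically balanced WER could be computed. The competition does not prescribe an implementation; the function names and the `(group, reference, hypothesis)` input format are illustrative assumptions.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions)
    divided by the number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    # One-row dynamic-programming Levenshtein distance over words.
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[j] = min(dp[j] + 1,         # deletion
                        dp[j - 1] + 1,     # insertion
                        prev_diag + cost)  # substitution (or match)
            prev_diag = cur
    return dp[-1] / len(ref)


def demographic_balanced_wer(samples) -> float:
    """Macro-average WER over demographic groups (e.g. gender or accent),
    so a group with few recordings counts as much as a large one.
    `samples` is a list of (group, reference, hypothesis) tuples --
    an assumed input format, not one specified by the competition."""
    groups: dict[str, list[float]] = {}
    for group, ref, hyp in samples:
        groups.setdefault(group, []).append(wer(ref, hyp))
    per_group = [sum(v) / len(v) for v in groups.values()]
    return sum(per_group) / len(per_group)
```

For example, `wer("the cat sat", "the bat sat down")` counts one substitution and one insertion against a three-word reference, giving 2/3. The macro average in `demographic_balanced_wer` is one simple way to reward models that perform equally well across groups, in the spirit of the balanced-WER criterion above.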

How are you making sure it's easy for all languages to participate?

  • We are actively encouraging submissions at proof of concept stage that use a small or 'toy' corpus
  • Our Methods and Measures category enables teams to submit outlines for tools that they do not yet have the resources to build out further
  • We have allowed a month of development time to accommodate those relying on CPU / slower compute
  • Languages will be judged within 'Bands' - high resource, medium resource and low resource - to ensure a fairer competition between languages that exist in different contexts
  • We are creating a flexible, holistic rubric that makes it possible for judges to look at ecosystem value-add factors beyond performance metrics like Word Error Rate
  • Marginalised communities who have governance concerns about releasing their model under an open source license are welcome to submit with an explanation to that effect, and this will be considered accordingly

Our judges

Who are the judging panel?

Submission process

To submit your code for the competition, do the following:

  • Fork this repository in GitHub.
  • Create a new directory in a subdirectory of submit/ that corresponds to the category you want to submit in.
  • Commit and push your code to that subdirectory in your fork.
  • Open a pull request to this repository.
  • You must then make your final submission using this form. Submissions not made via the Typeform cannot be considered.
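The steps above can be sketched as shell commands. The fork URL, category and entry directory name are placeholders here; substitute your own, and note that forking, opening the pull request and completing the Typeform happen in the browser.

```shell
# Assumed placeholders: $FORK_URL is your fork's clone URL; the entry
# targets the Gender category with an illustrative directory name.
git clone "$FORK_URL" our-voices-fork
cd our-voices-fork
mkdir -p submit/gender/my-team-entry          # one directory per entry
cp -r ../my-model/. submit/gender/my-team-entry/
git add submit/gender/my-team-entry
git commit -m "Gender category submission: my-team-entry"
git push origin main
# Then open a pull request against the upstream repository on GitHub,
# and complete the final submission Typeform.
```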
